I got my start with HBase in the fall of 2008. It was a young project then, released only in the preceding year. As early releases go, it was quite capable, although not without its fair share of embarrassing warts. Not bad for an Apache subproject with fewer than 10 active committers to its name! That was the height of the NoSQL hype. The term NoSQL hadn't even been coined yet, but it would come into common parlance over the next year. No one could articulate why the idea was important, only that it was important, and everyone in the open source data community was obsessed with the concept. The community was polarized: people either bashed relational databases for their foolish rigidity or mocked these new technologies for their lack of sophistication.
The people exploring this new idea were mostly at internet companies, and I came to work for such a company: a startup interested in analyzing social media content. Facebook still enforced its privacy policies then, and Twitter wasn't big enough to know what a Fail Whale was yet. Our interest at the time was mostly in blogs. Before joining, I'd spent the better part of three years at a company working on a hierarchical database engine. We made extensive use of Berkeley DB there, so I was familiar with data technologies that didn't have a SQL engine. At the startup, I joined a small team tasked with building a new data-management platform. We had an MS SQL database stuffed to the gills with blog posts and comments. When our daily analysis jobs breached the 18-hour mark, we knew the current system's days were numbered.
Summary
HBase in Action has all the knowledge you need to design, build, and run applications using HBase. First, it introduces you to the fundamentals of distributed systems and large-scale data handling. Then, you'll explore real-world applications and code samples with just enough theory to understand the practical techniques. You'll see how to build applications with HBase and take advantage of the MapReduce processing framework. Along the way, you'll learn patterns and best practices.
About the Technology
HBase is a NoSQL storage system designed for fast, random access to large volumes of data. It runs on commodity hardware and scales smoothly from modest datasets to billions of rows and millions of columns.
About this Book
HBase in Action is an experience-driven guide that shows you how to design, build, and run applications using HBase. First, it introduces you to the fundamentals of handling big data. Then, you'll explore HBase with the help of real applications and code samples and with just enough theory to back up the practical techniques. You'll take advantage of the MapReduce processing framework and benefit from seeing HBase best practices in action.
Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. All code from the book is also available.
What's Inside
- When and how to use HBase
- Practical examples
- Design patterns for scalable data systems
- Deployment, integration, and design
Written for developers and architects familiar with data storage and processing. No prior knowledge of HBase, Hadoop, or MapReduce is required.
Table of Contents
PART 1 HBASE FUNDAMENTALS
- Introducing HBase
- Getting started
- Distributed HBase, HDFS, and MapReduce
PART 2 ADVANCED CONCEPTS
- HBase table design
- Extending HBase with coprocessors
- Alternative HBase clients
PART 3 EXAMPLE APPLICATIONS
- HBase by example: OpenTSDB
- Scaling GIS on HBase
PART 4 OPERATIONALIZING HBASE
- Deploying HBase
- Operations