Slides for a lightning talk on HBase that I gave at the Near Infinity (www.nearinfinity.com) spring 2012 conference.
The associated sample code is on GitHub at https://github.com/sleberknight/basic-hbase-examples
4. "Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable including web indexing, Google Earth, and Google Finance."
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
5. "A Bigtable is a sparse, distributed, persistent multidimensional sorted map."
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
8. "The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes."
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
(row key, column key, timestamp) => value
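The (row key, column key, timestamp) => value mapping can be pictured as nested sorted maps. The sketch below is a conceptual, in-memory illustration only, not how Bigtable or HBase actually store data; the class name SortedMapModel is hypothetical, and Strings stand in for the uninterpreted byte arrays (for ASCII keys, String order matches HBase's unsigned byte order).

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Conceptual model only: row key -> column key -> timestamp -> value, every level sorted.
// Strings stand in for the byte arrays that Bigtable/HBase really store.
final class SortedMapModel {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Write one cell version; within a column, timestamps are kept newest first.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Collections.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Exact lookup of one (row, column, timestamp) coordinate; null if that cell does not exist.
    String get(String row, String column, long timestamp) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.get(timestamp);
    }

    // Newest version of a cell; null if the cell was never written (the map is sparse).
    String getLatest(String row, String column) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    private NavigableMap<Long, String> versionsOf(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }
}
```

Writing "John" at t2 and "John Doe" at t6 to (20120407145045, info:author) keeps both versions addressable; getLatest returns "John Doe", and any coordinate that was never written simply returns null, which is what "sparse" means here.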
11. Get row 20120407145045...
Row Key          Timestamp  Column Family "info:"               Column Family "content:"
20120407145045   t7         "info:summary"  "An intro to..."
                 t6         "info:author"   "John Doe"
                 t5                                             "Google's Bigtable is..."
                 t4                                             "Google Bigtable is..."
                 t3         "info:category" "Persistence"
                 t2         "info:author"   "John"
                 t1         "info:title"    "Intro to Bigtable"
20120320162535   t4         "info:category" "Persistence"
                 t3                                             "CouchDB is..."
                 t2         "info:author"   "Bob Smith"
                 t1         "info:title"    "Doc-oriented..."
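Reading the table: an HBase Get returns only the newest version of each column unless more versions are requested, so a get of row 20120407145045 should yield info:author = "John Doe" (t6) rather than "John" (t2), and content: = "Google's Bigtable is..." (t5). The plain-Java sketch below works that out from the table's cells. The Cell class, the latestPerColumn helper, writing t1..t7 as 1..7, and treating the content cells as the column "content:" (empty qualifier, as in the shell example) are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LatestVersions {
    // One cell of a single row: column name, version timestamp, value.
    static final class Cell {
        final String column;
        final long timestamp;
        final String value;

        Cell(String column, long timestamp, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Cells of row 20120407145045 from the table, with t1..t7 written as 1..7 (an assumption).
    static List<Cell> row20120407145045() {
        return Arrays.asList(
                new Cell("info:summary", 7, "An intro to..."),
                new Cell("info:author", 6, "John Doe"),
                new Cell("content:", 5, "Google's Bigtable is..."),
                new Cell("content:", 4, "Google Bigtable is..."),
                new Cell("info:category", 3, "Persistence"),
                new Cell("info:author", 2, "John"),
                new Cell("info:title", 1, "Intro to Bigtable"));
    }

    // Keep the highest-timestamp value per column: one version per column, newest wins.
    static Map<String, String> latestPerColumn(List<Cell> cells) {
        Map<String, Cell> newest = new TreeMap<>(); // sorted by column name
        for (Cell cell : cells) {
            Cell current = newest.get(cell.column);
            if (current == null || cell.timestamp > current.timestamp) {
                newest.put(cell.column, cell);
            }
        }
        Map<String, String> values = new TreeMap<>();
        for (Map.Entry<String, Cell> e : newest.entrySet()) {
            values.put(e.getKey(), e.getValue().value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(latestPerColumn(row20120407145045()));
    }
}
```

Running main prints the row's five columns with their newest values; the older t4 content and the t2 author are still stored, they just are not returned by a default get.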
12. "Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable."
- http://hbase.apache.org/
13. HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN CELL
content: timestamp=1239135042862, value=CouchDB is a doc...
info:author timestamp=1239135042755, value=Bob Smith
info:category timestamp=1239135042982, value=Persistence
info:title timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
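In the get output, cells come back sorted by column name, and each timestamp is milliseconds since the Unix epoch; when a put does not supply one, HBase by default fills in the current time. The sketch below (plain JDK, with a hypothetical class name ShellOutputDemo) copies the four cells shown above and decodes their timestamps with java.time. Sorted by timestamp instead of by column, the cells fall back into the order the put commands ran: title, author, content, category.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

final class ShellOutputDemo {
    // Column -> timestamp pairs copied from the shell's get output.
    static Map<String, Long> cellTimestamps() {
        Map<String, Long> cells = new TreeMap<>(); // sorted by column name, like the shell output
        cells.put("info:title", 1239135042623L);
        cells.put("info:author", 1239135042755L);
        cells.put("content:", 1239135042862L);
        cells.put("info:category", 1239135042982L);
        return cells;
    }

    // Interpret a cell timestamp as milliseconds since 1970-01-01T00:00:00Z.
    static Instant toInstant(long timestampMillis) {
        return Instant.ofEpochMilli(timestampMillis);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : cellTimestamps().entrySet()) {
            System.out.println(e.getKey() + " written at " + toInstant(e.getValue()));
        }
    }
}
```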
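19. // Get a row. Ask for only the data you need.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);                                   // at most 2 versions per column
get.addFamily(toBytes("personal"));                      // every column in "personal"
get.addColumn(toBytes("contactinfo"), toBytes("email")); // only "email" from "contactinfo"
Result result = table.get(get);
table.close();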
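The Get above narrows the result in two ways: addFamily/addColumn choose which columns come back, and setMaxVersions(2) caps how many versions of each column are returned. The sketch below models that selection over in-memory cells in plain Java. It is not the HBase client; the class and method names (GetSelection, select) are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Conceptual model of how a Get narrows one row; not the HBase client API.
final class GetSelection {
    // row: "family:qualifier" -> (timestamp -> value).
    // Keeps every column whose family is in `families`, plus each column listed in `columns`,
    // and at most `maxVersions` of the newest versions of each kept column.
    static NavigableMap<String, NavigableMap<Long, String>> select(
            NavigableMap<String, NavigableMap<Long, String>> row,
            Set<String> families, Set<String> columns, int maxVersions) {
        NavigableMap<String, NavigableMap<Long, String>> result = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> e : row.entrySet()) {
            String column = e.getKey();
            int colon = column.indexOf(':');
            String family = colon < 0 ? column : column.substring(0, colon);
            if (!families.contains(family) && !columns.contains(column)) {
                continue; // column was not requested
            }
            // Order versions newest first, then keep the first maxVersions of them.
            NavigableMap<Long, String> newestFirst = new TreeMap<>(Collections.<Long>reverseOrder());
            newestFirst.putAll(e.getValue());
            NavigableMap<Long, String> kept = new TreeMap<>(Collections.<Long>reverseOrder());
            for (Map.Entry<Long, String> v : newestFirst.entrySet()) {
                if (kept.size() >= maxVersions) break;
                kept.put(v.getKey(), v.getValue());
            }
            result.put(column, kept);
        }
        return result;
    }
}
```

Asking only for what you need matters because every column and version left out here is data the server does not have to read or send.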
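20. // Update existing values, and add a new one
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("john.m.smith@gmail.com"));
put.add(toBytes("contactinfo"), toBytes("address"), toBytes("San Diego, CA"));
table.put(put);
table.flushCommits(); // send any writes still buffered on the client
table.close();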
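A Put to an existing column does not overwrite the old value in place; it adds a newer version. Versions beyond the column family's configured maximum are no longer returned by reads and are eventually removed during compaction. The sketch below models a single column under that rule; VersionedCell and its maxVersions parameter are hypothetical names, and no particular HBase default is implied.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one column's version history under a per-family version limit.
final class VersionedCell {
    private final int maxVersions;
    // Reverse order: first entry = newest timestamp, last entry = oldest.
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.<Long>reverseOrder());

    VersionedCell(int maxVersions) {
        if (maxVersions < 1) throw new IllegalArgumentException("maxVersions must be >= 1");
        this.maxVersions = maxVersions;
    }

    // A put adds a version; it does not edit earlier ones.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Drop the oldest versions beyond the limit (HBase hides these on read, purges them at compaction).
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Newest value, or null if nothing has been written.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Read-only view of the retained versions, newest first.
    NavigableMap<Long, String> history() {
        return Collections.unmodifiableNavigableMap(versions);
    }
}
```

With a limit of 3, for example, writing four surnames at increasing timestamps leaves the three newest; latest() returns the last one written and the first write is no longer visible.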
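21. // Scan rows...
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.PageFilter;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
long numRowsPerPage = 25; // hypothetical page size
Scan scan = new Scan(toBytes("smith-")); // start row only; no stop row is set
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result result : scanner) {
    // process result...
  }
} finally {
  scanner.close();
  table.close();
}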
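Two details of the scan are easy to miss. Scan(toBytes("smith-")) sets only a start row, so it reads from the first row at or after "smith-" onward, not just rows with that prefix; the same-era client API also has a Scan(startRow, stopRow) constructor with an exclusive stop row, e.g. a stop row of "smith.". And PageFilter is applied separately on each region server, so a scan spanning several regions can return more than numRowsPerPage rows in total; the client should still count rows itself. The plain-Java sketch below models a bounded prefix scan with a client-side row limit over a sorted map; PrefixScan, stopRowFor, and scanPrefix are hypothetical names, and String order stands in for HBase's byte order (equivalent for ASCII keys).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

final class PrefixScan {
    // Smallest key greater than every key that starts with prefix: bump the last character.
    // Assumes a non-empty prefix whose last char is not Character.MAX_VALUE (true for ASCII keys).
    static String stopRowFor(String prefix) {
        if (prefix.isEmpty()) throw new IllegalArgumentException("prefix must be non-empty");
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Up to pageSize row keys in [prefix, stopRow), in sorted order, with the limit enforced here
    // on the "client" rather than trusted to a per-server filter.
    static List<String> scanPrefix(NavigableMap<String, ?> table, String prefix, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String rowKey : table.subMap(prefix, true, stopRowFor(prefix), false).keySet()) {
            if (page.size() >= pageSize) break;
            page.add(rowKey);
        }
        return page;
    }
}
```

For "smith-" the stop row is "smith.", since '.' is the character right after '-'; a key such as "smithers-..." sorts after that and is correctly excluded.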
22. Data Modeling
Row key design
Match to data access patterns
Wide vs. narrow rows
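One way to match row keys to access patterns, sketched under stated assumptions: the blog row keys earlier (e.g. 20120320162535) look like yyyyMMddHHmmss strings, so rows sort oldest first. If the usual query is "newest posts first", a reverse-timestamp key (Long.MAX_VALUE minus the time in milliseconds, zero-padded so string order matches numeric order) makes the newest post the first row a scan returns. A composite key such as surname-given-middle-id, the shape the connor-john-m-43299 key appears to follow, keeps everyone with the same surname adjacent, which is what a "smith-" scan relies on. RowKeys and its methods are hypothetical, and times are treated as UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class RowKeys {
    private static final DateTimeFormatter BLOG_KEY =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss", Locale.ROOT);

    // Chronological key like the 'blog' rows: sorts oldest first.
    static String chronologicalKey(LocalDateTime postedAt) {
        return postedAt.format(BLOG_KEY);
    }

    // Reverse-timestamp key (time assumed UTC): newer posts get smaller keys, so they sort first.
    // Zero-padded to 19 digits, the width of Long.MAX_VALUE, so string order equals numeric order.
    static String reverseTimestampKey(LocalDateTime postedAt) {
        long millis = postedAt.toInstant(ZoneOffset.UTC).toEpochMilli();
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - millis);
    }

    // Composite key, surname first, so one surname's rows are contiguous for a prefix scan.
    static String personKey(String surname, String given, String middleInitial, long id) {
        return String.join("-", surname, given, middleInitial, Long.toString(id))
                .toLowerCase(Locale.ROOT);
    }
}
```

The trade-off is that each key order serves one access pattern well; a chronological key makes "newest first" a full reverse walk, while a reverse-timestamp key makes "oldest first" the awkward case.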
23. References
http://shop.oreilly.com/product/0636920014348.do
http://shop.oreilly.com/product/0636920021773.do
(3rd edition publication date: May 29, 2012)
http://hbase.apache.org/