HBase Lightning Talk
APACHE HBASE
Scott Leberknight
BACKGROUND
Google Bigtable
"Bigtable is a distributed storage
system for managing structured data
that is designed to scale to a very
large size: petabytes of data across
thousands of commodity
servers. Many projects at Google
store data in Bigtable including web
indexing, Google Earth, and Google
Finance."
  - Bigtable: A Distributed Storage System for Structured Data
    http://labs.google.com/papers/bigtable.html
"A Bigtable is a sparse, distributed, persistent
multidimensional sorted map"
  - Bigtable: A Distributed Storage System for Structured Data
    http://labs.google.com/papers/bigtable.html
wtf?
distributed
sparse
column-oriented
versioned
"The map is indexed by a row key,
column key, and a timestamp; each
value in the map is an uninterpreted array
of bytes."
  - Bigtable: A Distributed Storage System for Structured Data
    http://labs.google.com/papers/bigtable.html

 (row key, column key, timestamp) => value
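
One way to picture this mapping is as nested sorted maps. The following is a minimal sketch in plain Java with no HBase dependency. It assumes String keys and values for readability, where Bigtable and HBase store raw bytes, and the class name SortedMapModel and its methods are hypothetical, not part of any library:

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory model of (row key, column key, timestamp) => value.
// Real Bigtable/HBase cells hold byte[]; Strings are used here for readability.
class SortedMapModel {
    // row key -> column key -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows =
            new TreeMap<String, TreeMap<String, TreeMap<Long, String>>>();

    public void put(String rowKey, String columnKey, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(columnKey, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Value at an exact coordinate, or null when the cell is absent (the map is sparse).
    public String get(String rowKey, String columnKey, long timestamp) {
        TreeMap<Long, String> versions = versionsOf(rowKey, columnKey);
        return versions == null ? null : versions.get(timestamp);
    }

    // Newest version of a cell, or null when the cell is absent.
    public String getLatest(String rowKey, String columnKey) {
        TreeMap<Long, String> versions = versionsOf(rowKey, columnKey);
        return versions == null || versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    private TreeMap<Long, String> versionsOf(String rowKey, String columnKey) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(columnKey);
    }

    public static void main(String[] args) {
        SortedMapModel table = new SortedMapModel();
        table.put("20120407145045", "info:author", 2L, "John");
        table.put("20120407145045", "info:author", 6L, "John Doe");
        System.out.println(table.get("20120407145045", "info:author", 2L));
        System.out.println(table.getLatest("20120407145045", "info:author"));
        System.out.println(table.get("20120407145045", "info:summary", 7L));
    }
}
```

Sorting timestamps newest-first makes "latest version" a cheap first-entry lookup, and absent cells cost nothing to store, which is what "sparse" means here.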
Key Concepts:
row key => 20120407152657

column family => "personal:"

column key => "personal:givenName",
              "personal:surname"

timestamp => 1239124584398
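
A column key combines a column family and a qualifier, separated by a colon. As a small illustration, here is a sketch in plain Java that splits a column key at the first colon. The class ColumnKey is hypothetical, and treating a key without a colon as a bare family name is an assumption of this sketch:

```java
// Hypothetical helper: splits "family:qualifier" at the first colon.
class ColumnKey {
    final String family;
    final String qualifier;

    ColumnKey(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    static ColumnKey parse(String columnKey) {
        int colon = columnKey.indexOf(':');
        if (colon < 0) {
            // Assumption of this sketch: no colon means a bare family name.
            return new ColumnKey(columnKey, "");
        }
        return new ColumnKey(columnKey.substring(0, colon), columnKey.substring(colon + 1));
    }

    public static void main(String[] args) {
        ColumnKey given = ColumnKey.parse("personal:givenName");
        System.out.println(given.family + " / " + given.qualifier);
        ColumnKey content = ColumnKey.parse("content:");
        System.out.println(content.family + " / [" + content.qualifier + "]");
    }
}
```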
Row Key         Timestamp  Column Family "info:"              Column Family "content:"
20120407145045  t7         "info:summary"   "An intro to..."
                t6         "info:author"    "John Doe"
                t5                                            "Google's Bigtable is..."
                t4                                            "Google Bigtable is..."
                t3         "info:category"  "Persistence"
                t2         "info:author"    "John"
                t1         "info:title"     "Intro to Bigtable"
20120320162535  t4         "info:category"  "Persistence"
                t3                                            "CouchDB is..."
                t2         "info:author"    "Bob Smith"
                t1         "info:title"     "Doc-oriented..."
Get row 20120407145045...
Row Key         Timestamp  Column Family "info:"              Column Family "content:"
20120407145045  t7         "info:summary"   "An intro to..."
                t6         "info:author"    "John Doe"
                t5                                            "Google's Bigtable is..."
                t4                                            "Google Bigtable is..."
                t3         "info:category"  "Persistence"
                t2         "info:author"    "John"
                t1         "info:title"     "Intro to Bigtable"
20120320162535  t4         "info:category"  "Persistence"
                t3                                            "CouchDB is..."
                t2         "info:author"    "Bob Smith"
                t1         "info:title"     "Doc-oriented..."
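
By default, getting a row returns only the newest version of each column, so older values such as "John" stay stored but are not returned unless more versions are requested. A minimal sketch of that selection in plain Java, using the cells of row 20120407145045 from the table above; the class LatestVersions and its nested Cell type are hypothetical names for illustration, and t1..t7 are modeled as the numbers 1..7:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "newest version per column"; not HBase API.
class LatestVersions {
    // One stored cell version: column key, logical timestamp, value.
    static final class Cell {
        final String column;
        final long timestamp;
        final String value;

        Cell(String column, long timestamp, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Keeps, for every column, the value with the highest timestamp.
    static Map<String, String> latestPerColumn(List<Cell> cells) {
        Map<String, Cell> newest = new TreeMap<String, Cell>();
        for (Cell c : cells) {
            Cell seen = newest.get(c.column);
            if (seen == null || c.timestamp > seen.timestamp) {
                newest.put(c.column, c);
            }
        }
        Map<String, String> result = new TreeMap<String, String>();
        for (Map.Entry<String, Cell> e : newest.entrySet()) {
            result.put(e.getKey(), e.getValue().value);
        }
        return result;
    }

    // The seven cell versions of row 20120407145045 shown in the table.
    static List<Cell> exampleRow() {
        List<Cell> cells = new ArrayList<Cell>();
        cells.add(new Cell("info:summary", 7, "An intro to..."));
        cells.add(new Cell("info:author", 6, "John Doe"));
        cells.add(new Cell("content:", 5, "Google's Bigtable is..."));
        cells.add(new Cell("content:", 4, "Google Bigtable is..."));
        cells.add(new Cell("info:category", 3, "Persistence"));
        cells.add(new Cell("info:author", 2, "John"));
        cells.add(new Cell("info:title", 1, "Intro to Bigtable"));
        return cells;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : latestPerColumn(exampleRow()).entrySet()) {
            System.out.println(e.getKey() + " => " + e.getValue());
        }
    }
}
```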
Use HBase when you need random, realtime read/write
access to your Big Data. This project's goal is the
hosting of very large tables -- billions of rows X
millions of columns -- atop clusters of commodity
hardware. HBase is an open-source, distributed,
versioned, column-oriented store modeled after
Google's Bigtable.
  - http://hbase.apache.org/
HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN                       CELL
 content:                    timestamp=1239135042862, value=CouchDB is a doc...
 info:author                 timestamp=1239135042755, value=Bob Smith
 info:category               timestamp=1239135042982, value=Persistence
 info:title                  timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
HBase Shell



hbase(main):015:0> get 'blog', '20120407145045', {COLUMN=>'info:author', VERSIONS=>3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
ROW                     COLUMN+CELL
 20120320162535         column=content:, timestamp=1239135042862, value=CouchDB is...
 20120320162535         column=info:author, timestamp=1239135042755, value=Bob Smith
 20120320162535         column=info:category, timestamp=1239135042982, value=Persistence
 20120320162535         column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
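
The scan above returns only the March row because a scan walks row keys in sorted order from STARTROW (inclusive) up to STOPROW (exclusive). A minimal sketch of those range semantics in plain Java, using a TreeMap in place of a table. The class RowRangeScan is hypothetical, and String ordering stands in for HBase's byte-wise key comparison, which gives the same order for these all-digit keys:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of row-key range scanning; not HBase API.
class RowRangeScan {
    // Rows with startRow <= key < stopRow, in sorted key order.
    static SortedMap<String, String> scan(TreeMap<String, String> table,
                                          String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> blog = new TreeMap<String, String>();
        blog.put("20120320162535", "Document-oriented storage using CouchDB");
        blog.put("20120407145045", "Intro to Bigtable");

        // Mirrors: scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
        for (String rowKey : scan(blog, "20120300", "20120400").keySet()) {
            System.out.println(rowKey);
        }
    }
}
```

Because the row keys here are timestamps written as yyyyMMddHHmmss, a key range doubles as a date range.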
Got byte[]?
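
HBase stores row keys, column names, and values as raw byte arrays, so the client converts to and from byte[] itself; the Java slides that follow use Bytes.toBytes for this. A minimal sketch of the round trip in plain Java follows. It assumes UTF-8 for strings and 8-byte big-endian for longs, which to my understanding matches what the Bytes utility does. The class ByteConversions and its method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the conversions a client performs; not HBase API.
class ByteConversions {
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String asString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // ByteBuffer writes big-endian by default, giving a fixed 8-byte encoding.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long asLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] name = toBytes("John");
        System.out.println(name.length + " bytes -> " + asString(name));

        byte[] stamp = toBytes(1239124584398L);
        System.out.println(stamp.length + " bytes -> " + asLong(stamp));
    }
}
```

The store never interprets these bytes, so the reader has to know which decoding to apply to each column.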
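// Create a new table (HBase client API as of this talk, 2012)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

// One column family per group of related columns
String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);

System.out.printf("%s is available? %b%n",
  tableName, admin.isTableAvailable(tableName));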
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
// The row key identifies the row; each add() sets one column value
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"),
        toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"),
        toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"),
        toBytes("john.connor@gmail.com"));
table.put(put);
table.flushCommits();
table.close();
Finding data:

    get (by row key)


    scan (by row key ranges, filtering)
// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"), toBytes("email"));
Result result = table.get(get);
// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"),
        toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"),
        toBytes("john.m.smith@gmail.com"));
put.add(toBytes("contactinfo"), toBytes("address"),
        toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();
// Scan rows...
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Scan scan = new Scan(toBytes("smith-"));  // start row (inclusive)
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
long numRowsPerPage = 25;  // assumed page size; not specified in the talk
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result result : scanner) {
    // process result...
  }
} finally {
  scanner.close();  // release server-side scanner resources
  table.close();
}
Data Modeling

    Row key design
    Match to data access patterns
    Wide vs. narrow rows
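
Since rows can only be found by key or key range, the row key has to encode whatever the application will search by. A minimal sketch in plain Java, assuming the "surname-givenname-mi-id" pattern suggested by the example key connor-john-m-43299. The class RowKeys and both helpers are hypothetical, and the stop-row calculation is one common way to turn a key prefix into a scan range, not something shown in the talk:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical row-key helpers for illustration; not HBase API.
class RowKeys {
    // Surname first, so that a scan starting at "smith-" reaches every Smith.
    static String personKey(String surname, String givenName, String mi, long id) {
        return String.join("-", surname, givenName, mi).toLowerCase(Locale.ROOT) + "-" + id;
    }

    // Smallest key that sorts after every key beginning with the prefix:
    // drop trailing 0xFF bytes, then increment the last remaining byte.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];  // no upper bound exists; treat as "scan to the end"
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        System.out.println(personKey("Connor", "John", "M", 43299));
        byte[] stop = stopRowForPrefix("smith-".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

Pairing the prefix as the start row with this stop row bounds a scan to exactly the keys that share the prefix.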
References

http://shop.oreilly.com/product/0636920014348.do
http://shop.oreilly.com/product/0636920021773.do
(3rd edition pub date is May 29, 2012)
http://hbase.apache.org/
(my info)

scott.leberknight at nearinfinity.com
www.nearinfinity.com/blogs/
twitter: sleberknight