SlideShare uma empresa Scribd logo
1 de 15
Baixar para ler offline
HBase
           Introduction to
           column oriented
           databases




        Luís Cipriani
        @lfcipriani (twitter, linkedin, github, ...)
        22o. GURU (2012-02-25) - Sao Paulo/Brazil

sexta-feira, 24 de fevereiro de 12
ME




sexta-feira, 24 de fevereiro de 12
intro




                                     “A BigTable HBase is a sparse,
                                         distributed, persistent
                                     multidimensional sorted map”




                                                     http://research.google.com/archive/bigtable.html
sexta-feira, 24 de fevereiro de 12
intro > data model
                           {                           <-- table
                                // ...
                                "aaaaa" : {            <--   row
                                   "A" : {             <--   column family
                                      "foo" : {        <--   column (qualifier)
                                        15: "y",       <--   timestamp, value
                                         4: "m"
                                      }
                                      "bar" : {...}
                                   },
                                   "B" : {
                                      "" : {...}
                                   }
                                },
                                "aaaab" : {
                                   "A" : {
                                      "foo" : {...},
                                      "bar" : {...},
                                      "joe" : {...}
                                   },
                                   "B" : {
                                      "" : {...}
                                   }
                                },
                                // ...
                           }
sexta-feira, 24 de fevereiro de 12
intro > data model




      (Table, RowKey, Family, Column, Timestamp) → Value




sexta-feira, 24 de fevereiro de 12
intro > hadoop stack




                          • hadoop HDFS (or not)
                          • hadoop MapReduce
                          • hadoop ZooKeeper
                          • hadoop HBase
                          • hadoop Hue, Whirr, etc...



sexta-feira, 24 de fevereiro de 12
architecture




sexta-feira, 24 de fevereiro de 12
key design > read/write model




               • randon reads (get)
               • sequential reads (scan)
                 • partial key scans
               • writes (put = update)



sexta-feira, 24 de fevereiro de 12
key design > storage model




                                     http://ofps.oreilly.com/titles/9781449396107/advanced.html
sexta-feira, 24 de fevereiro de 12
key design > strategies


                                 • tall-narrow vs flat-wide
                                 • partial key scans
                                 • pagination
                                 • time series
                                    • salting
                                    • field swap
                                    • randomization
                                 • secondary indexes

sexta-feira, 24 de fevereiro de 12
key design > example




sexta-feira, 24 de fevereiro de 12
development




            • installation modes
               • standalone, pseudo-distributed, distributed
            • JRuby console
            • Access
               • java/jruby API (more features)
               • entrypoints REST, Thrift, Avro, Protobuffers
               • there several other libs


sexta-feira, 24 de fevereiro de 12
cons




            • complex config and maintenance
            • hot regions
            • no secondary index built-in
            • no transactions built-in
            • complex schema design




sexta-feira, 24 de fevereiro de 12
pros




               • distributed
               • scalable (auto-sharding)
               • built on Hadoop stack
               • handles Big Data
               • high performance for write and read
               • no SPOF
               • fault tolerant, no data loss
               • active community



sexta-feira, 24 de fevereiro de 12
Reformulação Box de Login                                                       Abril ID
   http://engineering.abril.com.br/
   http://abr.io/hbase-intro
   https://pinboard.in/u:lfcipriani/t:hbase/
   http://hbase.apache.org/




                                     ?   http://shop.oreilly.com/product/0636920014348.do




sexta-feira, 24 de fevereiro de 12

Mais conteúdo relacionado

Semelhante a Hbase: Introduction to column oriented databases

Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
Andrew Brust
 
Hadoop/Mahout/HBaseで テキスト分類器を作ったよ
Hadoop/Mahout/HBaseで テキスト分類器を作ったよHadoop/Mahout/HBaseで テキスト分類器を作ったよ
Hadoop/Mahout/HBaseで テキスト分類器を作ったよ
Naoki Yanai
 

Semelhante a Hbase: Introduction to column oriented databases (14)

Hive introduction 介绍
Hive  introduction 介绍Hive  introduction 介绍
Hive introduction 介绍
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
Generalized framework for using NoSQL Databases
Generalized framework for using NoSQL DatabasesGeneralized framework for using NoSQL Databases
Generalized framework for using NoSQL Databases
 
Hadoop/Mahout/HBaseで テキスト分類器を作ったよ
Hadoop/Mahout/HBaseで テキスト分類器を作ったよHadoop/Mahout/HBaseで テキスト分類器を作ったよ
Hadoop/Mahout/HBaseで テキスト分類器を作ったよ
 
20100128ebay
20100128ebay20100128ebay
20100128ebay
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
 
HBase, no trouble
HBase, no troubleHBase, no trouble
HBase, no trouble
 
Hadoop - Introduction to Hadoop
Hadoop - Introduction to HadoopHadoop - Introduction to Hadoop
Hadoop - Introduction to Hadoop
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
 
MongoDB at GUL
MongoDB at GULMongoDB at GUL
MongoDB at GUL
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and Future
 

Mais de Luis Cipriani

Mais de Luis Cipriani (10)

Adventures with Raspberry Pi and Twitter API
Adventures with Raspberry Pi and Twitter APIAdventures with Raspberry Pi and Twitter API
Adventures with Raspberry Pi and Twitter API
 
Capturando o pulso do planeta com as APIs de Streaming do Twitter
Capturando o pulso do planeta com as APIs de Streaming do TwitterCapturando o pulso do planeta com as APIs de Streaming do Twitter
Capturando o pulso do planeta com as APIs de Streaming do Twitter
 
Twitter e suas APIs de Streaming - Campus Party Brasil 7
Twitter e suas APIs de Streaming - Campus Party Brasil 7Twitter e suas APIs de Streaming - Campus Party Brasil 7
Twitter e suas APIs de Streaming - Campus Party Brasil 7
 
Segurança de APIs HTTP, um guia sensato para desenvolvedores preocupados
Segurança de APIs HTTP, um guia sensato para desenvolvedores preocupadosSegurança de APIs HTTP, um guia sensato para desenvolvedores preocupados
Segurança de APIs HTTP, um guia sensato para desenvolvedores preocupados
 
API Caching, why your server needs some rest
API Caching, why your server needs some restAPI Caching, why your server needs some rest
API Caching, why your server needs some rest
 
Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
Explaining A Programming Model for Context-Aware Applications in Large-Scale ...Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
 
Alexandria: um Sistema de Sistemas para Publicação de Conteúdo Digital utiliz...
Alexandria: um Sistema de Sistemas para Publicação de Conteúdo Digital utiliz...Alexandria: um Sistema de Sistemas para Publicação de Conteúdo Digital utiliz...
Alexandria: um Sistema de Sistemas para Publicação de Conteúdo Digital utiliz...
 
Como um verdadeiro sistema REST funciona: arquitetura e performance na Abril
Como um verdadeiro sistema REST funciona: arquitetura e performance na AbrilComo um verdadeiro sistema REST funciona: arquitetura e performance na Abril
Como um verdadeiro sistema REST funciona: arquitetura e performance na Abril
 
Explaining Semantic Web
Explaining Semantic WebExplaining Semantic Web
Explaining Semantic Web
 
Fearless HTTP requests abuse
Fearless HTTP requests abuseFearless HTTP requests abuse
Fearless HTTP requests abuse
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Hbase: Introduction to column oriented databases

  • 1. HBase Introduction to column oriented databases Luís Cipriani @lfcipriani (twitter, linkedin, github, ...) 22o. GURU (2012-02-25) - Sao Paulo/Brazil sexta-feira, 24 de fevereiro de 12
  • 2. ME sexta-feira, 24 de fevereiro de 12
  • 3. intro “A BigTable HBase is a sparse, distributed, persistent multidimensional sorted map” http://research.google.com/archive/bigtable.html sexta-feira, 24 de fevereiro de 12
  • 4. intro > data model { <-- table // ... "aaaaa" : { <-- row "A" : { <-- column family "foo" : { <-- column (qualifier) 15: "y", <-- timestamp, value 4: "m" } "bar" : {...} }, "B" : { "" : {...} } }, "aaaab" : { "A" : { "foo" : {...}, "bar" : {...}, "joe" : {...} }, "B" : { "" : {...} } }, // ... } sexta-feira, 24 de fevereiro de 12
  • 5. intro > data model (Table, RowKey, Family, Column, Timestamp) → Value sexta-feira, 24 de fevereiro de 12
  • 6. intro > hadoop stack • hadoop HDFS (or not) • hadoop MapReduce • hadoop ZooKeeper • hadoop HBase • hadoop Hue, Whirr, etc... sexta-feira, 24 de fevereiro de 12
  • 8. key design > read/write model • randon reads (get) • sequential reads (scan) • partial key scans • writes (put = update) sexta-feira, 24 de fevereiro de 12
  • 9. key design > storage model http://ofps.oreilly.com/titles/9781449396107/advanced.html sexta-feira, 24 de fevereiro de 12
  • 10. key design > strategies • tall-narrow vs flat-wide • partial key scans • pagination • time series • salting • field swap • randomization • secondary indexes sexta-feira, 24 de fevereiro de 12
  • 11. key design > example sexta-feira, 24 de fevereiro de 12
  • 12. development • installation modes • standalone, pseudo-distributed, distributed • JRuby console • Access • java/jruby API (more features) • entrypoints REST, Thrift, Avro, Protobuffers • there several other libs sexta-feira, 24 de fevereiro de 12
  • 13. cons • complex config and maintenance • hot regions • no secondary index built-in • no transactions built-in • complex schema design sexta-feira, 24 de fevereiro de 12
  • 14. pros • distributed • scalable (auto-sharding) • built on Hadoop stack • handles Big Data • high performance for write and read • no SPOF • fault tolerant, no data loss • active community sexta-feira, 24 de fevereiro de 12
  • 15. Reformulação Box de Login Abril ID http://engineering.abril.com.br/ http://abr.io/hbase-intro https://pinboard.in/u:lfcipriani/t:hbase/ http://hbase.apache.org/ ? http://shop.oreilly.com/product/0636920014348.do sexta-feira, 24 de fevereiro de 12