SlideShare uma empresa Scribd logo
1 de 17
Baixar para ler offline
HBase schema design
    case studies

    Organized by Evan/Qingyan Liu
     qingyan123 (AT) gmail.com
              2009.7.13
The Tao is ...




De-normalization
Case 1: locations
●
    China
    ●
        Beijing
    ●
        Shanghai
    ●
        Guangzhou
    ●
        Shandong
        –   Jinan
        –   Qingdao
    ●
        Sichuan
        –   Chengdu
In RDBMS
loc_id PK   loc_name      parent_id   child_id
1           China                     2,3,4,5
2           Beijing       1
3           Shanghai      1
4           Guangzhou     1
5           Shandong      1           7,8
6           Sichuan       1           9
7           Jinan         1,5
8           Qingdao       1,5
9           Chengdu       1,6
In HBase
row                      column families
           name:         parent:           child:
<loc_id>                 parent:<loc_id>   child:<loc_id>
1          China                           child:1=state
                                           child:2=state
                                           child:3=state
                                           child:4=state
                                           child:5=state
                                           child:6=state
5          Shangdong     parent:1=nation child:7=city
                                         child:8=city
8          Qingdao       parent:1=nation
                         parent:5=state
Case 2: student-course
●
    Student
    ●
        1 S ~ many C
●
    Course
    ●
        1 C ~ many S
In RDBMS


Students                Courses
id PK      SCs          id PK
name       student_id   title
sex        course_id    introduction
age        type         teacher_id
In HBase
row                           column families
               info:               course:
<student_id>   info:name           course:<course_id>=type
               info:sex
               info:age

row                           column families
               info:               student:
<course_id>    info:title          student:<student_id>=type
               info:introduction
               info:teacher_id
Case 3: user-action
●
    users performs actions now and then
    ●
        store every events
    ●
        query recent events of a user
In RDBMS
                      Actions
                      id PK
                      user_id IDX
                      name
                      time

●   For fast SELECT id, user_id, name, time FROM Action
    WHERE user_id=XXX ORDER BY time DESC LIMIT 10
    OFFSET 20, we must create index on user_id.
    However, indices will greatly decrease insert speed
    for index-rebuild.
In HBase
row                                   column families
                              name:
<user><Long.MAX_VALUE -
System.currentTimeMillis()>
<event id>
Case 4: user-friends
●
    1 user has 1+ friends
●
    will lookup all friends of a user
In RDBMS

        Users
                           Friendships
        id IDX
                           user_id IDX
        name
                           friend_id
        sex
                           type
        age
●
    SELECT * FROM friendships WHERE
    user_id='XXX';
In HBase

row                           column families
                 info:            friend:
<user_id>        info:name        friend:<user_id>=type
                 info:sex
                 info:age

 ●
      actually, it is a graph can be represented by a
      sparse matrix.
 ●
      then you can use M/R to find sth interesting.
      e.g. the shortest path from user A to user B.
Case 5: access log
●
    each log line contains time, ip, domain, url,
    referer, browser_cookie, login_id, etc
●
    will be analyzed every 5 minutes, every hour,
    daily, weekly, and monthly
In RDBMS

Accesslog
time
ip IDX
domain
url
referer
browser_cookie IDX
login_id IDX
In HBase

row                                                  column families
                                          http:                     user
<time><INC_COUNTER>                       http:ip                   user:browser_
                                          http:domain               cookie
                                          http:url                  user:login_id
                                          http:referer




INC_COUNTER is used to distinguish the adjacent same time values.

Mais conteúdo relacionado

Mais procurados

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
Dan McKinley
 

Mais procurados (20)

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
ElephantDB
ElephantDBElephantDB
ElephantDB
 
ReactJS presentation.pptx
ReactJS presentation.pptxReactJS presentation.pptx
ReactJS presentation.pptx
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Practical Celery
Practical CeleryPractical Celery
Practical Celery
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Linux And perl
Linux And perlLinux And perl
Linux And perl
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
 
Introduction to HBase
Introduction to HBaseIntroduction to HBase
Introduction to HBase
 
Servlet life cycle
Servlet life cycleServlet life cycle
Servlet life cycle
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
 

Destaque

A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915
Dan Han
 
HBaseCon 2013: General Session
HBaseCon 2013: General SessionHBaseCon 2013: General Session
HBaseCon 2013: General Session
Cloudera, Inc.
 

Destaque (20)

HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real World
 
A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915
 
Osc2012 spring HBase Report
Osc2012 spring HBase ReportOsc2012 spring HBase Report
Osc2012 spring HBase Report
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
Cassandra v0.6-siryou
Cassandra v0.6-siryouCassandra v0.6-siryou
Cassandra v0.6-siryou
 
Hog user manual v3
Hog user manual v3Hog user manual v3
Hog user manual v3
 
Unlocking Data for Analysts & Developers
Unlocking Data for Analysts & DevelopersUnlocking Data for Analysts & Developers
Unlocking Data for Analysts & Developers
 
Spark!
Spark!Spark!
Spark!
 
HBaseCon 2013: General Session
HBaseCon 2013: General SessionHBaseCon 2013: General Session
HBaseCon 2013: General Session
 
20分でわかるHBase
20分でわかるHBase20分でわかるHBase
20分でわかるHBase
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
 
3d Construction Card
3d Construction Card3d Construction Card
3d Construction Card
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

20090713 Hbase Schema Design Case Studies

  • 1. HBase schema design case studies Organized by Evan/Qingyan Liu qingyan123 (AT) gmail.com 2009.7.13
  • 2. The Tao is ... De-normalization
  • 3. Case 1: locations ● China ● Beijing ● Shanghai ● Guangzhou ● Shandong – Jinan – Qingdao ● Sichuan – Chengdu
  • 4. In RDBMS loc_id PK loc_name parent_id child_id 1 China 2,3,4,5 2 Beijing 1 3 Shanghai 1 4 Guangzhou 1 5 Shandong 1 7,8 6 Sichuan 1 9 7 Jinan 1,5 8 Qingdao 1,5 9 Chengdu 1,6
  • 5. In HBase row column families name: parent: child: <loc_id> parent:<loc_id> child:<loc_id> 1 China child:1=state child:2=state child:3=state child:4=state child:5=state child:6=state 5 Shangdong parent:1=nation child:7=city child:8=city 8 Qingdao parent:1=nation parent:5=state
  • 6. Case 2: student-course ● Student ● 1 S ~ many C ● Course ● 1 C ~ many S
  • 7. In RDBMS Students Courses id PK SCs id PK name student_id title sex course_id introduction age type teacher_id
  • 8. In HBase row column families info: course: <student_id> info:name course:<course_id>=type info:sex info:age row column families info: student: <course_id> info:title student:<student_id>=type info:introduction info:teacher_id
  • 9. Case 3: user-action ● users performs actions now and then ● store every events ● query recent events of a user
  • 10. In RDBMS Actions id PK user_id IDX name time ● For fast SELECT id, user_id, name, time FROM Action WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create index on user_id. However, indices will greatly decrease insert speed for index-rebuild.
  • 11. In HBase row column families name: <user><Long.MAX_VALUE - System.currentTimeMillis()> <event id>
  • 12. Case 4: user-friends ● 1 user has 1+ friends ● will lookup all friends of a user
  • 13. In RDBMS Users Friendships id IDX user_id IDX name friend_id sex type age ● SELECT * FROM friendships WHERE user_id='XXX';
  • 14. In HBase row column families info: friend: <user_id> info:name friend:<user_id>=type info:sex info:age ● actually, it is a graph can be represented by a sparse matrix. ● then you can use M/R to find sth interesting. e.g. the shortest path from user A to user B.
  • 15. Case 5: access log ● each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc ● will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
  • 17. In HBase row column families http: user <time><INC_COUNTER> http:ip user:browser_ http:domain cookie http:url user:login_id http:referer INC_COUNTER is used to distinguish the adjacent same time values.