SlideShare uma empresa Scribd logo
1 de 14
HBase vs. Hive

     Philip Wickline
Chief Technology Officer
         Hadapt
Goals


Brief introduction to the differences between
   transactional/operational and analytical systems



Understand when to use Hive and when to use HBase and why




                                                            2
Databases




            3
Datastores




             4
Differences of Purpose : “Transaction Processing”
Operational systems
• Optimized for small short random access – reads and writes
• E.g. record that an employee invested $100 in a S&P500 index
  fund in his 401(k) *or* record that a user posted something on
  another users “wall”

Traditional DB examples
• Oracle
• MySQL
NoSQL Examples
• HBase
• MongoDB
• Cassandra
                                                                   5
Differences of Purpose: Analytics
Analytics
• Optimized for read-only computations about large amounts of
  data
• E.g. compute the average amount invested in bond funds and
  stock funds for all employees at all employers over the last 5
  years                              10
                                                           5
                                                           0                       5-10

DB Examples                                                             Option 1   0-5


• Netezza
• Vertica
                  16
                  14
                  12                                                     Option 1
NoSQL Examples    10
                   8                                           Plan                       Acme
                   6
• Hive                                                         Actual                     GM
                   4
                                                                                          Newco
                   2

• Pig              0                                                                      Oldco
                       Oct   Nov   Dec   Jan   Feb   Mar                                  Bigcorp


                                                                                                    6
HBase Data Model : Conceptual


From the BigTable paper:
“a sparse, distributed, persistent multi-dimensional sorted map”



(row : bytestring, column family : bytestring, column : bytestring,
time : int64) -> byte string




                                                                      7
HBase Map
{ ”key_1" : {
   ”columnfamily_a" : {
     ”column_i" : {
       15 : "y",
       4 : "m"
     },
     ”column_ii" : {
       15 : "d”,
   }},
   “columnfamily_b" : {
     ”column_other" : {
       6 : "w"
       3 : "o"
       1 : "w”
  }}}}
                          8
Hive Data Model : Conceptual
Traditional Relational Tables

CUSTKEY   NAME   ADDRESS      NATIONKEY   PHONE      ACCTBAL      COMMENT
451234    NEWC   196          1           111-555-   $1,231,285   NULL
          ORP    Broadway                 1212
                 …
887765    ACME   1 Main st.   2           222-555-   $46,945      “Top
                 …                        1212                    customer”




                                                                              9
HBase Data Model : Physical

Every cell stored with row, family, column and timestamp
Allows fast lookup with low copy overhead
BUT
Space inefficient (optional compression available) and inefficient
   to scan

      “key_1”   “cf_a”    “c_i”     15        “foo”
      “key_1”   “cf_a”    “c_ii”    15        “bar”
      “key_2”   “cf_a”    “c_ii”    4         “baz”




                                                                     10
Hive Data Model : Physical
Depends on the underlying storage files
Can use flat text files, RCFiles, even use HBase for storage



Standard Row Storage

    C_1        C_2        C_3        C_4
    11         12         13         14
    21         22         23         24
    31         32         33         34
    41         42         43         44
    51         52         53         54



                                                               11
Hive Data Model : RCFile
Break into row groups, and then store as columns

                         Row Group 1
       C_1          11           21           31
       C_2          12           22           32
       C_3          13           23           33
       C_4          14           24           34


                   Row Group 2
       C_1          41           51
       C_2          42           52
       C_3          43           53
       C_4          44           54



                                                   12
Informal Performance Comparison


                   Hive                HBase
  Insert Speed     batch               Fast!
  Update Speed     NA                  Fast!
  Lookup speed     MR lower bound      Fast!
                   (10s of seconds)
  Data warehouse   15x faster on one   Uh oh
  queries          test




                                               13
THANK YOU

Mais conteúdo relacionado

Mais procurados

Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql databasebigdatagurus_meetup
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL OverviewTu Hoang
 
Database Architectures and Hypertable
Database Architectures and HypertableDatabase Architectures and Hypertable
Database Architectures and Hypertablehypertable
 
Redis深入浅出
Redis深入浅出Redis深入浅出
Redis深入浅出iammutex
 
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase Cloudera, Inc.
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondGruter
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersDataWorks Summit
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용Byeongweon Moon
 
New features in Pig 0.11
New features in Pig 0.11New features in Pig 0.11
New features in Pig 0.11Hortonworks
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache HadoopOleksiy Krotov
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkDvir Volk
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoopdatasalt
 
Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Tim Lossen
 
Pig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in AnalyticsPig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in AnalyticsJeremy Hanna
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 

Mais procurados (20)

Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL Overview
 
Database Architectures and Hypertable
Database Architectures and HypertableDatabase Architectures and Hypertable
Database Architectures and Hypertable
 
Redis深入浅出
Redis深入浅出Redis深入浅出
Redis深入浅出
 
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its Beyond
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size Matters
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
New features in Pig 0.11
New features in Pig 0.11New features in Pig 0.11
New features in Pig 0.11
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoop
 
Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?
 
Apache drill
Apache drillApache drill
Apache drill
 
Pig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in AnalyticsPig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in Analytics
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 

Semelhante a H base vs hive srp vs analytics 2-14-2012

HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Kerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataKerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataEnkitec
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...Databricks
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
DMDW Extra Lesson - NoSql and MongoDB
DMDW  Extra Lesson - NoSql and MongoDBDMDW  Extra Lesson - NoSql and MongoDB
DMDW Extra Lesson - NoSql and MongoDBJohannes Hoppe
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Javasunnygleason
 
Hadoop Meets Exadata- Kerry Osborne
Hadoop Meets Exadata- Kerry OsborneHadoop Meets Exadata- Kerry Osborne
Hadoop Meets Exadata- Kerry OsborneEnkitec
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatternsgrepalex
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 

Semelhante a H base vs hive srp vs analytics 2-14-2012 (20)

NOSQL Overview
NOSQL OverviewNOSQL Overview
NOSQL Overview
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Apache hadoop
Apache hadoopApache hadoop
Apache hadoop
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Kerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadataKerry osborne hadoop meets exadata
Kerry osborne hadoop meets exadata
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
 
DMDW Extra Lesson - NoSql and MongoDB
DMDW  Extra Lesson - NoSql and MongoDBDMDW  Extra Lesson - NoSql and MongoDB
DMDW Extra Lesson - NoSql and MongoDB
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
No Sql
No SqlNo Sql
No Sql
 
Hadoop Meets Exadata- Kerry Osborne
Hadoop Meets Exadata- Kerry OsborneHadoop Meets Exadata- Kerry Osborne
Hadoop Meets Exadata- Kerry Osborne
 
Mongodb lab
Mongodb labMongodb lab
Mongodb lab
 
מיכאל
מיכאלמיכאל
מיכאל
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

H base vs hive srp vs analytics 2-14-2012

  • 1. HBase vs. Hive Philip Wickline Chief Technology Officer Hadapt
  • 2. Goals Brief introduction to the differences between transactional/operational and analytical systems Understand when to use Hive and when to use HBase and why 2
  • 5. Differences of Purpose : “Transaction Processing” Operational systems • Optimized for small short random access – reads and writes • E.g. record that an employee invested $100 in a S&P500 index fund in his 401(k) *or* record that a user posted something on another users “wall” Traditional DB examples • Oracle • MySQL NoSQL Examples • HBase • MongoDB • Cassandra 5
  • 6. Differences of Purpose: Analytics Analytics • Optimized for read-only computations about large amounts of data • E.g. compute the average amount invested in bond funds and stock funds for all employees at all employers over the last 5 years 10 5 0 5-10 DB Examples Option 1 0-5 • Netezza • Vertica 16 14 12 Option 1 NoSQL Examples 10 8 Plan Acme 6 • Hive Actual GM 4 Newco 2 • Pig 0 Oldco Oct Nov Dec Jan Feb Mar Bigcorp 6
  • 7. HBase Data Model : Conceptual From the BigTable paper: “a sparse, distributed, persistent multi-dimensional sorted map” (row : bytestring, column family : bytestring, column : bytestring, time : int64) -> byte string 7
  • 8. HBase Map { ”key_1" : { ”columnfamily_a" : { ”column_i" : { 15 : "y", 4 : "m" }, ”column_ii" : { 15 : "d”, }}, “columnfamily_b" : { ”column_other" : { 6 : "w" 3 : "o" 1 : "w” }}}} 8
  • 9. Hive Data Model : Conceptual Traditional Relational Tables CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL COMMENT 451234 NEWC 196 1 111-555- $1,231,285 NULL ORP Broadway 1212 … 887765 ACME 1 Main st. 2 222-555- $46,945 “Top … 1212 customer” 9
  • 10. HBase Data Model : Physical Every cell stored with row, family, column and timestamp Allows fast lookup with low copy overhead BUT Space inefficient (optional compression available) and inefficient to scan “key_1” “cf_a” “c_i” 15 “foo” “key_1” “cf_a” “c_ii” 15 “bar” “key_2” “cf_a” “c_ii” 4 “baz” 10
  • 11. Hive Data Model : Physical Depends on the underlying storage files Can use flat text files, RCFiles, even use HBase for storage Standard Row Storage C_1 C_2 C_3 C_4 11 12 13 14 21 22 23 24 31 32 33 34 41 42 43 44 51 52 53 54 11
  • 12. Hive Data Model : RCFile Break into row groups, and then store as columns Row Group 1 C_1 11 21 31 C_2 12 22 32 C_3 13 23 33 C_4 14 24 34 Row Group 2 C_1 41 51 C_2 42 52 C_3 43 53 C_4 44 54 12
  • 13. Informal Performance Comparison Hive HBase Insert Speed batch Fast! Update Speed NA Fast! Lookup speed MR lower bound Fast! (10s of seconds) Data warehouse 15x faster on one Uh oh queries test 13

Notas do Editor

  1. Not about HadaptBe inclusive of beginnersBe brief
  2. Not a religious presentation – different systems have different properties that work for different needs
  3. 10 GB tpc_h dataCDH3B3 hive and HBaseSingle node desktop workstation, 4 cores, 8GB, a few drives