Integration of Hive and HBase
1. Integration of Apache Hive and HBase
Enis Soztutar
enis [at] apache [dot] org
@enissoz
Architecting the Future of Big Data
© Hortonworks Inc. 2011
2. About Me
• User and committer of Hadoop since 2007
• Contributor to Apache Hadoop, HBase, Hive and Gora
• Joined Hortonworks as a Member of Technical Staff
• Twitter: @enissoz
3. Agenda
• Overview of Hive and HBase
• Hive + HBase Features and Improvements
• Future of Hive and HBase
• Q&A
4. Apache Hive Overview
• Apache Hive is a data warehouse system for Hadoop
• SQL-like query language called HiveQL
• Built for petabyte-scale data
• Main purpose is analysis and ad hoc querying
• Databases, tables, partitions and buckets, managed with DDL operations
• SQL types plus complex types (ARRAY, MAP, etc.)
• Highly extensible
• Not for: small data sets, low-latency queries, OLTP
5. Apache Hive Architecture
[Diagram components: clients (CLI, JDBC/ODBC through the Hive Thrift Server, Hive Web Interface); the Driver (Parser, Planner, Optimizer, Execution); the Metastore client and the Metastore, backed by an RDBMS; execution on MapReduce over HDFS.]
6. Overview of Apache HBase
• Apache HBase is the Hadoop database
• Modeled after Google's Bigtable
• A sparse, distributed, persistent, multi-dimensional sorted map
• The map is indexed by a row key, a column key, and a timestamp
• Each value in the map is an uninterpreted array of bytes
• Low-latency random data access
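To make the "sparse, sorted map" description concrete, here is a minimal in-memory sketch of the logical data model. It is a toy, not HBase's client API: the class `ToySortedMap` and its methods are hypothetical. Cells are keyed by (row, "family:qualifier", timestamp), values are uninterpreted bytes, `get` returns the newest version, and `scan` returns cell keys in byte order with newer versions first.

```python
from typing import Dict, List, Optional, Tuple

# (row key, "family:qualifier", timestamp); the stored values are raw bytes.
CellKey = Tuple[bytes, bytes, int]


class ToySortedMap:
    """Hypothetical toy of the Bigtable/HBase logical model (not the HBase API)."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: bytes, column: bytes, ts: int, value: bytes) -> None:
        # Sparse: only cells that were actually written are stored.
        self._cells[(row, column, ts)] = value

    def get(self, row: bytes, column: bytes) -> Optional[bytes]:
        # Return the newest version of one cell, or None if it was never written.
        versions = [(ts, v) for (r, c, ts), v in self._cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

    def scan(self, start: bytes = b"", stop: Optional[bytes] = None) -> List[CellKey]:
        # Byte-wise order on row, then column; newest timestamp first within a cell.
        keys = sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))
        return [k for k in keys if k[0] >= start and (stop is None or k[0] < stop)]
```

For example, after writing `s:hits` for `bit.ly/abcd` at timestamp 1 (`b"100"`) and timestamp 2 (`b"123"`), `get` returns `b"123"`, and a column the row never had simply returns `None` rather than taking up space.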
7. Overview of Apache HBase
• Logical view: [figure]
From: Bigtable: A Distributed Storage System for Structured Data, Chang et al.
8. Apache HBase Architecture
[Diagram components: Client, HMaster and ZooKeeper; three RegionServers, each hosting several Regions; all running on HDFS.]
9. Hive + HBase Features and Improvements
10. Hive + HBase Motivation
• Hive and HBase have different characteristics:
  – Hive: high latency, structured data, used by analysts
  – HBase: low latency, unstructured data, used by programmers
• Hive data warehouses on Hadoop are high latency
  – Long ETL times
  – Limited access to real-time data
• Analyzing HBase data with MapReduce requires custom coding
• Hive and SQL are already known by many analysts
11. Use Case 1: HBase as ETL Data Sink
[Figure] From: HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
12. Use Case 2: HBase as Data Source
[Figure] From: HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
13. Use Case 3: Low Latency Warehouse
[Figure] From: HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
14. Example: Hive + HBase (HBase table)

    hbase(main):001:0> create 'short_urls', {NAME => 'u'}, {NAME => 's'}
    hbase(main):014:0> scan 'short_urls'
    ROW            COLUMN+CELL
     bit.ly/aaaa   column=s:hits, value=100
     bit.ly/aaaa   column=u:url, value=hbase.apache.org/
     bit.ly/abcd   column=s:hits, value=123
     bit.ly/abcd   column=u:url, value=example.com/foo
15. Example: Hive + HBase (Hive table)
Because short_urls already exists in HBase, it is registered in Hive as an external table:

    CREATE EXTERNAL TABLE short_urls(
      short_url string,
      url string,
      hit_count int
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits")
    TBLPROPERTIES ("hbase.table.name" = "short_urls");
16. Storage Handler
• Hive defines the HiveStorageHandler interface for plugging in different storage backends: HBase, Cassandra, MongoDB, etc.
• A storage handler has hooks for
  – Getting the input / output formats
  – Metadata operations: CREATE TABLE, DROP TABLE, etc.
• A storage handler is a table-level concept
  – It does not support Hive partitions and buckets
17. Apache Hive + HBase Architecture
[Diagram components: the Hive architecture (CLI, Hive Thrift Server, Hive Web Interface; Driver with Parser, Planner, Optimizer, Execution; Metastore client and Metastore on an RDBMS; MapReduce and HDFS), with a StorageHandler layer connecting query execution to HBase.]
18. Hive + HBase Integration
• The underlying HBase classes are used for the InputFormat/OutputFormat, getSplits(), etc.
• Column selection and certain filters can be pushed down
• HBase tables can be used together with other (Hadoop-native) tables and SQL constructs
• Hive DDL operations are converted to HBase DDL operations via the client hook
  – All operations are performed by the client
  – No two-phase commit
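Because the client applies DDL to two separate systems without a two-phase commit, a failure between the two steps must be undone by a compensating action. The sketch below models that with hypothetical in-memory stand-ins (`ToyHBase`, `ToyMetastore`, `create_table`). As far as we know, Hive's real hook interface is `HiveMetaHook`, whose `preCreateTable` / `rollbackCreateTable` / `commitCreateTable` callbacks play roughly these roles, but this code neither uses nor reproduces that API.

```python
from typing import Dict, Set


class ToyHBase:
    """Hypothetical stand-in for HBase table administration."""

    def __init__(self) -> None:
        self.tables: Set[str] = set()


class ToyMetastore:
    """Hypothetical stand-in for the Hive metastore; fail=True simulates an error."""

    def __init__(self, fail: bool = False) -> None:
        self.tables: Dict[str, str] = {}
        self.fail = fail

    def add(self, hive_table: str, hbase_table: str) -> None:
        if self.fail:
            raise RuntimeError("metastore write failed")
        self.tables[hive_table] = hbase_table


def create_table(ms: ToyMetastore, hb: ToyHBase, name: str, external: bool) -> None:
    """Client-side CREATE TABLE across two systems with no two-phase commit."""
    created_here = False
    if external:
        # An external table only registers an HBase table that already exists.
        if name not in hb.tables:
            raise ValueError(f"HBase table {name} does not exist")
    else:
        # A managed table is created in HBase first (the "pre-create" step).
        if name in hb.tables:
            raise ValueError(f"{name} exists in HBase; use CREATE EXTERNAL TABLE")
        hb.tables.add(name)
        created_here = True
    try:
        ms.add(name, name)
    except RuntimeError:
        # Compensate by hand (the "rollback" step); not atomic if this also fails.
        if created_here:
            hb.tables.discard(name)
        raise
```

The design point is the `except` branch: without a coordinator, the only protection against a half-created table is a best-effort cleanup run by the same client.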
19. Schema / Type Mapping
20. Schema Mapping
• Hive table + columns + column types <=> HBase table + column families (+ column qualifiers)
• Every field in the Hive table is mapped, in order, to either
  – The row key (using :key as the selector)
  – A column family (cf:), which maps to a MAP field in Hive
  – A column (cf:cq)
• The Hive table does not need to include all columns in HBase

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int,
      props map<string,string>
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits,p:");
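The in-order pairing of Hive columns with mapping entries can be sketched as follows. `parse_columns_mapping` is a hypothetical helper, not Hive's own parser: it splits the mapping on commas, drops any `#b` / `#s` / `#-` storage-type suffix, and classifies each entry as the row key, a whole column family, or a single column. It does no further validation.

```python
from typing import List, Tuple

# (hive column, kind, family, qualifier); kind is "key", "family" or "column".
Mapped = Tuple[str, str, str, str]


def parse_columns_mapping(mapping: str, hive_columns: List[str]) -> List[Mapped]:
    """Pair Hive columns, in order, with entries of an hbase.columns.mapping value."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError("expected one mapping entry per Hive column")
    result: List[Mapped] = []
    for column, entry in zip(hive_columns, entries):
        spec = entry.split("#", 1)[0]  # drop a storage-type suffix such as #b
        if spec == ":key":
            result.append((column, "key", "", ""))
        elif ":" not in spec:
            raise ValueError(f"bad mapping entry: {entry!r}")
        elif spec.endswith(":"):
            # Whole column family: every qualifier becomes a key of a Hive MAP.
            result.append((column, "family", spec[:-1], ""))
        else:
            family, qualifier = spec.split(":", 1)
            result.append((column, "column", family, qualifier))
    return result
```

With the columns `short_url, url, hit_count, props` and the mapping `:key,u:url,s:hits,p:`, `props` is paired with the whole `p` family, so any qualifier written under `p` shows up as a key in that MAP.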
21. Type Mapping
• Recently added to Hive (0.9.0)
• Previously, all types were converted to strings in HBase
• Hive has:
  – Primitive types: INT, STRING, BINARY, DATE, etc.
  – ARRAY<Type>
  – MAP<PrimitiveType, Type>
  – STRUCT<a:INT, b:STRING, c:STRING>
• HBase does not have types
  – Values are plain byte arrays, typically produced with Bytes.toBytes()
22. Type Mapping
• Table-level property: "hbase.table.default.storage.type" = "binary"
• The storage type can be given per column after #
  – Any prefix of "binary", e.g. u:url#b
  – Any prefix of "string", e.g. u:url#s
  – The dash character "-", e.g. u:url#-, to use the table default

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int,
      props map<string,string>
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s");
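The suffix rules, and what the two storage types mean for a stored value, can be sketched as below. Both functions are hypothetical helpers. Two assumptions are labeled in the code: `-` resolves to the table default, and Hive's binary storage for an INT uses the same layout as HBase's `Bytes.toBytes(int)` (4-byte big-endian two's complement), which the mention of `Bytes.toBytes()` suggests but which this sketch does not verify against Hive.

```python
import struct
from typing import Optional


def storage_type(suffix: Optional[str], table_default: str = "string") -> str:
    """Resolve a per-column suffix such as "b", "bin", "s" or "-" to a storage type."""
    if suffix is None or suffix == "-":
        # Assumption: no suffix or "-" falls back to the table-level default.
        return table_default
    if suffix and "binary".startswith(suffix):
        return "binary"
    if suffix and "string".startswith(suffix):
        return "string"
    raise ValueError(f"unknown storage type: {suffix!r}")


def encode_int(value: int, stype: str) -> bytes:
    """Encode a Hive INT for HBase under each storage type (sketch)."""
    if stype == "binary":
        # Assumed layout: 4-byte big-endian two's complement, like Bytes.toBytes(int).
        return struct.pack(">i", value)
    # String storage: the decimal text of the number.
    return str(value).encode("utf-8")
```

The practical difference: `hit_count = 100` is stored as the three bytes `b"100"` with string storage and as four bytes `b"\x00\x00\x00d"` with binary storage, so other HBase clients must agree on which one is used.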
23. Type Mapping
• If the type is not a primitive or a MAP, it is converted to a JSON string and serialized
• There are still a few rough edges in schema and type mapping:
  – No Hive BINARY support in the HBase mapping
  – No mapping of the HBase timestamp (only the put timestamp can be provided)
  – No arbitrary mapping of STRUCTs / ARRAYs into the HBase schema
24. Bulk Load
• Steps to bulk load:
  – Sample the source data for range partitioning
  – Save the sampling results to a file
  – Run a CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner
  – Import the HFiles into the HBase table
• Ideally, the setup would be as simple as:

    SET hive.hbase.bulk=true;
    INSERT OVERWRITE TABLE web_table SELECT …
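Sampling and a total-order partitioner exist so that the reducers' HFiles hold sorted, non-overlapping key ranges: every key sent to partition i must sort before every key sent to partition i + 1. A minimal sketch of that idea with hypothetical helper names; Hadoop's `TotalOrderPartitioner` works similarly in spirit (it reads split points from a partition file), but this is not its API.

```python
import bisect
from typing import List, Sequence


def choose_split_points(sample: Sequence[bytes], num_partitions: int) -> List[bytes]:
    """Pick num_partitions - 1 boundary keys from a sample (equal-frequency ranges)."""
    if num_partitions < 1 or len(sample) < num_partitions:
        raise ValueError("need at least one sampled key per partition")
    keys = sorted(sample)
    step = len(keys) / num_partitions
    return [keys[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key: bytes, split_points: List[bytes]) -> int:
    """Binary search: keys below the first split go to 0, and so on upward."""
    return bisect.bisect_right(split_points, key)
```

With 1,000 keys `row0000` … `row0999`, a 1-in-10 sample and four partitions, the split points come out as `row0250`, `row0500` and `row0750`; a key equal to a split point lands in the higher partition.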
25. Filter Pushdown
26. Filter Pushdown
• The idea is to pass filter expressions down to the storage layer to minimize the data scanned
• This lets queries use indexes at the HDFS or HBase level
• Example:

    CREATE EXTERNAL TABLE users (userid BIGINT, email STRING, … )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,…");

    SELECT ... FROM users WHERE userid > 1000000 AND email LIKE '%@gmail.com';

  The predicate on the row key becomes a scan start row:

    scan.setStartRow(Bytes.toBytes(1000000))
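Pushing a row-key predicate down amounts to turning key comparisons against constants into a scan range. This sketch assumes keys compare as raw bytes, with an inclusive start row and an exclusive stop row, as for HBase `Scan` start/stop rows; `key_range` is a hypothetical helper. Since the start row is inclusive, a strict `>` appends a zero byte, giving the smallest byte string greater than the constant; an implementation could instead start at the constant and filter out the boundary row afterwards.

```python
from typing import List, Optional, Tuple


def key_range(predicates: List[Tuple[str, bytes]]) -> Tuple[bytes, Optional[bytes]]:
    """AND together (op, constant) row-key comparisons into [start, stop).

    stop=None means the scan is unbounded above.
    """
    start: bytes = b""
    stop: Optional[bytes] = None
    for op, const in predicates:
        if op == "=":
            lo, hi = const, const + b"\x00"  # exactly one key in [k, k + 0x00)
        elif op == ">":
            lo, hi = const + b"\x00", None
        elif op == ">=":
            lo, hi = const, None
        elif op == "<":
            lo, hi = b"", const
        elif op == "<=":
            lo, hi = b"", const + b"\x00"
        else:
            raise ValueError(f"unsupported operator: {op}")
        start = max(start, lo)  # tighten the lower bound
        if hi is not None:
            stop = hi if stop is None else min(stop, hi)  # tighten the upper bound
    return start, stop
```

Byte order matches numeric order only for suitable encodings, such as fixed-width, non-negative, big-endian integers; with other encodings the range would be computed on the wrong ordering.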
27. Filter Decomposition
• The optimizer pushes the predicates down into the query plan
• Storage handlers can negotiate with the Hive optimizer to decompose a filter such as
    x > 3 AND upper(y) = 'XYZ'
• The storage handler handles x > 3 and sends upper(y) = 'XYZ' back to Hive as the residual
• Works with key = 3, key > 3, etc., and with key > 3 AND key < 100
• Only works against constant expressions
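The negotiation can be pictured as splitting a conjunction into the part the storage handler accepts and a residual that Hive still evaluates. In Hive this goes through the `HiveStoragePredicateHandler` interface; the Python below is only a hypothetical model of the decision (`Comparison` and `decompose` are illustrative names), assuming, as the slide states, that only comparisons between the row key and a constant are pushed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Comparison:
    column: str                     # left-hand side, e.g. "x" or "upper(y)"
    op: str                         # "=", "<", "<=", ">", ">="
    value: object                   # right-hand side
    value_is_constant: bool = True  # False for expressions like another column


PUSHABLE_OPS = {"=", "<", "<=", ">", ">="}


def decompose(conjuncts: List[Comparison],
              key_column: str) -> Tuple[List[Comparison], List[Comparison]]:
    """Split an AND of comparisons into (pushed to storage, residual for Hive)."""
    pushed: List[Comparison] = []
    residual: List[Comparison] = []
    for c in conjuncts:
        if c.column == key_column and c.op in PUSHABLE_OPS and c.value_is_constant:
            pushed.append(c)
        else:
            residual.append(c)
    return pushed, residual
```

For `x > 3 AND upper(y) = 'XYZ'` with `x` as the key, `x > 3` is pushed and `upper(y) = 'XYZ'` stays with Hive. Hive still owns correctness for the residual, so a handler can safely decline any conjunct it cannot evaluate.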
28. Security Aspects
Towards fully secure deployments
29. Security – Big Picture
• Security becomes more important to support enterprise-level and multi-tenant applications
• Five different components must ensure / impose security:
  – HDFS
  – MapReduce
  – HBase
  – ZooKeeper
  – Hive
• Each component has:
  – Authentication
  – Authorization
30. HBase Security – Closer Look
• Released with HBase 0.92
• Fully optional module, disabled by default
• Needs an underlying secure Hadoop release
• SecureRPCEngine: optional engine enforcing SASL authentication
  – Kerberos
  – DIGEST-MD5 based tokens
  – TokenProvider coprocessor
• Access control is implemented as a coprocessor: AccessController
• Stores and distributes ACL data via ZooKeeper
  – Sensitive data is only accessible by HBase daemons
  – The client does not need to authenticate to ZooKeeper
31. Hive Security – Closer Look
• Hive has different deployment options; security considerations must take each deployment into account
• Authentication is only supported at the Metastore, not on HiveServer, the web interface, or JDBC
• Authorization is enforced at the query layer (Driver)
• Authorization providers are pluggable; the default one stores global/table/partition/column permissions in the Metastore

    GRANT ALTER ON TABLE web_table TO USER bob;
    CREATE ROLE db_reader;
    GRANT SELECT, SHOW_DATABASE ON DATABASE mydb TO ROLE db_reader;
32. Hive Deployment Option 1
[Diagram components: Client; CLI; Driver (Parser, Planner, Optimizer, Execution) with Authorization; Metastore client with Authentication; Metastore on an RDBMS; MapReduce, HBase and HDFS, each performing its own authentication and authorization (A12n/A11n).]
33. Hive Deployment Option 2
[Diagram components: Client; CLI; Driver (Parser, Planner, Optimizer, Execution) with Authorization; Metastore with Authentication, on an RDBMS; MapReduce, HBase and HDFS, each performing its own authentication and authorization (A12n/A11n).]
34. Hive Deployment Option 3
[Diagram components: Client; JDBC/ODBC, Hive Thrift Server, Hive Web Interface and CLI; Driver (Parser, Planner, Optimizer, Execution) with Authorization; Metastore with Authentication, on an RDBMS; MapReduce, HBase and HDFS, each performing its own authentication and authorization (A12n/A11n).]
35. Hive + HBase + Hadoop Security
• Regardless of Hive's own security, for Hive to work on secure Hadoop and HBase, we should:
  – Obtain delegation tokens for Hadoop and HBase jobs
  – Obey the storage-level (HDFS, HBase) permission checks
  – In HiveServer deployments, authenticate and impersonate the user
• Delegation tokens for Hadoop already work
• Support for obtaining HBase delegation tokens was released in Hive 0.9.0
36. Future of Hive + HBase
• Improve schema / type mapping
• Fully secure Hive deployment options
• HBase bulk import improvements
• Sortable signed numeric types in HBase
• Filter pushdown: non-key column filters
• Hive random access support for HBase
  – https://cwiki.apache.org/HCATALOG/random-access-framework.html
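The "sortable signed numeric types" item refers to a real pitfall. With a plain big-endian two's complement encoding (the `Bytes.toBytes(int)` layout), negative numbers sort after positive ones under HBase's byte-wise ordering. One common fix is to flip the sign bit before storing. It is shown here as a sketch, not as what Hive or HBase adopted; the function names are hypothetical.

```python
import struct


def plain_int_bytes(v: int) -> bytes:
    # 4-byte big-endian two's complement (the Bytes.toBytes(int) layout).
    return struct.pack(">i", v)


def sortable_int_bytes(v: int) -> bytes:
    # Flip the sign bit so unsigned byte-wise order matches signed numeric order.
    return struct.pack(">I", (v & 0xFFFFFFFF) ^ 0x80000000)


def decode_sortable_int(b: bytes) -> int:
    # Undo the flip, then reinterpret the 32 bits as a signed integer.
    u = struct.unpack(">I", b)[0] ^ 0x80000000
    return u - (1 << 32) if u >= (1 << 31) else u
```

Sorting `[-1000, -1, 0, 1, 1000]` by `plain_int_bytes` yields `[0, 1, 1000, -1000, -1]`, while `sortable_int_bytes` preserves numeric order. That is what makes range scans and key pushdown on signed keys meaningful.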
37. References
• Security
  – https://issues.apache.org/jira/browse/HIVE-2764
  – https://issues.apache.org/jira/browse/HBASE-5371
  – https://issues.apache.org/jira/browse/HCATALOG-245
  – https://issues.apache.org/jira/browse/HCATALOG-260
  – https://issues.apache.org/jira/browse/HCATALOG-244
  – https://cwiki.apache.org/confluence/display/HCATALOG/Hcat+Security+Design
• Type mapping / Filter Pushdown
  – https://issues.apache.org/jira/browse/HIVE-1634
  – https://issues.apache.org/jira/browse/HIVE-1226
  – https://issues.apache.org/jira/browse/HIVE-1643
  – https://issues.apache.org/jira/browse/HIVE-2815
38. Other Resources
• Hadoop Summit
  – June 13-14
  – San Jose, California
  – www.Hadoopsummit.org
• Hadoop Training and Certification
  – Developing Solutions Using Apache Hadoop
  – Administering Apache Hadoop
  – Online classes available in the US, India and EMEA
  – http://hortonworks.com/training/
39. Thanks. Questions?