Enviar pesquisa
Carregar
Hive acid-updates-summit-sjc-2014
•
Transferir como PPTX, PDF
•
1 gostou
•
4,606 visualizações
A
alanfgates
Seguir
Tecnologia
Vista de apresentação de diapositivos
Denunciar
Compartilhar
Vista de apresentação de diapositivos
Denunciar
Compartilhar
1 de 26
Baixar agora
Recomendados
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
DataWorks Summit
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
alanfgates
Hive Does ACID
Hive Does ACID
DataWorks Summit
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
DataWorks Summit
HiveACIDPublic
HiveACIDPublic
Inderaj (Raj) Bains
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015
alanfgates
Optimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
Hive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
Recomendados
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
DataWorks Summit
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
alanfgates
Hive Does ACID
Hive Does ACID
DataWorks Summit
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive
DataWorks Summit
HiveACIDPublic
HiveACIDPublic
Inderaj (Raj) Bains
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015
alanfgates
Optimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
Hive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
alanfgates
Hive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
Big Data Spain
Apache Hive on ACID
Apache Hive on ACID
DataWorks Summit/Hadoop Summit
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
The Apache Software Foundation
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
DataWorks Summit
Apache Hive ACID Project
Apache Hive ACID Project
DataWorks Summit/Hadoop Summit
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, Smaller
DataWorks Summit
ORC 2015
ORC 2015
t3rmin4t0r
LLAP Nov Meetup
LLAP Nov Meetup
t3rmin4t0r
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Data organization: hive meetup
Data organization: hive meetup
t3rmin4t0r
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
Hive - 1455: Cloud Storage
Hive - 1455: Cloud Storage
Hortonworks
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
alanfgates
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
alanfgates
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Julian Hyde
Hivemall talk@Hadoop summit 2014, San Jose
Hivemall talk@Hadoop summit 2014, San Jose
Makoto Yui
Mais conteúdo relacionado
Mais procurados
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
alanfgates
Hive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
Big Data Spain
Apache Hive on ACID
Apache Hive on ACID
DataWorks Summit/Hadoop Summit
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
The Apache Software Foundation
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
DataWorks Summit
Apache Hive ACID Project
Apache Hive ACID Project
DataWorks Summit/Hadoop Summit
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, Smaller
DataWorks Summit
ORC 2015
ORC 2015
t3rmin4t0r
LLAP Nov Meetup
LLAP Nov Meetup
t3rmin4t0r
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Data organization: hive meetup
Data organization: hive meetup
t3rmin4t0r
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
Hive - 1455: Cloud Storage
Hive - 1455: Cloud Storage
Hortonworks
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
alanfgates
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
alanfgates
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
Mais procurados
(20)
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
Hive: Loading Data
Hive: Loading Data
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
Apache Hive on ACID
Apache Hive on ACID
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
Apache Hive ACID Project
Apache Hive ACID Project
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, Smaller
ORC 2015
ORC 2015
LLAP Nov Meetup
LLAP Nov Meetup
Optimizing Hive Queries
Optimizing Hive Queries
Data organization: hive meetup
Data organization: hive meetup
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Hive - 1455: Cloud Storage
Hive - 1455: Cloud Storage
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
Destaque
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Julian Hyde
Hivemall talk@Hadoop summit 2014, San Jose
Hivemall talk@Hadoop summit 2014, San Jose
Makoto Yui
Flickr: Computer vision at scale with Hadoop and Storm (Huy Nguyen)
Flickr: Computer vision at scale with Hadoop and Storm (Huy Nguyen)
Yahoo Developer Network
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016
alanfgates
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
Big data spain keynote nov 2016
Big data spain keynote nov 2016
alanfgates
Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016
alanfgates
Hortonworks apache training
Hortonworks apache training
alanfgates
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
Caserta
Empower Hive with Spark
Empower Hive with Spark
DataWorks Summit
Advanced Analytics using Apache Hive
Advanced Analytics using Apache Hive
Murtaza Doctor
How To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and Hadoop
Hortonworks
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Cloudera, Inc.
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Kevin Mao
Integration of Hive and HBase
Integration of Hive and HBase
Hortonworks
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
Architecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
hadooparchbook
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Hortonworks
Destaque
(20)
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Hivemall talk@Hadoop summit 2014, San Jose
Hivemall talk@Hadoop summit 2014, San Jose
Flickr: Computer vision at scale with Hadoop and Storm (Huy Nguyen)
Flickr: Computer vision at scale with Hadoop and Storm (Huy Nguyen)
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Big data spain keynote nov 2016
Big data spain keynote nov 2016
Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016
Hortonworks apache training
Hortonworks apache training
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
Empower Hive with Spark
Empower Hive with Spark
Advanced Analytics using Apache Hive
Advanced Analytics using Apache Hive
How To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and Hadoop
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Integration of Hive and HBase
Integration of Hive and HBase
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Architecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Semelhante a Hive acid-updates-summit-sjc-2014
Apache Hive on ACID
Apache Hive on ACID
Hortonworks
ACID Transactions in Hive
ACID Transactions in Hive
Eugene Koifman
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
DataWorks Summit
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
alanfgates
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
DataWorks Summit
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
DataWorks Summit
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
DataWorks Summit
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
Data Con LA
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
Hortonworks
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
alanfgates
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
DataWorks Summit
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
DataWorks Summit
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
Chris Nauroth
An In-Depth Look at Putting the Sting in Hive
An In-Depth Look at Putting the Sting in Hive
DataWorks Summit
Stinger hadoop summit june 2013
Stinger hadoop summit june 2013
alanfgates
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
DataWorks Summit
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
Semelhante a Hive acid-updates-summit-sjc-2014
(20)
Apache Hive on ACID
Apache Hive on ACID
ACID Transactions in Hive
ACID Transactions in Hive
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
An In-Depth Look at Putting the Sting in Hive
An In-Depth Look at Putting the Sting in Hive
Stinger hadoop summit june 2013
Stinger hadoop summit june 2013
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
Último
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
wesley chun
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
sammart93
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Juan lago vázquez
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
MadyBayot
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Anna Loughnan Colquhoun
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
apidays
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
debabhi2
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
lior mazor
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
wesley chun
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
Dropbox
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
Andrey Devyatkin
Architecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
apidays
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
Overkill Security
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
sudhanshuwaghmare1
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Martijn de Jong
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Remote DBA Services
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
Rustici Software
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Deepika Singh
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
rafiqahmad00786416
Último
(20)
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
Architecting Cloud Native Applications
Architecting Cloud Native Applications
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
Hive acid-updates-summit-sjc-2014
1.
© Hortonworks Inc.
2014 Adding ACID Updates to Hive April 2014 Page 1 Owen O’Malley Alan Gates owen@hortonworks.com gates@hortonworks.com @owen_omalley @alanfgates
2.
© Hortonworks Inc.
2014 Page 2 •Hive Only Updates Partitions –Insert overwrite rewrites an entire partition –Forces daily or even hourly partitions •What Happens to Concurrent Readers? –Ok for inserts, but overwrite causes races –There is a zookeeper lock manager, but… •No way to delete, update, or insert rows –Makes adhoc work difficult What’s Wrong?
3.
© Hortonworks Inc.
2014 Page 3 •Hadoop and Hive have always… –Worked without ACID –Perceived as tradeoff for performance •But, your data isn’t static –It changes daily, hourly, or faster –Ad hoc solutions require a lot of work –Managing change makes the user’s life better •Do or Do Not, There is NO Try Why is ACID Critical?
4.
© Hortonworks Inc.
2014 Page 4 •Updating a Dimension Table –Changing a customer’s address •Delete Old Records –Remove records for compliance •Update/Restate Large Fact Tables –Fix problems after they are in the warehouse •Streaming Data Ingest –A continual stream of data coming in –Typically from Flume or Storm Use Cases
5.
© Hortonworks Inc.
2014 Page 5 •HDFS Does Not Allow Arbitrary Writes –Store changes as delta files –Stitched together by client on read •Writes get a Transaction ID –Sequentially assigned by Metastore •Reads get Committed Transactions –Provides snapshot consistency –No locks required –Provide a snapshot of data from start of query Design
6.
© Hortonworks Inc.
2013 Stitching Buckets Together Page 6
7.
© Hortonworks Inc.
2014 Page 7 •Partition locations remain unchanged –Still warehouse/$db/$tbl/$part •Bucket Files Structured By Transactions –Base files $part/base_$tid/bucket_* –Delta files $part/delta_$tid_$tid/bucket_* •Minor Compactions merge deltas –Read delta_$tid1_$tid1 .. delta_$tid2_$tid2 –Written as delta_$tid1_$tid2 •Compaction doesn’t disturb readers HDFS Layout
8.
© Hortonworks Inc.
2014 Page 8 •Created new AcidInput/OutputFormat –Unique key is transaction, bucket, row •Reader returns most recent update •Also Added Raw API for Compactor –Provides previous events as well •ORC implements new API –Extends records with change metadata –Add operation (d, u, i), transaction and key Input and Output Formats
9.
© Hortonworks Inc.
2014 Page 9 •Need to split buckets for MapReduce –Need to split base and deltas the same way –Use key ranges –Use indexes Distributing the Work
10.
© Hortonworks Inc.
2014 Page 10 •Existing lock managers –In memory - not durable –ZooKeeper - requires additional components to install, administer, etc. •Locks need to be integrated with transactions –commit/rollback must atomically release locks •We sort of have this database lying around which has ACID characteristics (metastore) •Transactions and locks stored in metastore •Uses metastore DB to provide unique, ascending ids for transactions and locks Transaction Manager
11.
© Hortonworks Inc.
2014 Page 11 •No explicit transactions in 0.13 –First implementation of INSERT, UPDATE, DELETE will be auto-commit –Will then add BEGIN, COMMIT, ROLLBACK •Snapshot isolation –Reader will see consistent data for the duration of his/her query –May extend to other isolation levels in the future •Current transactions can be displayed using new SHOW TRANSACTIONS statement Transaction Model
12.
© Hortonworks Inc.
2014 Page 12 •Three types of locks –shared –semi-shared (can co-exist with shared, but not other semi-shared) –exclusive •Operations require different locks –SELECT, INSERT – shared –UPDATE, DELETE – semi-shared –DROP, INSERT OVERWRITE – exclusive Locking Model
13.
© Hortonworks Inc.
2014 Page 13 •Each transaction (or batch of transactions in streaming ingest) creates a new delta file •Too many files = NameNode •Need a way to –Collect many deltas into one delta – minor compaction –Rewrite base and delta to new base – major compaction Compactor
14.
© Hortonworks Inc.
2014 Page 14 •Run when there are 10 or more deltas (configurable) •Results in base + 1 delta Minor Compaction /hive/warehouse/purchaselog/ds=201403311000/base_0028000 /hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028100 /hive/warehouse/purchaselog/ds=201403311000/delta_0028101_0028200 /hive/warehouse/purchaselog/ds=201403311000/delta_0028201_0028300 /hive/warehouse/purchaselog/ds=201403311000/delta_0028301_0028400 /hive/warehouse/purchaselog/ds=201403311000/delta_0028401_0028500 /hive/warehouse/purchaselog/ds=201403311000/base_0028000 /hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028500
15.
© Hortonworks Inc.
2014 Page 15 •Run when deltas are 10% the size of base (configurable) •Results in new base Major Compaction /hive/warehouse/purchaselog/ds=201403311000/base_0028000 /hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028100 /hive/warehouse/purchaselog/ds=201403311000/delta_0028101_0028200 /hive/warehouse/purchaselog/ds=201403311000/delta_0028201_0028300 /hive/warehouse/purchaselog/ds=201403311000/delta_0028301_0028400 /hive/warehouse/purchaselog/ds=201403311000/delta_0028401_0028500 /hive/warehouse/purchaselog/ds=201403311000/base_0028500
16.
© Hortonworks Inc.
2014 Page 16 •Metastore thrift server will schedule and execute compactions –No need for user to schedule –User can initiate via new ALTER TABLE COMPACT statement •No locking required, compactions run at same time as select, inserts –Compactor aware of readers, does not remove old files until readers have finished with them •Current compactions can be viewed via new SHOW COMPACTIONS statement Compactor Continued
17.
© Hortonworks Inc.
2014 Page 17 •Data is flowing in from generators in a stream •Without this, you have to add it to Hive in batches, often every hour –Thus your users have to wait an hour before they can see their data •New interface in hive.hcatalog.streaming lets applications write small batches of records and commit them –Users can now see data within a few seconds of it arriving from the data generators •Available for Apache Flume in HDP 2.1 –Working on Apache Storm integration Application: Streaming Ingest
18.
© Hortonworks Inc.
2014 Page 18 Streaming Ingest Illustrated Flume Agent HDFS
19.
© Hortonworks Inc.
2014 Page 19 Streaming Ingest Illustrated Flume Agent HDFS while (…) write(); commit(); Commit can be time based or size based, up to writer commit() flushes to disk and sends commit to metastore
20.
© Hortonworks Inc.
2014 Page 20 Streaming Ingest Illustrated Flume Agent HDFS while (…) write(); commit(); Next write() appends to the same file
21.
© Hortonworks Inc.
2014 Page 21 Streaming Ingest Illustrated Flume Agent HDFS while (…) write(); commit(); Reader Task Reader uses txnid to determine which records to read
22.
© Hortonworks Inc.
2014 Page 22 • Phase 1, Hive 0.13 –Transaction and new lock manager –ORC file support –Automatic and manual compaction –Snapshot isolation –Streaming ingest via Flume • Phase 2, Hive 0.14 (we hope) –INSERT … VALUES, UPDATE, DELETE –BEGIN, COMMIT, ROLLBACK • Future (all speculative based on user feedback) –Versioned or point in time queries –Additional isolation levels such as dirty read or read committed –MERGE Phases of Development
23.
© Hortonworks Inc.
2014 Page 23 •Only suitable for data warehousing, not for OLTP •Table must be bucketed, and (currently) not sorted –Sorting restriction will be removed in the future Limitations
24.
© Hortonworks Inc.
2014 Page 24 •Good –Handles compactions for us –Already has similar data model with LSM •Bad –No cross row transactions –Would require us to write a transaction manager over HBase, doable, but not less work –Hfile is column family based rather than columnar –HBase focused on point lookups and range scans –Warehousing tends to require full scans Why Not HBase?
25.
© Hortonworks Inc.
2014 Page 25 •JIRA: https://issues.apache.org/jira/browse/HI VE-5317 •Adds ACID semantics to Hive •Uses SQL standard commands –INSERT, UPDATE, DELETE •Provides scalable read and write access Conclusion
26.
© Hortonworks Inc.
2013 Thank You! Questions & Answers Page 26
Baixar agora