SlideShare a Scribd company logo
1 of 28
  Data Warehousing & Analytics on Hadoop ,[object Object],[object Object]
Why Another Data Warehousing System? ,[object Object],[object Object],[object Object]
 
 
Lets try Hadoop… ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What is HIVE? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Simplifying Hadoop ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Looks like this .. Node Node Node Node Node Node 1 Gigabit 4-8 Gigabit Node = DataNode  + Map-Reduce Disks Disks Disks Disks Disks Disks
Data Warehousing at Facebook Today Web Servers Scribe Servers Filers Hive on  Hadoop Cluster Oracle RAC Federated MySQL
Hive/Hadoop Usage @ Facebook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop Usage @ Facebook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
In Pictures
 
[object Object]
HIVE: Components HDFS Hive CLI DDL Queries Browsing Map Reduce MetaStore Thrift API SerDe Thrift Jute JSON.. Execution Parser Planner Web UI Optimizer DB
Data Model Logical Partitioning Hash Partitioning clicks HDFS MetaStore / hive/clicks /hive/clicks/ds=2008-03-25 /hive/clicks/ds=2008-03-25/0 … Tables Metastore DB Data Location Bucketing Info Partitioning Cols
Hive Query Language ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Map Reduce Example Machine 2 Machine 1 <k1, v1> <k2, v2> <k3, v3> <k4, v4> <k5, v5> <k6, v6> <nk1, nv1> <nk2, nv2> <nk3, nv3> <nk2, nv4> <nk2, nv5> <nk1, nv6> Local Map <nk2, nv4> <nk2, nv5> <nk2, nv2> <nk1, nv1> <nk3, nv3> <nk1, nv6> Global Shuffle <nk1, nv1> <nk1, nv6> <nk3, nv3> <nk2, nv4> <nk2, nv5> <nk2, nv2> Local Sort <nk2, 3> <nk1, 2> <nk3, 1> Local Reduce
Hive QL – Join ,[object Object],[object Object],[object Object]
Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male key value 111 < 2, 25> 222 < 2, 32> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, 25> key value 222 < 1, 1> 222 < 2, 32> Shuffle Sort Pageid age 1 25 2 25 pageid age 1 32
Hive QL – Group By ,[object Object],[object Object],[object Object]
Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age 1 25 2 25 pageid age count 1 25 1 1 32 1 pageid age 1 32 2 25 key value <1,25> 1 <2,25> 1 key value <1,32> 1 <2,25> 1 key value <1,25> 1 <1,32> 1 key value <2,25> 1 <2,25> 1 Shuffle Sort pageid age count 2 25 2
Group by Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Inserts into Files, Tables and Local Files  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Extensibility - Custom Map/Reduce Scripts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open Source Community ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Future Work ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Information ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009
Namit Jain
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
rantav
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
Zheng Shao
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Yahoo Developer Network
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And Hdfs
Cloudera, Inc.
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
Adam Kawa
 

What's hot (19)

Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009
 
report on aadhaar anlysis using bid data hadoop and hive
report on aadhaar anlysis using bid data hadoop and hivereport on aadhaar anlysis using bid data hadoop and hive
report on aadhaar anlysis using bid data hadoop and hive
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Hive
HiveHive
Hive
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And Hdfs
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 

Viewers also liked

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
one woman satisfies 12 men
one woman satisfies 12 menone woman satisfies 12 men
one woman satisfies 12 men
fondas vakalis
 
Photos With Reflections
Photos With ReflectionsPhotos With Reflections
Photos With Reflections
fondas vakalis
 
Soil Is The Geologist S Word For Dirt
Soil Is The Geologist S Word For DirtSoil Is The Geologist S Word For Dirt
Soil Is The Geologist S Word For Dirt
kopchick4141
 
From Neo to Trinity: The Matrix Reinvented
From Neo to Trinity: The Matrix ReinventedFrom Neo to Trinity: The Matrix Reinvented
From Neo to Trinity: The Matrix Reinvented
The Difference Engine
 

Viewers also liked (20)

Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
20081009nychive
20081009nychive20081009nychive
20081009nychive
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 
Hive Evolution: ApacheCon NA 2010
Hive Evolution:  ApacheCon NA 2010Hive Evolution:  ApacheCon NA 2010
Hive Evolution: ApacheCon NA 2010
 
Allemand
AllemandAllemand
Allemand
 
one woman satisfies 12 men
one woman satisfies 12 menone woman satisfies 12 men
one woman satisfies 12 men
 
Laila
LailaLaila
Laila
 
Being Caught Stealing (Con La Mano En El Pastel)
Being Caught Stealing (Con La Mano En El Pastel)Being Caught Stealing (Con La Mano En El Pastel)
Being Caught Stealing (Con La Mano En El Pastel)
 
Allemand
AllemandAllemand
Allemand
 
Photos With Reflections
Photos With ReflectionsPhotos With Reflections
Photos With Reflections
 
Lips & Mouth
Lips & MouthLips & Mouth
Lips & Mouth
 
Dahle Communication Group
Dahle Communication GroupDahle Communication Group
Dahle Communication Group
 
Soil Is The Geologist S Word For Dirt
Soil Is The Geologist S Word For DirtSoil Is The Geologist S Word For Dirt
Soil Is The Geologist S Word For Dirt
 
5 points on twitter with frank brunke
5 points on twitter with frank brunke5 points on twitter with frank brunke
5 points on twitter with frank brunke
 
From Neo to Trinity: The Matrix Reinvented
From Neo to Trinity: The Matrix ReinventedFrom Neo to Trinity: The Matrix Reinvented
From Neo to Trinity: The Matrix Reinvented
 

Similar to Hive Percona 2009

Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
Long Dao
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
zingopen
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Cloudera, Inc.
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.
 

Similar to Hive Percona 2009 (20)

Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
 
Hadoop and Hive
Hadoop and HiveHadoop and Hive
Hadoop and Hive
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Prashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEWPrashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEW
 
02 data warehouse applications with hive
02 data warehouse applications with hive02 data warehouse applications with hive
02 data warehouse applications with hive
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
 
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveHarnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
 
Training
TrainingTraining
Training
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
 
Lipstick On Pig
Lipstick On Pig Lipstick On Pig
Lipstick On Pig
 
Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson
 
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixPutting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at Netflix
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Pig Experience
Pig ExperiencePig Experience
Pig Experience
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

Hive Percona 2009

  • 1.
  • 2.
  • 3.  
  • 4.  
  • 5.
  • 6.
  • 7.
  • 8. Looks like this .. Node Node Node Node Node Node 1 Gigabit 4-8 Gigabit Node = DataNode + Map-Reduce Disks Disks Disks Disks Disks Disks
  • 9. Data Warehousing at Facebook Today Web Servers Scribe Servers Filers Hive on Hadoop Cluster Oracle RAC Federated MySQL
  • 10.
  • 11.
  • 13.  
  • 14.
  • 15. HIVE: Components HDFS Hive CLI DDL Queries Browsing Map Reduce MetaStore Thrift API SerDe Thrift Jute JSON.. Execution Parser Planner Web UI Optimizer DB
  • 16. Data Model Logical Partitioning Hash Partitioning clicks HDFS MetaStore / hive/clicks /hive/clicks/ds=2008-03-25 /hive/clicks/ds=2008-03-25/0 … Tables Metastore DB Data Location Bucketing Info Partitioning Cols
  • 17.
  • 18. Map Reduce Example Machine 2 Machine 1 <k1, v1> <k2, v2> <k3, v3> <k4, v4> <k5, v5> <k6, v6> <nk1, nv1> <nk2, nv2> <nk3, nv3> <nk2, nv4> <nk2, nv5> <nk1, nv6> Local Map <nk2, nv4> <nk2, nv5> <nk2, nv2> <nk1, nv1> <nk3, nv3> <nk1, nv6> Global Shuffle <nk1, nv1> <nk1, nv6> <nk3, nv3> <nk2, nv4> <nk2, nv5> <nk2, nv2> Local Sort <nk2, 3> <nk1, 2> <nk3, 1> Local Reduce
  • 19.
  • 20. Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid userid time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 userid age gender 111 25 female 222 32 male key value 111 < 2, 25> 222 < 2, 32> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, 25> key value 222 < 1, 1> 222 < 2, 32> Shuffle Sort Pageid age 1 25 2 25 pageid age 1 32
  • 21.
  • 22. Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age 1 25 2 25 pageid age count 1 25 1 1 32 1 pageid age 1 32 2 25 key value <1,25> 1 <2,25> 1 key value <1,32> 1 <2,25> 1 key value <1,25> 1 <1,32> 1 key value <2,25> 1 <2,25> 1 Shuffle Sort pageid age count 2 25 2
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.