1 
Hands on Hadoop 
Daniel Templeton & Inyoung Cho 
Cloudera, Inc.
2 
Your Hosts 
Daniel Templeton 
• Certification Developer 
• Crusty, old HPC guy 
• Likes Perl 
Inyoung Cho 
• Certification Developer 
• Recovering Java Evangelist
• Invented JavaOne Hands-on Labs
©2014 Cloudera, Inc. 2 All rights reserved.
3 
What is “Big Data”? 
• Super-cool marketing buzzword
• “Come see our new line of BIG DATA toasters…” 
• “The Five V’s” 
• Any data that is difficult to store in a traditional RDBMS
• Too big, changes schemas too often, unstructured, … 
What is Hadoop? 
6 
HDFS in a Nutshell 
• Distributed “file system” service 
• Highly scalable and fault-tolerant
• Chunks files into “blocks” that are replicated and distributed across the cluster
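To make the block idea concrete, here is a minimal Python sketch that plans how one file would be split into blocks and where each block's replicas would go. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and three replicas (`dfs.replication`), and it uses simple round-robin placement over hypothetical node names. Real HDFS placement is rack-aware and decided by the NameNode, so treat this as an illustration of the concept only.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed: Hadoop 2.x default dfs.blocksize
REPLICATION = 3                 # assumed: default dfs.replication


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file of file_size bytes into blocks and choose replica nodes.

    Returns a list of (offset, length, [node, ...]) tuples. Placement is a
    simple round-robin over `nodes`, NOT HDFS's rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size / block_size)  # last block may be short
    plan = []
    for i in range(num_blocks):
        offset = i * block_size
        length = min(block_size, file_size - offset)
        # Each replica of a block lands on a different node.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((offset, length, replicas))
    return plan


if __name__ == "__main__":
    mb = 1024 * 1024
    for offset, length, replicas in plan_blocks(300 * mb, ["node1", "node2", "node3", "node4"]):
        print(f"offset={offset // mb} MB length={length // mb} MB replicas={replicas}")
```

A 300 MB file comes out as two full 128 MB blocks plus a 44 MB tail block, each stored on three different nodes, which is why losing a single node does not lose data.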
7 
MapReduce in a Nutshell 
• Embarrassingly parallel batch execution engine 
• Two phases: map and reduce 
• https://www.youtube.com/watch?v=bcjSe0xCHbE 
• Tasks are scheduled to run where the data lives (data locality)
• Jobs are written against the Java API
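The map, shuffle, and reduce flow can be mimicked in a few lines of single-process Python. This is a sketch of the classic word-count example, not Hadoop code: a real job would subclass Hadoop's Java `Mapper` and `Reducer` classes, and the framework (not this script) would run many mappers in parallel next to the HDFS blocks and move the intermediate pairs across the network during the shuffle. The function names below are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key and sort by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
```

Running it prints `{'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}`. Because each mapper only sees its own line and each reducer only sees one key, both phases parallelize with no coordination, which is what makes the problem "embarrassingly parallel".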
8 
Hive in a Nutshell 
• SQL engine for Hadoop 
• Translates HiveQL into MapReduce jobs 
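Conceptually, a simple aggregate query splits along MapReduce lines: the `WHERE` filter can run in the map phase, the `GROUP BY` column becomes the shuffle key, and the aggregate is computed by the reducers. The sketch below mimics that in plain Python for a hypothetical `employees` table with `dept` and `salary` columns. Hive's actual query plans may differ (for example, by doing partial aggregation on the map side), so this only illustrates the mapping.

```python
from collections import defaultdict

# Hypothetical HiveQL this sketch mirrors:
#   SELECT dept, AVG(salary) FROM employees WHERE salary > 50000 GROUP BY dept;


def hive_like_avg(rows, min_salary):
    """Average salary per dept for rows with salary > min_salary, MapReduce-style."""
    # Map side: apply the WHERE filter and emit (GROUP BY key, value) pairs.
    mapped = ((r["dept"], r["salary"]) for r in rows if r["salary"] > min_salary)

    # Shuffle: bring all values for the same key together.
    groups = defaultdict(list)
    for dept, salary in mapped:
        groups[dept].append(salary)

    # Reduce side: compute the aggregate once per key.
    return {dept: sum(vals) / len(vals) for dept, vals in sorted(groups.items())}


if __name__ == "__main__":
    employees = [
        {"dept": "eng", "salary": 100000},
        {"dept": "eng", "salary": 120000},
        {"dept": "ops", "salary": 40000},
        {"dept": "ops", "salary": 90000},
    ]
    print(hive_like_avg(employees, 50000))
```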
9 
Impala in a Nutshell 
• Hive without the MapReduce: runs SQL on its own distributed query engine
10 
Pig in a Nutshell 
• Scripting language (Pig Latin) for data-flow operations
• Translates into MapReduce jobs 
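Pig Latin expresses work as a sequence of named data-flow steps. The comment at the top of the block below shows a small Pig Latin script over a hypothetical tab-delimited `access_log` file with `user` and `bytes` fields, followed by a plain-Python sketch of what each step does. Pig itself would compile such a script into MapReduce jobs rather than run it like this; the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical Pig Latin script this sketch mirrors:
#   logs    = LOAD 'access_log' AS (user:chararray, bytes:int);
#   big     = FILTER logs BY bytes > 1000;
#   by_user = GROUP big BY user;
#   totals  = FOREACH by_user GENERATE group, SUM(big.bytes);


def pig_like_totals(records, min_bytes):
    """records: iterable of (user, bytes) tuples, standing in for the LOADed relation."""
    # FILTER logs BY bytes > min_bytes
    big = [(user, nbytes) for user, nbytes in records if nbytes > min_bytes]

    # GROUP big BY user
    by_user = defaultdict(list)
    for user, nbytes in big:
        by_user[user].append(nbytes)

    # FOREACH by_user GENERATE group, SUM(big.bytes)
    return sorted((user, sum(values)) for user, values in by_user.items())


if __name__ == "__main__":
    logs = [("alice", 500), ("alice", 2000), ("bob", 1500), ("bob", 3000), ("carol", 10)]
    print(pig_like_totals(logs, 1000))
```

Each named relation in the script corresponds to one intermediate value in the sketch, which is the main appeal of Pig: the data flow reads top to bottom, and the GROUP step is what Pig turns into a shuffle.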
11 
The Lab 
• Self-paced 
• Should take about 2 hours
• “Additional Exercises” if you finish early 
• Inyoung and I are here to answer questions 
• Have fun! 
12
Daniel Templeton & Inyoung Cho
