SlideShare uma empresa Scribd logo
1 de 16
Baixar para ler offline
Intro to Hadoop
   Jaideep Dhok
Hi!


●   I work at

●   Involved with Hadoop for 2+ years
Outline
Brief History of Hadoop
●   2005 -
●   Inspired by the GFS and MapReduce papers
    published by Google.
●   Promoted heavily by Yahoo! Since 2006
●   Today, the defacto standard in 'Big Data'
    computing
The Buzz
Why?
●   'Big Data'
●   How big? - petabyte scale
●   Scalable
●   Robust
●   Secure!
Scalability
When To Use It
●   Can you use Hadoop to do X?
    ●   Is your problem 'embarassingly' parallel?
    ●   Workflow?
        –   Dependent/Independent Tasks
    ●   Data/CPU intensive?
●   Can you use Hadoop to do X in the Clouds?
    ●   Depends where your data is
Why To Use It?
●   Ad hoc analysis
    ●   Semi/structured data
        –   Log files
        –   Text
        –   CSV, XML, anything really
        –   RDBMS
        –   NoSQL!
Use Cases
●   Analytics
    ●   User behavior
●   Reporting
●   Filtering
●   Machine Learning
●   Just storing your data
Just From The Logs
●   Suppose you run a web-site
    ●   User breakdown by browsers
    ●   Location
    ●   Understanding user session
        –   How long do they use it?
        –   Who are the active users?
        –   What part of my app they use the most?
        –   What part of my app is user X's fav?
Tools
●   Native Hadoop APIs – Java
●   Streaming – Perl, Python, Ruby, any language
    as long it has support for 'stdin' and 'stdout'
●   Pig
●   HIVE
●   Pipes – C and C++
Ecosystem
Don't Wait
●   Hadoop
    ●   hadoop.apache.org
●   Cloudera tutorials on Hadoop
●   Books
Questions?
Thank You!
jaideep.dhok@gmail.com

Mais conteúdo relacionado

Mais procurados

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache HadoopSteve Watt
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystemDataiku
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingCloudera, Inc.
 
Map reduce and hadoop at mylife
Map reduce and hadoop at mylifeMap reduce and hadoop at mylife
Map reduce and hadoop at myliferesponseteam
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSBouquet
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesKelly Technologies
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 

Mais procurados (19)

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
 
Map reduce and hadoop at mylife
Map reduce and hadoop at mylifeMap reduce and hadoop at mylife
Map reduce and hadoop at mylife
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Hadoop
Hadoop Hadoop
Hadoop
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Bw tech hadoop
Bw tech hadoopBw tech hadoop
Bw tech hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 

Semelhante a Geek camp

Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache HadoopSufi Nawaz
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal
 
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Cloudera, Inc.
 
Drupal sharing in HP7
Drupal sharing in HP7Drupal sharing in HP7
Drupal sharing in HP7jimyhuang
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersSam Dias
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaData Science London
 
Scaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of dataScaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of dataWSO2
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013scorlosquet
 
Mr hadoop seedrocket
Mr hadoop seedrocketMr hadoop seedrocket
Mr hadoop seedrocketSeedRocket
 
Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012scorlosquet
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Hw09 Next Steps For Hadoop
Hw09   Next Steps For HadoopHw09   Next Steps For Hadoop
Hw09 Next Steps For HadoopCloudera, Inc.
 

Semelhante a Geek camp (20)

Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
JOSA TechTalks - Big Data on Hadoop
JOSA TechTalks - Big Data on HadoopJOSA TechTalks - Big Data on Hadoop
JOSA TechTalks - Big Data on Hadoop
 
Unit 3 intro.pptx
Unit 3 intro.pptxUnit 3 intro.pptx
Unit 3 intro.pptx
 
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
 
Drupal sharing in HP7
Drupal sharing in HP7Drupal sharing in HP7
Drupal sharing in HP7
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute Beginners
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
Scaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of dataScaling up wso2 bam for billions of requests and terabytes of data
Scaling up wso2 bam for billions of requests and terabytes of data
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Mr hadoop seedrocket
Mr hadoop seedrocketMr hadoop seedrocket
Mr hadoop seedrocket
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012Drupal as a Semantic Web platform - ISWC 2012
Drupal as a Semantic Web platform - ISWC 2012
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Hw09 Next Steps For Hadoop
Hw09   Next Steps For HadoopHw09   Next Steps For Hadoop
Hw09 Next Steps For Hadoop
 
Apache pig
Apache pigApache pig
Apache pig
 

Geek camp

  • 1. Intro to Hadoop Jaideep Dhok
  • 2. Hi! ● I work at ● Involved with Hadoop for 2+ years
  • 4. Brief History of Hadoop ● 2005 - ● Inspired by the GFS and MapReduce papers published by Google. ● Promoted heavily by Yahoo! Since 2006 ● Today, the defacto standard in 'Big Data' computing
  • 6. Why? ● 'Big Data' ● How big? - petabyte scale ● Scalable ● Robust ● Secure!
  • 8. When To Use It ● Can you use Hadoop to do X? ● Is your problem 'embarassingly' parallel? ● Workflow? – Dependent/Independent Tasks ● Data/CPU intensive? ● Can you use Hadoop to do X in the Clouds? ● Depends where your data is
  • 9. Why To Use It? ● Ad hoc analysis ● Semi/structured data – Log files – Text – CSV, XML, anything really – RDBMS – NoSQL!
  • 10. Use Cases ● Analytics ● User behavior ● Reporting ● Filtering ● Machine Learning ● Just storing your data
  • 11. Just From The Logs ● Suppose you run a web-site ● User breakdown by browsers ● Location ● Understanding user session – How long do they use it? – Who are the active users? – What part of my app they use the most? – What part of my app is user X's fav?
  • 12. Tools ● Native Hadoop APIs – Java ● Streaming – Perl, Python, Ruby, any language as long it has support for 'stdin' and 'stdout' ● Pig ● HIVE ● Pipes – C and C++
  • 14. Don't Wait ● Hadoop ● hadoop.apache.org ● Cloudera tutorials on Hadoop ● Books