Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC.
                         Copyright for all other & referenced work is retained by their respective owners.




Introducing Hadoop
Mastering Hadoop Map-reduce for Data Analysis


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com




What is Hadoop




HDFS Architecture




Namenode/Datanode, JobTracker/TaskTracker




MapReduce




ZK Namespace




Essential HBase Schema




Multi-dimensional View




A Map/Hash View

{
  "row_key_1" : {
    "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
    "location" : { "zip" : "94301" }
  }
}




Architectural View (HBase)




The Persistence Mechanism




The underlying file format




Installing & Setting up Hadoop

• Required software: Java 1.6.x, ssh + sshd


• Download


• Install


• Configure


   • single-node


   • pseudo-distributed


   • cluster
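
Before downloading anything, it may help to confirm the required software is present. The following is a minimal sketch, not part of the original deck: `check_prereqs` is a hypothetical helper name, and it only checks that each command is on the PATH (on many systems `sshd` lives in /usr/sbin and may not be on a regular user's PATH).

```shell
#!/bin/sh
# check_prereqs TOOL...
# Hypothetical helper: reports whether each named command is on the PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call for the requirements listed above would be `check_prereqs java ssh sshd && java -version`, which also prints the installed Java version so it can be compared against 1.6.x.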




Download

• Source: http://hadoop.apache.org/


• Version:


   • 0.20.203.x -- current stable


   • 0.20.x -- previous stable


• Includes


   • Hadoop Common -- common utilities, HDFS, MapReduce




Install

• Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz


• Move & Create Symbolic Link


   • ln -s hadoop-0.20.203.0 hadoop


• On Windows


   • http://developer.yahoo.com/hadoop/tutorial/module3.html
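
The extract and symlink steps can be combined into one small function. This is a sketch under stated assumptions: `install_hadoop` is a hypothetical name, the destination directory is chosen by the caller, and the extracted directory is assumed to be `hadoop-0.20.203.0`, as the `ln -s` line on the slide implies.

```shell
#!/bin/sh
# install_hadoop TARBALL DEST_DIR
# Hypothetical helper: extracts the tarball into DEST_DIR and creates a
# version-independent "hadoop" symlink next to the extracted directory.
install_hadoop() {
    tarball=$1
    dest=$2
    # Directory name inside the tarball, taken from the slide's ln -s line.
    version_dir=hadoop-0.20.203.0

    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # -f -n: replace an existing "hadoop" symlink rather than following it.
    (cd "$dest" && ln -sfn "$version_dir" hadoop)
}
```

For example, `install_hadoop hadoop-0.20.203.0rc1.tar.gz "$HOME/opt"` would leave `$HOME/opt/hadoop` pointing at the extracted release, so later upgrades only need the symlink repointed.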




Configure -- single-node

• Edit: conf/hadoop-env.sh


  • Set JAVA_HOME


• Default configuration is single-node


• Run bin/hadoop with no arguments (prints the command options)


• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
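
Setting JAVA_HOME amounts to adding one export line to conf/hadoop-env.sh. The sketch below assumes a hypothetical helper name, `set_java_home`, and a caller-supplied JDK path; the right path depends on where Java 1.6 is installed on the machine.

```shell
#!/bin/sh
# set_java_home HADOOP_HOME JAVA_HOME_PATH
# Hypothetical helper: appends an export line to conf/hadoop-env.sh.
# When the file is sourced, this later assignment wins over earlier ones.
set_java_home() {
    env_file=$1/conf/hadoop-env.sh
    [ -f "$env_file" ] || { echo "not found: $env_file" >&2; return 1; }
    printf 'export JAVA_HOME="%s"\n' "$2" >> "$env_file"
}
```

A call might look like `set_java_home "$HOME/opt/hadoop" /usr/lib/jvm/java-6-sun`, where the JDK path is only an illustration.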




Configure -- pseudo-distributed

• Edit: conf/core-site.xml (configure HDFS daemon)


• Edit: conf/hdfs-site.xml (configure HDFS replication factor)


• Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)


• Enable ssh to localhost (without passphrase)


• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
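
The three edits above can be sketched as a script that writes one property into each file. The property names and values here (`fs.default.name`, `dfs.replication`, `mapred.job.tracker`, ports 9000 and 9001) are not on the slide; they are the ones I recall from the referenced single-node setup guide, so treat them as assumptions and check them against that page. The function names are hypothetical.

```shell
#!/bin/sh
# write_site_file FILE NAME VALUE
# Hypothetical helper: writes a Hadoop *-site.xml holding a single property.
write_site_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# write_pseudo_conf CONF_DIR
# Writes the three site files for a pseudo-distributed setup.
# Values are assumptions taken from the single-node setup guide.
write_pseudo_conf() {
    conf=$1
    write_site_file "$conf/core-site.xml"   fs.default.name    hdfs://localhost:9000 &&
    write_site_file "$conf/hdfs-site.xml"   dfs.replication    1 &&
    write_site_file "$conf/mapred-site.xml" mapred.job.tracker localhost:9001
}

# Passphrase-less ssh to localhost (left commented out: it modifies ~/.ssh):
#   ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#   ssh localhost
```

Note that this overwrites any existing site files in the target directory, so it suits a fresh install such as `write_pseudo_conf "$HOME/opt/hadoop/conf"`.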




Start Hadoop
• Format HDFS: bin/hadoop namenode -format


• Start all daemons: bin/start-all.sh


• Verify logs


• Browse the web interface:


   • Namenode: http://localhost:50070/


   • JobTracker: http://localhost:50030/
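
The start-up steps above can be sketched as two small functions. `start_hadoop` and `ui_up` are hypothetical names, and the web-interface check assumes `curl` is installed; the commands and URLs themselves are the ones on the slide.

```shell
#!/bin/sh
# start_hadoop HADOOP_HOME
# Hypothetical helper: formats HDFS, then starts all daemons.
# Formatting erases existing HDFS metadata, so use it on a fresh install only.
start_hadoop() {
    home=$1
    "$home/bin/hadoop" namenode -format || return 1
    "$home/bin/start-all.sh"
}

# ui_up URL
# Succeeds if the web interface at URL answers over HTTP (assumes curl).
ui_up() {
    curl -s -f -o /dev/null "$1"
}
```

After `start_hadoop "$HOME/opt/hadoop"`, checks such as `ui_up http://localhost:50070/` and `ui_up http://localhost:50030/` give a quick yes/no on the Namenode and JobTracker interfaces; the daemons may need a few seconds before they respond.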




Take Hadoop for a test-drive
• Run examples (hadoop-examples-0.20.203.0.jar)


• Grep using regular expressions


  • Copy files to HDFS: bin/hadoop fs -put bin input


  • Grep the copied files for text beginning with ‘start’


  • Verify output on HDFS: bin/hadoop fs -cat output/*


  • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
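
The slide does not show the grep command itself. A sketch of the whole test-drive follows, assuming the examples jar sits in the Hadoop home directory and exposes a `grep` program taking input directory, output directory and regex; the regex `start[a-z.]*` is my stand-in for "text beginning with ‘start’", and `run_grep_example` is a hypothetical name.

```shell
#!/bin/sh
# run_grep_example HADOOP_HOME
# Hypothetical helper: runs the bundled grep example end to end.
run_grep_example() {
    (
        cd "$1" || exit 1
        # Copy the local bin/ directory into HDFS as "input".
        bin/hadoop fs -put bin input || exit 1
        # Count regex matches in "input"; results land in "output" on HDFS.
        # The regex is an assumption, not taken from the slide.
        bin/hadoop jar hadoop-examples-0.20.203.0.jar grep input output 'start[a-z.]*' || exit 1
        # Quote the glob so HDFS expands it rather than the local shell.
        bin/hadoop fs -cat 'output/*'
    )
}
```

The job fails if `output` already exists on HDFS, so remove it before rerunning the example.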




Configure -- cluster
• References:


• http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation)


• http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN)


• http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips




Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com
