SlideShare uma empresa Scribd logo
1 de 17
Hadoop
        Framework for Distributed Applications




                                        Nishant M Gandhi
                                BE 4th YEAR Comp. Eng.
C K Pithawalla College of Engineering & Technology,Surat.
Hadoop

• Introduction
• History
• Key Technologies
  – MapReduce
  – HDFS
• Other Projects On Hadoop
• Conclusion
Introduction:

What is                       ?
 Hadoop is a framework for running applications on large clusters
 built of commodity hardware.
                          ----HADOOP WIKI


  Hadoop is a free, Java-based programming framework that
  supports the processing of large data sets in a distributed
  computing environment.
Introduction (conti..)


 #1 Open Source
 #2 Part of Apache group
 #3 Power of JAVA
 #4 Supported By Big Web Giant Companies




#1 Google’s Powerful Computation MapReduce Technology
#2 Hadoop Distributed File System(HDFS) inspired by Google File
System(GFS)
#3 Used for Cluster & Distributed Computing
#4 Support from…
History:
Inventor Doug Cutting, creator of Apache Lucene

The Origin of the Name “Hadoop”:

The name my kid gave a stuffed yellow elephant. Short, relatively easy to
spell and pronounce, meaningless, and not used elsewhere: those are my
naming criteria. ---Daug Cutting.

Started with building Web Search Engine
   •Nutch in 2002
   •Aim was to index billions of pages
   •Architecture can’t support billions of pages
Google’s GFS in 2003 solved storage problem
   •Nutch Distributed Filesystem(NDFS) in 2004
Google’s MapReduce in 2004
   •MapReduce implimented in Nutch 2005
Feb 2006 they moved out of Nutch to form an independent
subproject of Lucene called Hadoop.
History (conti..)
At around the same time, Doug Cutting joined Yahoo
February 2008 , Yahoo! announced that its production search index
was being generated by a 10,000-core Hadoop cluster




 In January 2008, Hadoop was made its own top-level project at
 apache, confirming its success and its diverse, active community.
 By this time Hadoop was being used by many other companies
 besides Yahoo! such as
     • Last.fm
     • Facebook
     • The New York Times
     • Twitter
     • Microsoft
     • IBM
Key Technologies:
   •MapReduce
     -Computational Parallel Programming Model
     -Technology developed by google




   •Hadoop Distributed File System
     -Distributed File System for large data set
     -Inspired by Google File System
Key Technologies: MapReduce
Key Technologies: MapReduce

 •   Programming model developed at Google

 •   Sort/merge based distributed computing

 •   Initially, it was intended for their internal search/indexing
     application, but now used extensively by more organizations
     (e.g., Yahoo, Amazon.com, IBM, etc.)

 •   It is functional style programming (e.g., LISP) that is naturally
     parallelizable across a large cluster of workstations or PCS.

 •   The underlying system takes care of the partitioning of the
     input data, scheduling the program’s execution across several
     machines, handling machine failures, and managing required
     inter-machine communication. (This is the key for Hadoop’s
     success)
Key Technologies: HDFS

 At Google MapReduce operation are run on a special file system
  called Google File System (GFS) that is highly optimized for this
  purpose.

 GFS is not open source.

 Doug Cutting and others at Yahoo! reverse engineered the GFS
  and called it Hadoop Distributed File System (HDFS).
Key Technologies: HDFS
Key Technologies: HDFS

  •   Very Large Distributed File System
      – 10K nodes, 100 million files, 10 PB
  •   Assumes Commodity Hardware
      – Files are replicated to handle hardware failure
      – Detect failures and recovers from them
  •   Optimized for Batch Processing
      – Data locations exposed so that computations can move to
      where data resides
      – Provides very high aggregate bandwidth
  •   User Space, runs on heterogeneous OS
Other Projects on Hadoop:

        ZooKeeper: co-ordination services



        Pig: A high-level data-flow language and execution
        framework for parallel computation.



        Hive:A data warehouse infrastructure that provides
        data summarization and ad hoc querying.



         Chukwa: A data collection system for managing
         large distributed systems.
Other Projects on Hadoop:

               Avro: Apache Avro is a data serialization system.
               Avro provides:
               •Rich data structures.
               •A compact, fast, binary data format.
               •A container file, to store persistent data.
               •Simple integration with dynamic languages.


               Just as Google's Bigtable leverages the
               distributed data storage provided by the
               Google File System, HBase provides
               Bigtable-like capabilities on top of
               Hadoop Core.
Hadoop Architecture on DELL C Series
Server:
Conclusion:



Hadoop has been very effective solution for companies dealing
 with the data in perabytes.

It has solved many problems in industry related to huge data
 management and distributed system.

As it is open source, so it is adopted by companies widely.
Thank You…..

Mais conteúdo relacionado

Mais procurados

Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop TechnologyManish Borkar
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Simplilearn
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 

Mais procurados (20)

Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 

Destaque

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learnedtcurdt
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - OverviewJay
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop EasyNick Dimiduk
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 

Destaque (20)

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 

Semelhante a Hadoop

Semelhante a Hadoop (20)

Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
Anju
AnjuAnju
Anju
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Hadoop
HadoopHadoop
Hadoop
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 

Mais de Nishant Gandhi

Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Nishant Gandhi
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large GraphsNishant Gandhi
 
Graph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using HadoopGraph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using HadoopNishant Gandhi
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsNishant Gandhi
 
Packet tracer practical guide
Packet tracer practical guidePacket tracer practical guide
Packet tracer practical guideNishant Gandhi
 

Mais de Nishant Gandhi (8)

Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
 
Graph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using HadoopGraph Coloring Algorithms on Pregel Model using Hadoop
Graph Coloring Algorithms on Pregel Model using Hadoop
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
 
Packet tracer practical guide
Packet tracer practical guidePacket tracer practical guide
Packet tracer practical guide
 
Hadoop Report
Hadoop ReportHadoop Report
Hadoop Report
 

Hadoop

  • 1. Hadoop Framework for Distributed Applications Nishant M Gandhi BE 4th YEAR Comp. Eng. C K Pithawalla College of Engineering & Technology,Surat.
  • 2. Hadoop • Introduction • History • Key Technologies – MapReduce – HDFS • Other Projects On Hadoop • Conclusion
  • 3. Introduction: What is ? Hadoop is a framework for running applications on large clusters built of commodity hardware. ----HADOOP WIKI Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
  • 4. Introduction (conti..) #1 Open Source #2 Part of Apache group #3 Power of JAVA #4 Supported By Big Web Giant Companies #1 Google’s Powerful Computation MapReduce Technology #2 Hadoop Distributed File System(HDFS) inspired by Google File System(GFS) #3 Used for Cluster & Distributed Computing #4 Support from…
  • 5. History: Inventor Doug Cutting, creator of Apache Lucene The Origin of the Name “Hadoop”: The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. ---Daug Cutting. Started with building Web Search Engine •Nutch in 2002 •Aim was to index billions of pages •Architecture can’t support billions of pages Google’s GFS in 2003 solved storage problem •Nutch Distributed Filesystem(NDFS) in 2004 Google’s MapReduce in 2004 •MapReduce implimented in Nutch 2005 Feb 2006 they moved out of Nutch to form an independent subproject of Lucene called Hadoop.
  • 6. History (conti..) At around the same time, Doug Cutting joined Yahoo February 2008 , Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster In January 2008, Hadoop was made its own top-level project at apache, confirming its success and its diverse, active community. By this time Hadoop was being used by many other companies besides Yahoo! such as • Last.fm • Facebook • The New York Times • Twitter • Microsoft • IBM
  • 7. Key Technologies: •MapReduce -Computational Parallel Programming Model -Technology developed by google •Hadoop Distributed File System -Distributed File System for large data set -Inspired by Google File System
  • 9. Key Technologies: MapReduce • Programming model developed at Google • Sort/merge based distributed computing • Initially, it was intended for their internal search/indexing application, but now used extensively by more organizations (e.g., Yahoo, Amazon.com, IBM, etc.) • It is functional style programming (e.g., LISP) that is naturally parallelizable across a large cluster of workstations or PCS. • The underlying system takes care of the partitioning of the input data, scheduling the program’s execution across several machines, handling machine failures, and managing required inter-machine communication. (This is the key for Hadoop’s success)
  • 10. Key Technologies: HDFS  At Google MapReduce operation are run on a special file system called Google File System (GFS) that is highly optimized for this purpose.  GFS is not open source.  Doug Cutting and others at Yahoo! reverse engineered the GFS and called it Hadoop Distributed File System (HDFS).
  • 12. Key Technologies: HDFS • Very Large Distributed File System – 10K nodes, 100 million files, 10 PB • Assumes Commodity Hardware – Files are replicated to handle hardware failure – Detect failures and recovers from them • Optimized for Batch Processing – Data locations exposed so that computations can move to where data resides – Provides very high aggregate bandwidth • User Space, runs on heterogeneous OS
  • 13. Other Projects on Hadoop: ZooKeeper: co-ordination services Pig: A high-level data-flow language and execution framework for parallel computation. Hive:A data warehouse infrastructure that provides data summarization and ad hoc querying. Chukwa: A data collection system for managing large distributed systems.
  • 14. Other Projects on Hadoop: Avro: Apache Avro is a data serialization system. Avro provides: •Rich data structures. •A compact, fast, binary data format. •A container file, to store persistent data. •Simple integration with dynamic languages. Just as Google's Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop Core.
  • 15. Hadoop Architecture on DELL C Series Server:
  • 16. Conclusion: Hadoop has been very effective solution for companies dealing with the data in perabytes. It has solved many problems in industry related to huge data management and distributed system. As it is open source, so it is adopted by companies widely.