SlideShare uma empresa Scribd logo
1 de 20
Presented by- Rajat Tripathi
Roll No-16EJGCS025
Introduction
•Big data is a term used to describe the voluminous amount of
unstructured and semi-structured data a company creates.
•Data that would take too much time and cost too much money to
load into a relational database for analysis.
•Big data doesn't refer to any specific quantity, the term is often used
when speaking about petabytes and exabytes of data.
So What is The Problem?
The transfer speed is around 100 MB/s
A standard disk is 1 Terabyte
Time to read entire disk= 10000 seconds or 3 Hours!
Increase in processing time may not be as helpful because
•
•
Network bandwidth is now more of a limiting factor Physical limits of
processor chips have been reached
So What do we Do?
•The obvious solution is that we use multiple
processors to solve the same problem by
fragmenting it into pieces.
•Imagine if we had 100 drives, each holding
one hundredth of the data. Working in
parallel, we could read the data in under two
minutes.
Distributed v/s Parallelization
Parallelization- Multiple computers to a single network to achieve a
particular goal. Involves multiple computers
Each computer has its own memory , Allow scalability sharing. Resource
and helps to perform computation
Distributed Computing- Divides a single task between n task easily
computation Multiple processors or tasks simultaneously on a single unit.
It occurs in a single computer. Computer can have shared memory or
distributed memory. It increase the performance of computer
Problems In Distributed Computing
Hardware Failure
As soon as we start using many pieces of hardware, the chance that
one will fail is fairly high.
Combine the data after analysis
Most analysis tasks need to be able to combine the data in some way;
data read from one disk may need to be combined with the data from
any of the other 99 disks.
To The Rescue
Apache Hadoop is a framework for running applications on large
cluster built of commodity hardware.
A common way of avoiding data loss is through replication:
redundant copies of the data are kept by the system so that in the
event of failure, there is another copy available. The Hadoop
Distributed Filesystem (HDFS), takes care of this problem.
The second problem is solved by a simple programming model-
Mapreduce. Hadoop is the popular open source implementation
of MapReduce, a powerful tool designed for deep analysis and
transformation of very large data sets.
What is ? Technology?
The most well known technology used for Big Data is Hadoop.
It is an open source project by the Apache Foundation to handle
large data processing
•
It was inspired by Google’s MapReduce and Google File System
(GFS) papers in 2003 and 2004
•
It was originally conceived by Doug Cutting in 2005 and first used by
Yahoo! in 2006 .
It is made by apache software foundation in 2011
.
•
It is named after his son’s pet elephant incidentally
•
It is basically a distributed file system which is written in Java.
Hadoop Approach to distributed Computing
The theoretical 1000-CPU machine would cost a very large amount
of money, far more than 1,000 single-CPU.
Hadoop will tie these smaller and more reasonably priced machines
together into a single cost-effective computer cluster.
Hadoop provides a simplified programming model which allows the
user to quickly write and test distributed systems, and its’ efficient,
automatic distribution of data and work across machines and in turn
utilizing the underlying parallelism of the CPU cores.
Developers of Hadoop
 Michael j. cafarella Doug Cutting
Doug Cutting
and Michael
J.Cafarella
developed
Hadoop to
support
distribution
for the Notch
search engine project. This project was funded by Yahoo
Famous users of Hadoop?
Hadoop Components
HDFS
Self-healing
high-bandwidth
clustered storage
Map Reduce
Processing
Fault-tolerant
distributed
processing
Storage
Hadoop Map Reduce
MapReduce is a programming model
Programs written in this functional style are automatically parallelized and
executed on a large cluster of commodity machines
MapReduce is an associated implementation for processing and generating
large data sets.
The Programming Model of Map Reduce
Map, written by the user, takes an input pair and produces a set of intermediate
key/value pairs. The MapReduce library groups together all intermediate values
associated with the same intermediate key I and passes them to the Reduce
function.
The Reduce function, also written by the user, accepts an intermediate key I and a
set of values
Files systems that manage the storage across a network of
machines are called distributed filesystems.
Hadoop comes with a distributed filesystem called HDFS,
which stands for Hadoop Distributed Filesystem.
.
HDFS, the Hadoop Distributed File System, is a distributed file system
designed to hold very large amounts of data (terabytes or even
petabytes), and provide high-throughput access to this information.
The Hadoop distributed file system (HDFS) is a distributed,
scalable, and portable file-system written in Java for the Hadoop
framework.
Hadoop Distributed File System(HDFS)
Name nodes and Data nodes
A HDFS cluster has two types of node operating in a master-slave
pattern: a namenode (the master) and a number of datanodes (slave).
The namenode manages the filesystem namespace. It maintains the
filesystem tree and the metadata for all the files and directories in the
tree.
Datanodes are the work horses of the filesystem.It manages storage
attached to the nodes that they run on.
HDFS exposes a file system namespace and allows user data to be
stored in files.
Internally, a file is split into one or more blocks and these blocks are
stored in a set of DataNodes.
Conclusion
So major companies like facebook amazon,yahoo,etc. are
adapting Hadoop and in future there can be many names in the
list.
This technology has bright future scope because day by day need of
data would increase and security issues also the major point.
Hence Hadoop Technology is the best appropriate approach for
handling the large data in smart way and its future is bright…
Thank You

Mais conteúdo relacionado

Mais procurados

Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Apache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringApache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringBADR
 
Presentation on Hadoop Technology
Presentation on Hadoop TechnologyPresentation on Hadoop Technology
Presentation on Hadoop TechnologyOpenDev
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Hadoop: Distributed data processing
Hadoop: Distributed data processingHadoop: Distributed data processing
Hadoop: Distributed data processingroyans
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 

Mais procurados (20)

Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Apache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringApache Hadoop - Big Data Engineering
Apache Hadoop - Big Data Engineering
 
Presentation on Hadoop Technology
Presentation on Hadoop TechnologyPresentation on Hadoop Technology
Presentation on Hadoop Technology
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Hadoop
Hadoop Hadoop
Hadoop
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
Hadoop Hadoop
Hadoop
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Hadoop: Distributed data processing
Hadoop: Distributed data processingHadoop: Distributed data processing
Hadoop: Distributed data processing
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 

Semelhante a Seminar ppt

Semelhante a Seminar ppt (20)

hadoop
hadoophadoop
hadoop
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptx
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Anju
AnjuAnju
Anju
 
Hadoop .pdf
Hadoop .pdfHadoop .pdf
Hadoop .pdf
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Hadoop
HadoopHadoop
Hadoop
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Cppt
CpptCppt
Cppt
 

Último

PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGYpruthirajnayak525
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxaryanv1753
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxJohnree4
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSebastiano Panichella
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸mathanramanathan2005
 
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...Henrik Hanke
 
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.KathleenAnnCordero2
 
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRRsarwankumar4524
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationNathan Young
 
Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Escort Service
 
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxEngaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxAsifArshad8
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEMCharmi13
 
Early Modern Spain. All about this period
Early Modern Spain. All about this periodEarly Modern Spain. All about this period
Early Modern Spain. All about this periodSaraIsabelJimenez
 
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.comSaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.comsaastr
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxCarrieButtitta
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRachelAnnTenibroAmaz
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSebastiano Panichella
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringSebastiano Panichella
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...漢銘 謝
 

Último (20)

PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptx
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸
 
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
 
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
 
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism Presentation
 
Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170
 
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxEngaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEM
 
Early Modern Spain. All about this period
Early Modern Spain. All about this periodEarly Modern Spain. All about this period
Early Modern Spain. All about this period
 
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.comSaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptx
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
 

Seminar ppt

  • 1. Presented by- Rajat Tripathi Roll No-16EJGCS025
  • 2. Introduction •Big data is a term used to describe the voluminous amount of unstructured and semi-structured data a company creates. •Data that would take too much time and cost too much money to load into a relational database for analysis. •Big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data.
  • 3. So What is The Problem? The transfer speed is around 100 MB/s A standard disk is 1 Terabyte Time to read entire disk= 10000 seconds or 3 Hours! Increase in processing time may not be as helpful because • • Network bandwidth is now more of a limiting factor Physical limits of processor chips have been reached
  • 4. So What do we Do? •The obvious solution is that we use multiple processors to solve the same problem by fragmenting it into pieces. •Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.
  • 5. Distributed v/s Parallelization Parallelization- Multiple computers to a single network to achieve a particular goal. Involves multiple computers Each computer has its own memory , Allow scalability sharing. Resource and helps to perform computation Distributed Computing- Divides a single task between n task easily computation Multiple processors or tasks simultaneously on a single unit. It occurs in a single computer. Computer can have shared memory or distributed memory. It increase the performance of computer
  • 6. Problems In Distributed Computing Hardware Failure As soon as we start using many pieces of hardware, the chance that one will fail is fairly high. Combine the data after analysis Most analysis tasks need to be able to combine the data in some way; data read from one disk may need to be combined with the data from any of the other 99 disks.
  • 7. To The Rescue Apache Hadoop is a framework for running applications on large cluster built of commodity hardware. A common way of avoiding data loss is through replication: redundant copies of the data are kept by the system so that in the event of failure, there is another copy available. The Hadoop Distributed Filesystem (HDFS), takes care of this problem. The second problem is solved by a simple programming model- Mapreduce. Hadoop is the popular open source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets.
  • 8. What is ? Technology? The most well known technology used for Big Data is Hadoop. It is an open source project by the Apache Foundation to handle large data processing • It was inspired by Google’s MapReduce and Google File System (GFS) papers in 2003 and 2004 • It was originally conceived by Doug Cutting in 2005 and first used by Yahoo! in 2006 . It is made by apache software foundation in 2011 . • It is named after his son’s pet elephant incidentally • It is basically a distributed file system which is written in Java.
  • 9. Hadoop Approach to distributed Computing The theoretical 1000-CPU machine would cost a very large amount of money, far more than 1,000 single-CPU. Hadoop will tie these smaller and more reasonably priced machines together into a single cost-effective computer cluster. Hadoop provides a simplified programming model which allows the user to quickly write and test distributed systems, and its’ efficient, automatic distribution of data and work across machines and in turn utilizing the underlying parallelism of the CPU cores.
  • 10. Developers of Hadoop  Michael j. cafarella Doug Cutting Doug Cutting and Michael J.Cafarella developed Hadoop to support distribution for the Notch search engine project. This project was funded by Yahoo
  • 11. Famous users of Hadoop?
  • 12. Hadoop Components HDFS Self-healing high-bandwidth clustered storage Map Reduce Processing Fault-tolerant distributed processing Storage
  • 13. Hadoop Map Reduce MapReduce is a programming model Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines MapReduce is an associated implementation for processing and generating large data sets.
  • 14. The Programming Model of Map Reduce Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
  • 15. The Reduce function, also written by the user, accepts an intermediate key I and a set of values
  • 16. Files systems that manage the storage across a network of machines are called distributed filesystems. Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem. . HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes), and provide high-throughput access to this information. The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file-system written in Java for the Hadoop framework. Hadoop Distributed File System(HDFS)
  • 17. Name nodes and Data nodes A HDFS cluster has two types of node operating in a master-slave pattern: a namenode (the master) and a number of datanodes (slave). The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. Datanodes are the work horses of the filesystem.It manages storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes.
  • 18.
  • 19. Conclusion So major companies like facebook amazon,yahoo,etc. are adapting Hadoop and in future there can be many names in the list. This technology has bright future scope because day by day need of data would increase and security issues also the major point. Hence Hadoop Technology is the best appropriate approach for handling the large data in smart way and its future is bright…