Big Data and Hadoop

BIG DATA & HADOOP
Presented by:
Roushan Kumar Sinha
Roll no.: 1521613031
B.Tech (I.T., 2nd yr.)
Topics
• What is big data?
• Types of big data
• So what is the problem?
• So what do we do?
• Three characteristics of big data
• Google’s solution
• What is Hadoop?
• HDFS
• HDFS architecture
• MapReduce
• Conclusion
What is big data?
• Lots of data (terabytes, petabytes, or exabytes).
• Big data is the term for a collection of data sets so large and complex that
it becomes difficult to process using on-hand database management
tools or traditional data processing applications.
• The challenges include capture, storage, search, sharing, transfer,
analysis, and visualization.
• Systems and enterprises generate huge amounts of data, from terabytes to
petabytes of information.
A single jet engine can generate
10+ terabytes of data in 30 minutes of
flight time.
So What Is the Problem?
• The transfer rate of a disk is about 100 MB/sec.
• A standard disk is 1 terabyte.
• Time to read the entire disk = 10,000 sec, or about 3 hrs!
• Increasing processing speed may not help much, because:
• Network bandwidth is now more of a limiting factor.
• The physical limits of processor chips have been reached.
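The read-time figure above can be checked with a short calculation. This is a minimal sketch using the slide's own numbers (1 terabyte, 100 MB/sec, decimal units); the 100-disk case is a hypothetical added only to show why spreading data over many drives helps.

```python
# Back-of-the-envelope read time for one disk, using the slide's figures.
DISK_SIZE_BYTES = 1 * 10**12       # 1 terabyte (decimal)
TRANSFER_RATE_BPS = 100 * 10**6    # 100 megabytes per second


def read_time_seconds(size_bytes, rate_bytes_per_sec, disks=1):
    """Time to scan size_bytes when the data is spread evenly over
    `disks` drives that are read in parallel."""
    return size_bytes / (rate_bytes_per_sec * disks)


single = read_time_seconds(DISK_SIZE_BYTES, TRANSFER_RATE_BPS)
# 100 disks is a hypothetical cluster size, not a figure from the slides.
parallel = read_time_seconds(DISK_SIZE_BYTES, TRANSFER_RATE_BPS, disks=100)

print(f"one disk:  {single:.0f} s ({single / 3600:.1f} h)")
print(f"100 disks: {parallel:.0f} s")
```

With one disk the scan takes 10,000 seconds (about 2.8 hours); with the same data spread over 100 disks read in parallel it would take about 100 seconds, ignoring network and coordination costs.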
Google’s Solution
• Google solved this problem using an algorithm called MapReduce. This
algorithm divides the task into small parts, assigns those parts to many
computers connected over the network, and collects the results to form the
final result dataset.
• The slide's diagram shows various commodity hardware, which could be
single-CPU machines or servers with higher capacity.
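The divide, assign, and collect idea can be illustrated on a single machine. The sketch below is an assumption-laden stand-in: worker threads play the role of the networked computers, and the task (summing numbers) and the function names `split`, `partial_sum`, and `distributed_sum` are hypothetical choices for illustration, not Google's or Hadoop's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into `parts` nearly equal contiguous slices."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """Work done independently on one part (one computer's share)."""
    return sum(chunk)


def distributed_sum(data, workers=4):
    """Split the task, hand each part to a worker, then combine the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


numbers = list(range(1, 101))
total = distributed_sum(numbers)
print(total)  # 5050
```

Each worker sees only its own slice, so the parts can be processed in any order or at the same time; only the small partial results need to be brought together at the end.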
What is Hadoop?
• Apache Hadoop is a framework that allows for the distributed
processing of large data sets across clusters of commodity computers
using a simple programming model.
• It is an open-source data management framework with scale-out storage and
distributed processing.
Hadoop key characteristics
Hadoop Distributed File System (HDFS)
• The Hadoop file system was developed using a distributed
file system design. It runs on commodity hardware.
Unlike other distributed systems, HDFS is highly
fault-tolerant and designed to use low-cost hardware.
• HDFS holds very large amounts of data and provides
easier access. To store such huge data, the files are
stored across multiple machines. These files are stored
in a redundant fashion to rescue the system from possible
data loss in case of failure. HDFS also makes
applications available for parallel processing.
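Redundant storage across machines can be pictured with a toy model. The sketch below is not HDFS's actual placement policy: it assumes a simple round-robin rule, uses a 128-byte block for demonstration (recent Hadoop releases default to 128 MB blocks and a replication factor of 3), and the node names and helper functions are hypothetical.

```python
BLOCK_SIZE = 128   # bytes, for demonstration only
REPLICATION = 3    # copies kept of every block


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: block b is copied to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """A file stays readable if every block has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4"]   # hypothetical machine names
blocks = split_into_blocks(b"x" * 300)         # 3 blocks: 128 + 128 + 44 bytes
placement = place_blocks(len(blocks), nodes)

print(placement)
print(readable_after_failure(placement, {"node1", "node2"}))           # True
print(readable_after_failure(placement, {"node1", "node2", "node3"}))  # False
```

With three copies of each block, losing any two machines still leaves one replica of every block; losing three can make a block unavailable.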
Features of HDFS
• It is suitable for distributed storage and processing.
• Hadoop provides a command interface to interact with
HDFS.
• The built-in servers of the NameNode and DataNode help
users easily check the status of the cluster.
• It offers streaming access to file system data.
• HDFS provides file permissions and authentication.
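The command interface mentioned above is the `hdfs dfs` file system shell. As a rough illustration, the sketch below builds a few common invocations (`-mkdir -p`, `-put`, `-ls`, `-cat`) from Python and prints them; the paths and the file name `local.txt` are hypothetical, and the `run` helper only works on a machine where the `hdfs` binary is installed and a cluster is reachable.

```python
import shutil
import subprocess


def hdfs_dfs(*args):
    """Build the argument list for one `hdfs dfs` invocation."""
    return ["hdfs", "dfs", *args]


# Hypothetical paths, chosen only for illustration.
commands = [
    hdfs_dfs("-mkdir", "-p", "/user/demo/input"),        # create a directory
    hdfs_dfs("-put", "local.txt", "/user/demo/input/"),  # upload a local file
    hdfs_dfs("-ls", "/user/demo/input"),                 # list the directory
    hdfs_dfs("-cat", "/user/demo/input/local.txt"),      # print the file's contents
]


def run(cmd):
    """Execute a command, but only if the `hdfs` binary is on the PATH."""
    if shutil.which("hdfs") is None:
        raise RuntimeError("hdfs command not found; is Hadoop installed?")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


for cmd in commands:
    print(" ".join(cmd))
```

The loop only prints the commands; calling `run(cmd)` on each would execute them against a real cluster.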
Goals of HDFS
• Fault detection and recovery: Since HDFS includes a
large number of commodity hardware components, failure of
components is frequent. Therefore HDFS should have
mechanisms for quick and automatic fault detection
and recovery.
• Huge datasets: HDFS should have hundreds of nodes
per cluster to manage applications that have huge
datasets.
• Hardware at data: A requested task can be done
efficiently when the computation takes place near the
data. Especially where huge datasets are involved, this
reduces network traffic and increases
throughput.
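The "computation near the data" goal can be sketched as a scheduling preference. The code below is a simplified illustration, not Hadoop's scheduler: it assumes each block lists the nodes holding a replica, prefers a free node that already has the block, and otherwise falls back to any free node (which would mean a network transfer). Block IDs, node names, and function names are hypothetical.

```python
def pick_node(block_replicas, free_nodes):
    """Return (node, is_local). Prefer a free node that already stores the block."""
    if not free_nodes:
        raise ValueError("no free nodes available")
    for node in block_replicas:
        if node in free_nodes:
            return node, True
    # No replica holder is free: any free node will do, at the cost of a remote read.
    return sorted(free_nodes)[0], False


def schedule(block_locations, free_nodes):
    """Choose a node for the task that processes each block."""
    return {
        block: pick_node(replicas, free_nodes)
        for block, replicas in block_locations.items()
    }


block_locations = {
    "blk_0": ["node1", "node2"],
    "blk_1": ["node3", "node4"],
}
free_nodes = {"node2", "node5"}

plan = schedule(block_locations, free_nodes)
remote_reads = sum(1 for _, is_local in plan.values() if not is_local)

print(plan)
print("blocks that must cross the network:", remote_reads)
```

Here the task for `blk_0` runs on `node2`, which already holds the data, while `blk_1` has no free replica holder and must be read remotely; the fewer such remote reads, the less network traffic the job generates.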
How MapReduce Works
• A MapReduce job splits the data set into independent chunks, which are
processed by the map tasks in a completely parallel manner.
• The framework sorts the outputs of the maps, which are then input to the
reduce tasks.
• Typically both the input and the output of the job are stored in a file
system. The framework takes care of scheduling tasks, monitoring them,
and re-executing failed tasks.
• Hadoop runs the job by dividing it into tasks, of which there are two types:
map tasks and reduce tasks.
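The map, sort, and reduce steps described above can be traced with a small word-count example. This is a single-process sketch in plain Python, not the Hadoop API: the function names `map_task`, `reduce_task`, and `run_job` and the sample chunks are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_task(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def run_job(chunks):
    # Map phase: chunks are independent, so these calls could run in parallel.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_task(chunk))

    # Sort phase: bring identical keys next to each other.
    intermediate.sort(key=itemgetter(0))

    # Reduce phase: one reduce call per distinct key.
    results = {}
    for key, group in groupby(intermediate, key=itemgetter(0)):
        word, total = reduce_task(key, (value for _, value in group))
        results[word] = total
    return results


chunks = ["big data and hadoop", "hadoop stores big data", "map and reduce"]
counts = run_job(chunks)
print(counts)
```

In a real cluster the map calls would run on different machines and the sorted pairs would be shipped to the reducers; the sequence of phases is the same.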
THANK YOU