Anju

GLOBAL INSTITUTE OF TECHNOLOGY
A
SEMINAR PROJECT
ON
BIG DATA and HADOOP
Submitted By:
Anju Shekhawat
Submitted To:
Miss Nishi Sharma
Introduction
• Apache Hadoop is an open-source software framework
for distributed storage and distributed processing of Big
Data on clusters of commodity hardware.
• It processes the data using simple programming models.
• Hadoop Distributed File System (HDFS) splits files into
large blocks (64 MB by default in Hadoop 1.x, 128 MB in
Hadoop 2.x and later) and distributes the blocks among
the nodes in the cluster.
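The block arithmetic can be illustrated with a minimal sketch. The helper names block_count and last_block_size are hypothetical and are not part of any Hadoop API; the sketch only shows how a file size maps to a number of fixed-size blocks.

```python
MB = 1024 * 1024

def block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Number of fixed-size blocks needed to hold a file (ceiling division)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

def last_block_size(file_size_bytes, block_size_bytes=128 * MB):
    """Size of the final block, which may be smaller than the block size."""
    remainder = file_size_bytes % block_size_bytes
    return remainder or min(file_size_bytes, block_size_bytes)

# A 300 MB file with 128 MB blocks: two full blocks plus one partial block.
blocks_needed = block_count(300 * MB)
tail_bytes = last_block_size(300 * MB)
```

With the 64 MB block size, the same helper gives twice as many blocks for a file whose size is a multiple of 128 MB.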
Origin of Apache Hadoop
• Apache Hadoop grew out of Google's series of white papers on
Bigtable, MapReduce, and the Google File System (GFS).
• Later, Yahoo and many other contributors implemented the
ideas in Google's white papers.
• Doug Cutting, Hadoop's creator, named the
framework after his child's stuffed toy elephant.
Keyword Behind Hadoop Is Big Data
1. Big data is the term for collections of datasets so large and
complex that they are difficult to process using traditional data
processing applications.
2. The data volumes involved run to terabytes or petabytes.
Characteristics of Big Data
The 3 V's of big data: Volume, Velocity, and Variety.
Types of Data
• Unstructured data: PDF, Word, text, and email body data.
• Semi-structured data: XML file data.
• Structured data: RDBMS data.
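The three types of data can be contrasted with a small illustration. The sample records below are invented for this sketch, and CSV rows stand in for an RDBMS table.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in an RDBMS table (shown here as CSV rows).
structured = io.StringIO("id,name,city\n1,Asha,Jaipur\n2,Ravi,Delhi\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing tags, but the fields may vary per record.
semi_structured = (
    "<users>"
    "<user><name>Asha</name></user>"
    "<user><name>Ravi</name><phone>555-0100</phone></user>"
    "</users>"
)
root = ET.fromstring(semi_structured)
users = [{child.tag: child.text for child in user} for user in root]

# Unstructured: free text with no schema; any structure has to be inferred.
unstructured = "Hi team, the cluster was slow again on Monday. Regards, Asha"
word_total = len(unstructured.split())
```

Every structured row has the same columns, the second XML record carries a field the first one lacks, and the email text offers nothing to query beyond its words.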
Big Data Challenges Hadoop Resolves
Big data brings with it two fundamental challenges: how
to store and work with voluminous data sizes, and, more
importantly, how to understand data and turn it into a
competitive advantage.
Hadoop fills a gap in the market by effectively storing
and providing computational capabilities over substantial
amounts of data.
What Is Hadoop?
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using simple
programming models.
It is designed to scale up from single servers to thousands of machines, each
offering local computation and storage. Rather than rely on hardware to deliver
high availability, the library itself is designed to detect and handle failures at
the application layer.
Companies using Hadoop:
1. Google
2. Yahoo
3. Amazon
4. Facebook
5. Twitter
6. IBM
7. Rackspace
and lots more...
Hadoop Core Components:
HDFS: Hadoop Distributed File System (storage).
MapReduce: programming model (processing).
Hadoop Distributed File System (HDFS)
HDFS is a distributed file system
designed to hold very large volumes of data. It is a block-structured file
system where:
• Individual files are broken into blocks of a fixed size.
• These blocks are stored across a cluster of one or more machines with
data storage capacity.
• Individual machines in the cluster are referred to as DataNodes.
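The block structure described here can be sketched in a few lines. This is a simplification under stated assumptions: the function names and DataNode names are hypothetical, placement is plain round-robin, and replication (which real HDFS performs) is left out.

```python
from collections import defaultdict

def split_into_blocks(data, block_size):
    """Break a byte string into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, datanodes):
    """Assign block indexes to DataNodes round-robin (assumed policy, no replication)."""
    placement = defaultdict(list)
    for index in range(len(blocks)):
        placement[datanodes[index % len(datanodes)]].append(index)
    return dict(placement)

data = b"abcdefghij"                              # a 10-byte "file"
blocks = split_into_blocks(data, block_size=4)    # three blocks: 4 + 4 + 2 bytes
placement = place_blocks(blocks, ["datanode-1", "datanode-2"])
```

Joining the blocks in index order reproduces the original file, which is why only the block order and block locations need to be tracked.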
Components of HDFS
• NameNode
1. Master of the system.
2. Maintains and manages the blocks that are present on the
DataNodes.
• DataNode
1. Slaves that are deployed on each machine and provide the
actual storage.
2. Responsible for serving read and write requests from
clients.
• Backup Node
1. Responsible for performing periodic checkpoints.
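The division of labour between the master and the slaves can be shown with a toy model. The classes and functions below are hypothetical and are not Hadoop's API; the assumption is that the NameNode keeps only metadata while clients exchange the actual bytes with DataNodes. Checkpointing and replication are not modelled.

```python
class DataNode:
    """Stores the actual block contents and serves reads and writes."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}                      # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Keeps metadata only: which DataNode holds each block of each file."""
    def __init__(self, datanode_names):
        self.datanode_names = datanode_names
        self.block_map = {}                   # filename -> [(block_id, datanode_name)]

    def allocate_block(self, filename):
        blocks = self.block_map.setdefault(filename, [])
        block_id = f"{filename}#{len(blocks)}"
        target = self.datanode_names[len(blocks) % len(self.datanode_names)]
        blocks.append((block_id, target))
        return block_id, target

    def locate_blocks(self, filename):
        return list(self.block_map[filename])


def write_file(namenode, datanodes, filename, data, block_size):
    """Client write: ask the NameNode where each block goes, then send it there."""
    for offset in range(0, len(data), block_size):
        block_id, target = namenode.allocate_block(filename)
        datanodes[target].write_block(block_id, data[offset:offset + block_size])


def read_file(namenode, datanodes, filename):
    """Client read: look up block locations, then fetch blocks from the DataNodes."""
    return b"".join(
        datanodes[target].read_block(block_id)
        for block_id, target in namenode.locate_blocks(filename)
    )


datanodes = {name: DataNode(name) for name in ("dn1", "dn2")}
namenode = NameNode(list(datanodes))
write_file(namenode, datanodes, "log.txt", b"hello hadoop", block_size=5)
restored = read_file(namenode, datanodes, "log.txt")
```

In this sketch the NameNode object never holds file contents, which mirrors the split between master metadata and slave storage described above.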
HDFS Architecture
MapReduce
• MapReduce is a programming model and an associated
implementation for processing and generating large data sets.
• Programs written in this functional style are
automatically parallelized and executed on a large
cluster of commodity machines.
• The role of the programmer is to define map and
reduce functions: the map function outputs
key/value tuples, which are processed by the reduce
functions to produce the final output.
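A word count is the usual way to illustrate the programmer's role. The sketch below is a single-process stand-in written in plain Python, not Hadoop's Java API; map_words, reduce_counts, and run_job are hypothetical names, and run_job only imitates what the framework would do across a cluster.

```python
from collections import defaultdict

def map_words(_key, line):
    """Map: emit a (word, 1) tuple for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)

def run_job(records, map_fn, reduce_fn):
    """Stand-in for the framework: run map, group values by key, run reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))

lines = [(0, "big data needs big storage"), (1, "hadoop stores big data")]
word_counts = run_job(lines, map_words, reduce_counts)
```

Only map_words and reduce_counts are specific to the problem; swapping in a different pair of functions changes the job without touching run_job.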
MapReduce Procedure
• MAP: a map function processes a key/value
pair to generate a set of intermediate key/value
pairs.
• REDUCE: a reduce function merges all
intermediate values associated with the same
intermediate key.
Components of MapReduce
• JobTracker
The service in Hadoop that sends MapReduce
tasks to specific nodes in the cluster.
• TaskTracker
TaskTrackers are the slaves deployed on each
machine. They are responsible for running the map
and reduce tasks as instructed by the JobTracker.
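The relationship between the two services can be sketched as a toy scheduler. The classes below are hypothetical and are not Hadoop's implementation; the round-robin assignment is an assumption made for brevity, and a real scheduler would weigh more than task order.

```python
class TaskTracker:
    """Slave: runs whatever task it is handed and records what it ran."""
    def __init__(self, name):
        self.name = name
        self.completed = []

    def run(self, task_name, task_fn, payload):
        result = task_fn(payload)
        self.completed.append(task_name)
        return result


class JobTracker:
    """Master: sends each task to a specific TaskTracker (round-robin here)."""
    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, task_fn, payloads, kind):
        results = []
        for index, payload in enumerate(payloads):
            tracker = self.trackers[index % len(self.trackers)]
            results.append(tracker.run(f"{kind}-{index}", task_fn, payload))
        return results


trackers = [TaskTracker("tt1"), TaskTracker("tt2")]
job_tracker = JobTracker(trackers)

# Map tasks measure each chunk; a single reduce task adds the lengths together.
chunk_lengths = job_tracker.submit(len, ["ab", "cde", "f"], kind="map")
total_length = job_tracker.submit(sum, [chunk_lengths], kind="reduce")[0]
```

The completed lists on each TaskTracker show which node ran which task, the kind of bookkeeping the master relies on to track a job.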
Job Tracker and Task Tracker
MapReduce Working
• A MapReduce job usually splits the input data set into independent
chunks, which are processed by the map tasks in a completely parallel
manner.
• The framework sorts the outputs of the maps, which are then input to
the reduce tasks.
• Typically both the input and the output of the job are stored in a file
system. The framework takes care of scheduling tasks, monitoring
them, and re-executing the failed tasks.
• A MapReduce job is a unit of work that the client wants to be
performed: it consists of the input data, the MapReduce program, and
configuration information. Hadoop runs the job by dividing it into
tasks, of which there are two types: map tasks and reduce tasks.
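The working steps above (split, parallel map, sort, reduce, re-execute on failure) can be sketched in one process. All names here are hypothetical, threads stand in for cluster nodes, and the failure is simulated: the chunk containing the marker string "fail-once" fails on its first attempt only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

def split_input(records, chunk_size):
    """Split the input into independent chunks, one per map task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def map_task(chunk, attempt):
    """Emit (word, 1) pairs; simulate a failure on the first attempt for one chunk."""
    if attempt == 1 and "fail-once" in chunk:
        raise RuntimeError("simulated task failure")
    return [(word, 1) for word in chunk]

def run_with_retry(task, chunk, attempts=3):
    """Re-execute a failed task until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return task(chunk, attempt)
        except RuntimeError:
            if attempt == attempts:
                raise

def run_job(records, chunk_size=2):
    chunks = split_input(records, chunk_size)
    # Map phase: chunks are processed in parallel by worker threads.
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(lambda c: run_with_retry(map_task, c), chunks))
    # Sort phase: order all intermediate pairs by key before reducing.
    pairs = sorted(pair for output in map_outputs for pair in output)
    # Reduce phase: sum the values of each run of equal keys.
    return {key: sum(value for _, value in group)
            for key, group in groupby(pairs, key=lambda pair: pair[0])}

result = run_job(["a", "b", "fail-once", "a", "b", "a"])
```

The job still completes although one map task failed once, because the retry wrapper re-runs only the failed task rather than the whole job.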
MapReduce Process
Hadoop Hosted in the Cloud
• Amazon Elastic MapReduce
• Hadoop on Microsoft Azure
• Hadoop on OpenStack instances
• Hadoop on Google Cloud Platform
• Hadoop on Cloudera
Features of Hadoop
• Scalable
• Cost effective
• Flexible
• Reliable & Fault Tolerant
Future scope
• Apache Hadoop's MapReduce and HDFS components were originally
derived from Google's MapReduce and Google File
System (GFS) papers, respectively. The description above shows that the
need for big data processing will keep growing, so Hadoop can be a good choice for
maintaining and efficiently processing large data.
• This technology has a bright future because the need for
data increases day by day, and security is also a major concern.
Nowadays many multinational organizations prefer Hadoop over
an RDBMS.
• Major companies such as Facebook, Amazon, Yahoo, and LinkedIn
are adopting Hadoop, and in the future many more names may join the
list.
• Hence Hadoop is a highly appropriate approach for
handling data in a smart way, and its future is bright.
Any Queries?