SlideShare a Scribd company logo
1 of 23
Download to read offline
Hadoop/MapReduce
Computing Paradigm
1
Special Topics in DBs
Large-Scale Data Management
Large-Scale Data Analytics
2
 MapReduce computing paradigm (E.g., Hadoop) vs.Traditional database
systems
Database
vs.
 Many enterprises are turning to Hadoop
 Especially applications generating big data
 Web applications, social networks, scientific applications
www.kellytechno.com
Why Hadoop is able to compete?
3
Scalability (petabytes of data, thousands
of machines)
Database
vs.
Flexibility in accepting all data formats
(no schema)
Commodity inexpensive hardware
Efficient and simple fault-tolerant
mechanism
Performance (tons of indexing, tuning,
data organization tech.)
Features:
- Provenance tracking
-Annotation management
- ….
www.kellytechno.com
What is Hadoop
4
 Hadoop is a software framework for distributed processing of large
datasets across large clusters of computers
 Large datasets Terabytes or petabytes of data
 Large clusters  hundreds or thousands of nodes
 Hadoop is open-source implementation for Google MapReduce
 Hadoop is based on a simple programming model called
MapReduce
 Hadoop is based on a simple data model, any data will fit
www.kellytechno.com
What is Hadoop (Cont’d)
5
 Hadoop framework consists on two main layers
 Distributed file system (HDFS)
 Execution engine (MapReduce)
www.kellytechno.com
Hadoop Master/Slave Architecture
6
 Hadoop is designed as a master-slave shared-nothing architecture
Master node (single node)
Many slave nodes
www.kellytechno.com
Design Principles of Hadoop
7
 Need to process big data
 Need to parallelize computation across thousands of nodes
 Commodity hardware
 Large number of low-end cheap machines working in parallel to
solve a computing problem
 This is in contrast to Parallel DBs
 Small number of high-end expensive machines
www.kellytechno.com
Design Principles of Hadoop
8
 Automatic parallelization & distribution
 Hidden from the end-user
 Fault tolerance and automatic recovery
 Nodes/tasks will fail and will recover automatically
 Clean and simple programming abstraction
 Users only provide two functions “map” and “reduce”
www.kellytechno.com
How Uses MapReduce/Hadoop
9
 Google: Inventors of MapReduce computing paradigm
 Yahoo: Developing Hadoop open-source of MapReduce
 IBM, Microsoft, Oracle
 Facebook,Amazon,AOL, NetFlex
 Many others + universities and research labs
www.kellytechno.com
Hadoop: How it Works
10 www.kellytechno.com
Hadoop Architecture
11
Master node (single node)
Many slave nodes
• Distributed file system (HDFS)
• Execution engine (MapReduce)
www.kellytechno.com
Hadoop Distributed File System (HDFS)
12
Centralized namenode
- Maintains metadata info about files
Many datanode (1000s)
- Store the actual data
- Files are divided into blocks
- Each block is replicated N times
(Default = 3)
File F 1 2 3 4 5
Blocks (64 MB)
www.kellytechno.com
Main Properties of HDFS
13
 Large: A HDFS instance may consist of thousands
of server machines, each storing part of the file
system’s data
 Replication: Each data block is replicated many
times (default is 3)
 Failure: Failure is the norm rather than exception
 Fault Tolerance: Detection of faults and quick,
automatic recovery from them is a core architectural
goal of HDFS
 Namenode is consistently checking Datanodes
www.kellytechno.com
Map-Reduce Execution Engine
(Example: Color Count)
14
Shuffle & Sorting
based on k
Reduce
Reduce
Reduce
Map
Map
Map
Map
Input blocks on
HDFS
Produces (k, v)
( , 1)
Parse-hash
Parse-hash
Parse-hash
Parse-hash
Consumes(k, [v])
( , [1,1,1,1,1,1..])
Produces(k’, v’)
( , 100)
Users only provide the“Map”and“Reduce”functions
www.kellytechno.com
Properties of MapReduce Engine
15
 JobTracker is the master node (runs with the namenode)
 Receives the user’s job
 Decides on how many tasks will run (number of mappers)
 Decides on where to run each mapper (concept of locality)
• This file has 5 Blocks  run 5 map tasks
• Where to run the task reading block “1”
• Try to run it on Node 1 or Node 3
Node 1 Node 2 Node 3
www.kellytechno.com
Properties of MapReduce Engine (Cont’d)
16
 TaskTracker is the slave node (runs on each datanode)
 Receives the task from JobTracker
 Runs the task until completion (either map or reduce task)
 Always in communication with the JobTracker reporting progress
Reduce
Reduce
Reduce
Map
Map
Map
Map
Parse-hash
Parse-hash
Parse-hash
Parse-hash
In this example,1 map-reduce job consists
of 4 map tasks and 3 reduce tasks
www.kellytechno.com
Key-Value Pairs
17
 Mappers and Reducers are users’ code (provided functions)
 Just need to obey the Key-Value pairs interface
 Mappers:
 Consume <key, value> pairs
 Produce <key, value> pairs
 Reducers:
 Consume <key, <list of values>>
 Produce <key, value>
 Shuffling and Sorting:
 Hidden phase between mappers and reducers
 Groups all similar keys from all mappers, sorts and passes them to a
certain reducer in the form of <key, <list of values>>
www.kellytechno.com
MapReduce Phases
18
Deciding on what will be the key and what will be the value  developer’s
responsibility
www.kellytechno.com
Example 1: Word Count
19
 Job: Count the occurrences of each word in a data set
Map
Tasks
Reduce
Tasks
www.kellytechno.com
Example 2: Color Count
20
Shuffle & Sorting
based on k
Reduce
Reduce
Reduce
Map
Map
Map
Map
Input blocks on
HDFS
Produces (k, v)
( , 1)
Parse-hash
Parse-hash
Parse-hash
Parse-hash
Consumes(k, [v])
( , [1,1,1,1,1,1..])
Produces(k’, v’)
( , 100)
Job: Count the number of each color in a data set
Part0003
Part0002
Part0001
That’s the output file, it has 3
parts on probably 3 different
machines
www.kellytechno.com
Example 3: Color Filter
21
Job: Select only the blue and the green colors
Input blocks on
HDFS
Map
Map
Map
Map
Produces (k, v)
( , 1)
Write to HDFS
Write to HDFS
Write to HDFS
Write to HDFS
• Each map task will select only the
blue or green colors
• No need for reduce phase
Part0001
Part0002
Part0003
Part0004
That’s the output file, it has 4
parts on probably 4 different
machines
www.kellytechno.com
Bigger Picture: Hadoop vs. Other Systems
22
Distributed Databases Hadoop
Computing Model - Notion of transactions
- Transaction is the unit of work
- ACID properties, Concurrency control
- Notion of jobs
- Job is the unit of work
- No concurrency control
Data Model - Structured data with known schema
- Read/Write mode
- Any data will fit in any format
- (un)(semi)structured
- ReadOnly mode
Cost Model - Expensive servers - Cheap commodity machines
FaultTolerance - Failures are rare
- Recovery mechanisms
- Failures are common over thousands of
machines
- Simple yet efficient fault tolerance
Key Characteristics - Efficiency, optimizations, fine-tuning - Scalability, flexibility, fault tolerance
• Cloud Computing
• A computing model where any computing infrastructure can run on
the cloud
• Hardware & Software are provided as remote services
• Elastic: grows and shrinks based on the user’s demand
• Example:Amazon EC2
www.kellytechno.com
23
ThankYou
Presented By

More Related Content

What's hot

HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012Joydeep Sen Sarma
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemRajkumar Singh
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Joydeep Sen Sarma
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hiveReza Ameri
 

What's hot (20)

HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012Facebook Retrospective - Big data-world-europe-2012
Facebook Retrospective - Big data-world-europe-2012
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
October 2014 HUG : Hive On Spark
October 2014 HUG : Hive On SparkOctober 2014 HUG : Hive On Spark
October 2014 HUG : Hive On Spark
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hadoop
HadoopHadoop
Hadoop
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hive
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
6.hive
6.hive6.hive
6.hive
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 

Viewers also liked

Day 1 1505 - 1550 - pearl 1 - vimal kumar khanna
Day 1   1505 - 1550 - pearl 1 - vimal kumar khannaDay 1   1505 - 1550 - pearl 1 - vimal kumar khanna
Day 1 1505 - 1550 - pearl 1 - vimal kumar khannaPMI2011
 
Sync is hard: building offline-first Android apps from the ground up
Sync is hard: building offline-first Android apps from the ground up	Sync is hard: building offline-first Android apps from the ground up
Sync is hard: building offline-first Android apps from the ground up droidcon Dubai
 
Architecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionArchitecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionNguyen Tung
 
Logging Application Behavior to MongoDB
Logging Application Behavior to MongoDBLogging Application Behavior to MongoDB
Logging Application Behavior to MongoDBRobert Stewart
 
Facebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeFacebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeCristina Munoz
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenHARMAN Services
 
2017 Silicon Valley Investment Trends by Edith Yeung
2017 Silicon Valley Investment Trends by Edith Yeung 2017 Silicon Valley Investment Trends by Edith Yeung
2017 Silicon Valley Investment Trends by Edith Yeung Edith Yeung
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M usersJongyoon Choi
 
The Australian Startup Stack
The Australian Startup StackThe Australian Startup Stack
The Australian Startup StackStripe
 
Scalable JavaScript Application Architecture
Scalable JavaScript Application ArchitectureScalable JavaScript Application Architecture
Scalable JavaScript Application ArchitectureNicholas Zakas
 
深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具Amazon Web Services
 
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningMemory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningWes McKinney
 
Mapa procesos pmbok 5
Mapa procesos pmbok 5Mapa procesos pmbok 5
Mapa procesos pmbok 5Pedro Arcas
 
Event Driven Architecture
Event Driven ArchitectureEvent Driven Architecture
Event Driven ArchitectureStefan Norberg
 
Solution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution OptionsSolution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution OptionsAlan McSweeney
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution ArchitectureAlan McSweeney
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkVolker Hirsch
 

Viewers also liked (18)

Hadoop
HadoopHadoop
Hadoop
 
Day 1 1505 - 1550 - pearl 1 - vimal kumar khanna
Day 1   1505 - 1550 - pearl 1 - vimal kumar khannaDay 1   1505 - 1550 - pearl 1 - vimal kumar khanna
Day 1 1505 - 1550 - pearl 1 - vimal kumar khanna
 
Sync is hard: building offline-first Android apps from the ground up
Sync is hard: building offline-first Android apps from the ground up	Sync is hard: building offline-first Android apps from the ground up
Sync is hard: building offline-first Android apps from the ground up
 
Architecture Patterns - Open Discussion
Architecture Patterns - Open DiscussionArchitecture Patterns - Open Discussion
Architecture Patterns - Open Discussion
 
Logging Application Behavior to MongoDB
Logging Application Behavior to MongoDBLogging Application Behavior to MongoDB
Logging Application Behavior to MongoDB
 
Facebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeFacebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challenge
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it Open
 
2017 Silicon Valley Investment Trends by Edith Yeung
2017 Silicon Valley Investment Trends by Edith Yeung 2017 Silicon Valley Investment Trends by Edith Yeung
2017 Silicon Valley Investment Trends by Edith Yeung
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
The Australian Startup Stack
The Australian Startup StackThe Australian Startup Stack
The Australian Startup Stack
 
Scalable JavaScript Application Architecture
Scalable JavaScript Application ArchitectureScalable JavaScript Application Architecture
Scalable JavaScript Application Architecture
 
深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具
 
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningMemory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
 
Mapa procesos pmbok 5
Mapa procesos pmbok 5Mapa procesos pmbok 5
Mapa procesos pmbok 5
 
Event Driven Architecture
Event Driven ArchitectureEvent Driven Architecture
Event Driven Architecture
 
Solution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution OptionsSolution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution Options
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution Architecture
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of Work
 

Similar to Hadoop trainting in hyderabad@kelly technologies

Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreKelly Technologies
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overviewharithakannan
 
Meethadoop
MeethadoopMeethadoop
MeethadoopIIIT-H
 
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introductionrajsandhu1989
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Hadoop: A distributed framework for Big Data
Hadoop: A distributed framework for Big DataHadoop: A distributed framework for Big Data
Hadoop: A distributed framework for Big DataDhanashri Yadav
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Andrey Vykhodtsev
 
Processing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingProcessing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingCollin Bennett
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
Hadoop online-training
Hadoop online-trainingHadoop online-training
Hadoop online-trainingGeohedrick
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 

Similar to Hadoop trainting in hyderabad@kelly technologies (20)

Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangalore
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Hadoop and Mapreduce Introduction
Hadoop and Mapreduce IntroductionHadoop and Mapreduce Introduction
Hadoop and Mapreduce Introduction
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Hadoop: A distributed framework for Big Data
Hadoop: A distributed framework for Big DataHadoop: A distributed framework for Big Data
Hadoop: A distributed framework for Big Data
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop - Introduction to HDFS
Hadoop - Introduction to HDFSHadoop - Introduction to HDFS
Hadoop - Introduction to HDFS
 
hadoop
hadoophadoop
hadoop
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
Processing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingProcessing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive Computing
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalability
 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pig
 
Hadoop online-training
Hadoop online-trainingHadoop online-training
Hadoop online-training
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 

More from Kelly Technologies

Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadKelly Technologies
 
Data science institutes in hyderabad
Data science institutes in hyderabadData science institutes in hyderabad
Data science institutes in hyderabadKelly Technologies
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabadKelly Technologies
 
Hadoop training institute in hyderabad
Hadoop training institute in hyderabadHadoop training institute in hyderabad
Hadoop training institute in hyderabadKelly Technologies
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabadKelly Technologies
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesKelly Technologies
 
Websphere mb training in hyderabad
Websphere mb training in hyderabadWebsphere mb training in hyderabad
Websphere mb training in hyderabadKelly Technologies
 
Oracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabadOracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabadKelly Technologies
 
Hadoop training institutes in bangalore
Hadoop training institutes in bangaloreHadoop training institutes in bangalore
Hadoop training institutes in bangaloreKelly Technologies
 
Hadoop training institute in bangalore
Hadoop training institute in bangaloreHadoop training institute in bangalore
Hadoop training institute in bangaloreKelly Technologies
 
Salesforce crm-training-in-bangalore
Salesforce crm-training-in-bangaloreSalesforce crm-training-in-bangalore
Salesforce crm-training-in-bangaloreKelly Technologies
 
Qlikview training in hyderabad
Qlikview training in hyderabadQlikview training in hyderabad
Qlikview training in hyderabadKelly Technologies
 
Project Management Planning training in hyderabad
Project Management Planning training in hyderabadProject Management Planning training in hyderabad
Project Management Planning training in hyderabadKelly Technologies
 

More from Kelly Technologies (20)

Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science institutes in hyderabad
Data science institutes in hyderabadData science institutes in hyderabad
Data science institutes in hyderabad
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabad
 
Hadoop training institute in hyderabad
Hadoop training institute in hyderabadHadoop training institute in hyderabad
Hadoop training institute in hyderabad
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabad
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
Sas training in hyderabad
Sas training in hyderabadSas training in hyderabad
Sas training in hyderabad
 
Websphere mb training in hyderabad
Websphere mb training in hyderabadWebsphere mb training in hyderabad
Websphere mb training in hyderabad
 
Oracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabadOracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabad
 
Hadoop training institutes in bangalore
Hadoop training institutes in bangaloreHadoop training institutes in bangalore
Hadoop training institutes in bangalore
 
Hadoop training institute in bangalore
Hadoop training institute in bangaloreHadoop training institute in bangalore
Hadoop training institute in bangalore
 
Tableau training in bangalore
Tableau training in bangaloreTableau training in bangalore
Tableau training in bangalore
 
Salesforce crm-training-in-bangalore
Salesforce crm-training-in-bangaloreSalesforce crm-training-in-bangalore
Salesforce crm-training-in-bangalore
 
Oracle training in hyderabad
Oracle training in hyderabadOracle training in hyderabad
Oracle training in hyderabad
 
Qlikview training in hyderabad
Qlikview training in hyderabadQlikview training in hyderabad
Qlikview training in hyderabad
 
Spark training-in-bangalore
Spark training-in-bangaloreSpark training-in-bangalore
Spark training-in-bangalore
 
Project Management Planning training in hyderabad
Project Management Planning training in hyderabadProject Management Planning training in hyderabad
Project Management Planning training in hyderabad
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
Oracle training in_hyderabad
Oracle training in_hyderabadOracle training in_hyderabad
Oracle training in_hyderabad
 

Recently uploaded

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 

Recently uploaded (20)

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 

Hadoop trainting in hyderabad@kelly technologies

  • 1. Hadoop/MapReduce Computing Paradigm 1 Special Topics in DBs Large-Scale Data Management
  • 2. Large-Scale Data Analytics 2  MapReduce computing paradigm (E.g., Hadoop) vs.Traditional database systems Database vs.  Many enterprises are turning to Hadoop  Especially applications generating big data  Web applications, social networks, scientific applications www.kellytechno.com
  • 3. Why Hadoop is able to compete? 3 Scalability (petabytes of data, thousands of machines) Database vs. Flexibility in accepting all data formats (no schema) Commodity inexpensive hardware Efficient and simple fault-tolerant mechanism Performance (tons of indexing, tuning, data organization tech.) Features: - Provenance tracking -Annotation management - …. www.kellytechno.com
  • 4. What is Hadoop 4  Hadoop is a software framework for distributed processing of large datasets across large clusters of computers  Large datasets Terabytes or petabytes of data  Large clusters  hundreds or thousands of nodes  Hadoop is open-source implementation for Google MapReduce  Hadoop is based on a simple programming model called MapReduce  Hadoop is based on a simple data model, any data will fit www.kellytechno.com
  • 5. What is Hadoop (Cont’d) 5  Hadoop framework consists on two main layers  Distributed file system (HDFS)  Execution engine (MapReduce) www.kellytechno.com
  • 6. Hadoop Master/Slave Architecture 6  Hadoop is designed as a master-slave shared-nothing architecture Master node (single node) Many slave nodes www.kellytechno.com
  • 7. Design Principles of Hadoop 7  Need to process big data  Need to parallelize computation across thousands of nodes  Commodity hardware  Large number of low-end cheap machines working in parallel to solve a computing problem  This is in contrast to Parallel DBs  Small number of high-end expensive machines www.kellytechno.com
  • 8. Design Principles of Hadoop 8  Automatic parallelization & distribution  Hidden from the end-user  Fault tolerance and automatic recovery  Nodes/tasks will fail and will recover automatically  Clean and simple programming abstraction  Users only provide two functions “map” and “reduce” www.kellytechno.com
  • 9. How Uses MapReduce/Hadoop 9  Google: Inventors of MapReduce computing paradigm  Yahoo: Developing Hadoop open-source of MapReduce  IBM, Microsoft, Oracle  Facebook,Amazon,AOL, NetFlex  Many others + universities and research labs www.kellytechno.com
  • 10. Hadoop: How it Works 10 www.kellytechno.com
  • 11. Hadoop Architecture 11 Master node (single node) Many slave nodes • Distributed file system (HDFS) • Execution engine (MapReduce) www.kellytechno.com
  • 12. Hadoop Distributed File System (HDFS) 12 Centralized namenode - Maintains metadata info about files Many datanode (1000s) - Store the actual data - Files are divided into blocks - Each block is replicated N times (Default = 3) File F 1 2 3 4 5 Blocks (64 MB) www.kellytechno.com
  • 13. Main Properties of HDFS 13  Large: A HDFS instance may consist of thousands of server machines, each storing part of the file system’s data  Replication: Each data block is replicated many times (default is 3)  Failure: Failure is the norm rather than exception  Fault Tolerance: Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS  Namenode is consistently checking Datanodes www.kellytechno.com
  • 14. Map-Reduce Execution Engine (Example: Color Count) 14 Shuffle & Sorting based on k Reduce Reduce Reduce Map Map Map Map Input blocks on HDFS Produces (k, v) ( , 1) Parse-hash Parse-hash Parse-hash Parse-hash Consumes(k, [v]) ( , [1,1,1,1,1,1..]) Produces(k’, v’) ( , 100) Users only provide the“Map”and“Reduce”functions www.kellytechno.com
  • 15. Properties of MapReduce Engine 15  JobTracker is the master node (runs with the namenode)  Receives the user’s job  Decides on how many tasks will run (number of mappers)  Decides on where to run each mapper (concept of locality) • This file has 5 Blocks  run 5 map tasks • Where to run the task reading block “1” • Try to run it on Node 1 or Node 3 Node 1 Node 2 Node 3 www.kellytechno.com
  • 16. Properties of MapReduce Engine (Cont’d) 16  TaskTracker is the slave node (runs on each datanode)  Receives the task from JobTracker  Runs the task until completion (either map or reduce task)  Always in communication with the JobTracker reporting progress Reduce Reduce Reduce Map Map Map Map Parse-hash Parse-hash Parse-hash Parse-hash In this example,1 map-reduce job consists of 4 map tasks and 3 reduce tasks www.kellytechno.com
  • 17. Key-Value Pairs 17  Mappers and Reducers are users’ code (provided functions)  Just need to obey the Key-Value pairs interface  Mappers:  Consume <key, value> pairs  Produce <key, value> pairs  Reducers:  Consume <key, <list of values>>  Produce <key, value>  Shuffling and Sorting:  Hidden phase between mappers and reducers  Groups all similar keys from all mappers, sorts and passes them to a certain reducer in the form of <key, <list of values>> www.kellytechno.com
  • 18. MapReduce Phases 18 Deciding on what will be the key and what will be the value  developer’s responsibility www.kellytechno.com
  • 19. Example 1: Word Count 19  Job: Count the occurrences of each word in a data set Map Tasks Reduce Tasks www.kellytechno.com
  • 20. Example 2: Color Count 20 Shuffle & Sorting based on k Reduce Reduce Reduce Map Map Map Map Input blocks on HDFS Produces (k, v) ( , 1) Parse-hash Parse-hash Parse-hash Parse-hash Consumes(k, [v]) ( , [1,1,1,1,1,1..]) Produces(k’, v’) ( , 100) Job: Count the number of each color in a data set Part0003 Part0002 Part0001 That’s the output file, it has 3 parts on probably 3 different machines www.kellytechno.com
  • 21. Example 3: Color Filter 21 Job: Select only the blue and the green colors Input blocks on HDFS Map Map Map Map Produces (k, v) ( , 1) Write to HDFS Write to HDFS Write to HDFS Write to HDFS • Each map task will select only the blue or green colors • No need for reduce phase Part0001 Part0002 Part0003 Part0004 That’s the output file, it has 4 parts on probably 4 different machines www.kellytechno.com
  • 22. Bigger Picture: Hadoop vs. Other Systems 22 Distributed Databases Hadoop Computing Model - Notion of transactions - Transaction is the unit of work - ACID properties, Concurrency control - Notion of jobs - Job is the unit of work - No concurrency control Data Model - Structured data with known schema - Read/Write mode - Any data will fit in any format - (un)(semi)structured - ReadOnly mode Cost Model - Expensive servers - Cheap commodity machines FaultTolerance - Failures are rare - Recovery mechanisms - Failures are common over thousands of machines - Simple yet efficient fault tolerance Key Characteristics - Efficiency, optimizations, fine-tuning - Scalability, flexibility, fault tolerance • Cloud Computing • A computing model where any computing infrastructure can run on the cloud • Hardware & Software are provided as remote services • Elastic: grows and shrinks based on the user’s demand • Example:Amazon EC2 www.kellytechno.com