Hadoop – Big Deal !!
Author : Abhishek Kumar
+1-323-806-5474
Contents
• What is Hadoop
• Hadoop Components
• Why Hadoop
• HDFS
• HDFS Advantages
• When Not to Use Hadoop
• HDFS Components
• DFS and HDFS
• Hadoop & Big data – Relatives !
What is Hadoop
• Conventional Definition
• A framework for distributed processing of large datasets (usually unstructured data)
across clusters of commodity hardware.
• I have always struggled with bookish definitions like this one and the heavy
terms they use. So, here are some explanations:
• Distributed Processing: Spreading a heavy task across several workers/resources
to reduce the time taken to complete it.
• Large Datasets (Unstructured data): Data that does not have a predefined
structure, format, or size.
• Commodity Hardware: Inexpensive, easily available hardware, usually with modest
performance. These machines can fail at any time.
So, for now, we can say that Hadoop is a system that stores huge volumes of unstructured data in a way that lets the data
be read faster.
• Fun Fact: Hadoop follows many of the conventions, the directory structure, and other patterns
of Linux/UNIX. Most details are easily available on the Apache web site.
Hadoop Components
Level 0 - Hadoop
• HDFS: Hadoop Distributed File System
• MapReduce: Simple Programming Model
HDFS: HDFS is the file system that handles the storage of data, the Hadoop way.
MapReduce: Although written as one word, Map and Reduce are two separate
phases. Map processes the pieces of data spread across the distributed
environment, and Reduce combines the intermediate results, cutting down the volume of data sent, received, or
processed.
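To make the two phases concrete, here is a minimal single-machine sketch of a word count written in the map/reduce style. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are made up for illustration, and the chunks stand in for pieces of a file stored on different nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(chunks):
    """Run map on every chunk, then reduce each group of values."""
    emitted = []
    for chunk in chunks:  # on a real cluster each chunk is mapped on its own node
        emitted.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


# Hypothetical input: two chunks standing in for two pieces of one file.
chunks = ["big data big deal", "hadoop stores big data"]
counts = word_count(chunks)
print(counts)
```

Note how the reduce step only ever sees small (word, count) pairs, not the original text, which is the sense in which it reduces the volume of data.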
Why Hadoop !
• So if Hadoop is just another storage system, why so much hype?
• Yes, Hadoop is yet another distributed file processing system, but one thing
makes it different, in fact special: “Faster I/O processing using
commodity hardware”.
• We all know that this generation has no problem with storage
size. We have terabytes of hard drive space available even at home.
The only remaining problem is accessing huge volumes of
unstructured data using the low-performance I/O devices we
have. This is where Hadoop comes to the rescue. How? The
other slides give some answers.
• Fun Fact: Hadoop is not a single piece of software that you download and install
on your system. It is a set of tools organized to serve a specific
purpose.
HDFS
Conventional Definition: HDFS is a file system designed for storing very large files with streaming
data access patterns, running on clusters of commodity hardware.
As the name says, it is a distributed file system that follows specific protocols, standards, and
techniques, which we will call the Hadoop way.
[Diagram: two layers across the cluster. The MapReduce engine has one Job Tracker and four Task Trackers; the HDFS cluster has one Name Node and four Data Nodes. The Job Tracker is drawn with the Name Node, and each Task Tracker with a Data Node.]
HDFS Advantages
• Fault Tolerance
• Since “using commodity hardware” is an important highlight of Hadoop's definition,
we can be certain that nodes will fail.
Hadoop handles failing nodes very effectively and
ensures that we do not lose any data. How? Read about
replication.
• Handles Large Datasets
• Companies like Facebook and Yahoo have run it at large scale,
and its design is based on papers published by Google. It is a proven system for handling large datasets.
• Streaming Access to File System Data
• Files are written once and read sequentially from start to end at high throughput, much like streaming a video rather than jumping around in it.
• High Performance
• In the ideal case, processing data with Hadoop is up to
“n” times faster, where n is the number of data nodes. Real speedups are lower because of coordination and network overhead.
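The fault-tolerance point above can be illustrated with a toy simulation of replication. This is only a sketch under simplifying assumptions: the round-robin placement, the node names, and the helper names `place_blocks` and `readable_blocks` are invented for this example, and real HDFS placement is rack-aware and more involved. The replication factor of 3 matches the HDFS default.

```python
REPLICATION = 3  # HDFS's default replication factor is also 3


def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    A simplification for illustration, not HDFS's real placement policy.
    """
    placement = {}
    for index, block in enumerate(block_ids):
        start = index % len(nodes)
        placement[block] = {
            nodes[(start + i) % len(nodes)] for i in range(replication)
        }
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical cluster of four data nodes holding five blocks.
nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = ["b0", "b1", "b2", "b3", "b4"]
placement = place_blocks(blocks, nodes)

# With 3 replicas on 4 nodes, any two failures leave every block readable.
after_two_failures = readable_blocks(placement, failed={"dn1", "dn3"})

# Losing three nodes at once can destroy every replica of some blocks.
after_three_failures = readable_blocks(placement, failed={"dn1", "dn2", "dn3"})
lost = set(blocks) - after_three_failures
print(sorted(after_two_failures), sorted(lost))
```

In a real cluster the Name Node also notices under-replicated blocks and schedules new copies, so the system recovers its safety margin after a failure instead of only surviving it.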
When Not to Use Hadoop/HDFS
• For many small files, as used in transactions
• For low-latency data access
• When many people modify the data/files arbitrarily
(multiple writers).
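The small-files point can be made concrete with some rough arithmetic on Name Node metadata. The figures below are assumptions for illustration: a 128 MB block size and a commonly quoted rule of thumb of about 150 bytes of Name Node memory per file or block entry. The function name `namenode_objects` is hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
BYTES_PER_OBJECT = 150          # assumed rule of thumb per file/block entry


def namenode_objects(file_sizes):
    """Count metadata entries: one per file plus one per block."""
    blocks = sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes)
    return len(file_sizes) + blocks


# The same 1 GiB of data, stored two ways.
one_big_file = [1024 ** 3]
many_small_files = [1024 ** 3 // 10_000] * 10_000

big_objects = namenode_objects(one_big_file)        # 1 file + 8 blocks
small_objects = namenode_objects(many_small_files)  # 10,000 files + 10,000 blocks

print(big_objects, big_objects * BYTES_PER_OBJECT)
print(small_objects, small_objects * BYTES_PER_OBJECT)
```

Under these assumptions the same amount of data costs over two thousand times more metadata when split into small files, and all of that metadata sits in the memory of a single Name Node.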
HDFS Components
• Name Node (on the master, alongside the MapReduce Job Tracker)
• Data Nodes (on the workers, alongside the MapReduce Task Trackers)
Name Node: This component of HDFS generally runs on a high-performance machine. In
layman's terms, it is a kind of “index” for the data spread across the data
nodes. We can also call it the metadata storage process.
Data Node: This is responsible for storing the actual data. It runs as a daemon on each worker
machine.
Fun Fact: A daemon is a resident program that runs in the background on your machine as
a process. “Daemon” is UNIX terminology. In DOS the equivalent is a TSR (terminate-and-stay-resident program).
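The “index” idea can be sketched as two small classes: one that holds only metadata and one that holds only data. This is a toy model, not Hadoop's API; the class and method names (`NameNode.register`, `NameNode.locate`, `DataNode.store`, `DataNode.read`, `read_file`) and the file path are hypothetical.

```python
class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> data node name

    def register(self, path, blocks):
        """Record a file as an ordered list of (block_id, node_name) pairs."""
        self.file_blocks[path] = [block_id for block_id, _ in blocks]
        for block_id, node_name in blocks:
            self.block_locations[block_id] = node_name

    def locate(self, path):
        """Answer a client's question: where are this file's blocks?"""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


class DataNode:
    """Holds the actual bytes of the blocks assigned to it."""

    def __init__(self):
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


def read_file(name_node, data_nodes, path):
    """Look up block locations, then fetch each block from its data node."""
    return b"".join(
        data_nodes[node].read(block_id)
        for block_id, node in name_node.locate(path)
    )


# Hypothetical two-node cluster storing one file as two blocks.
data_nodes = {"dn1": DataNode(), "dn2": DataNode()}
data_nodes["dn1"].store("blk_1", b"hello ")
data_nodes["dn2"].store("blk_2", b"hadoop")

name_node = NameNode()
name_node.register("/demo/greeting.txt", [("blk_1", "dn1"), ("blk_2", "dn2")])

content = read_file(name_node, data_nodes, "/demo/greeting.txt")
print(content)
```

The Name Node never touches file contents here; it only answers location queries, which is why it behaves like an index.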
DFS and HDFS
• So, what is the difference between a regular distributed file
system and Hadoop?
• Hadoop processes the data on the nodes where it is stored and transmits only the
output to the client, while in a regular DFS the data is brought to a master node
from the various nodes for processing. So, quite obviously, Hadoop has to
transfer less data (just the output) over the network, while a
regular DFS has to transfer huge volumes of data over the network. This
makes Hadoop the winner for faster processing!
• This kind of processing of data on the data nodes is called data
locality, which is one of Hadoop's important superpowers.
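The difference in network traffic can be shown with a tiny comparison. This is a sketch with made-up data: the node names, the record values, and the two function names are hypothetical, and the "network cost" is simply the number of values that would have to leave the data nodes.

```python
# Hypothetical records held on three data nodes.
node_data = {
    "dn1": [3, 8, 1],
    "dn2": [7, 2],
    "dn3": [5, 9, 4, 6],
}


def central_processing(node_data):
    """Regular DFS style: ship every record to one master, then compute there."""
    shipped = [value for records in node_data.values() for value in records]
    return max(shipped), len(shipped)


def local_processing(node_data):
    """Hadoop style: compute on each node, ship one partial result per node."""
    partials = [max(records) for records in node_data.values()]
    return max(partials), len(partials)


central_result, central_sent = central_processing(node_data)
local_result, local_sent = local_processing(node_data)
print(central_result, central_sent)
print(local_result, local_sent)
```

Both approaches find the same maximum, but the local version sends one value per node instead of every record, and the gap grows with the size of the data.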
Hadoop & Big Data – Relatives !
The relation is not very complex. It is like a simple husband-wife relationship
in which Hadoop comes in to resolve the issues with Big Data.
In other words, Big Data provides the challenges for Hadoop to resolve.
Thanks !
More details will probably follow in the
next presentation.
