SlideShare uma empresa Scribd logo
1 de 9
HADOOP
Syed Measum Haider Naqvi
Network Specialist
INTRODUCTION
We have entered an era of Big Data. Huge information is for the most part
accumulation of information sets so extensive and complex that it is
exceptionally hard to handle them utilizing close by database
administration devices. The principle challenges with Big databases
incorporate creation, curation, stockpiling, sharing, inquiry, examination
and perception. So to deal with these databases we require, "exceedingly
parallel software's". As a matter of first importance, information is
procured from diverse sources, for example, online networking,
customary undertaking information or sensor information and so forth.
Flume can be utilized to secure information from online networking, for
example, twitter. At that point, this information can be composed utilizing
conveyed document frameworks, for example, Hadoop File System. These
record frameworks are extremely proficient when number of peruses are
high when contrasted with composes.
DEFINING HADOOP
Hadoop is an open-source framework that allows to collection and
procedure big data in a distributed environment across clusters of
computers using simple programming models. It is designed to scale up
from single servers to thousands of machines, each offering local
computation and storage. Rather than rely on hardware to deliver high-
availability, the library itself is designed to detect and handle failures at
the application layer, so delivering a highly-available service on top of a
cluster of computers, each of which may be prone to failures.
HADOOP MODULES
▪ Hadoop Common: The common utilities that support the other
Hadoop modules.
▪ Hadoop Distributed File System (HDFS™): A distributed file system
that provides high-throughput access to application data.
▪ Hadoop YARN: A framework for job scheduling and cluster resource
management.
▪ Hadoop Map Reduce: A YARN-based system for parallel processing of
large data sets.
HADOOP DISTRIBUTED FILE SYSTEM
(HDFS™):
HDFS is structured similarly to a regular Unix file system except that data storage is distributed
across several machines. It is not intended as a replacement to a regular file system, but
rather as a file system-like layer for large distributed systems to use. It has in built
mechanisms to handle machine outages, and is optimized for throughput rather than latency.
▪ There are two and a half types of machine in a HDFS cluster:
▪ Data node - The Data Nodes are responsible for serving read and write requests from the
file system’s clients. The Data Nodes also perform block creation, deletion, and replication
upon instruction from the Name Node.
▪ Name node - The Name Node executes file system namespace operations like opening,
closing, and renaming files and directories. It also determines the mapping of blocks to
Data Nodes.
▪ Secondary Name node - this is NOT a backup name node, but is a separate service that
keeps a copy of both the edit logs, and file system image, merging them periodically to
keep the size reasonable. this is soon being deprecated in favor of the backup node and
the checkpoint node, but the functionality remains similar (if not the same)
HDFS ARCHITECTURE
HDFS FEATURES
HDFS also has a bunch of unique features that make it ideal for distributed systems:
 Failure tolerant - data can be duplicated across multiple data nodes to protect against
machine failures. The industry standard seems to be a replication factor of 3
(everything is stored on three machines).
 Scalability - data transfers happen directly with the data nodes so your read/write
capacity scales fairly well with the number of data nodes
 Space - need more disk space? Just add more data nodes and re-balance
 Industry standard - Lots of other distributed applications build on top of HDFS (H
Base, Map-Reduce)
 Pairs well with Map Reduce - As we shall learn
DATA ORGANIZATION
• Data Blocks : HDFS is designed to support very large files. Applications that are
compatible with HDFS are those that deal with large data sets. These applications write
their data only once but they read it one or more times and require these reads to be
satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files.
A typical block size used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64
MB chunks, and if possible, each chunk will reside on a different Data Node.
• Staging : A client requests to create a file does not reach the Name Node immediately.
In fact, initially the HDFS client caches the file data into a temporary local file.
Application writes are transparently redirected to this temporary local file. When the
local file accumulates data worth over one HDFS block size, the client contacts the
Name Node. The Name Node inserts the file name into the file system hierarchy and
allocates a data block for it. The Name Node responds to the client request with the
identity of the Data Node and the destination data block. Then the client flushes the
block of data from the local temporary file to the specified Data Node. When a file is
closed, the remaining un-flushed data in the temporary local file is transferred to the
Data Node. The client then tells the Name Node that the file is closed. At this point, the
Name Node commits the file creation operation into a persistent store. If the Name
Node dies before the file is closed, the file is lost.
DATA ORGANIZATION
▪ Replication Pipelining : When a client is writing data to an HDFS file, its data is first written
to a local file as explained in the previous section. Suppose the HDFS file has a replication
factor of three. When the local file accumulates a full block of user data, the client
retrieves a list of Data Nodes from the Name Node. This list contains the Data Nodes that
will host a replica of that block. The client then flushes the data block to the first Data
Node. The first Data Node starts receiving the data in small portions, writes each portion to
its local repository and transfers that portion to the second Data Node in the list. The
second Data Node, in turn starts receiving each portion of the data block, writes that
portion to its repository and then flushes that portion to the third Data Node. Finally, the
third Data Node writes the data to its local repository. Thus, a Data Node can be receiving
data from the previous one in the pipeline and at the same time forwarding data to the
next one in the pipeline. Thus, the data is pipelined from one Data Node to the next.
▪ Accessibility : HDFS can be accessed from applications in many different ways. Natively,
HDFS provides a File System Java API for applications to use. A C language wrapper for this
Java API is also available. In addition, an HTTP browser can also be used to browse the files
of an HDFS instance. Work is in progress to expose HDFS through the WebDAV protocol.
▪ FS Shell: HDFS allows user data to be organized in the form of files and directories. It
provides a command line interface called FS shell that lets a user interact with the data in
HDFS.

Mais conteúdo relacionado

Mais procurados

HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENTHYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENTIJCSEA Journal
 
Comparative study of relational and non relations database performances using...
Comparative study of relational and non relations database performances using...Comparative study of relational and non relations database performances using...
Comparative study of relational and non relations database performances using...IAEME Publication
 
Distributed processing
Distributed processingDistributed processing
Distributed processingNeil Stein
 
data resource management
 data resource management data resource management
data resource managementsoodsurbhi123
 
Trends in the Database
Trends in the DatabaseTrends in the Database
Trends in the DatabaseMarlon Jamera
 
DATA RESOURCE MANAGEMENT
DATA RESOURCE MANAGEMENT DATA RESOURCE MANAGEMENT
DATA RESOURCE MANAGEMENT huma sh
 
Current trends in dbms
Current trends in dbmsCurrent trends in dbms
Current trends in dbmsDaisy Joy
 
A Survey on Big Data, Hadoop and it’s Ecosystem
A Survey on Big Data, Hadoop and it’s EcosystemA Survey on Big Data, Hadoop and it’s Ecosystem
A Survey on Big Data, Hadoop and it’s Ecosystemrahulmonikasharma
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsAhmad Assaf
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by SunnyDignitasDigital1
 
Trends in Database Management
Trends in Database ManagementTrends in Database Management
Trends in Database ManagementMarlon Jamera
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18karenostil
 
Uses of dbms
Uses of dbmsUses of dbms
Uses of dbmsMISY
 
DEE 431 Introduction to DBMS Slide 1
DEE 431 Introduction to DBMS Slide 1DEE 431 Introduction to DBMS Slide 1
DEE 431 Introduction to DBMS Slide 1YOGESH SINGH
 
CURRENT AND FUTURE TRENDS IN DBMS
CURRENT AND FUTURE TRENDS IN DBMSCURRENT AND FUTURE TRENDS IN DBMS
CURRENT AND FUTURE TRENDS IN DBMSGayathri P
 

Mais procurados (20)

HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENTHYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
 
Comparative study of relational and non relations database performances using...
Comparative study of relational and non relations database performances using...Comparative study of relational and non relations database performances using...
Comparative study of relational and non relations database performances using...
 
Distributed processing
Distributed processingDistributed processing
Distributed processing
 
Big data analytics.
Big data analytics.Big data analytics.
Big data analytics.
 
data resource management
 data resource management data resource management
data resource management
 
Trends in the Database
Trends in the DatabaseTrends in the Database
Trends in the Database
 
DATA RESOURCE MANAGEMENT
DATA RESOURCE MANAGEMENT DATA RESOURCE MANAGEMENT
DATA RESOURCE MANAGEMENT
 
Current trends in DBMS
Current trends in DBMSCurrent trends in DBMS
Current trends in DBMS
 
Current trends in dbms
Current trends in dbmsCurrent trends in dbms
Current trends in dbms
 
ppt on database management
ppt on database management ppt on database management
ppt on database management
 
A Survey on Big Data, Hadoop and it’s Ecosystem
A Survey on Big Data, Hadoop and it’s EcosystemA Survey on Big Data, Hadoop and it’s Ecosystem
A Survey on Big Data, Hadoop and it’s Ecosystem
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data Portals
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by Sunny
 
Trends in Database Management
Trends in Database ManagementTrends in Database Management
Trends in Database Management
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18
 
Uses of dbms
Uses of dbmsUses of dbms
Uses of dbms
 
DEE 431 Introduction to DBMS Slide 1
DEE 431 Introduction to DBMS Slide 1DEE 431 Introduction to DBMS Slide 1
DEE 431 Introduction to DBMS Slide 1
 
NoSQL
NoSQLNoSQL
NoSQL
 
CURRENT AND FUTURE TRENDS IN DBMS
CURRENT AND FUTURE TRENDS IN DBMSCURRENT AND FUTURE TRENDS IN DBMS
CURRENT AND FUTURE TRENDS IN DBMS
 

Semelhante a Hadoop

Semelhante a Hadoop (20)

module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing data
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptx
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Hadoop
HadoopHadoop
Hadoop
 
Hdfs
HdfsHdfs
Hdfs
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File System
 
Hadoop
HadoopHadoop
Hadoop
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptx
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
Unit-3.pptx
Unit-3.pptxUnit-3.pptx
Unit-3.pptx
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hdfs
HdfsHdfs
Hdfs
 
Hadoop paper
Hadoop paperHadoop paper
Hadoop paper
 

Último

➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 

Último (20)

➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

Hadoop

  • 1. HADOOP Syed Measum Haider Naqvi Network Specialist
  • 2. INTRODUCTION We have entered an era of Big Data. Huge information is for the most part accumulation of information sets so extensive and complex that it is exceptionally hard to handle them utilizing close by database administration devices. The principle challenges with Big databases incorporate creation, curation, stockpiling, sharing, inquiry, examination and perception. So to deal with these databases we require, "exceedingly parallel software's". As a matter of first importance, information is procured from diverse sources, for example, online networking, customary undertaking information or sensor information and so forth. Flume can be utilized to secure information from online networking, for example, twitter. At that point, this information can be composed utilizing conveyed document frameworks, for example, Hadoop File System. These record frameworks are extremely proficient when number of peruses are high when contrasted with composes.
  • 3. DEFINING HADOOP Hadoop is an open-source framework that allows to collection and procedure big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high- availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
  • 4. HADOOP MODULES ▪ Hadoop Common: The common utilities that support the other Hadoop modules. ▪ Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. ▪ Hadoop YARN: A framework for job scheduling and cluster resource management. ▪ Hadoop Map Reduce: A YARN-based system for parallel processing of large data sets.
  • 5. HADOOP DISTRIBUTED FILE SYSTEM (HDFS™): HDFS is structured similarly to a regular Unix file system except that data storage is distributed across several machines. It is not intended as a replacement to a regular file system, but rather as a file system-like layer for large distributed systems to use. It has in built mechanisms to handle machine outages, and is optimized for throughput rather than latency. ▪ There are two and a half types of machine in a HDFS cluster: ▪ Data node - The Data Nodes are responsible for serving read and write requests from the file system’s clients. The Data Nodes also perform block creation, deletion, and replication upon instruction from the Name Node. ▪ Name node - The Name Node executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to Data Nodes. ▪ Secondary Name node - this is NOT a backup name node, but is a separate service that keeps a copy of both the edit logs, and file system image, merging them periodically to keep the size reasonable. this is soon being deprecated in favor of the backup node and the checkpoint node, but the functionality remains similar (if not the same)
  • 7. HDFS FEATURES HDFS also has a bunch of unique features that make it ideal for distributed systems:  Failure tolerant - data can be duplicated across multiple data nodes to protect against machine failures. The industry standard seems to be a replication factor of 3 (everything is stored on three machines).  Scalability - data transfers happen directly with the data nodes so your read/write capacity scales fairly well with the number of data nodes  Space - need more disk space? Just add more data nodes and re-balance  Industry standard - Lots of other distributed applications build on top of HDFS (H Base, Map-Reduce)  Pairs well with Map Reduce - As we shall learn
  • 8. DATA ORGANIZATION • Data Blocks : HDFS is designed to support very large files. Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files. A typical block size used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible, each chunk will reside on a different Data Node. • Staging : A client requests to create a file does not reach the Name Node immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth over one HDFS block size, the client contacts the Name Node. The Name Node inserts the file name into the file system hierarchy and allocates a data block for it. The Name Node responds to the client request with the identity of the Data Node and the destination data block. Then the client flushes the block of data from the local temporary file to the specified Data Node. When a file is closed, the remaining un-flushed data in the temporary local file is transferred to the Data Node. The client then tells the Name Node that the file is closed. At this point, the Name Node commits the file creation operation into a persistent store. If the Name Node dies before the file is closed, the file is lost.
  • 9. DATA ORGANIZATION ▪ Replication Pipelining : When a client is writing data to an HDFS file, its data is first written to a local file as explained in the previous section. Suppose the HDFS file has a replication factor of three. When the local file accumulates a full block of user data, the client retrieves a list of Data Nodes from the Name Node. This list contains the Data Nodes that will host a replica of that block. The client then flushes the data block to the first Data Node. The first Data Node starts receiving the data in small portions, writes each portion to its local repository and transfers that portion to the second Data Node in the list. The second Data Node, in turn starts receiving each portion of the data block, writes that portion to its repository and then flushes that portion to the third Data Node. Finally, the third Data Node writes the data to its local repository. Thus, a Data Node can be receiving data from the previous one in the pipeline and at the same time forwarding data to the next one in the pipeline. Thus, the data is pipelined from one Data Node to the next. ▪ Accessibility : HDFS can be accessed from applications in many different ways. Natively, HDFS provides a File System Java API for applications to use. A C language wrapper for this Java API is also available. In addition, an HTTP browser can also be used to browse the files of an HDFS instance. Work is in progress to expose HDFS through the WebDAV protocol. ▪ FS Shell: HDFS allows user data to be organized in the form of files and directories. It provides a command line interface called FS shell that lets a user interact with the data in HDFS.