Hadoop is a Java-based software framework that supports data-intensive distributed applications and is developed under an open-source license. It enables applications to work with thousands of nodes and petabytes of data.
2. What is Big-Data?
What is Hadoop?
Why Distributed File System?
Hadoop Distributed File System (HDFS)
Replication & Rack Awareness
3. Major Problems in Distributed File System
Hadoop Computing Model (MapReduce)
Advantages Of Hadoop
Disadvantages Of Hadoop
Prominent Users
Tools
4. Big data refers to data volumes in the range of
exabytes (10^18 bytes) and beyond, i.e., very large amounts of data.
We define “Big Data” as the amount of data just beyond
technology’s capability to store, manage, and process efficiently.
6. Doug Cutting
2005: Doug Cutting and Michael J. Cafarella developed
Hadoop to support distribution for the Nutch search
engine project. The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache
Software Foundation.
7. • Hadoop was created by Doug Cutting and Mike
Cafarella in 2005; Cutting was working at Yahoo! at the time.
• Hadoop is a software framework for distributed
processing of large datasets across large clusters of
computers.
• Hadoop is an open-source implementation of Google's
MapReduce.
• Hadoop is based on a simple programming model called
MapReduce.
8. • Hadoop is based on a simple data model: any data will
fit.
• Apache Hadoop is an open-source software
framework written in Java for distributed storage.
• The Hadoop framework consists of two main layers:
• Distributed file system (HDFS)
• Execution engine (MapReduce)
• HDFS follows a write-once, read-many access model.
9. Hadoop uses parallel processing, so less time is
required to process huge amounts of data.
11. • Single name node and many data nodes
• The name node maintains the file system metadata
• Files are split into fixed-size blocks and stored on data nodes (default: 64 MB)
• Data blocks are replicated for fault tolerance and fast access (default replication factor: 3)
• Data nodes periodically send heartbeats to the name node
• HDFS is a master-slave architecture
• Master: name node
• Slaves: data nodes (100s or 1000s of nodes)
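The block and replication defaults above determine how much raw storage a file actually consumes. A minimal sketch in plain Java (not the Hadoop API), assuming the stated defaults of 64 MB blocks and a replication factor of 3; the class and method names are illustrative only:

```java
// Sketch: how HDFS block splitting and replication multiply storage.
public class BlockMath {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // default block size (64 MB)
    static final int REPLICATION = 3;                  // default replication factor

    // Number of blocks needed for a file of the given size (ceiling division;
    // the last block may be smaller than BLOCK_SIZE).
    static long numBlocks(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long fileSize = 200L * 1024 * 1024;            // a 200 MB file
        long blocks = numBlocks(fileSize);             // 64 + 64 + 64 + 8 MB -> 4 blocks
        System.out.println(blocks);                    // 4
        System.out.println(blocks * REPLICATION);      // 12 block copies across data nodes
    }
}
```

With replication, each block lives on three different data nodes, which is what lets the name node reroute reads when a heartbeat stops arriving.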
18. Two main phases: Map and Reduce
• Any job is converted into map and reduce tasks
• Developers need ONLY to implement the Map
and Reduce classes
MapReduce is a master-slave architecture
• Master: JobTracker
• Slaves: TaskTrackers (100s or 1000s of TaskTrackers)
• Every data node runs a TaskTracker
19. Mappers and Reducers consume and produce (key, value) pairs
• Users define the data types of the key and value
• Shuffling & sorting phase:
• Map output is shuffled so that all records with the same key go to the same reducer
• Each reducer may receive multiple keys
• Each reducer sorts its records to group equal keys, then processes each group
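The shuffling step above can be sketched in plain Java (not the Hadoop API). The sketch assumes Hadoop's default hash-partitioning scheme (hash of the key modulo the number of reducers); the class and method names are illustrative only:

```java
import java.util.*;

// Sketch: shuffling routes every record with the same key to the same
// reducer, and each reducer keeps its keys sorted so equal keys are grouped.
public class ShuffleSketch {
    // Assign a key to one of numReducers partitions: hash(key) mod numReducers.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Distribute (key, value) pairs into per-reducer sorted maps; a TreeMap
    // stands in for the reducer-side sort that groups equal keys.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<TreeMap<String, List<Integer>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) reducers.add(new TreeMap<>());
        for (Map.Entry<String, Integer> kv : mapOutput) {
            reducers.get(partition(kv.getKey(), numReducers))
                    .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                    .add(kv.getValue());
        }
        return reducers;
    }
}
```

Because the partition is a pure function of the key, both occurrences of a key always land on the same reducer, which is what makes per-key aggregation possible.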
20. Job: Count the occurrences of each word in a data set
[Diagram: map tasks feeding into reduce tasks]
The Reduce phase is optional: jobs can be Map-only
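The word-count job above can be simulated end to end in plain Java (not the Hadoop API): the map phase emits (word, 1) for every word, the grouping step stands in for the shuffle, and the reduce phase sums the values for each key. The class and method names are illustrative only:

```java
import java.util.*;

// Sketch: the classic word-count job expressed as map, shuffle, reduce.
public class WordCountSketch {

    // Map task: emit one (word, 1) pair per word in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Reduce task: sum all values that arrived for one key.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    // Driver: run map over every line, group by key (the shuffle), then reduce.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, v) -> result.put(k, reduce(k, v)));
        return result;
    }
}
```

In real Hadoop the map and reduce functions run as distributed tasks on the TaskTrackers described above; only the two functions change, not the surrounding machinery. A Map-only job would simply skip the grouping and reduce steps and emit the map output directly.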