Introduction to Big Data and Hadoop


This document provides an introduction to Hadoop and big data. It defines big data as large amounts of data from structured, semi-structured, and unstructured sources that are difficult to store, analyze, and visualize because of their volume, velocity, and variety. Hadoop is introduced as an open source framework for distributed storage and processing of large datasets across clusters of commodity hardware. Key Hadoop components (HDFS, MapReduce, YARN) and daemons (NameNode, DataNode, ResourceManager, NodeManager) are described. Hadoop's modes of operation (standalone, pseudo-distributed, and fully distributed) are also outlined.


Hadoop Development Series
By Sandeep Patil
4/11/2017

Introduction to Big Data and Hadoop

What is Big Data?
• A large amount of data.
• It is a popular term used to describe the exponential growth of data.
• Big data is difficult to collect, store, maintain, analyze, and visualize.

Big Data characteristics
• Volume: the large amount of data.
• Velocity: the rate at which data is generated.
• Variety: the different types of data:
- Structured data, e.g. MySQL
- Semi-structured data, e.g. XML, JSON
- Unstructured data, e.g. text, audio, video
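
As a rough illustration of the three varieties, the sketch below reads one tiny sample of each kind using only the Python standard library. All sample records are hypothetical and are not taken from the slides; the point is only that structured and semi-structured data can be addressed by column, key, or element, while unstructured text has to be processed before anything can be extracted from it.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Structured: fixed columns, like an export of a MySQL table (hypothetical row).
structured = io.StringIO("id,name,amount\n1,alice,250\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing records whose fields may vary (hypothetical).
json_record = json.loads('{"id": 2, "name": "bob", "tags": ["new"]}')
xml_record = ET.fromstring("<user id='3'><name>carol</name></user>")

# Unstructured: free text with no schema; structure has to be derived from it.
text = "Great phone, but the battery drains fast."
words = text.lower().replace(",", "").replace(".", "").split()

print(rows[0]["amount"])             # addressed by column name
print(json_record["tags"])           # addressed by key
print(xml_record.find("name").text)  # addressed by element name
print(len(words))                    # only derived features are available
```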

Big Data sources
• Social Media
• Banks
• Instruments
• Websites
• Stock Market

Use cases of Big Data
• Recommendation engines
• Analyzing Call Detail Records (CDR)
• Fraud detection
• Market basket analysis
• Sentiment analysis

Hadoop Introduction
• An open source framework that allows distributed processing of large datasets on a cluster of commodity hardware.
• Hadoop is a data management tool and uses scale-out storage.

Defining Hadoop Cluster
• The size of the data is the most important factor when defining a Hadoop cluster.
5 servers with 10 TB storage capacity each
Total storage capacity: 50 TB

Defining Hadoop Cluster
7 servers with 10 TB storage capacity each
Total storage capacity: 70 TB
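
The sizing arithmetic on these two slides can be sketched in a few lines. The raw-capacity function reproduces the 50 TB and 70 TB totals. The other two functions are an addition that is not on the slides: they assume HDFS keeps three copies of each block (its default replication factor), so the space available for distinct data is smaller than the raw total. Function names are illustrative.

```python
import math

def raw_capacity_tb(servers: int, tb_per_server: float) -> float:
    """Total disk capacity of the cluster, as computed on the slides."""
    return servers * tb_per_server

def usable_capacity_tb(servers: int, tb_per_server: float, replication: int = 3) -> float:
    """Capacity left for distinct data once every block is stored `replication` times.

    Assumption (not from the slides): the HDFS default replication factor of 3.
    """
    return raw_capacity_tb(servers, tb_per_server) / replication

def servers_needed(data_tb: float, tb_per_server: float, replication: int = 3) -> int:
    """Smallest number of servers whose raw capacity holds all replicas of the data."""
    return math.ceil(data_tb * replication / tb_per_server)

print(raw_capacity_tb(5, 10))     # 5 servers of 10 TB each
print(raw_capacity_tb(7, 10))     # scale out by adding 2 more servers
print(usable_capacity_tb(7, 10))  # space for distinct data with 3 replicas
print(servers_needed(20, 10))     # servers required to store 20 TB of data
```

Sizing starts from the data volume, which is why the slides call it the most important factor: adding servers is the only way to grow capacity in a scale-out design.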

Hadoop Components
• Hadoop 1 components
- HDFS (Hadoop Distributed File System)
- MapReduce
• Hadoop 2 components
- HDFS (Hadoop Distributed File System)
- YARN/MRv2
HDFS: storage (reads and writes)
MR/YARN: processing
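
To make the processing side more concrete, here is a minimal single-process sketch of the MapReduce model using the classic word-count example. It is plain Python, not the Hadoop API: the function names are illustrative, and a real job would run the map and reduce steps in parallel on many nodes over data stored in HDFS.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "Big Data and Hadoop"])
print(counts)
```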

Hadoop Daemons
• Hadoop 1 daemons
NameNode
DataNode
Secondary NameNode
JobTracker
TaskTracker
HDFS: NameNode, DataNode
MapReduce: JobTracker, TaskTracker

Hadoop Daemons
• Hadoop 2 daemons
NameNode
DataNode
Secondary NameNode
ResourceManager
NodeManager
HDFS: NameNode, DataNode
YARN: ResourceManager, NodeManager

Hadoop Master Slave Architecture
HDFS: NameNode (master), DataNode (slave)
MR/YARN: ResourceManager (master), NodeManager (slave)

Hadoop Cluster
• Assume that we have a Hadoop cluster with 4 nodes.
Master: NameNode, ResourceManager
Slave: DataNode, NodeManager
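
One way to picture this 4-node cluster is as a mapping from host to the daemons it runs. The sketch below assumes one master and three slaves, which the slide suggests but does not state outright, and the hostnames node1 to node4 are hypothetical.

```python
MASTER_DAEMONS = ("NameNode", "ResourceManager")
SLAVE_DAEMONS = ("DataNode", "NodeManager")

def cluster_layout(hostnames):
    """Assign the master daemons to the first host and the slave daemons to the rest.

    Assumption: a single master node; the remaining nodes are all slaves.
    """
    master, *slaves = hostnames
    layout = {master: MASTER_DAEMONS}
    for host in slaves:
        layout[host] = SLAVE_DAEMONS
    return layout

layout = cluster_layout(["node1", "node2", "node3", "node4"])
for host, daemons in layout.items():
    print(host, ", ".join(daemons))
```

Each slave runs a storage daemon and a processing daemon side by side, which lets the cluster schedule work on the nodes that already hold the data.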

Secondary NameNode
• The Secondary NameNode is not a hot backup for the NameNode.
• It only takes a periodic (hourly) checkpoint of the NameNode metadata.
• That checkpoint can be used to restart a crashed Hadoop cluster.
• The Secondary NameNode is an important daemon in Hadoop 1; in Hadoop 2 it is less important.
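
The periodic backup described above is usually explained as a checkpoint: the last saved image of the file system namespace is merged with the log of changes made since then. The sketch below is a heavily simplified model of that merge, not Hadoop code. The names fsimage, edit_log, and checkpoint are illustrative, and the namespace is reduced to a set of paths with only create and delete operations.

```python
def apply_edit(namespace, edit):
    """Apply one logged change to an in-memory namespace (a set of paths)."""
    operation, path = edit
    if operation == "create":
        namespace.add(path)
    elif operation == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {operation}")

def checkpoint(fsimage, edit_log):
    """Merge the saved image with the edit log; return the new image and an empty log."""
    merged = set(fsimage)
    for edit in edit_log:
        apply_edit(merged, edit)
    return frozenset(merged), []

fsimage = frozenset({"/a", "/b"})
edit_log = [("create", "/c"), ("delete", "/a")]
new_image, new_log = checkpoint(fsimage, edit_log)
print(sorted(new_image))
```

Because the checkpoint only reflects the state at the time of the last merge, changes made after it are missing, which is why it is not a hot backup.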

Modes of Operation
• Standalone
• Pseudo-distributed
• Fully distributed
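
In practice the mode is largely a matter of configuration. As a hedged sketch, the snippet below renders the two properties commonly used for a pseudo-distributed setup (all daemons on one machine) into Hadoop's site XML format. The values hdfs://localhost:9000 and a replication factor of 1 are assumptions for a single-machine setup and are not taken from the slides; render_site_xml is an illustrative helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Assumed settings for pseudo-distributed mode: one machine, one copy of each block.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
}

def render_site_xml(properties):
    """Render a name/value mapping as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

for filename, properties in PSEUDO_DISTRIBUTED.items():
    print(filename)
    print(render_site_xml(properties))
```

Standalone mode needs no such settings because everything runs in a single process against the local file system, while a fully distributed cluster would point the file system URI at the NameNode host instead of localhost.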

Next Video
• Comparison between Hadoop 1 and Hadoop 2

Like and Subscribe
sdp117@gmail.com

