In The Name of Allah The Most Merciful The
Most Gracious
• Name: Abdul Nasir Afridi
• Roll Number:01
• Batch#10
• Subject: Advanced Database And Data
mining.
Research Article
1. Performance Evaluation of Read and
Write operations in Hadoop Distributed
File System.
Published: 2014 Sixth International
Symposium on Parallel Architectures,
Algorithms and Programming
Conference Paper: IEEE Computer
Society
Authors: Dr T Ragunathan et al.
Research Article
• High Performance and Fault Tolerant
Distributed File System for Big Data
Storage and Processing using Hadoop
• Published: 2014 International
Conference on Intelligent Computing
Applications
• © 2014 IEEE Conference Publishing
Services
Research Article
• A Distributed Storage Model for EHR
Based on HBase
• Published: © 2011 IEEE International
Conference on Information
Management, Innovation Management
and Industrial Engineering
Research Article
H-Store: A High-Performance, Distributed
Main Memory Transaction Processing
System
Published: August 23-28, 2008, Auckland,
New Zealand
Conference Paper: ACM 978-1-60558-306-8/08/08
Copyright 2008 VLDB Endowment
• Keywords:
• Hadoop Distributed File System (HDFS)
• HBase
• Electronic healthcare record (EHR)
• Distributed Storage
• Big Data
• MapReduce
What is Apache Hadoop?
• Hadoop Distributed File System:
• HDFS, the storage layer of Hadoop, is a
distributed, scalable, Java-based file system
adept at storing large volumes of unstructured
data
• It is an open-source system developed by
Apache in Java.
• It is designed to handle very large data sets.
• It is designed to scale to very large clusters.
• It is designed to run on commodity hardware.
Hadoop ecosystem
Hadoop History
Hadoop ecosystem
• Hadoop Distributed File System (HDFS):
• HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system.
• It offers data replication.
• It offers automatic failover in the event of a crash.
• It automatically fragments storage over the cluster.
• It brings processing to the data.
• It supports large volumes of files, into the millions.
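The replication and fragmentation points above can be illustrated with a small sketch. It is not HDFS code: the 128 MiB block size and replication factor of 3 are assumed values, the function names are hypothetical, and the round-robin placement is a deliberate simplification that ignores rack awareness.

```python
# Toy model of how a file might be fragmented into blocks and replicated.
# Assumptions: 128 MiB blocks, 3 replicas, naive round-robin placement.
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB   # assumed block size
REPLICATION = 3          # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be partially filled
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement
```

Under these assumptions a 300 MiB file occupies three blocks, and losing any single node still leaves two copies of every block.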
Hadoop ecosystem
• MapReduce:
• MapReduce is a software framework that serves as the compute layer of Hadoop.
• MapReduce jobs are divided into two parts. The map function divides a query into multiple parts and processes data at the node level.
• The reduce function aggregates the results of the map function to determine the answer to the query.
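The map and reduce roles described above can be sketched as a single-process word count. This is only an illustration of the programming model, not the Hadoop API; the function names are hypothetical, and the shuffle step stands in for the grouping that the framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster each map call would run on the node holding its input split, and only the grouped intermediate pairs would travel over the network.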
Hadoop ecosystem
• Hive:
Hive is a Hadoop-based data warehouse developed by Facebook. It allows users to write queries in a SQL-like language (HiveQL), which are then converted to MapReduce jobs. This allows SQL programmers with no MapReduce experience to use the warehouse and makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, Revolution Analytics, etc.
Hadoop ecosystem
• Pig:
Pig Latin is a Hadoop-based language developed by Yahoo. It is relatively easy to learn and is adept at very deep, very long data pipelines (a limitation of SQL).
Pig, originally developed at Yahoo Research, is a high-level language for building MapReduce programs for Hadoop, thus simplifying the use of MapReduce. It is a data flow language that provides high-level commands.
Hadoop ecosystem
• HBase:
• HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop.
• It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts, and deletes.
• eBay and Facebook use HBase heavily.
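As a rough illustration of the lookup, update, insert and delete operations mentioned above, the following in-memory sketch keeps rows sorted by row key so that point lookups and range scans are cheap. It is a toy model, not the HBase client API; the class name, method names and the patient-style row keys are all hypothetical.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self._rows = {}   # row key -> {column name -> value}
        self._keys = []   # row keys kept in sorted order for range scans

    def put(self, row, column, value):
        """Insert a new cell or update an existing one."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        """Point lookup; returns an empty dict for a missing row."""
        return dict(self._rows.get(row, {}))

    def delete(self, row):
        """Remove a whole row if it exists."""
        if row in self._rows:
            del self._rows[row]
            self._keys.remove(row)

    def scan(self, start, stop):
        """Return rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]
```

Choosing row keys with a shared prefix, such as a patient identifier, lets one scan return all related records, which is the kind of access pattern an EHR store would rely on.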
Hadoop ecosystem
• Flume:
• Flume is a framework for populating Hadoop with data.
• Agents are deployed throughout one's IT infrastructure (inside web servers, application servers, and mobile devices, for example) to collect data and integrate it into Hadoop.
Hadoop ecosystem
• Oozie:
• Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages (such as MapReduce, Pig and Hive) and then intelligently links them to one another.
• Oozie allows users to specify, for example, that a particular query is only to be initiated after the previous jobs on which it relies for data are completed.
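The dependency rule described above amounts to running jobs in a topological order of the workflow graph. A minimal sketch using Python's standard `graphlib` module (Python 3.9+) follows; it does not use Oozie's workflow definition format, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "ingest": set(),
    "clean_with_pig": {"ingest"},
    "aggregate_with_hive": {"clean_with_pig"},
    "report": {"aggregate_with_hive", "clean_with_pig"},
}


def run_order(dependencies):
    """Return an order in which every job comes after all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `run_order(WORKFLOW)` yields a valid execution order; a workflow with circular dependencies makes `TopologicalSorter` raise `graphlib.CycleError` instead.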
Hadoop ecosystem
• Whirr:
• Whirr is a set of libraries that allows users to easily spin up Hadoop clusters on top of Amazon EC2, Rackspace, or any virtual infrastructure.
• It supports all major virtualized infrastructure vendors on the market.
Hadoop ecosystem
• Avro:
• Avro is a data serialization system that allows for encoding the schema of Hadoop files.
• It is adept at parsing data and performing remote procedure calls.
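The idea of storing the schema together with the data can be sketched with the standard library alone. The byte layout below is invented for illustration and is not Avro's actual encoding; the schema, field types and function names are assumptions.

```python
import json
import struct

# Hypothetical schema; only "long" and "string" field types are handled here.
SCHEMA = {
    "name": "Patient",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}


def encode(schema, records):
    """Write a length-prefixed JSON schema header followed by binary records."""
    header = json.dumps(schema).encode("utf-8")
    parts = [struct.pack(">I", len(header)), header]
    for record in records:
        for field in schema["fields"]:
            value = record[field["name"]]
            if field["type"] == "long":
                parts.append(struct.pack(">q", value))
            elif field["type"] == "string":
                data = value.encode("utf-8")
                parts.append(struct.pack(">I", len(data)) + data)
            else:
                raise ValueError("unsupported type: " + field["type"])
    return b"".join(parts)


def decode(blob):
    """Read the embedded schema first, then use it to parse the records."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    schema = json.loads(blob[4:4 + header_len].decode("utf-8"))
    pos = 4 + header_len
    records = []
    while pos < len(blob):
        record = {}
        for field in schema["fields"]:
            if field["type"] == "long":
                (record[field["name"]],) = struct.unpack_from(">q", blob, pos)
                pos += 8
            elif field["type"] == "string":
                (size,) = struct.unpack_from(">I", blob, pos)
                pos += 4
                record[field["name"]] = blob[pos:pos + size].decode("utf-8")
                pos += size
            else:
                raise ValueError("unsupported type: " + field["type"])
        records.append(record)
    return schema, records
```

Because the reader recovers the schema from the file itself, it needs no out-of-band knowledge of the record layout, which is the property the slide attributes to Avro.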
Hadoop ecosystem
• Mahout:
• Mahout is a data-mining library.
• It takes the most popular data-mining algorithms for performing clustering, regression, and statistical modeling and implements them using the MapReduce model.
Hadoop ecosystem
• Sqoop:
• Sqoop is a connectivity tool for moving data from non-Hadoop data stores, such as relational databases and data warehouses, into Hadoop.
• It allows users to specify the target location inside of Hadoop and instruct Sqoop to move data from Oracle, Teradata, or other relational databases to the target.
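The movement of a relational table into a file-based target can be sketched as follows. This is not Sqoop itself: SQLite stands in for the source database, a CSV stream stands in for the target location inside Hadoop, and the function name is hypothetical.

```python
import csv
import sqlite3


def export_table(conn, table, out):
    """Copy every row of `table` to `out` as CSV; return the row count."""
    # Table names cannot be bound as SQL parameters, so validate the name.
    if not table.isidentifier():
        raise ValueError("unexpected table name: " + table)
    cursor = conn.execute('SELECT * FROM "{}"'.format(table))
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])  # header row
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count
```

A caller would pass an open database connection and a writable text stream; in a real deployment the stream would be replaced by a file in the chosen target directory.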
Hadoop Configuration File
Data Ingress And Egress
Joining Type Venn Diagram
Big data
Big data is being generated by everything around us at all times.
• Every digital process and social media exchange produces it.
• Systems, sensors and mobile devices transmit it.
Big data is arriving from multiple sources at an alarming velocity, volume and variety.
To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills.
Big Data
Typical Hadoop cluster integrates MapReduce and HDFS
Master/slave architecture
Pictorial Representation of Hadoop
Physical Architecture of Hadoop ecosystem
HDFS
MapReduce
HDFS NameNode
Scheduling
• By default
▫ Hadoop uses FIFO to schedule jobs.
▫ There is no preemption once a job is running.
• In Hadoop version 2.x, fair scheduling is available: it assigns resources to applications such that all applications get, on average, an equal share of resources over time.
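The difference between the two policies can be seen in a small simulation. It is a simplified model rather than Hadoop's scheduler: work is measured in abstract units, `capacity` is the number of units the cluster processes per tick, job names are assumed unique, and capacity freed by a job finishing mid-tick is not redistributed until the next tick.

```python
def fifo_finish_times(jobs, capacity):
    """FIFO: each job gets the whole cluster until it is done."""
    finish = {}
    tick = 0
    for name, work in jobs:
        tick += -(-work // capacity)  # ceiling division: ticks this job needs
        finish[name] = tick
    return finish


def fair_finish_times(jobs, capacity):
    """Fair share: every tick, capacity is split equally among running jobs."""
    remaining = dict(jobs)
    finish = {}
    tick = 0
    while remaining:
        tick += 1
        share = capacity / len(remaining)
        for name in list(remaining):
            remaining[name] -= share
            if remaining[name] <= 1e-9:
                finish[name] = tick
                del remaining[name]
    return finish
```

With a long job of 8 units submitted before a short job of 2 units and a capacity of 2, FIFO finishes the short job at tick 5, while fair sharing finishes it at tick 2 at the cost of delaying the long job from tick 4 to tick 5.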
Hadoop Implementation
Hadoop Distributed File System (HDFS) presentation 27-5-2015
