Hadoop Jon 
By HumoyunJon Lee
90% OF THE WORLD’S DATA HAS BEEN GENERATED IN THE LAST 
THREE YEARS ALONE, AND IT IS GROWING 
AT AN EVEN MORE RAPID RATE. 
BIG DATA 
The world has seen exponential data growth, driven by social media, 
mobility, e-commerce and other factors. 
• Volume 
• Variety 
• Velocity
“Big Data is like teenage sex; 
everyone talks about it, 
nobody really knows how to do it, 
everyone thinks everyone else is doing it, 
so everyone claims they are doing it” 
Dan Ariely, Duke University
Big Data Ecosystem
To Address This Issue 
We need HadoopJon
A Shared-Nothing Network, or 
What Is Hadoop?
The Apache Hadoop software library is a framework that allows for the 
distributed processing of large data sets across clusters of computers using 
simple programming models. It is designed to scale up from single servers to 
thousands of machines, each offering local computation and storage. Rather 
than rely on hardware to deliver high availability, the library itself is designed 
to detect and handle failures at the application layer, thereby delivering a highly available 
service on top of a cluster of computers, each of which may be prone 
to failures.
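To make the "simple programming models" concrete, here is a minimal single-process sketch of the MapReduce idea (map, shuffle, reduce) applied to word counting. It is an illustration only: the function names are hypothetical and nothing here uses Hadoop's actual API; on a real cluster the framework would run the map and reduce steps on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can in principle be spread across machines, which is the property the slide describes.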
Prerequisites: 
• Installing Java v1.5+ 
• Adding dedicated Hadoop system user. 
• Configuring SSH access. 
• Disabling IPv6. 
Installing HadoopJon
Configuring Hadoop: 
a. hadoop-env.sh 
b. core-site.xml 
c. mapred-site.xml 
d. hdfs-site.xml
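As a rough sketch of what the three XML files contain, the snippet below renders minimal single-node configurations. The property names (`fs.default.name`, `mapred.job.tracker`, `dfs.replication`) are the Hadoop 1.x-era names that match the JobTracker/TaskTracker setup in this deck; the host, ports and replication value are assumptions chosen for illustration, not values from the slides. `hadoop-env.sh` is a shell script (typically where `JAVA_HOME` is set) and is not covered here.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative single-node values (assumptions, adjust for a real cluster).
SITE_FILES = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:54310"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:54311"},
    "hdfs-site.xml": {"dfs.replication": 1},
}

rendered = {name: render_site_xml(props) for name, props in SITE_FILES.items()}
for name, xml_text in rendered.items():
    print(name, xml_text)
```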
Hadoop comes with several web interfaces which are by 
default available at these locations: 
• http://localhost:50070/ – web UI of the NameNode daemon 
• http://localhost:50030/ – web UI of the JobTracker daemon 
• http://localhost:50060/ – web UI of the TaskTracker daemon 
Hadoop Web Interfaces
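A small sketch for checking these interfaces from a script. The ports are the defaults listed above; the helper names `web_ui_urls` and `is_reachable` are hypothetical, and the check only tests whether an HTTP request succeeds, not whether the daemon is healthy.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Default web UI ports as listed on the slide.
DEFAULT_PORTS = {"NameNode": 50070, "JobTracker": 50030, "TaskTracker": 50060}


def web_ui_urls(host="localhost"):
    """Build the web UI URL of each daemon for the given host."""
    return {daemon: f"http://{host}:{port}/" for daemon, port in DEFAULT_PORTS.items()}


def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False


urls = web_ui_urls()
```

On a machine where the daemons are running, calling `is_reachable(url)` for each entry in `urls` should return `True`; a `False` usually means the daemon is not started or listens on a different port.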
Hadoop Key Characteristics: Scalable, Economical, Flexible, Reliable
• Scalable – New nodes can be added as needed, without changing 
data formats, how data is loaded, how jobs are written, or the 
applications on top. 
• Economical – Hadoop brings massively parallel computing to 
commodity servers. The result is a sizeable decrease in the cost per 
terabyte of storage, which in turn makes it affordable to model all 
your data.
• Flexible – Hadoop is schema-less, and can absorb any type of data, 
structured or not, from any number of sources. Data from multiple 
sources can be joined and aggregated in arbitrary ways enabling 
deeper analyses than any one system can provide. 
• Reliable – When you lose a node, the system redirects work to 
another node that holds a copy of the data and continues processing 
without missing a beat.
Hadoop Ecosystem
HDFS Architecture
• HDFS is designed to store a very large amount of information 
(terabytes or petabytes). This requires spreading the data across a 
large number of machines. 
• HDFS stores data reliably. If individual machines in the cluster fail, 
the data remains available because it is stored redundantly. 
Hadoop Distributed File 
System (HDFS):
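The redundancy idea can be sketched in a few lines: split a file into blocks, store each block on several nodes, and check which blocks are still readable after some nodes fail. This is a simplified model under stated assumptions: the round-robin placement and the replication factor of 3 are illustrative choices, not HDFS's actual placement policy, and the node names are made up.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes) so the chosen nodes are distinct.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = {
            nodes[(block + i) % len(nodes)] for i in range(replication)
        }
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(6, nodes, replication=3)

# Losing one node leaves every block readable from another replica.
survivors = readable_blocks(placement, failed_nodes={"node2"})
print(sorted(survivors))
```

Only when every node holding a given block fails at once does that block become unreadable, which is why a higher replication factor tolerates more simultaneous failures.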
• HDFS provides fast, scalable access to the information loaded on the 
clusters. It is possible to serve a larger number of clients by simply 
adding more machines to the cluster. 
• HDFS integrates well with Hadoop MapReduce, allowing data to be 
read and computed upon locally whenever needed. 
• HDFS was originally built as infrastructure for the Apache Nutch 
web search engine project
Hadoop does not require expensive, highly reliable hardware. It is 
designed to run on clusters of commodity hardware. An HDFS instance 
may consist of hundreds or thousands of server machines, each storing 
part of the file system’s data. The fact that there are a huge number of 
components and that each component has a non-trivial probability of 
failure means that some component of HDFS is always non-functional. 
Therefore, detection of faults and quick, automatic recovery from them 
is a core architectural goal of HDFS. 
Commodity Hardware Failure:
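A quick back-of-the-envelope calculation shows why failures are the norm at this scale. Assuming each component fails independently with probability p in a given period, the chance that at least one of n components has failed is 1 - (1 - p)^n. The independence assumption and the value p = 0.001 are illustrative, not figures from the slides.

```python
def prob_any_failure(num_components, p_fail):
    """Probability that at least one of n independent components has failed."""
    return 1.0 - (1.0 - p_fail) ** num_components


# Hypothetical per-component failure probability for one time period.
P_FAIL = 0.001

for n in (10, 100, 1000, 5000):
    print(f"{n:>5} components: {prob_any_failure(n, P_FAIL):.3f}")
```

Even with a small per-component probability, the result approaches 1 as the cluster grows, so the system has to treat failure as an everyday event rather than an exception.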
Applications that run on HDFS need continuous access to their data 
sets. HDFS is designed for batch processing rather than interactive 
use by users. The emphasis is on high throughput of data access rather 
than low latency of data access. 
Continuous Data Access:
Applications that run on HDFS have large data sets. A typical file in 
HDFS is gigabytes to terabytes in size. So, HDFS is tuned to support 
large files. 
It is also worth examining the 
applications for which using HDFS 
does not work so well. While this 
may change in the future, these are 
areas where HDFS is not a good fit 
today: 
Very Large Data Files:
• Low-latency data access 
• Lots of small files 
• Multiple writers, arbitrary file modifications
• Pig is an open-source high-level dataflow 
system. 
• It provides a simple language for queries and 
data manipulation, Pig Latin, which is compiled 
into MapReduce jobs that run on Hadoop. 
• Why is it important? 
- Companies like Yahoo, Google and Microsoft 
are collecting vast data sets in the form of 
clickstreams, search logs, and web crawls. 
- Some form of ad-hoc processing and analysis 
of all of this information is required. 
What is Pig?
• An ad-hoc way of creating and executing MapReduce jobs on very 
large data sets 
• Rapid Development 
• No Java is required 
• Developed by Yahoo! 
Why was Pig created?
• Pig is a data flow language. It sits on top of Hadoop and makes it 
possible to create complex jobs that process large volumes of data 
quickly and efficiently. 
• It will consume any data that you feed it: Structured, semi-structured, 
or unstructured. 
• Pig provides common data operations (filters, joins, ordering) and 
nested data types (tuples, bags, and maps), which are missing in 
MapReduce. 
• Pig scripts are easier and faster to write than standard Java Hadoop 
jobs, and Pig has a lot of clever optimizations, like multi-query 
execution, which can make complex queries run faster. 
Where Should I Use Pig?
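The data operations mentioned above (filter, join, ordering) can be sketched as a step-by-step data flow in plain Python. The sample records are made up, and the Pig Latin lines in the comments are rough counterparts for orientation only; real Pig compiles such steps into MapReduce jobs instead of running them in one process.

```python
from collections import Counter

# Hypothetical sample data.
users = [
    {"id": 1, "name": "ana", "age": 17},
    {"id": 2, "name": "bek", "age": 25},
    {"id": 3, "name": "dil", "age": 31},
]
clicks = [
    {"user_id": 2, "url": "/home"},
    {"user_id": 2, "url": "/docs"},
    {"user_id": 2, "url": "/pricing"},
    {"user_id": 3, "url": "/home"},
    {"user_id": 1, "url": "/home"},
    {"user_id": 1, "url": "/docs"},
]

# Roughly: adults = FILTER users BY age >= 18;
adults = [u for u in users if u["age"] >= 18]

# Roughly: joined = JOIN adults BY id, clicks BY user_id;
joined = [(u, c) for u in adults for c in clicks if u["id"] == c["user_id"]]

# Roughly: GROUP joined BY name, then FOREACH ... GENERATE group, COUNT(...);
counts = Counter(u["name"] for u, _ in joined)

# Roughly: ORDER ... BY count DESC; ties are broken by name here.
ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ordered)
```

Each statement names an intermediate result and feeds the next one, which is the "data flow" style the slide refers to.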
• Hive is a data warehouse infrastructure built 
on top of Hadoop. 
• It facilitates querying large datasets residing 
on a distributed storage. 
• It provides a mechanism to project structure 
on to the data and query the data using a 
SQL-like query language called “HiveQL”. 
What is Hive?
• Hive was developed by Facebook and was open sourced in 2008. 
• Data stored in Hadoop is inaccessible to business users. 
• High-level languages like Pig and Cascading are geared towards 
developers. 
• SQL is a common language that is known to many. Hive was 
developed to give access to data stored in HadoopJon by translating 
SQL-like queries into MapReduce jobs. 
Why Hive was developed
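To give a feel for what "translating SQL-like queries into MapReduce jobs" means, the sketch below hand-translates one query: the WHERE clause becomes a filter in the map step, the GROUP BY column becomes the shuffle key, and COUNT(*) becomes a sum in the reduce step. This is a conceptual illustration, not Hive's actual query planner; the table, columns and rows are hypothetical.

```python
from collections import defaultdict

QUERY = "SELECT page, COUNT(*) FROM clicks WHERE country = 'UZ' GROUP BY page"


def mapper(row):
    """WHERE filters rows; the GROUP BY column becomes the output key."""
    if row["country"] == "UZ":
        yield row["page"], 1


def reducer(page, ones):
    """COUNT(*) is the sum of the ones emitted for this key."""
    return page, sum(ones)


def run(rows):
    """Apply the mapper, group by key, then apply the reducer per key."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return sorted(reducer(key, values) for key, values in groups.items())


rows = [
    {"page": "home", "country": "UZ"},
    {"page": "home", "country": "UZ"},
    {"page": "docs", "country": "UZ"},
    {"page": "home", "country": "US"},
]
result = run(rows)
print(result)
```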
Thank you, everyone. 
The morning presentation 
is over.
