THE SOLUTION FOR BIG DATA

NAME: SIVAKOTI TARAKA SATYA PHANINDRA
ROLL NO: 15K81D5824
COURSE: CSE M.TECH/SEM-1
CONTENT:
Data – Trends in storing data.
BigData – Problems in IT industry
Why BigData ?
Introduction to HADOOP
HDFS (Hadoop Distributed File System)
 MapReduce
Prominent users of Hadoop.
Conclusion
Data – Trends in storing data
What is data? Any real-world symbol (character, number, special
character) or group of such symbols is data. It may be visual,
audio, textual, an image, etc.
File system
Databases
Cloud (internet)
BIG DATA:
What is big data? In IT, it is a collection of data sets so
large and complex that it becomes difficult to process
using on-hand database management tools or traditional
data processing applications.
 As of 2016, limits on the size of data sets that are
feasible to process in reasonable time were on the order
of exabytes of data (KB, MB, GB, TB, PB, EB, ZB).
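To make the unit scale above concrete, here is a minimal Python sketch that expresses a byte count in the largest fitting unit. It assumes decimal prefixes (powers of 1000); the function name `human_size` is invented for this illustration.

```python
# Decimal (SI) prefixes are an assumption here; binary prefixes would use 1024.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_size(num_bytes):
    """Express num_bytes in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit available.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(human_size(10**18))        # one exabyte
print(human_size(800 * 10**12))  # 0.8 petabytes, i.e. 800 terabytes
```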
BIGDATA and its problems:
 About 0.8 petabytes of updates are made to Facebook every day, including 50 million photos.
 Every day, enough video is uploaded to YouTube to watch continuously for one year.
 Large data sets impose limitations in many areas, including meteorology, genomics, complex physics simulations, and biological and environmental research.
 They also affect Internet search, finance and business informatics.
 The challenges include capture, retrieval, storage, search, sharing, analysis, and visualization.
Why BIG DATA ?
Unstructured DATA growth !
THEN WHAT COULD BE THE SOLUTION
FOR BIGDATA ?
Hadoop’s Developers:
 2005: Doug Cutting and Michael J. Cafarella developed Hadoop to
support distribution for the Nutch search engine project.
 The project was funded by Yahoo.
 2006: Yahoo gave the project to the Apache Software Foundation.
Doug Cutting
What is Hadoop?
 It is open-source software written in Java.
 The Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of
computers using simple programming models.
 It is designed to scale up from single servers to thousands of
machines, each offering local computation and storage.
• Apache top-level project: an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage.
• It is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware.
The project includes these modules:
Hadoop Common
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
1. Hadoop Common
 It provides access to the filesystems supported by Hadoop.
 The Hadoop Common package contains the JAR files and scripts needed to start Hadoop.
 The package also provides source code, documentation, and a contribution section that includes projects from the Hadoop community (Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, ZooKeeper).
2. Hadoop Distributed File System (HDFS):
 Hadoop uses HDFS, a distributed file system based on GFS (Google File System), as its shared filesystem.
 The HDFS architecture divides files into large chunks (~64 MB, configurable) distributed across data servers.
 It has a namenode and datanodes.
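The chunking described above comes down to simple arithmetic. The following Python sketch is illustrative only: it uses the ~64 MB figure quoted on the slide as its default, and the function name `split_into_blocks` is hypothetical, not part of any Hadoop API.

```python
BLOCK_SIZE_MB = 64  # the ~64 MB chunk size quoted above; configurable in practice

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the chunks a file of file_size_mb would be cut into."""
    if file_size_mb <= 0:
        return []
    full_blocks = int(file_size_mb // block_size_mb)
    remainder = file_size_mb - full_blocks * block_size_mb
    sizes = [block_size_mb] * full_blocks
    if remainder > 0:
        sizes.append(remainder)  # the final chunk holds only the leftover data
    return sizes

blocks = split_into_blocks(200)
print(len(blocks), blocks)  # a 200 MB file: three full chunks plus one 8 MB chunk
```

Each chunk could then be placed on a different data server, which is what lets many machines work on one large file at the same time.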
What does HDFS contain?
 HDFS consists of multiple namenodes (namespaces), which are federated.
 The datanodes are used as common storage for blocks by all the namenodes.
 Each datanode registers with all the namenodes in the cluster.
 Datanodes send periodic heartbeats and block reports, and handle commands from the namenodes.
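The registration and heartbeat behaviour above can be pictured with a toy model. This Python sketch is not Hadoop code: the classes `NameNode` and `DataNode`, the explicit `now` timestamps, and the 10-unit timeout are all assumptions made for illustration.

```python
class NameNode:
    """Toy namenode: remembers registered datanodes and their latest reports."""

    def __init__(self, name):
        self.name = name
        self.last_heartbeat = {}  # datanode id -> time of last heartbeat (None if never)
        self.block_reports = {}   # datanode id -> set of block ids it reported

    def register(self, datanode_id):
        self.last_heartbeat[datanode_id] = None
        self.block_reports[datanode_id] = set()

    def heartbeat(self, datanode_id, now):
        self.last_heartbeat[datanode_id] = now

    def block_report(self, datanode_id, block_ids):
        self.block_reports[datanode_id] = set(block_ids)

    def live_datanodes(self, now, timeout):
        """Datanodes whose last heartbeat is no older than `timeout`."""
        return sorted(d for d, t in self.last_heartbeat.items()
                      if t is not None and now - t <= timeout)


class DataNode:
    """Toy datanode: registers with every namenode and reports to all of them."""

    def __init__(self, node_id, namenodes):
        self.node_id = node_id
        self.namenodes = list(namenodes)
        self.blocks = set()
        for nn in self.namenodes:
            nn.register(node_id)

    def send_heartbeat(self, now):
        for nn in self.namenodes:
            nn.heartbeat(self.node_id, now)
            nn.block_report(self.node_id, self.blocks)


namenodes = [NameNode("nn1"), NameNode("nn2")]
dn_a = DataNode("dn-a", namenodes)
dn_b = DataNode("dn-b", namenodes)
dn_a.blocks.add("blk_1")
dn_a.send_heartbeat(now=0)
dn_b.send_heartbeat(now=0)
dn_a.send_heartbeat(now=30)  # dn-b stays silent after time 0
for nn in namenodes:
    print(nn.name, nn.live_datanodes(now=30, timeout=10))
```

Because every datanode reports to every namenode, each namenode can decide on its own which datanodes are still alive.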
Structure of Hadoop system:
Master Node :
Name Node
Secondary Name Node
Job Tracker
Slaves :
Data Node
Task Tracker
MASTER NODE:
 Keeps track of the namespace and metadata about items.
 Keeps track of MapReduce jobs in the system.
 Hadoop is currently configured with centurion064 as the master node.
 Hadoop is installed locally on each system.
 The install location is /localtmp/hadoop/hadoop-0.15.3
SLAVE NODES:
 Manage blocks of data sent from the master node.
 In GFS terms, these are the chunkservers.
 Currently centurion060 and centurion064 are the two slave nodes in use.
 Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is created automatically by the DFS).
 Once you use the DFS, relative paths are resolved from /usr/{your usr id}
Advantages and Limitations of HDFS:
 It reduces traffic on job scheduling.
 Files can be accessed through the native Java API or from a language of the user's choice (C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml).
 It cannot be mounted directly by an existing operating system.
 It should be run on a UNIX or Linux system.
3. Hadoop MapReduce system:
 The Hadoop MapReduce framework harnesses a cluster of machines and executes user-defined MapReduce jobs across the nodes in the cluster.
 A MapReduce computation has two phases:
 a map phase and
 a reduce phase.
MAP AND REDUCE METHODS USAGE…
Map function
Reduce function
Run this program as a
MapReduce job
WORD COUNT OVER A GIVEN SET OF STRINGS
Map output: We 1, love 1, India 1, We 1, Play 1, Tennis 1
Reduce output: love 1, India 1, We 2, Tennis 1, Play 1
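The word count above can be reproduced with a small single-machine Python sketch. It only imitates the map and reduce phases in one process and is not Hadoop code. The two input strings are inferred from the map output shown and are an assumption, and the function names are illustrative.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce: add up all the counts collected for one word."""
    return word, sum(counts)

def word_count(lines):
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            grouped[word].append(one)  # group the values by key between the phases
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

# Assumed input strings, reconstructed from the map output on the slide.
result = word_count(["We love India", "We Play Tennis"])
print(result)
```

Only "We" appears in both strings, so it is the only word whose reduce step sums more than one value.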
MAPREDUCE WITH NO REDUCE TASKS
MAPREDUCE WITH TWO REDUCE TASKS - AUTOMATIC
PARALLEL EXECUTION IN MAPREDUCE
Shuffle and sort in MapReduce with
multiple reduce tasks
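With more than one reduce task, every key must be routed to exactly one task, and each task sees its keys in sorted order. The Python sketch below illustrates that idea on a single machine. It uses `zlib.crc32` as a stand-in hash, which is an assumption chosen for repeatable results rather than the hash Hadoop uses, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict

def partition(key, num_reduce_tasks):
    """Pick the reduce task for a key. crc32 is a stand-in hash chosen because
    Python's built-in hash() of strings changes from one run to the next."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks

def shuffle_and_sort(map_output, num_reduce_tasks):
    """Group (key, value) pairs per reduce task, with keys sorted inside each task."""
    buckets = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for key, value in map_output:
        buckets[partition(key, num_reduce_tasks)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]

map_output = [("We", 1), ("love", 1), ("India", 1),
              ("We", 1), ("Play", 1), ("Tennis", 1)]
reduce_inputs = shuffle_and_sort(map_output, num_reduce_tasks=2)
for task_id, items in enumerate(reduce_inputs):
    # Each reduce task could now run independently and in parallel.
    print(task_id, [(key, sum(values)) for key, values in items])
```

Since all values for a given key land in the same bucket, the reduce tasks never need to talk to each other.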
Prominent users of HADOOP
 Amazon – 100 nodes
 Facebook – two clusters of 8000 and 3000 nodes
 Adobe – 80 node system
 eBay – 532 node cluster
 Yahoo – cluster of about 4500 nodes
 IIIT Hyderabad – 30 node cluster
Trending: Hadoop jobs
Salary trends in Hadoop:
Achievements:
 2008 - Hadoop wins the Terabyte Sort Benchmark (sorted 1 terabyte of data in 209 seconds, compared to the previous record of 297 seconds).
 2009 - Avro and Chukwa became new members of the Hadoop framework family.
 2010 - Hadoop's HBase, Hive and Pig subprojects completed, adding more computational power to the Hadoop framework.
 2011 - ZooKeeper completed.
 March 2011 - Apache Hadoop takes top prize at the Media Guardian Innovation Awards.
 2013 - Hadoop 1.1.2 and Hadoop 2.0.3 alpha released; Ambari, Cassandra and Mahout have been added.
Conclusion:
Hadoop reduces the burden of capture, storage, search, sharing, analysis, and
visualization.
A huge amount of data can be stored, and large computations
can be carried out, in a single system with safety and security
at low cost.
Big data and big-data solutions are among the most pressing topics in
the present IT industry, so working on them will make you
more valuable to it.