P.SAMPATH BHARGAV
16311A05N1
CSE-D
AND
CONTENTS
Big data
• WHAT IS BIG DATA
• VOLUME
• VELOCITY
• VARIETY
• SOURCES OF BIG DATA
• CHALLENGES WITH BIG DATA
• TECHNOLOGIES TO MEET BIG DATA
Hadoop
• HISTORY OF HADOOP
• BEFORE HADOOP
• ARCHITECTURE
• COMPANIES WHICH USE HADOOP
• BIG DATA JOB ROLES
NAME        SYMBOL   VALUE (BYTES)
KILOBYTE    KB       10^3
MEGABYTE    MB       10^6
GIGABYTE    GB       10^9
TERABYTE    TB       10^12
PETABYTE    PB       10^15
EXABYTE     EB       10^18
ZETTABYTE   ZB       10^21
YOTTABYTE   YB       10^24
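
As a minimal sketch of how the decimal units in this table relate, the helper below converts a raw byte count into the largest unit that keeps the value at or above 1. The function name `humanize` and the `UNITS` list are illustrative assumptions, not part of the original slides.

```python
# Decimal (SI) byte units: each step up the table is a factor of 10^3.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def humanize(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"

print(humanize(10**12))  # 1 TB
print(humanize(1500))    # 1.5 KB
```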
WHAT IS “BIG DATA”?
• “BIG DATA” is data that is big in volume, velocity, and variety
“TODAY’S BIG MAY BE TOMORROW’S NORMAL”
3 V’S
VELOCITY
VARIETY
• Variety deals with a wide range of data types:
Structured data - RDBMS tables
Semi-structured data - HTML, XML
Unstructured data - audio, video, emails, photos, PDFs, social media
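
To make the three categories concrete, here is a small sketch using only the Python standard library. The sample records (names, cities, the message text) are invented for illustration. Structured and semi-structured data can be addressed by column or tag, while unstructured text only supports generic operations until it is processed further.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table (hypothetical rows).
structured = io.StringIO("id,name,city\n1,Asha,Hyderabad\n")
rows = list(csv.DictReader(structured))

# Semi-structured: tagged but flexible, as in XML (hypothetical document).
semi = ET.fromstring("<user id='1'><name>Asha</name><tag>cse</tag></user>")

# Unstructured: free text with no schema.
unstructured = "Meeting moved to Friday, slides attached."

print(rows[0]["city"])            # field access by column name
print(semi.find("name").text)     # field access by tag
print(len(unstructured.split()))  # only generic operations, e.g. a word count
```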
SOURCES OF “BIG DATA”
• Social media
• Machine log data
• Public web
• Docs
• Business apps
• Data storage
CHALLENGES WITH “BIG DATA”
CAPTURE
STORAGE
CURATION
SEARCH
ANALYSIS
TRANSFER
VISUALIZATION
PRIVACY VIOLATIONS
WHAT KIND OF TECHNOLOGIES ARE NEEDED TO
MEET THE CHALLENGES POSED BY “BIG DATA”
• Cheap and abundant storage
• Faster processors for quicker processing of big data
• Affordable open-source, distributed big data
platforms, such as Hadoop
• Cloud computing and other flexible resource
allocation arrangements
HISTORY OF HADOOP
• Hadoop was created by DOUG CUTTING and MICHAEL
CAFARELLA in 2005
• 2003 – NUTCH, an open-source search engine (Lucene,
Sphinx, etc.)
• Google published papers describing its distributed
file system (GFS) and MapReduce
• Yahoo then took up the initiative
• The creation of Hadoop followed
• Hadoop 0.1.0 was released in April 2006
• As of this presentation, Hadoop 2.8 is available
BEFORE HADOOP
• Suppose you have 100 TB of data in a data center
• One day you want to retrieve some 2 TB of that data,
and you write a program of, say, 100 KB to do it
• To run it, you must first pull that data out to your
own system and process it there
• i.e. wherever the program runs, the data has to be
fetched to that system
“COMPUTATION IS ALWAYS PROCESSOR BOUND”
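
A rough back-of-the-envelope sketch shows why fetching data to the program is costly. The 2 TB and 100 KB figures come from the slide. The 1 Gbit/s link speed is an assumption chosen only for illustration, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to push size_bytes over a link (no protocol overhead modelled)."""
    return size_bytes * 8 / bandwidth_bits_per_s

LINK = 1e9           # assumed 1 Gbit/s network link (illustrative)
DATA = 2 * 10**12    # 2 TB of data, from the slide
CODE = 100 * 10**3   # 100 KB of code, from the slide

data_time = transfer_seconds(DATA, LINK)
code_time = transfer_seconds(CODE, LINK)

print(f"move data to the code: {data_time / 3600:.1f} hours")
print(f"move code to the data: {code_time * 1000:.1f} ms")
```

Under these assumptions, shipping the data takes hours while shipping the code takes under a millisecond.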
ARCHITECTURE
• Hadoop is a collection of several tools:
MAP REDUCE
FILE SYSTEM (HDFS)
PROJECTS
Contd…
• HDFS (Hadoop Distributed File System) – a distributed
file system that stores data on commodity machines,
providing very high aggregate bandwidth across the
cluster (storing)
• MAP REDUCE – a system for parallel processing of
large data sets (processing)
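
The MapReduce idea can be sketched in a single Python process with a word count. This is not the Hadoop API: the function names are hypothetical, and a real cluster runs the map and reduce tasks in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data", "Big Hadoop"]))
# {'big': 2, 'data': 1, 'hadoop': 1}
```

Each line is mapped independently and each key is reduced independently, so the work can be spread across machines.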
Contd…
• Hadoop daemons (HDFS and MapReduce):
Master node - name node, secondary name node, job tracker
Slave nodes - data node, task tracker
PROCESS…
• File -> name node -> division into blocks ->
replication of each block three times -> the location
of every replica is recorded in the name node
• If a hardware error occurs, the name node is informed
and re-replicates the affected blocks, so that the
number of replicas stays constant at three
• The addresses of the new replicas are recorded in the
name node, so processing continues without error
[Diagram: a FILE split into 64MB blocks; JOB TRACKER]
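
The block-and-replica bookkeeping described above can be sketched as a toy simulation. The `NameNode` class, its methods, and the round-robin placement are assumptions made for illustration. Real HDFS placement is rack-aware and the real name node is far more involved.

```python
import math

BLOCK_SIZE_MB = 64   # block size shown on the slides
REPLICATION = 3      # replicas kept per block

class NameNode:
    """Toy name node: records which data nodes hold each block."""

    def __init__(self, data_nodes):
        self.live = list(data_nodes)   # needs at least REPLICATION + 1 nodes
        self.block_map = {}            # block id -> nodes holding a replica

    def add_file(self, name, size_mb):
        """Split a file into blocks and place REPLICATION copies of each."""
        blocks = math.ceil(size_mb / BLOCK_SIZE_MB)
        for i in range(blocks):
            # Simple round-robin placement (an assumption, not the HDFS policy).
            self.block_map[f"{name}#{i}"] = [
                self.live[(i + r) % len(self.live)] for r in range(REPLICATION)
            ]

    def node_failed(self, node):
        """Drop a failed node and restore every block to REPLICATION copies."""
        self.live.remove(node)
        for holders in self.block_map.values():
            if node in holders:
                holders.remove(node)
                spare = next(n for n in self.live if n not in holders)
                holders.append(spare)

nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
nn.add_file("log.txt", 150)   # 150 MB -> 3 blocks of up to 64 MB
nn.node_failed("dn2")
for block, holders in nn.block_map.items():
    print(block, holders)
```

After the failure, every block is again held by three distinct live nodes, and the name node's map reflects the new locations.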
COMPANIES WHICH USE “HADOOP”
“BIG DATA” JOB ROLES
CHIEF DATA OFFICER
BIG DATA SCIENTIST
BIG DATA ANALYST
BIG DATA VISUALIZER
BIG DATA MANAGER
BIG DATA SOLUTIONS ARCHITECT
BIG DATA ENGINEER
BIG DATA RESEARCHER
BIG DATA CONSULTANT