What is Big Data?
Big Data is a collection of data so large and complex that it becomes difficult for traditional database processing applications to store and process it.
At what point does data become big data?
Assumption: beyond a certain size, data is big data; below it, data is small data. But this is not the case.
Consider three parties:
• Company A has 1 TB of data and asks: "Hey, I want to process my 1 TB of data."
• Client B can process up to 500 GB and replies: "Sorry, I cannot process your data, because my system's data-processing capability is only up to 500 GB."
• Client C can process up to 2 TB and replies: "I will process your data, as my system's processing capability is up to 2 TB (terabytes)."
For Client B, which cannot handle processing requests for data larger than 500 GB, Company A's 1 TB is big data. For Client C, it is not.
Big Data can start anywhere: it depends on the capability of the organization.
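The point of this scenario is that "big" is relative. The Python sketch below uses a hypothetical `is_big_data` helper and treats each client's limit as a single capacity number, although real systems are limited by more than size alone. With those assumptions, the same 1 TB gets a different verdict from each client:

```python
# Hypothetical helper: whether data counts as "big" depends on who processes it.
GB = 1
TB = 1000 * GB  # decimal units assumed: 1 TB = 1000 GB


def is_big_data(data_size_gb: float, capacity_gb: float) -> bool:
    """True when the data exceeds what this organization's system can process."""
    return data_size_gb > capacity_gb


company_a_data = 1 * TB
capacities = {"Client B": 500 * GB, "Client C": 2 * TB}

for client, capacity in capacities.items():
    verdict = "big data" if is_big_data(company_a_data, capacity) else "manageable"
    print(f"{client}: {verdict}")
# Client B: big data
# Client C: manageable
```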
Classification of Big Data
Big Data is characterized by the 5 V's, which help us determine which data will be difficult to process and which will not. The 5 V's are:
• Volume
• Variety
• Velocity
• Veracity
• Value
Let us understand them one by one.
Volume
Volume refers to the amount of data.
Let us understand this with a simple scenario. Suppose a social media platform, say Facebook, has 5 million users. These users exchange pictures, share videos, and send or post messages, generating terabytes or even petabytes of data.
Over time, the number of users is expected to increase, so the amount of data they generate will become very large.
Large amounts of data result in the creation of large files.
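The Python sketch below gives a back-of-the-envelope estimate of how a user count turns into volume. Two assumptions are not from the slides: each user generates a hypothetical 2 MB of new content per day, and decimal units are used (1 TB = 1000 GB).

```python
def daily_volume_bytes(users: int, bytes_per_user_per_day: int) -> int:
    """Total bytes generated per day, assuming every user uploads the same amount."""
    return users * bytes_per_user_per_day


def human_readable(num_bytes: int) -> str:
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    value = float(num_bytes)
    for unit in ["B", "KB", "MB", "GB", "TB", "PB"]:
        if value < 1000 or unit == "PB":
            return f"{value:.1f} {unit}"
        value /= 1000
    raise AssertionError("unreachable")


USERS = 5_000_000      # the 5 million users from the scenario
PER_USER = 2_000_000   # hypothetical: 2 MB of photos, videos and messages per user per day

daily = daily_volume_bytes(USERS, PER_USER)
print(human_readable(daily))        # 10.0 TB
print(human_readable(daily * 365))  # yearly total, in the petabyte range
```

Even this modest per-user figure produces terabytes per day. Growth in the number of users scales the total linearly.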
Variety
Variety refers to the different types of data being generated from various sources.
Data can be:
• Structured: data organized into a fixed format of records and fields. It is easy to query and analyse, e.g. tables.
• Semi-structured: data that is not stored in a rigid repository such as an RDBMS table. Instead, it carries tags or metadata that describe its elements, e.g. XML documents and log files.
• Unstructured: data that is not organized into any predefined format, which makes it harder to query and analyse directly, e.g. photos and videos.
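The three kinds of data can be contrasted in a short Python sketch that uses only the standard library. The sample table, XML document and bytes are made-up illustrations:
• The CSV rows can be queried directly by column name.
• The XML elements describe themselves, but they do not all share the same fields.
• The raw bytes carry no schema at all.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same named columns, so querying is direct.
table = io.StringIO("user_id,name,age\n1,Asha,29\n2,Ben,34\n")
rows = list(csv.DictReader(table))
over_30 = [row["name"] for row in rows if int(row["age"]) >= 30]

# Semi-structured: tags describe each value, but elements can differ in shape
# (the second post has a <tag> element, the first does not).
xml_doc = """<posts>
  <post id="1"><text>Hello</text></post>
  <post id="2"><text>Holiday photo</text><tag>travel</tag></post>
</posts>"""
root = ET.fromstring(xml_doc)
tags_by_post = {
    post.get("id"): [tag.text for tag in post.findall("tag")]
    for post in root.findall("post")
}

# Unstructured: raw bytes (here, the first bytes of a JPEG file) with no
# fields to query; extracting meaning would need image-processing tools.
photo = b"\xff\xd8\xff\xe0"

print(over_30)        # ['Ben']
print(tags_by_post)   # {'1': [], '2': ['travel']}
print(len(photo), "bytes, no schema")
```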
Velocity
Velocity refers to the speed at which data is generated and has to be processed.
It can be measured, for example, as the number of users (or events) per unit of time.
More users ultimately generate a larger amount of data, which affects the speed at which the data must be processed.
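One simple way to measure velocity is to count events in a sliding time window. The Python class below is a hypothetical helper, not part of any Big Data framework. It assumes integer timestamps in seconds that arrive in increasing order.

```python
from collections import deque


class VelocityMeter:
    """Counts events seen in the last `window` seconds (hypothetical helper)."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.timestamps: deque = deque()

    def record(self, t: int) -> None:
        """Register one event (e.g. a post or a login) at time t."""
        self.timestamps.append(t)
        self._evict(t)

    def events_per_second(self, now: int) -> float:
        """Average arrival rate over the window ending at `now`."""
        self._evict(now)
        return len(self.timestamps) / self.window

    def _evict(self, now: int) -> None:
        # Drop events outside the window: only times in (now - window, now] count.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


meter = VelocityMeter(window=10)
for t in [0, 1, 2, 5, 12]:
    meter.record(t)
print(meter.events_per_second(now=12))  # 0.2: only the events at 5 and 12 are in the window
```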
Veracity
Veracity refers to the uncertainty, or trustworthiness, of data.
Data collected from various sources will contain many inconsistencies and uncertainties. So, when we extract useful information from such a large amount of data and discard the rest, some data is bound to be lost in the process.
What we have to do is fill in the gaps, then mine and process the data again to achieve the desired goals.
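Filling in the gaps can take many forms. The Python function below is one minimal sketch, based on readings from a hypothetical temperature sensor:
• Missing values arrive as None.
• Implausible values (here, anything outside -50 to 60 °C) are treated as errors.
• Each gap is forward-filled with the last trusted reading.
Forward fill is just one simple choice. Interpolation or re-collecting the data may suit other cases better.

```python
from typing import List, Optional

Reading = Optional[float]


def clean_and_fill(readings: List[Reading], low: float, high: float) -> List[Reading]:
    """Treat out-of-range values as missing, then forward-fill the gaps.

    Gaps before the first trusted reading stay None, since there is
    nothing earlier to copy from.
    """
    filled: List[Reading] = []
    last_good: Reading = None
    for value in readings:
        if value is not None and low <= value <= high:
            last_good = value
            filled.append(value)
        else:
            filled.append(last_good)
    return filled


raw = [None, 21.0, None, -999.0, 23.5]   # -999.0: a hypothetical sensor error code
print(clean_and_fill(raw, low=-50.0, high=60.0))
# [None, 21.0, 21.0, 21.0, 23.5]
```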
Value
Value refers to the meaningful information contained in data.
As the amount of data increases over time, a bigger problem arises: how to extract useful data from such a large amount of data.
First, we have to extract meaningful data from the collection of data, and then perform analytics on the extracted data.
The result obtained from the analysis should be of some value.
Extracting value from a large amount of data is a challenge in itself.
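The two steps, extraction and then analytics, can be sketched in Python on a handful of made-up social media posts. Treating hashtags as the meaningful part of each post and counting them is only an illustrative assumption. Real pipelines apply the same extract-then-analyse pattern at far larger scale.

```python
import re
from collections import Counter
from typing import List, Tuple


def top_hashtags(posts: List[str], k: int = 2) -> List[Tuple[str, int]]:
    # Step 1 (extract): keep only the meaningful part of each post, its hashtags.
    tags = [tag.lower() for post in posts for tag in re.findall(r"#(\w+)", post)]
    # Step 2 (analyse): count how often each hashtag appears and keep the top k.
    return Counter(tags).most_common(k)


posts = [
    "Great match today #football #sports",
    "New phone! #tech",
    "Goal!! #Football",
    "#tech review coming",
]
print(top_hashtags(posts))  # football and tech, with 2 mentions each
```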
Sources of Big Data
Some of the sources of Big Data are:
• Users
• Systems
• Applications and sensors
• Social media
• Small-, mid- and large-scale industries, and so on
These sources generate ever larger amounts of data, at varying speeds and in varying formats. Together, these factors create challenges for traditional database systems, which is what gave rise to the term ‘BIG DATA’.
Problems with Big Data
• Storing huge datasets that grow exponentially
• Processing data with complex structures, i.e. data that can be structured, semi-structured or unstructured
• Speed of data processing
In other words, the big data problem arises from three prime factors: VOLUME, VARIETY and VELOCITY.
Solution?
APACHE HADOOP


Video 1 big data
