Archana R
Assistant Professor,
Department of Computer Science,
SACWC.
I am here because I love to give
presentations.
BIG DATA AND ANALYTICS?
• Big data analytics is the use of advanced analytic techniques against very large, diverse
data sets that include structured, semi-structured and unstructured data, from different
sources, and in different sizes from terabytes to zettabytes.
• Big data analytics refers to the method of analysing huge volumes of data, or big data. ...
The major aim of Big Data Analytics is to discover new patterns and relationships which
might be invisible, and it can provide new insights about the users who created it.
BIG DATA ANALYTICS EXAMPLE
• Big data analytics helps businesses to get insights from today's huge data resources.
People, organizations, and machines now produce massive amounts of data. Social
media, cloud applications, and machine sensor data are just some examples.
Why is big data analytics important?
• Big data analytics helps organizations harness their data and use it to identify new
opportunities. That, in turn, leads to smarter business moves, more efficient operations,
higher profits and happier customers.
BIG DATA ANALYTICS TOOLS
• Hadoop - helps in storing and analysing data.
• MongoDB - used on datasets that change frequently.
• Talend - used for data integration and management.
• Cassandra - a distributed database used to handle large volumes of data.
• Spark - used for real-time processing and analysing large amounts of data.
WHAT ARE THE CONCEPTS OF BIG DATA?
• Big data was originally associated with three key concepts: volume, variety, and
velocity.
• Analysing big data presents challenges in sampling, which is why earlier approaches
allowed only for observations and sampling.
WHAT ARE THE THREE TYPES OF BIG DATA?
• Big data is classified in three ways:
• Structured Data.
• Unstructured Data.
• Semi-Structured Data.
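The three categories above can be illustrated with a minimal Python sketch. The sample records, field names, and the helper `field_names` are hypothetical and chosen only for illustration: a CSV table stands in for structured data, JSON records with varying keys for semi-structured data, and a free-text sentence for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured = io.StringIO("id,name,age\n1,Asha,30\n2,Ravi,25\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but the fields may vary per record.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "x@example.com"}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Customer called to say the delivery arrived late but the product works well."


def field_names(records):
    """Return the sorted set of keys seen across all records (hypothetical helper)."""
    return sorted({key for record in records for key in record})


structured_fields = field_names(rows)       # same three columns on every row
semi_fields = field_names(semi_structured)  # union of keys; no record has all of them
word_count = len(unstructured.split())      # text has to be parsed before any fields exist
```

The structured rows all share one schema, the semi-structured records only share the `id` key, and the unstructured text exposes no fields at all until it is processed.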
DIFFERENCE BETWEEN DATA AND BIG DATA?
• Any definition is a bit circular, as “Big” data is still data of course. Data is a set of qualitative
or quantitative variables – it can be structured or unstructured, machine readable or not, digital
or analogue, personal or not. ... Hence, BIG DATA, is not just “more” data.
• What is the size of big data?
• The term Big Data implies a large amount of information (terabytes and petabytes). It is
important to understand that, for a particular business case, the value usually lies not in
the entire volume but only in a small part of it. However, this valuable component
cannot be identified in advance without analysis.
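To give a sense of the scales mentioned in these slides (terabytes, petabytes, zettabytes), here is a small sketch of the arithmetic. It assumes decimal (SI) units, where each step is a factor of 1000; the function name `convert` is hypothetical.

```python
# Decimal (SI) byte units: each step up is a factor of 1000.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value, from_unit, to_unit):
    """Convert a data size between two SI byte units."""
    return value * UNITS[from_unit] / UNITS[to_unit]


terabytes_per_petabyte = convert(1, "PB", "TB")
terabytes_per_zettabyte = convert(1, "ZB", "TB")
```

Under this assumption, one petabyte is a thousand terabytes and one zettabyte is a billion terabytes.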
HOW HADOOP WORKS
• Hadoop makes it easier to use all the storage and processing capacity in cluster servers,
and to execute distributed processes against huge amounts of data.
• Applications that collect data in various formats can place data into the Hadoop cluster
by using an API operation to connect to the NameNode.
• To run a job to query the data, provide a MapReduce job made up of many map and
reduce tasks that run against the data in HDFS spread across the DataNodes.
• Map tasks run on each node against the input files supplied, and reducers run to
aggregate and organize the final output.
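The map and reduce flow described above can be sketched in plain Python. This is a single-process illustration of the programming model, not Hadoop's API: the function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) and the word-count task are assumptions made for the example, and each inner list stands in for an input split that a real cluster would process on a separate node.

```python
from collections import defaultdict


def map_task(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: aggregate all values for one key into a final count."""
    return key, sum(values)


def run_job(input_splits):
    # Each split stands in for a block of input handled by one map task.
    mapped = [
        pair
        for split in input_splits
        for line in split
        for pair in map_task(line)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


splits = [["big data big insights"], ["data nodes store data"]]
counts = run_job(splits)
```

Here `counts` maps each word to its total across both splits. On a real cluster the map tasks would run in parallel next to the data, and the reducers would receive their keys over the network.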
• Spark – An open source, distributed processing system commonly used for big data
workloads. Apache Spark uses in-memory caching and optimized execution for fast
performance, and it supports general batch processing, streaming analytics, machine learning,
graph databases, and ad hoc queries.
• Presto – An open source, distributed SQL query engine optimized for low-latency, ad-hoc
analysis of data. It supports the ANSI SQL standard, including complex queries, aggregations,
joins, and window functions. Presto can process data from multiple data sources including the
Hadoop Distributed File System (HDFS) and Amazon S3.
• Hive – Allows users to leverage Hadoop MapReduce using a SQL interface, enabling analytics
at a massive scale, in addition to distributed and fault-tolerant data warehousing.
• HBase – An open source, non-relational, versioned database that runs on top of Amazon S3
(using EMRFS) or the Hadoop Distributed File System (HDFS). HBase is a massively
scalable, distributed big data store built for random, strictly consistent, real-time access for
tables with billions of rows and millions of columns.
• Zeppelin – An interactive notebook that enables interactive data exploration.
RUNNING HADOOP ON AWS
• Amazon EMR is a managed service that lets you process and analyze large datasets using the
latest versions of big data processing frameworks such as Apache Hadoop, Spark, HBase, and
Presto on fully customizable clusters.
• Easy to use: You can launch an Amazon EMR cluster in minutes. You don’t need to worry
about node provisioning, cluster setup, Hadoop configuration, or cluster tuning.
• Low cost: Amazon EMR pricing is simple and predictable: you pay an hourly rate for every
instance hour you use, and you can leverage Spot Instances for greater savings.
• Elastic: With Amazon EMR, you can provision one, hundreds, or thousands of compute
instances to process data at any scale.
• Transient: You can use EMRFS to run clusters on demand based on HDFS data stored
persistently in Amazon S3. As jobs finish, you can shut down a cluster and have the data saved
in Amazon S3. You pay only for the compute time that the cluster is running.
• Secure: Amazon EMR uses all common security characteristics of AWS services:
• Identity and Access Management (IAM) roles and policies to manage permissions.
• Encryption in-transit and at-rest to help you protect your data and meet compliance
standards, such as HIPAA.
• Security groups to control inbound and outbound network traffic to your cluster nodes.
HADOOP ECOSYSTEM
• The term Hadoop is a general term that may refer to any of the following: The overall Hadoop
Ecosystem, which encompasses both the core modules and related sub-modules.
• The core Hadoop modules, including Hadoop Distributed File System (HDFS™), Yet Another
Resource Negotiator (YARN), MapReduce, and Hadoop Common (discussed below). These are
the basic building blocks of a typical Hadoop deployment.
• Hadoop-related sub-modules, including: Apache Hive™, Apache Impala™,
Apache Pig™, and Apache Zookeeper™, among others. These related pieces of software can
be used to customize, improve upon, or extend the functionality of core Hadoop.
HADOOP MODULES
• HDFS — Hadoop Distributed File System. HDFS is a Java-based system that allows large
data sets to be stored across nodes in a cluster in a fault-tolerant manner.
• YARN — Yet Another Resource Negotiator. YARN is used for cluster resource management,
planning tasks, and scheduling jobs that are running on Hadoop.
• MapReduce — MapReduce is both a programming model and a big data processing engine
used for the parallel processing of large data sets. Originally, MapReduce was the only
execution engine available in Hadoop, but later on, Hadoop added support for others,
including Apache Tez™ and Apache Spark™.
• Hadoop Common — Hadoop Common provides a set of services across libraries and utilities
to support the other Hadoop modules.
BENEFITS OF HADOOP
• Scalability — Unlike traditional systems that limit data storage, Hadoop is scalable as it
operates in a distributed environment. This allowed data architects to build early data lakes on
Hadoop.
• Resilience — The Hadoop Distributed File System (HDFS) is fundamentally resilient. Data
stored on any node of a Hadoop cluster is also replicated on other nodes of the cluster to
prepare for the possibility of hardware or software failures. This intentionally redundant
design ensures fault tolerance. If one node goes down, there is always a backup of the data
available in the cluster.
• Flexibility — Unlike traditional relational database management systems, when working with
Hadoop, you can store data in any format, including semi-structured or unstructured formats.
Hadoop enables businesses to easily access new data sources and tap into different types of
data.
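The resilience point above (each block is replicated on several nodes) can be illustrated with a toy simulation. It assumes a replication factor of 3, which is the HDFS default, and a simple round-robin placement. Real HDFS placement is rack-aware and more involved, so the functions `place_replicas` and `readable_blocks` and the node and block names are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return [block for block, holders in placement.items() if set(holders) - failed]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_a", "blk_b", "blk_c", "blk_d"], nodes)

# One node down: every block still has two live replicas.
survivors_one_failure = readable_blocks(placement, ["node2"])

# Three nodes down: a block whose replicas all sat on those nodes is lost.
survivors_three_failures = readable_blocks(placement, ["node1", "node2", "node3"])
```

In this sketch a single node failure leaves every block readable, while losing three of the four nodes makes `blk_a` unavailable. The fault tolerance therefore depends on the replication factor relative to the number of simultaneous failures.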
Fundamentals of big data analytics and Hadoop

Mais conteúdo relacionado

Mais procurados

Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview pptVIKAS KATARE
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big AnalyticsAjay Ohri
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Introduction of big data unit 1
Introduction of big data unit 1Introduction of big data unit 1
Introduction of big data unit 1RojaT4
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKristof Jozsa
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampSpotle.ai
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBhavya Gulati
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challengesfazail amin
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop siliconsudipt
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleSpringPeople
 
Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...
Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...
Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...yashbheda
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataJoey Li
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data AnalyticsS P Sajjan
 

Mais procurados (20)

Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Big Data
Big DataBig Data
Big Data
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big Analytics
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Introduction of big data unit 1
Introduction of big data unit 1Introduction of big data unit 1
Introduction of big data unit 1
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edge
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challenges
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeople
 
Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...
Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...
Big data (4Vs,history,concept,algorithm) analysis and applications #bigdata #...
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 

Semelhante a Fundamentals of big data analytics and Hadoop

Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdataTom Rogers
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeSysfore Technologies
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataDebajani Mohanty
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With HadoopUmair Shafique
 
1.demystifying big data & hadoop
1.demystifying big data & hadoop1.demystifying big data & hadoop
1.demystifying big data & hadoopdatabloginfo
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Google Data Engineering.pdf
Google Data Engineering.pdfGoogle Data Engineering.pdf
Google Data Engineering.pdfavenkatram
 
Data Engineering on GCP
Data Engineering on GCPData Engineering on GCP
Data Engineering on GCPBlibBlobb
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvewKunal Khanna
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 
Module-2_HADOOP.pptx
Module-2_HADOOP.pptxModule-2_HADOOP.pptx
Module-2_HADOOP.pptxShreyasKv13
 
BIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxBIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxVishalBH1
 

Semelhante a Fundamentals of big data analytics and Hadoop (20)

Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Hadoop
HadoopHadoop
Hadoop
 
finap ppt conference.pptx
finap ppt conference.pptxfinap ppt conference.pptx
finap ppt conference.pptx
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With Hadoop
 
1.demystifying big data & hadoop
1.demystifying big data & hadoop1.demystifying big data & hadoop
1.demystifying big data & hadoop
 
paper
paperpaper
paper
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Google Data Engineering.pdf
Google Data Engineering.pdfGoogle Data Engineering.pdf
Google Data Engineering.pdf
 
Data Engineering on GCP
Data Engineering on GCPData Engineering on GCP
Data Engineering on GCP
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
 
Module-2_HADOOP.pptx
Module-2_HADOOP.pptxModule-2_HADOOP.pptx
Module-2_HADOOP.pptx
 
BIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxBIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptx
 

Mais de Archana Gopinath

Data Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptxData Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptxArchana Gopinath
 
DP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptxDP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptxArchana Gopinath
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical AnalyzerArchana Gopinath
 
Implementation of lexical analyser
Implementation of lexical analyserImplementation of lexical analyser
Implementation of lexical analyserArchana Gopinath
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzersArchana Gopinath
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical AnalyzerArchana Gopinath
 
minimization the number of states of DFA
minimization the number of states of DFAminimization the number of states of DFA
minimization the number of states of DFAArchana Gopinath
 
Regular Expression to Finite Automata
Regular Expression to Finite AutomataRegular Expression to Finite Automata
Regular Expression to Finite AutomataArchana Gopinath
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSArchana Gopinath
 
Programming with R in Big Data Analytics
Programming with R in Big Data AnalyticsProgramming with R in Big Data Analytics
Programming with R in Big Data AnalyticsArchana Gopinath
 
If statements in c programming
If statements in c programmingIf statements in c programming
If statements in c programmingArchana Gopinath
 
Guided media Transmission Media
Guided media Transmission MediaGuided media Transmission Media
Guided media Transmission MediaArchana Gopinath
 

Mais de Archana Gopinath (18)

Data Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptxData Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptx
 
DP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptxDP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptx
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical Analyzer
 
Implementation of lexical analyser
Implementation of lexical analyserImplementation of lexical analyser
Implementation of lexical analyser
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzers
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical Analyzer
 
minimization the number of states of DFA
minimization the number of states of DFAminimization the number of states of DFA
minimization the number of states of DFA
 
Regular Expression to Finite Automata
Regular Expression to Finite AutomataRegular Expression to Finite Automata
Regular Expression to Finite Automata
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICS
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Hadoop
HadoopHadoop
Hadoop
 
Programming with R in Big Data Analytics
Programming with R in Big Data AnalyticsProgramming with R in Big Data Analytics
Programming with R in Big Data Analytics
 
If statements in c programming
If statements in c programmingIf statements in c programming
If statements in c programming
 
un Guided media
un Guided mediaun Guided media
un Guided media
 
Guided media Transmission Media
Guided media Transmission MediaGuided media Transmission Media
Guided media Transmission Media
 
Main Memory RAM and ROM
Main Memory RAM and ROMMain Memory RAM and ROM
Main Memory RAM and ROM
 
Java thread life cycle
Java thread life cycleJava thread life cycle
Java thread life cycle
 
PCSTt11 overview of java
PCSTt11 overview of javaPCSTt11 overview of java
PCSTt11 overview of java
 

Último

Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 

Último (20)

Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 

Fundamentals of big data analytics and Hadoop

  • 1. Myself Archana R Assistant Professor In Department Of Computer Science SACWC. I am here because I love to give presentations.
  • 2. BIG DATA AND ANALYTICS? • Big data analytics is the use of advanced analytic techniques against very large, diverse data sets that include structured, semi-structured and unstructured data, from different sources, and in sizes ranging from terabytes to zettabytes. • Big data analytics refers to the method of analysing huge volumes of data, or big data. ... The major aim of big data analytics is to discover new patterns and relationships that might otherwise be invisible, and it can provide new insights about the users who created the data.
  • 3. BIG DATA ANALYTICS EXAMPLE • Big data analytics helps businesses to get insights from today's huge data resources. People, organizations, and machines now produce massive amounts of data. Social media, cloud applications, and machine sensor data are just some examples. Why is big data analytics important? • Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers.
  • 4. BIG DATA ANALYTICS TOOLS • Hadoop - helps in storing and analysing data. • MongoDB - used on datasets that change frequently. • Talend - used for data integration and management. • Cassandra - a distributed database used to handle large volumes of data. • Spark - used for real-time processing and analysing large amounts of data.
  • 5. WHAT ARE THE CONCEPTS OF BIG DATA? • Big data was originally associated with three key concepts: volume, variety, and velocity. • The analysis of big data presents challenges in sampling, which previously limited analysis to observations and samples.
  • 6. WHAT ARE THE THREE TYPES OF BIG DATA? • Big data is classified in three ways: • Structured Data. • Unstructured Data. • Semi-Structured Data.
  • 7. DIFFERENCE BETWEEN DATA AND BIG DATA? • Any definition is a bit circular, as “Big” data is still data of course. Data is a set of qualitative or quantitative variables – it can be structured or unstructured, machine readable or not, digital or analogue, personal or not. ... Hence, BIG DATA is not just “more” data. • What is the size of big data? • The term Big Data implies a large amount of information (terabytes and petabytes). It is important to understand that, for a particular business case, the value usually lies not in the entire volume but only in a small part of it. However, this valuable part cannot be identified in advance without analysis.
  • 8. HOW HADOOP WORKS • Hadoop makes it easier to use all the storage and processing capacity in cluster servers, and to execute distributed processes against huge amounts of data. • Applications that collect data in various formats can place data into the Hadoop cluster by using an API operation to connect to the NameNode. • To run a job to query the data, provide a MapReduce job made up of many map and reduce tasks that run against the data in HDFS spread across the DataNodes. • Map tasks run on each node against the input files supplied, and reducers run to aggregate and organize the final output.
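The map and reduce steps described in slide 8 can be sketched in plain Python. This is only an illustration under stated assumptions, not Hadoop's API: the function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) and the two sample input splits are hypothetical, and everything runs in a single process rather than across DataNodes.

```python
from collections import defaultdict
from itertools import chain


def map_task(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: aggregate all counts emitted for one word."""
    return key, sum(values)


def run_job(splits):
    """Run every map task, shuffle the pairs, then reduce each key."""
    mapped = chain.from_iterable(map_task(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_task(k, v) for k, v in sorted(grouped.items()))


# Hypothetical input: each string plays the role of one file split in HDFS.
splits = ["big data big insights", "data nodes store data"]
counts = run_job(splits)
print(counts)
```

In a real cluster the map tasks would run in parallel on the nodes holding each split, and the shuffle would move data over the network; the sketch keeps only the data flow.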
  • 9. • Spark – An open source, distributed processing system commonly used for big data workloads. Apache Spark uses in-memory caching and optimized execution for fast performance, and it supports general batch processing, streaming analytics, machine learning, graph processing, and ad hoc queries. • Presto – An open source, distributed SQL query engine optimized for low-latency, ad hoc analysis of data. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions. Presto can process data from multiple data sources including the Hadoop Distributed File System (HDFS) and Amazon S3. • Hive – Allows users to leverage Hadoop MapReduce using a SQL interface, enabling analytics at a massive scale, in addition to distributed and fault-tolerant data warehousing. • HBase – An open source, non-relational, versioned database that runs on top of Amazon S3 (using EMRFS) or the Hadoop Distributed File System (HDFS). HBase is a massively scalable, distributed big data store built for random, strictly consistent, real-time access for tables with billions of rows and millions of columns. • Zeppelin – An interactive notebook that enables interactive data exploration.
  • 10. RUNNING HADOOP ON AWS • Amazon EMR is a managed service that lets you process and analyze large datasets using the latest versions of big data processing frameworks such as Apache Hadoop, Spark, HBase, and Presto on fully customizable clusters. • Easy to use: You can launch an Amazon EMR cluster in minutes. You don’t need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. • Low cost: Amazon EMR pricing is simple and predictable: you pay an hourly rate for every instance hour you use, and you can leverage Spot Instances for greater savings.
  • 11. • Elastic: With Amazon EMR, you can provision one, hundreds, or thousands of compute instances to process data at any scale. • Transient: You can use EMRFS to run clusters on demand based on HDFS data stored persistently in Amazon S3. As jobs finish, you can shut down a cluster and have the data saved in Amazon S3. You pay only for the compute time that the cluster is running. • Secure: Amazon EMR uses all common security characteristics of AWS services: • Identity and Access Management (IAM) roles and policies to manage permissions. • Encryption in transit and at rest to help you protect your data and meet compliance standards, such as HIPAA. • Security groups to control inbound and outbound network traffic to your cluster nodes.
  • 12. HADOOP ECOSYSTEM • The term Hadoop is a general term that may refer to any of the following: • The overall Hadoop ecosystem, which encompasses both the core modules and related sub-modules. • The core Hadoop modules, including Hadoop Distributed File System (HDFS™), Yet Another Resource Negotiator (YARN), MapReduce, and Hadoop Common (discussed below). These are the basic building blocks of a typical Hadoop deployment. • Hadoop-related sub-modules, including Apache Hive™, Apache Impala™, Apache Pig™, and Apache ZooKeeper™, among others. These related pieces of software can be used to customize, improve upon, or extend the functionality of core Hadoop.
  • 13. HADOOP MODULES • HDFS - Hadoop Distributed File System. HDFS is a Java-based system that allows large data sets to be stored across nodes in a cluster in a fault-tolerant manner. • YARN - Yet Another Resource Negotiator. YARN is used for cluster resource management, planning tasks, and scheduling jobs that are running on Hadoop. • MapReduce - MapReduce is both a programming model and a big data processing engine used for the parallel processing of large data sets. Originally, MapReduce was the only execution engine available in Hadoop, but later on, Hadoop added support for others, including Apache Tez™ and Apache Spark™. • Hadoop Common - Hadoop Common provides a set of services across libraries and utilities to support the other Hadoop modules.
  • 14. BENEFITS OF HADOOP • Scalability - Unlike traditional systems that limit data storage, Hadoop is scalable as it operates in a distributed environment. This allowed data architects to build early data lakes on Hadoop. • Resilience - The Hadoop Distributed File System (HDFS) is fundamentally resilient. Data stored on any node of a Hadoop cluster is also replicated on other nodes of the cluster to prepare for the possibility of hardware or software failures. This intentionally redundant design provides fault tolerance: if one node goes down, a copy of the data is still available elsewhere in the cluster. • Flexibility - Unlike traditional relational database management systems, when working with Hadoop, you can store data in any format, including semi-structured or unstructured formats. Hadoop enables businesses to easily access new data sources and tap into different types of data.
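The resilience point in slide 14 can be illustrated with a small simulation. This is a rough sketch, not how HDFS actually chooses replica locations: the round-robin placement, the node names, and the helpers `place_blocks` and `available_blocks` are all assumptions made for illustration (real HDFS placement is rack-aware). The replication factor of 3 matches the usual HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for HDFS block placement (an assumption).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def available_blocks(placement, failed):
    """Return the blocks that still have a replica on a healthy node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    }


# Hypothetical four-node cluster storing a file split into six blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(6, nodes, replication=3)

# Losing a single node leaves every block readable from another replica.
survivors = available_blocks(placement, failed={"node-b"})
print(sorted(survivors))
```

With three replicas per block, a block becomes unavailable only when all three nodes holding it fail at once, which is why a single node failure does not cause data loss in this sketch.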