CS8091 / Big Data Analytics
III Year / VI Semester
Objectives
• To study basic Big Data and analytics concepts, HDFS, and MapReduce.
• To learn the fundamentals of Clustering and Classification.
• To understand the fundamental concepts of Association and the different types of Recommendation Systems.
• To learn about stream computing and study various case studies.
• To gain introductory knowledge of NoSQL Data Management and Visualization.
Unit I - INTRODUCTION TO BIG DATA
Evolution of Big Data - Best Practices for Big Data Analytics - Big Data Characteristics - Validating - The Promotion of the Value of Big Data - Big Data Use Cases - Characteristics of Big Data Applications - Perception and Quantification of Value - Understanding Big Data Storage - A General Overview of High-Performance Architecture - HDFS - MapReduce and YARN - MapReduce Programming Model.
DATA
The quantities, characters, or symbols on which
operations are performed by a computer, which may be stored
and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
BIG DATA
Big Data is still data, but at a very large scale: the term describes a collection of data that is huge in size and keeps growing exponentially over time.
“Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it.

Units of Memory:
• Byte (B)
• Kilobyte (KB)
• Megabyte (MB)
• Gigabyte (GB)
• Terabyte (TB)
• Petabyte (PB)
• Exabyte (EB)
• Zettabyte (ZB)
• Yottabyte (YB)
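To make the jump between these units concrete, the sketch below converts a raw byte count into the largest unit that keeps the value at least 1. It assumes decimal (SI) prefixes by default, where each unit is 1,000 times the previous one; passing `base=1024` switches to binary multiples, which many operating systems use. The function name `human_readable` is hypothetical, not part of any course material.

```python
# Unit names from the list above; each step is 1,000x (SI) or 1,024x (binary).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int, base: int = 1000) -> str:
    """Express a byte count in the largest unit whose value is still >= 1.

    base=1000 uses decimal (SI) prefixes; base=1024 uses binary multiples,
    which strictly should be labelled KiB, MiB, ... (kept as KB, MB here).
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    exponent = 0
    # Integer comparisons avoid floating-point drift at large magnitudes.
    while exponent < len(UNITS) - 1 and num_bytes >= base ** (exponent + 1):
        exponent += 1
    return f"{num_bytes / base ** exponent:.2f} {UNITS[exponent]}"


if __name__ == "__main__":
    for power in range(0, 27, 3):
        print(f"10^{power} bytes = {human_readable(10 ** power)}")
    print(human_readable(1536, base=1024))  # 1.50 KB (binary multiple)
```

Each printed line moves up exactly one unit, which is a quick way to see that a yottabyte is 10^24 bytes under SI prefixes.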
BIG DATA - Sources
Primary sources of Big Data:
• Social data: likes, tweets and retweets, comments, video uploads, and general media
• Machine data: industrial equipment, sensors installed in machinery, web logs that track user behavior, and sensors such as medical devices, smart meters, road cameras, satellites, and games
• Transactional data: invoices, payment orders, storage records, and delivery receipts
BIG DATA – Data Structures
Structured data:
• Data containing a defined data type, format, and structure, such as tables in a relational database or CSV files.
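A minimal sketch of what "defined data type, format, and structure" means in practice: every record shares the same named columns, and each column converts to a known type. The `orders` columns and the `SCHEMA` mapping below are hypothetical, chosen only for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical schema: every record has the same named fields and types.
SCHEMA = {"order_id": int, "order_date": date.fromisoformat, "amount": float}


def load_structured(text: str) -> list:
    """Parse CSV text, converting each column with the type defined in SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    # Structured data must match the declared layout exactly.
    if reader.fieldnames != list(SCHEMA):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [{col: convert(row[col]) for col, convert in SCHEMA.items()} for row in reader]


sample = """order_id,order_date,amount
1001,2023-01-15,250.00
1002,2023-01-16,99.50
"""
rows = load_structured(sample)
print(rows[0])
```

Because the layout is fixed in advance, a file with different columns is rejected rather than guessed at.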
Semi-structured data:
• Information that does not reside in a relational database but has some organizational properties, such as tags or markers, that make it easier to analyze.
• Example: XML data
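As a sketch of why XML counts as semi-structured, the snippet below reads a small feed in which tags label every field, yet the two records do not share the same set of fields. The `<customers>` feed and the `parse_customers` helper are hypothetical; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML feed: tags label each field, but records need not share the same fields.
FEED = """
<customers>
  <customer id="c1"><name>Asha</name><email>asha@example.com</email></customer>
  <customer id="c2"><name>Ravi</name><phone>555-0100</phone></customer>
</customers>
"""


def parse_customers(xml_text: str) -> list:
    """Turn each <customer> element into a dict of whatever child tags it has."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for elem in root.findall("customer"):
        record = {"id": elem.get("id")}
        # Tags act as self-describing markers, so no fixed table schema is needed.
        record.update({child.tag: child.text for child in elem})
        records.append(record)
    return records


for rec in parse_customers(FEED):
    print(rec)
```

The first record ends up with an `email` field and the second with a `phone` field, something a fixed relational table would have to model with empty columns.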
Quasi-structured data:
• Textual data with erratic data formats that can be formatted with effort, software tools, and time. An example of quasi-structured data is the record of which webpages a user visited and in what order.
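"Formatted with effort and software tools" often means pattern matching over text logs. The sketch below assumes web-server log lines that loosely follow the Common Log Format; the regular expression and the sample lines are illustrative assumptions, and lines that do not fit are counted instead of stopping the parse.

```python
import re

# Assumed input: web-server log lines loosely following the Common Log Format.
# Real logs vary, so lines that do not match are counted rather than crashing the parse.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def parse_log(lines):
    """Return (parsed records, number of lines that could not be parsed)."""
    records, skipped = [], 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
        else:
            skipped += 1
    return records, skipped


sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:56:02 +0000] "GET /products/big-data HTTP/1.1" 200 5120',
    'malformed entry with no recognizable fields',
]
records, skipped = parse_log(sample)
for r in records:
    print(r["host"], r["time"], r["path"])
print("skipped:", skipped)
```

The extracted host, time, and path fields are exactly the "which pages, in what order" information described above.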
Unstructured data:
Data that has no inherent structure, which may
include text documents, PDFs, images, and video.
Example of quasi-structured data: a clickstream that can be parsed and mined by data scientists to discover usage patterns and uncover relationships among clicks and areas of interest on a website or group of sites.
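One simple way to mine a clickstream for usage patterns is to count how often users move directly from one page to another. This is a sketch under assumptions: the `(user_id, timestamp, page)` event tuples are hypothetical, and transition counting is just one of many possible mining techniques.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream events: (user_id, timestamp_in_seconds, page).
events = [
    ("u1", 1, "/home"), ("u1", 5, "/products"), ("u1", 9, "/cart"),
    ("u2", 2, "/home"), ("u2", 7, "/products"), ("u2", 12, "/support"),
    ("u3", 3, "/home"), ("u3", 8, "/products"), ("u3", 15, "/cart"),
]


def transition_counts(events):
    """Count how often users move directly from one page to another."""
    by_user = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))
    counts = Counter()
    for visits in by_user.values():
        pages = [page for _, page in sorted(visits)]  # order each user's clicks by time
        counts.update(zip(pages, pages[1:]))          # consecutive (from, to) pairs
    return counts


for (src, dst), n in transition_counts(events).most_common():
    print(f"{src} -> {dst}: {n}")
```

Here the most common path is `/home -> /products`, the kind of relationship among clicks the slide refers to.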
Types of Data Repositories, from an Analyst Perspective
Spreadsheets and data marts:
• Spreadsheets and low-volume databases for recordkeeping
• Analyst depends on data extracts
Data warehouses:
• Centralized data containers in a purpose-built space
• Supports BI and reporting, but restricts robust analyses
• Analyst dependent on IT and DBAs for data access and schema changes
• Analysts must spend significant time to get aggregated and disaggregated data extracts from multiple sources
Analytic sandbox (workspaces):
• Data assets gathered from multiple sources and technologies for analysis
• Enables flexible, high-performance analysis in a nonproduction environment; can leverage in-database processing
• Reduces costs and risks associated with data replication into “shadow” file systems
• “Analyst owned” rather than “DBA owned”
State of the Practice in Analytics
Business Driver: Examples
• Optimize business operations: sales, pricing, profitability, efficiency
• Identify business risk: customer churn, fraud, default
• Predict new business opportunities: upsell, cross-sell, best new customer prospects
• Comply with laws or regulatory requirements: Anti-Money Laundering, Fair Lending, Basel II-III, Sarbanes-Oxley (SOX)
BI Versus Data Science
BI systems make it easy to answer questions related to:
• Quarter-to-date revenue
• Progress toward quarterly targets
• How much of a given product was sold in a prior quarter or year
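BI questions like these are usually answered by aggregating historical records. The sketch below computes quarter-to-date revenue and progress toward a target, assuming calendar quarters, an in-memory list of `(date, product, revenue)` rows, and a made-up target; in practice the same aggregation would typically run as a query against the data warehouse.

```python
from datetime import date

# Hypothetical sales ledger: (sale_date, product, revenue).
sales = [
    (date(2023, 4, 3), "widget", 1200.0),
    (date(2023, 4, 18), "gadget", 800.0),
    (date(2023, 5, 2), "widget", 1500.0),
    (date(2023, 1, 20), "widget", 900.0),  # falls in the prior quarter
]


def quarter_start(d: date) -> date:
    """First day of the calendar quarter containing d."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def quarter_to_date_revenue(rows, as_of: date) -> float:
    """Sum revenue from the start of as_of's quarter up to and including as_of."""
    start = quarter_start(as_of)
    return sum(rev for day, _, rev in rows if start <= day <= as_of)


as_of = date(2023, 5, 15)
qtd = quarter_to_date_revenue(sales, as_of)
target = 5000.0  # hypothetical quarterly target
print(f"QTD revenue: {qtd:.2f} ({qtd / target:.0%} of target)")
```

The answer is a backward-looking summary of what already happened, which is the typical shape of a BI question.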
Data Science tends to use disaggregated data in a more forward-looking, exploratory way, focusing on analyzing the present and enabling informed decisions about the future.
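By contrast, a forward-looking question asks what is likely to happen next. The sketch below fits an ordinary least-squares trend line to hypothetical monthly unit sales and projects one month ahead. A straight-line trend is a deliberately simple stand-in chosen for illustration; real Data Science work would normally use richer models and validate them.

```python
# Hypothetical disaggregated monthly unit sales for one product: (month index, units).
history = [(1, 120), (2, 135), (3, 150), (4, 160), (5, 178)]


def fit_trend(points):
    """Ordinary least-squares fit of units = slope * month + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_trend(history)
next_month = 6
forecast = slope * next_month + intercept
print(f"trend: {slope:.1f} units/month; month {next_month} forecast about {forecast:.0f}")
```

With this sample the trend is about 14.1 extra units per month, giving a month-6 projection of roughly 191 units.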
BI problems tend to require highly structured data organized in rows and columns for accurate reporting, whereas Data Science projects tend to use many types of data sources, including large or unconventional datasets.
Current Analytical Architecture
Most organizations still have data warehouses that provide excellent support for traditional reporting and simple data analysis activities but have a harder time supporting more robust analyses.
• For data sources to be loaded into the data warehouse, data needs to be well understood, structured, and normalized with the appropriate data type definitions.
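The "well understood, structured, and normalized" requirement acts as a gate in front of the warehouse. As a minimal sketch, assuming a hypothetical table definition (`TABLE_SCHEMA`) and hypothetical raw records, the function below normalizes each record to the declared types and sets aside anything that fails, together with the reason.

```python
from datetime import date

# Hypothetical warehouse table definition: column name -> type converter / normalizer.
TABLE_SCHEMA = {"customer_id": int, "signup_date": date.fromisoformat, "region": str.upper}


def validate_for_load(raw_rows):
    """Split raw records into rows ready to load and rejected rows with the reason."""
    accepted, rejected = [], []
    for raw in raw_rows:
        try:
            missing = set(TABLE_SCHEMA) - set(raw)
            if missing:
                raise ValueError(f"missing columns {sorted(missing)}")
            accepted.append({col: conv(raw[col]) for col, conv in TABLE_SCHEMA.items()})
        except ValueError as err:
            rejected.append((raw, str(err)))
    return accepted, rejected


raw = [
    {"customer_id": "42", "signup_date": "2023-03-01", "region": "south"},
    {"customer_id": "x7", "signup_date": "2023-03-02", "region": "north"},  # bad integer
    {"customer_id": "43", "region": "east"},                                # missing date
]
ok, bad = validate_for_load(raw)
print(len(ok), "loaded;", len(bad), "rejected")
```

Every rejected record needs human follow-up before it can enter, which illustrates why new data sources reach the warehouse slowly.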
• Although this kind of centralization enables security, backup, and failover of highly critical data, it also means that data typically must go through significant preprocessing and checkpoints before it can enter this sort of controlled environment.
• As a result of this level of control on the enterprise data warehouse (EDW), additional local systems may emerge in the form of departmental warehouses and local data marts that business users create to accommodate their need for flexible analysis.
• Once in the data warehouse, data is read by additional applications across the enterprise for BI and reporting purposes. These are high-priority operational processes getting critical data feeds from the data warehouses and repositories.
• Analysts create data extracts from the EDW to analyze data offline in R or other local analytical tools.
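The extract step can be sketched as "run a query, save the result set as a flat file". Below, an in-memory SQLite database stands in for the EDW purely as an assumption for illustration; the `sales` table, its rows, and the `extract_to_csv` helper are hypothetical. The resulting CSV text is the kind of file an analyst would then load into R or another local tool.

```python
import csv
import io
import sqlite3

# Stand-in for the EDW: an in-memory SQLite database with a hypothetical sales table.
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
edw.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "2023Q1", 1200.0), ("south", "2023Q1", 950.0), ("north", "2023Q2", 1400.0)],
)


def extract_to_csv(conn, query, params=()):
    """Run a query and return its result set as CSV text for offline analysis."""
    cursor = conn.execute(query, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([col[0] for col in cursor.description])  # header from column names
    writer.writerows(cursor)
    return buffer.getvalue()


extract = extract_to_csv(
    edw, "SELECT region, revenue FROM sales WHERE quarter = ? ORDER BY region", ("2023Q1",)
)
print(extract)
```

Because the extract is a static copy, it goes stale as soon as the warehouse changes, which is one of the drawbacks of this workflow.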
• Because new data sources slowly accumulate in the EDW due to the rigorous validation and data structuring process, data is slow to move into the EDW, and the data schema is slow to change.
• Departmental data warehouses may have been originally designed for a specific purpose and set of business needs, but over time evolved to house more and more data, some of which may be forced into existing schemas to enable BI and the creation of OLAP cubes for analysis and reporting.
Drivers of Big Data
The data now comes from multiple sources, such as these:
• Medical information, such as genomic sequencing and diagnostic imaging
• Photos and video footage uploaded to the World Wide Web
• Video surveillance, such as the thousands of video cameras spread across a city
• Mobile devices, which provide geospatial location data of the users, as well as metadata about text messages, phone calls, and application usage on smartphones
• Smart devices, which provide sensor-based collection of information from smart electric grids, smart buildings, and many other public and industry infrastructures
• Nontraditional IT devices, including the use of radio-frequency identification (RFID) readers, GPS navigation systems, and seismic processing
Emerging Big Data Ecosystem and a New Approach to Analytics
• Data devices: data devices and the “Sensornet” gather data from multiple locations and continuously generate new data about this data. For example, a video game provider captures data about the skill and levels attained by the player.
As a consequence, the game provider can fine-tune the difficulty of the game, suggest other related games that would most likely interest the user, and offer additional equipment and enhancements for the character based on the user’s age, gender, and interests.
• Data collectors: entities that collect data from devices and users. For example, a retail store can track the path a customer takes through the store while pushing a shopping cart with an RFID chip, using the geospatial data collected from the chips to gauge which products get the most foot traffic.
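A minimal sketch of the foot-traffic idea, assuming each RFID reader covers one store zone and emits `(cart_tag, zone, timestamp)` reads; the read format and zone names are hypothetical. Counting distinct carts per zone, rather than raw reads, keeps a cart that lingers near one reader from inflating that zone's traffic.

```python
from collections import defaultdict

# Hypothetical RFID reads: (cart_tag, reader_zone, timestamp_seconds).
reads = [
    ("cart-17", "produce", 10), ("cart-17", "dairy", 95), ("cart-17", "produce", 130),
    ("cart-42", "produce", 12), ("cart-42", "electronics", 60),
    ("cart-08", "dairy", 30),
]


def foot_traffic(rfid_reads):
    """Number of distinct carts detected in each store zone, busiest zone first."""
    carts_by_zone = defaultdict(set)
    for tag, zone, _ in rfid_reads:
        carts_by_zone[zone].add(tag)  # a set ignores repeat reads of the same cart
    return sorted(
        ((zone, len(tags)) for zone, tags in carts_by_zone.items()),
        key=lambda item: item[1],
        reverse=True,
    )


for zone, carts in foot_traffic(reads):
    print(f"{zone}: {carts} carts")
```

Sorting the timestamps per cart instead would reconstruct each customer's path through the store.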
• Data aggregators: organizations that compile data from the devices and usage patterns collected by government agencies, retail stores, and websites. In turn, they can choose to transform and package the data as products to sell to list brokers, who may want to generate marketing lists of people who may be good targets for specific ad campaigns.
• Data users and buyers: groups that directly benefit from the data collected and aggregated by others within the data value chain.

Mais de Palani Kumar

CS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQLCS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQLPalani Kumar
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingPalani Kumar
 
CS8091_BDA_Unit_III_Content_Based_Recommendation
CS8091_BDA_Unit_III_Content_Based_RecommendationCS8091_BDA_Unit_III_Content_Based_Recommendation
CS8091_BDA_Unit_III_Content_Based_RecommendationPalani Kumar
 
CS8091_BDA_Unit_II_Clustering
CS8091_BDA_Unit_II_ClusteringCS8091_BDA_Unit_II_Clustering
CS8091_BDA_Unit_II_ClusteringPalani Kumar
 
IT8005_EC_Unit_V_Features_Of_E_Commerce_Technology
IT8005_EC_Unit_V_Features_Of_E_Commerce_TechnologyIT8005_EC_Unit_V_Features_Of_E_Commerce_Technology
IT8005_EC_Unit_V_Features_Of_E_Commerce_TechnologyPalani Kumar
 
IT8005_EC_Unit_IV_Internet_Marketing_Technologies
IT8005_EC_Unit_IV_Internet_Marketing_TechnologiesIT8005_EC_Unit_IV_Internet_Marketing_Technologies
IT8005_EC_Unit_IV_Internet_Marketing_TechnologiesPalani Kumar
 
IT8005_EC_Unit_III_Securing_Communication_Channels
IT8005_EC_Unit_III_Securing_Communication_ChannelsIT8005_EC_Unit_III_Securing_Communication_Channels
IT8005_EC_Unit_III_Securing_Communication_ChannelsPalani Kumar
 
IT8005_EC_Unit_II_Building_ECommerce
IT8005_EC_Unit_II_Building_ECommerceIT8005_EC_Unit_II_Building_ECommerce
IT8005_EC_Unit_II_Building_ECommercePalani Kumar
 
IT_8005_Electronic Commerce_Unit_I
IT_8005_Electronic Commerce_Unit_IIT_8005_Electronic Commerce_Unit_I
IT_8005_Electronic Commerce_Unit_IPalani Kumar
 

Mais de Palani Kumar (9)

CS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQLCS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQL
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
 
CS8091_BDA_Unit_III_Content_Based_Recommendation
CS8091_BDA_Unit_III_Content_Based_RecommendationCS8091_BDA_Unit_III_Content_Based_Recommendation
CS8091_BDA_Unit_III_Content_Based_Recommendation
 
CS8091_BDA_Unit_II_Clustering
CS8091_BDA_Unit_II_ClusteringCS8091_BDA_Unit_II_Clustering
CS8091_BDA_Unit_II_Clustering
 
IT8005_EC_Unit_V_Features_Of_E_Commerce_Technology
IT8005_EC_Unit_V_Features_Of_E_Commerce_TechnologyIT8005_EC_Unit_V_Features_Of_E_Commerce_Technology
IT8005_EC_Unit_V_Features_Of_E_Commerce_Technology
 
IT8005_EC_Unit_IV_Internet_Marketing_Technologies
IT8005_EC_Unit_IV_Internet_Marketing_TechnologiesIT8005_EC_Unit_IV_Internet_Marketing_Technologies
IT8005_EC_Unit_IV_Internet_Marketing_Technologies
 
IT8005_EC_Unit_III_Securing_Communication_Channels
IT8005_EC_Unit_III_Securing_Communication_ChannelsIT8005_EC_Unit_III_Securing_Communication_Channels
IT8005_EC_Unit_III_Securing_Communication_Channels
 
IT8005_EC_Unit_II_Building_ECommerce
IT8005_EC_Unit_II_Building_ECommerceIT8005_EC_Unit_II_Building_ECommerce
IT8005_EC_Unit_II_Building_ECommerce
 
IT_8005_Electronic Commerce_Unit_I
IT_8005_Electronic Commerce_Unit_IIT_8005_Electronic Commerce_Unit_I
IT_8005_Electronic Commerce_Unit_I
 

Último

Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 

Último (20)

Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar  â‰ŒđŸ” Delhi door step de...
Call Now ≜ 9953056974 â‰ŒđŸ” Call Girls In New Ashok Nagar â‰ŒđŸ” Delhi door step de...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
CS8091_BDA_Unit_I_Analytical_Architecture

  • 1. CS8091 / Big Data Analytics III Year / VI Semester
  • 2. Objectives: To study the basic Big Data and analytics concepts, HDFS and MapReduce. To learn the fundamentals of Clustering and Classification. To understand the fundamental concepts of Association and different types of Recommendation Systems. To learn about stream computing and study various case studies. To gain introductory knowledge of NoSQL Data Management and Visualization.
  • 3. Unit I - INTRODUCTION TO BIG DATA: Evolution of Big Data - Best Practices for Big Data Analytics - Big Data Characteristics - Validating - The Promotion of the Value of Big Data - Big Data Use Cases - Characteristics of Big Data Applications - Perception and Quantification of Value - Understanding Big Data Storage - A General Overview of High-Performance Architecture - HDFS - MapReduce and YARN - MapReduce Programming Model.
  • 4. DATA The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
  • 5. BIG DATA. Big Data is also data, but of a huge size. Big Data is a term used to describe a collection of data that is huge in size and still growing exponentially with time. “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it.

  • 6. BIG DATA. Units of Memory: Byte, Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte.
  • 7. BIG DATA - Sources
  • 8. BIG DATA - Sources. Primary sources of Big Data - Social data: likes, tweets and retweets, comments, video uploads, and general media.
  • 9. BIG DATA - Sources. Primary sources of Big Data - Machine data: industrial equipment, sensors that are installed in machinery, web logs which track user behavior, and sensors such as medical devices, smart meters, road cameras, satellites, and games.
  • 10. BIG DATA - Sources. Primary sources of Big Data - Transactional data: invoices, payment orders, storage records, and delivery receipts.
  • 11. BIG DATA – Data Structures
  • 12. BIG DATA – Data Structures. Structured data: data containing a defined data type, format, and structure.
  • 13. BIG DATA – Data Structures. Semi-structured data: information that does not reside in a relational database but has some organizational properties that make it easier to analyze. Example: XML data.
  • 14. BIG DATA – Data Structures. Quasi-structured data: textual data with erratic data formats, which can be formatted with effort, software tools, and time. An example of quasi-structured data is the data about which webpages a user visited and in what order.
  • 15. BIG DATA – Data Structures Quasi-structured data:
  • 16. BIG DATA – Data Structures Unstructured data: Data that has no inherent structure, which may include text documents, PDFs, images, and video.
  • 17. BIG DATA – Data Structures. Example: a clickstream can be parsed and mined by data scientists to discover usage patterns and uncover relationships among clicks and areas of interest on a website or group of sites.
  • 18. Types of Data Repositories, from an Analyst Perspective (data repository: characteristics). Spreadsheets and data marts: spreadsheets and low-volume databases for recordkeeping; analyst depends on data extracts.
  • 19. Types of Data Repositories, from an Analyst Perspective (data repository: characteristics). Data warehouses: centralized data containers in a purpose-built space; support BI and reporting, but restrict robust analyses; analyst dependent on IT and DBAs for data access and schema changes; analysts must spend significant time to get aggregated and disaggregated data extracts from multiple sources.
  • 20. Types of Data Repositories, from an Analyst Perspective (data repository: characteristics). Analytic sandbox (workspaces): data assets gathered from multiple sources and technologies for analysis; enables flexible, high-performance analysis in a nonproduction environment and can leverage in-database processing; reduces costs and risks associated with data replication into “shadow” file systems; “analyst owned” rather than “DBA owned”.
  • 21. State of the Practice in Analytics (business driver: examples). Optimize business operations: sales, pricing, profitability, efficiency. Identify business risk: customer churn, fraud, default. Predict new business opportunities: upsell, cross-sell, best new customer prospects. Comply with laws or regulatory requirements: Anti-Money Laundering, Fair Lending, Basel II-III, Sarbanes-Oxley (SOX).
  • 22. BI Versus Data Science
  • 23. BI Versus Data Science. BI systems make it easy to answer questions such as quarter-to-date revenue, progress toward quarterly targets, and how much of a given product was sold in a prior quarter or year.
  • 24. BI Versus Data Science. Data Science tends to use disaggregated data in a more forward-looking, exploratory way, focusing on analyzing the present and enabling informed decisions about the future.
  • 25. BI Versus Data Science. BI problems tend to require highly structured data organized in rows and columns for accurate reporting, whereas Data Science projects tend to use many types of data sources, including large or unconventional datasets.
  • 26. Current Analytical Architecture. Most organizations still have data warehouses that provide excellent support for traditional reporting and simple data analysis activities but have a more difficult time supporting more robust analyses.
  • 28. Current Analytical Architecture. For data sources to be loaded into the data warehouse, data needs to be well understood, structured, and normalized with the appropriate data type definitions.
  • 29. Current Analytical Architecture. Although this kind of centralization enables security, backup, and failover of highly critical data, it also means that data typically must go through significant preprocessing and checkpoints before it can enter this sort of controlled environment.
  • 30. Current Analytical Architecture. As a result of this level of control on the enterprise data warehouse (EDW), additional local systems may emerge in the form of departmental warehouses and local data marts that business users create to accommodate their need for flexible analysis.
  • 31. Current Analytical Architecture. Once in the data warehouse, data is read by additional applications across the enterprise for BI and reporting purposes. These are high-priority operational processes getting critical data feeds from the data warehouses and repositories.
  • 32. Current Analytical Architecture. Analysts create data extracts from the EDW to analyze data offline in R or other local analytical tools.
  • 33. Current Analytical Architecture. Because new data sources slowly accumulate in the EDW due to the rigorous validation and data structuring process, data is slow to move into the EDW, and the data schema is slow to change.
  • 34. Current Analytical Architecture. Departmental data warehouses may have been originally designed for a specific purpose and set of business needs, some of which may be forced into existing schemas to enable BI and the creation of OLAP cubes for analysis and reporting.
  • 36. Drivers of Big Data. The data now comes from multiple sources, such as these: medical information, such as genomic sequencing and diagnostic imaging; photos and video footage uploaded to the World Wide Web.
  • 37. Drivers of Big Data. The data now comes from multiple sources, such as these: video surveillance, such as the thousands of video cameras spread across a city; mobile devices, which provide geospatial location data of the users, as well as metadata about text messages, phone calls, and application usage on smartphones.
  • 38. Drivers of Big Data. The data now comes from multiple sources, such as these: smart devices, which provide sensor-based collection of information from smart electric grids, smart buildings, and many other public and industry infrastructures; nontraditional IT devices, including the use of radio-frequency identification (RFID) readers, GPS navigation systems, and seismic processing.
  • 39. Emerging Big Data Ecosystem and a New Approach to Analytics
  • 40. Emerging Big Data Ecosystem and a New Approach to Analytics. Data devices: “sensornets” gather data from multiple locations and continuously generate new data about this data. For example, a video game provider captures data about the skill and levels attained by the player.
  • 41. Emerging Big Data Ecosystem and a New Approach to Analytics. Data devices (continued): as a consequence, the game provider can fine-tune the difficulty of the game, suggest other related games that would most likely interest the user, and offer additional equipment and enhancements for the character based on the user’s age, gender, and interests.
  • 42. Emerging Big Data Ecosystem and a New Approach to Analytics. Data collectors: for example, retail stores track the path a customer takes through the store while pushing a shopping cart fitted with an RFID chip, so they can gauge which products get the most foot traffic using geospatial data collected from the RFID chips.
  • 43. Emerging Big Data Ecosystem and a New Approach to Analytics. Data aggregators: organizations compile data from the devices and usage patterns collected by government agencies, retail stores, and websites. In turn, they can choose to transform and package the data as products to sell to list brokers, who may want to generate marketing lists of people who may be good targets for specific ad campaigns.
  • 44. Emerging Big Data Ecosystem and a New Approach to Analytics. Data users and buyers: these groups directly benefit from the data collected and aggregated by others within the data value chain.
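Slide 6 lists the units of memory but does not give their sizes. The following is a minimal sketch. It assumes decimal (SI) prefixes by default, where each unit is 1,000 times the previous one. The binary convention of 1,024 (formally kibibyte, mebibyte, and so on under IEC naming) is available through the `base` parameter. The helpers compute each unit's size in bytes and express a raw byte count in a readable unit. Both function names are illustrative and do not come from the course material.

```python
# Unit names in the order listed on slide 6; list index = power of the base.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]


def unit_sizes(base: int = 1000) -> dict:
    """Size of each unit in bytes: base**0, base**1, ..., base**8."""
    return {name: base ** power for power, name in enumerate(UNITS)}


def human_readable(num_bytes: int, base: int = 1000) -> str:
    """Express a byte count in the first unit at which the value drops below `base`."""
    value = float(num_bytes)
    for name in UNITS[:-1]:
        if value < base:
            return f"{value:.2f} {name}"
        value /= base
    return f"{value:.2f} {UNITS[-1]}"  # anything larger is reported in Yottabytes


print(unit_sizes()["Petabyte"])               # 10**15 bytes
print(human_readable(1_500_000_000_000_000))  # 1.50 Petabyte
```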
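Slide 12 characterizes structured data by a defined data type, format, and structure for every record. The sketch below applies Python's standard `csv` module to a hypothetical orders file; the column names and schema are illustrative assumptions. It rejects input whose columns differ from the declared schema and converts each field to its declared type, so every loaded record has the same shape.

```python
import csv
import io
from datetime import date

# Hypothetical schema: column name -> function that converts the raw text.
SCHEMA = {"order_id": int, "order_date": date.fromisoformat, "amount": float}

SAMPLE = """order_id,order_date,amount
1001,2024-01-15,250.00
1002,2024-02-03,99.50
"""


def load_structured(text: str) -> list:
    """Parse CSV text whose header must match SCHEMA exactly, typing every field."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(SCHEMA):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [{field: convert(row[field]) for field, convert in SCHEMA.items()}
            for row in reader]


rows = load_structured(SAMPLE)
print(rows[0])
```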
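Slide 13 gives XML as its example of semi-structured data. The sketch below runs Python's standard `xml.etree.ElementTree` over a made-up customer document, with hypothetical element and attribute names. It illustrates the organizational property the slide mentions. Tags make each field addressable, yet records are not forced into one fixed schema, so a missing field shows up as `None` instead of causing a load error.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tags label every value, but records need not share the
# same fields (the second customer has no <email> element).
XML_DOC = """
<customers>
  <customer id="c1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="c2">
    <name>Ravi</name>
  </customer>
</customers>
"""


def customers_from_xml(text: str) -> list:
    """Flatten each <customer> element into a dict, using None for absent fields."""
    root = ET.fromstring(text.strip())
    return [
        {
            "id": node.get("id"),             # attribute value
            "name": node.findtext("name"),    # child element text
            "email": node.findtext("email"),  # None if the element is missing
        }
        for node in root.findall("customer")
    ]


records = customers_from_xml(XML_DOC)
for record in records:
    print(record)
```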
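Slide 14 describes quasi-structured data, such as the record of which webpages a user visited and in what order. The sketch below makes two assumptions. First, the visits arrive as web server log lines in the Apache Common Log Format, although the slide names no format. Second, the client IP address stands in for the user. Under these assumptions, a regular expression can recover each user's ordered page sequence and skip lines that do not fit the expected pattern. The sample log lines are invented.

```python
import re
from collections import defaultdict
from datetime import datetime

# Invented log lines, including one that does not follow the format.
LOG_LINES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /home HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2023:13:56:02 +0000] "GET /products HTTP/1.1" 200 2048',
    'malformed line written by a misconfigured proxy',
    '203.0.113.5 - - [10/Oct/2023:13:57:10 +0000] "GET /cart HTTP/1.1" 200 300',
]

# Captures client address, timestamp, and requested path from a log line.
PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)[^"]*"')


def visits_by_client(lines) -> dict:
    """Map each client to the pages it requested, in time order."""
    visits = defaultdict(list)
    for line in lines:
        match = PATTERN.match(line)
        if match is None:
            continue  # erratic lines are skipped rather than guessed at
        client, stamp, path = match.groups()
        # %b expects English month abbreviations such as "Oct".
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        visits[client].append((when, path))
    return {client: [path for _, path in sorted(hits)]
            for client, hits in visits.items()}


paths = visits_by_client(LOG_LINES)
print(paths)
```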
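Slide 17 mentions mining a clickstream to uncover usage patterns and relationships among clicks. One simple pattern is how often visitors move from one page to another. The sketch below counts (from page, to page) transitions across hypothetical sessions using `collections.Counter`. It is one illustrative technique, not the only way such mining is done.

```python
from collections import Counter

# Hypothetical sessions: each list is the ordered pages one visitor viewed.
SESSIONS = [
    ["/home", "/products", "/cart", "/checkout"],
    ["/home", "/products", "/products/42"],
    ["/home", "/blog"],
    ["/products", "/cart"],
]


def transition_counts(sessions) -> Counter:
    """Count each (from_page, to_page) pair of consecutive clicks."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))  # consecutive page pairs
    return counts


counts = transition_counts(SESSIONS)
for (src, dst), n in counts.most_common(2):
    print(f"{src} -> {dst}: {n}")
```

Frequent transitions such as home to products hint at common navigation paths. A fuller analysis might also weigh session length or time spent on each page.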
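Slide 23 lists typical BI questions: quarter-to-date revenue and how much of a product was sold in an earlier period. The sketch below assumes calendar quarters (a company's fiscal quarters may start in other months) and uses hypothetical sales records. Under those assumptions, both questions reduce to filtered sums over the records.

```python
from datetime import date

# Hypothetical sales records: (sale date, product, revenue).
SALES = [
    (date(2024, 3, 28), "widget", 120.0),
    (date(2024, 4, 2), "widget", 200.0),
    (date(2024, 4, 15), "gadget", 80.0),
    (date(2024, 5, 9), "widget", 150.0),
    (date(2024, 5, 20), "gadget", 60.0),
]


def quarter_start(day: date) -> date:
    """First day of the calendar quarter that contains `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def quarter_to_date_revenue(sales, as_of: date) -> float:
    """Revenue from the start of as_of's quarter up to and including as_of."""
    start = quarter_start(as_of)
    return sum(amount for day, _, amount in sales if start <= day <= as_of)


def product_revenue(sales, product: str, start: date, end: date) -> float:
    """Revenue for one product between two dates, inclusive."""
    return sum(amount for day, name, amount in sales
               if name == product and start <= day <= end)


print(quarter_to_date_revenue(SALES, date(2024, 5, 15)))  # 430.0
print(product_revenue(SALES, "widget", date(2024, 1, 1), date(2024, 3, 31)))  # 120.0
```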
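Slides 28 and 29 state that data must be well understood, structured, and typed before it enters the warehouse, and that it passes through checkpoints on the way in. The sketch below uses an in-memory SQLite database from Python's `sqlite3` module purely as a stand-in for an enterprise data warehouse. The `sales` table, its columns, and the validation rules are hypothetical. Rows that fail type conversion or a table constraint are set aside instead of being loaded.

```python
import sqlite3

# Hypothetical warehouse table with explicit column types and constraints.
DDL = """
CREATE TABLE IF NOT EXISTS sales (
    sale_id INTEGER PRIMARY KEY,
    region  TEXT NOT NULL,
    amount  REAL NOT NULL CHECK (amount >= 0)
)
"""


def load_into_warehouse(conn: sqlite3.Connection, rows: list) -> list:
    """Insert rows that pass validation; return the rows that were rejected."""
    conn.execute(DDL)
    rejected = []
    for row in rows:
        try:
            # Convert in Python first: SQLite's type affinity would otherwise
            # store text such as "n/a" in a REAL column without complaint.
            sale_id = int(row["sale_id"])
            region = row["region"].strip()
            amount = float(row["amount"])
            if not region:
                raise ValueError("region must not be empty")
            with conn:  # commit on success, roll back on a constraint violation
                conn.execute("INSERT INTO sales VALUES (?, ?, ?)",
                             (sale_id, region, amount))
        except (KeyError, TypeError, ValueError, sqlite3.IntegrityError):
            rejected.append(row)
    return rejected


raw_rows = [
    {"sale_id": "1", "region": "South", "amount": "250.0"},
    {"sale_id": "2", "region": "North", "amount": "n/a"},  # not a number
    {"sale_id": "3", "region": "  ", "amount": "10"},      # empty region
    {"sale_id": "4", "region": "East", "amount": "-5"},    # violates CHECK
    {"sale_id": "1", "region": "West", "amount": "40"},    # duplicate key
    {"region": "West", "amount": "40"},                    # missing sale_id
]

conn = sqlite3.connect(":memory:")
rejected = load_into_warehouse(conn, raw_rows)
loaded = conn.execute("SELECT sale_id, region, amount FROM sales").fetchall()
print(f"loaded {len(loaded)} row(s), rejected {len(rejected)}")
```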
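Slide 32 notes that analysts pull extracts from the EDW to analyze data offline in R or other local tools. The sketch below illustrates that step under stated assumptions. An in-memory SQLite database stands in for the warehouse, and the `sales` table is hypothetical. The code runs a parameterized query and writes the result, with a header row, as CSV, a format R can read with `read.csv`. To stay self-contained, it writes to an in-memory buffer; a real workflow would write to a file.

```python
import csv
import io
import sqlite3

# Stand-in warehouse with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                 [("South", 250.0), ("North", 120.0), ("South", 80.0)])


def extract_to_csv(conn: sqlite3.Connection, query: str, params=()) -> str:
    """Run a query and return its result as CSV text with a header row."""
    cursor = conn.execute(query, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])  # column names
    writer.writerows(cursor)  # one CSV row per result tuple
    return buffer.getvalue()


extract = extract_to_csv(
    conn,
    "SELECT region, amount FROM sales WHERE region = ? ORDER BY sale_id",
    ("South",),
)
print(extract)
```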
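Slide 34 mentions OLAP cubes built for analysis and reporting. A cube precomputes an aggregate for every combination of dimension values, including rolled-up totals. The sketch below builds such aggregates in plain Python over hypothetical (region, quarter, product, revenue) facts. It uses the label `ALL` for a rolled-up dimension, similar in spirit to SQL's `GROUP BY CUBE`. Production OLAP engines store and query cubes very differently.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, quarter, product, revenue).
FACTS = [
    ("South", "Q1", "widget", 100.0),
    ("South", "Q1", "gadget", 50.0),
    ("South", "Q2", "widget", 70.0),
    ("North", "Q1", "widget", 30.0),
    ("North", "Q2", "gadget", 20.0),
]


def build_cube(facts) -> dict:
    """Sum revenue for every combination of dimension values, where "ALL"
    means the dimension has been rolled up (aggregated away)."""
    cube = defaultdict(float)
    for *keys, revenue in facts:
        # With 3 dimensions, each fact contributes to 2**3 = 8 cells:
        # every mix of its actual value or "ALL" per dimension.
        for cell in product(*[(key, "ALL") for key in keys]):
            cube[cell] += revenue
    return dict(cube)


cube = build_cube(FACTS)
print(cube[("South", "ALL", "ALL")])  # South revenue across all quarters and products
print(cube[("ALL", "Q1", "ALL")])     # Q1 revenue across all regions and products
```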
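Slide 42 describes stores gauging product foot traffic from RFID-tagged shopping carts. The sketch below assumes each RFID reader is mounted in a fixed store zone, so that a read amounts to a (cart tag, zone, time) event; the reader layout and the data are hypothetical. Under that assumption, foot traffic per zone can be approximated by the number of distinct carts seen there. Counting distinct tags rather than raw reads prevents a cart that lingers near one reader from inflating the total.

```python
from collections import defaultdict

# Hypothetical RFID reads: (cart tag, store zone, seconds since opening).
READS = [
    ("cart-1", "produce", 0), ("cart-1", "produce", 5), ("cart-1", "dairy", 40),
    ("cart-2", "produce", 10), ("cart-2", "snacks", 70),
    ("cart-3", "produce", 12), ("cart-3", "dairy", 30), ("cart-3", "dairy", 95),
]


def foot_traffic(reads) -> list:
    """Rank zones by the number of distinct carts read there (ties by name)."""
    carts_per_zone = defaultdict(set)
    for tag, zone, _timestamp in reads:
        carts_per_zone[zone].add(tag)  # a set ignores repeated reads of one cart
    return sorted(((zone, len(tags)) for zone, tags in carts_per_zone.items()),
                  key=lambda item: (-item[1], item[0]))


ranking = foot_traffic(READS)
print(ranking)  # [('produce', 3), ('dairy', 2), ('snacks', 1)]
```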