Introduction to
Big Data Analytics
Instructor
Dendej Sawarnkatat
dendej@gmail.com
Agenda
• What is Big Data?
• Concepts and Terminology
• Big Data Characteristics
• Different Types of Data
• Case Study Background
• Marketplace Dynamics
• Business Architecture
• Information and Communications
Technology
2
WHAT’S BIG DATA?
3
What’s Big Data?
4
Global Digital Usage
5
Device Ownership
6
Mobile Internet Usage
7
Come in 60 Seconds?
8
Come in 60 Seconds?
9
Search Engine Market Share
10
Social Media
11
Social Media Share
12
Social Media Profile
13
Social Media Behavior
14
Hours Spent on Social Media
15
Facebook Languages
16
Thailand
17
Thailand Device Ownership
18
Thailand Daily Media Times
19
Thailand Internet Usage
20
Thailand Mobile Internet
Usage
21
Thailand Most Visited Websites
22
Thailand Social Media
Overview
23
Thailand Social Media
Platforms
24
Thailand Mobile Apps
25
Mobile Data
26
Mobile Social Usage
27
Mobile Social Usage
28
M2M Data
• Data generated by different sources
around us like automated systems, sensors
and mobile devices.
• 2.5 quintillion bytes of data are created
every day.
• 80-90% of the data in the world today has
been created in the last two years alone.
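To make the daily figure above easier to picture, it can be converted into a per-second rate. This is a minimal arithmetic sketch that takes the quoted 2.5 quintillion bytes per day at face value; the function names are illustrative and decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Figure quoted on the slide: 2.5 quintillion (2.5 * 10**18) bytes per day.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(bytes_per_day: float) -> float:
    """Convert a daily data volume into an average per-second rate."""
    return bytes_per_day / SECONDS_PER_DAY


def to_terabytes(n_bytes: float) -> float:
    """Convert bytes to decimal terabytes (assumes 1 TB = 10**12 bytes)."""
    return n_bytes / 1e12


rate = bytes_per_second(BYTES_PER_DAY)
print(f"{to_terabytes(rate):.1f} TB per second on average")
```

Under these assumptions the average works out to roughly 29 TB every second.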
29
Flood of Data
• More than 4.5 billion internet users in the
world today.
• The New York Stock Exchange generates
about 4-5 TB of data per day.
• 7 TB of data are processed by Twitter every
day.
• 10 TB of data are processed by Facebook
every day and growing at 7 PB per month.
30
Storage is Growing FAST!!!!
31
More than an Exabyte
32
Flood of Data (cont’d)
• Interestingly, 80% of this data is
unstructured.
• With this massive quantity of data,
businesses need fast, reliable, and deeper
insight into their data.
• Therefore, Big Data solutions based on
Hadoop and other analytics software are
becoming more and more relevant.
33
Just Can’t Do It
34
Handling Humongous Data
• Traditional approaches are not fit for data
analysis because data volumes have inflated.
• Solutions must handle large volumes of data,
whether structured or unstructured.
• Datasets grow so large that they are
difficult to capture, store, manage, share,
analyze and visualize with typical
database software tools.
35
Data Evolution
36
Big Data Analytic Applications
• Analyze markets and derive new strategies
to improve business in different geographic
locations.
• Measure the response to campaigns,
promotions, and other advertising
media.
• Use the medical histories of patients and
hospitals to provide better and quicker service.
37
Big Data Analytic Applications
• Perform risk analysis.
• Create new revenue streams.
• Reduce maintenance costs.
• Make faster, better decisions.
• Develop new products & services.
• Etc.
38
Data Science as Tool
• Involves using methods to analyze massive
amounts of data and extract the
knowledge it contains.
• Data science and big data evolved from
statistics and traditional data management
but are now considered to be distinct
disciplines.
39
Data Scientist
40
Data Science Processes
1. Setting the research goal
2. Retrieving data
3. Cleansing, integrating, and transforming
data
4. Exploratory data analysis
5. Building model(s)
6. Presenting findings (insights)
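The six steps above can be sketched end to end on a toy dataset. This is only an illustration under stated assumptions: the research goal, the inline CSV data, the column names, and the group-mean "model" are all hypothetical and not part of the original slides.

```python
import csv
import io
from statistics import mean

# 1. Research goal (hypothetical): estimate order value by sales channel.
RAW = """channel,order_value
web,120
store,80
web,
store,100
web,140
"""


def retrieve(text):
    # 2. Retrieve data: parse rows from a CSV source.
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    # 3. Cleanse and transform: drop incomplete rows, cast values to float.
    return [
        {"channel": r["channel"], "order_value": float(r["order_value"])}
        for r in rows
        if r["channel"] and r["order_value"]
    ]


def explore(rows):
    # 4. Exploratory analysis: row count and overall mean.
    return {"rows": len(rows), "mean": mean(r["order_value"] for r in rows)}


def build_model(rows):
    # 5. Model: a baseline that predicts the mean order value per channel.
    groups = {}
    for r in rows:
        groups.setdefault(r["channel"], []).append(r["order_value"])
    return {channel: mean(values) for channel, values in groups.items()}


def present(summary, model):
    # 6. Present findings as a short text report.
    lines = [f"{summary['rows']} clean rows, overall mean {summary['mean']:.1f}"]
    lines += [f"{ch}: predicted order value {v:.1f}" for ch, v in sorted(model.items())]
    return "\n".join(lines)


rows = cleanse(retrieve(RAW))
summary = explore(rows)
model = build_model(rows)
print(present(summary, model))
```

In practice each step is far larger, and the process usually loops back (for example, exploration often reveals more cleansing work).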
41
BIG DATA: CONCEPTS AND
TERMINOLOGY
42
Datasets
• Collections or groups of related data are
generally referred to as datasets.
• Each group or dataset member (datum)
shares the same set of attributes or
properties as others in the same dataset.
43
Example of Datasets
• Tweets stored in a flat file
• A collection of image files in a directory
• An extract of rows from a database table
stored in a CSV formatted file
• Historical weather observations that are
stored as XML files
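As a small illustration of the "extract of rows stored in a CSV formatted file" example, the sketch below loads such an extract and checks that every datum shares the same attributes. The column names and values are made up for the example.

```python
import csv
import io

# Hypothetical CSV extract; in practice this would be read from a file.
CSV_EXTRACT = """id,city,temperature_c
1,Bangkok,33.5
2,Chiang Mai,29.0
3,Phuket,31.2
"""


def load_dataset(text):
    """Parse CSV text into a list of dict rows (one dict per datum)."""
    return list(csv.DictReader(io.StringIO(text)))


def shared_attributes(dataset):
    """Return the attribute names if every datum has the same ones."""
    names = set(dataset[0])
    for datum in dataset:
        if set(datum) != names:
            raise ValueError("datum does not share the dataset's attributes")
    return sorted(names)


dataset = load_dataset(CSV_EXTRACT)
print(len(dataset), "data items with attributes", shared_attributes(dataset))
```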
44
Data Analysis
• Process of examining data to
find facts, relationships,
patterns, insights and/or
trends.
• Goal: to support better
decision making
• Help establish patterns and
relationships among the
data being analyzed.
45
Data Analytics
• Discipline that includes the management
of the complete data lifecycle, which
encompasses collecting, cleansing,
organizing, storing, analyzing and
governing data.
• Involves the development of analysis
methods, scientific techniques, and
automated tools.
46
Data Analytics
• Develops methods that allow data
analysis to occur through the use of highly
scalable distributed technologies and
frameworks that are capable of analyzing
large volumes of data from different
sources.
• Enables data-driven decision-making with
scientific backing so that decisions can be
based on factual data and not simply on
past experience or intuition alone.
47
Data Analytics Categories
There are four general categories of analytics
that are distinguished by the results they
produce:
1. Descriptive Analytics
2. Diagnostic Analytics
3. Predictive Analytics
4. Prescriptive Analytics
48
Descriptive Analytics
• Carried out to answer questions about events
that have already occurred.
• Contextualizes data to generate information.
• Often carried out via ad-hoc reporting or
dashboards.
• The reports are generally
static in nature and
display historical data that
is presented in the form
of data grids or charts.
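A descriptive report of this kind can be as simple as totalling historical records and printing them as a grid. The sketch below assumes a hypothetical list of (month, amount) sales records.

```python
from collections import defaultdict

# Hypothetical historical sales records: (month, amount).
SALES = [
    ("2021-01", 1200.0),
    ("2021-01", 800.0),
    ("2021-02", 950.0),
    ("2021-03", 400.0),
    ("2021-03", 700.0),
]


def monthly_totals(records):
    """Summarize what already happened: total sales per month."""
    totals = defaultdict(float)
    for month, amount in records:
        totals[month] += amount
    return dict(sorted(totals.items()))


def as_grid(totals):
    """Render the totals as a static text grid."""
    lines = [f"{'month':<8} {'total':>8}"]
    lines += [f"{month:<8} {total:>8.2f}" for month, total in totals.items()]
    return "\n".join(lines)


report = monthly_totals(SALES)
print(as_grid(report))
```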
49
Diagnostic Analytics
• Determines the cause of a phenomenon that
occurred in the past using questions that focus
on the reason behind the event.
• Requires collecting data from multiple sources
and storing it in a structure that supports
drill-down and roll-up analysis.
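Roll-up and drill-down can be sketched on a tiny fact table: roll-up aggregates to a coarser level (region), while drill-down breaks one region into finer detail (city). The data and the two-level hierarchy are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical facts: (region, city, sales).
FACTS = [
    ("North", "Chiang Mai", 120),
    ("North", "Lampang", 80),
    ("Central", "Bangkok", 300),
    ("Central", "Ayutthaya", 50),
]


def roll_up(facts):
    """Aggregate sales to the coarser region level."""
    totals = defaultdict(int)
    for region, _city, sales in facts:
        totals[region] += sales
    return dict(totals)


def drill_down(facts, region):
    """Break one region's total into its cities."""
    totals = defaultdict(int)
    for fact_region, city, sales in facts:
        if fact_region == region:
            totals[city] += sales
    return dict(totals)


print(roll_up(FACTS))
print(drill_down(FACTS, "North"))
```

A diagnostic question such as "why did the North underperform?" would start from the rolled-up view and drill into the cities behind it.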
50
Predictive Analytics
• Carried out in an attempt to determine the
outcome of an event that might occur in the
future.
• The models used for predictive analytics have
implicit dependencies on the conditions under
which the past events occurred.
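One simple predictive model is a straight-line trend fitted to past observations by least squares. The sketch below uses made-up monthly sales figures; it is one possible technique, not the method prescribed by the slides, and its forecast only holds if the conditions behind the past data continue.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(f"trend: {slope:.1f} per month, forecast for month 6: {forecast:.1f}")
```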
51
Prescriptive Analytics
• Builds upon the results
of predictive analytics
by prescribing actions
that should be taken.
• The focus is not only on
which prescribed
option is best to follow,
but why.
• The aim is to gain an
advantage or mitigate a
risk.
52
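A minimal way to illustrate prescribing an action, and the reason for it, is to compare the expected payoff of each option given a predicted probability. The probability, the actions, and the payoffs below are all hypothetical.

```python
# Hypothetical output of a predictive model: probability of high demand.
P_HIGH_DEMAND = 0.7

# Hypothetical payoff of each action under high and low demand.
PAYOFFS = {
    "stock_more": {"high": 100.0, "low": -40.0},
    "stock_same": {"high": 50.0, "low": 10.0},
}


def expected_payoff(payoff, p_high):
    """Probability-weighted payoff of one action."""
    return p_high * payoff["high"] + (1 - p_high) * payoff["low"]


def prescribe(payoffs, p_high):
    """Pick the action with the highest expected payoff and say why."""
    expected = {action: expected_payoff(p, p_high) for action, p in payoffs.items()}
    best = max(expected, key=expected.get)
    reason = f"{best} has the highest expected payoff ({expected[best]:.1f})"
    return best, expected, reason


best, expected, reason = prescribe(PAYOFFS, P_HIGH_DEMAND)
print(reason)
```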
Value & Complexity
53
Business Intelligence (BI)
54
• Enables an organization to gain insight into
the performance of an enterprise using
analyzed data.
• The analyzed data is generated by its business
processes and information systems.
• The results of the analysis can be used by
management to steer the business in an
effort to correct detected issues or otherwise
enhance organizational performance.
Business Intelligence (BI)
• BI applies analytics to large amounts of
data across the enterprise, which has
typically been consolidated into an
enterprise data warehouse to run
analytical queries.
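An analytical query of this kind can be tried out with an in-memory SQLite database standing in for the enterprise data warehouse. The table name, columns, and rows are assumptions for the example; a real warehouse would use its own schema and engine.

```python
import sqlite3

# In-memory database as a stand-in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "A", 100.0),
        ("North", "B", 50.0),
        ("South", "A", 200.0),
        ("South", "B", 25.0),
    ],
)

# Analytical query: total sales per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()

for region, total in rows:
    print(f"{region}: {total:.2f}")
```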
55
Business Intelligence (BI)
• The output of BI can be surfaced to a
dashboard
• Allows managers to access and analyze the
results
• And to potentially refine the analytic
queries to further explore the data.
56
Key Performance Indicators
(KPIs)
• A metric that can be used to gauge
success within a particular business
context.
• Linked with an enterprise’s overall
strategic goals and objectives.
• Often used to identify business
performance problems and demonstrate
regulatory compliance.
57
Key Performance Indicators
(KPIs)
• Act as quantifiable reference points for
measuring a specific aspect of a business’
overall performance.
58
BIG DATA CHARACTERISTICS
59
Big Data Definition
• For some, it is a buzzword for the “new”
need to process a lot of data.
• Big Data is usually defined by the
“Three Vs”.
60
Volume
• The anticipated volume of data that is
processed by Big Data solutions is substantial
and ever-growing.
• High data volumes
impose distinct data
storage and processing
demands, as well as
additional data
preparation, curation
and management
processes.
61
Velocity
• In Big Data environments, data
can arrive at fast speeds.
• Enormous datasets can
accumulate within very short
periods of time.
• Coping with the fast inflow of
data requires the enterprise to
design highly elastic and
available data processing
solutions and corresponding
data storage capabilities.
62
Variety
• The multiple formats and types of data that
need to be supported by Big Data solutions.
• Data variety brings challenges for enterprises
in terms of data integration, transformation,
processing, and storage.
63
The “3Vs”
64
… And More Challenges
65
Veracity
• Veracity refers to the quality or fidelity of
data.
• Data that enters Big Data environments
needs to be assessed for quality, which can
lead to data processing activities to resolve
invalid data and remove noise.
• Noise is data that cannot be converted
into information and thus has no value,
whereas signals have value and lead to
meaningful information.
66
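Assessing data quality often starts with separating usable values from noise. The sketch below does this for hypothetical temperature readings; the valid range and the sample values are assumptions chosen for the example.

```python
def split_signal_and_noise(raw_readings, low=-50.0, high=60.0):
    """Separate parseable, in-range readings (signal) from the rest (noise)."""
    signal, noise = [], []
    for raw in raw_readings:
        try:
            value = float(raw)
        except (TypeError, ValueError):
            noise.append(raw)  # not a number at all
            continue
        if low <= value <= high:
            signal.append(value)
        else:
            noise.append(raw)  # numeric but outside the plausible range
    return signal, noise


# Hypothetical raw sensor readings, in degrees Celsius.
readings = ["31.5", "n/a", "29.8", "999", None, "30.1"]
signal, noise = split_signal_and_noise(readings)
print("signal:", signal)
print("noise:", noise)
```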
The 4Vs
Source: IBM.com
67
Various Data States
68
The 5Vs
Big Data
Volume
Velocity
Variety
Veracity
Value
69
Value
• Value is defined as the usefulness of data for
an enterprise.
• Value is also dependent on how long data
processing takes.
• The longer it takes for data to be turned into
meaningful information, the less value it has
for a business.
70
DIFFERENT TYPES OF DATA
71
What Kind of Data?
• Structured
• Relational DB
• Library catalogues (date, author, place,
subject, etc.)
• Semi-structured
• CSV, XML, JSON, NoSQL database
• Unstructured
72
Structured Data
• Conforms to a data model or schema and
is often stored in tabular form.
• Used to capture relationships between
different entities and is therefore most
often stored in a relational database.
• Frequently generated by enterprise
applications and information systems like
ERP and CRM systems.
• Rarely requires special consideration with
regard to processing or storage.
73
Unstructured Data
• Data that does not conform to a data model
or data schema is known as unstructured
data.
• It is estimated that unstructured data makes
up 80% of the data within any given
enterprise.
• Unstructured data has a faster growth rate
than structured data.
• This form of data is either textual or binary
and often conveyed via files that are
self-contained and non-relational.
74
Semi-structured Data
• Has a defined level of structure and
consistency that is not relational in nature but
is hierarchical or graph-based.
• This kind of data is commonly stored in files
that contain text.
• Because it conforms to some level of structure, it is
more easily processed than unstructured
data.
• Often has special pre-processing and storage
requirements, especially if the underlying
format is not text-based.
75
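The hierarchical nature of semi-structured data can be seen in a small JSON document, which a program can walk and flatten into rows. The document content and field names here are invented for illustration.

```python
import json

# Hypothetical JSON document: hierarchical, not relational.
DOC = """
{"user": "somchai",
 "orders": [
   {"id": 1, "items": ["tea", "rice"]},
   {"id": 2, "items": ["coffee"]}
 ]}
"""


def flatten_orders(document_text):
    """Walk the hierarchy and emit flat (user, order_id, item) rows."""
    doc = json.loads(document_text)
    return [
        (doc["user"], order["id"], item)
        for order in doc["orders"]
        for item in order["items"]
    ]


for row in flatten_orders(DOC):
    print(row)
```

Because the structure is explicit in the file, no schema has to be defined up front, yet the data is still far easier to process than free text.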
Unstructured Data
• Machine Generated
• Satellite images
• Scientific data
• Photographs and video
• Radar or sonar data
• Human Generated
• Word, PDF, Text
• Social media data (Facebook, Twitter, LinkedIn)
• Mobile data (text messages)
• Website content (blogs, Instagram)
76
Metadata
• Provides information about a dataset’s
characteristics and structure.
• Mostly machine-generated and can be
appended to data.
• The tracking of metadata is crucial to Big
Data processing, storage and analysis – it
provides information about the pedigree
of the data and its provenance during
processing.
77
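A machine-generated metadata record can be as small as a dictionary describing a dataset's structure and provenance. The sketch below is one possible layout; the field names, the source file name, and the processing steps are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def describe_dataset(rows, source, steps):
    """Build a metadata record for a list of dict rows (hypothetical layout)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return {
        "source": source,                                # where the data came from
        "record_count": len(rows),                       # dataset characteristic
        "attributes": sorted(rows[0]) if rows else [],   # dataset structure
        "sha256": hashlib.sha256(payload).hexdigest(),   # content fingerprint
        "processing_steps": list(steps),                 # provenance so far
        "described_at": datetime.now(timezone.utc).isoformat(),
    }


rows = [{"id": 1, "city": "Bangkok"}, {"id": 2, "city": "Phuket"}]
metadata = describe_dataset(
    rows, source="weather_extract.csv", steps=["extracted", "deduplicated"]
)
print(json.dumps(metadata, indent=2))
```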
Data Type Summary
78
BIG DATA DRIVER: MARKET DYNAMICS
79
Overview
• Businesses entrenched and worked to
improve their efficiency and effectiveness
to stabilize their profitability by reducing
costs.
• Companies began to focus outward,
looking to find new customers and keep
existing customers from defecting.
• They offered new products and services and
delivered increased value propositions to
customers.
80
External Data
• Companies need to expand their Business
Intelligence activities beyond retrospection
on extracted internal information.
• They need to open themselves to external data sources as
a means of sensing the marketplace and their
position within it.
• External data can bring additional context
to their internal data.
• This allows a corporation to move up the analytic
value chain from hindsight to insight and
foresight.
81
DIKW Pyramid
• Shows how:
• data can be enriched with context to create information
• information can be supplied with meaning to create knowledge
• knowledge can be integrated to form wisdom.
82
BIG DATA DRIVER:
BUSINESS ARCHITECTURE (BA)
83
Overview
• BA provides a means of blueprinting or
concretely expressing the design of the
business.
• It helps an organization align its strategic
vision with its underlying execution.
• It includes linkages from abstract concepts
to more concrete ones.
• Linkages provide guidance as to how to
align the business and its information
technology.
84
Business as Layered System
• Top layer: strategic layer occupied by
C-level executives and advisory groups
• Middle layer: tactical or managerial layer
that seeks to steer the organization in
alignment with the strategy
• Bottom layer: operations layer where a
business executes its core processes and
delivers value to its customers.
85
Business as Layered System
• Each layer’s goals and objectives are
influenced by and often defined by the
layer above.
• Communication flows bottom-up via the
collection of metrics.
• Activity monitoring at the operations layer
generates Performance Indicators (PIs) and
metrics.
86
Business as Layered System
• PIs and metrics are aggregated to create Key
Performance Indicators (KPIs) used at the
tactical layer.
• KPIs can be aligned with Critical Success
Factors (CSFs) at the strategic layer.
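The flow from operational PIs to a tactical KPI and a strategic CSF can be sketched with a simple aggregation. The metric (on-time shipping percentage), the daily values, and the target are hypothetical.

```python
from statistics import mean

# Hypothetical operational PIs: daily percentage of orders shipped on time.
DAILY_ON_TIME_PCT = [92.0, 95.0, 89.0, 96.0]

# Hypothetical target derived from a Critical Success Factor.
CSF_TARGET_PCT = 90.0


def aggregate_kpi(pis):
    """Aggregate operations-layer PIs into one tactical-layer KPI."""
    return mean(pis)


def meets_csf(kpi, target):
    """Compare the KPI with the strategic-layer target."""
    return kpi >= target


kpi = aggregate_kpi(DAILY_ON_TIME_PCT)
status = "meets" if meets_csf(kpi, CSF_TARGET_PCT) else "misses"
print(f"KPI on-time shipping: {kpi:.1f}% ({status} the {CSF_TARGET_PCT:.0f}% target)")
```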
87
Big Data & Business Layers
• Big Data has ties to business architecture at each of
the organizational layers.
• It helps convert data into information (what) and
provides meaning to generate knowledge (how) from
information.
• The information can be examined to answer
questions regarding how the business is performing.
• With such knowledge, the strategic layer can
provide insight (why) into which strategy
should be adopted in order to enhance
performance.
88
DIKW Pyramid & Business
Layers
Modified DIKW pyramid that aligns
with Strategic, Tactical and
Operational corporate levels
89
Feedback Loop
• The strategic layer drives response via the
application of judgment by making
decisions that are communicated as
constraints to the tactical layer.
• The tactical layer leverages this knowledge
to generate priorities and actions that
conform to corporate direction.
• These actions adjust the execution of
business at the operational layer.
90
Feedback Loop
• The change in the experience of internal
stakeholders and external customers as they
deliver and consume business services should
be measurable.
• This change (result) should surface and be
visible in the data in the form of changed PIs
that are then aggregated into KPIs.
• Over time, the strategic and management
layers’ injection of judgment and action into
the loop will serve to refine the delivery of
business services.
91
The “Anatomy of Knowledge”
An organization can relate and align its organizational layers
by creating a virtuous cycle via a feedback loop.
92
BIG DATA DRIVER:
INFORMATION AND
COMMUNICATIONS TECHNOLOGY
93
Data Analytics & Data Science
• Find new insights that can drive more
efficient and effective operations and give
management the ability to steer the
business proactively.
• Allow the C-suite to better formulate and
assess their strategic initiatives.
• Help businesses look for new ways to gain a
competitive edge.
94
Digitization
95
• The use of digital artifacts saves both time
and cost.
• As consumers connect to a business
through their interaction with these digital
substitutes, it leads to an opportunity to
collect further “secondary” data.
Digitization
• Collecting secondary data can be
important for businesses for:
• customized marketing
• automated recommendations
• development of optimized product
features.
96
Affordable Technology
• Technology capable of
storing and processing
large quantities of
diverse data has become
increasingly affordable.
• Big Data solutions often
leverage open-source
software that executes
on commodity hardware.
97
Social Media
• Has empowered customers to provide feedback
in near-real-time via open and public mediums.
• Businesses are storing increasing amounts of
data from social media sites.
• This information feeds Big Data analysis
algorithms that help businesses:
• provide better levels of service
• increase sales
• enable targeted marketing
• create new products and services.
98
Hyper-connected communities
and devices
• The internet and the proliferation of cellular
and Wi-Fi networks have enabled more people
and their devices to be continuously connected.
• The proliferation of internet-connected sensors,
such as those in the Internet of Things (IoT), increases the
number of available data streams.
99
Cloud Computing
• Allows the creation
of environments that
are capable of
providing highly
scalable, on-demand
IT resources.
100

SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

000 introduction to big data analytics 2021

  • 1. Introduction to Big Data Analytics Instructor Dendej Sawarnkatat dendej@gmail.com
  • 2. Agenda • What is Big Data? • Concepts and Terminology • Big Data Characteristics • Different Types of Data • Case Study Background • Marketplace Dynamics • Business Architecture • Information and Communications Technology 2
  • 8. Come in 60 Seconds ? 8
  • 9. Come in 60 Seconds? 9
  • 15. Social Media Spent Hours 15
  • 29. M2M Data • Data generated by different sources around us, such as automated systems, sensors and mobile devices. • 2.5 quintillion bytes of data are created every day. • 80-90% of the data in the world today has been created in the last two years alone.
  • 30. Flood of Data • More than 4.5 billion internet users in the world today. • The New York Stock Exchange generates about 4-5 TB of data per day. • 7TB of data are processed by Twitter every day. • 10TB of data are processed by Facebook every day and growing at 7 PB per month. 30
  • 31. Storage is Growing FAST!!!! 31
  • 33. Flood of Data (cont’d) • Interestingly 80% of these data are unstructured. • With this massive quantity of data, businesses need fast, reliable, deeper data insight. • Therefore, Big Data solutions based on Hadoop and other analytics software are becoming more and more relevant. 33
  • 35. Handling Humongous Data • Traditional approaches are not fit for data analysis because of data inflation. • Handling large volumes of data, which may be structured or unstructured. • Datasets that grow so large that they are difficult to capture, store, manage, share, analyze and visualize with typical database software tools.
  • 37. Big Data Analytic Applications • Analyze the market and derive new strategies to improve business in different geographic locations. • Learn the response to campaigns, promotions, and other advertising mediums. • Use the medical history of patients and hospitals to provide better and quicker service.
  • 38. Big Data Analytic Applications • Perform Risk Analysis. • Create new revenue streams. • Reduces maintenance cost. • Faster, better decision making. • New products & services. • Etc 38
  • 39. Data Science as Tool • Involves using methods to analyze massive amounts of data and extract the knowledge it contains. • Data science and big data evolved from statistics and traditional data management but are now considered to be distinct disciplines. 39
  • 41. Data Science Processes 1. Setting the research goal 2. Retrieving data 3. Cleansing, integrating, and transforming data 4. Exploratory data analysis 5. Building model(s) 6. Presenting findings (insights)
  • 42. BIG DATA: CONCEPTS AND TERMINOLOGY 42
  • 43. Datasets • Collections or groups of related data are generally referred to as datasets. • Each group or dataset member (datum) shares the same set of attributes or properties as others in the same dataset. 43
  • 44. Example of Datasets • Tweets stored in a flat file • A collection of image files in a directory • An extract of rows from a database table stored in a CSV formatted file • Historical weather observations that are stored as XML files 44
  • 45. Data Analysis • Process of examining data to find facts, relationships, patterns, insights and/or trends. • Goal: to support better decision making • Help establish patterns and relationships among the data being analyzed 45
  • 46. Data Analytics • Discipline that includes the management of the complete data lifecycle, which encompasses collecting, cleansing, organizing, storing, analyzing and governing data. • Involves both development of analysis methods and scientific technique and automated tools. 46
  • 47. Data Analytics • Developed methods that allow data analysis to occur through the use of highly scalable distributed technologies and frameworks that are capable of analyzing large volumes of data from different sources. • Enable data-driven decision-making with scientific backing so that decisions can be based on factual data and not simply on past experience or intuition alone. 47
  • 48. Data Analytics Categories There are four general categories of analytics that are distinguished by the results they produce: 1. Descriptive Analytics 2. Diagnostic Analytics 3. Predictive Analytics 4. Prescriptive Analytics 48
  • 49. Descriptive Analytics • Carried out to answer questions about events that have already occurred. • Contextualizes data to generate information. • Often carried out via ad-hoc reporting or dashboards. • The reports are generally static in nature and display historical data that is presented in the form of data grids or charts. 49
  • 50. Diagnostic Analytics • Determines the cause of a phenomenon that occurred in the past using questions that focus on the reason behind the event. • Requires collecting data from multiple sources and storing it in a structure that lends itself to drill-down and roll-up analysis.
  • 51. Predictive Analytics • Carried out in an attempt to determine the outcome of an event that might occur in the future. • The models used for predictive analytics have implicit dependencies on the conditions under which the past events occurred. 51
  • 52. Prescriptive Analytics • Build upon the results of predictive analytics by prescribing actions that should be taken. • The focus is not only on which prescribed option is best to follow, but why. • Used to gain an advantage or mitigate a risk.
  • 54. Business Intelligence (BI) • Enables an organization to gain insight into the performance of an enterprise using analyzed data. • The analyzed data is generated by its business processes and information systems. • The results of the analysis can be used by management to steer the business in an effort to correct detected issues or otherwise enhance organizational performance.
  • 55. Business Intelligence (BI) • BI applies analytics to large amounts of data across the enterprise, which has typically been consolidated into an enterprise data warehouse to run analytical queries. 55
  • 56. Business Intelligence (BI) • The output of BI can be surfaced to a dashboard • Allows managers to access and analyze the results • And to potentially refine the analytic queries to further explore the data. 56
  • 57. Key Performance Indicators (KPIs) • A metric that can be used to gauge success within a particular business context. • Linked with an enterprise’s overall strategic goals and objectives. • Often used to identify business performance problems and demonstrate regulatory compliance.
  • 58. Key Performance Indicators (KPIs) • Act as quantifiable reference points for measuring a specific aspect of a business’s overall performance.
  • 60. Big Data Definition • For some, it is a buzzword that tries to address the “new” need to process a lot of data. • Big Data is usually defined using the “Three Vs”.
  • 61. Volume • The anticipated volume of data that is processed by Big Data solutions is substantial and ever-growing. • High data volumes impose distinct data storage and processing demands, as well as additional data preparation, curation and management processes. 61
  • 62. Velocity • In Big Data environments, data can arrive at fast speeds. • Enormous datasets can accumulate within very short periods of time. • Coping with the fast inflow of data requires the enterprise to design highly elastic and available data processing solutions and corresponding data storage capabilities 62
  • 63. Variety • The multiple formats and types of data that need to be supported by Big Data solutions. • Data variety brings challenges for enterprises in terms of data integration, transformation, processing, and storage. 63
  • 65. … And More Challenges
  • 66. Veracity • Veracity refers to the quality or fidelity of data. • Data that enters Big Data environments needs to be assessed for quality, which can lead to data processing activities to resolve invalid data and remove noise. • Noise is data that cannot be converted into information and thus has no value, whereas signals have value and lead to meaningful information 66
  • 70. Value • Value is defined as the usefulness of data for an enterprise. • Value is also dependent on how long data processing takes. • The longer it takes for data to be turned into meaningful information, the less value it has for a business. 70
  • 72. What Kind of Data? • Structured: relational DB, library catalogues (date, author, place, subject, etc.) • Semi-structured: CSV, XML, JSON, NoSQL database • Unstructured
  • 73. Structured Data • Conforms to a data model or schema and is often stored in tabular form. • Used to capture relationships between different entities and is therefore most often stored in a relational database. • Frequently generated by enterprise applications and information systems like ERP and CRM systems. • Rarely requires special consideration in regards to processing or storage. 73
  • 74. Unstructured Data • Data that does not conform to a data model or data schema is known as unstructured data. • It is estimated that unstructured data makes up 80% of the data within any given enterprise. • Unstructured data has a faster growth rate than structured data. • This form of data is either textual or binary and often conveyed via files that are self- contained and non-relational. 74
  • 75. Semi-structured Data • Has a defined level of structure and consistency that is not relational in nature but is hierarchical or graph-based. • This kind of data is commonly stored in files that contain text. • Because it conforms to some level of structure, it is more easily processed than unstructured data. • Often has special pre-processing and storage requirements, especially if the underlying format is not text-based.
  • 76. Unstructured Data • Machine Generated • Satellite images • Scientific data • Photographs and video • Radar or sonar data • Human Generated • Word, PDF, Text • Social media data (Facebook, Twitter, LinkedIn) • Mobile data (text messages) • Website content (blogs, Instagram)
  • 77. Metadata • Provides information about a dataset’s characteristics and structure. • Mostly machine-generated and can be appended to data. • The tracking of metadata is crucial to Big Data processing, storage and analysis – it provides information about the pedigree of the data and its provenance during processing. 77
  • 79. BIG DATA DRIVER: MARKET DYNAMICS 79
  • 80. Overview • Businesses entrenched and worked to improve their efficiency and effectiveness to stabilize their profitability by reducing costs. • Companies began to focus outward, looking to find new customers and keep existing customers from defecting. • They offer new products and services and deliver increased value propositions to customers.
  • 81. External Data • Companies need to expand their Business Intelligence activities beyond retrospection on extracted internal information. • They open themselves to external data sources as a means of sensing the marketplace and their position within it. • External data can bring additional context to their internal data. • This allows a corporation to move up the analytic value chain from hindsight to insight and foresight.
  • 82. DIKW Pyramid • Shows how data can be: • enriched with context to create information • information can be supplied with meaning to create knowledge • knowledge can be integrated to form wisdom. 82
  • 83. BIG DATA DRIVER: BUSINESS ARCHITECTURE (BA) 83
  • 84. Overview • BA provides a means of blueprinting or concretely expressing the design of the business. • It helps an organization align its strategic vision with its underlying execution. • It includes linkages from abstract concepts to more concrete ones. • Linkages provide guidance as to how to align the business and its information technology. 84
  • 85. Business as Layered System • Top layer: strategic layer occupied by C- level executives and advisory groups • Middle layer: tactical or managerial layer that seeks to steer the organization in alignment with the strategy • Bottom layer: operations layer where a business executes its core processes and delivers value to its customers. 85
  • 86. Business as Layered System • Each layer’s goals and objectives are influenced by and often defined by the layer above. • Communication flows bottom-up via the collection of metrics. • Activity monitoring at the operations layer generates Performance Indicators (PIs) and metrics. 86
  • 87. Business as Layered System • They get aggregated to create Key Performance Indicators (KPIs) used at the tactical layer. • KPIs can be aligned with Critical Success Factors (CSFs) at the strategic layer. 87
  • 88. Big Data & Business Layers • Big Data has ties to business architecture at each of the organizational layers. • It helps convert data into information (what) and provides meaning to generate knowledge (how) from information. • The information can be examined to answer questions regarding how the business is performing. • With such knowledge, the strategic layer can provide insight (why) into which strategy needs to be adopted in order to enhance performance.
  • 89. DIKW Pyramid & Business Layers Modified DIKW pyramid that aligns with Strategic, Tactical and Operational corporate levels 89
  • 90. Feedback Loop • The strategic layer drives response via the application of judgment by making decisions that are communicated as constraints to the tactical layer. • The tactical layer leverages this knowledge to generate priorities and actions that conform to corporate direction. • These actions adjust the execution of business at the operational layer.
  • 91. Feedback Loop • The change in the experience of internal stakeholders and external customers as they deliver and consume business services should be measurable. • This change (result) should surface and be visible in the data in the form of changed PIs that are then aggregated into KPIs. • Over time, the strategic and management layers’ injection of judgment and action into the loop will serve to refine the delivery of business services.
  • 92. The “Anatomy of Knowledge” An organization can relate and align its organizational layers by creating a virtuous cycle via a feedback loop. 92
  • 93. BIG DATA DRIVER: INFORMATION AND COMMUNICATIONS TECHNOLOGY 93
  • 94. Data Analytics & Data Science • Find new insights that can drive more efficient and effective operations and give management the ability to steer the business proactively. • Allow the C-suite to better formulate and assess their strategic initiatives. • Look for new ways to gain a competitive edge.
  • 95. Digitization • The use of digital artifacts saves both time and cost. • As consumers connect to a business through their interaction with these digital substitutes, it leads to an opportunity to collect further “secondary” data.
  • 96. Digitization • Collecting secondary data can be important for businesses for: • customized marketing • automated recommendations • development of optimized product features. 96
  • 97. Affordable Technology • Technology capable of storing and processing large quantities of diverse data has become increasingly affordable. • Big Data solutions often leverage open-source software that executes on commodity hardware. 97
  • 98. Social Media • Has empowered customers to provide feedback in near-real-time via open and public mediums. • Businesses are storing increasing amounts of data from social media sites. • This information feeds Big Data analysis algorithms that help: • provide better levels of service • increase sales • enable targeted marketing • create new products and services.
  • 99. Hyper-connected communities and devices • The internet and the proliferation of cellular and Wi-Fi networks have enabled more people and their devices to be continuously connected. • The proliferation of internet-connected sensors, such as the Internet of Things (IoT), increases the number of available data streams.
  • 100. Cloud Computing • Allows the creation of environments that are capable of providing highly scalable, on-demand IT resources.
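
Slide 50 mentions drill-down and roll-up analysis without showing what the two operations do. The following is a minimal Python sketch, not taken from the slides: the `SALES` records, their field names, and the helper names `roll_up` and `drill_down` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sales records; field names and figures are illustrative only.
SALES = [
    {"region": "East", "quarter": "Q1", "amount": 120},
    {"region": "East", "quarter": "Q2", "amount": 90},
    {"region": "West", "quarter": "Q1", "amount": 100},
    {"region": "West", "quarter": "Q2", "amount": 80},
]


def roll_up(records, key):
    """Aggregate amounts to a coarser level by summing over one dimension."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


def drill_down(records, **filters):
    """Return the detailed records behind an aggregate, e.g. a single quarter."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]


# Roll up to the quarter level (the descriptive "what" view).
by_quarter = roll_up(SALES, "quarter")

# Drill down into Q2 and regroup by region (a starting point for "why").
q2_detail = drill_down(SALES, quarter="Q2")
q2_by_region = roll_up(q2_detail, "region")
```

Rolling up answers a descriptive question such as total sales per quarter; drilling down exposes the finer-grained records that a diagnostic question ("why were Q2 sales lower?") would inspect next.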

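
Slides 86 and 87 describe Performance Indicators (PIs) from the operations layer being aggregated into KPIs for the tactical layer. A minimal sketch of that aggregation, assuming a plain mean as the aggregation rule and an assumed acceptable range for the KPI; the team names, measurements, and range are hypothetical and do not come from the slides.

```python
from statistics import mean

# Hypothetical PIs collected at the operations layer:
# call-handling time in minutes, per team.
PIS = {"team_a": [4.0, 6.0], "team_b": [5.0, 9.0]}

# Hypothetical acceptable range for the KPI (an assumption, not from the slides).
KPI_RANGE = (0.0, 5.5)


def aggregate_kpi(pis):
    """Aggregate every PI measurement into one KPI (here, a plain mean)."""
    return mean(value for values in pis.values() for value in values)


def within_range(kpi, acceptable):
    """Check the KPI against the lower and upper bounds of its acceptable range."""
    low, high = acceptable
    return low <= kpi <= high


kpi = aggregate_kpi(PIS)
status = "ok" if within_range(kpi, KPI_RANGE) else "needs attention"
```

A real organization would likely weight or normalize PIs before aggregating them; the mean is used here only to keep the bottom-up flow of metrics visible.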
Editor's Notes

  1. How to learn: emphasis on programming with Java. Apology for the document: it is not quite complete. Some parts are irrelevant, some were added only because they are interesting, some are missing, and some are not part of this document. Students must follow the lecture for undocumented details.
  2. Where do data come from? => People Data creating devices - computer - mobile - other kinds of devices
  3. Other devices - Smart TV - Games consoles - Smart Home / Smart appliance - smart personal gadget
  4. We spend 3h 39m on smart phone
  5. Search Text Video email
  6. Google and Bing
  7. Social media is one of the biggest sources of data 98.8% accessing from mobile
  8. East + South + SEA > 50% of users
  9. 25 – 34 new work force - see advertisement of - first car - first condo - new investor training - cosmetics
  10. 2H 16m on Social Media ¼ of internet users use SM for work. -> social media incorporate real business functionality Line app Line Payment
  11. English, Spanish India (Hindi) 3rd Middle east – Arabic Indonesian Top 5 > 80%
  12. 1 person has 1.3 mobile or 3 persons own 4 mobile. ¾ has access to internet ¾ actively use SM.
  13. Heating market is streaming New kinds of devices are coming - Smart Home device - Smart watch - VR
  14. Using the internet ~ 9 hrs => what do they do? On social ~ 3 hrs, on TV ~ 3.5 hrs. Advertising goes to mobile => cheaper / targeted
  15. Internet users grow by about 2%
  16. 97% of internet users access from mobile. Internet use on mobile ~ 5 hrs
  17. SM users growth = 4.7% 99% access from mobile
  18. Interesting non-US products are LINE and TikTok. In 2021 TikTok reaches 54%
  19. Sensors (traffic cams, satellites), IoT, smart appliances. Quintillion => American (10**18) / British (10**30)
  20. Unstructured data is mostly generated by humans, both social and business. Businesses need insight. Big Data solutions are needed
  21. Need of new solution to handle massive data
  22. Handling Large volume of data (Zetta Bytes & Yotta Bytes) which are structured or unstructured.
  23. Data Island – each machine keeps its data. Data warehouse – centralized - BI on top – dashboard / reporting - IT & DBAs control and analyze. Analytics Sandbox - suite of tools and solutions - replication (cheap storage) - business analysts control and analyze data
  24. Gather / collect => group together Each group share attribute / property or feature
  25. Log file Facebook json data
  26. E.g. poll data - min / max / average of age, salary, education level, etc.
  27. the reality that the generation of high value analytic results increases the complexity and cost of the analytic environment.
  28. Answer to What Example questions: What was the sales volume over the past 12 months? What is the number of support calls received as categorized by severity and geographic location? What is the monthly commission earned by each sales agent? Queries are executed on operational data stores from within an enterprise, for example a Customer Relationship Management system (CRM) or Enterprise Resource Planning (ERP) system
  29. Answer to Why Such questions include: • Why were Q2 sales less than Q1 sales? • Why have there been more support calls originating from the Eastern region than from the Western region? • Why was there an increase in patient re-admission rates over the past three months? A feature of Roll-Up Properties, which aggregate data from multiple source properties into a single property. Roll-Up Reporting is a special kind of reporting that lets you analyze the aggregated data that's in a Roll-Up Property. A drill down report is a report which allows users to navigate to a different layer of data granularity by navigating and clicking a specific data element on a web page or in an application. Drill down allows users to explore multidimensional data by navigating from one level down to a more detailed level.
  30. Questions are usually formulated using a what-if rationale, such as the following: • What are the chances that a customer will default on a loan if they have missed a monthly payment? • What will be the patient survival rate if Drug B is administered instead of Drug A? • If a customer has purchased Products A and B, what are the chances that they will also purchase Product C?
  31. Sample questions may include: • Among three drugs, which one provides the best results? • When is the best time to trade a particular stock?
  32. Enables an organization to gain insight into the performance of an enterprise by analyzing data generated by its business processes and information systems. The results of the analysis can be used by management to steer the business in an effort to correct detected issues or otherwise enhance organizational performance. >>>> Insight about performance from analyzed data >>>> analyzed data are collected from business processes and IT systems >>>> results of the analysis help management make decisions to solve root causes or problems.
  33. System of S/W & H/W >>>> BI need data from across enterprise >>>> gathers at centralized data warehouse and run analytics queries.
  34. >>>> graphical output on dashboard >>>> allows managers to interpret easily >>>> later, can refine the query to gain further insight or answers
  35. KPIs are often displayed via a KPI dashboard. The dashboard consolidates the display of multiple KPIs and compares the actual measurements with threshold values that define the acceptable value range of the KPI. >>> metric or gauge of business performance >>>> link with goals and objective of enterprise >>>> to identify problem >>>> show how much compliance to regulation
  36. >>>> reference point to measure the overall performance
  37. Right time at the right place
  38. Due to the abundance of tools and databases that natively support structured data, it rarely requires special consideration in regards to processing or storage. Examples of this type of data include banking transactions, invoices, and customer records.
  39. A text file may contain the contents of various tweets or blog postings. Binary files are often media files that contain image, audio or video data. Technically, both text and binary files have a structure defined by the file format itself, but this aspect is disregarded, and the notion of being unstructured is in relation to the format of the data contained in the file itself.
  40. Due to the textual nature of this data and its conformance to some level of structure, it is more easily processed than unstructured data. Examples of common sources of semi-structured data include electronic data interchange (EDI) files, spreadsheets, RSS feeds and sensor data. An example of pre-processing of semi-structured data would be the validation of an XML file to ensure that it conformed to its schema definition.
  41. Examples of metadata include: • XML tags providing the author and creation date of a document • attributes providing the file size and resolution of a digital photograph. Big Data solutions rely on metadata, particularly when processing semi-structured and unstructured data. Data Classification is the classification of data based on its level of sensitivity and the impact to the Organizational Entity or Personal Entity should that data be subject to Disclosure-Alteration-Destruction (DAD) without authorization. Data Provenance is information relevant to evaluating the source or author of the data. Provenance: source, origin, history of ownership. Data lineage includes the data's origins, what happens to it and where it moves over time. Data lineage gives visibility while greatly simplifying the ability to trace errors back to the root cause in a data analytics process. Data Pedigree refers to the data's relationship to an authoritative Entity. Data Pedigree is an attribute of Data Provenance and could be provided as metadata. Data Pedigree should be considered during Data Classification.
  42. Part of Big Data Adoption factors
  43. >>> businesses constantly improve performance to sustain profit and reduce costs. >>> high competition, need new customers and keep existing ones. >>> offer new products and services >>> or increased added value for original products and services
  44. >>> expand BI beyond hindsight >>>> understanding the ever-changing market needs external data (market share / penetration / demography) >>> add more context (from external data) to internal data >>> gain more insight and foresight. Y-on-Y sales drop 10%, but the global market shrinks by 20% => we may still be performing better?
  45. Data -> Information, e.g. PM2.5 + location. Information -> Knowledge: http://air4thai.pcd.go.th/webV2/aqi_info.php 0 – 25 very good, 26 – 50 good, 51 – 100 moderate, 101 – 200 starts to affect health, 201 and above affects health. Knowledge -> Wisdom: PM2.5 > 50 = wear a mask
  46. Building a house -> design on blueprint first Same as business align its strategic vision with its underlying execution whether they be technical resources or human capital. abstract concepts => business mission, vision, strategy and goals concrete ones => business services, organizational structure, key performance indicators and application services.
  47. >>> Upper layer define lower layer’s goals and objectives >>> Lower layer send metric (collected data) upward. >>> Activity monitoring at the operations layer generates Performance Indicators (PIs) and metrics, for both services and processes.
  48. These KPIs can be aligned with Critical Success Factors (CSFs) at the strategic layer, which in turn help measure progress being made toward the achievement of strategic goals and objectives. CSFs are the cause of your success, whereas KPIs are the effects of your actions. We’re asking “what must we do to be successful?” (CSFs) and “what indicates that we’re winning?” (KPIs)
  49. With such knowledge, the strategic layer can provide further insight to help answer questions of which strategy needs to change or be adopted in order to correct or enhance the performance.
  50. >>The strategic layer drives response via the application of judgment by >>>>making decisions regarding corporate strategy, policy, goals and objectives >>>>that are communicated as constraints to the tactical layer. >>The tactical layer in turn leverages this knowledge >>>>to generate priorities and actions that conform to corporate direction. >>>>These actions adjust the execution of business at the operational layer.
  51. Recall that KPIs are metrics that can be associated with critical success factors that inform the executive team as to whether or not their strategies are working. >>>> measure change in experience of internal stakeholders and external customers >>>> result (change) should be visible in the collected data as PIs which, in turn, get aggregated into KPIs. >>>> judgement and action would lead to refining of business services
  52. A diagram produced by Joe Gollner in his blog post “The Anatomy of Knowledge”
  53. >>> Add on-line channel >>> Collect interaction between user and digitized data as ”secondary data”
  54. Netflix, Amazon recommendations
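
Note 45 walks the DIKW pyramid through a PM2.5 example. The sketch below is one possible way to express those steps in Python. The band boundaries and the "PM2.5 > 50" mask rule are taken from that note; the function names, the reading value, the location, and the "no mask needed" fallback are assumptions added for illustration.

```python
# Bands as listed in note 45: (upper bound, label).
# Anything above 200 falls into the final "affects health" band.
AQI_BANDS = [
    (25, "very good"),
    (50, "good"),
    (100, "moderate"),
    (200, "starts to affect health"),
]


def to_information(value, location):
    """Data -> information: attach context (what was measured, and where)."""
    return {"pm25": value, "location": location}


def to_knowledge(info):
    """Information -> knowledge: give the reading a meaning via the bands."""
    for upper, label in AQI_BANDS:
        if info["pm25"] <= upper:
            return label
    return "affects health"


def to_wisdom(info):
    """Knowledge -> wisdom: decide what to do (rule from note 45).

    The "no mask needed" fallback is an assumption; the note only states
    the rule for readings above 50.
    """
    return "wear a mask" if info["pm25"] > 50 else "no mask needed"


# Hypothetical reading; the value and location are made up.
reading = to_information(72, "Bangkok")
category = to_knowledge(reading)
advice = to_wisdom(reading)
```

Each function adds one layer: a bare number becomes information once it carries context, knowledge once it is interpreted against the bands, and wisdom once it leads to an action.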