000 introduction to big data analytics 2021

Introduction to
Big Data Analytics
Instructor
Dendej Sawarnkatat
dendej@gmail.com

Agenda
• What is Big Data?
• Concepts and Terminology
• Big Data Characteristics
• Different Types of Data
• Case Study Background
• Marketplace Dynamics
• Business Architecture
• Information and Communications
Technology

Thailand Mobile Internet
Usage

Thailand Social Media
Overview

Thailand Social Media
Platforms

M2M Data
• Data generated by different sources
around us like automated systems, sensors
and mobile devices.
• 2.5 quintillion bytes of data are created
every day.
• 80-90% of the data in the world today has
been created in the last two years alone.
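
To put the daily figure into more familiar units, here is a small conversion sketch. It assumes decimal (SI) prefixes; the names `BYTES_PER_DAY`, `UNITS` and `convert` are illustrative only.

```python
# Convert 2.5 quintillion bytes per day into larger decimal (SI) units.
BYTES_PER_DAY = 2.5e18  # 2.5 quintillion bytes, the figure quoted on the slide

# Assumption: decimal prefixes (1 TB = 10**12 bytes), not binary ones.
UNITS = {"TB": 1e12, "PB": 1e15, "EB": 1e18}


def convert(n_bytes, unit):
    """Return n_bytes expressed in the given decimal unit."""
    return n_bytes / UNITS[unit]


for unit in UNITS:
    print(f"{convert(BYTES_PER_DAY, unit):,.1f} {unit} per day")
```

Under these assumptions the quoted figure corresponds to 2.5 exabytes per day.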

Flood of Data
• More than 4.5 billion internet users in the
world today.
• The New York Stock Exchange generates
about 4-5 TB of data per day.
• 7 TB of data are processed by Twitter every
day.
• 10 TB of data are processed by Facebook
every day, growing at 7 PB per month.

Storage is Growing FAST!!!!

Flood of Data (cont’d)
• Interestingly 80% of these data are
unstructured.
• With this massive quantity of data,
businesses need fast, reliable, deeper data
insight.
• Therefore, Big Data solutions based on
Hadoop and other analytics software are
becoming more and more relevant.

Handling Humongous Data
• Traditional approaches are not fit for data
analysis because of data inflation.
• Large volumes of data, whether structured
or unstructured, must be handled.
• Datasets grow so large that they are
difficult to capture, store, manage, share,
analyze and visualize with typical
database software tools.

Big Data Analytic Applications
• Analyze markets and derive new strategies
to improve business in different geographic
locations.
• Measure the response to campaigns,
promotions, and other advertising
media.
• Use patients’ medical histories so that hospitals
can provide better and quicker service.

Big Data Analytic Applications (cont’d)
• Perform risk analysis.
• Create new revenue streams.
• Reduce maintenance costs.
• Make faster, better decisions.
• Develop new products & services.
• Etc.

Data Science as Tool
• Involves using methods to analyze massive
amounts of data and extract the
knowledge it contains.
• Data science and big data evolved from
statistics and traditional data management
but are now considered to be distinct
disciplines.

Data Science Processes
1. Setting the research goal
2. Retrieving data
3. Cleansing, integrating, and transforming
data
4. Exploratory data analysis
5. Building model(s)
6. Presenting findings (insights)
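
The six steps above can be sketched end to end on a toy problem. This is only an illustration: the data in `RAW_CSV`, the research goal and all function names are hypothetical, and the "model" is deliberately trivial (a single ratio).

```python
import csv
import io
import statistics

# Hypothetical raw data: daily site visitors and sales, with one incomplete row.
RAW_CSV = """day,visitors,sales
1,100,20
2,150,31
3,,25
4,200,39
5,250,52
"""

# 1. Setting the research goal
GOAL = "Estimate how many sales to expect per visitor"


def retrieve(text):
    """2. Retrieve data: parse the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """3. Cleanse and transform: drop incomplete rows, convert strings to ints."""
    return [
        {key: int(value) for key, value in row.items()}
        for row in rows
        if all(value != "" for value in row.values())
    ]


def explore(rows):
    """4. Exploratory analysis: simple summary statistics."""
    return {
        "mean_visitors": statistics.mean(r["visitors"] for r in rows),
        "mean_sales": statistics.mean(r["sales"] for r in rows),
    }


def build_model(rows):
    """5. Build a (deliberately trivial) model: overall sales per visitor."""
    return sum(r["sales"] for r in rows) / sum(r["visitors"] for r in rows)


clean_rows = cleanse(retrieve(RAW_CSV))
summary = explore(clean_rows)
rate = build_model(clean_rows)

# 6. Present the findings
print(GOAL)
print(f"Rows kept: {len(clean_rows)}, summary: {summary}")
print(f"Estimated sales per visitor: {rate:.3f}")
```

In practice each step is far larger and the process is iterative, but the order of the stages stays the same.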

BIG DATA: CONCEPTS AND
TERMINOLOGY

Datasets
• Collections or groups of related data are
generally referred to as datasets.
• Each group or dataset member (datum)
shares the same set of attributes or
properties as others in the same dataset.

Example of Datasets
• Tweets stored in a flat file
• A collection of image files in a directory
• An extract of rows from a database table
stored in a CSV formatted file
• Historical weather observations that are
stored as XML files

Data Analysis
• Process of examining data to
find facts, relationships,
patterns, insights and/or
trends.
• Goal: to support better
decision making
• Helps establish patterns and
relationships among the
data being analyzed.

Data Analytics
• Discipline that includes the management
of the complete data lifecycle, which
encompasses collecting, cleansing,
organizing, storing, analyzing and
governing data.
• Involves the development of analysis
methods, scientific techniques and
automated tools.

Data Analytics (cont’d)
• Methods have been developed that allow data
analysis to occur through the use of highly
scalable distributed technologies and
frameworks that are capable of analyzing
large volumes of data from different
sources.
• Enables data-driven decision-making with
scientific backing so that decisions can be
based on factual data and not simply on
past experience or intuition alone.

Data Analytics Categories
There are four general categories of analytics
that are distinguished by the results they
produce:
1. Descriptive Analytics
2. Diagnostic Analytics
3. Predictive Analytics
4. Prescriptive Analytics

Descriptive Analytics
• Carried out to answer questions about events
that have already occurred.
• Contextualizes data to generate information.
• Often carried out via ad-hoc reporting or
dashboards.
• The reports are generally
static in nature and
display historical data that
is presented in the form
of data grids or charts.
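
A minimal sketch of a descriptive report: historical records are summarized into a static grid that answers "what happened?". The `SALES` records and the `monthly_totals` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical historical sales records: (month, region, amount).
SALES = [
    ("2021-01", "North", 120.0),
    ("2021-01", "South", 80.0),
    ("2021-02", "North", 150.0),
    ("2021-02", "South", 95.0),
    ("2021-03", "North", 130.0),
]


def monthly_totals(records):
    """Sum the sales amount per month."""
    totals = defaultdict(float)
    for month, _region, amount in records:
        totals[month] += amount
    return dict(totals)


report = monthly_totals(SALES)

# Present the result as a simple static data grid.
print("Month     Total")
for month in sorted(report):
    print(f"{month}  {report[month]:>7.2f}")
```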

Diagnostic Analytics
• Determines the cause of a phenomenon that
occurred in the past using questions that focus
on the reason behind the event.
• Requires collecting data from multiple sources
and storing it in a structure that lends itself
to performing drill-down and roll-up analysis.
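
Roll-up and drill-down can be illustrated with a tiny in-memory example. The `RETURNS` facts (product returns per city) and both helper functions are hypothetical; real diagnostic analysis would typically run against a data warehouse or OLAP store.

```python
from collections import defaultdict

# Hypothetical facts combined from several sources: (region, city, returns).
RETURNS = [
    ("North", "Chiang Mai", 4),
    ("North", "Chiang Rai", 3),
    ("Central", "Bangkok", 25),
    ("Central", "Nonthaburi", 5),
]


def roll_up(facts):
    """Aggregate city-level rows up to region level."""
    totals = defaultdict(int)
    for region, _city, count in facts:
        totals[region] += count
    return dict(totals)


def drill_down(facts, region):
    """Break one region back down into its cities."""
    return {city: count for r, city, count in facts if r == region}


by_region = roll_up(RETURNS)
worst = max(by_region, key=by_region.get)

print("Roll-up by region:", by_region)
print(f"Drill-down into {worst}:", drill_down(RETURNS, worst))
```

The roll-up shows where the problem is concentrated; the drill-down narrows the search for its cause.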

Predictive Analytics
• Carried out in an attempt to determine the
outcome of an event that might occur in the
future.
• The models used for predictive analytics have
implicit dependencies on the conditions under
which the past events occurred.
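
As a rough sketch of the idea, the following fits a straight line to past observations by ordinary least squares and extrapolates one step ahead. The data and the `fit_line` helper are hypothetical, and a straight line is only one of many possible models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past observations: month number and sales in that month.
MONTHS = [1, 2, 3, 4, 5]
SALES = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(MONTHS, SALES)
forecast = slope * 6 + intercept  # predicted sales for month 6

print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

The forecast is only as good as the assumption that month 6 resembles months 1 to 5, which is the implicit dependency described above.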

Prescriptive Analytics
• Builds upon the results
of predictive analytics
by prescribing actions
that should be taken.
• The focus is not only on
which prescribed
option is best to follow,
but why.
• Can be used to gain an
advantage or mitigate a
risk.
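
One simple way to picture this step is to score each candidate action by its expected value, using predictions as input. This is a sketch under invented numbers: the `OPTIONS` table, its probabilities and the scoring rule are assumptions for illustration, not a prescribed method.

```python
# Hypothetical predictions per action:
# (probability of success, payoff if successful, cost of the action).
OPTIONS = {
    "discount_campaign": (0.6, 1000.0, 200.0),
    "loyalty_program": (0.4, 2000.0, 300.0),
    "do_nothing": (1.0, 0.0, 0.0),
}


def expected_value(probability, payoff, cost):
    """Expected net gain of an action."""
    return probability * payoff - cost


def prescribe(options):
    """Return the best action together with the scores that justify it."""
    scores = {name: expected_value(*params) for name, params in options.items()}
    best = max(scores, key=scores.get)
    return best, scores


best_action, scores = prescribe(OPTIONS)

print("Recommended action:", best_action)
# The 'why': show the score of every option, not just the winner.
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"  {name}: expected value {score:.0f}")
```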

Business Intelligence (BI)
• Enables an organization to gain insight into
the performance of an enterprise using
analyzed data.
• The analyzed data is generated by its business
processes and information systems.
• The results of the analysis can be used by
management to steer the business in an
effort to correct detected issues or otherwise
enhance organizational performance.

• BI applies analytics to large amounts of
data across the enterprise, which has
typically been consolidated into an
enterprise data warehouse to run
analytical queries.

• The output of BI can be surfaced to a
dashboard.
• The dashboard allows managers to access
and analyze the results
• and potentially to refine the analytic
queries to further explore the data.

Key Performance Indicators
(KPIs)
• A metric that can be used to gauge
success within a particular business
context.
• Linked with an enterprise’s overall
strategic goals and objectives.
• Often used to identify business
performance problems and demonstrate
regulatory compliance.

Key Performance Indicators
(KPIs) (cont’d)
• Act as quantifiable reference points for
measuring a specific aspect of a business’s
overall performance.

Big Data Definition
• For some, it is a buzzword that tries
to address the “new” need to
process a lot of data.
• It is usually defined by the “Three Vs”:
volume, velocity and variety.

Volume
• The anticipated volume of data that is
processed by Big Data solutions is substantial
and ever-growing.
• High data volumes
impose distinct data
storage and processing
demands, as well as
additional data
preparation, curation
and management
processes.

Velocity
• In Big Data environments, data
can arrive at fast speeds.
• Enormous datasets can
accumulate within very short
periods of time.
• Coping with the fast inflow of
data requires the enterprise to
design highly elastic and
available data processing
solutions and corresponding
data storage capabilities.

Variety
• The multiple formats and types of data that
need to be supported by Big Data solutions.
• Data variety brings challenges for enterprises
in terms of data integration, transformation,
processing, and storage.

Veracity
• Veracity refers to the quality or fidelity of
data.
• Data that enters Big Data environments
needs to be assessed for quality, which can
lead to data processing activities to resolve
invalid data and remove noise.
• Noise is data that cannot be converted
into information and thus has no value,
whereas signals have value and lead to
meaningful information.
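
A minimal sketch of a quality check that separates signal from noise. It assumes a hypothetical stream of temperature readings and treats missing or physically implausible values as noise; `VALID_RANGE` and `is_signal` are illustrative, and real veracity checks are usually richer than a range test.

```python
# Hypothetical sensor readings in degrees Celsius.
READINGS = [21.5, 22.0, None, -999.0, 23.1, 150.0, 22.4]

# Assumption: anything outside this range cannot be a real temperature here.
VALID_RANGE = (-40.0, 60.0)


def is_signal(value, valid_range=VALID_RANGE):
    """Return True if the reading is present and within the plausible range."""
    low, high = valid_range
    return value is not None and low <= value <= high


signal = [r for r in READINGS if is_signal(r)]
noise_ratio = 1 - len(signal) / len(READINGS)

print("Signal:", signal)
print(f"Noise ratio: {noise_ratio:.0%}")
```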

The 5Vs
Big Data: Volume, Velocity, Variety, Veracity, Value

Value
• Value is defined as the usefulness of data for
an enterprise.
• Value is also dependent on how long data
processing takes.
• The longer it takes for data to be turned into
meaningful information, the less value it has
for a business.

What Kinds of Data?
• Structured
  • Relational databases
  • Library catalogues (date, author, place,
    subject, etc.)
• Semi-structured
  • CSV, XML, JSON, NoSQL databases
• Unstructured

Structured Data
• Conforms to a data model or schema and
is often stored in tabular form.
• Used to capture relationships between
different entities and is therefore most
often stored in a relational database.
• Frequently generated by enterprise
applications and information systems like
ERP and CRM systems.
• Rarely requires special consideration
with regard to processing or storage.

Unstructured Data
• Data that does not conform to a data model
or data schema is known as unstructured
data.
• It is estimated that unstructured data makes
up 80% of the data within any given
enterprise.
• Unstructured data has a faster growth rate
than structured data.
• This form of data is either textual or binary
and is often conveyed via files that are
self-contained and non-relational.

Semi-structured Data
• Has a defined level of structure and
consistency that is not relational in nature but
is hierarchical or graph-based.
• This kind of data is commonly stored in files
that contain text.
• Because it conforms to some level of structure,
it is more easily processed than unstructured
data.
• Often has special pre-processing and storage
requirements, especially if the underlying
format is not text-based.
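
The difference between structured and semi-structured data can be seen in a small sketch: a relational table with a fixed schema next to JSON records whose fields are nested and vary from record to record. The table, field names and values are invented for illustration.

```python
import json
import sqlite3

# Structured: a relational table; every row has the same typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Bangkok"), (2, "Bob", "Phuket")],
)
structured_rows = conn.execute(
    "SELECT id, name, city FROM customer ORDER BY id"
).fetchall()
conn.close()

# Semi-structured: hierarchical records whose fields may differ.
JSON_TEXT = """
[
  {"id": 1, "name": "Ann", "phones": ["000-0001", "000-0002"]},
  {"id": 2, "name": "Bob", "address": {"city": "Phuket"}}
]
"""
records = json.loads(JSON_TEXT)
field_sets = [sorted(record) for record in records]

print("Structured rows:", structured_rows)
print("Fields per JSON record:", field_sets)
```

The JSON records still carry enough structure to be parsed directly, but a consumer has to cope with optional and nested fields.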

Unstructured Data
• Machine-generated
  • Satellite images
  • Scientific data
  • Photographs and video
  • Radar or sonar data
• Human-generated
  • Word, PDF, text
  • Social media data (Facebook, Twitter, LinkedIn)
  • Mobile data (text messages)
  • Website content (blogs, Instagram)

Metadata
• Provides information about a dataset’s
characteristics and structure.
• Mostly machine-generated and can be
appended to data.
• The tracking of metadata is crucial to Big
Data processing, storage and analysis – it
provides information about the pedigree
of the data and its provenance during
processing.
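
A sketch of machine-generated metadata travelling with a dataset. The field names (`source`, `checksum`, `provenance`, ...) and the helpers `describe` and `record_step` are assumptions chosen for illustration, not a standard metadata schema.

```python
import hashlib
from datetime import datetime, timezone


def describe(rows, source):
    """Build a metadata record describing a list-of-dicts dataset."""
    payload = repr(rows).encode("utf-8")
    return {
        "source": source,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(rows),
        "fields": sorted(rows[0]) if rows else [],
        "checksum": hashlib.sha256(payload).hexdigest(),
        "provenance": [],  # processing steps applied so far
    }


def record_step(metadata, step):
    """Append one processing step to the provenance trail."""
    metadata["provenance"].append(step)
    return metadata


# Hypothetical dataset and processing history.
DATASET = [{"id": 1, "temp_c": 21.5}, {"id": 2, "temp_c": 22.0}]
meta = describe(DATASET, source="hypothetical-sensor-feed")
record_step(meta, "loaded from sensor feed")
record_step(meta, "removed out-of-range readings")

for key, value in meta.items():
    print(f"{key}: {value}")
```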

BIG DATA DRIVER: MARKET DYNAMICS

Overview
• Businesses entrenched and worked to
improve their efficiency and effectiveness
to stabilize their profitability by reducing
costs.
• Companies began to focus outward,
looking to find new customers and keep
existing customers from defecting.
• They offered new products and services and
delivered increased value propositions to
customers.

External Data
• Companies need to expand their Business
Intelligence activities beyond retrospection
on extracted internal information.
• They need to open themselves to external data
sources as a means of sensing the marketplace
and their position within it.
• External data can bring additional context
to their internal data.
• This allows a corporation to move up the analytic
value chain from hindsight to insight and
foresight.

DIKW Pyramid
• Shows how:
  • data can be enriched with context to create information
  • information can be supplied with meaning to create knowledge
  • knowledge can be integrated to form wisdom.

BIG DATA DRIVER:
BUSINESS ARCHITECTURE (BA)

Overview
• BA provides a means of blueprinting or
concretely expressing the design of the
business.
• It helps an organization align its strategic
vision with its underlying execution.
• It includes linkages from abstract concepts
to more concrete ones.
• Linkages provide guidance as to how to
align the business and its information
technology.

Business as Layered System
• Top layer: strategic layer occupied by
C-level executives and advisory groups
• Middle layer: tactical or managerial layer
that seeks to steer the organization in
alignment with the strategy
• Bottom layer: operations layer where a
business executes its core processes and
delivers value to its customers.

• Each layer’s goals and objectives are
influenced by and often defined by the
layer above.
• Communication flows bottom-up via the
collection of metrics.
• Activity monitoring at the operations layer
generates Performance Indicators (PIs) and
metrics.

• They get aggregated to create Key
Performance Indicators (KPIs) used at the
tactical layer.
• KPIs can be aligned with Critical Success
Factors (CSFs) at the strategic layer.
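
The aggregation path from PIs to a KPI and on to a CSF can be sketched with a few numbers. The branch figures, the on-time-delivery KPI and the 90% target in `CSF_TARGET` are all hypothetical.

```python
# Hypothetical operations-layer PIs: deliveries per branch for one week.
PIS = [
    {"branch": "A", "deliveries": 200, "on_time": 190},
    {"branch": "B", "deliveries": 100, "on_time": 80},
    {"branch": "C", "deliveries": 100, "on_time": 94},
]

# Assumed strategic-layer CSF: at least 90% of deliveries arrive on time.
CSF_TARGET = 0.90


def on_time_rate_kpi(pis):
    """Tactical-layer KPI: overall on-time rate across all branches."""
    total = sum(p["deliveries"] for p in pis)
    on_time = sum(p["on_time"] for p in pis)
    return on_time / total


kpi = on_time_rate_kpi(PIS)
meets_csf = kpi >= CSF_TARGET

print(f"On-time delivery KPI: {kpi:.1%}")
print("CSF met" if meets_csf else "CSF not met")
```

Note that the KPI can meet the target while an individual PI (branch B here) does not, which is why both levels are monitored.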

Big Data & Business Layers
• Big Data has ties to business architecture at each of
the organizational layers.
• It helps convert data into information (what) and
provides meaning to generate knowledge (how) from
information.
• The information can be examined to answer
questions regarding how the business is performing.
• With such knowledge, the strategic layer can
provide insight (why) into which strategy
should be adopted in order to enhance
performance.

DIKW Pyramid & Business
Layers
Modified DIKW pyramid that aligns
with Strategic, Tactical and
Operational corporate levels

Feedback Loop
• The strategic layer drives response via the
application of judgment by making
decisions that are communicated as
constraints to the tactical layer.
• The tactical layer leverages this knowledge
to generate priorities and actions that
conform to corporate direction.
• These actions adjust the execution of
business at the operational layer.

Feedback Loop (cont’d)
• The change in the experience of internal
stakeholders and external customers as they
deliver and consume business services should
be measurable.
• This change (result) should surface and be
visible in the data in the form of changed PIs
that are then aggregated into KPIs.
• Over time, the strategic and management
layers’ injection of judgment and action into
the loop will serve to refine the delivery of
business services.

The “Anatomy of Knowledge”
An organization can relate and align its organizational layers
by creating a virtuous cycle via a feedback loop.

BIG DATA DRIVER:
INFORMATION AND
COMMUNICATIONS TECHNOLOGY

Data Analytics & Data Science
• Help find new insights that can drive more
efficient and effective operations and give
management the ability to steer the
business proactively.
• Allow the C-suite to better formulate and
assess their strategic initiatives.
• Support the search for new ways to gain a
competitive edge.

Digitization
• The use of digital artifacts saves both time
and cost.
• As consumers connect to a business
through their interaction with these digital
substitutes, it leads to an opportunity to
collect further “secondary” data.

Digitization
• Collecting secondary data can be
important for businesses for:
• customized marketing
• automated recommendations
• development of optimized product
features.

Affordable Technology
• Technology capable of
storing and processing
large quantities of
diverse data has become
increasingly affordable.
• Big Data solutions often
leverage open-source
software that executes
on commodity hardware.

Social Media
• Has empowered customers to provide feedback
in near-real-time via open and public mediums.
• Businesses are storing increasing amounts of
data from social media sites.
• This information feeds Big Data analysis
algorithms that help to:
  • provide better levels of service
  • increase sales
  • enable targeted marketing
  • create new products and services.

Hyper-connected communities
and devices
• The internet and the proliferation of cellular
and Wi-Fi networks have enabled more people
and their devices to be continuously connected.
• The proliferation of internet-connected sensors,
as in the Internet of Things (IoT), increases the
number of available data streams.

Cloud Computing
• Allows the creation
of environments that
are capable of
providing highly
scalable, on-demand
IT resources.
