7. Machine vs. human generated data
CRISP-DM: CRoss-Industry Standard Process for Data Mining
Plan: Understand the business
Acquire: Data Understanding
Transform: Data Preparation
Model: Build & Testing
Visualise: Communicate and evaluate
Score: Productionize the model
• What are your business goals?
• What outcome are you trying to change?
• Are the answers you are looking for descriptive, predictive, or prescriptive?
• Are there constraints on the use of your data?
• What is the meaning and relevance of your data?
• What sampling methods were used?
• Cleanse the data
• Analyse/reduce variables
• Plot the data
• Discover first insights into the data
• Select and build a model
• Train the model
• Validate the model
• Does the model teach us anything?
• Communicate and visualise the results
• What did we learn?
• Do the results make sense?
• Can we deploy the model?
• Publish/deploy the model
• Implement real-time data transformation for real-time scoring
• Schedule data transformation for batch scoring
• Make informed decisions
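The cleanse, train, and validate steps listed above can be sketched in a few lines. This is only an illustration, not something taken from the slides: the data are synthetic, the helper names (`cleanse`, `fit_line`, `mean_abs_error`) are hypothetical, and a least-squares line stands in for whatever model a real project would select.

```python
from statistics import mean

def cleanse(rows):
    """Drop rows with a missing x or y value (the 'Cleanse the data' step)."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def fit_line(rows):
    """Train: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_abs_error(model, rows):
    """Validate: mean absolute error of the fitted line on the given rows."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)

# Synthetic data (assumption for illustration): y = 2x + 1, plus two incomplete rows.
raw = [(x, 2 * x + 1) for x in range(10)] + [(None, 3), (4, None)]
clean = cleanse(raw)

# Hold out the last three clean rows for validation.
train, holdout = clean[:7], clean[7:]
model = fit_line(train)
validation_error = mean_abs_error(model, holdout)
```

Keeping the hold-out rows away from training is what lets the "Validate the model" step say something about unseen data rather than about the fit itself.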
9. How do you collect and process this analog information, to
transform into useful business insights?
Information Insight
Internet of Things
10. How do you collect and process this analog information, to
transform into useful business insights?
[Figure: two charts of daily shipments in tons throughout the year (JAN, APR, JUL, OCT, DEC).
Traditional network capacity planning, without prediction: “Winter schedule”, “Summer schedule”, “Winter schedule”.
Predictive network capacity planning, with prediction models: “Transmetrics schedule”.]
Analog Data Digital Data
11. What has been holding people back?
Cities, Transportation, Industrial
14.
Collect data, Clean data, Identify patterns, Make prediction
Data Understanding
hindsight, insight, foresight
15. The type of the information you have for a device:
Collect data, Clean data, Identify patterns, Make prediction
Devices, Gateway, Connectivity, Event pipeline, Analytics Applications
16. The type of the information you have for a device:
Assets/Beacons, Access Points, Wi-Fi Router
Inventory Interface
19. Even when they aren’t lying, sensors don’t always tell the whole truth
This might be a problem… or a loose device connection.
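One rough way to start telling the two cases apart is to look at how long an out-of-range reading persists. The sketch below is an assumption added for illustration, not a method from the slides: the function names, the valid range, and the `min_sustained` cut-off are all hypothetical, and duration alone cannot prove that a short dropout is a loose connection.

```python
def out_of_range_runs(readings, low, high):
    """Return (start_index, length) for each run of consecutive out-of-range samples."""
    runs, start = [], None
    for i, value in enumerate(readings):
        bad = not (low <= value <= high)
        if bad and start is None:
            start = i                      # a new run begins
        elif not bad and start is not None:
            runs.append((start, i - start))
            start = None                   # the run has ended
    if start is not None:                  # run still open at the end of the series
        runs.append((start, len(readings) - start))
    return runs

def triage(readings, low, high, min_sustained=3):
    """Label each run 'transient' (short) or 'sustained' (at least min_sustained samples)."""
    labels = []
    for start, length in out_of_range_runs(readings, low, high):
        kind = "sustained" if length >= min_sustained else "transient"
        labels.append((start, length, kind))
    return labels

# Hypothetical temperature trace: one single-sample dropout, then a lasting rise.
temps = [21.0, 21.2, 0.0, 21.1, 21.3, 35.0, 36.2, 37.5, 38.1]
result = triage(temps, low=15.0, high=30.0)
```

A transient run is a candidate for a connection glitch and a sustained run a candidate for a real problem; either label would still need confirming against other data.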
20. Even when they aren’t lying, sensors don’t always tell the whole truth
What the sensor reads… versus what the control unit stores and forwards.
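One common reason the stored series can differ from the sensed one is a deadband (report-by-exception) rule in the control unit. The slides do not say which mechanism is at work here, so the following is only a sketch under that assumption; `store_and_forward` and the `deadband` value are hypothetical.

```python
def store_and_forward(readings, deadband):
    """Keep a reading only when it differs from the last stored value by more than `deadband`."""
    stored = []
    last = None
    for t, value in enumerate(readings):
        if last is None or abs(value - last) > deadband:
            stored.append((t, value))  # (sample index, value) actually forwarded
            last = value
    return stored

# Hypothetical raw sensor trace, one sample per time step.
sensor = [20.0, 20.1, 20.3, 20.4, 21.2, 21.3, 19.9, 20.0]
forwarded = store_and_forward(sensor, deadband=0.5)
```

Of the eight readings, only three are forwarded here, so any analysis on the stored data sees a step-like series rather than the gradual drift the sensor actually measured.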
21. Extracting useful signal from time-series sensor data requires ‘multi-genre’
Predictive Analytics – and additional data
Raw IoT data
• Process: Capture full-fidelity data to enable use-case-specific event detection
• Data: Raw sensor data
• Analytics: -
Cleansed IoT data
• Process: Interpolation of missing values, corrections, recalibration, etc.
• Data: Raw sensor data from adjacent sensors; master data
• Analytics: Interpolation; neural networks; smoothing
Event Detection
• Process: Identification of state change; matching
• Data: Alert data; historical data; environmental data
• Analytics: Time-series; pattern recognition; event mapping
Path to association
• Process: Comparison and correlation with other systems (CRM, Marketing, etc.)
• Data: Whole device historical data
• Analytics: Graph; clustering; predictions; decision trees
Labelled IoT data
• Process: Comparison and correlation with human observations
• Data: Maintenance and operational data
• Analytics: Text and network analytics
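The cleansing and event-detection stages (interpolation of missing values, smoothing, identification of state change) can be sketched as follows. This is a minimal illustration with made-up readings; the function names, the trailing-window smoother, and the fixed threshold are assumptions, not the techniques used in the original deck.

```python
def interpolate_missing(values):
    """Linearly interpolate None gaps between known neighbours.

    Gaps at either edge copy the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled

def moving_average(values, window=3):
    """Trailing moving average; early points average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def state_changes(values, threshold):
    """Indices where the series crosses `threshold` in either direction."""
    return [i for i in range(1, len(values))
            if (values[i - 1] >= threshold) != (values[i] >= threshold)]

# Hypothetical raw IoT readings with missing samples marked as None.
raw = [1.0, None, 3.0, 3.0, None, None, None, 7.0, 7.0]
clean = interpolate_missing(raw)             # "Cleansed IoT data"
smooth = moving_average(clean, window=2)     # smoothing
events = state_changes(clean, threshold=5.0) # "Event Detection"
```

Interpolating before detecting events matters: run on the raw series, the threshold test would fail on the `None` gaps or miss the crossing entirely.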