Innovation and Reinvention Driving Transformation
October 9, 2018
2018 HPCC Systems® Community Day
Dan S. Camper – Sr. Architect, HPCC Solutions Lab
Data Patterns: A Native Open Source Data Profiling Tool for HPCC Systems
What Is Data Profiling?
• A method of examining data to collect statistics and information about that data
• Determines the “shape” of the data
• Data types
• Lengths
• Cardinality
• Prominent discrete values
• Patterns
• Also known as “Data Discovery”
Data Patterns: A Data Profiling Tool for HPCC Systems 2
When Would You Profile Data?
• Explore a new dataset
• Determine the real data types
• Determine field population
• Spot garbage data
• Find highly-correlated fields
• Verify data updates
  • Ensure that structure has not changed
  • Check for expected cardinality
  • Check for expected fill rates
  • Check for unexpected garbage
DataPatterns.Profile()
• Written entirely in ECL
• It is a single FUNCTIONMACRO
• No library or module dependencies
• Performs all profiling checks by default
• Numerous parameters for controlling analysis and output
  • Analyze all rows in a dataset or just a sample
  • Analyze all fields or only certain fields
  • Enable only specified profiling checks
  • Specify returned pattern counts
• Creates a single dataset as a result
  • One record for each field analyzed
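The options listed above (sampling rows, restricting the analysis to certain fields, returning one record per analyzed field) can be illustrated with a short Python sketch. It is not the DataPatterns API: `profile`, `fields`, `sample_fraction`, and `seed` are hypothetical names, and the per-field work is reduced to a simple fill count so the control flow stays visible.

```python
import random
from typing import Any, Iterable, Optional


def profile(rows: list[dict[str, Any]],
            fields: Optional[Iterable[str]] = None,
            sample_fraction: float = 1.0,
            seed: int = 0) -> list[dict[str, Any]]:
    """Hypothetical profiler driver: one result record per analyzed field."""
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in (0, 1]")
    # Analyze all rows, or a reproducible random sample of them
    if sample_fraction < 1.0 and rows:
        k = max(1, round(len(rows) * sample_fraction))
        rows = random.Random(seed).sample(rows, k)
    # Field names in first-seen order across all rows
    all_fields = list(dict.fromkeys(name for row in rows for name in row))
    # Analyze all fields, or only the requested ones
    wanted = None if fields is None else set(fields)
    selected = [f for f in all_fields if wanted is None or f in wanted]
    results = []
    for name in selected:
        values = [row.get(name) for row in rows]
        results.append({
            "attribute": name,
            "rec_count": len(values),
            # Stand-in for the real checks: count values that are not None, "" or 0
            "fill_count": sum(1 for v in values if v not in (None, "", 0)),
        })
    return results
```

Returning a list with one dictionary per field mirrors the single result dataset with one record for each field analyzed; the seeded `random.Random` keeps sampled runs reproducible.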
DataPatterns.Profile() – The Usual Analysis
Output fields:
• attribute: The name of the field in the input dataset
• given_attribute_type: The ECL type of the attribute as it was defined in the RECORD definition
• best_attribute_type: An ECL data type that both allows all values in the input dataset and consumes the least amount of memory
• rec_count: The number of records analyzed
• fill_count: The number of rec_count records containing non-nil values
• fill_rate: The percentage of rec_count records containing non-nil values
• cardinality: The number of unique, non-nil values
• modes: The most common value(s) in the attribute, after coercing all values to STRING, along with the number of records in which the values were found
• min_length: The shortest length of a value when expressed as a string
• max_length: The longest length of a value when expressed as a string
• ave_length: The average length of a value when expressed as a string
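A minimal Python sketch of the per-field statistics listed above. It assumes, for illustration only and not as a statement of DataPatterns' exact rules, that a value is nil when it is `None`, a blank string, or numeric zero; that values are coerced to strings before modes and cardinality are counted; and that lengths are measured on the string form of non-nil values. `is_nil` and `usual_stats` are hypothetical names.

```python
from collections import Counter
from typing import Any


def is_nil(value: Any) -> bool:
    # Assumed nil rule for this sketch: None, a blank string, or numeric zero
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (int, float)):
        return value == 0
    return False


def usual_stats(name: str, values: list[Any]) -> dict[str, Any]:
    filled = [v for v in values if not is_nil(v)]
    as_text = [str(v) for v in filled]            # coerce to string before counting
    counts = Counter(as_text)
    top = max(counts.values(), default=0)
    # All values tied for the highest count, each paired with that count
    modes = sorted((s, c) for s, c in counts.items() if c == top)
    lengths = [len(s) for s in as_text]
    rec_count = len(values)
    return {
        "attribute": name,
        "rec_count": rec_count,
        "fill_count": len(filled),
        "fill_rate": 100.0 * len(filled) / rec_count if rec_count else 0.0,
        "cardinality": len(counts),               # unique non-nil values
        "modes": modes,
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "ave_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }
```

Because every value is coerced to a string before counting, `1` and `"1"` are treated as the same value. That matches the STRING coercion described for `modes` but is a simplification for `cardinality`.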
DataPatterns.Profile() – Analysis For Numeric Fields
Output fields:
• is_numeric: Boolean indicating if the original attribute was numeric and therefore whether or not the numeric_xxxx output fields will be populated with actual values
• numeric_min: The smallest non-nil value as a DECIMAL
• numeric_max: The largest non-nil value as a DECIMAL
• numeric_mean: The mean (average) non-nil value as a DECIMAL
• numeric_std_dev: The standard deviation of the non-nil values as a DECIMAL
• numeric_lower_quartile: The value separating the first (bottom) and second quarters of non-nil values as a DECIMAL
• numeric_median: The median non-nil value as a DECIMAL
• numeric_upper_quartile: The value separating the third and fourth (top) quarters of non-nil values as a DECIMAL
• numeric_correlations: A child dataset containing correlation values comparing the current numeric attribute with all other numeric attributes, listed in descending correlation value order
DataPatterns.Profile() – Text Patterns
• Text patterns give you an idea of what your data looks like when it is expressed as a human-readable generalized string
• Very useful for spotting data that doesn’t belong
• Converts each character of the string into a fixed character palette to produce a new string pattern
  • Any uppercase letter => A
  • Any lowercase letter => a
  • Any numeric digit => 9
  • Any boolean value => B
  • All other characters remain as-is
• By counting the unique patterns and ranking them, you can easily see what kind of data is very common or very rare
• All string data types are supported
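The character palette above can be expressed as a small Python function. This is a sketch of the mapping as described on the slide, not DataPatterns' ECL code; `text_pattern` is a hypothetical name, and it relies on Python's Unicode-aware `str.isupper`, `str.islower`, and `str.isdigit`, which may classify non-ASCII characters differently from the original tool.

```python
def text_pattern(value) -> str:
    """Map a value to its generalized pattern using the slide's character palette."""
    if isinstance(value, bool):
        return "B"                      # any boolean value => B
    out = []
    for ch in str(value):
        if ch.isupper():
            out.append("A")             # any uppercase letter => A
        elif ch.islower():
            out.append("a")             # any lowercase letter => a
        elif ch.isdigit():
            out.append("9")             # any numeric digit => 9
        else:
            out.append(ch)              # all other characters remain as-is
    return "".join(out)
```

For example, `text_pattern("45816.01")` returns `"99999.99"`, `text_pattern("Dan Camper")` returns `"Aaa Aaaaaa"`, and `text_pattern("For *only* $10!")` returns `"Aaa *aaaa* $99!"`.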
DataPatterns.Profile() – Text Pattern Analysis
Output fields:
• popular_patterns: The most common patterns of values; patterns are listed from most- to least-common and an example (pulled from the data) is shown for each
• rare_patterns: The least common patterns of values; patterns are listed from least- to most-common and an example (pulled from the data) is shown for each; patterns already shown in popular_patterns are not repeated here
Examples:
Original Value      Pattern
45816.01            99999.99
Dan Camper          Aaa Aaaaaa
For *only* $10!     Aaa *aaaa* $99!
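Ranking patterns into popular and rare lists, as described for popular_patterns and rare_patterns, could look like the following Python sketch. The `limit` parameter and the choice of the first value seen as each pattern's example are assumptions; `to_pattern` and `pattern_summary` are hypothetical names, and the block repeats the slide's palette mapping so that it runs on its own.

```python
from collections import Counter


def to_pattern(value) -> str:
    # Same palette as the slide: A / a / 9, booleans => B, other characters unchanged
    if isinstance(value, bool):
        return "B"
    return "".join("A" if c.isupper() else "a" if c.islower() else "9" if c.isdigit() else c
                   for c in str(value))


def pattern_summary(values, limit: int = 5):
    """Return (popular, rare) lists of (pattern, count, example) tuples."""
    counts = Counter()
    examples = {}
    for v in values:
        p = to_pattern(v)
        counts[p] += 1
        examples.setdefault(p, v)          # first value seen becomes the example
    ranked = counts.most_common()          # most- to least-common; ties keep first-seen order
    popular = ranked[:limit]
    shown = {p for p, _ in popular}
    # Least- to most-common, skipping patterns already reported as popular
    rare = [(p, c) for p, c in reversed(ranked) if p not in shown][:limit]
    return ([(p, c, examples[p]) for p, c in popular],
            [(p, c, examples[p]) for p, c in rare])
```

A malformed entry such as `"ABC12"` in a column of five-digit ZIP codes would surface immediately in the rare list, which is the "data that doesn't belong" use case.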
Some Data To Profile …
… And How To Profile It
• Import the DataPatterns module
• Define a record structure
• Declare the dataset
• Call the profiler
• Show result
Profiling Results – The Usual Suspects
Profiling Results – Numeric Fields
Profiling Results – Data Pattern Analysis
Final Thoughts
• DataPatterns is an open-source ECL bundle
  • https://github.com/hpcc-systems/DataPatterns.git
• Currently contains only two functions
  • Profile()
  • BestRecordStructure()
• Future plans
  • Histograms for numeric fields
  • Additional information for low-cardinality fields
  • Expand correlations to non-numeric discrete-value fields
  • Easy comparison of profile results to detect changes
  • Visualization
  • Data Detectors
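The idea behind best_attribute_type and BestRecordStructure() (choose the smallest type that still holds every value) can be sketched in Python. The type names follow ECL naming (UNSIGNEDn, INTEGERn, REAL8, STRINGn), but `best_type` and its decision rules are simplified assumptions, not the DataPatterns logic; for example, it never suggests DECIMAL or REAL4.

```python
import re
from typing import Optional

INT_RE = re.compile(r"[+-]?\d+")
REAL_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")
LEADING_ZERO_RE = re.compile(r"[+-]?0\d")


def best_type(values: list[Optional[str]]) -> str:
    """Suggest the narrowest ECL-style type that can hold every non-blank value."""
    texts = [v.strip() for v in values if v is not None and v.strip()]
    if not texts:
        return "STRING"
    longest = max(len(t) for t in texts)
    # Values like "00123" are codes, not numbers: keep them as strings
    if any(LEADING_ZERO_RE.match(t) for t in texts):
        return f"STRING{longest}"
    if all(INT_RE.fullmatch(t) for t in texts):
        lo = min(int(t) for t in texts)
        hi = max(int(t) for t in texts)
        for size in range(1, 9):                  # 1- to 8-byte integers
            bits = 8 * size
            if lo >= 0 and hi < 2 ** bits:
                return f"UNSIGNED{size}"
            if -(2 ** (bits - 1)) <= lo and hi < 2 ** (bits - 1):
                return f"INTEGER{size}"
        return f"STRING{longest}"                 # too large for 8 bytes
    if all(REAL_RE.fullmatch(t) for t in texts):
        return "REAL8"
    return f"STRING{longest}"
```

The leading-zero check is a design choice worth keeping in any variant: converting ZIP codes or account numbers to integers would silently drop their leading zeros.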
Questions?
Hicham Elhassani – VP Modeling Vertical Support
Dan S. Camper – Sr. Architect, HPCC Solutions Lab
Making IoT Data Actionable Using Predictive Analytics
If you think connected “things” are everywhere NOW . . .
IoT Units Installed Base by Category (Millions of Units)
Category                      2016    2017    2018     2020
Consumer                      3,963   5,244   7,036    12,863
Business: Cross-Industry      1,102   1,501   2,133    4,381
Business: Vertical-Specific   1,317   1,635   2,028    3,171
Grand Total                   6,382   8,381   11,197   20,415
Source: Gartner (January 2017)
BIG QUESTIONS FOR INSURANCE
• Value proposition?
• Cyber risk?
• What does the data say?
• Who is driving?
• Incremental or revolutionary?
• Cost vs. Benefit?
Importance for insurers to collect IoT data today
Chart: Importance of collecting IoT data to the company’s insurance strategy (n=120). Response categories: Very/somewhat important; Neither important nor unimportant; Not at all/not very important. Values shown: 8%, 70%, 22%.
Collection of Connected Home Data
Chart: Collection and/or purchase of connected home data (n=120)
• Collect today = 24%: collect/purchase and use in decision-making; collect/purchase and plan to use; collect/purchase but not sure how to use (segments shown: 1%, 4%, 19%)
• Don’t collect today = 76%: don’t collect/purchase but plan to; don’t collect/purchase and don’t plan to (segments shown: 38%, 38%)
Timeline to begin collecting Connected Home data
Anticipated timeline for collecting and/or using connected home data (among those not currently using, but planning to use, connected home data; n=73):
• In the next year: 4%
• In the next 2-3 years: 52%
• In the next 4-5 years: 34%
• In 6+ years: 7%
• Not sure: 3%
Next 3 years = 56%; 4+ years = 41%
Home Loss Statistics and IoT Opportunities
Chart: share of home losses by peril (Wind, Hail, Fire, Water (non-weather), Water (weather), Theft, Liability, Other), with potential data sources for each peril:
• Internal data: Security, Freeze detection, Leak detection, Smoke/CO, Temp/Humidity, Motion sensor, Appliances, Audio/video
• External data: Weather API, Social media events, Loss history, Property info, Geo information
Today, let’s discuss some examples:
• Occupancy: Monitoring/Prevention
• Water Leak: Monitoring/Alert
Smart Thermostat Data: Primary Residence
Chart: HVAC mode observations (y-axis 0 to 350), annotated "Eco" and "July 4th Weekend". Source: Nest
Smart Thermostat Data: Vacation Home
Chart: HVAC mode observations (y-axis 0 to 120), annotated "Eco" and "July 4th Weekend". Source: Nest
Example: Water Leak Detection
Chart: water usage from 3/12/2018 to 3/26/2018 (y-axis 0 to 70), with annotated events: Shower, Restroom, Laundry x3, Dishwasher x2, Dishwasher, and several Child’s bath events. Source: Streamlabs
Example: Water Leak & Assignment of Benefits
File It
Assignment of benefits (AOB) is a legal tool that allows the homeowner to transfer their rights to collect from an insurance claim to a third party.
Fix It
AOB is commonly used when a homeowner employs a contractor or water remediation company to fix water damage from pipe and appliance leaks.
Fake It
This arrangement has permitted some contractors to overinflate claims, resulting in a dramatic increase in the frequency and severity of Florida non-weather water claims.
Source: Office of Insurance Consumer Advocate, Florida Office of Insurance Regulation
Assignment of Benefits – Florida vs USA (Excl. Florida)
Chart: Accidental Water Discharge and Appliance Leakage Loss Cost, 2011 to 2016; y-axis: Loss Cost ($), 0 to 30; series: USA (Excl. Florida), Florida. Source: LexisNexis Internal Research
Assignment of Benefits – Tri Counties
Figure: Broward, Miami-Dade, and Palm Beach counties. Source: LexisNexis Internal Research
Water Leak and Geo-located Losses
Chart: Accidental Water Discharge and Appliance Leakage Frequency, 2011 to 2016; y-axis: Frequency, 0.00% to 0.50%; series: Broward County, Miami-Dade County, Palm Beach County, Florida (Excl. Tri Counties). Source: LexisNexis Internal Research
Harvey: Tweets Containing “Flood”
Weather Events Digital Trail
• Elk City tornado (NOAA, 17/05/2017)
• Flood
• Hail
• Lightning
• Tornado
• Wildfire
Stream Analytics: Push and Pull Data Sources
Perils: Wind, Hail, Fire, Water (non-weather), Water (weather), Theft, Liability, Other
Data platforms will be key to unlocking the full potential of this opportunity
Diagram elements: IoT Platform; Insurer lifecycle (Marketing, Contact, Quote, Underwriting, Renewal, Compliance, Claim); Automation; Mitigation; Utilities; Connected Home; Security; Connected Car; Connected Self; Connected Business
How to start unlocking these insights now
• Technology/Analytics to develop and deploy a pilot program
HPCC Systems Architecture
HPCC Systems – Pull Architecture
• Device users register at a web portal
• Authentication and authorization via the device manufacturer’s web site
• The authorization response includes an access token
• All registration information is saved
• Thor queries devices for all registered users in parallel
• Ancillary data, such as weather conditions local to every device, is periodically gathered
• Analytics are also run periodically, as often as needed
• ROXIE is updated with analytics results, which are made available to external services
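The pull cycle described above (poll every registered device in parallel, using the access token saved at registration) can be sketched in Python with a thread pool. `Registration`, `poll_all`, and the caller-supplied `fetch_reading` are hypothetical stand-ins: in the architecture above this work runs on Thor against each manufacturer's API, which the sketch does not attempt to model.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Registration:
    user_id: str
    device_id: str
    access_token: str   # returned in the manufacturer's authorization response


def poll_all(registrations: list[Registration],
             fetch_reading: Callable[[Registration], dict],
             max_workers: int = 8) -> tuple[list[dict], list[str]]:
    """Query every registered device in parallel; collect readings and failures."""
    def safe_fetch(reg: Registration) -> Optional[dict]:
        try:
            return fetch_reading(reg)
        except Exception:   # one unreachable device must not abort the whole cycle
            return None

    readings, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input registrations
        for reg, result in zip(registrations, pool.map(safe_fetch, registrations)):
            if result is None:
                failures.append(reg.device_id)
            else:
                readings.append({"user_id": reg.user_id, "device_id": reg.device_id, **result})
    return readings, failures
```

Returning failures separately lets a scheduler retry those devices on the next cycle instead of failing the whole periodic run.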
HPCC Systems – Push Architecture
• Authorized devices are whitelisted via master device management
• Remote devices send their data to ROXIE
• After validation and normalization, messages are stored in Kafka and Couchbase
• Thor periodically pulls new messages from Kafka for processing
• Ancillary data, such as weather conditions local to every device, is periodically gathered
• Analytics are also run periodically, as often as needed
• ROXIE is updated with analytics results, which are made available to external services
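The first steps of the push path (accept messages only from whitelisted devices, validate and normalize them, then hand them off for storage) might look like this in Python. The message schema in `REQUIRED`, the whitelist set, and the `deque` standing in for the Kafka topic are assumptions for illustration; no Kafka or Couchbase client is used, and `normalize` and `ingest` are hypothetical names.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Any, Iterable, Optional

REQUIRED = ("device_id", "ts", "metric", "value")   # hypothetical message schema


def normalize(msg: dict[str, Any], whitelist: set[str]) -> Optional[dict[str, Any]]:
    """Return a normalized message, or None if it should be rejected."""
    if any(k not in msg for k in REQUIRED):
        return None
    if msg["device_id"] not in whitelist:            # only authorized devices
        return None
    try:
        value = float(msg["value"])
        ts = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        return None
    return {
        "device_id": str(msg["device_id"]),
        "metric": str(msg["metric"]).strip().lower(),
        "value": value,
        "ts": ts.isoformat(),                        # UTC ISO-8601 timestamp
    }


def ingest(messages: Iterable[dict[str, Any]], whitelist: set[str], topic: deque) -> int:
    """Validate and normalize each message; append accepted ones to the stand-in topic."""
    accepted = 0
    for msg in messages:
        clean = normalize(msg, whitelist)
        if clean is not None:
            topic.append(clean)
            accepted += 1
    return accepted
```

Rejecting malformed or unauthorized messages at the edge keeps the downstream periodic Thor jobs working only with normalized records.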
IoT Data Profiling and Predictive Analytics

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 

Último (20)

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 

IoT Data Profiling and Predictive Analytics

  • 1. Data Patterns: A Native Open Source Data Profiling Tool for HPCC Systems
    Dan S. Camper, Sr. Architect, HPCC Solutions Lab
    2018 HPCC Systems® Community Day: Innovation and Reinvention Driving Transformation (October 9, 2018)
  • 2. What Is Data Profiling?
    • A method of examining data to collect statistics and information about that data
    • Determines the “shape” of the data:
      • Data types
      • Lengths
      • Cardinality
      • Prominent discrete values
      • Patterns
    • Also known as “Data Discovery”
    Data Patterns: A Data Profiling Tool for HPCC Systems
  • 3. When Would You Profile Data?
    • Explore a new dataset
      • Determine the real data types
      • Determine field population
      • Spot garbage data
      • Find highly correlated fields
    • Verify data updates
      • Ensure that structure has not changed
      • Check for expected cardinality
      • Check for expected fill rates
      • Check for unexpected garbage
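As a rough illustration of the "verify data updates" use case, the sketch below compares two sets of profile results keyed by attribute name. It reports removed or added fields, type changes, and fill-rate shifts. It is written in Python rather than ECL and is not part of the DataPatterns bundle. The function name `diff_profiles` and the default 5-percentage-point tolerance are hypothetical. The only assumption about the input is that each profile record carries the `given_attribute_type` and `fill_rate` outputs described for DataPatterns.Profile().

```python
from typing import Dict, List


def diff_profiles(
    old: Dict[str, dict], new: Dict[str, dict], rate_tolerance: float = 5.0
) -> List[str]:
    """Report structural and fill-rate changes between two profile runs.

    Hypothetical helper: each argument maps an attribute name to a dict that
    holds at least 'given_attribute_type' and 'fill_rate' (a percentage).
    """
    issues: List[str] = []
    # Structure checks: fields that disappeared or appeared.
    for attr in sorted(old.keys() - new.keys()):
        issues.append(f"{attr}: field removed")
    for attr in sorted(new.keys() - old.keys()):
        issues.append(f"{attr}: field added")
    # Per-field checks on attributes present in both runs.
    for attr in sorted(old.keys() & new.keys()):
        before, after = old[attr], new[attr]
        if before["given_attribute_type"] != after["given_attribute_type"]:
            issues.append(
                f"{attr}: type changed from {before['given_attribute_type']}"
                f" to {after['given_attribute_type']}"
            )
        if abs(before["fill_rate"] - after["fill_rate"]) > rate_tolerance:
            issues.append(
                f"{attr}: fill_rate changed from {before['fill_rate']}"
                f" to {after['fill_rate']}"
            )
    return issues
```

Cardinality checks could be added the same way. They are left out here because cardinality often changes legitimately when new records arrive, so a sensible threshold depends on the data.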
  • 4. DataPatterns.Profile()
    • Written entirely in ECL
      • It is a single FUNCTIONMACRO
      • No library or module dependencies
    • Performs all profiling checks by default
    • Numerous parameters for controlling analysis and output:
      • Analyze all rows in a dataset or just a sample
      • Analyze all fields or only certain fields
      • Enable only specified profiling checks
      • Specify returned pattern counts
    • Creates a single dataset as a result
      • One record for each field analyzed
  • 5. DataPatterns.Profile() – The Usual Analysis
    Output fields:
    • attribute: The name of the field in the input dataset
    • given_attribute_type: The ECL type of the attribute as it was defined in the RECORD definition
    • best_attribute_type: An ECL data type that both allows all values in the input dataset and consumes the least amount of memory
    • rec_count: The number of records analyzed
    • fill_count: The number of rec_count records containing non-nil values
    • fill_rate: The percentage of rec_count records containing non-nil values
    • cardinality: The number of unique, non-nil values
    • modes: The most common value(s) in the attribute, after coercing all values to STRING, along with the number of records in which the values were found
    • min_length: The shortest length of a value when expressed as a string
    • max_length: The longest length of a value when expressed as a string
    • ave_length: The average length of a value when expressed as a string
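To make the "usual analysis" fields concrete, here is a minimal Python sketch that computes them for one field's values. It is not the ECL FUNCTIONMACRO itself. Two choices are assumptions of this sketch: `None` and the empty string are treated as nil, and the length statistics are taken over non-nil values only. The real tool's handling of nil values and numeric types may differ.

```python
from collections import Counter
from typing import Any, Iterable


def is_nil(value: Any) -> bool:
    """Assumption for this sketch: None and the empty string count as nil."""
    return value is None or value == ""


def usual_profile(values: Iterable[Any]) -> dict:
    """Compute the 'usual analysis' statistics for one field's values."""
    values = list(values)
    rec_count = len(values)
    # Coerce every non-nil value to a string, as the modes/length checks do.
    filled = [str(v) for v in values if not is_nil(v)]
    fill_count = len(filled)
    fill_rate = 100.0 * fill_count / rec_count if rec_count else 0.0
    counts = Counter(filled)
    top = max(counts.values(), default=0)
    # Every value that ties for the highest count is reported as a mode.
    modes = sorted((v, c) for v, c in counts.items() if c == top)
    lengths = [len(v) for v in filled]
    return {
        "rec_count": rec_count,
        "fill_count": fill_count,
        "fill_rate": fill_rate,
        "cardinality": len(counts),
        "modes": modes,
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "ave_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }
```

For example, `usual_profile(["Dan", "Ann", "", "Dan", None])` reports 5 records, a fill count of 3 (a 60% fill rate), a cardinality of 2, and `("Dan", 2)` as the single mode.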
  • 6. DataPatterns.Profile() – Analysis For Numeric Fields
    Output fields:
    • is_numeric: Boolean indicating if the original attribute was numeric and therefore whether or not the numeric_xxxx output fields will be populated with actual values
    • numeric_min: The smallest non-nil value as a DECIMAL
    • numeric_max: The largest non-nil value as a DECIMAL
    • numeric_mean: The mean (average) non-nil value as a DECIMAL
    • numeric_std_dev: The standard deviation of the non-nil values as a DECIMAL
    • numeric_lower_quartile: The value separating the first (bottom) and second quarters of non-nil values as a DECIMAL
    • numeric_median: The median non-nil value as a DECIMAL
    • numeric_upper_quartile: The value separating the third and fourth (top) quarters of non-nil values as a DECIMAL
    • numeric_correlations: A child dataset containing correlation values comparing the current numeric attribute with all other numeric attributes, listed in descending correlation value order
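The numeric outputs can be sketched the same way. This Python example illustrates the statistics under stated assumptions and is not DataPatterns' ECL code. It uses floats instead of DECIMAL, the population standard deviation, Python's "inclusive" quartile method, and Pearson correlation over rows where both fields are non-nil. The slide does not say which standard-deviation, quartile, or correlation convention the tool actually uses.

```python
import math
import statistics
from typing import Dict, List, Optional


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    # A constant column has no defined correlation; report 0.0 in that case.
    return sxy / denom if denom else 0.0


def numeric_profile(field: str, columns: Dict[str, List[Optional[float]]]) -> dict:
    """Summary statistics for one numeric field; None marks a nil value."""
    values = sorted(v for v in columns[field] if v is not None)
    # Quartile convention is an assumption: Python's 'inclusive' method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    correlations = []
    for other, other_values in columns.items():
        if other == field:
            continue
        # Only compare rows where both fields are non-nil.
        pairs = [(x, y) for x, y in zip(columns[field], other_values)
                 if x is not None and y is not None]
        if len(pairs) < 2:
            continue
        xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
        correlations.append((other, pearson(xs, ys)))
    # Highest correlation first, mirroring the descending order on the slide.
    correlations.sort(key=lambda item: item[1], reverse=True)
    return {
        "numeric_min": values[0],
        "numeric_max": values[-1],
        "numeric_mean": statistics.fmean(values),
        "numeric_std_dev": statistics.pstdev(values),
        "numeric_lower_quartile": q1,
        "numeric_median": median,
        "numeric_upper_quartile": q3,
        "numeric_correlations": correlations,
    }
```

With an even number of values, the inclusive method's middle cut point averages the two central values. It therefore agrees with the ordinary median.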
  • 7. DataPatterns.Profile() – Text Patterns
    • Text patterns give you an idea of what your data looks like when it is expressed as a human-readable generalized string
      • Very useful for spotting data that doesn’t belong
    • Converts each character of the string into a fixed character palette to produce a new string pattern:
      • Any uppercase letter => A
      • Any lowercase letter => a
      • Any numeric digit => 9
      • Any boolean value => B
      • All other characters remain as-is
    • By counting the unique patterns and ranking them, you can easily see what kind of data is very common or very rare
    • All string data types are supported
  • 8. DataPatterns.Profile() – Text Pattern Analysis
    Output fields:
    • popular_patterns: The most common patterns of values; patterns are listed from most- to least-common and an example (pulled from the data) is shown for each
    • rare_patterns: The least common patterns of values; patterns are listed from least- to most-common and an example (pulled from the data) is shown for each; patterns already shown in popular_patterns are not repeated here
    Examples (Original Value => Pattern):
    • 45816.01 => 99999.99
    • Dan Camper => Aaa Aaaaaa
    • For *only* $10! => Aaa *aaaa* $99!
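A minimal Python sketch of the character-palette mapping and the popular/rare ranking follows. It reproduces the three examples above. The function names and the `limit` parameter are hypothetical. Letters and digits are matched as ASCII only, which is an assumption because the slides do not say how non-ASCII characters are treated. The boolean case is modeled by mapping a Python `bool` to `B`.

```python
from collections import Counter
from typing import Iterable, List, Tuple, Union

Value = Union[str, bool]
Ranked = List[Tuple[str, int, Value]]  # (pattern, count, example)


def text_pattern(value: Value) -> str:
    """Map a value onto the A / a / 9 / B character palette."""
    if isinstance(value, bool):
        return "B"  # any boolean value becomes a single 'B'
    out = []
    for ch in value:
        if "A" <= ch <= "Z":
            out.append("A")
        elif "a" <= ch <= "z":
            out.append("a")
        elif "0" <= ch <= "9":
            out.append("9")
        else:
            out.append(ch)  # all other characters remain as-is
    return "".join(out)


def rank_patterns(values: Iterable[Value], limit: int = 10) -> Tuple[Ranked, Ranked]:
    """Return (popular, rare) lists of (pattern, count, example) tuples."""
    counts: Counter = Counter()
    examples = {}
    for value in values:
        pattern = text_pattern(value)
        counts[pattern] += 1
        examples.setdefault(pattern, value)  # keep the first example seen
    popular = counts.most_common(limit)
    shown = {pattern for pattern, _ in popular}
    # Least common first; patterns already listed as popular are skipped.
    rare = [(p, c) for p, c in sorted(counts.items(), key=lambda pc: pc[1])
            if p not in shown][:limit]
    return ([(p, c, examples[p]) for p, c in popular],
            [(p, c, examples[p]) for p, c in rare])
```

For instance, profiling `["45816.01", "12345.67", "Dan Camper", "99999.99"]` with `limit=1` ranks `99999.99` (3 occurrences, example `45816.01`) as popular and `Aaa Aaaaaa` (1 occurrence, example `Dan Camper`) as rare.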
  • 9. Some Data To Profile …
  • 10. … And How To Profile It
    Annotated steps:
    • Import the DataPatterns module
    • Define a record structure
    • Declare the dataset
    • Call the profiler
    • Show result
  • 11. Profiling Results – The Usual Suspects
  • 12. Profiling Results – Numeric Fields
  • 13. Profiling Results – Data Pattern Analysis
  • 14. Final Thoughts
    • DataPatterns is an open-source ECL bundle
      • https://github.com/hpcc-systems/DataPatterns.git
    • Currently contains only two functions:
      • Profile()
      • BestRecordStructure()
    • Future plans:
      • Histograms for numeric fields
      • Additional information for low-cardinality fields
      • Expand correlations to non-numeric discrete-value fields
      • Easy comparison of profile results to detect changes
      • Visualization
      • Data Detectors
  • 15. Questions?
  • 16. Making IoT Data Actionable Using Predictive Analytics
    Hicham Elhassani, VP Modeling Vertical Support
    Dan S. Camper, Sr. Architect, HPCC Solutions Lab
    2018 HPCC Systems® Community Day: Innovation and Reinvention Driving Transformation (October 9, 2018)
  • 17. Making IoT Data Actionable Using Predictive Analytics
  • 18. If you think connected “things” are everywhere NOW . . .
    IoT Units Installed Base by Category (Millions of Units)
    Category                         2016     2017     2018     2020
    Consumer                        3,963    5,244    7,036   12,863
    Business: Cross-Industry        1,102    1,501    2,133    4,381
    Business: Vertical-Specific     1,317    1,635    2,028    3,171
    Grand Total                     6,382    8,381   11,197   20,415
    Source: Gartner (January 2017)
  • 19. BIG QUESTIONS FOR INSURANCE
    • Value proposition?
    • Cyber risk?
    • What does the data say?
    • Who is driving?
    • Incremental or revolutionary?
    • Cost vs. Benefit?
  • 20. Importance for insurers to collect IoT data today
    Chart: Importance of collecting IoT data to company’s insurance strategy (n=120)
    Response categories: Very/somewhat important; Neither important nor unimportant; Not at all/not very important
    Values shown: 8%, 70%, 22%
  • 21. Collection of Connected Home Data
    Chart: Collection and/or Purchase of Connected Home Data (n=120)
    • Collect/purchase, use in decision-making: 1%
    • Collect/purchase, plan to use: 4%
    • Collect/purchase, but not sure how to use: 19%
    • Don’t collect/purchase, but plan to: 38%
    • Don’t collect/purchase, don’t plan to: 38%
    Collect today = 24%; Don’t collect today = 76%
  • 22. Timeline to begin collecting Connected Home data
    Chart: Anticipated Timeline for Collecting and/or Using Connected Homes Data (among those not currently using, but planning to use connected homes, n=73)
    • In next year: 4%
    • In next 2-3 years: 52%
    • In next 4-5 years: 34%
    • In 6+ years: 7%
    • Not sure: 3%
    Next 3 years = 56%; 4+ years = 41%
  • 23. Home Loss Statistics and IoT Opportunities
    Loss categories: Other, Theft, Wind, Hail, Fire, Water (non-weather), Water (weather), Liability
    Percentages shown on the slide: 11%, 25%, 21%, 22%, 21%
    Potential data sources for each loss category:
    • Internal data: security, freeze detection, leak detection, smoke/CO, temperature/humidity, motion sensor, appliances, audio/video
    • External data: weather API, social media events, loss history, property info, geo information
  • 24. Today, let’s discuss some examples:
    • Occupancy: Monitoring/Prevention
    • Water Leak: Monitoring/Alert
  • 25. Smart Thermostat Data: Primary Residence
    [Chart: HVAC Mode Observations, with Eco mode and the July 4th weekend marked. Source: Nest]
  • 26. Smart Thermostat Data: Vacation Home
    [Chart: HVAC Mode Observations, with Eco mode and the July 4th weekend marked. Source: Nest]
  • 27. Example: Water Leak Detection
    [Chart: water usage from 3/12/2018 to 3/26/2018, with usage events labeled: shower, restroom, laundry x3, dishwasher x2, dishwasher, and repeated child’s baths. Source: Streamlabs]
  • 28. Example: Water Leak & Assignment of Benefits
    • File it: Assignment of benefits (AOB) is a legal tool that allows the homeowner to transfer their rights to collect from an insurance claim to a third party.
    • Fix it: AOB is commonly used when a homeowner employs a contractor or water remediation company to fix water damage from pipe and appliance leaks.
    • Fake it: This arrangement has permitted some contractors to overinflate claims, resulting in a dramatic increase in frequency and severity in Florida water non-weather claims.
    Source: Office of Insurance Consumer Advocate, Florida Office of Insurance Regulation
  • 29. Assignment of Benefits – Florida vs USA (Excl. Florida)
    [Chart: Accidental Water Discharge and Appliance Leakage Loss Cost ($), 2011 to 2016, Florida vs USA (Excl. Florida). Source: LexisNexis Internal Research]
  • 30. Assignment of Benefits – Tri Counties (Broward, Miami-Dade, Palm Beach)
    Source: LexisNexis Internal Research
  • 31. Assignment of Benefits – Tri Counties (Broward, Miami-Dade, Palm Beach)
    Source: LexisNexis Internal Research
  • 32. Water Leak and Geo-located Losses
    [Chart: Accidental Water Discharge and Appliance Leakage Frequency (%), 2011 to 2016, for Broward County, Miami-Dade County, Palm Beach County, and Florida (Excl. Tri Counties). Source: LexisNexis Internal Research]
  • 33. Harvey: Tweets Containing “Flood”
  • 34. Weather Events Digital Trail
    • Elk City tornado by the NOAA: yesterday (17/05/2017)
    • Flood
    • Hail
    • Lightning
    • Tornado
    • Wildfire
  • 35. Stream Analytics: Push and Pull data sources
    Loss categories: Wind, Fire, Water (non-weather), Water (weather), Theft, Liability, Other, Hail
  • 36. Data platforms will be key to unlocking the full potential of this opportunity
    • Marketing, Contact, Quote, Underwriting, Renewal, Compliance, Claim
    • IoT Platform: Insurer, Automation, Mitigation, Utilities, Connected Home, Security, Connected Car, Connected Self, Connected Business
  • 37. How to start unlocking these insights now: Technology/Analytics to develop and deploy a pilot program
  • 38. HPCC Systems Architecture
  • 39. HPCC Systems – Pull Architecture
    • Device users register at a web portal
      • Authentication and authorization via the device manufacturer’s web site
      • Authorization response includes an access token
      • All registration information is saved
    • Thor queries devices for all registered users in parallel
    • Ancillary data, such as weather conditions local to every device, is periodically gathered
    • Analytics are also run periodically, as often as needed
    • ROXIE is updated with the analytics results, which are made available to external services
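In the pull architecture, the step "Thor queries devices for all registered users in parallel" runs on Thor. To illustrate only the fan-out idea, the Python sketch below polls every registration through a thread pool. It is not how Thor does it. The registration shape, `poll_all`, and the injected `fetch` callable are hypothetical stand-ins for the manufacturer API call, which the slide does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical registration record: (user_id, access_token).
Registration = Tuple[str, str]


def poll_all(registrations: List[Registration],
             fetch: Callable[[str], dict],
             max_workers: int = 8) -> Dict[str, dict]:
    """Query every registered user's device in parallel.

    `fetch` stands in for a call to the device manufacturer's API using the
    stored access token; a failure is recorded per user instead of aborting
    the whole run.
    """
    def poll(reg: Registration) -> Tuple[str, dict]:
        user_id, token = reg
        try:
            return user_id, fetch(token)
        except Exception as exc:  # keep going if one device call fails
            return user_id, {"error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(poll, registrations))
```

Recording errors per user, rather than letting one expired token abort the batch, fits a periodic job: the failed users can simply be retried on the next run.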
  • 40. HPCC Systems – Push Architecture
    • Authorized devices are whitelisted via master device management
    • Remote devices send their data to ROXIE
    • After validation and normalization, each message is stored in Kafka and Couchbase
    • Thor periodically pulls new messages from Kafka for processing
    • Ancillary data, such as weather conditions local to every device, is periodically gathered
    • Analytics are also run periodically, as often as needed
    • ROXIE is updated with the analytics results, which are made available to external services
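The validation and normalization step, which runs before messages are stored in Kafka and Couchbase, can be sketched as follows. This Python example is hypothetical throughout. The whitelist contents, the message keys (`device_id`, `ts`, `temp_f`, `temp_c`), and the `Reading` record are invented for illustration, and the Kafka/Couchbase writes are left out.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical whitelist maintained by master device management.
WHITELIST: Set[str] = {"dev-001", "dev-002"}


@dataclass(frozen=True)
class Reading:
    """Normalized record; field names are illustrative only."""
    device_id: str
    timestamp: int        # epoch seconds
    temperature_c: float  # always Celsius after normalization


def validate_and_normalize(message: dict,
                           whitelist: Set[str] = WHITELIST) -> Optional[Reading]:
    """Return a normalized Reading, or None if the message is rejected."""
    device_id = message.get("device_id")
    if device_id not in whitelist:
        return None  # unauthorized device
    try:
        timestamp = int(message["ts"])
        # Vendors may report Fahrenheit or Celsius; convert to one unit.
        if "temp_f" in message:
            temperature_c = (float(message["temp_f"]) - 32.0) * 5.0 / 9.0
        else:
            temperature_c = float(message["temp_c"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or malformed fields
    return Reading(device_id, timestamp, round(temperature_c, 2))
```

Converting every vendor's units into one record shape at ingestion means the downstream Thor analytics only ever see one format. This mirrors the normalization idea in the speaker notes.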

Speaker Notes

  1. Devices in the Internet of Things communicate with each other without a human directly prompting the interaction. Today we call this “The Internet of Things,” but that’s only because it’s new. In five years we’ll probably just call it “the internet.”
  2. Gartner put the number of IoT devices at 8 billion in 2017. For 2020, they estimate TWENTY billion. Cisco estimates 50 billion. We can be sure they’re both wrong, but one of them might be close. The point is, there will be tens of billions of devices generating data. And on the data side, what’s interesting is that humans have generated the majority of the data out there today, from pictures and texts, to movies, to scholarly articles. But soon the data created by “things” will dwarf the data created by humans.
  3. There has been a lot of activity over the past year, but these same key questions are still largely unanswered. [Walk through points] And I’ll add one more: consumer engagement. What gets the consumer to push through setup challenges, replace batteries, or even engage with the device through an app? There is still a lot of ambivalence and complexity out there, so instead of taking a step back like we did last year, let’s take a step in and look at some specific use cases.
Who will be the winners and losers among devices and platforms? There will continue to be consolidation, new entrants, and exits, which makes partnerships and data agreements complicated.
Who is driving: the consumer, the insurer, or the infrastructure? As I showed on the previous slide… you may want to prevent water losses, but that doesn’t mean your policyholder shares that concern. He or she may be more likely to opt for voice-activated mood lighting. Discounts or carrier device purchases may help to remedy this over time, and connected utility meters and built-in capabilities may also have an influence in time.
Cyber risk: in 2016 there was a major distributed denial-of-service attack that shut down a number of websites, and Wi-Fi-enabled baby monitors have been hacked. Carriers do have to consider this when potentially connecting their brand with a device. Do you want the connected thermostat you encouraged your customer to buy to be susceptible to ransomware that extorts a payment to keep the heat on during the winter? The good news is that there are good companies out there today working on more sophisticated technology to protect connected devices.
Much of the purported benefit of the connected home is speculation. How does this data really play out? Does the connected water sensor really prevent loss payments to a significant degree? Does it reduce frequency, or just severity? By how much? We need a lot more data to know for sure, and that question multiplies across the dozens of devices that are available.
How big is the disruption? If at the end of the day we end up with a lot of new data sources that allow us to offer another 5% discount, or that help us validate the home security system discounts carriers are already giving, then it’s still useful but not revolutionary. On the other hand, if pricing a risk from the ground up using a multitude of real-time IoT data becomes a reality, then maybe it is revolutionary. The other question here is loss mitigation versus loss avoidance.
Finally, there is cost, particularly the cost of the device. As we discussed above, the consumer may not buy the devices you want them to have, which means the insurer would potentially need to foot the bill (either directly or through discounting and/or rate). That math needs to work: a $5 device to mitigate flood risk under a given sink will be a lot more attractive than an $80 device.
  4. Insurers can explore many ways to avoid and limit losses. So where does LexisNexis fit in the IoT world? We can analyze, normalize, and score this data for our customers (WITH THE CONSUMER’S PERMISSION, OF COURSE). We can solve the many-to-many challenge, not only for insurers, but for IoT companies, too. We can take millions of data points and turn them into something digestible and meaningful to the industry. I hope this all sounds familiar, because it’s what we do every day already. And the normalization can take many forms. It’s not hard to imagine that the Nest, the Ecobee, the Lyric, and the Sensi (all smart thermostats that use occupancy to make decisions) might produce different data. It might come at different intervals, at different levels of granularity, and there may be differences in sensitivity between them. Clearly there’s an opportunity for us to normalize that data on the way in so that we can produce an occupancy score or attribute from thermostats that works for ALL popular models of thermostat. This is not too different from what we’ve done in the UBI (usage-based insurance) space to normalize driver scores across phone types.
  5. This is one piece of the data that we can collect from Nest thermostats. In this case I once again got one of my co-workers to agree to let me use his data, but he won’t let me use his real name because he is paranoid that his rates will go up. We are going to call him “Shawn.” Shawn has two Nest thermostats, and they each send data nearly 150 times a day. This data stream has dozens of fields, including everything from the actual temperature in the home and the desired temperature to the location of the thermostat that the consumer has specified and whether someone has locked in a temperature other than those in the settings. The Nest thermostat switches to “Eco” mode when it doesn’t detect anyone present in the home, and this data is captured as well.
  6. Here is Shawn’s lake house. There is only one thermostat in this house, but it consistently reports “Eco” status until we get to the holiday weekend. This is a very clear example, and not every example will be this clear, but the pattern is evident.
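One simple way to turn the Eco-mode signal described in these notes into a per-day occupancy indicator is to measure what fraction of each day's thermostat reports were in Eco mode. The Python sketch below is a hypothetical illustration, not the LexisNexis scoring method. The observation format, the mode strings, and the 0.5 threshold are all assumptions, and real Nest data carries many more fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical observation: (day, hvac_mode), e.g. ("2018-07-04", "eco").
Observation = Tuple[str, str]


def eco_share_by_day(observations: Iterable[Observation]) -> Dict[str, float]:
    """Fraction of each day's thermostat reports that were in Eco mode."""
    totals: Dict[str, int] = defaultdict(int)
    eco: Dict[str, int] = defaultdict(int)
    for day, mode in observations:
        totals[day] += 1
        if mode == "eco":
            eco[day] += 1
    return {day: eco[day] / totals[day] for day in sorted(totals)}


def likely_occupied_days(observations: Iterable[Observation],
                         threshold: float = 0.5) -> List[str]:
    """Days where Eco mode was the minority, suggesting someone was home."""
    shares = eco_share_by_day(observations)
    return [day for day, share in shares.items() if share < threshold]
```

For a vacation home like the one described, almost every day would show an Eco share near 1.0, and the holiday weekend would stand out as the days that fall below the threshold.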
  7. Assignment of Benefits mainly impacts water non-weather claims associated with leaking pipes and damaged appliances.
  8. Small circles are tweets containing “tornado”; large circles are official sightings. We are starting to harvest tweets based on keywords in order to (1) build up data to have a baseline (i.e., background noise) and (2) ‘hope’ for an event so we can see spikes. Right now we are grabbing tweets with words (including partial words) containing the keywords Flood, Hail, Lightning, Tornado, and Wildfire.
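The two goals in this note map onto a simple counting scheme: build a baseline of background noise, then look for spikes when an event happens. This Python sketch is an illustration under assumptions, not the team's pipeline. It uses case-insensitive substring matching for the partial-word keyword match. It flags a period as a spike when its count exceeds the baseline mean by k population standard deviations, with k = 3 chosen arbitrarily.

```python
import statistics
from typing import List

KEYWORDS = ("flood", "hail", "lightning", "tornado", "wildfire")


def matched_keywords(text: str) -> List[str]:
    """Keywords found in a tweet, including partial matches such as 'hailstorm'."""
    lowered = text.lower()
    return [k for k in KEYWORDS if k in lowered]


def spike_indices(counts: List[int], baseline: List[int], k: float = 3.0) -> List[int]:
    """Indices of periods whose count exceeds baseline mean + k * std dev."""
    mean = statistics.fmean(baseline)
    sd = statistics.pstdev(baseline)
    return [i for i, c in enumerate(counts) if c > mean + k * sd]
```

Substring matching is deliberately loose: it catches "Flooding" and "hailstorm", but it will also pick up some unrelated words. This is one reason a baseline of background noise is needed before spikes mean anything.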
  9. So where does LexisNexis fit in the IoT world? We can analyze, normalize, and score this data for our customers (WITH THE CONSUMER’S PERMISSION, OF COURSE). We can solve the many-to-many challenge, not only for insurers, but for IoT companies, too. We can take millions of data points and turn them into something digestible and meaningful to the industry. I hope this all sounds familiar, because it’s what we do every day already. And the normalization can take many forms. It’s not hard to imagine that the Nest, the Ecobee, the Lyric, and the Sensi (all smart thermostats that use occupancy to make decisions) might produce different data. It might come at different intervals, at different levels of granularity, and there may be differences in sensitivity between them. Clearly there’s an opportunity for us to normalize that data on the way in so that we can produce an occupancy score or attribute from thermostats that works for ALL popular models of thermostat. This is not too different from what we’ve done in the UBI (usage-based insurance) space to normalize driver scores across phone types.
  10. For a carrier that wants to get started in IoT, the first objective is to get data, and this can be a challenge on your own. However, LexisNexis offers to be your partner in collecting and interpreting this data. An easy place to start is by leveraging the devices that are already in your customers’ homes. LexisNexis is in the process of rolling out internal pilots with our employees to collect Nest thermostat data via an API connection. As we move into phase II of this program by early next year, we invite you to join us. For your customers that opt in and have a Nest in their home, you will be able to simply supply them with a URL to begin collecting data. LexisNexis will then collect and process the data; should you choose to participate in data sharing, your data will be pooled with that of other participants and the aggregate results shared with the broader group. If you are interested in a water device pilot, we are happy to work with you as well and to facilitate conversations with device makers that fit your needs.