This document discusses using predictive analytics and HPCC Systems to make IoT data actionable for insurance companies. It begins by outlining the growth of IoT devices and some of the big questions they pose for insurers. The document then provides examples of how smart thermostat and water leak detection data could help with occupancy monitoring, prevention and claims. It also discusses how water leak claims have increased in Florida due to assignment of benefits to third parties. The document concludes by discussing how insurers can start unlocking insights from IoT data through technology, analytics and pilot programs that leverage HPCC Systems' pull architecture to integrate diverse data sources for predictive modeling.
2. What Is Data Profiling?
• A method of examining data to collect statistics and information about that data
• Determines the “shape” of the data
• Data types
• Lengths
• Cardinality
• Prominent discrete values
• Patterns
• Also known as “Data Discovery”
Data Patterns: A Data Profiling Tool for HPCC Systems 2
3. When Would You Profile Data?
• Explore a new dataset
• Determine the real data types
• Determine field population
• Spot garbage data
• Find highly-correlated fields
• Verify data updates
• Ensure that structure has not changed
• Check for expected cardinality
• Check for expected fill rates
• Check for unexpected garbage
4. DataPatterns.Profile()
• Written entirely in ECL
• It is a single FUNCTIONMACRO
• No library or module dependencies
• Performs all profiling checks by default
• Numerous parameters for controlling analysis and output
• Analyze all rows in a dataset or just a sample
• Analyze all fields or only certain fields
• Enable only specified profiling checks
• Specify returned pattern counts
• Creates a single dataset as a result
• One record for each field analyzed
5. DataPatterns.Profile() – The Usual Analysis
Output                 Description
attribute              The name of the field in the input dataset
given_attribute_type   The ECL type of the attribute as it was defined in the RECORD definition
best_attribute_type    An ECL data type that both allows all values in the input dataset and consumes the least amount of memory
rec_count              The number of records analyzed
fill_count             The number of rec_count records containing non-nil values
fill_rate              The percentage of rec_count records containing non-nil values
cardinality            The number of unique, non-nil values
modes                  The most common value(s) in the attribute, after coercing all values to STRING, along with the number of records in which the values were found
min_length             The shortest length of a value when expressed as a string
max_length             The longest length of a value when expressed as a string
ave_length             The average length of a value when expressed as a string
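To make the output fields above concrete, here is a minimal Python sketch that computes the same kinds of statistics for a single field. It is not the ECL implementation of Profile(). The function name `profile_field` and the rule that `None` and the empty string count as nil are assumptions made for illustration only.

```python
from collections import Counter

def profile_field(values):
    """Summarize one field. Assumption: None and '' are treated as nil."""
    rec_count = len(values)
    # Coerce every non-nil value to a string, as the 'modes' description suggests.
    filled = [str(v) for v in values if v is not None and str(v) != ""]
    fill_count = len(filled)
    counts = Counter(filled)
    top = max(counts.values(), default=0)
    lengths = [len(s) for s in filled]
    return {
        "rec_count": rec_count,
        "fill_count": fill_count,
        "fill_rate": 100.0 * fill_count / rec_count if rec_count else 0.0,
        "cardinality": len(counts),
        # Every value tied for the highest count, with that count.
        "modes": sorted((v, c) for v, c in counts.items() if c == top),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "ave_length": sum(lengths) / fill_count if fill_count else 0.0,
    }

# Hypothetical sample values for a "state" field.
summary = profile_field(["FL", "GA", "", "FL", None, "Texas"])
```

A low `fill_rate` or an unexpectedly high `cardinality` in such a summary is the kind of signal the "When Would You Profile Data?" slide refers to.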
6. DataPatterns.Profile() – Analysis For Numeric Fields
Output                   Description
is_numeric               Boolean indicating if the original attribute was numeric and therefore whether or not the numeric_xxxx output fields will be populated with actual values
numeric_min              The smallest non-nil value as a DECIMAL
numeric_max              The largest non-nil value as a DECIMAL
numeric_mean             The mean (average) non-nil value as a DECIMAL
numeric_std_dev          The standard deviation of the non-nil values as a DECIMAL
numeric_lower_quartile   The value separating the first (bottom) and second quarters of non-nil values as a DECIMAL
numeric_median           The median non-nil value as a DECIMAL
numeric_upper_quartile   The value separating the third and fourth (top) quarters of non-nil values as a DECIMAL
numeric_correlations     A child dataset containing correlation values comparing the current numeric attribute with all other numeric attributes, listed in descending correlation value order
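The numeric outputs above can be illustrated with a short Python sketch using only the standard library. The slides do not say which quartile method, which standard deviation (population or sample), or which correlation measure Profile() uses. The inclusive quartile method, the population standard deviation, and Pearson correlation below are assumptions, and the function names are hypothetical.

```python
import math
import statistics

def numeric_profile(values):
    """Numeric summary of the non-nil values (None is treated as nil here)."""
    nums = [float(v) for v in values if v is not None]
    # Assumption: inclusive quartiles; other methods give slightly different cut points.
    lower, median, upper = statistics.quantiles(nums, n=4, method="inclusive")
    return {
        "numeric_min": min(nums),
        "numeric_max": max(nums),
        "numeric_mean": statistics.fmean(nums),
        "numeric_std_dev": statistics.pstdev(nums),  # population std dev (assumption)
        "numeric_lower_quartile": lower,
        "numeric_median": median,
        "numeric_upper_quartile": upper,
    }

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

stats = numeric_profile([1, 2, None, 3, 4, 5])
corr = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Computing `pearson` for every pair of numeric fields and sorting the results in descending order would give something shaped like the numeric_correlations child dataset.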
7. DataPatterns.Profile() – Text Patterns
• Text patterns give you an idea of what your data looks like when it is expressed as a human-readable generalized string
• Very useful for spotting data that doesn’t belong
• Converts each character of the string into a fixed character palette to produce a new string pattern
• Any uppercase letter => A
• Any lowercase letter => a
• Any numeric digit => 9
• Any boolean value => B
• All other characters remain as-is
• By counting the unique patterns and ranking them, you can easily see what kind of data is very common or very rare
• All string data types are supported
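The character palette described above is simple enough to sketch in a few lines of Python. This is an illustration of the mapping rules on this slide, not the ECL code inside Profile(). The function names `text_pattern` and `rank_patterns` are hypothetical.

```python
from collections import Counter

def text_pattern(value):
    """Map a value to its generalized pattern using the palette from the slide."""
    if isinstance(value, bool):
        return "B"                      # any boolean value => B
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")             # any numeric digit => 9
        elif ch.isupper():
            out.append("A")             # any uppercase letter => A
        elif ch.islower():
            out.append("a")             # any lowercase letter => a
        else:
            out.append(ch)              # all other characters remain as-is
    return "".join(out)

def rank_patterns(values):
    """Count the unique patterns and list them from most to least common."""
    return Counter(text_pattern(v) for v in values).most_common()

# Hypothetical ZIP code field containing one value that does not belong.
ranked = rank_patterns(["33431", "30005", "3343A", "90210"])
```

In the ranked list, the all-digit pattern dominates and the single `9999A` entry stands out at the bottom, which is how a rare pattern points at data that does not belong.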
8. DataPatterns.Profile() – Text Pattern Analysis
Output             Description
popular_patterns   The most common patterns of values; patterns are listed from most- to least-common and an example (pulled from the data) is shown for each
rare_patterns      The least common patterns of values; patterns are listed from least- to most-common and an example (pulled from the data) is shown for each; patterns already shown in popular_patterns are not repeated here
Examples
Original Value     Pattern
45816.01           99999.99
Dan Camper         Aaa Aaaaaa
For *only* $10!    Aaa *aaaa* $99!
9. Some Data To Profile …
10. … And How To Profile It
Import the DataPatterns module
Define a record structure
Declare the dataset
Call the profiler
Show result
11. Profiling Results – The Usual Suspects
12. Profiling Results – Numeric Fields
13. Profiling Results – Data Pattern Analysis
14. Final Thoughts
• DataPatterns is an open-source ECL bundle
• https://github.com/hpcc-systems/DataPatterns.git
• Currently contains only two functions
• Profile()
• BestRecordStructure()
• Future plans
• Histograms for numeric fields
• Additional information for low-cardinality fields
• Expand correlations to non-numeric discrete-value fields
• Easy comparison of profile results to detect changes
• Visualization
• Data Detectors
15. Questions?
16. Innovation and Reinvention Driving Transformation
October 9, 2018
2018 HPCC Systems® Community Day
Hicham Elhassani – VP Modeling Vertical Support
Dan S. Camper – Sr. Architect, HPCC Solutions Lab
Making IoT Data Actionable Using Predictive Analytics
18. If you think connected “things” are everywhere NOW . . .
IoT Units Installed Base by Category (Millions of Units)
Category                      2016    2017    2018     2020
Consumer                      3,963   5,244   7,036    12,863
Business: Cross-Industry      1,102   1,501   2,133    4,381
Business: Vertical-Specific   1,317   1,635   2,028    3,171
Grand Total                   6,382   8,381   11,197   20,415
Source: Gartner (January 2017)
19. Big Questions for Insurance
• Value proposition?
• Cyber risk?
• What does the data say?
• Who is driving?
• Incremental or revolutionary?
• Cost vs. benefit?
20. Importance of collecting IoT data to company’s insurance strategy (n=120)
[Chart: Importance for insurers to collect IoT data today]
Legend: Very/somewhat important; Neither important nor unimportant; Not at all/not very important
Values (as extracted): 8%, 70%, 22%
21. Collection and/or Purchase of Connected Home Data (n=120)
[Chart: Collection of Connected Home Data]
Legend: Collect/purchase, use in decision-making; Collect/purchase, plan to use; Collect/purchase, but not sure how to use; Don’t collect/purchase, but plan to; Don’t collect/purchase, don’t plan to
Values (as extracted): 1%, 4%, 19%, 38%, 38%
Collect today = 24%
Don’t collect today = 76%
22. Timeline to begin collecting Connected Home data
[Chart: Anticipated Timeline for Collecting and/or Using Connected Homes Data (among those not currently using, but planning to use connected homes, n=73)]
Legend: In next year; In next 2-3 years; In next 4-5 years; In 6+ years; Not sure
Values (as extracted): 4%, 52%, 34%, 7%, 3%
Next 3 years = 56%
4+ years = 41%
23. Home Loss Statistics and IoT Opportunities
Loss categories: Wind, Hail, Fire, Water (non-weather), Water (weather), Theft, Liability, Other
Chart values (as extracted): 11%, 25%, 21%, 22%, 21%
Internal data
• Security
• Freeze detection
• Leak detection
• Smoke/CO
• Temp/Humidity
• Motion sensor
• Appliances
• Audio/video
External data
• Weather API
• Social media events
• Loss history
• Property info
• Geo information
24. Today, let’s discuss some examples
• Occupancy: Monitoring/Prevention
• Water Leak: Monitoring/Alert
25. Smart Thermostat Data: Primary Residence
[Chart: HVAC Mode Observations (Eco), y-axis 0 to 350, July 4th weekend highlighted. Source: Nest]
26. Smart Thermostat Data: Vacation Home
[Chart: HVAC Mode Observations (Eco), y-axis 0 to 120, July 4th weekend highlighted. Source: Nest]
28. Example: Water Leak & Assignment of Benefits
File it
Assignment of benefits (AOB) is a legal tool that allows the homeowner to transfer their rights to collect from an insurance claim to a third party.
Fix it
AOB is commonly used when a homeowner employs a contractor or water remediation company to fix water damage from pipe and appliance leaks.
Fake it
This arrangement has permitted some contractors to inflate claims, resulting in a dramatic increase in the frequency and severity of Florida non-weather water claims.
Source: Office of Insurance Consumer Advocate, Florida Office of Insurance Regulation
29. Assignment of Benefits – Florida vs USA (Excl. Florida)
[Chart: Accidental Water Discharge and Appliance Leakage Loss Cost. Y-axis: Loss Cost ($), 0 to 30. X-axis: 2011 to 2016. Series: USA (Excl. Florida), Florida. Source: LexisNexis Internal Research]
32. Water Leak and Geo-located losses
[Chart: Accidental Water Discharge and Appliance Leakage Frequency. Y-axis: Frequency, 0.00% to 0.50%. X-axis: 2011 to 2016. Series: Broward County, Miami-Dade County, Palm Beach County, Florida (Excl. Tri Counties). Source: LexisNexis Internal Research]
34. Weather Events Digital Trail
• Elk City tornado by the NOAA: yesterday, 17/05/2017
• Flood
• Hail
• Lightning
• Tornado
• Wildfire
35. Stream Analytics: Push and Pull data sources
Loss categories: Wind, Hail, Fire, Water (non-weather), Water (weather), Theft, Liability, Other
36. Data platforms will be key to unlocking the full potential of this opportunity
Diagram labels: Marketing, Contact, Quote, Underwriting, Renewal, Compliance, Claim
Diagram labels: IoT Platform, Insurer, Automation, Mitigation, Utilities, Connected Home, Security, Connected Car, Connected Self, Connected Business
37. How to start unlocking these insights now
Technology/Analytics to develop and deploy a pilot program
39. HPCC Systems – Pull Architecture
• Device users register at a web portal
• Authentication and authorization via device manufacturer’s web site
• Authorization response includes an access token
• All registration information saved
• Thor queries devices for all registered users in parallel
• Ancillary data, such as weather conditions local to every device, is periodically gathered
• Analytics are also run periodically, as often as needed
• ROXIE is updated with analytics results, which are made available to external services
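The pull flow above (stored access tokens, then a parallel query for every registered user) can be sketched in Python. In the architecture on this slide the polling is done by Thor. The thread pool here only stands in for that parallelism, and `poll_all`, `fake_fetch`, the token strings, and the reading fields are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def poll_all(registrations, fetch_reading, max_workers=8):
    """Query every registered user's device in parallel.

    registrations: dict mapping user_id -> access token saved at registration.
    fetch_reading: callable(token) -> dict; stands in for the manufacturer's API.
    """
    def poll(item):
        user_id, token = item
        try:
            return user_id, fetch_reading(token)
        except Exception as err:
            # One failing device (e.g. a revoked token) should not stop the batch.
            return user_id, {"error": str(err)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(poll, registrations.items()))

def fake_fetch(token):
    """Stub for a device API call; a real call would send the token over HTTPS."""
    if token == "expired":
        raise PermissionError("token rejected")
    return {"hvac_mode": "eco"}

results = poll_all({"user-1": "tok-1", "user-2": "expired"}, fake_fetch)
```

Capturing errors per user keeps a periodic run useful even when some tokens have been revoked, and the failed entries show which users need to re-register.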
40. HPCC Systems – Push Architecture
• Authorized devices whitelisted via master device management
• Remote devices send their data to ROXIE
• After validation and normalization, message stored in Kafka and Couchbase
• Thor periodically pulls new messages from Kafka for processing
• Ancillary data, such as weather conditions local to every device, is periodically gathered
• Analytics are also run periodically, as often as needed
• ROXIE is updated with analytics results, which are made available to external services
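The push flow above can be sketched as two small Python functions: one that validates, normalizes, and stores an incoming message, and one that drains stored messages in batches. An in-memory deque stands in for Kafka, and no Couchbase copy is modeled. The device IDs, the message fields, and the Fahrenheit-to-Celsius normalization are assumptions chosen only to show the shape of the pipeline.

```python
from collections import deque

WHITELIST = {"dev-001", "dev-002"}   # hypothetical authorized device IDs
message_log = deque()                # in-memory stand-in for the Kafka topic

def receive(message):
    """Front-end step: validate, normalize, then store. Returns True if accepted."""
    device_id = message.get("device_id")
    if device_id not in WHITELIST or "temp_f" not in message:
        return False
    normalized = {
        "device_id": device_id,
        # Example normalization (an assumption): store temperatures in Celsius.
        "temp_c": round((float(message["temp_f"]) - 32) * 5 / 9, 2),
    }
    message_log.append(normalized)
    return True

def pull_batch(max_messages=1000):
    """Batch step: periodically drain new messages for processing."""
    batch = []
    while message_log and len(batch) < max_messages:
        batch.append(message_log.popleft())
    return batch

accepted = [receive(m) for m in (
    {"device_id": "dev-001", "temp_f": 72.5},
    {"device_id": "rogue-9", "temp_f": 70.0},   # not whitelisted, so rejected
    {"device_id": "dev-002", "temp_f": "68"},
)]
batch = pull_batch()
```

Separating the receive step from the batch step mirrors the slide: devices get a fast acknowledgement, while heavier analytics run later on whatever has accumulated.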
Editor's Notes
Devices in the Internet of Things communicate with each other, but without a human directly prompting the interaction. Today we call this “The Internet of Things,” but that’s only because it’s new. In five years we’ll probably just call it “the internet.”
Gartner put the number of IoT devices at 8 billion in 2017. For 2020, they estimate TWENTY billion. Cisco estimates 50 billion. We can be sure they’re both wrong, but one of them might be close. The point is, there will be tens of billions of devices generating data.
And on the data side, what’s interesting is that humans have generated the majority of the data out there today, from pictures and texts, to movies, to scholarly articles. But soon the data created by “things” will dwarf the data created by humans.
There has been a lot of activity over the past year but these same key questions are still largely unanswered.
[Walk through points]
And I’ll add one more: consumer engagement. What gets the consumer to push through setup challenges, replace batteries, or even engage with the device through an app?
There is still a lot of ambivalence and complexity out there so instead of taking a step back like we did last year, let’s take a step in and look at some specific use cases.
Who will be the winners and losers among the devices and platforms? There will continue to be consolidation, new entries and exits. This makes partnerships and data agreements complicated.
Who is driving? Is it the consumer, the insurer, or the infrastructure? As I showed on the previous slide, you may want to prevent water losses, but that doesn’t mean your policyholder shares that concern. He or she may be more likely to opt for voice-activated mood lighting. Discounts or carrier device buys may help to remedy this over time. Connected utility meters and built-in capabilities may also have an influence in time.
Cyber risk: In 2016 there was a major Distributed Denial of Service attack that shut down a number of websites. Wi-Fi-enabled baby monitors have been hacked. Carriers do have to consider this when potentially connecting their brand with a device. Do you want that connected thermostat you encouraged your customer to buy to be susceptible to ransomware that extorts a payment to keep the heat on during the winter? The good news is that there are good companies out there today working on building more sophisticated technology to protect connected devices.
Much of the purported benefit of the connected home is speculation. How does this data really play out? Does the connected water sensor really prevent loss payments to a significant degree? Does it reduce frequency? Just severity? How much? We need a lot more data to know for sure. And multiply that across the dozens of devices that are available.
How big is the disruption? If at the end of the day we end up with a lot of new data sources that allow us to offer another 5% discount, or that help us validate the home security system discounts carriers are already giving, then it’s still useful but not revolutionary. On the other hand, if pricing a risk from the ground up using a multitude of real-time IoT data becomes a reality, then maybe it is. The other question here is loss mitigation versus loss avoidance.
Finally, there is cost, particularly the cost of the device. As we discussed above, the consumer may not buy the devices you want them to have, which means the insurer would potentially need to foot the bill (either directly or through discounting and/or rate). That math needs to work, and a $5 device will be a lot more attractive for mitigating flood risk under a given sink than an $80 device.
Insurers can explore many ways to avoid and limit losses
So where does LexisNexis fit in the IoT world? We can analyze, normalize, and score this data for our customers (WITH THE CONSUMER’S PERMISSION, OF COURSE). We can solve the many-to-many challenge, not only for insurers, but for IoT companies, too. We can take millions of datapoints and turn them into something digestible and meaningful to the industry. I hope this all sounds familiar, because it’s what we do every day already.
And the normalization can take many forms. It’s not hard to imagine that the Nest, the Ecobee, the Lyric, and the Sensi, all smart thermostats which use occupancy to make decisions, might produce different data. It might come at different intervals, at different levels of granularity, and there may be differences in sensitivity between them. Clearly there’s an opportunity for us to normalize that data on the way in so that we can produce an occupancy score or attribute from thermostats that works for ALL popular models of thermostat. This is not too different from what we’ve done in the UBI space to normalize driver scores across phone types.
This is one piece of the data that we can collect from Nest thermostats. In this case I once again got one of my co-workers to agree to let me use his data, but he won’t let me use his real name because he is paranoid that his rates will go up. We are going to call him “Shawn.”
Shawn has two Nest thermostats and they each send data nearly 150 times a day. This data stream has dozens of fields, including everything from the actual temperature in the home, the desired temperature, the location of the thermostat the consumer has specified, and whether someone has locked in a temperature other than those in the settings. The Nest thermostat switches to “Eco” mode when it doesn’t detect anyone present in the home, and this data is captured as well.
Here is Shawn’s lake house. There is only one thermostat in this house, but it consistently reports “Eco” status until we get to the holiday weekend.
Now this is a very clear example and not every example will be this clear but it is evident.
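One way to turn the Eco-mode observations described above into an occupancy signal is to compute, for each day, the share of observations reporting Eco. The Python sketch below is only an illustration. The function names, the sample dates and counts, and the 0.5 threshold are assumptions, not the scoring method used in the presentation.

```python
from collections import defaultdict

def daily_eco_share(observations):
    """observations: iterable of (day, hvac_mode). Returns day -> fraction in Eco."""
    totals = defaultdict(int)
    eco = defaultdict(int)
    for day, mode in observations:
        totals[day] += 1
        if mode.lower() == "eco":
            eco[day] += 1
    return {day: eco[day] / totals[day] for day in totals}

def likely_occupied_days(observations, max_eco_share=0.5):
    """Days on which the thermostat was mostly NOT in Eco mode (threshold is arbitrary)."""
    shares = daily_eco_share(observations)
    return sorted(day for day, share in shares.items() if share <= max_eco_share)

# Made-up observations for a vacation home around a holiday.
obs = (
    [("2018-07-02", "eco")] * 4
    + [("2018-07-03", "eco")] * 3 + [("2018-07-03", "cool")]
    + [("2018-07-04", "cool")] * 3 + [("2018-07-04", "eco")]
)
occupied = likely_occupied_days(obs)
```

With roughly 150 observations per thermostat per day, a daily share like this is stable enough to compare across days, though other thermostat brands would need their own mapping to an equivalent "away" state.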
Assignment of Benefits mainly impacts water non-weather claims associated with leaking pipes and damaged appliances.
Small circles are tweets containing ‘tornado’, large circles are official sightings
So we are starting to harvest based on keywords to
1: build up data to have a baseline (i.e. background noise)
2: ‘hoping’ for an event to see spikes
Right now we are grabbing tweets with words (also partial) containing the keywords
Flood
Hail
Lightning
Tornado
Wildfire
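The keyword harvesting and baseline idea in these notes can be sketched in Python. Substring matching covers the "also partial" case, so "tornadoes" still matches "tornado". The spike rule (today's count above a multiple of the baseline mean) and its factor of 3 are assumptions for illustration, and the function names are hypothetical.

```python
KEYWORDS = ("flood", "hail", "lightning", "tornado", "wildfire")

def matched_keywords(text):
    """Return the keywords found anywhere in the text, including inside longer words."""
    lowered = text.lower()
    return [k for k in KEYWORDS if k in lowered]

def is_spike(baseline_counts, todays_count, factor=3.0):
    """Flag a spike when today's count exceeds factor x the baseline mean.

    baseline_counts: daily tweet counts collected during quiet periods
    (the background noise). The factor is an arbitrary assumption.
    """
    if not baseline_counts:
        return False
    baseline = sum(baseline_counts) / len(baseline_counts)
    return todays_count > factor * baseline

hits = matched_keywords("Tornadoes and a HAILSTORM reported near Elk City")
quiet_day = is_spike([4, 6, 5], 12)
event_day = is_spike([4, 6, 5], 40)
```

Substring matching will also pick up unrelated uses of a keyword, which is one reason a baseline of background noise is needed before a spike can be trusted.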
For a carrier that wants to get started in IoT, the first objective is to get data, and this can be a challenge on your own. However, LexisNexis offers to be your partner in collecting and interpreting this data. An easy place to start is by leveraging the devices that are already in your customers’ homes.
LexisNexis is in the process of rolling out internal pilots with our employees to collect Nest thermostat data via an API connection. As we move into phase II of this program by early next year, we invite you to join us. For your customers that opt in, and have a Nest in their home, you will be able to simply supply them with a URL to begin collecting data.
LexisNexis will then collect and process the data, pool it with other participants’ data should you choose to participate in data sharing, and share the aggregate results with the broader group.
If you are interested in a water device pilot, we are happy to work with you as well and are happy to facilitate conversations with device makers that fit your needs.