Take a look at this presentation from Hortonworks and Skytree and learn how Communications Service Providers can enhance their customers' experience by:
– Creating a Data Lake for a 360 degree customer view.
– Building dynamic customer profiles.
– Leveraging a next-best-action streaming engine.
You will learn more about how Hortonworks Hadoop Distribution Platform and Skytree Machine Learning Solution can help you do so.
Speakers: Dr. Alexander Gray, CTO at Skytree, and Sanjay Kumar, General Manager, Hortonworks
18. CONFIDENTIAL
Bigger Data. Better Insights.™
Machine Learning and Telecom
Alexander Gray, PhD
CTO, Skytree
19. CONFIDENTIAL
Machine Learning on Big Data
Next step in the Big Data journey: analytics and machine learning to make better decisions:
– Churn: from prediction to prevention
– Net Promoter Score
Requires a 360 Degree View of Customers
20. CONFIDENTIAL
[Diagram: Customer 360° View]
Data sources shown: external data, internal data, data warehouse, e-mail, CRM, all feeding a Big Data environment.
– Single Customer View: improved decision-making capabilities based on customer data
– Big Data: enabling innovative products and services, customer satisfaction
– Analytics: churn propensity and prevention, product sentiment, recommendations, and more
22. CONFIDENTIAL
Utilizing data: The traditional approach
Traditionally, human domain experts dig into the data via
– Visualization tools
– Basic data analysis
– Querying a database to seek patterns
– “Thinking hard” about the underlying processes
and extract insights, plots, and decision rules that utilize the patterns they find.
“Traditional business intelligence”
23. CONFIDENTIAL
Utilizing data: The traditional approach
Human experts are very good at asking certain kinds of questions, but they are limited in the ways they can process data.
This is the age of Big Data: lots of nontrivial patterns and subtle, nonlinear relations that are not visible to traditional analytics and visualization tools.
Missed patterns → Missed accuracy → Missed opportunities!
24. CONFIDENTIAL
Utilizing data: Machine Learning
Machine Learning is the modern science of finding subtle, nonlinear patterns in data that can be used to:
– PREDICT outcomes and guide actions, e.g.:
• Provide targeted recommendations to customers
• Signal the need to service before equipment failure
– DISCOVER insights to inform decisions, e.g.:
• Which variables among a set of thousands have the most weight in determining an important outcome?
“Advanced analytics”
25. CONFIDENTIAL
Utilizing data: Machine Learning
Machine Learning empowers human experts with additional insights that were not available before
• It is not Human vs. Machine, but Human and Machine together, best of both worlds
26. CONFIDENTIAL
Net Promoter Score (traditional approach)
Net Promoter Score (NPS) is defined as
% Promoters - % Detractors
where Promoter = 9-10, Detractor = 0-6 on a scale of 0-10 in answer to the question "How likely is it that you would recommend our company/product/service to a friend or colleague?"
Thus, NPS ranges from -100 to 100.
How good a score is depends on what your competitors' scores are.
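The NPS definition above can be worked through in a few lines of Python. This is a minimal sketch: the function name and the sample survey responses are illustrative assumptions, not part of the presentation.

```python
def net_promoter_score(ratings):
    """Return NPS = % promoters - % detractors for 0-10 survey ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0-6
    # Ratings of 7-8 count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses: 5 promoters, 3 detractors, 2 in between.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(sample))  # 20.0
```

The extremes follow directly: a sample made up only of promoters scores 100, and a sample made up only of detractors scores -100.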
27. CONFIDENTIAL
Using ML to improve Net Promoter Score
Skytree can improve your Net Promoter Score
Given a set of existing customer NPSs, Skytree can tell you which variables (gathered from other data in the organization) are significant in producing the NPS score.
Skytree can tell you WHY, thus informing actions to improve the NPS score and hence customer loyalty.
Skytree can improve customer loyalty directly
Instead of using NPS, Skytree could predict customer loyalty directly, without the approximations required by NPS.
Whereas NPS puts all customers in just 3 categories (favorable, neutral, not favorable), Skytree enables targeting of each customer individually, giving more accurate and focused personalized marketing.
28. CONFIDENTIAL
Data ML can use
A typical Telco set of variables might include:
Customer Demographic Data
– Primary household member’s age
– Gender and marital status
– Number of adults
– Primary household member’s occupation
– Household estimated income and wealth ranking
– Number of children and children’s age
– Number of vehicles and vehicle value
– Credit card
– Frequent traveler
– Responder to mail orders
– Dwelling and length of residence
Customer Internal Data: Information
– Market channel
– Plan type
– Bill agency
– Customer segmentation code
– Ownership of the company’s other products
– Dispute
– Late fee charge
– Discount
– Promotion/save promotion
– Additional lines
– Toll free services
– Rewards redemption
– Billing dispute
Customer Internal Data: Usage
– Weekly average call counts
– Percentage change of minutes
– Share of domestic/international revenue
Customer Contact Records
– Customer calls to service centers
– Company’s mail contacts to customers
– Customer contact category: customer general inquiry, customer requests to change service, customer inquiry about cancel
Cancel Reason Codes
– Unacceptable call quality
– More favorable competitor’s pricing plan
– Misinformation given by sales
– Customer expectation not met
– Billing problem
– Moving
– Change in business
29. CONFIDENTIAL
Predicting Customer Churn
Cost of churn: lost revenue + marketing costs to replace departing customers
Goal: predict customers at high risk of churning while there is still time to do something about it.
Model inputs / features:
• Customer micro-segments
• Customer behavior
• Customer characteristics
• Customer-company interaction
• Micro-segment migration
• Note: much of this requires fusing disparate unstructured data sources
Machine Learning can help:
• Predict customers at high risk of churn months in advance of actual or passive churn
• Customer micro-segmentation: identification of customer segments through unsupervised learning.
Model outputs / interpretability:
• Identity of high-risk churners: scoring churn-risk of each customer
• Relative importance of ML features:
  • Where are customers experiencing issues with products or services?
  • Identification of potential improvements to products or services with highest impact on revenues.
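To make "scoring churn-risk of each customer" concrete, here is a minimal sketch using plain logistic regression trained by gradient descent. Logistic regression is only a stand-in chosen for brevity, not the method Skytree uses, and the three feature names and the toy training rows are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def churn_risk(w, b, x):
    """Estimated probability of churn for one customer's feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features, each scaled to 0..1:
# [service-center calls, billing dispute flag, drop in weekly minutes]
X = [
    [0.9, 1.0, 0.8], [0.8, 0.0, 0.9], [0.7, 1.0, 0.6], [0.6, 1.0, 0.7],
    [0.2, 0.0, 0.1], [0.1, 0.0, 0.3], [0.3, 0.0, 0.2], [0.2, 0.0, 0.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = churned, 0 = stayed (toy labels)

w, b = train_logistic(X, y)

# Score new customers and list the highest-risk ones first.
customers = {"cust_a": [0.9, 1.0, 0.9], "cust_b": [0.1, 0.0, 0.1]}
ranked = sorted(customers, key=lambda c: churn_risk(w, b, customers[c]),
                reverse=True)
for name in ranked:
    print(name, round(churn_risk(w, b, customers[name]), 3))
```

In a linear model like this one, the magnitude of each learned weight gives a crude indication of relative feature importance, provided the features share a common scale.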
30. CONFIDENTIAL
Preventing Customer Churn: Predicting Impact of Marketing Actions
Maximize revenue by identifying marketing actions with highest probability of positive outcome
• Tailor marketing action to specific high-risk customers
• Minimize offers to happy customers.
Potential Model inputs:
• Previous customer offers and the outcome of those offers
• Customer micro-segments and migration over time of customers through/between micro-segments
• Customer-specific features, including company-customer interactions
Machine Learning Tasks:
• Rank and score potential marketing actions on a per-customer basis
• Identify micro-segments as basis for targeting marketing actions
• Predict customer lifetime value
Examples of Model Outputs / Interpretability:
• List of scored marketing options, specific to each customer
• Identification of marketing actions having greatest retention impact.
• Reducing marketing expense to retain happy customers.
• Estimation of impact on customer lifetime value of possible marketing actions.
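One way to picture "rank and score potential marketing actions on a per-customer basis" is an expected-gain calculation. The sketch below assumes a churn-risk estimate and a lifetime-value estimate already exist for the customer. The action names, costs, and retention lifts are hypothetical, and the scoring rule is a simplification for illustration rather than the model described in the slides.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    cost: float            # assumed cost of making the offer
    retention_lift: float  # assumed reduction in churn probability


def rank_actions(churn_risk, lifetime_value, actions, min_risk=0.3):
    """Return (action name, expected gain) pairs, best first.

    Customers below min_risk get no offers at all, which mirrors the
    goal of minimizing offers to happy customers.
    """
    if churn_risk < min_risk:
        return []
    scored = []
    for action in actions:
        # An offer cannot remove more churn risk than the customer has.
        avoided = min(churn_risk, action.retention_lift)
        gain = avoided * lifetime_value - action.cost
        if gain > 0:
            scored.append((gain, action.name))
    scored.sort(reverse=True)
    return [(name, round(gain, 2)) for gain, name in scored]


# Hypothetical catalogue of marketing actions.
catalogue = [
    Action("discount", cost=50.0, retention_lift=0.30),
    Action("loyalty_points", cost=10.0, retention_lift=0.10),
    Action("free_upgrade", cost=120.0, retention_lift=0.40),
]

at_risk = rank_actions(churn_risk=0.6, lifetime_value=600.0, actions=catalogue)
happy = rank_actions(churn_risk=0.05, lifetime_value=600.0, actions=catalogue)
print(at_risk)
print(happy)
```

For the at-risk customer the discount ranks first because its lower cost outweighs the upgrade's larger lift. The happy customer receives an empty list.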
31. CONFIDENTIAL
Other ML Opportunities in Telecom
Operational:
• Prevent SDN attacks and related fraud
• Predict most VULNERABLE POINTS in networks
• Predict device/component FAILURE
• Detect ANOMALOUS behavior, trigger alerts
• Automatic PROVISIONING
32. CONFIDENTIAL
Typical Data Science Workflow: Disparate Tools, Manual Processes
– Data Prep: Transform and fuse data sets using various tools
– Feature Extraction: Use subset of data due to performance issues
– Method Selection: Manually pick and try multiple
– Parameter Selection: Iterate on different parameters for best results
– Test: Continually verify accuracy (pull holdout data for test)
– Deployment: Export model for production
– Real-time Scoring: New Data → Results
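The manual loop in this workflow (pull holdout data, iterate over parameters, verify accuracy) can be sketched as follows. Everything here is an illustrative assumption: the data is synthetic, the model is a tiny one-feature k-nearest-neighbour classifier, and the candidate values of k are arbitrary.

```python
import random


def split(rows, holdout_fraction, seed=0):
    """Shuffle rows and pull out a holdout set that training never sees."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x (one feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, rows, k):
    hits = sum(knn_predict(train, x, k) == label for x, label in rows)
    return hits / len(rows)


def grid_search(train, validation, candidate_ks):
    """Try every candidate k and keep the best one on validation data."""
    scored = [(accuracy(train, validation, k), k) for k in candidate_ks]
    best_accuracy, best_k = max(scored)
    return best_k, best_accuracy


# Synthetic data: the label depends on a noisy threshold at 0.5.
rng = random.Random(42)
rows = []
for _ in range(300):
    x = rng.uniform(0, 1)
    rows.append((x, 1 if x + rng.gauss(0, 0.15) > 0.5 else 0))

work, holdout = split(rows, holdout_fraction=0.2)
train, validation = split(work, holdout_fraction=0.25, seed=1)

best_k, validation_accuracy = grid_search(train, validation, [1, 3, 5, 9, 15])
test_accuracy = accuracy(train, holdout, best_k)  # final, untouched check
print(best_k, round(validation_accuracy, 3), round(test_accuracy, 3))
```

Parameters are chosen on the validation split only, so the holdout accuracy remains an honest estimate of how the selected model behaves on new data.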
33. CONFIDENTIAL
Built to Scale From the Ground Up for Big Data
• Parallelize without sacrificing accuracy
• Massive Hadoop scaling with TrueScale™
• Runs directly on Hadoop nodes
• Minimize internode traffic
• Net result: near linear scalability
• Algorithms deeply optimized
• In-memory execution
[Diagram: Skytree running on each Hadoop data node, with parallelized in-memory execution across CPUs and fast internode communication]
34. CONFIDENTIAL
Skytree Streamlines and Automates the Data Scientist Workflow
Unified Skytree Environment
– Data Prep: Broad ML transformations speed data extraction/cleansing
– Feature Extraction: Use all data you need for better results
– Single click AutoModel™: Automated method and parameter selection quickly derives & verifies best models (Single Step Train-Tune-Test)
– Deployment: Run on Skytree with streaming data or export model for production
– New Data → Better Prediction/Results
35. CONFIDENTIAL
Scalability Drives Better Accuracy
Dataset Size (Rows)    Accuracy (Norm. Gini)
100,000                87.8%
200,000                90.1%
400,000                91.3%
800,000                92.6%
1,600,000              93.4%
3,200,000              94.4%
• Source dataset: Pascal Large Scale Learning Challenge DNA dataset
• A 4M-row dataset was held out for testing.
• 6 training datasets from 100K through 3.2M rows, arranged into 200 columns, were used.
• Tuned Stochastic GBT, trees limited to 5000
• No featurization applied.
[Chart: Accuracy as a Function of Data Set Size; x-axis: Dataset Size (Rows), 100,000 to 3,200,000; y-axis: Accuracy (Normalized Gini), 86% to 96%]
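The slide reports accuracy as normalized Gini without defining it. A minimal sketch, assuming the common binary-classification definition (normalized Gini = 2 × AUC − 1), is shown below. The function names and the sample labels and scores are illustrative assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by comparing every
    positive/negative pair of scores (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def normalized_gini(labels, scores):
    """Assumed definition: 2 * AUC - 1, so 1.0 is a perfect ranking
    and 0.0 is no better than random ordering."""
    return 2.0 * auc(labels, scores) - 1.0


# Toy example: one negative is scored above one positive.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(normalized_gini(labels, scores))  # 0.5
```

Under this definition, the 94.4% figure in the table would correspond to an AUC of about 0.972.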
36. CONFIDENTIAL
Taming the Complexity of ML via Automation
• Reduce data scientists' time by 90 – 95%
  • Reduce 60 hours of data science experiment time into 4 hours
  • Allowing data scientists to do more strategic tasks
• Reduce total model experiment time by 25 – 75%
  • Compress a 3 month final model build into 1 month
  • Deploy models faster
• Reduce compute time by up to 30%
  • Reduce compute time from 35 days to 30 days
  • Save compute cost and resource
• Get equivalent or better model results
[Chart: Time to Build Final Model using Skytree Automation vs. manually by skilled data scientist (in hours); bars: With AutoModel, Grid Search; axis 0 to 80]
[Chart: Total Time Elapsed to Complete Experimentation using Skytree Automation vs. manually by skilled data scientist (in weeks); bars: With AutoModel, Grid Search; axis 0 to 15]
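As a quick arithmetic check, the before/after examples on this slide can be converted to percentage reductions and compared with the headline ranges. The helper name is an assumption for illustration, and only figures quoted on the slide are used.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


examples = {
    "data scientist time (60 h -> 4 h)": (60, 4),
    "final model build (3 months -> 1 month)": (3, 1),
    "compute time (35 days -> 30 days)": (35, 30),
}
for label, (before, after) in examples.items():
    print(f"{label}: {percent_reduction(before, after):.1f}% reduction")
```

The three results (about 93.3%, 66.7%, and 14.3%) fall within the quoted ranges of 90 – 95%, 25 – 75%, and "up to 30%" respectively.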
38. CONFIDENTIAL
Data Centric Customer Experience Management
Functional Area: Customer Experience Management
Example Use Case: 360 Degree Customer & Household View - Computational Net Promoter Score & other Customer Metrics
Hortonworks - Hadoop:
Collection of data across sources into a Hadoop Data Lake for a 360 degree view of Customer and Household. YARN-enabled Hadoop architecture: a single set of data across the entire cluster with multiple access methods.
– Ingestion: Multiple sources of unstructured and structured data, including CDR, clickstream, network probe & log records, sensor, IVR Voice-2-Text, social media, OSS/BSS, etc.
– Process & Store: YARN-enabled architecture with a single set of data across the entire cluster and multiple access methods. Distributed storage in HDFS and many processed workloads managed by YARN.
– Query & Alerts: Schema on read allows multiple methods for queries and alerts through different applications or through HDP tools (Hive, HBase, Storm, etc.)
Skytree - Machine Learning:
• Understand which variables are significant in producing the NPS score
• Understand the WHY for an NPS score, thus informing actions to improve it and hence customer loyalty
• Finally, the potential to predict customer loyalty directly, without the approximations required by NPS
• Skytree enables targeting of each customer individually, giving more accurate and focused personalized marketing
Customer Sentiment and Churn Detection
• Tailor marketing action to specific high-risk customers
• Minimize offers to happy customers.
• Rank and score potential marketing actions on a per-customer basis
• Identify micro-segments as basis for targeting marketing actions
• Predict customer lifetime value