Applying data science to sales pipelines – for fun and profit
Andy Twigg
Chief Scientist
WHY APPLY DATA SCIENCE TO SALES?
Problem: sales teams are biased
• Unrealistic targets – “you must have 3x coverage”
• Happy ears – “they said they’ll definitely buy it”
• Sandbagging – reps want to look like heroes, so they don’t report deals until late in the quarter
We should be able to remove these biases
• Stat: since 1995, CRM data has increased ~150x, but forecast accuracy has fallen by 10%
=> data is available, but not helping
PROBLEMS
Opportunity scoring
• Pr(win)?
• Pr(win in quarter)?
• How does this compare to sales team commits?
• Which deals can we influence most?
Forecasting
• How much will be won this quarter?
SALES OPPORTUNITIES
• Opportunities are temporal: each is either open or closed, and once closed it is either won or lost
• Opportunities usually proceed through stages in order, except:
  • Stages form a partial order, so an opportunity can skip or revisit stages
  • An opportunity can be entered as closed (no open observations)
• As the opportunity evolves, we get more and more data about it
• Sales teams mark an opportunity ‘committed’ when they predict a win within the quarter
• A pipeline is a set of open opportunities
• We want to estimate Pr(final outcome = won), Pr(closed before time t), …
[Figure: example timeline of an opportunity. Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: Negotiation → Outcome: Closed/won. Markers: open, closed, committed.]

• Sales team: good precision (~70-80%) but poor recall (~10-40%)
• Model won precision ~ sales team won precision
• Model won recall ~ 3 x sales team won recall

              First observation            Last observation
              precision  recall  F1        precision  recall  F1
model         0.65       0.86    0.74      0.75       0.93    0.83
sales team    0.70       0.07    0.13      0.87       0.45    0.59
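
The F1 columns follow from the precision and recall columns as their harmonic mean. As a quick check, the sketch below recomputes them in Python. The dictionary layout and names are illustrative; only the numbers come from the table.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the table above.
table = {
    ("model", "first"): (0.65, 0.86),
    ("model", "last"): (0.75, 0.93),
    ("sales team", "first"): (0.70, 0.07),
    ("sales team", "last"): (0.87, 0.45),
}

# Round to two decimals, matching the precision used on the slide.
scores = {key: round(f1(p, r), 2) for key, (p, r) in table.items()}

for (who, when), score in scores.items():
    print(f"{who:10s} {when:5s} observation  F1 = {score:.2f}")
```

The recomputed values agree with the slide (0.74, 0.83, 0.13, 0.59), which shows how strongly the sales team's low recall drags down its F1 at the first observation.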
ANATOMY OF AN OPPTY
[Figure: timeline of a single opportunity, annotated: pushed out; pulled back in; committed here (by the sales rep); final outcome: won; predicted won from the start; predicted won in the correct quarter.]
SALES OPPORTUNITIES
[Figure: the opportunity timeline again (Lead created … Outcome: Closed/won), annotated with states x0, …, xt and target y = 1.]
• Sequence of observations x0, x1, …
• associated with a fixed target y ∈ {0,1}
• Consider states as an MDP: state xt encodes temporal features about previous states (cf. RFM features: recency, frequency, monetary)
  • # times this stage was previously visited, time between successive visits, time in current stage, direction of amount change, …
• States also contain
  • Sales-specific features, e.g. momentum
  • External data, e.g. firmographic
  • Global features, e.g. avg_sales_cycle(target)
• Gives examples {(x0,y),(x1,y),…} for each opportunity
• Shuffle to break correlations between successive examples
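
The construction described on the SALES OPPORTUNITIES slide (one example (x_t, y) per observation, then shuffle) can be sketched as follows. This is a minimal illustration, not the speaker's code: the `Event` record, the feature names and the two sample opportunities are all hypothetical, and only a few of the temporal features from the slide are computed.

```python
import random
from dataclasses import dataclass

@dataclass
class Event:
    day: int        # days since the opportunity was created
    stage: str      # stage the opportunity is in after this event
    amount: float   # deal amount after this event

def state_features(history):
    """Encode the history up to the latest event as one state x_t."""
    current = history[-1]
    # Find where the current uninterrupted run of this stage began.
    start = len(history) - 1
    while start > 0 and history[start - 1].stage == current.stage:
        start -= 1
    # Count earlier, separate visits to the same stage.
    earlier = history[:start]
    prior_visits = sum(
        1 for i, e in enumerate(earlier)
        if e.stage == current.stage and (i == 0 or earlier[i - 1].stage != e.stage)
    )
    delta = current.amount - history[-2].amount if len(history) > 1 else 0.0
    return {
        "stage": current.stage,
        "prior_visits": prior_visits,
        "days_in_stage": current.day - history[start].day,
        "amount_direction": (delta > 0) - (delta < 0),  # +1, 0 or -1
    }

def training_examples(opportunities, seed=0):
    """Emit one (x_t, y) pair per observation, then shuffle across opportunities."""
    examples = []
    for events, won in opportunities:
        for t in range(1, len(events) + 1):
            examples.append((state_features(events[:t]), int(won)))
    random.Random(seed).shuffle(examples)
    return examples

# Two made-up opportunities: one won (revisits Qualifying), one lost.
won_opp = [Event(0, "Qualifying", 1000), Event(5, "Validate", 1000),
           Event(9, "Qualifying", 1500), Event(12, "Qualifying", 1500)]
lost_opp = [Event(0, "Qualifying", 2000), Event(3, "Validate", 1800)]

examples = training_examples([(won_opp, True), (lost_opp, False)])
for x, y in examples:
    print(y, x)
```

Every state of an opportunity carries the same final label, so successive examples are highly correlated. Shuffling spreads them across the training set.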
DURATION MODEL
• Win/loss model
  • Pr(win)
  • independent of time horizon
  • RF/GBDT (random forest / gradient-boosted decision trees)
• Duration model
  • Pr(win within quarter)
  • Poisson regression: assume that in the current state xt there is a fixed probability of closing each day
  • Train a model to predict the expected duration d, conditioned on outcome = win
  • Integrating the corresponding exponential distribution (of interarrival times) gives Pr(close < t)
  • Pr(win < t) = Pr(win) Pr(close < t | win)
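
To make the last two bullets concrete: if closing happens at a constant daily rate 1/d, the time to close is exponentially distributed with mean d, so Pr(close < t | win) = 1 - exp(-t/d). The Python sketch below assumes the win/loss model and the duration model have already produced `p_win` and `expected_days`. Both names and the example numbers are hypothetical.

```python
import math

def prob_close_within(days_left, expected_days):
    """CDF of an exponential distribution with mean expected_days."""
    if expected_days <= 0:
        raise ValueError("expected_days must be positive")
    return 1.0 - math.exp(-days_left / expected_days)

def prob_win_within(p_win, days_left, expected_days):
    """Pr(win < t) = Pr(win) * Pr(close < t | win)."""
    return p_win * prob_close_within(days_left, expected_days)

# Hypothetical deal: 60% chance of winning, expected 45 days to close,
# 30 days left in the quarter.
p = prob_win_within(p_win=0.6, days_left=30, expected_days=45)
print(f"Pr(win within 30 days) = {p:.3f}")
```

The product form keeps the two models separate: the win/loss score does not depend on the horizon, and only the duration term changes as the quarter end approaches.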
FORECASTING: BOTTOM-UP
Bottom-up: predict the current quarter from the currently open pipeline
+ Considers quality of deals in pipeline
- Ignores trends and deals not in pipeline
Obvious solution: expected amount in pipeline with respect to Pr(win in quarter) scores
[Figure: three open deals, $157,000 at 77%, $200,000 at 37% and $82,000 at 86%, giving an expected amount of $265,410.]
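
The expected-amount calculation can be checked directly against the slide's example. In this short Python sketch the amounts and scores are the ones shown on the slide; the variable and function names are illustrative.

```python
# (amount in dollars, Pr(win in quarter)) for each open opportunity.
pipeline = [
    (157_000, 0.77),
    (200_000, 0.37),
    (82_000, 0.86),
]

def expected_amount(deals):
    """Sum of amount * Pr(win in quarter) over the open deals."""
    return sum(amount * p for amount, p in deals)

forecast = expected_amount(pipeline)
print(f"${forecast:,.0f}")  # $265,410
```

Note that the largest deal contributes less to the forecast than the $157,000 deal because of its low score.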
FORECASTING: TOP-DOWN
Top-down: predict the current quarter from previous quarters
+ Accounts for seasonality and trend
- Ignores state of current pipeline
[Figure: decomposition of additive time series. Panels: observed, trend, seasonal, random. x-axis: Time, 2013.0 to 2014.4.]
Typical decomposition of a revenue time series into 3 components:
• Trend component
• Seasonal component
• Random component
Idea: try to reduce the random component by taking into account the current pipeline
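
The plot title suggests a classical additive decomposition, where observed = trend + seasonal + random. The sketch below is one simple pure-Python way to compute it, using a centered moving average for the trend and per-position averages for the seasonal part. The talk does not say which tool produced the plot, and the quarterly series here is synthetic.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + random."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    trend = [None] * n  # undefined at the edges of the series
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: centered average with half weight on both end points.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[i] = total / period
    # Average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(series[i] - t)
    means = [sum(b) / len(b) for b in buckets]
    center = sum(means) / period  # force the seasonal part to sum to zero
    seasonal = [means[i % period] - center for i in range(n)]
    random_part = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, random_part

# Synthetic quarterly revenue: linear trend plus a fixed seasonal pattern.
pattern = [5.0, -3.0, -4.0, 2.0]
revenue = [100.0 + 10.0 * i + pattern[i % 4] for i in range(12)]
trend, seasonal, random_part = decompose_additive(revenue, period=4)
print(trend[5], seasonal[:4])
```

On this noise-free series the random component is zero wherever the trend is defined. On real revenue data it is whatever the trend and seasonal terms fail to explain, which is the part the slide proposes to reduce using pipeline information.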
‘HYBRID’ FORECASTING
top down + bottom up
• Idea: augment ARIMA model with side information from bottom-up model
• Allows model to adjust coefficients in response to bottom-up features (representing current pipeline) while retaining ARIMA features
  • Amount predicted to close in current quarter
  • Average score of currently open opportunities
  • Average predicted days to close
  • Historic adjusted coverage ratios
• Sometimes known as ARIMAX [1]
[1] robjhyndman.com/hyndsight/arimax
WORD VECTORS
• Train word2vec model on text fields on opportunities
  • description, status, risks, …
  • “deal pushed out because no budget this quarter”
• ~200m words
• Gives 300-dimensional ‘neural’ word embeddings
• Compare to GoogleNews model
• Learned some sales-specific concepts
In [23]: model.most_similar('lost')
Out[23]:
[('disqualified', 0.7105633020401001),
 ('killed', 0.6871206164360046),
 ('won', 0.6662579774856567),
 ('abandoned', 0.6619119048118591),
 ('closing', 0.6464139223098755),
 ('moved', 0.6406350135803223),
 ('reopened', 0.6268107891082764),
 ('closed_lost', 0.6187739968299866),
 ('low_probability', 0.6092942953109741),
 ('closed', 0.6073518395423889)]

In [24]: gn_model.most_similar('lost')
Out[24]:
[(u'losing', 0.7544215321540833),
 (u'lose', 0.7136349081993103),
 (u'regained', 0.618366003036499),
 (u'loses', 0.6115548610687256),
 (u'loosing', 0.576453447341919),
 (u'gained', 0.5561528205871582),
 (u'dropped', 0.5492223501205444),
 (u'loss', 0.5399519205093384),
 (u'won', 0.5263957977294922),
 (u'regain', 0.5241336822509766)]
In [8]: model.most_similar('pushed')
Out[8]:
[('moved', 0.8117796778678894),
 ('pushing', 0.72132408618927),
 ('delayed', 0.7004601955413818),
 ('stalled', 0.6817235946655273),
 ('indefinitely', 0.6797506809234619),
 ('until', 0.6696473360061646),
 ('shelved', 0.6633578538894653),
 ('slowed_down', 0.6619900465011597),
 ('might_slip', 0.6591036915779114),
 ('gone', 0.6582096815109253)]

In [9]: gn_model.most_similar('pushed')
Out[9]:
[(u'pushing', 0.762706458568573),
 (u'push', 0.695708692073822),
 (u'nudged', 0.6802582144737244),
 (u'shoved', 0.6162334084510803),
 (u'bumped', 0.6148176789283752),
 (u'pushes', 0.610393762588501),
 (u'dragged', 0.5916476845741272),
 (u'pulled', 0.5719939470291138),
 (u'moved', 0.5660783052444458),
 (u'inched', 0.5563575029373169)]

In [49]: model.most_similar('sdr')
Out[49]:
[('mktg', 0.6193182468414307),
 ('lead_gen', 0.5637482404708862),
 ('ppl', 0.5618690252304077),
 ('lss', 0.5492127537727356),
 ('reps', 0.5445878505706787),
 ('cold_calling', 0.5426461696624756),
 ('mkt', 0.5422939658164978),
 ('marketo', 0.5341131687164307),
 ('team', 0.532421886920929),
 ('guru', 0.5259524583816528)]

In [50]: gn_model.most_similar('sdr')

KeyError: "word 'sdr' not in vocabulary"
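
The `most_similar` calls above look like the gensim word2vec API, which ranks vocabulary words by cosine similarity to the query word's vector. The pure-Python sketch below reproduces that ranking, including the `KeyError` for an out-of-vocabulary word. The 3-dimensional vectors are made up for illustration and are not taken from either model.

```python
import math

# Toy vectors, invented for this example; a real model would hold
# learned 300-dimensional embeddings.
vectors = {
    "lost": [0.9, 0.1, 0.0],
    "disqualified": [0.8, 0.3, 0.1],
    "pushed": [0.1, 0.9, 0.2],
    "delayed": [0.2, 0.8, 0.3],
}

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, topn=10):
    """Return up to topn (word, similarity) pairs, most similar first."""
    if word not in vectors:
        raise KeyError(f"word '{word}' not in vocabulary")
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

print(most_similar("lost", topn=2))
print(most_similar("pushed", topn=2))
```

A word that never appears in the training text has no vector at all, which is why the GoogleNews model cannot answer the 'sdr' query while the model trained on opportunity text can.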
We’re hiring
data {scientists, engineers}
andy.twigg@insidesales.com

Applying data science to sales pipelines — for fun and profit

  • 1. Applying data science to sales pipelines – for fun and profit. Andy Twigg, Chief Scientist
  • 2. WHY APPLY DATA SCIENCE TO SALES?
      Problem: sales teams are biased
      • Unrealistic targets – “you must have 3x coverage”
      • Happy ears – “they said they’ll definitely buy it”
      • Sandbagging – reps want to look like heroes, so they don’t report deals until late in the quarter
      We should be able to remove these biases
      • Stat: since 1995, CRM data has increased ~150x, but forecast accuracy has fallen by 10%
      → data is available, but not helping
  • 3. PROBLEMS
      Opportunity scoring
      • Pr(win)?
      • Pr(win in quarter)?
      • How does this compare to sales team commits?
      • Which deals can we influence most?
      Forecasting
      • How much will be won this quarter?
  • 4. SALES OPPORTUNITIES
      • Opportunities are temporal, either open or closed. Once closed, either won or lost
      • Usually proceed through stages, except:
          • Stages are a partial order: deals can skip or revisit stages
          • An opportunity can be entered as closed (no open observations)
      • As the opportunity evolves, we get more and more data about it
      • Sales teams mark an opportunity ‘committed’ when they predict a win within the quarter
      • A pipeline is a set of open opportunities
      • We want to estimate Pr(final outcome = won), Pr(closed before time t), …
      [Figure: example opportunity timeline: Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: Negotiation → Outcome: Closed/won. Timeline labels: open, closed, committed]
  • 5.
  • 6.
  • 7. • Sales team: good precision (~70-80%) but poor recall (~10-40%)
      • Model won precision ~ sales team won precision
      • Model won recall ~ 3 x sales team won recall

                      First observation          Last observation
                      precision  recall  F1      precision  recall  F1
      model           0.65       0.86    0.74    0.75       0.93    0.83
      sales team      0.70       0.07    0.13    0.87       0.45    0.59
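The F1 columns in the table follow from the precision and recall columns as their harmonic mean. A minimal sketch that recomputes them from the slide's figures (the function and dictionary names are illustrative, not from the talk):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the slide's table.
reported = {
    ("model", "first"): (0.65, 0.86),
    ("model", "last"): (0.75, 0.93),
    ("sales team", "first"): (0.70, 0.07),
    ("sales team", "last"): (0.87, 0.45),
}

for (who, when), (p, r) in reported.items():
    print(f"{who:10s} {when:5s} observation: F1 = {f1_score(p, r):.2f}")
```

Because F1 is a harmonic mean, the sales team's very low recall at the first observation pulls its F1 down sharply even though its precision is comparable to the model's.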
  • 8.
  • 9. ANATOMY OF AN OPPTY
  • 10. ANATOMY OF AN OPPTY
      [Figure annotations: Pushed out; Pulled back in; Final outcome: won; Committed here (by the sales rep)]
  • 11. ANATOMY OF AN OPPTY
      [Figure annotations: Pushed out; Pulled back in; Final outcome: won; Committed here (by the sales rep); Predicted won from the start; Predicted won in the correct quarter]
  • 12. SALES OPPORTUNITIES
      [Figure: opportunity timeline: Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: Negotiation → Outcome: Closed/won. Each point on the timeline is a state x0, …, xt; the target is y = 1]
  • 13. SALES OPPORTUNITIES
      [Figure: opportunity timeline: Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: Negotiation → Outcome: Closed/won. States x0, …, xt; target y = 1]
      • Sequence of observations x0, x1, …
      • associated with a fixed target y ∈ {0,1}
      • Consider states as an MDP: state xt encodes temporal features about previous states (cf. RFM features)
          • # times this stage was previously visited, time between successive visits, time in current stage, direction of amount change, …
  • 14. SALES OPPORTUNITIES
      [Figure: opportunity timeline: Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: Negotiation → Outcome: Closed/won. States x0, …, xt; target y = 1]
      • Sequence of observations x0, x1, …
      • associated with a fixed target y ∈ {0,1}
      • Consider states as an MDP: state xt encodes temporal features about previous states (cf. RFM features)
          • # times this stage was previously visited, time between successive visits, time in current stage, direction of amount change, …
      • States also contain
          • Sales-specific features, e.g. momentum
          • External data, e.g. firmographic
          • Global features, e.g. avg_sales_cycle(target)
      • Gives examples {(x0,y),(x1,y),…} for each opportunity
      • Shuffle to break correlations between successive examples
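The construction of training examples described above can be sketched as follows. This is an illustration under stated assumptions, not the talk's pipeline: the `(day, stage, amount)` record layout, the three feature names, and the helper names are all hypothetical, chosen only to mirror the temporal features listed on the slide.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical observation record: (day, stage, amount).
Observation = Tuple[int, str, float]


def state_features(history: List[Observation]) -> Dict[str, float]:
    """Encode the history up to and including the latest observation as x_t."""
    day, stage, amount = history[-1]
    # Find where the current uninterrupted run in this stage began.
    run_start = len(history) - 1
    while run_start > 0 and history[run_start - 1][1] == stage:
        run_start -= 1
    before_run = history[:run_start]
    # Count earlier runs in the same stage (a run starts when the stage changes into it).
    prior_visits = sum(
        1
        for i, obs in enumerate(before_run)
        if obs[1] == stage and (i == 0 or before_run[i - 1][1] != stage)
    )
    prev_amount = history[-2][2] if len(history) > 1 else amount
    return {
        "prior_visits_to_stage": prior_visits,
        "days_in_current_stage": day - history[run_start][0],
        # +1 if the amount went up, -1 if it went down, 0 if unchanged.
        "amount_direction": (amount > prev_amount) - (amount < prev_amount),
    }


def make_examples(opportunities, seed: int = 0):
    """Emit one (x_t, y) pair per observation, then shuffle across opportunities."""
    examples = []
    for observations, outcome in opportunities:
        for t in range(len(observations)):
            examples.append((state_features(observations[: t + 1]), outcome))
    random.Random(seed).shuffle(examples)
    return examples


won_deal = [
    (0, "qualifying", 1000.0),
    (5, "validate", 1000.0),
    (9, "qualifying", 1500.0),   # stage revisited
    (12, "qualifying", 1500.0),
    (20, "negotiation", 1200.0),
]
lost_deal = [(0, "qualifying", 500.0), (30, "qualifying", 500.0)]
examples = make_examples([(won_deal, 1), (lost_deal, 0)])
```

Every prefix of an opportunity's history yields one example carrying the opportunity's final label, so a deal with five observations contributes five rows.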
  • 15. DURATION MODEL
      • Win/loss model
          • Pr(win)
          • independent of time horizon
          • RF/GBDT
      • Duration model
          • Pr(win within quarter)
          • Poisson regression: assume that in current state xt there is a fixed probability of closing each day
          • Train a model to predict expected duration d, conditioned on outcome = win
          • Integrating the corresponding exponential distribution gives Pr(close < t) (interarrival times)
          • Pr(win < t) = Pr(win) Pr(close < t | win)
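The last two bullets can be made concrete. Assuming the time to close, given a win, is exponential with mean d (the predicted expected duration), its CDF is Pr(close < t | win) = 1 - exp(-t/d). The sketch below combines that with a win probability; the function names and the example inputs are hypothetical, and both `p_win` and `expected_days_to_close` are assumed to come from upstream models like those named on the slide.

```python
import math


def prob_close_within(days_left: float, expected_days_to_close: float) -> float:
    """CDF of an exponential distribution whose mean is expected_days_to_close."""
    if expected_days_to_close <= 0:
        raise ValueError("expected_days_to_close must be positive")
    return 1.0 - math.exp(-days_left / expected_days_to_close)


def prob_win_within(p_win: float, days_left: float, expected_days_to_close: float) -> float:
    """Pr(win < t) = Pr(win) * Pr(close < t | win)."""
    return p_win * prob_close_within(days_left, expected_days_to_close)


# Hypothetical deal: 60% chance of winning, expected 30 days to close if won,
# 45 days left in the quarter.
print(round(prob_win_within(0.6, 45, 30), 3))
```

Keeping the two factors separate means the win/loss model stays independent of the time horizon, and only the duration term changes as the quarter end approaches.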
  • 16. FORECASTING: BOTTOM-UP
      Bottom-up: predict current quarter based on currently open pipeline
      + Considers quality of deals in pipeline
      - Ignores trends, deals not in pipeline
      [Figure: example pipeline of three deals: $157,000 at 77%, $200,000 at 37%, $82,000 at 86%; forecast $265,410]
      Obvious solution: expected amount in pipeline with respect to Pr(win in quarter) scores
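The slide's example figure can be reproduced directly: the bottom-up forecast is the sum of each deal's amount weighted by its Pr(win in quarter) score. A minimal sketch using the three deals shown (the variable and function names are illustrative):

```python
# (amount in dollars, Pr(win in quarter)) for each open deal, from the slide's figure.
pipeline = [(157_000, 0.77), (200_000, 0.37), (82_000, 0.86)]


def expected_pipeline_amount(deals):
    """Expected amount won this quarter: sum of amount * Pr(win in quarter)."""
    return sum(amount * p_win for amount, p_win in deals)


forecast = expected_pipeline_amount(pipeline)
print(f"${forecast:,.0f}")  # $265,410
```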
  • 17. FORECASTING: TOP-DOWN
      Top-down: predict current quarter based on previous quarters
      + Accounts for seasonality and trending
      - Ignores state of current pipeline
      [Figure: Decomposition of additive time series; panels: observed, trend, seasonal, random; x-axis: Time, 2013.0 to 2014.4]
      Typical decomposition of a revenue time series into 3 components:
      • Trend component
      • Seasonal component
      • Random component
      Idea: try to reduce the random component by taking into account the current pipeline
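For readers who want to see how such a three-way split can be computed, here is a sketch of a classical additive decomposition: a centered moving average estimates the trend, per-position averages of the detrended series estimate the seasonal pattern, and whatever is left is the random component. This is an assumed textbook procedure, not necessarily the one behind the slide's figure, and the quarterly series is synthetic.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split series into (trend, seasonal, random); ends of trend/random are None."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Centered average for an even period: half weight on both end points.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        else:
            trend[t] = mean(window)
    detrended = [None if trend[t] is None else series[t] - trend[t] for t in range(n)]
    # Average the detrended values at each position within the period.
    raw = [mean(d for d in detrended[k::period] if d is not None) for k in range(period)]
    offset = mean(raw)  # centre the seasonal pattern so it sums to zero
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(n)]
    random_part = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t] for t in range(n)
    ]
    return trend, seasonal, random_part


# Synthetic quarterly revenue: linear trend plus a repeating seasonal pattern.
season = [5.0, -3.0, -4.0, 2.0]
revenue = [100.0 + 10.0 * t + season[t % 4] for t in range(12)]
trend, seasonal, random_part = decompose_additive(revenue, period=4)
```

On this noise-free series the random component is essentially zero; on real revenue data it is the part the slide proposes to shrink using information from the current pipeline.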
  • 18. ‘HYBRID’ FORECASTING
      Top-down + bottom-up
      • Idea: augment ARIMA model with side information from bottom-up model
      • Allows model to adjust coefficients in response to bottom-up features (representing current pipeline) while retaining ARIMA features
          • Amount predicted to close in current quarter
          • Average score of currently open opportunities
          • Average predicted days to close
          • Historic adjusted coverage ratios
      • Sometimes known as ARIMAX [1]
      [1] robjhyndman.com/hyndsight/arimax
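To illustrate the idea of mixing autoregressive terms with bottom-up side information, the sketch below fits a deliberately simplified stand-in: an AR(1) model with a single exogenous pipeline feature, estimated by ordinary least squares. It is not a full ARIMA/ARIMAX implementation (no differencing, no moving-average terms), and the data, coefficient values, and function names are all synthetic or hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ar1_with_exog(revenue, pipeline_feature):
    """Least-squares fit of revenue[t] = c + phi * revenue[t-1] + beta * pipeline_feature[t]."""
    rows = [[1.0, revenue[t - 1], pipeline_feature[t]] for t in range(1, len(revenue))]
    targets = revenue[1:]
    k = 3
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def forecast_next(coeffs, last_revenue, pipeline_feature_now):
    c, phi, beta = coeffs
    return c + phi * last_revenue + beta * pipeline_feature_now


# Synthetic, noise-free data generated from known coefficients.
true_c, true_phi, true_beta = 10.0, 0.5, 0.8
pipeline = [20.0, 35.0, 15.0, 40.0, 25.0, 30.0, 45.0, 10.0]  # e.g. amount predicted to close
revenue = [50.0]
for t in range(1, len(pipeline)):
    revenue.append(true_c + true_phi * revenue[-1] + true_beta * pipeline[t])

coeffs = fit_ar1_with_exog(revenue, pipeline)
next_quarter = forecast_next(coeffs, revenue[-1], pipeline_feature_now=33.0)
```

The exogenous coefficient `beta` is what lets the forecast react to the current pipeline, while `phi` retains the dependence on past quarters. In practice one would likely use a time-series library's ARIMA implementation with exogenous regressors rather than hand-rolled least squares.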
  • 19. WORD VECTORS
      • Train word2vec model on text fields on opportunities
          • description, status, risks, …
          • “deal pushed out because no budget this quarter”
      • ~200m words
      • Gives 300-dimensional ‘neural’ word embeddings
      • Compare to GoogleNews model
      • Learned some sales-specific concepts

      In [23]: model.most_similar('lost')
      Out[23]:
      [('disqualified', 0.7105633020401001),
       ('killed', 0.6871206164360046),
       ('won', 0.6662579774856567),
       ('abandoned', 0.6619119048118591),
       ('closing', 0.6464139223098755),
       ('moved', 0.6406350135803223),
       ('reopened', 0.6268107891082764),
       ('closed_lost', 0.6187739968299866),
       ('low_probability', 0.6092942953109741),
       ('closed', 0.6073518395423889)]

      In [24]: gn_model.most_similar('lost')
      Out[24]:
      [(u'losing', 0.7544215321540833),
       (u'lose', 0.7136349081993103),
       (u'regained', 0.618366003036499),
       (u'loses', 0.6115548610687256),
       (u'loosing', 0.576453447341919),
       (u'gained', 0.5561528205871582),
       (u'dropped', 0.5492223501205444),
       (u'loss', 0.5399519205093384),
       (u'won', 0.5263957977294922),
       (u'regain', 0.5241336822509766)]
  • 20. WORD VECTORS

      In [8]: model.most_similar('pushed')
      Out[8]:
      [('moved', 0.8117796778678894),
       ('pushing', 0.72132408618927),
       ('delayed', 0.7004601955413818),
       ('stalled', 0.6817235946655273),
       ('indefinitely', 0.6797506809234619),
       ('until', 0.6696473360061646),
       ('shelved', 0.6633578538894653),
       ('slowed_down', 0.6619900465011597),
       ('might_slip', 0.6591036915779114),
       ('gone', 0.6582096815109253)]

      In [9]: gn_model.most_similar('pushed')
      Out[9]:
      [(u'pushing', 0.762706458568573),
       (u'push', 0.695708692073822),
       (u'nudged', 0.6802582144737244),
       (u'shoved', 0.6162334084510803),
       (u'bumped', 0.6148176789283752),
       (u'pushes', 0.610393762588501),
       (u'dragged', 0.5916476845741272),
       (u'pulled', 0.5719939470291138),
       (u'moved', 0.5660783052444458),
       (u'inched', 0.5563575029373169)]

      In [49]: model.most_similar('sdr')
      Out[49]:
      [('mktg', 0.6193182468414307),
       ('lead_gen', 0.5637482404708862),
       ('ppl', 0.5618690252304077),
       ('lss', 0.5492127537727356),
       ('reps', 0.5445878505706787),
       ('cold_calling', 0.5426461696624756),
       ('mkt', 0.5422939658164978),
       ('marketo', 0.5341131687164307),
       ('team', 0.532421886920929),
       ('guru', 0.5259524583816528)]

      In [50]: gn_model.most_similar('sdr')
      KeyError: "word 'sdr' not in vocabulary"
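The scores in these listings are similarities between word embeddings; a nearest-neighbour query of this kind typically ranks the vocabulary by cosine similarity to the query word's vector. The sketch below shows that ranking on a tiny hand-made vocabulary. The 3-dimensional vectors are invented for illustration and are not taken from the talk's 300-dimensional model; the function here is a stand-in, not the library's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(vectors, word, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`, best first."""
    if word not in vectors:
        # Mirrors the out-of-vocabulary failure shown for 'sdr' above.
        raise KeyError(f"word '{word}' not in vocabulary")
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Invented toy embeddings (hypothetical values, for illustration only).
toy_vectors = {
    "pushed": [1.0, 0.2, 0.0],
    "delayed": [0.9, 0.3, 0.1],
    "stalled": [0.8, 0.1, 0.3],
    "won": [0.0, 1.0, 0.2],
}

for neighbour, score in most_similar(toy_vectors, "pushed"):
    print(f"{neighbour:8s} {score:.3f}")
```

A word missing from the training text has no vector at all, which is why the general-news model fails on sales jargon such as 'sdr' while the model trained on opportunity text returns neighbours.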
  • 21. We’re hiring! data {scientists, engineers}. andy.twigg@insidesales.com