SlideShare uma empresa Scribd logo
1 de 27
Baixar para ler offline
Live predictions with
schemaless data at scale
Ondrej Brichta
CONFIDENTIAL
WE ARE HIRING
Our problem at Exponea
What we have:
● Terabytes of customer data
● Means to process all the data fast
● Customers with business understanding (or our consultants)
What we needed:
● Probability classificator
○ For any chosen event or attribute
○ Able to process any given data and return a reasonable output without human interaction
Our problem at Exponea
One specific problem:
● How to choose which is the best banner to show right now?
● Web page personalization?
Solution?
In session prediction
● Predict probability of buying for each customer live when they are online
What steps do we need to do and automate?
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
1. Business understanding
Key questions:
● What problem do we have?
● What are our use cases?
● Whom are we creating this model for?
1. Business understanding
Key questions:
● What problem do we have? - We want to optimize our campaigns, ...
● What are our use cases? - Should I show this banner?
○ Translate this use case into machine learning - predict whether customer will buy something
● Whom are we creating this model for? - NOT FOR US, our customers!
1. Business understanding
Key questions:
● What problem do we have? - We want to optimize our campaigns, ...
● What are our use cases? - Should I show this banner? ...send this email?
○ Translate this use case into machine learning - predict whether customer will buy something
● Whom are we creating this model for? - NOT FOR US, our customers!
Hard or impossible to automate, but!
We can delegate this to customers or consultants as easier problems.
● Create few templates to choose from
○ At least few easy to use and few advanced to modify the model
2. Data understanding
● Closely related to business understanding
● Most important part of machine learning!
● Key questions
○ What data do i have?
○ Is the data valuable?
○ What do I want to predict?
○ Should I train on all data or only subset? <- crucial
● Changes here can influence outcome by the largest amount
2. Data understanding
Automation?
Yes, but difficult...
2. Data understanding
● Have pre sets for each template
○ We have in session predictions just for this problem
● Experiment with variables (date ranges, data filters,...)
○ We have variables set to best values for all projects. Can be manually adjusted by user
● Have a model providing unselected variables
3. Data preparation
● Cleaning the data - have some common structure for easy selection
● Derive attributes - have general easy aggregates (count, first, last,...)
● Dataset balancing - easy rules to determine which technique to use
● Aggregates - have a model which selects which complex aggregates to use
● Reducing dimensionality - feature selection
● Mapping data to “clean” values - most frequent items, vector indexer
● Casting data to vectors - vector assembler
3. Data preparation
3. Data preparation
● Customers connect from different platforms
● Customers log in after few events
3. Data preparation
3. Data preparation
4. Modeling
4. Modeling
5. Evaluation
● How good is my model?
● ROC, F1, Lift curve
5. Evaluation
6. Deployment
How do you want to use it? Must it be fast? Online or Offline?
6. Deployment
For us, Fast and Online!
6. Deployment
6. Deployment
Other deployments:
● Fast and Offline -> pre computed results for each customer
● Slow and Online -> improved results with slower algorithms
● Slow and Offline -> you messed up. Rework your solution.
App overview
Ask me anything
ondrej.brichta@exponea.com
CONFIDENTIAL
WE ARE HIRING

Mais conteúdo relacionado

Semelhante a Live predictions with schemaless data at scale. MLMU Kosice, Exponea

Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makerszekeLabs Technologies
 
"What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual..."What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual...Dataconomy Media
 
A journey of ai driven analytics insights engine
A journey of  ai driven analytics insights engineA journey of  ai driven analytics insights engine
A journey of ai driven analytics insights engineSomya Anand
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemVMware Tanzu
 
Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)Neal Lathia
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016MLconf
 
Key Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PMKey Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PMProduct School
 
Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Albert Y. C. Chen
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or realityAwantik Das
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolLouis Cialdella
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMProduct School
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the EnterpriseSrinath Perera
 
How to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PMHow to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PMProduct School
 
Data Integration and Marketing Attribution
Data Integration and Marketing Attribution Data Integration and Marketing Attribution
Data Integration and Marketing Attribution ROIVENUE™
 
Panaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product ManagerPanaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product ManagerProduct School
 
How to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product ManagerHow to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product ManagerProduct School
 

Semelhante a Live predictions with schemaless data at scale. MLMU Kosice, Exponea (20)

Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
"What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual..."What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual...
 
A journey of ai driven analytics insights engine
A journey of  ai driven analytics insights engineA journey of  ai driven analytics insights engine
A journey of ai driven analytics insights engine
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation System
 
Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
 
Key Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PMKey Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PM
 
DS Life Cycle
DS Life CycleDS Life Cycle
DS Life Cycle
 
DS Life Cycle
DS Life CycleDS Life Cycle
DS Life Cycle
 
Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product School
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PM
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the Enterprise
 
How to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PMHow to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PM
 
Data Integration and Marketing Attribution
Data Integration and Marketing Attribution Data Integration and Marketing Attribution
Data Integration and Marketing Attribution
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 
3 types of monitoring for 2020
3 types of monitoring for 20203 types of monitoring for 2020
3 types of monitoring for 2020
 
Panaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product ManagerPanaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product Manager
 
How to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product ManagerHow to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product Manager
 

Último

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 

Último (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 

Live predictions with schemaless data at scale. MLMU Kosice, Exponea

  • 1. Live predictions with schemaless data at scale Ondrej Brichta
  • 3. Our problem at Exponea What we have: ● Terabytes of customer data ● Means to process all the data fast ● Customers with business understanding (or our consultants) What we needed: ● Probability classificator ○ For any chosen event or attribute ○ Able to process any given data and return a reasonable output without human interaction
  • 4. Our problem at Exponea One specific problem: ● How to choose which is the best banner to show right now? ● Web page personalization? Solution? In session prediction ● Predict probability of buying for each customer live when they are online
  • 5. What steps do we need to do and automate? 1. Business understanding 2. Data understanding 3. Data preparation 4. Modeling 5. Evaluation 6. Deployment
  • 6. 1. Business understanding Key questions: ● What problem do we have? ● What are our use cases? ● Whom are we creating this model for?
  • 7. 1. Business understanding Key questions: ● What problem do we have? - We want to optimize our campaigns, ... ● What are our use cases? - Should I show this banner? ○ Translate this use case into machine learning - predict whether customer will buy something ● Whom are we creating this model for? - NOT FOR US, our customers!
  • 8. 1. Business understanding Key questions: ● What problem do we have? - We want to optimize our campaigns, ... ● What are our use cases? - Should I show this banner? ...send this email? ○ Translate this use case into machine learning - predict whether customer will buy something ● Whom are we creating this model for? - NOT FOR US, our customers! Hard or impossible to automate, but! We can delegate this to customers or consultants as easier problems. ● Create few templates to choose from ○ At least few easy to use and few advanced to modify the model
  • 9. 2. Data understanding ● Closely related to business understanding ● Most important part of machine learning! ● Key questions ○ What data do i have? ○ Is the data valuable? ○ What do I want to predict? ○ Should I train on all data or only subset? <- crucial ● Changes here can influence outcome by the largest amount
  • 11. 2. Data understanding ● Have pre sets for each template ○ We have in session predictions just for this problem ● Experiment with variables (date ranges, data filters,...) ○ We have variables set to best values for all projects. Can be manually adjusted by user ● Have a model providing unselected variables
  • 12. 3. Data preparation ● Cleaning the data - have some common structure for easy selection ● Derive attributes - have general easy aggregates (count, first, last,...) ● Dataset balancing - easy rules to determine which technique to use ● Aggregates - have a model which selects which complex aggregates to use ● Reducing dimensionality - feature selection ● Mapping data to “clean” values - most frequent items, vector indexer ● Casting data to vectors - vector assembler
  • 15. ● Customers connect from different platforms ● Customers log in after few events 3. Data preparation
  • 19. 5. Evaluation ● How good is my model? ● ROC, F1, Lift curve
  • 21. 6. Deployment How do you want to use it? Must it be fast? Online or Offline?
  • 22. 6. Deployment For us, Fast and Online!
  • 24. 6. Deployment Other deployments: ● Fast and Offline -> pre computed results for each customer ● Slow and Online -> improved results with slower algorithms ● Slow and Offline -> you messed up. Rework your solution.