Portfolio-Scale Data Science At Zynga
Ben Weber
February 4, 2020
About Me
Distinguished Data Scientist at Zynga
@bgweber on Twitter, Medium & Github
Applied Science at Zynga
Goal: Build Portfolio-Scale Data Products
Example Projects
• Propensity Models
• Recommendation Systems
• Player Segmentation
• Anomaly Detection
Tech Stack: AWS, Python, Databricks, CouchBase
Zynga Games
Our Challenge
• We have tens of millions of players and dozens of games across multiple platforms
• Our games have diverse event taxonomies
• We want to build accurate models for personalizing our gameplay experiences
“One of the holy grails of machine learning is to automate more and more of the feature engineering process.”
Pedro Domingos, CACM 2012
Our Approach
• Leverage ML libraries to automate feature engineering
• Develop Portfolio-Scale data products
• Empower our game studios with ML models
Use Cases
Applications
Propensity Models: What actions are players performing?
Segmentation: Who are our players?
Anomaly Detection: Which players are bad actors?
Recommendation: What actions should they take?
Feature Encoding
Input Dataset
• Thousands of events per player
Feature Generation
• Aggregation with FeatureTools
Output Dataset
• A single row per player
[Diagram: Raw Event Data → Player Summaries]
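The encoding step described above (many events per player in, a single row per player out) can be illustrated with a minimal sketch in plain Python. The event names, the tuple layout, and the `summarize` helper are hypothetical and chosen only for illustration; the deck itself uses FeatureTools for this aggregation.

```python
from collections import defaultdict

# Hypothetical raw events: (player_id, event_type, value). Illustrative only.
events = [
    ("p1", "level_complete", 1.0),
    ("p1", "purchase", 4.99),
    ("p1", "level_complete", 1.0),
    ("p2", "purchase", 0.99),
    ("p2", "session_start", 1.0),
]

def summarize(events):
    """Collapse many event rows into one feature dict per player."""
    summaries = defaultdict(lambda: defaultdict(float))
    for player_id, event_type, value in events:
        row = summaries[player_id]
        row[f"count_{event_type}"] += 1
        row[f"sum_{event_type}"] += value
    # Give every player the same columns so the output is a rectangular table.
    columns = sorted({col for row in summaries.values() for col in row})
    return {
        pid: {col: row.get(col, 0.0) for col in columns}
        for pid, row in summaries.items()
    }

player_summaries = summarize(events)
```

Each key of `player_summaries` is one player and each value is that player's single wide feature row, with zeros for event types the player never triggered.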
Propensity Models
• We predict which users are likely to act using classification models
• Game studios use propensity scores to define experiment groups
• Feature generation reduces the need for manual feature engineering
[Pipeline diagram: Data Extract, Feature Engineering, Feature Application, Model Training, Model Publish]
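As a rough illustration of the propensity idea, the sketch below fits a logistic regression classifier with plain gradient descent in NumPy and uses the resulting scores to pick an experiment group. The synthetic data, the learning rate, and the top-20% cutoff are all assumptions made for this example; the deck names MLlib as the modeling library, which is not used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for generated player features (rows = players).
X = rng.normal(size=(400, 3))
# Hypothetical label: did the player act? Driven mostly by feature 0.
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(float)

def add_bias(X):
    """Append a constant column so the model can learn an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, y, lr=0.5, steps=500):
    """Plain gradient descent on the mean logistic loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        grad = Xb.T @ (p - y) / len(y)
        w -= lr * grad
    return w

def propensity(X, w):
    """Predicted probability that each player performs the action."""
    return 1.0 / (1.0 + np.exp(-(add_bias(X) @ w)))

w = fit_logistic(X, y)
scores = propensity(X, w)

# One possible use of the scores: target the top 20% in an experiment.
threshold = np.quantile(scores, 0.8)
experiment_group = scores >= threshold
```

Because the scores are probabilities, a studio could just as well define several groups by score band; the single cutoff here is only the simplest case.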
Segmentation
• Generated features are used as input to k-means clustering
• Archetype labels are assigned based on qualitative analysis
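A minimal NumPy sketch of the clustering step follows. It implements Lloyd's k-means with a simple farthest-point initialization; the initialization, the synthetic two-group data, and the archetype names are assumptions for illustration and are not taken from the deck.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic start: first row, then repeatedly the row farthest from chosen centroids."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, iters=50):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    centroids = farthest_point_init(X, k)
    for _ in range(iters):
        # Distance from every player to every centroid, shape (n_players, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
# Synthetic player features: two well-separated groups (illustrative only).
group_a = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[6.0, 6.0], scale=0.3, size=(50, 2))
X = np.vstack([group_a, group_b])

labels, centroids = kmeans(X, k=2)

# Archetype names come from an analyst inspecting each cluster; these are hypothetical.
archetypes = {0: "casual", 1: "engaged"}
player_archetypes = [archetypes[int(label)] for label in labels]
```

The clustering only produces integer labels; the mapping to named archetypes is the manual, qualitative step the slide refers to.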
Anomaly Detection
• Players are represented as 1D images
• We train an autoencoder to reduce dimensionality
• Players with large vector differences are flagged as suspect
[Autoencoder diagram: Input Vectors (Players × Features) → Input Layer → Latent Space → Output Layer → Output Vectors (Players × Features)]
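The flagging logic can be sketched without a deep learning framework by swapping in a linear stand-in for the autoencoder: projecting onto the top principal directions (PCA via SVD) and measuring how far each player's reconstruction is from the input. This is not a trained neural network and is not the deck's model. The synthetic data, the injected outlier, and the threshold rule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic players: 200 rows whose 6 features lie near a 2-dimensional subspace.
mixing = np.array([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]])
latent = rng.normal(size=(200, 2))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))
# Inject one hypothetical bad actor whose feature pattern breaks that structure.
X[0] = np.array([4.0, -4.0, -4.0, 4.0, 0.0, 0.0])

Xc = X - X.mean(axis=0)

# Linear stand-in for an autoencoder: the top-k right singular vectors act as
# tied encoder/decoder weights (this is PCA, not a trained network).
k = 2
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

def encode(Z):
    """Features -> latent space."""
    return Z @ Vt[:k].T

def decode(H):
    """Latent space -> reconstructed features."""
    return H @ Vt[:k]

reconstruction = decode(encode(Xc))
# Per-player difference between input and output vectors.
errors = np.linalg.norm(Xc - reconstruction, axis=1)

# Flag players whose error is far above typical (the threshold rule is an assumption).
threshold = np.median(errors) + 5 * errors.std()
suspects = np.flatnonzero(errors > threshold)
```

A nonlinear autoencoder would replace `encode` and `decode` with learned layers, but the reconstruction-error comparison at the end would keep the same shape.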
Recommendation Systems
• Feature engineering is used for item & guild recommendations
• Cosine similarity is applied to normalized generated features
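Item Recommendations
$$\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$
$$\mathrm{weight}_i = \sum_{w} \mathrm{sim}(u, w) \cdot \mathrm{rating}(w, i), \quad w \in \text{user neighborhood}$$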
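The two formulas for item recommendations can be sketched directly in Python. The feature vectors, neighbor names, item names, and ratings below are hypothetical, and the helper names `cosine_sim` and `item_weights` are introduced only for this example.

```python
import math

def cosine_sim(u, v):
    """sim(u, v) = (u . v) / (||u|| * ||v||); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def item_weights(user, neighbors, ratings):
    """weight_i = sum over neighbors w of sim(user, w) * rating(w, i)."""
    weights = {}
    for name, vector in neighbors.items():
        s = cosine_sim(user, vector)
        for item, rating in ratings.get(name, {}).items():
            weights[item] = weights.get(item, 0.0) + s * rating
    return weights

# Hypothetical feature vectors and ratings, for illustration only.
user = [1.0, 0.0]
neighbors = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [1.0, 1.0]}
ratings = {
    "w1": {"sword": 5.0},
    "w2": {"shield": 4.0},
    "w3": {"sword": 2.0, "shield": 2.0},
}

weights = item_weights(user, neighbors, ratings)
best_item = max(weights, key=weights.get)
```

Neighbors that point in the same direction as the user contribute their ratings in full, while orthogonal neighbors (here `w2`) contribute nothing.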
Feature Engineering
FeatureTools
• A Python library for deep feature synthesis
• Represents data as entity sets
• Identifies feature descriptors for transforming your data into a shallow and wide format
• Open-source version maintained by FeatureLabs
Kaggle NHL Dataset
Data Frames
game_df
plays_df
Entity Sets
• Define the tables and relationships for DFS
• Operate on Pandas data frames
1-Hot Encoding
Deep Feature Synthesis
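To make the 1-hot encoding and synthesis steps concrete, here is a hand-written Pandas sketch of the kind of features these steps produce for a parent table (games) from a child table (plays). It does not call FeatureTools; the miniature frames, their column names, and the chosen aggregations are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames; column names are assumed.
game_df = pd.DataFrame({"game_id": [1, 2]})
plays_df = pd.DataFrame({
    "play_id": [10, 11, 12, 13],
    "game_id": [1, 1, 1, 2],
    "event": ["Shot", "Goal", "Shot", "Hit"],
    "period_time": [30, 95, 400, 12],
})

# 1-hot encode the categorical event column: one indicator column per event type.
encoded = pd.get_dummies(plays_df, columns=["event"], dtype=int)

# Aggregate child rows (plays) up to the parent (games): a manual version of
# the COUNT/MEAN/SUM style features that deep feature synthesis generates.
agg = encoded.groupby("game_id").agg(
    plays_count=("play_id", "count"),
    period_time_mean=("period_time", "mean"),
    event_Goal_sum=("event_Goal", "sum"),
    event_Shot_sum=("event_Shot", "sum"),
    event_Hit_sum=("event_Hit", "sum"),
).reset_index()

# One wide row per game, ready for modeling.
features = game_df.merge(agg, on="game_id", how="left")
```

The point of a library such as FeatureTools is that the aggregation list above does not have to be written by hand for each table and game.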
Applying FeatureTools
• We translate our raw tracking events into player summaries
• Supports dozens of games with diverse taxonomies
• Minimizes manual steps in our data science workflows
• Scales to millions of players and billions of records
Deployment
Tech Stack
• Databricks for PySpark
• FeatureTools for generation
• Pandas UDFs for distribution
• MLlib for predictive modeling
Pandas UDFs
• Introduced in Spark 2.3
• Provide Scalar and Grouped map operations
• Partitioned using a groupby clause
• Enable distributing code that uses Pandas
Grouped Map UDFs
[Diagram: Spark Input → multiple Pandas Input partitions → UDF → Pandas Output per partition → Spark Output]
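The grouped map pattern can be sketched as a function that takes one group as a Pandas data frame and returns a Pandas data frame. The column names, the `partition_id` grouping key, and both function names are hypothetical. The Spark wiring uses `applyInPandas`, which is the Spark 3.0+ form of the grouped map operation (Spark 2.3 exposed it through `pandas_udf` with a grouped map type), and it is wrapped in a function so that the per-group logic can be checked locally without a cluster.

```python
import pandas as pd

def summarize_partition(pdf: pd.DataFrame) -> pd.DataFrame:
    """Runs on one group of rows as an ordinary Pandas data frame."""
    out = pdf.groupby("user_id", as_index=False).agg(
        event_count=("value", "count"),
        value_sum=("value", "sum"),
    )
    return out.astype(
        {"user_id": "int64", "event_count": "int64", "value_sum": "float64"}
    )

def run_on_spark(spark_df):
    """Distribute summarize_partition as a grouped map operation (Spark 3.0+ API)."""
    schema = "user_id long, event_count long, value_sum double"
    return spark_df.groupby("partition_id").applyInPandas(
        summarize_partition, schema=schema
    )

# Local check of the per-group logic, mimicking what each worker would do.
events = pd.DataFrame({
    "partition_id": [0, 0, 1, 1, 1],
    "user_id": [1, 1, 2, 3, 3],
    "value": [1.0, 2.0, 5.0, 1.5, 2.5],
})
local_result = pd.concat(
    [summarize_partition(group) for _, group in events.groupby("partition_id")],
    ignore_index=True,
)
```

Keeping the per-group function free of Spark imports means the same code can be unit tested on a laptop and then handed to the cluster unchanged.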
Feature Generation at Scale
AutoModel System
• Generates hundreds of propensity models
• Powers features in our games & live services
[Pipeline diagram: Data Extract, Feature Engineering, Feature Application, Model Training, Model Publish]
Wrapping Up
Data Science at Zynga
Old Approach
• Custom data science and engineering work per model
• Months-long development cycles
• Ad-hoc process for deploying models to production
New Approach
• Minimal effort spent on the feature engineering stage
• No custom work for new games
• Model outputs are published to application databases
Takeaways
• Zynga is leveraging automated feature engineering to build Portfolio-Scale data products
• We are using PySpark to scale to tens of millions of players
• Feature generation has unlocked novel data products
Looking Forward
• Reinforcement Learning
• Real-time personalization
• Procedural Content Generation
Portfolio-Scale Data Science At Zynga
Ben Weber
Distinguished Data Scientist
bweber@zynga.com
https://www.zynga.com/jobs/

Impact AI 2020: Portfolio-Scale Data Science at Zynga