How do you find the best predictor of good LTV in a game (a feature or an event)? What is needed for good feature selection? How should the data be constructed and aggregated? How can the results be used in practice? A strategic and operational approach.
Big Problem, BigQuery: User Feature Engineering in Event-driven Analytics
1. Big Problem, BigQuery: User Feature Engineering in Event-driven Analytics
GameCamp - 15/11/19
Mikalai Tsytsarau
GCP Professional Data Engineer, DELVE
2. User Feature Engineering in Event-driven Analytics
Introduction
All user and app actions generate a stream of events that can be stored and analysed.
3. User Feature Engineering in Event-driven Analytics
Introduction
In-app events arrive in an ordered sequence and can be analyzed for causality patterns, e.g. using funnels (Event Analytics).
Events of the same kind can also be analyzed as a collection with statistical methods (Feature Analytics).
4. User Feature Engineering in Event-driven Analytics
Introduction
Funnels
Event Analytics
● Usually analyses events with funnels: what fraction of users who completed Event 1 also completed Event 2, and so on.
● The funnel percentage at each step can be read as an event probability.
● The problem here is: which of the preceding events actually drive this probability?
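The funnel calculation described above can be sketched in plain Python. This is only an illustration, assuming events are available as time-ordered (user_id, event_name) pairs; the `funnel_rates` helper and the event names are hypothetical and not taken from the talk.

```python
from typing import Iterable, List, Tuple


def funnel_rates(events: Iterable[Tuple[str, str]], steps: List[str]) -> List[float]:
    """For each funnel step, return the fraction of users from the previous
    step who also reached this step, in order. The first step is measured
    against all users seen in the stream."""
    progress = {}  # user_id -> number of funnel steps completed so far
    for user_id, name in events:
        done = progress.setdefault(user_id, 0)
        if done < len(steps) and name == steps[done]:
            progress[user_id] = done + 1

    rates = []
    previous = len(progress)  # all users seen in the stream
    for i in range(len(steps)):
        reached = sum(1 for done in progress.values() if done > i)
        rates.append(reached / previous if previous else 0.0)
        previous = reached
    return rates


# Hypothetical, time-ordered sample events.
sample_events = [
    ("u1", "tutorial_done"), ("u1", "level_1_done"), ("u1", "purchase"),
    ("u2", "tutorial_done"), ("u2", "level_1_done"),
    ("u3", "tutorial_done"),
    ("u4", "session_start"),
]
print(funnel_rates(sample_events, ["tutorial_done", "level_1_done", "purchase"]))
```

Each returned value is the conditional share of users moving from one step to the next, which is the "event probability" reading of a funnel.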
5. User Feature Engineering in Event-driven Analytics
Introduction
Feature Analytics
● Usually analyses events with aggregations:
○ What is the distribution of users completing an event?
○ What are the average event parameters?
● The feature distribution at each event can be seen as a Bayesian probability.
● But which features are good?
● Yet another problem here is the massive retrieval, aggregation and analysis of event data.
Figure: Occurrences by Event Name, split by LTV
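An aggregation of this kind, event occurrences split by LTV, might look like the following sketch. It assumes events are (user_id, event_name) pairs and that a per-user LTV value is already known; the helper name, the threshold and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def occurrences_by_ltv(events, ltv_by_user, threshold):
    """Average occurrences of each event per user, computed separately for
    users with LTV at or above the threshold ("high") and below it ("low")."""
    group_of = {
        user: ("high" if ltv >= threshold else "low")
        for user, ltv in ltv_by_user.items()
    }
    group_sizes = Counter(group_of.values())
    totals = defaultdict(Counter)  # event name -> group -> occurrence count
    for user_id, name in events:
        group = group_of.get(user_id)
        if group is not None:  # skip users whose LTV is unknown
            totals[name][group] += 1
    return {
        name: {g: by_group[g] / group_sizes[g] for g in ("high", "low") if group_sizes[g]}
        for name, by_group in totals.items()
    }


# Hypothetical sample: one paying user and two non-paying users.
ltv = {"u1": 12.0, "u2": 0.0, "u3": 0.0}
events = [
    ("u1", "round_played"), ("u1", "round_played"), ("u1", "round_played"),
    ("u1", "purchase"),
    ("u2", "round_played"), ("u3", "round_played"),
]
print(occurrences_by_ltv(events, ltv, threshold=1.0))
```

Events whose averages differ strongly between the two groups are candidates for features worth engineering further.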
6. User Feature Engineering in Event-driven Analytics
Introduction
Feature Engineering benefits
● Analysts are in control of features
● Domain knowledge is used to engineer meaningful features
● Facilitates understanding of users
● Features can be used for regular app analytics, like segmentation
● Queries on features are simpler than queries on raw events
● Considerably smaller data size
Feature Engineering challenges
● Designing good features
● Massive retrieval, aggregation and analysis of event data
● Event params and data differ between events
● Event params and features often evolve over time
7. User Feature Engineering in Event-driven Analytics
Introduction
A typical use case of event-driven analytics is found in Firebase.
Firebase is an app development platform backed by Google that provides a database, analytics, messaging and everything else needed in one seamless package.
AutoML
AutoML Tables lets you automatically build and deploy powerful machine learning models based on feature vectors.
Firebase can also generate predictions and segment users based on the event stream (event occurrences).
8. User Feature Engineering in Event-driven Analytics
Introduction
Firebase can export complete event data in its original format to BigQuery daily, where it can be processed and analysed on a massive scale.
BigQuery
BigQuery is Google’s enterprise analytical data warehouse, which can run blazing-fast SQL queries on gigabytes to petabytes of data.
AutoML
AutoML Tables lets you automatically build and deploy powerful machine learning models using BigQuery data with the convenience of a SQL query. It trains on flat table data.
Firebase + BigQuery + AutoML
Sounds like a plan? :)
9. User Feature Engineering in Event-driven Analytics
Introduction
Bingo Blast
Case Study
10. User Feature Engineering in Event-driven Analytics
Introduction
LTV Sample Pipeline
A platform to integrate external data sources, orchestrate pipelines and activate various GCP services through an easy-to-use interface.
11. User Feature Engineering in Event-driven Analytics
Feature Engineering
Bingo Blast Firebase dataset
Query which extracts event count
12. User Feature Engineering in Event-driven Analytics
Feature Engineering
Single UNNEST( ) statement | Multiple UNNEST( ) statements
Source: Todd Kerpelman
13. User Feature Engineering in Event-driven Analytics
Feature Engineering
Solution scenario:
● Unpack all user properties and event properties from repeated records into serialized JSON
● Collect and store all events and associated profiles in the same denormalized row structure
This row structure allows you to:
● Query event data for a user and analyze features on demand
● Stream user events and construct features on a continuous basis
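The unpacking step can be illustrated outside BigQuery with a short Python sketch. It assumes each event is a dict shaped like the Firebase export, where `event_params` and `user_properties` are lists of `{key, value}` records and `value` holds one populated typed field; the helper names and the output column names ending in `_json` are hypothetical.

```python
import json

# Typed value fields used by the repeated key/value records.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def unpack_params(params):
    """Collapse a repeated list of {key, value{...}} records into a flat dict,
    keeping whichever typed value field is populated."""
    flat = {}
    for item in params:
        value = item.get("value") or {}
        for field in VALUE_FIELDS:
            if value.get(field) is not None:
                flat[item["key"]] = value[field]
                break
    return flat


def denormalize(event):
    """Return one flat row per event, with params serialized as JSON strings."""
    return {
        "user_pseudo_id": event["user_pseudo_id"],
        "event_name": event["event_name"],
        "event_timestamp": event["event_timestamp"],
        "event_params_json": json.dumps(
            unpack_params(event.get("event_params", [])), sort_keys=True),
        "user_properties_json": json.dumps(
            unpack_params(event.get("user_properties", [])), sort_keys=True),
    }


# Hypothetical sample event.
sample_event = {
    "user_pseudo_id": "abc123",
    "event_name": "level_complete",
    "event_timestamp": 1573800000000000,
    "event_params": [
        {"key": "level", "value": {"int_value": 3}},
        {"key": "result", "value": {"string_value": "win"}},
    ],
    "user_properties": [{"key": "country", "value": {"string_value": "DE"}}],
}
print(denormalize(sample_event))
```

Because every event ends up with the same columns regardless of which parameters it carries, rows of different event types can be stored and queried together.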
14. User Feature Engineering in Event-driven Analytics
Feature Engineering
BigQuery can handle huge feature aggregation queries, as long as they have efficient joins
15. User Feature Engineering in Event-driven Analytics
Model Training Tips & Bits
AutoML training data must meet the following requirements:
● Has 1000 to 100,000,000 rows
● Has between 1 and 1000 features
● At least 50 rows for each class
● Usually, 10k-100k rows of data are enough
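These limits are easy to check before submitting a table for training. The sketch below is a hypothetical pre-flight check in plain Python, assuming the training rows are dicts and the target is a class label; the limits are the ones listed above, and the function name and sample data are illustrative only.

```python
from collections import Counter


def check_training_data(rows, target, min_rows=1000, max_rows=100_000_000,
                        max_features=1000, min_rows_per_class=50):
    """Return a list of violated requirements (an empty list means all pass)."""
    problems = []
    n_rows = len(rows)
    if not min_rows <= n_rows <= max_rows:
        problems.append(f"row count {n_rows} is outside [{min_rows}, {max_rows}]")

    # Every column except the target counts as a feature.
    features = set()
    for row in rows:
        features.update(row.keys())
    features.discard(target)
    if not 1 <= len(features) <= max_features:
        problems.append(f"feature count {len(features)} is outside [1, {max_features}]")

    # The per-class minimum only makes sense for classification targets.
    class_counts = Counter(row.get(target) for row in rows)
    for label, count in class_counts.items():
        if count < min_rows_per_class:
            problems.append(f"class {label!r} has only {count} rows")
    return problems


# Hypothetical sample: 1200 users, only 40 of them payers.
rows = [{"sessions": i % 7, "payer": i < 40} for i in range(1200)]
print(check_training_data(rows, target="payer"))
```

With rare classes such as payers, the per-class minimum is usually the first requirement to fail, so it is worth checking before exporting a training table.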
Tips for improving prediction:
● Use as many features as you have
● Gradually remove unused features
● Avoid features dependent on target
● Use feature-specific data types
● Include aggregated “context” data
● Avoid missing values if possible
● Use null values for empty data
● Curate categorical feature values
● Try including timestamp & weight columns
16. User Feature Engineering in Event-driven Analytics
Model Evaluation Tips & Bits
● AutoML provides extensive feedback for model evaluation
● A trained model can be deployed immediately for batch and online prediction using SQL and the API
Important tips:
Use appropriate quality metrics:
● AUC or F1 for classification problems
● RMSE for regression problems
Take note of the decision boundary:
● Lower it if you want to include more potential buyers for in-app offers (higher recall)
● Raise it if you need more precision for UA (user acquisition)
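The effect of moving the decision boundary can be shown with a small Python sketch. It assumes the model outputs a payer probability per user and that a user is classified as a payer when the score is at or above the threshold; the scores, labels and thresholds below are made up for illustration.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when users with score >= threshold are predicted
    to be payers. labels are 1 for actual payers and 0 otherwise."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores and true payer labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.7, 0.35):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

In this sample, the lower threshold catches every payer at the cost of some false positives, while the higher threshold misses one payer but flags no non-payers.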
17. User Feature Engineering in Event-driven Analytics
Feature Selection
Feature importance shows how much each feature affects the resulting prediction:
● Facilitates better understanding of users
● Can be used to construct more effective audience segmentation
It’s important to test various groups of features:
● User profile & Geo
● Event data
● Game status features
18. User Feature Engineering in Event-driven Analytics
Some Observations and Open Questions
Observations
● The training dataset should include samples from similar user traffic
● The model must be updated in sync with game mechanics
● In-app offers can interfere with LTV prediction!
● Training and prediction features should be uniform with regard to the prediction target
● It’s beneficial to include “meta-event” features, like event frequencies and delays
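As an illustration of such "meta-event" features, the sketch below derives an event count, an event frequency and a mean delay from one user's event timestamps. It assumes timestamps are given in seconds; the function and feature names are hypothetical.

```python
from statistics import mean


def meta_event_features(timestamps):
    """Given one user's event timestamps in seconds, return the event count,
    events per hour over the observed span, and mean delay between events."""
    ts = sorted(timestamps)
    delays = [later - earlier for earlier, later in zip(ts, ts[1:])]
    span = ts[-1] - ts[0] if len(ts) > 1 else 0
    return {
        "event_count": len(ts),
        "events_per_hour": len(ts) / (span / 3600) if span else 0.0,
        "mean_delay_s": mean(delays) if delays else 0.0,
    }


# Hypothetical sample: four events within one hour.
print(meta_event_features([0, 600, 1800, 3600]))
```

Features like these describe how a user plays rather than what they did, and they can be computed per event type as well as across all events.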
Questions
● User profile and demographics are usually addressed by ad campaigns: should we leave them to manual optimisation?
● Predict LTV or Payer / Non-payer?
● Tradeoff between data completeness and prediction delay
● Individual or cohort prediction?
● Predict LTV or offers for a user?