Big Problem, BigQuery: User Feature Engineering in Event-driven Analytics

GameCamp - 15/11/19
Mikalai Tsytsarau
GCP Professional Data Engineer, DELVE

Introduction
All user and app actions generate a stream of events that can be stored and analysed

In-app events arrive in an ordered sequence and can be analyzed for causality patterns, e.g. using funnels (Event Analytics).
Events of the same kind can also be analyzed as a collection with statistical methods (Feature Analytics).

Funnels
Event Analytics
● Usually analyses events with funnels: what fraction of users who completed Event 1 have also completed Event 2, and so on
● The funnel percentage at each step can be read as an event probability
● The problem here is: which of the preceding events actually drive this probability?
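As a rough illustration of reading funnel percentages as probabilities, the sketch below computes step-to-step conversion on a tiny hand-made event log. The event names, the `funnel_conversion` helper and the data are hypothetical and are not taken from the talk.

```python
from typing import Dict, List, Sequence


def funnel_conversion(events_by_user: Dict[str, Sequence[str]],
                      steps: Sequence[str]) -> List[float]:
    """For each funnel step after the first, return the fraction of users
    who reached the previous step and then also completed this one."""
    # reached[i] = number of users who completed steps[0..i] in order
    reached = [0] * len(steps)
    for events in events_by_user.values():
        pos = 0  # index of the next funnel step this user has to complete
        for name in events:
            if pos < len(steps) and name == steps[pos]:
                reached[pos] += 1
                pos += 1
    # Conversion from step i-1 to step i, an estimate of P(step i | step i-1)
    return [reached[i] / reached[i - 1] if reached[i - 1] else 0.0
            for i in range(1, len(steps))]


# Hypothetical event log: user id -> ordered list of event names
log = {
    "u1": ["session_start", "level_start", "level_end", "purchase"],
    "u2": ["session_start", "level_start"],
    "u3": ["session_start", "level_start", "level_end"],
    "u4": ["session_start"],
}
steps = ["session_start", "level_start", "level_end", "purchase"]
rates = funnel_conversion(log, steps)
print(rates)
```

Each value only says how often one step follows the previous one; it does not say which earlier events caused the conversion, which is the open problem named above.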

Feature Analytics
● Usually analyses events with aggregations:
○ What is the distribution of users completing an event?
○ What are the average event parameters?
● The feature distribution at each event can be seen as a Bayesian probability
● But which features are good?
● Yet another problem here is: massive retrieval, aggregation and analysis of event data
Figure: Occurrences by event name, split by LTV
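A minimal sketch of the aggregation idea, assuming events are available as `(user_id, event_name)` pairs: it turns the raw stream into one fixed-length count vector per user. The helper name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def event_count_features(events: Iterable[Tuple[str, str]],
                         event_names: Sequence[str]) -> Dict[str, List[int]]:
    """Aggregate an event stream into per-user occurrence counts,
    one column per entry of event_names."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user_id, name in events:
        counts[user_id][name] += 1
    # Counter returns 0 for events a user never triggered
    return {user: [c[name] for name in event_names]
            for user, c in counts.items()}


# Hypothetical event stream
stream = [
    ("u1", "level_start"),
    ("u1", "level_start"),
    ("u1", "purchase"),
    ("u2", "level_start"),
]
features = event_count_features(stream, ["level_start", "purchase"])
print(features)
```

On real data this aggregation would run in the warehouse rather than in memory; the point here is only the shape of the output, a flat row of features per user.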

Feature Engineering benefits
● Analysts are in control of features
● Domain knowledge is used to engineer meaningful features
● Facilitates understanding of users
● Features can be used for regular app analytics, like segmentation
● Simpler queries than on raw events
● Considerably smaller data size
Feature Engineering challenges
● Designing good features
● Massive retrieval, aggregation and analysis of event data
● Event parameters and data differ between events
● Event parameters and features often evolve over time

Firebase is a typical use case of event-driven analytics.
Firebase is a Google-backed app development platform that provides a database, analytics, messaging and other services in one package.
AutoML
AutoML Tables lets you automatically build and deploy machine learning models based on feature vectors.
Firebase can also generate predictions and segment users based on the event stream (event occurrences).

Firebase can export complete event data in its original format to BigQuery daily, where it can be processed and analysed at massive scale.
BigQuery
BigQuery is Google’s enterprise analytical data warehouse, which can run fast SQL queries on gigabytes to petabytes of data.
AutoML
AutoML Tables lets you automatically build and deploy machine learning models using BigQuery data, with the convenience of an SQL query. It trains on flat table data.
Firebase + BigQuery + AutoML
Sounds like a plan?

Bingo Blast
Case Study

LTV Sample Pipeline
A platform to integrate external data sources, orchestrate pipelines and activate various GCP services through an easy-to-use interface.

Feature Engineering
Bingo Blast Firebase dataset
Query which extracts event count

Single UNNEST() statement
Multiple UNNEST() statements
Source: Todd Kerpelman

Solution scenario:
● Unpack all user properties and event properties from repeated fields into serialized JSON
● Collect and store all events and associated profiles in the same denormalized row structure
The row structure allows you to:
● Query event data for a user and analyze features on demand
● Stream user events and construct features on a continuous basis
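The unpacking step can be sketched in plain Python, assuming each exported event carries its parameters as a repeated list of key/value records with typed value slots. The field names used here (`event_params`, `user_properties`, `string_value`, `int_value` and so on) reflect that assumed export layout, and the `denormalize` helper and the sample event are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumed typed slots of a key/value record; only one is populated per record
VALUE_SLOTS = ("string_value", "int_value", "float_value", "double_value")


def params_to_json(params: List[Dict[str, Any]]) -> str:
    """Collapse a repeated key/value list into one serialized JSON object."""
    flat = {}
    for p in params:
        value = p["value"]
        # Take the first populated slot, or None if all are empty
        flat[p["key"]] = next(
            (value[k] for k in VALUE_SLOTS if value.get(k) is not None), None)
    return json.dumps(flat, sort_keys=True)


def denormalize(event: Dict[str, Any]) -> Dict[str, Any]:
    """Produce one flat row per event, with params and profile as JSON strings."""
    return {
        "user_pseudo_id": event["user_pseudo_id"],
        "event_name": event["event_name"],
        "event_timestamp": event["event_timestamp"],
        "event_params_json": params_to_json(event.get("event_params", [])),
        "user_properties_json": params_to_json(event.get("user_properties", [])),
    }


# Hypothetical exported event
event = {
    "user_pseudo_id": "u1",
    "event_name": "level_end",
    "event_timestamp": 1573776000000000,
    "event_params": [
        {"key": "level", "value": {"int_value": 3}},
        {"key": "result", "value": {"string_value": "win"}},
    ],
    "user_properties": [
        {"key": "plan", "value": {"string_value": "free"}},
    ],
}
row = denormalize(event)
print(row["event_params_json"])
```

Because every event ends up with the same columns regardless of which parameters it carries, rows for different event types can share one table, and features can be extracted from the JSON on demand.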

BigQuery can handle huge feature aggregation queries, as long as they have efficient joins

Model Training Tips & Bits
AutoML training data must meet the following requirements:
● Has 1,000 to 100,000,000 rows
● Has between 1 and 1,000 features
● Has at least 50 rows for each class
● Usually, 10-100k rows of data are enough
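The requirements above can be checked before uploading a table. The sketch below is a hypothetical pre-flight check, not part of any AutoML API; the limits are the ones listed on the slide and are passed as defaults so they can be adjusted.

```python
from collections import Counter
from typing import List, Sequence


def check_training_table(labels: Sequence[str], n_features: int,
                         min_rows: int = 1_000,
                         max_rows: int = 100_000_000,
                         max_features: int = 1_000,
                         min_rows_per_class: int = 50) -> List[str]:
    """Return a list of human-readable problems; an empty list means
    the table satisfies the limits quoted on the slide."""
    problems = []
    n_rows = len(labels)
    if not min_rows <= n_rows <= max_rows:
        problems.append(f"{n_rows} rows (need {min_rows} to {max_rows})")
    if not 1 <= n_features <= max_features:
        problems.append(f"{n_features} features (need 1 to {max_features})")
    # Every class of the target column needs enough examples
    for cls, n in Counter(labels).items():
        if n < min_rows_per_class:
            problems.append(
                f"class {cls!r} has only {n} rows "
                f"(need at least {min_rows_per_class})")
    return problems


# Hypothetical payer / non-payer target column with too few payers
labels = ["payer"] * 40 + ["non_payer"] * 1960
problems = check_training_table(labels, n_features=25)
print(problems)
```

Rare classes such as payers are the usual reason a game dataset fails the per-class minimum, so it can help to run a check like this per training window.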
Tips for improving prediction:
● Use as many features as you have
● Gradually remove unused features
● Avoid features dependent on target
● Use feature-specific data types
● Include aggregated “context” data
● Avoid missing values if possible
● Use null values for empty data
● Curate categorical feature values
● Try including timestamp & weight columns

Model Evaluation Tips & Bits
● AutoML provides extensive feedback for model evaluation
● A trained model is ready to be deployed immediately for batch and online prediction using SQL and the API
Important tips:
Use appropriate quality metrics:
● AUC or F1 for classification problems
● RMSE for regression problems
Take note of the decision boundary (the score threshold):
● Lower it if you want to include more potential buyers for in-app offers (higher recall)
● Raise it if you need more precision for UA (user acquisition)
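The effect of moving the decision boundary can be illustrated with a small sketch that computes precision and recall at two thresholds. The scores and labels are made up for the example, and the helper is hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores and true payer labels (1 = payer)
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

strict = precision_recall(scores, labels, threshold=0.7)
loose = precision_recall(scores, labels, threshold=0.35)
print("strict:", strict)
print("loose:", loose)
```

On this toy data the stricter threshold selects only true payers but misses one of them, while the looser threshold catches every payer at the cost of one false positive. Which side to favour depends on whether a missed buyer or a wasted offer is more expensive.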

Feature Selection
Feature importance shows how much each feature affects the resulting prediction
● Facilitates better understanding of users
● Can be used to construct more effective audience segmentation
It’s important to test various groups of features:
● User profile & Geo
● Event data
● Game status features

Some Observations and Open Questions
Observations
● The training dataset should include samples from similar user traffic
● The model must be updated in sync with game mechanics
● In-app offers can interfere with LTV prediction!
● Training and prediction features should be uniform with regard to the prediction target
● It’s beneficial to include “meta-event” features, like event frequencies and delays
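One possible reading of “meta-event” features is sketched below: given the timestamps of one user's events, it derives an event frequency and the delays between consecutive events. The feature names and the choice of seconds as the unit are assumptions made for the example.

```python
from typing import Dict, Sequence


def meta_event_features(timestamps: Sequence[float]) -> Dict[str, float]:
    """Derive frequency and delay features from one user's event times
    (given in seconds; order does not matter)."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        # Not enough events to measure a delay or a rate
        return {"events_per_hour": 0.0, "mean_delay_s": 0.0, "max_delay_s": 0.0}
    delays = [later - earlier for earlier, later in zip(ts, ts[1:])]
    span_hours = (ts[-1] - ts[0]) / 3600
    return {
        "events_per_hour": len(ts) / span_hours if span_hours else 0.0,
        "mean_delay_s": sum(delays) / len(delays),
        "max_delay_s": float(max(delays)),
    }


# Hypothetical event times for one user, in seconds since session start
meta = meta_event_features([0, 600, 1800, 3600])
print(meta)
```

Features like these describe how a user plays rather than what they did, so they can be added alongside plain event counts.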
Questions
● User profile and demographics are usually addressed by ad campaigns: should we leave them for manual optimisation?
● Predict LTV or payer / non-payer?
● Data completeness vs. prediction delay trade-off
● Individual or cohort prediction?
● Predict LTV or offers for a user?