Feature store: Solving anti-patterns in ML-systems

About Synerise
Synerise is a European data company that collects,
interprets and leverages online and offline data with
the use of AI to power 1:1 Customer Engagement.
Our technology helps to power brands in all major
B2C verticals including retail, consumer banking,
telecommunications, public and automotive.

AI: a powerful engine of growth
o Customer Engagement
o Empower Employee Innovation
o Cost Optimization
o Product Transformation

Challenges to address
1. Combine available datasets for each customer
2. Perform regression, scoring, ranking, segmentation, anomaly detection, …
3. Do all of that in real-time
4. Support non-stationary, evolving data distributions
5. Support evolving feature spaces
6. Support incremental improvement when new data sources become available
7. Achieve performance on par with or better than dedicated single use-case models
8. Low latency, high throughput!
9. Data safety - all data can be obfuscated via hashing, quantization etc.
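
The obfuscation mentioned in the last point can be sketched roughly as follows. This is only an illustration under assumptions: the key, the field names and the bucket width are hypothetical, and a keyed hash (HMAC-SHA-256) is one possible choice of hashing, not necessarily the one used in the system described here.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secrets manager.
SECRET_KEY = b"example-secret"


def obfuscate_id(raw_id: str) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(SECRET_KEY, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


def quantize(value: float, step: float) -> float:
    """Snap a value down to the lower edge of its bucket of width `step`."""
    return (value // step) * step


# Hypothetical raw record with a direct identifier and an exact amount.
record = {"email": "jane@example.com", "basket_value": 137.42}

safe_record = {
    "customer_key": obfuscate_id(record["email"]),
    "basket_value_bucket": quantize(record["basket_value"], step=50.0),
}
```

The same input always maps to the same key, so datasets can still be joined per customer, while the exact amount is replaced by a coarse bucket.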

Reality of ML system
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison

"…a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code"
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison

ML Systems Anti-Patterns
1. Glue code
2. Pipeline jungles
3. Dead experimental code paths
4. Reproducibility debt & inconsistency between training and serving
5. Multi-model systems
6. Data processing doesn't scale
7. Real-time features require engineers
8. Lack of data testing
9. Lack of feature discovery
10. Lack of standardization
11. Multi-language issue

"Data is the hardest part of ML and the most important piece to get right.
Modelers spend most of their time selecting and transforming features at training time
and then building the pipelines to deliver those features to production models."
Source: Scaling Machine Learning at Uber with Michelangelo, Jeremy Hermann and Mike Del Balso

Machine Learning & Data science are in the same place
where software engineering was 20 years ago...

First-class entity
Machine learning and data science are about data, but often data is not a first-class entity
in such systems.
So:
1. Let's make data a first-class entity, as code is for software engineering
2. Let's make features a first-class entity, as functions/modules are for software engineering
3. Let's think about models as compiled software libraries
First-class entity
Let people be creative and do awesome work by freeing them from the routine, boring,
but necessary tasks:
o data access & ingestion
o data processing & cleaning
o feature engineering & management
o data modeling & building processing pipelines

First-class entity
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison

Feature store
Feature store is:
o a place to store unified, versioned, tested and documented features
o an interface between data engineering and model development
o an interface for feature discovery and analysis
[Diagram: Raw/Structured Data -> Feature Engineering -> Feature store -> Training & Serving -> Models]
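
The "versioned, documented, discoverable" part of this definition can be sketched as a small in-memory catalog. All names here (`FeatureMeta`, `FeatureCatalog`, the example features) are hypothetical; a real feature store would keep this metadata in a database.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeatureMeta:
    name: str
    version: int
    description: str
    tags: Tuple[str, ...] = ()


class FeatureCatalog:
    """Hypothetical metadata catalog: register features, then discover them."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], FeatureMeta] = {}

    def register(self, meta: FeatureMeta) -> None:
        key = (meta.name, meta.version)
        if key in self._entries:
            raise ValueError(f"{meta.name} v{meta.version} is already registered")
        self._entries[key] = meta

    def latest(self, name: str) -> FeatureMeta:
        versions = [m for (n, _), m in self._entries.items() if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions, key=lambda m: m.version)

    def search(self, keyword: str) -> List[FeatureMeta]:
        # Match on name or description substring, or on an exact tag.
        kw = keyword.lower()
        hits = [
            m for m in self._entries.values()
            if kw in m.name.lower() or kw in m.description.lower() or kw in m.tags
        ]
        return sorted(hits, key=lambda m: (m.name, m.version))


catalog = FeatureCatalog()
catalog.register(FeatureMeta("avg_basket_value_30d", 1,
                             "Average order value, last 30 days", ("orders",)))
catalog.register(FeatureMeta("avg_basket_value_30d", 2,
                             "Average order value, last 30 days, refunds excluded",
                             ("orders",)))
catalog.register(FeatureMeta("sessions_7d", 1,
                             "Number of web sessions, last 7 days", ("web",)))

found = [(m.name, m.version) for m in catalog.search("order")]
```

Here `found` lists both versions of the basket feature, and `catalog.latest("avg_basket_value_30d")` returns version 2.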

Feature store
[Diagram, without a feature store: Data sets 1-3 pass through separate feature engineering steps (Feature engineering 1-3) to feed Models 1-3]

Feature store
[Diagram, with a feature store: Data sets 1-3 feed a single Feature Store, which serves Models 1-3]

Feature store gives:
1. Feature versioning
2. Feature trust - can be tested
3. Feature consistency
4. Feature discovery and reuse
5. Feature documentation and analytics
6. Standardized access to features between training and serving - also reproducibility of results
7. Automatic backfilling of features - avoid expensive recomputations
8. Features can be access controlled
9. Production model results can be features for other models
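
Standardized access between training and serving means both paths read features through the same function, so they cannot drift apart. A minimal sketch with hypothetical data, feature names and a toy linear scorer (none of these come from the talk):

```python
from typing import Dict, List, Sequence

# Hypothetical in-memory storage: entity id -> feature name -> value.
FEATURES: Dict[str, Dict[str, float]] = {
    "customer_1": {"sessions_7d": 5.0, "avg_basket_value_30d": 75.0},
    "customer_2": {"sessions_7d": 1.0, "avg_basket_value_30d": 20.0},
}

# The model declares which features it needs, in a fixed order.
MODEL_FEATURES = ("sessions_7d", "avg_basket_value_30d")


def get_feature_vector(entity_id: str, names: Sequence[str]) -> List[float]:
    """Single access path shared by training and serving."""
    row = FEATURES[entity_id]
    return [row[name] for name in names]


def build_training_matrix(entity_ids: Sequence[str]) -> List[List[float]]:
    # Training path: one row per entity.
    return [get_feature_vector(e, MODEL_FEATURES) for e in entity_ids]


def serve(entity_id: str, weights: Sequence[float]) -> float:
    # Serving path: same lookup, followed by a toy linear score.
    x = get_feature_vector(entity_id, MODEL_FEATURES)
    return sum(w * v for w, v in zip(weights, x))


train_x = build_training_matrix(["customer_1", "customer_2"])
score = serve("customer_1", weights=(0.1, 0.01))
```

The row used to train on `customer_1` is exactly the vector the serving path scores, which is also what makes results reproducible.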

Feature store
[Chart: average cost of a new ML project vs. number of curated features in the feature store]
Source: The Feature Store in Hopsworks, Jim Dowling

Feature store architecture
o Source: Event Stream, Batch Data
o Create: Stream Transform, Batch Transform
o Ingest: Ingest
o Store: Feature Storage, Feature Metadata
o Access: Model API, Discovery API, used by Model Serving and Model Training
Feature store - storage:
1. ClickHouse:
o Scalable, column-oriented big data database
o Easy to use
o Handles large and sparse feature spaces
o ASOF join - joining sequences with a non-exact match
2. SSDB:
o Persistent high-performance key-value database
o Implements the Redis protocol
o Designed to store collection data
o Replication (master-slave), load balancing
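
The idea behind an ASOF join (match each row to the closest earlier row instead of an exactly equal key) can be illustrated in plain Python. In ClickHouse the join itself is written in SQL; the sketch below only mimics one common variant, "latest right-hand row at or before the left-hand timestamp", on made-up data.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def asof_join(
    left: List[Tuple[int, str]],
    right: List[Tuple[int, float]],
) -> List[Tuple[int, str, Optional[float]]]:
    """For each left (ts, label), attach the latest right value with ts_right <= ts."""
    right_sorted = sorted(right, key=lambda r: r[0])
    right_ts = [ts for ts, _ in right_sorted]
    joined = []
    for ts, label in left:
        # Index of the last right-hand timestamp that is <= ts, or -1 if none.
        idx = bisect_right(right_ts, ts) - 1
        match = right_sorted[idx][1] if idx >= 0 else None
        joined.append((ts, label, match))
    return joined


# Hypothetical data: events to label, and feature values recorded over time.
events = [(5, "click"), (12, "purchase"), (1, "visit")]
feature_values = [(2, 0.1), (10, 0.7), (11, 0.9)]

result = asof_join(events, feature_values)
```

Each event picks up the feature value that was valid at its own timestamp, and an event older than any recorded value gets `None`. Joining this way keeps training data from seeing feature values recorded after the event.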

Feature store architecture
[Diagram: the architecture above (Source, Create, Ingest, Store, Access), repeated with SSDB added as a component]

Feature store
Thanks to the feature store, we are able to:
o cut down new model development time
o cut down model training time
o easily test new ideas
In short:
focus on the interesting and creative parts of machine-learning-based systems.

Next steps and future work
o unify the streaming part
o implement feature analytics and monitoring
o improve feature documentation

Andrzej Michałowski
Head of AI Research and Development
andrzej.michalowski@synerise.com
Thank you
Questions?
