Building a Feature Store around
Dataframes and Apache Spark
Jim Dowling, CEO @ Logical Clocks AB
Fabio Buso, Head of Engineering @ Logical Clocks AB
When Data Engineers are asked to re-use other teams’ features*
*Hide-the-pain-Harold smiles and says ‘yes’, but inside he’s in a world of pain
Known Feature Stores in Production
▪ Logical Clocks – Hopsworks (world’s first/only fully open source)
▪ Uber Michelangelo
▪ Airbnb – Bighead/Zipline
▪ Comcast
▪ Twitter
▪ GO-JEK Feast
▪ Conde Nast
▪ Facebook FBLearner
▪ Netflix
▪ Reference: www.featurestore.org
Feature Store in Banking
▪ Problem: Manage TBs of transactions as ML features, and develop
models to reduce the cost of fraud.
▪ Solution:
Hopsworks provides the platform to train machine learning models that
classify transactions as suspected fraud or not. The fraud dataset
contains billions of records (40 TB), and the solution uses
Deep Learning (GPUs) to detect structural patterns in bank
transactions and temporal patterns based on how frequently bank
transactions are executed.
▪ Reference: Swedbank Talk at Spark/AI EU Summit 2019
Data Teams are moving from Analytics to ML
[Diagram (before): Raw Data, Event Data and SQL Data, Data Pipelines, a Data Lake, and BI Platforms.]
[Diagram (after): the same pipelines and Data Lake, plus Feature Pipelines feeding the Hopsworks Feature Store, which serves Model Training, Online Model Serving, and Analytical Model Scoring (Batch).]
Features are created/updated at different cadences
▪ Click features every 10 secs
▪ CDC data every 30 secs
▪ User profile updates every hour
▪ Featurized weblogs data every day
▪ User-entered features (<2 secs)
[Diagram: Event Data, real-time data, SQL DW, S3/HDFS and SQL sources feed the Feature Store. The Online Feature Store (<10ms) holds low-latency features for the Online App; the Offline Feature Store (TBs/PBs) holds high-latency features for training and batch apps. User-entered features (<2 secs) are shown arriving at the Online App.]
No existing database is both scalable (PBs) and low latency (<10ms). Hence, the Feature Store is split into an online and an offline Feature Store.
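To make the online/offline split concrete, here is a minimal in-memory sketch, assuming every ingestion writes to both stores: the offline store keeps the full, append-only history (for training and batch scoring), while the online store keeps only the latest row per primary key (for low-latency lookups). The class and method names (`DualFeatureStore`, `ingest`, `latest`, `history`) are hypothetical and are not the Hopsworks API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class DualFeatureStore:
    """Hypothetical toy model of a feature group with online + offline storage."""
    primary_key: str
    # Offline store: append-only history of (event_time, row), like a data lake.
    offline: List[Tuple[int, Row]] = field(default_factory=list)
    # Online store: only the newest row per primary key, like a key-value store.
    online: Dict[Any, Tuple[int, Row]] = field(default_factory=dict)

    def ingest(self, event_time: int, rows: List[Row]) -> None:
        for row in rows:
            self.offline.append((event_time, row))
            key = row[self.primary_key]
            current = self.online.get(key)
            # Keep the newest value; a late-arriving older update is history only.
            if current is None or event_time >= current[0]:
                self.online[key] = (event_time, row)

    def latest(self, key: Any) -> Row:
        """Low-latency path: point lookup by primary key."""
        return self.online[key][1]

    def history(self, key: Any) -> List[Row]:
        """High-latency path: full history, e.g. for building training data."""
        return [row for _, row in self.offline if row[self.primary_key] == key]


store = DualFeatureStore(primary_key="user_id")
store.ingest(10, [{"user_id": 1, "clicks_10s": 3}])
store.ingest(20, [{"user_id": 1, "clicks_10s": 5}])
print(store.latest(1))        # {'user_id': 1, 'clicks_10s': 5}
print(len(store.history(1)))  # 2
```

The point of the design is that each store answers a different question cheaply: "what is the value now?" versus "what were all the values?", which is why no single database has to serve both.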
FeatureGroup Ingestion in Hopsworks
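[Diagram: User Clicks, DB Updates, User Profile Updates and Weblogs arrive from Event Data, SQL DW, S3/HDFS and SQL sources and are ingested through the DataFrame API into ClickFeatureGroup, TableFeatureGroup, UserFeatureGroup and LogsFeatureGroup in the Feature Store. Real-time features flow from a Kafka input through Flink into RTFeatureGroup and out via a Kafka output. The Feature Store serves the Online App and Train/Batch Apps.]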
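The real-time path on this slide (Kafka input, Flink, RTFeatureGroup) computes features over short time windows. As a framework-free sketch, assuming events arrive as `(timestamp_secs, user_id)` tuples, a 10-second tumbling-window click count (the kind of "click features every 10 secs" mentioned earlier) could be computed as below. In Hopsworks this logic would run in a stream processor such as Flink; the function `tumbling_click_counts` is a hypothetical illustration of the windowing only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_click_counts(
    events: Iterable[Tuple[int, str]], window_secs: int = 10
) -> Dict[int, Dict[str, int]]:
    """Count clicks per user in non-overlapping windows of `window_secs`.

    Returns {window_start: {user_id: clicks}}, ordered by window start.
    """
    windows: Dict[int, Counter] = {}
    for ts, user_id in events:
        # Align each event to the start of its window, e.g. ts=17 -> 10.
        window_start = ts - (ts % window_secs)
        windows.setdefault(window_start, Counter())[user_id] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [(1, "alice"), (4, "bob"), (9, "alice"), (12, "alice"), (19, "bob")]
print(tumbling_click_counts(events))
# {0: {'alice': 2, 'bob': 1}, 10: {'alice': 1, 'bob': 1}}
```

A real streaming job would also handle late events and emit each window when it closes; this batch version only shows how events map to windows.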
No More End-to-End ML Pipelines!
[Diagram: Raw Data and Event Data, the Data Lake, a Feature Pipeline, the FEATURE STORE, TRAIN/VALIDATE, MODEL SERVING, and MONITOR.]
ML Pipelines start and stop at the Feature Store
Feature Store Concepts
▪ Features: name, Pclass, Sex, Survived, Balance
▪ Feature Groups: Titanic Passenger List (name, Pclass, Sex, Survived) and Passenger Bank Account (name, Balance), joined on the join key "name"
▪ Train/Test Datasets: Survived, name, Pclass, Sex, Balance
▪ File formats: .tfrecord, .npy, .csv, .hdf5, .petastorm, etc.
▪ Storage: GCS, Amazon S3, HopsFS
Features, FeatureGroups, and Train/Test Datasets are all versioned
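Because features, feature groups, and train/test datasets are all versioned, lookups are naturally keyed by `(name, version)`. A minimal sketch, assuming a registry that refuses to overwrite a published version and makes callers pin the version they read; the `VersionedRegistry` class is hypothetical, not a Hopsworks API.

```python
from typing import Any, Dict, Tuple


class VersionedRegistry:
    """Hypothetical registry: every artifact is addressed by (name, version)."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, int], Any] = {}

    def register(self, name: str, version: int, artifact: Any) -> None:
        key = (name, version)
        if key in self._items:
            # A published version is never overwritten; changes get a new version.
            raise ValueError(f"{name} v{version} already exists")
        self._items[key] = artifact

    def get(self, name: str, version: int) -> Any:
        # Callers pin a version explicitly; there is no implicit "latest".
        return self._items[(name, version)]

    def latest_version(self, name: str) -> int:
        versions = [v for (n, v) in self._items if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)


registry = VersionedRegistry()
registry.register("titanic_df", 1, ["name", "Pclass", "Sex", "Survived"])
registry.register("titanic_df", 2, ["name", "Pclass", "Sex"])
print(registry.latest_version("titanic_df"))  # 2
print(registry.get("titanic_df", 1))          # ['name', 'Pclass', 'Sex', 'Survived']
```

Pinning versions means a training job that read `titanic_df` v1 keeps getting the same schema even after v2 is published.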
Register a FeatureGroup with the Feature Store
from hops import featurestore as fs
df = ...  # a Spark or Pandas DataFrame
# Do feature engineering on 'df'
# Register the DataFrame as a FeatureGroup
fs.create_featuregroup(df, "titanic_df", online=True)
Hopsworks Feature Store
[Diagram: data is ingested from Raw Data, Structured Data, Events and the Data Lake into the Online Feature Store and the Offline Feature Store; features are used by Online Apps, by Batch Apps, and to Create Train/Test Data.]
Create Train/Test Datasets using the Feature Store
from hops import featurestore as fs
sample_data = fs.get_features(["name", "Pclass", "Sex",
                               "Balance", "Survived"])
fs.create_training_dataset(sample_data,
                           "titanic_training_dataset",
                           data_format="tfrecords",
                           training_dataset_version=1)
Online Feature Store
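[Diagram: three availability zones (US-West-1a, US-West-1b, US-West-1c), each hosting a MySQL NDB node (NDB1, NDB2, NDB3) and a Model replica. An Online Application (1) reads features from the Online Feature Store over JDBC and (2) calls the Model to predict. Latencies annotated on the slide: ~5-50ms and 2-20ms.]
1. Build a Feature Vector using the Online Feature Store
2. Send the Feature Vector to a Model for Prediction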
Good Decisions we took in Version 1
▪ General Purpose Data Frame API (DSL could be added later)
▪ Feature Store is a cache for materialized features, not a library.
▪ Online and Offline Feature Stores to support low latency and scale,
respectively
▪ Reuse of Features means JOINS – Spark as a join engine
Feature Store API v2
▪ Enforce feature-group scope and versioning (as best practice)
▪ Better support for multiple feature stores - join features from
development and production feature stores
▪ Better support for complex joins of features
▪ First class API support for time-travel
▪ More consistent developer experience
Connect and Support for Multiple Feature Stores
import hsfs
# Connect to the production feature store
conn = hsfs.connection(host="ea2.aws.hopsworks.ai",
project="prod")
prod_fs = conn.get_feature_store()
dev_fs = conn.get_feature_store("dev")
Feature Group Operations
# Create Feature group metadata
fg = dev_fs.create_feature_group("temperature",
                                 description="Temperature Features",
                                 version=1,
                                 online_enabled=True)
# Schema is inferred from the dataframe
fg.save(dataframe)
# Read the feature group as dataframe
df = fg.read()
# Append more data to the feature group
fg.insert(dataframe, overwrite=False)
Tags
fg = dev_fs.get_feature_group("temperature", version=1)
fg.add_tag("country", "SE")
fg.add_tags({"country": "SE", "year": 2020})
▪ Tags make feature groups, features and training datasets discoverable
▪ Tags are searchable from the Hopsworks UI
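Tag-based discovery boils down to filtering metadata by key/value pairs. A minimal sketch, assuming tags are stored as a plain dictionary per feature group; the `search_by_tags` helper and the sample catalog are hypothetical and say nothing about how the Hopsworks UI indexes tags.

```python
from typing import Dict, List


def search_by_tags(
    catalog: Dict[str, Dict[str, object]], **wanted: object
) -> List[str]:
    """Return names of entries whose tags contain all the `wanted` pairs."""
    return sorted(
        name
        for name, tags in catalog.items()
        if all(tags.get(key) == value for key, value in wanted.items())
    )


catalog = {
    "temperature_v1": {"country": "SE", "year": 2020},
    "rain_v1": {"country": "SE", "year": 2019},
    "crop_v1": {"country": "NO", "year": 2020},
}
print(search_by_tags(catalog, country="SE"))             # ['rain_v1', 'temperature_v1']
print(search_by_tags(catalog, country="SE", year=2020))  # ['temperature_v1']
```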
Schema Version Management
# "..." is a placeholder for the new feature's default value
fg.add_feature("new_feature", type="int", default_value=...)
Non-breaking schema changes (e.g. adding a feature) can be applied without
bumping the version.
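The slide gives adding a feature as the example of a non-breaking change. One way to check this mechanically, assuming a schema is a mapping from feature name to type and that only pure additions are non-breaking (treating removals and type changes as breaking is our assumption, not a documented Hopsworks rule). `needs_new_version` is a hypothetical helper.

```python
from typing import Dict

Schema = Dict[str, str]  # feature name -> type, e.g. {"avg": "double"}


def needs_new_version(old: Schema, new: Schema) -> bool:
    """True if `new` is not a pure extension of `old`."""
    for name, dtype in old.items():
        if name not in new:
            return True   # a feature was removed: breaking
        if new[name] != dtype:
            return True   # a feature changed type: breaking
    return False          # only additions (or no change): same version is fine


v1 = {"date": "date", "location": "string", "avg": "double"}
print(needs_new_version(v1, {**v1, "new_feature": "int"}))    # False
print(needs_new_version(v1, {"date": "date", "avg": "double"}))  # True
```

Additions are safe because existing readers simply ignore the extra column, while removals or type changes would break code written against the old schema.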
Exploratory Data Analysis
from pyspark.sql.functions import col
fg = dev_fs.get_feature_group("temperature", version=1)
# Returns a dataframe object
fg.read()
# Show a sample of 10 rows in the feature group
fg.show(10)
fg.select(["date", "location", "avg"]).show(10)
# Read selected features as a DataFrame, then filter with Spark
(fg.select(["date", "location", "avg"]).read()
   .filter(col("location") == "Stockholm")
   .show(10))
Joins - Pandas Style API
crop_fg = prod_fs.get_feature_group("crop", version=1)
temperature = dev_fs.get_feature_group("temperature", version=1)
rain = dev_fs.get_feature_group("rain", version=1)
# Parentheses let the method chain span several lines
joined_features = (crop_fg.select(["location", "yield"])
                   .join(temperature.select(["location", "season_avg"]))
                   .join(rain.select(["location", "avg_mm"]),
                         on=["location"],
                         join_type="left"))
dataframe = joined_features.read()
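To see what `join_type="left"` implies for the crop example, here is a plain-Python sketch of a left join on `location`, assuming each feature group has at most one row per location. Every row from the left side is kept, and features missing on the right come back as `None`. The `left_join` helper and the data values are made up for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Keep every left row; add right-side features where the key matches."""
    index = {row[on]: row for row in right}  # assumes `on` is unique in `right`
    right_cols = {col for row in right for col in row if col != on}
    joined = []
    for row in left:
        match = index.get(row[on], {})
        # Missing matches yield None, like NULLs in a SQL/Spark left join.
        extra = {col: match.get(col) for col in sorted(right_cols)}
        joined.append({**row, **extra})
    return joined


crop = [{"location": "Stockholm", "yield": 5.1}, {"location": "Uppsala", "yield": 4.7}]
rain = [{"location": "Stockholm", "avg_mm": 52.0}]
for row in left_join(crop, rain, on="location"):
    print(row)
# {'location': 'Stockholm', 'yield': 5.1, 'avg_mm': 52.0}
# {'location': 'Uppsala', 'yield': 4.7, 'avg_mm': None}
```

An inner join would instead drop the Uppsala row, silently shrinking the training data; the left join keeps it and makes the gap visible.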
Time-Travel
fs.get_feature_group("temperature", version=1,
                     wallclock_time=None,
                     wallclock_time_start=None,
                     wallclock_time_end=None)
▪ Explore what the feature group looked like at a given point in time in the past
▪ List value changes between two timestamps
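Both time-travel use cases on this slide, reading a feature group as of a point in time and listing value changes between two timestamps, can be sketched over a simple commit log. This assumes each commit is `(commit_time, key, value)` and that later commits overwrite earlier ones for the same key. It illustrates the intended semantics of `wallclock_time` and `wallclock_time_start`/`wallclock_time_end`, not how Hopsworks implements them; `as_of` and `changes_between` are hypothetical helpers.

```python
from typing import Any, Dict, List, Tuple

Commit = Tuple[int, str, Any]  # (commit_time, primary_key, value)


def as_of(log: List[Commit], wallclock_time: int) -> Dict[str, Any]:
    """State of the feature group as it looked at `wallclock_time`."""
    state: Dict[str, Any] = {}
    for commit_time, key, value in sorted(log, key=lambda c: c[0]):
        if commit_time > wallclock_time:
            break
        state[key] = value  # later commits overwrite earlier ones
    return state


def changes_between(log: List[Commit], start: int, end: int) -> List[Commit]:
    """Commits with start < commit_time <= end, in time order."""
    return sorted((c for c in log if start < c[0] <= end), key=lambda c: c[0])


log = [(1, "Stockholm", 14.0), (5, "Stockholm", 16.5), (7, "Uppsala", 12.0)]
print(as_of(log, 4))               # {'Stockholm': 14.0}
print(as_of(log, 10))              # {'Stockholm': 16.5, 'Uppsala': 12.0}
print(changes_between(log, 1, 7))  # [(5, 'Stockholm', 16.5), (7, 'Uppsala', 12.0)]
```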
Create Train/Test Data from Joined Features
connector = fs.get_storage_connector("s3connector", "S3")
td = fs.create_training_dataset(name='crop_model',
description='Dataset to train the crop model',
version=1,
data_format='tfrecords',
connector=connector,
splits={'train': 0.7,'test': 0.2,'validate': 0.1})
td.save(joined_features)
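The `splits` argument divides the rows into named partitions by fraction. A minimal sketch of that idea, assuming a seeded shuffle followed by contiguous slicing, with any rounding remainder going to the last split (the slide does not say which splitting strategy Hopsworks uses). `split_rows` is a hypothetical helper.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T], splits: Dict[str, float], seed: int = 42
) -> Dict[str, List[T]]:
    """Shuffle `rows` reproducibly, then cut them into fractional splits."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    names = list(splits)
    result: Dict[str, List[T]] = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last split takes the remainder, so rounding never drops rows.
            end = len(shuffled)
        else:
            end = min(len(shuffled), start + round(splits[name] * len(shuffled)))
        result[name] = shuffled[start:end]
        start = end
    return result


parts = split_rows(range(100), {"train": 0.7, "test": 0.2, "validate": 0.1})
print({name: len(rows) for name, rows in parts.items()})
# {'train': 70, 'test': 20, 'validate': 10}
```

Seeding the shuffle makes the split reproducible, which matters when a versioned training dataset has to be regenerated identically.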
Get feature vector for online serving
fs.get_feature_vector(training_dataset="crop",
                      id=[{
                          "location": "Stockholm",
                          "crop": "wheat"
                      }])
▪ Returns the feature vector from the online feature store
▪ Feature order is maintained
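The online serving path described in these slides (look up a feature vector by its primary key, keep a fixed feature order, then call the model) can be sketched with Python's built-in `sqlite3` standing in for the MySQL NDB online store, which the real system reaches over JDBC. The table name, the feature values, the feature order, and the `predict` function are hypothetical placeholders chosen for illustration.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical stand-in for the online feature store:
# one row per primary key (location, crop).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crop_features ("
    "location TEXT, crop TEXT, season_avg REAL, avg_mm REAL, "
    "PRIMARY KEY (location, crop))"
)
conn.execute(
    "INSERT INTO crop_features VALUES (?, ?, ?, ?)",
    ("Stockholm", "wheat", 15.2, 52.0),
)

# Feature order fixed at training time; the model expects this exact order.
FEATURE_ORDER: Sequence[str] = ("season_avg", "avg_mm")


def get_feature_vector(location: str, crop: str) -> List[float]:
    """Step 1: point lookup by primary key, returned in training order."""
    columns = ", ".join(FEATURE_ORDER)  # trusted constant, not user input
    row = conn.execute(
        f"SELECT {columns} FROM crop_features WHERE location = ? AND crop = ?",
        (location, crop),
    ).fetchone()
    if row is None:
        raise KeyError((location, crop))
    return list(row)


def predict(vector: List[float]) -> float:
    """Step 2: hypothetical model, here a fixed linear scoring function."""
    weights = (0.1, 0.01)
    return sum(w * x for w, x in zip(weights, vector))


vector = get_feature_vector("Stockholm", "wheat")
print(vector)  # [15.2, 52.0]
print(predict(vector))
```

Selecting columns by an explicit, fixed list is what keeps the feature order stable, so a vector built at serving time lines up with the columns the model saw during training.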
Demo
Using Hopsworks Feature Store
in Databricks
Thank You!
Get Started
hopsworks.ai
github.com/logicalclocks/hopsworks
Twitter
@logicalclocks
Web
www.logicalclocks.com
Feature Store contributions from colleagues
▪ Moritz Meister
▪ Kim Hammar
▪ Alex Ormenisan
▪ Robin Andersson
▪ Ermias Gebremeskel
▪ Theofilos Kakantousis
Thanks to the Logical Clocks Team!
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.

Source slides: Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx

  • 1.
  • 2. Building a Feature Store around Dataframes and Apache Spark Jim Dowling, CEO @ Logical Clocks AB Fabio Buso, Head of Engineering @ Logical Clocks AB
  • 3.
  • 4. When Data Engineers are asked to re-use other teams’ features* *Hide-the-pain-Harold smiles and says ‘yes’, but inside he’s in a world of pain
  • 5. Known Feature Stores in Production ▪ Logical Clocks – Hopsworks (world’s first/only fully open source) ▪ Uber Michelangelo ▪ Airbnb – Bighead/Zipline ▪ Comcast ▪ Twitter ▪ GO-JEK Feast ▪ Conde Nast ▪ Facebook FB Learner ▪ Netflix ▪ Reference: www.featurestore.org
  • 6. Feature Store in Banking ▪ Problem: Manage TBs of transactions as ML features, and develop models to reduce the cost of fraud. ▪ Solution: Hopsworks provides the platform to train machine learning models that classify transactions as suspected fraud or not. The fraud dataset contains billions of records (40 TB), and the solution uses deep learning (GPUs) to detect structural patterns in bank transactions and temporal patterns based on how frequently transactions are executed. ▪ Reference: Swedbank Talk at Spark/AI EU Summit 2019
  • 7. Data Teams are moving from Analytics to ML [Diagram: Event Data, Raw Data, Data Lake, Data Pipelines, BI Platforms, SQL Data]
  • 8. Data Teams are moving from Analytics to ML [Diagram: Event Data, Raw Data, Data Lake, Data Pipelines, BI Platforms, SQL Data, extended with Feature Pipelines, the Hopsworks Feature Store, Model Training, Online Model Serving, and Analytical Model Scoring (Batch)]
  • 9. Features are created/updated at different cadences ▪ Click features every 10 secs ▪ CDC data every 30 secs ▪ User profile updates every hour ▪ Featurized weblogs data every day ▪ User-entered features (<2 secs) [Diagram: Event Data, real-time data, SQL DW, SQL, and S3/HDFS sources feed the Feature Store, which has two parts: an Online Feature Store (<10ms) serving low-latency features to the Online App, and an Offline Feature Store (TBs/PBs) serving high-latency features to Train and Batch Apps] No existing database is both scalable (PBs) and low latency (<10ms). Hence, online + offline Feature Stores.
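The split between an online store (latest values, millisecond lookups) and an offline store (full history, TBs/PBs) can be illustrated with a toy in-memory model. This is a minimal sketch, not how Hopsworks implements it: the class `ToyDualFeatureStore` and its methods are hypothetical, dictionaries stand in for the online database, and a Python list stands in for the offline store.

```python
class ToyDualFeatureStore:
    """Hypothetical in-memory model of an online + offline feature store."""

    def __init__(self):
        # Online store: only the latest feature values per primary key.
        self.online = {}
        # Offline store: every row ever written, kept for training and batch scoring.
        self.offline = []

    def write(self, key, timestamp, features):
        # Every write is appended to the offline history.
        self.offline.append({"key": key, "ts": timestamp, **features})
        # The online entry is replaced only if this row is at least as new
        # as the stored one, so late-arriving data does not overwrite fresher values.
        current = self.online.get(key)
        if current is None or timestamp >= current["ts"]:
            self.online[key] = {"ts": timestamp, **features}

    def get_online(self, key):
        # Low-latency path: return the latest features (without the timestamp).
        row = self.online.get(key)
        if row is None:
            return None
        return {name: value for name, value in row.items() if name != "ts"}

    def history(self, key):
        # High-latency path: the full, time-ordered history for training.
        return sorted((r for r in self.offline if r["key"] == key), key=lambda r: r["ts"])


store = ToyDualFeatureStore()
store.write("user_1", 10, {"clicks": 3})
store.write("user_1", 20, {"clicks": 5})
store.write("user_1", 15, {"clicks": 4})  # arrives late
print(store.get_online("user_1"))
```

The late write at timestamp 15 lands in the offline history but does not replace the fresher online value, which is the behaviour an online store serving a live app usually needs.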
  • 10. FeatureGroup Ingestion in Hopsworks [Diagram: User Clicks, DB Updates, User Profile Updates, and Weblogs (from Event Data, SQL DW, SQL, and S3/HDFS sources) are ingested through the DataFrame API into ClickFeatureGroup, TableFeatureGroup, UserFeatureGroup, and LogsFeatureGroup; real-time features flow from Kafka Input through Flink into RTFeatureGroup, with a Kafka Output. The Feature Store serves an Online App and Train/Batch Apps.]
  • 11. No More End-to-End ML Pipelines!
  • 12. ML Pipelines start and stop at the Feature Store [Diagram: Raw Data, Event Data → Data Lake → Feature Pipeline → Feature Store → Train/Validate → Model Serving → Monitor]
  • 13. Feature Store Concepts ▪ Features: name, Pclass, Sex, Survive, Balance ▪ Feature Groups: Titanic Passenger List (name, Pclass, Sex, Survive) and Passenger Bank Account (name, Balance), joined on a join key ▪ Train/Test Datasets: Survive, name, PClass, Sex, Balance ▪ File formats: .tfrecord, .npy, .csv, .hdf5, .petastorm, etc. ▪ Storage: GCS, Amazon S3, HopsFS ▪ Features, FeatureGroups, and Train/Test Datasets are all versioned
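The concept slide shows two feature groups (passenger list and bank account) combined through a join key into one train/test dataset. A minimal pure-Python sketch of that idea follows; it assumes `name` is the join key, that keys are unique in the right-hand group, and uses made-up rows. The helpers `join_feature_groups` and `select` are hypothetical and are not the Hopsworks API, which uses Spark as the join engine.

```python
def join_feature_groups(left, right, key):
    """Inner-join two feature groups (lists of dicts) on a shared join key."""
    # Index the right-hand feature group by its join key (assumed unique).
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Merge both rows; the join key appears once in the result.
            joined.append({**row, **match})
    return joined


def select(rows, features):
    """Project joined rows onto the ordered features of a train/test dataset."""
    return [[row[f] for f in features] for row in rows]


# Illustrative data only.
passengers = [
    {"name": "Alice", "Pclass": 1, "Sex": "female", "Survive": 1},
    {"name": "Bob", "Pclass": 3, "Sex": "male", "Survive": 0},
]
accounts = [{"name": "Alice", "Balance": 1200.0}]

rows = join_feature_groups(passengers, accounts, "name")
print(select(rows, ["name", "Pclass", "Sex", "Balance", "Survive"]))
```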
  • 14. Register a FeatureGroup with the Feature Store
      from hops import featurestore as fs
      # df: a Spark or Pandas DataFrame
      # ... do feature engineering on df ...
      # Register the DataFrame as a FeatureGroup, also enabled for the online feature store
      fs.create_featuregroup(df, "titanic_df", online=True)
  • 15. Hopsworks Feature Store [Diagram: data is ingested from Raw Data, Structured Data, Events, and the Data Lake into the Online and Offline Feature Stores, which are used by Online Apps, Batch Apps, and to Create Train/Test Data]
  • 16. Create Train/Test Datasets using the Feature Store
      from hops import featurestore as fs
      # Fetch the selected features (joined across feature groups) as a dataframe
      sample_data = fs.get_features(["name", "Pclass", "Sex", "Balance", "Survived"])
      # Materialize them as a versioned training dataset in TFRecord format
      fs.create_training_dataset(sample_data, "titanic_training_dataset",
                                 data_format="tfrecords", training_dataset_version=1)
  • 17. Online Feature Store [Diagram: three availability zones (US-West-1a, US-West-1b, US-West-1c), each with a MySQL NDB node (NDB1, NDB2, NDB3) and a Model, queried by an Online Application; latencies shown: ~5-50ms and 2-20ms] 1. Build a Feature Vector using the Online Feature Store (JDBC). 2. Send the Feature Vector to a Model for Prediction.
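The two serving steps on the slide (build a feature vector from the online store, then send it to a model) can be sketched as below. This is an illustration under stated assumptions: a Python dict stands in for the JDBC lookup against MySQL NDB, the model is a hypothetical linear function, and `build_feature_vector` / `serve_prediction` are made-up names rather than a Hopsworks API.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def build_feature_vector(online_store: Dict[str, Row], key: str,
                         feature_names: Sequence[str]) -> List[float]:
    # Step 1: look up the latest feature values for this entity by primary key
    # (a dict lookup stands in for the JDBC query to the online feature store).
    row = online_store[key]
    # Keep the exact feature order the model was trained with.
    return [row[name] for name in feature_names]


def serve_prediction(online_store: Dict[str, Row], key: str,
                     feature_names: Sequence[str],
                     model: Callable[[List[float]], float]) -> float:
    # Step 2: send the feature vector to the model for prediction.
    return model(build_feature_vector(online_store, key, feature_names))


# Hypothetical data and model.
online_store = {"user_1": {"clicks": 4.0, "sessions": 2.0}}
model = lambda v: 0.5 * v[0] + 2.0 * v[1]
print(serve_prediction(online_store, "user_1", ["clicks", "sessions"], model))  # 6.0
```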
  • 18. Good Decisions we took in Version 1 ▪ General Purpose Data Frame API (DSL could be added later) ▪ Feature Store is a cache for materialized features, not a library. ▪ Online and Offline Feature Stores to support low latency and scale, respectively ▪ Reuse of Features means JOINS – Spark as a join engine
  • 19. Feature Store API v2 ▪ Enforce feature-group scope and versioning (as best practice) ▪ Better support for multiple feature stores - join features from development and production feature stores ▪ Better support for complex joins of features ▪ First class API support for time-travel ▪ More consistent developer experience
  • 20. Connect and Support for Multiple Feature Stores
      import hsfs
      # Connect to the Hopsworks cluster and the "prod" project
      conn = hsfs.connection(host="ea2.aws.hopsworks.ai", project="prod")
      # The project's own (production) feature store
      prod_fs = conn.get_feature_store()
      # A second feature store, accessed by name
      dev_fs = conn.get_feature_store("dev")
  • 21. Feature Group Operations
      # Create feature group metadata
      fg = dev_fs.create_feature_group("temperature", description="Temperature Features",
                                       version=1, online_enabled=True)
      # Schema is inferred from the dataframe
      fg.save(dataframe)
      # Read the feature group as a dataframe
      df = fg.read()
      # Append more data to the feature group
      fg.insert(dataframe, overwrite=False)
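The slide notes that `fg.save(dataframe)` infers the feature group schema from the dataframe. The sketch below shows the general idea on plain rows (a list of dicts standing in for a dataframe). The function `infer_schema` and the Python-to-Hive type mapping are assumptions for illustration, not the rules Hopsworks actually applies.

```python
from typing import Any, Dict, List

# Assumed mapping from Python value types to Hive-style feature types (illustrative only).
PY_TO_FEATURE_TYPE = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_schema(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    """Infer {feature_name: feature_type} from rows, rejecting inconsistent data."""
    if not rows:
        raise ValueError("cannot infer a schema from an empty dataframe")
    schema = {}
    # Use the first row to propose a type per feature (exact type match, so bool != int).
    for name, value in rows[0].items():
        feature_type = PY_TO_FEATURE_TYPE.get(type(value))
        if feature_type is None:
            raise TypeError(f"unsupported type for feature {name!r}: {type(value).__name__}")
        schema[name] = feature_type
    # Check every other row has the same features with the same types.
    for row in rows[1:]:
        if set(row) != set(schema):
            raise ValueError("all rows must have the same features")
        for name, value in row.items():
            if PY_TO_FEATURE_TYPE.get(type(value)) != schema[name]:
                raise TypeError(f"inconsistent type for feature {name!r}")
    return schema


print(infer_schema([{"date": "2020-06-01", "location": "Stockholm", "avg": 14.5}]))
```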
  • 22. Tags
      fg = dev_fs.get_feature_group("temperature", version=1)
      fg.add_tag("country", "SE")
      fg.add_tags({"country": "SE", "year": 2020})
    ▪ Allow feature groups, features, and training datasets to be discoverable ▪ Tags are searchable from the Hopsworks UI
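Tags make entities discoverable because they can be indexed and searched. The following is a minimal in-memory sketch of that idea, assuming tags are simple name/value pairs attached to named entities; `TagIndex` and the entity names are hypothetical and do not reflect how Hopsworks stores or searches tags.

```python
class TagIndex:
    """Hypothetical in-memory index of tags on feature groups and training datasets."""

    def __init__(self):
        # entity name -> {tag name: tag value}
        self._tags = {}

    def add_tag(self, entity, name, value):
        # Adding an existing tag name overwrites its value.
        self._tags.setdefault(entity, {})[name] = value

    def add_tags(self, entity, tags):
        for name, value in tags.items():
            self.add_tag(entity, name, value)

    def search(self, name, value=None):
        # Return entities carrying the tag (optionally with a specific value), sorted.
        return sorted(
            entity for entity, tags in self._tags.items()
            if name in tags and (value is None or tags[name] == value)
        )


idx = TagIndex()
idx.add_tag("temperature_v1", "country", "SE")
idx.add_tags("rain_v1", {"country": "SE", "year": 2020})
idx.add_tag("crop_v1", "country", "NO")
print(idx.search("country", "SE"))  # ['rain_v1', 'temperature_v1']
```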
  • 23. Schema Version Management
      fg.add_feature("new_feature", type="int", default_value=0)  # 0 is an illustrative default
    Non-breaking schema changes (e.g. adding a feature) can be applied without bumping the version.
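One way to decide whether a schema change can keep the same version is sketched below. The rule is an assumption that mirrors the slide's example: adding a feature with a default value is non-breaking, while removing or retyping a feature is breaking. `classify_schema_change` and `upgrade_row` are hypothetical helpers, not part of any Hopsworks API.

```python
def classify_schema_change(old_schema, new_schema, defaults):
    """Return "breaking" or "non-breaking" for a change between {feature: type} schemas."""
    removed = set(old_schema) - set(new_schema)
    retyped = {f for f in set(old_schema) & set(new_schema) if old_schema[f] != new_schema[f]}
    added = set(new_schema) - set(old_schema)
    # Added features need a default so existing rows can still be read.
    missing_defaults = added - set(defaults)
    if removed or retyped or missing_defaults:
        return "breaking"
    return "non-breaking"


def upgrade_row(row, new_schema, defaults):
    """Read an old row under the new schema, filling added features with defaults."""
    return {f: row.get(f, defaults.get(f)) for f in new_schema}


old = {"date": "string", "avg": "double"}
new = {**old, "new_feature": "int"}
print(classify_schema_change(old, new, {"new_feature": 0}))  # non-breaking
print(upgrade_row({"date": "2020-06-01", "avg": 14.5}, new, {"new_feature": 0}))
```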
  • 24. Exploratory Data Analysis
      from pyspark.sql.functions import col
      fg = dev_fs.get_feature_group("temperature", version=1)
      # Returns a dataframe object
      fg.read()
      # Show a sample of 10 rows in the feature group
      fg.show(10)
      fg.select(["date", "location", "avg"]).show(10)
      # Read the selection as a Spark dataframe, then filter it
      fg.select(["date", "location", "avg"]).read() \
          .filter(col("location") == "Stockholm").show(10)
  • 25. Joins - Pandas Style API
      # Feature groups from the production and development feature stores can be joined
      crop_fg = prod_fs.get_feature_group("crop", version=1)
      temperature = dev_fs.get_feature_group("temperature", version=1)
      rain = dev_fs.get_feature_group("rain", version=1)
      joined_features = (crop_fg.select(["location", "yield"])
                         .join(temperature.select(["location", "season_avg"]))
                         .join(rain.select(["location", "avg_mm"]), on=["location"], join_type="left"))
      dataframe = joined_features.read()
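The last join on the slide uses `join_type="left"`. The semantics of a left join on `location` can be shown in pure Python: every row on the left is kept, and locations with no match on the right get null (`None`) features. This is a sketch with made-up data; `left_join` is a hypothetical helper, not the hsfs implementation, which delegates joins to Spark.

```python
def left_join(left, right, on, right_features):
    """Left-join rows (lists of dicts) on the key columns in `on`."""
    # Index right-hand rows by their join-key tuple; keys may repeat.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in on), []).append(row)
    out = []
    for row in left:
        matches = index.get(tuple(row[k] for k in on))
        if matches:
            # One output row per matching right-hand row.
            for match in matches:
                out.append({**row, **{f: match[f] for f in right_features}})
        else:
            # No match: keep the left row and fill right-hand features with None.
            out.append({**row, **{f: None for f in right_features}})
    return out


crop = [{"location": "Stockholm", "yield": 5.1}, {"location": "Oslo", "yield": 4.2}]
rain = [{"location": "Stockholm", "avg_mm": 52.0}]
print(left_join(crop, rain, ["location"], ["avg_mm"]))
```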
  • 26. Time-Travel
      fs.get_feature_group("temperature", version=1, wallclock_time=None,
                           wallclock_time_start=None, wallclock_time_end=None)
    ▪ Explore what the feature group looked like at a given point in time in the past ▪ List value changes between timestamps
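The two time-travel queries on the slide (state at a point in time, and changes between two timestamps) can be illustrated over a log of timestamped commits. This is a toy sketch: `ToyTimeTravelTable` is hypothetical, and the idea that `as_of` corresponds to `wallclock_time` and `changes_between` to `wallclock_time_start`/`wallclock_time_end` is an assumption for illustration only.

```python
class ToyTimeTravelTable:
    """Hypothetical table that keeps every commit so past states can be reconstructed."""

    def __init__(self):
        # List of (commit_time, {key: value}) in increasing commit_time order.
        self._commits = []

    def commit(self, ts, updates):
        if self._commits and ts <= self._commits[-1][0]:
            raise ValueError("commits must be in increasing time order")
        self._commits.append((ts, dict(updates)))

    def as_of(self, ts):
        # Replay all commits up to and including ts.
        state = {}
        for commit_ts, updates in self._commits:
            if commit_ts > ts:
                break
            state.update(updates)
        return state

    def changes_between(self, start, end):
        # (commit_time, key, value) for commits with start < commit_time <= end.
        return [(t, k, v) for t, updates in self._commits if start < t <= end
                for k, v in updates.items()]


table = ToyTimeTravelTable()
table.commit(1, {"Stockholm": 14.5})
table.commit(2, {"Oslo": 12.0})
table.commit(3, {"Stockholm": 16.0})
print(table.as_of(2))               # {'Stockholm': 14.5, 'Oslo': 12.0}
print(table.changes_between(1, 3))  # [(2, 'Oslo', 12.0), (3, 'Stockholm', 16.0)]
```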
  • 27. Create Train/Test Data from Joined Features
      connector = fs.get_storage_connector("s3connector", "S3")
      td = fs.create_training_dataset(name="crop_model",
                                      description="Dataset to train the crop model",
                                      version=1,
                                      data_format="tfrecords",
                                      connector=connector,
                                      splits={"train": 0.7, "test": 0.2, "validate": 0.1})
      td.save(joined_features)
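The `splits={"train": 0.7, "test": 0.2, "validate": 0.1}` argument divides the joined rows into named subsets. A minimal sketch of a seeded random split with those fractions is shown below; `split_dataset` is a hypothetical helper, and assigning the rounding remainder to the last split is one design choice among several, not necessarily what Hopsworks does.

```python
import random


def split_dataset(rows, splits, seed=42):
    """Shuffle rows reproducibly and cut them into named splits by fraction."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    names = list(splits)
    result, start = {}, 0
    for i, name in enumerate(names):
        # The last split takes whatever remains, absorbing rounding differences.
        if i == len(names) - 1:
            end = len(shuffled)
        else:
            end = start + int(round(splits[name] * len(shuffled)))
        result[name] = shuffled[start:end]
        start = end
    return result


parts = split_dataset(list(range(100)), {"train": 0.7, "test": 0.2, "validate": 0.1})
print({name: len(rows) for name, rows in parts.items()})
```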
  • 28. Get feature vector for online serving
      fs.get_feature_vector(training_dataset="crop",
                            id=[{"location": "Stockholm", "crop": "wheat"}])
    ▪ Return feature vector from the online feature store ▪ Feature order is maintained
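"Feature order is maintained" means the vector follows the training dataset's feature order, even when the features come from several feature groups with different primary keys. The sketch below assumes each feature group is a dict keyed by a primary-key tuple and that the training dataset is described by an ordered list of (feature group, feature) pairs; all names and data are hypothetical and this is not the hsfs implementation.

```python
def get_feature_vector(td_features, feature_groups, entry):
    """Build one feature vector in training-dataset order.

    td_features: ordered list of (feature_group_name, feature_name).
    feature_groups: {name: {"primary_key": [...], "rows": {key_tuple: row}}}.
    entry: primary-key values, e.g. {"location": "Stockholm", "crop": "wheat"}.
    """
    vector = []
    for fg_name, feature in td_features:
        fg = feature_groups[fg_name]
        # Each feature group is looked up with only the key columns it uses.
        key = tuple(entry[k] for k in fg["primary_key"])
        vector.append(fg["rows"][key][feature])
    return vector


feature_groups = {
    "crop": {"primary_key": ["location", "crop"],
             "rows": {("Stockholm", "wheat"): {"yield": 5.1}}},
    "temperature": {"primary_key": ["location"],
                    "rows": {("Stockholm",): {"season_avg": 14.5}}},
}
td_features = [("temperature", "season_avg"), ("crop", "yield")]
print(get_feature_vector(td_features, feature_groups, {"location": "Stockholm", "crop": "wheat"}))
```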
  • 29. Demo Using Hopsworks Feature Store in Databricks
  • 30. Thank You! Get Started hopsworks.ai github.com/logicalclocks/hopsworks Twitter @logicalclocks Web www.logicalclocks.com Feature Store contributions from colleagues ▪ Moritz Meister ▪ Kim Hammar ▪ Alex Ormenisan ▪ Robin Andersson ▪ Ermias Gebremeskel ▪ Theofilos Kakantousis Thanks to the Logical Clocks Team!
  • 31. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.