OSS Meetup
11 Feb 2020
Tools for Data Scientists
6:00 pm Registration, Food, Networking
Faisal Siddiqi 7:00 pm (5m) Welcome
Ville Tuulos 7:05 pm (20m) Metaflow
Jeremy Smith 7:25 pm (20m) Polynote
Matthew Seal 7:45 pm (15m) Papermill
8:00 pm Demo Stations, Networking, Food
Agenda
data scientist
productivity
Basics
Workflow as a DAG
State Transfer and Checkpointing
Versioning and Experiment Tracking
Inspection and Monitoring
Vertical Scalability
Horizontal Scalability
Dependency Management
...and much more!
See metaflow.org for details
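As a rough illustration of the first two items (workflow as a DAG, state transfer and checkpointing), the sketch below runs steps in dependency order and snapshots the shared state after each one. It uses only the Python standard library and is not Metaflow's API; the step names and the `run_dag` helper are hypothetical.

```python
import copy
from graphlib import TopologicalSorter  # Python 3.9+


def run_dag(steps, deps, state=None):
    """Run steps in dependency order, snapshotting state after each step.

    steps: {name: function(state_dict) -> None}
    deps:  {name: set of step names that must run first}
    Returns (final_state, checkpoints), where checkpoints maps each
    step name to a deep copy of the state right after that step.
    """
    state = {} if state is None else state
    checkpoints = {}
    for name in TopologicalSorter(deps).static_order():
        steps[name](state)
        checkpoints[name] = copy.deepcopy(state)
    return state, checkpoints


def start(state):
    state["rows"] = [1, 2, 3]


def train(state):
    state["model"] = sum(state["rows"]) / len(state["rows"])


def end(state):
    state["report"] = f"mean={state['model']}"


STEPS = {"start": start, "train": train, "end": end}
DEPS = {"start": set(), "train": {"start"}, "end": {"train"}}

final_state, checkpoints = run_dag(STEPS, DEPS)
```

Each entry in `checkpoints` is an independent copy, so a later step cannot alter what an earlier step recorded.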
Metaflow @
Google
polynote.org
Polynote is a polyglot notebook environment,
built from scratch.
It supports mixing Scala, Python, SQL, and
Vega in a single notebook.
Data is shared seamlessly* between
languages.
Why did we build it?
Scientists were avoiding Scala notebooks for
experimentation.
It was just a pain to use Scala and Spark in a
notebook.
Scala + Spark pain points
● Interactive autocomplete is practically a
necessity
● Difficult to find compiler errors
● Dependencies are many and varied
● Spark clashes with dependencies –
constantly building shaded JARs
What's different about
Polynote?
Editing improvements
Quality-of-life IDE features like autocomplete and
parameter hints, error highlighting, etc.
Reproducibility
Cells see only the state derived from cells above, no
matter what order they ran in.
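One way to picture this rule: the input to a cell is rebuilt from the most recent outputs of the cells positioned above it, ignoring anything defined further down the notebook. The sketch below is a toy model in plain Python, not Polynote's implementation; `Notebook`, `run_cell`, and the cell functions are hypothetical.

```python
class Notebook:
    """Toy notebook where each cell sees only state from cells above it."""

    def __init__(self, cells):
        self.cells = cells                  # ordered list of functions: dict -> dict
        self.outputs = [None] * len(cells)  # last output of each cell, by position

    def visible_state(self, index):
        # Merge outputs of cells above `index`, in notebook order,
        # regardless of the order in which they were executed.
        state = {}
        for out in self.outputs[:index]:
            if out is not None:
                state.update(out)
        return state

    def run_cell(self, index):
        self.outputs[index] = self.cells[index](self.visible_state(index))
        return self.outputs[index]


def cell_0(state):
    return {"x": 1}


def cell_1(state):
    # Uses x from above; falls back to 0 if cell 0 has not run yet.
    return {"y": state.get("x", 0) + 10}


def cell_2(state):
    return {"x": 99}  # redefines x, but only for cells below it


nb = Notebook([cell_0, cell_1, cell_2])
nb.run_cell(2)           # run the last cell first
nb.run_cell(0)
result = nb.run_cell(1)  # sees x from cell 0, not the value set in cell 2
```

Running the cells out of order does not change what the middle cell computes, which is the property the slide describes.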
Visibility
See what the Kernel's up to with the symbol table, task
list and executing expression highlight.
Data Visualization
Use the built-in Data Inspector to browse tabular data
and inspect schema. Plot data with the plot editor, or use
Vega or matplotlib directly.
Polyglot
Scala cells and Python cells together in one notebook.
Variables from each language are available to the other.
Example use case: data prep in Scala+Spark, model
training in Python with TensorFlow/PyTorch/etc
Questions?
(stop by our demo station!)
Papermill
(2.0!)
Matthew Seal
Backend Engineer on the Big Data Platform
Orchestration Team @ Netflix
@codeseal
Speaker Details
Notebook
Wins.
● Shareable
● Easy to Read
● Documentation with
Code
● Outputs as Reports
● Familiar Interface
● Multi-Language
Things to preserve:
● Results linked to code
● Good visuals
● Easy to share
Focus points to extend uses.
Things to improve:
● Not versioned
● Mutable state
● Templating
Jupyter Notebooks:
A REPL Protocol + UIs
[Diagram: Jupyter UIs, Jupyter Server, and Jupyter Kernel, with arrows labeled "forward requests", "execute code", "receive outputs", and "save / load .ipynb"; further labels "develop" and "share".]
It’s more complex than this in reality
A simple library for executing notebooks.
[Diagram: Papermill takes an input notebook (template.ipynb) from an input store (EFS, efs://users/mseal/notebooks), parameterizes and runs it, and writes output notebooks (run_1.ipynb through run_4.ipynb) to S3 (s3://output/mseal/).]
import papermill as pm
pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb')
…
# Each run can be placed in a unique / sortable path
pprint(files_in_directory('outputs'))
outputs/
...
20190401_run.ipynb
20190402_run.ipynb
Choose an output location.
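A date-prefixed filename like the one above keeps runs unique and sortable. The helper below is a small sketch of that naming scheme using only the standard library; `output_path_for` is a hypothetical name, and the commented call assumes Papermill is installed.

```python
from datetime import date
from pathlib import PurePosixPath


def output_path_for(run_date, output_dir="outputs", suffix="run"):
    """Build a sortable output path such as outputs/20190402_run.ipynb."""
    name = f"{run_date:%Y%m%d}_{suffix}.ipynb"
    return str(PurePosixPath(output_dir) / name)


paths = [output_path_for(date(2019, 4, d)) for d in (2, 1)]
sorted_paths = sorted(paths)  # lexical order matches chronological order

# With Papermill installed, the path could be passed straight through:
# import papermill as pm
# pm.execute_notebook("input_nb.ipynb", output_path_for(date.today()))
```

Putting the year first is what makes a plain alphabetical listing of the output directory read in run order.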
# Pass template parameters to notebook execution
pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
{'region': 'ca', 'devices': ['phone', 'tablet']})
…
[2] # Default values for our potential input parameters
region = 'us'
devices = ['pc']
date_since = datetime.now() - timedelta(days=30)
[3] # Parameters
region = 'ca'
devices = ['phone', 'tablet']
Add Parameters
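Papermill looks for a cell tagged `parameters` and inserts a new cell tagged `injected-parameters` directly after it, so the injected values override the defaults. The sketch below mimics that step on a notebook held as a plain dict; it is a simplified illustration rather than Papermill's own code, and `inject_parameters` is a hypothetical helper.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with an injected-parameters cell added.

    The new cell goes right after the first cell tagged "parameters",
    or at the top when no such cell exists. Values are rendered with
    repr(), so this sketch only handles simple Python literals.
    """
    nb = copy.deepcopy(notebook)  # leave the input notebook untouched
    source = "# Parameters\n" + "\n".join(
        f"{name} = {value!r}" for name, value in parameters.items()
    )
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    position = 0
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = i + 1
            break
    nb["cells"].insert(position, new_cell)
    return nb


template = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "import datetime"},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "region = 'us'\ndevices = ['pc']",
        },
        {"cell_type": "code", "metadata": {}, "source": "print(region, devices)"},
    ]
}

run = inject_parameters(template, {"region": "ca", "devices": ["phone", "tablet"]})
```

Because the defaults cell is left in place, a parameter that is not passed (such as `date_since` on the slide) keeps its default value.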
# Same example as last slide
pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
{'region': 'ca', 'devices': ['phone', 'tablet']})
…
# Bash version of that input
papermill input_nb.ipynb outputs/20190402_run.ipynb -p region ca \
    -y '{"devices": ["phone", "tablet"]}'
Also Available as a CLI
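When the CLI is launched from another program, building the argument list explicitly avoids shell quoting problems with the YAML string. This sketch only assembles the command; the `-p` and `-y` flags are the ones shown above, `papermill_command` is a hypothetical helper, and the `-y` value is written as JSON on the assumption that this simple JSON text also parses as YAML.

```python
import json
import shlex


def papermill_command(input_nb, output_nb, simple=None, structured=None):
    """Build an argv list for the papermill CLI.

    simple:     {name: value} passed as repeated `-p name value`
    structured: dict of lists/dicts passed once as `-y '<yaml>'`
    """
    argv = ["papermill", input_nb, output_nb]
    for name, value in (simple or {}).items():
        argv += ["-p", name, str(value)]
    if structured:
        argv += ["-y", json.dumps(structured)]
    return argv


argv = papermill_command(
    "input_nb.ipynb",
    "outputs/20190402_run.ipynb",
    simple={"region": "ca"},
    structured={"devices": ["phone", "tablet"]},
)
shell_line = shlex.join(argv)  # copy-pasteable form for a terminal

# To actually run it (assumes papermill is installed and on PATH):
# import subprocess; subprocess.run(argv, check=True)
```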
Let’s use the CLI ...
Notebooks: Programmatically
[Diagram: the earlier Jupyter UIs / Jupyter Server / Jupyter Kernel picture, with Papermill added. Papermill is labeled "read" and "write", and a Kernel Manager is labeled "forward requests", "execute code", and "receive outputs".]
# To add SFTP support you’d add this class
class SFTPHandler():
    def read(self, file_path):
        ...
    def write(self, file_contents, file_path):
        ...
# Then add an entry_point for the handler
from setuptools import setup, find_packages
setup(
    # all the usual setup arguments ...
    entry_points={'papermill.io':
        ['sftp://=papermill_sftp:SFTPHandler']})
# Use the new prefix to read/write from that location
pm.execute_notebook('sftp://my_ftp_server.co.uk/input.ipynb',
                    'sftp://my_ftp_server.co.uk/output.ipynb')
Entire Library is Component Based
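The handler pattern above amounts to a lookup from path prefix to an object with `read` and `write`. The following toy registry shows the idea with an in-memory handler; it is not Papermill's internal code, and `HandlerRegistry`, `MemoryHandler`, and the `mem://` prefix are made up for the example.

```python
class MemoryHandler:
    """Stores notebook contents in a dict instead of on a real backend."""

    def __init__(self):
        self.files = {}

    def read(self, file_path):
        return self.files[file_path]

    def write(self, file_contents, file_path):
        self.files[file_path] = file_contents


class HandlerRegistry:
    """Dispatches read/write calls to the handler registered for a prefix."""

    def __init__(self):
        self._handlers = {}

    def register(self, prefix, handler):
        self._handlers[prefix] = handler

    def _lookup(self, path):
        # Prefer the longest matching prefix so specific schemes win.
        for prefix in sorted(self._handlers, key=len, reverse=True):
            if path.startswith(prefix):
                return self._handlers[prefix]
        raise ValueError(f"no handler registered for {path!r}")

    def read(self, path):
        return self._lookup(path).read(path)

    def write(self, contents, path):
        self._lookup(path).write(contents, path)


registry = HandlerRegistry()
registry.register("mem://", MemoryHandler())
registry.write('{"cells": []}', "mem://runs/output.ipynb")
contents = registry.read("mem://runs/output.ipynb")
```

Swapping storage backends then means registering a different handler for a prefix, with no change to the code that calls `read` and `write`.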
Failed Notebooks
A better way to review outcomes
Debugging failed jobs.
[Diagram: Notebook Jobs #1 through #5, one of them marked Failed.]
Output notebooks are the place to
look for failures. They have:
● Stack traces
● Re-runnable code
● Execution logs
● Same interface as input
Failed outputs
are useful.
Find the issue.
Test the fix.
Update the notebook.
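In the notebook file format, a failing cell stores an output with `output_type` set to `error`, along with `ename`, `evalue`, and `traceback` fields. The sketch below scans a notebook (as parsed JSON) for those outputs; `find_errors` is a hypothetical helper, and the sample notebook is made up for the example.

```python
import json


def find_errors(notebook):
    """Return (cell_index, ename, evalue) for every error output."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


# A tiny stand-in for a failed output notebook.
failed_run = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": "# Report"},
    {"cell_type": "code", "source": "region = 'ca'", "outputs": []},
    {"cell_type": "code", "source": "1 / 0", "outputs": [
      {"output_type": "error", "ename": "ZeroDivisionError",
       "evalue": "division by zero", "traceback": ["..."]}
    ]}
  ]
}
""")

errors = find_errors(failed_run)

# For a real file, load it first, for example:
# with open("outputs/20190402_run.ipynb") as f:
#     errors = find_errors(json.load(f))
```

The cell index points straight at the code to re-run when testing a fix.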
Adds notebook isolation
● Immutable inputs
● Immutable outputs
● Parameterization of notebook runs
● Configurable sourcing / sinking
and gives better control of notebook flows via library calls.
Changes to the notebook experience.
● Platform Scheduler uses Jupyter
Notebooks for all Templates
● Notebooks used to run integration tests,
monitor systems, execute ETL, and wrap
ML flows.
Jupyter Notebooks @Netflix
Questions?
https://slack.nteract.io/
https://discourse.jupyter.org/

Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 

Season 7 Episode 1 - Tools for Data Scientists

  • 1. OSS Meetup 11 Feb 2020 Tools for Data Scientists
  • 2. Agenda: 6:00 pm Registration, Food, Networking; 7:00 pm (5m) Welcome, Faisal Siddiqi; 7:05 pm (20m) Metaflow, Ville Tuulos; 7:25 pm (20m) Polynote, Jeremy Smith; 7:45 pm (15m) Papermill, Matthew Seal; 8:00 pm Demo Stations, Networking, Food
  • 22. State Transfer and Checkpointing
  • 28. ...and much more! See metaflow.org for details
  • 31. Polynote is a polyglot notebook environment, built from scratch.
  • 32. Polynote is a polyglot notebook environment, built from scratch. It supports mixing Scala, Python, SQL, and Vega in a single notebook.
  • 33. Polynote is a polyglot notebook environment, built from scratch. It supports mixing Scala, Python, SQL, and Vega in a single notebook. Data is shared seamlessly* between languages.
  • 34. Why did we build it? Scientists were avoiding Scala notebooks for experimentation.
  • 35. Why did we build it? Scientists were avoiding Scala notebooks for experimentation. It was just a pain to use Scala and Spark in a notebook.
  • 36. Scala + Spark pain points ● Interactive autocomplete is practically a necessity ● Difficult to find compiler errors ● Dependencies are many and varied ● Spark clashes with dependencies – constantly building shaded JARs
  • 38. Editing improvements Quality-of-life IDE features like autocomplete and parameter hints, error highlighting, etc.
  • 39. Reproducibility Cells see only the state derived from cells above, no matter what order they ran in.
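The reproducibility rule on the slide above can be illustrated with a toy model. This is a minimal sketch and not Polynote's implementation: the `Notebook` class and its methods are hypothetical, and each cell is modeled as a plain Python function that maps the state it can see to the names it defines.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Cell = Callable[[State], State]


class Notebook:
    """Toy notebook: a cell sees only names defined by cells above it."""

    def __init__(self, cells: List[Cell]) -> None:
        self.cells = cells
        # Names each cell defined the last time it ran (None = never ran).
        self.results: List[Optional[State]] = [None] * len(cells)

    def visible_state(self, index: int) -> State:
        # Merge the results of the cells above `index` in positional order,
        # ignoring anything defined by cells at or below it.
        state: State = {}
        for result in self.results[:index]:
            if result is not None:
                state.update(result)
        return state

    def run(self, index: int) -> State:
        result = self.cells[index](self.visible_state(index))
        self.results[index] = result
        return result


nb = Notebook([
    lambda s: {"x": 1},                  # cell 0 defines x
    lambda s: {"y": s.get("x", 0) + 1},  # cell 1 reads x
    lambda s: {"x": 100},                # cell 2 redefines x
])

nb.run(2)          # run the last cell first
nb.run(0)
print(nb.run(1))   # cell 1 sees x from cell 0, not from cell 2
```

Because the visible state is rebuilt from position rather than from execution history, running cell 2 first has no effect on what cell 1 computes.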
  • 40. Visibility See what the Kernel's up to with the symbol table, task list and executing expression highlight.
  • 41. Data Visualization Use the built-in Data Inspector to browse tabular data and inspect schema. Plot data with the plot editor, or use Vega or matplotlib directly.
  • 43. Polyglot Scala cells and Python cells together in one notebook. Variables from each language are available to the other.
  • 44. Polyglot Scala cells and Python cells together in one notebook. Variables from each language are available to the other. Example use case: data prep in Scala+Spark, model training in Python with TensorFlow/PyTorch/etc
  • 45. Questions? (stop by our demo station!)
  • 47. Speaker Details: Matthew Seal, Backend Engineer on the Big Data Platform Orchestration Team @ Netflix (@codeseal)
  • 48. Notebook Wins. ● Shareable ● Easy to Read ● Documentation with Code ● Outputs as Reports ● Familiar Interface ● Multi-Language
  • 49. Focus points to extend uses. Things to preserve: ● Results linked to code ● Good visuals ● Easy to share. Things to improve: ● Not versioned ● Mutable state ● Templating
  • 50. Jupyter Notebooks: A REPL Protocol + UIs. [Diagram: Jupyter UIs, Jupyter Server, Jupyter Kernel; arrows labeled develop, share, forward requests, save / load .ipynb, execute code, receive outputs.] It’s more complex than this in reality.
  • 51. A simple library for executing notebooks. [Diagram: Papermill takes the input notebook template.ipynb from the input store (EFS, efs://users/mseal/notebooks), parameterizes and runs it, and produces the output notebooks run_1.ipynb, run_2.ipynb, run_3.ipynb, run_4.ipynb (S3, s3://output/mseal/).]
  • 52. Choose an output location.
        import papermill as pm
        pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb')
        …
        # Each run can be placed in a unique / sortable path
        pprint(files_in_directory('outputs'))
        outputs/
        ...
        20190401_run.ipynb
        20190402_run.ipynb
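One way to get the unique, sortable paths described on the slide above is to derive the file name from the run date. The sketch below uses only the standard library; the helper name `dated_output_path` is hypothetical, and the `YYYYMMDD_run.ipynb` pattern is taken from the slide's example file names.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_output_path(run_date: date, output_dir: str = "outputs") -> str:
    """Build a per-run notebook path such as 'outputs/20190402_run.ipynb'."""
    # A zero-padded YYYYMMDD prefix sorts lexicographically in date order.
    name = f"{run_date:%Y%m%d}_run.ipynb"
    return str(PurePosixPath(output_dir) / name)


paths = [
    dated_output_path(date(2019, 4, 2)),
    dated_output_path(date(2019, 4, 1)),
]
print(sorted(paths))
```

The resulting string could then be passed as the second argument to `pm.execute_notebook`, as on the slide.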
  • 53. Add Parameters
        # Pass template parameters to notebook execution
        pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
                            {'region': 'ca', 'devices': ['phone', 'tablet']})
        …
        [2] # Default values for our potential input parameters
            region = 'us'
            devices = ['pc']
            date_since = datetime.now() - timedelta(days=30)
        [3] # Parameters
            region = 'ca'
            devices = ['phone', 'tablet']
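The slide above shows a defaults cell followed by an injected cell that overrides it. The following is a simplified illustration of that idea using plain dictionaries shaped like nbformat v4 cells; it is not Papermill's own code. The helpers `code_cell` and `inject_parameters` are hypothetical, and the sketch assumes the defaults cell carries a `parameters` tag and the new cell an `injected-parameters` tag.

```python
import copy
from typing import Any, Dict, Iterable


def code_cell(source: str, tags: Iterable[str] = ()) -> Dict[str, Any]:
    # Minimal nbformat-v4-style code cell (hypothetical helper).
    return {"cell_type": "code", "source": source,
            "metadata": {"tags": list(tags)},
            "outputs": [], "execution_count": None}


def inject_parameters(nb: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `nb` with an overriding cell after the defaults cell."""
    out = copy.deepcopy(nb)  # the input notebook is left untouched
    lines = ["# Parameters"] + [f"{k} = {v!r}" for k, v in params.items()]
    injected = code_cell("\n".join(lines), tags=["injected-parameters"])
    for i, cell in enumerate(out["cells"]):
        if "parameters" in cell["metadata"].get("tags", []):
            out["cells"].insert(i + 1, injected)
            break
    else:
        # No tagged defaults cell: put the overrides at the top.
        out["cells"].insert(0, injected)
    return out


template = {
    "cells": [
        code_cell("from datetime import datetime, timedelta"),
        code_cell("region = 'us'\ndevices = ['pc']", tags=["parameters"]),
        code_cell("print(region, devices)"),
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}

run = inject_parameters(template, {"region": "ca", "devices": ["phone", "tablet"]})
print(run["cells"][2]["source"])
```

Because the injected cell runs after the defaults, any parameter that is not passed (such as `date_since` on the slide) keeps its default value.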
  • 54. Also Available as a CLI
        # Same example as last slide
        pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
                            {'region': 'ca', 'devices': ['phone', 'tablet']})
        …
        # Bash version of that input
        papermill input_nb.ipynb outputs/20190402_run.ipynb -p region ca -y '{"devices": ["phone", "tablet"]}'
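To show how the Python call and the CLI call on the slide above line up, here is a small sketch that turns a parameter dictionary into a command line of the same shape. The helper `papermill_argv` is hypothetical; it only builds the argument list and does not invoke Papermill. It assumes string values go through `-p` and everything else through `-y` as JSON text, as in the slide's example.

```python
import json
import shlex
from typing import Any, Dict, List


def papermill_argv(input_nb: str, output_nb: str,
                   params: Dict[str, Any]) -> List[str]:
    """Build a papermill command line from a parameter dictionary."""
    argv = ["papermill", input_nb, output_nb]
    for name, value in params.items():
        if isinstance(value, str):
            # Simple string values: -p NAME VALUE
            argv += ["-p", name, value]
        else:
            # Structured values: pass a one-key mapping as JSON text via -y
            argv += ["-y", json.dumps({name: value})]
    return argv


argv = papermill_argv(
    "input_nb.ipynb",
    "outputs/20190402_run.ipynb",
    {"region": "ca", "devices": ["phone", "tablet"]},
)
print(shlex.join(argv))  # shell-quoted form, suitable for pasting into Bash
```

This sketch makes no attempt to handle string values that a CLI might interpret as numbers or booleans; those would need explicit treatment.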
  • 55. Let’s use the CLI ...
  • 56. Notebooks: Programmatically. [Diagram: the Jupyter UIs, Jupyter Server, and Jupyter Kernel from slide 50 (develop, share, forward requests, save / load .ipynb, execute code, receive outputs), shown with Papermill and a Kernel Manager; Papermill arrows labeled read, write, forward requests, execute code, receive outputs.]
  • 57. Entire Library is Component Based
        # To add SFTP support you’d add this class
        class SFTPHandler():
            def read(self, file_path):
                ...
            def write(self, file_contents, file_path):
                ...
        # Then add an entry_point for the handler
        from setuptools import setup, find_packages
        setup(
            # all the usual setup arguments ...
            entry_points={'papermill.io': ['sftp://=papermill_sftp:SFTPHandler']})
        # Use the new prefix to read/write from that location
        pm.execute_notebook('sftp://my_ftp_server.co.uk/input.ipynb',
                            'sftp://my_ftp_server.co.uk/output.ipynb')
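The component idea on the slide above, where a path prefix selects the class that reads and writes it, can be sketched with a small registry. This is an illustration of prefix-based dispatch in general, not Papermill's internal code: `HandlerRegistry` and `MemoryHandler` are hypothetical names, and the in-memory handler stands in for a real SFTP client.

```python
from typing import Dict, Protocol


class Handler(Protocol):
    def read(self, file_path: str) -> str: ...
    def write(self, file_contents: str, file_path: str) -> None: ...


class MemoryHandler:
    """Stand-in backend that keeps 'files' in a dictionary."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def read(self, file_path: str) -> str:
        return self.files[file_path]

    def write(self, file_contents: str, file_path: str) -> None:
        self.files[file_path] = file_contents


class HandlerRegistry:
    """Route read/write calls to a handler chosen by path prefix."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, prefix: str, handler: Handler) -> None:
        self._handlers[prefix] = handler

    def _lookup(self, path: str) -> Handler:
        # Try longer prefixes first so the most specific one wins.
        for prefix in sorted(self._handlers, key=len, reverse=True):
            if path.startswith(prefix):
                return self._handlers[prefix]
        raise ValueError(f"no handler registered for {path!r}")

    def read(self, path: str) -> str:
        return self._lookup(path).read(path)

    def write(self, contents: str, path: str) -> None:
        self._lookup(path).write(contents, path)


registry = HandlerRegistry()
registry.register("sftp://", MemoryHandler())
registry.write('{"cells": []}', "sftp://my_ftp_server.co.uk/output.ipynb")
print(registry.read("sftp://my_ftp_server.co.uk/output.ipynb"))
```

With this shape, supporting a new storage backend means registering one more prefix; the calling code keeps passing plain path strings.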
  • 58. Failed Notebooks: A better way to review outcomes
  • 59. Debugging failed jobs. Notebook Job #1 Notebook Job #2 Failed Notebook Job #3 Notebook Job #4 Notebook Job #5
  • 60. Failed outputs are useful. Output notebooks are the place to look for failures. They have: ● Stack traces ● Re-runnable code ● Execution logs ● Same interface as input
  • 61. Find the issue. Test the fix. Update the notebook. Output notebooks are the place to look for failures. They have: ● Stack traces ● Re-runnable code ● Execution logs ● Same interface as input
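Since an output notebook is a JSON document, the stack traces mentioned above can also be located programmatically. The sketch below scans a notebook dictionary in nbformat v4 layout for outputs whose `output_type` is `error`; the function name `find_errors` and the sample notebook are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def find_errors(nb: Dict[str, Any]) -> Iterator[Tuple[int, str, str]]:
    """Yield (cell index, exception name, message) for each error output."""
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                yield index, output.get("ename", ""), output.get("evalue", "")


# Hypothetical output notebook with one failing cell.
output_nb = {
    "cells": [
        {"cell_type": "code", "source": "region = 'ca'", "outputs": []},
        {"cell_type": "code", "source": "config['region']",
         "outputs": [{"output_type": "error",
                      "ename": "KeyError",
                      "evalue": "'region'",
                      "traceback": ["KeyError: 'region'"]}]},
        {"cell_type": "markdown", "source": "## Report"},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}

for index, name, message in find_errors(output_nb):
    print(f"cell {index}: {name}: {message}")
```

A real output file could be loaded with `json.load` before being passed to `find_errors`; the full stack trace is available under each error output's `traceback` key.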
  • 62. Changes to the notebook experience. Adds notebook isolation (● Immutable inputs ● Immutable outputs ● Parameterization of notebook runs ● Configurable sourcing / sinking) and gives better control of notebook flows via library calls.
  • 63. Jupyter Notebooks @Netflix ● Platform Scheduler uses Jupyter Notebooks for all Templates ● Notebooks are used to run integration tests, monitor systems, execute ETL, and wrap ML flows.