In this talk, we’ll present techniques for visualizing large-scale machine learning systems in Spark. Netflix uses these techniques to understand and refine the machine learning models behind its famous recommender systems, which personalize the Netflix experience for its 99 million members around the world. Essential to these techniques is Vegas, a new open-source Scala library that aims to be the “missing Matplotlib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize machine learning models.
6. My List:
Continue Watching:
Popular on Netflix:
Trending Now:
Watch It Again:
Top Picks:
Because You Watched:
Genres:
New Releases:
Recently Added:
Originals Row Billboard:
7. Machine Learning at Netflix
● Optimize for the experimentation use case over productionization
● Experimentation
○ Opportunity sizing, Data Exploration
○ Feature Identification and Selection
○ Tweaks to ML algorithms
○ Model Evaluation
9. Notebooks
● Optimal for Experimentation
● Sharing reproducible research
○ Facilitates feedback loop with Product Managers
● End-to-end ML experiments
○ Interactivity drives productivity
11. Python Notebooks
● Seamless Experience - ML experimentation
● Well known Scientific computing libraries
● Huge catalog of Visualization plotting libraries
○ Matplotlib, Seaborn, Bokeh, BQPlot, Lightning, etc.
12. Scala Notebooks
● Zeppelin, Jupyter, Databricks, Spark-Notebooks, ...
● Gap in computing libraries is closing
● Lack of Visualization Libraries
○ Main friction point in adoption
○ End-to-end ML use case is not yet convincing
13. Introducing Vegas
● Visualization Library in Scala
● Mainly built for the notebook use case
● Scala wrapper around Vega-Lite
○ The missing Matplotlib for the Scala/Spark world
14. Declarative Statistical Visualization Grammar in Scala
You tell it WHAT should be done with the data, and it knows HOW to do it!
Operations such as filtering, aggregation, and faceting are built into the visualization, rather than putting the burden on the user to massage the data into shape.
Complex visualizations can be built with a few high-level abstractions:
● Data
● Transforms
● Scales
● Guides
● Marks
cf. Altair talk by Brian Granger at PyData 2016: https://youtu.be/v5mrwq7yJc4
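To make the WHAT-versus-HOW split concrete, here is a minimal sketch in plain Scala. It is not the Vegas API: `Row`, `Spec`, and `evaluate` are hypothetical names invented for this illustration, and the data values are made up. The spec only declares a filter, an aggregate, and a mark; a separate engine function decides how to apply them (rendering is left out).

```scala
object DeclarativeSketch {
  // One input record: a genre and hours watched (made-up illustrative data).
  final case class Row(genre: String, hours: Double)

  // Marks describe the glyph to draw; rendering is out of scope for this sketch.
  sealed trait Mark
  case object Bar extends Mark
  case object Point extends Mark

  // The spec states WHAT to show: which rows to keep, how to aggregate, which mark.
  final case class Spec(
    data: Seq[Row],
    filter: Row => Boolean = (_: Row) => true,
    aggregate: Seq[Double] => Double = (xs: Seq[Double]) => xs.sum,
    mark: Mark = Bar
  )

  // The engine decides HOW: filter, group by genre, aggregate each group,
  // and return the groups sorted by genre.
  def evaluate(spec: Spec): Seq[(String, Double)] =
    spec.data
      .filter(spec.filter)
      .groupBy(_.genre)
      .map { case (genre, rows) => genre -> spec.aggregate(rows.map(_.hours)) }
      .toSeq
      .sortBy(_._1)

  val rows: Seq[Row] = Seq(
    Row("Drama", 2.0), Row("Comedy", 1.5), Row("Drama", 3.0), Row("Comedy", 0.2)
  )

  // Declare: keep rows with at least one hour, show the mean per genre as bars.
  val spec: Spec = Spec(
    data = rows,
    filter = (r: Row) => r.hours >= 1.0,
    aggregate = (xs: Seq[Double]) => xs.sum / xs.size
  )

  val result: Seq[(String, Double)] = evaluate(spec)
}
```

The caller never writes the grouping loop; changing the aggregate from mean to sum is a one-line change to the spec rather than to the data-preparation code.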
15. Added Bonus of Declarative Visualizations: Interactivity!
[Figure labels: D3JS, VEGAS]
Vegas code expands out to D3JS code!
16. Anatomy of a plot: Channels
X/Y channel
Shape Channel
Size Channel
Color Channel
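As a rough illustration of channels, the sketch below binds data fields to the X, Y, shape, size, and color channels and prints the result in the shape of a Vega-Lite `encoding` object. The types here (`Channel`, `DataType`, `Encoding`) are hypothetical and written for this example, not taken from Vegas, and the field names are invented. It assumes field names need no JSON escaping.

```scala
object ChannelSketch {
  // Measurement types, named after Vega-Lite's "type" values.
  sealed abstract class DataType(val name: String)
  case object Quantitative extends DataType("quantitative")
  case object Nominal extends DataType("nominal")

  // The visual channels from the slide.
  sealed abstract class Channel(val name: String)
  case object X extends Channel("x")
  case object Y extends Channel("y")
  case object Shape extends Channel("shape")
  case object Size extends Channel("size")
  case object Color extends Channel("color")

  // One binding: "draw this data field on this channel".
  final case class Encoding(channel: Channel, field: String, dataType: DataType)

  // Wrap a string in double quotes (assumes no characters that need escaping).
  private def q(s: String): String = "\"" + s + "\""

  // Emit the bindings as a JSON object keyed by channel name.
  def encodingJson(encodings: Seq[Encoding]): String =
    encodings
      .map { e =>
        q(e.channel.name) + ":{" + q("field") + ":" + q(e.field) + "," +
          q("type") + ":" + q(e.dataType.name) + "}"
      }
      .mkString("{", ",", "}")

  // Hypothetical field names, one per channel.
  val encodings: Seq[Encoding] = Seq(
    Encoding(X, "hoursWatched", Quantitative),
    Encoding(Y, "rating", Quantitative),
    Encoding(Shape, "device", Nominal),
    Encoding(Size, "sessions", Quantitative),
    Encoding(Color, "genre", Nominal)
  )

  val json: String = encodingJson(encodings)
}
```

Each additional channel adds one more data dimension to the same plot without changing the data itself.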
26. 1. Specify in Scala
2. Embed HTML (iframe)
3. Render within the iframe using JS
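The three steps can be sketched in plain Scala as follows. This is an assumption-laden illustration, not the HTML that Vegas actually generates: `page` and `iframe` are hypothetical helpers, the CDN URLs and version tags are one plausible way to load Vega, Vega-Lite, and vega-embed, and the spec is assumed to be valid JSON that does not contain `</script>`.

```scala
object RenderSketch {
  // Step 3 lives inside this page: the script tag asks vega-embed to
  // render the spec into the "vis" div once the page loads in the iframe.
  def page(specJson: String): String = Seq(
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    "  <script src=\"https://cdn.jsdelivr.net/npm/vega@5\"></script>",
    "  <script src=\"https://cdn.jsdelivr.net/npm/vega-lite@5\"></script>",
    "  <script src=\"https://cdn.jsdelivr.net/npm/vega-embed@6\"></script>",
    "</head>",
    "<body>",
    "  <div id=\"vis\"></div>",
    "  <script>vegaEmbed('#vis', " + specJson + ");</script>",
    "</body>",
    "</html>"
  ).mkString("\n")

  // Escape the characters that would end an HTML attribute value early.
  private def escapeAttr(s: String): String =
    s.replace("&", "&amp;").replace("\"", "&quot;")

  // Step 2: embed the page in an iframe so its scripts stay isolated
  // from the notebook's own page.
  def iframe(pageHtml: String, width: Int = 600, height: Int = 400): String =
    "<iframe srcdoc=\"" + escapeAttr(pageHtml) + "\" width=\"" + width +
      "\" height=\"" + height + "\"></iframe>"

  // Step 1 stand-in: a spec that would normally come from the Scala DSL.
  val specJson: String = "{\"mark\":\"bar\"}"

  val notebookHtml: String = iframe(page(specJson))
}
```

A notebook that can display raw HTML output would show `notebookHtml` as a cell result; how that output is handed to the notebook differs between Zeppelin, Jupyter, and others and is not covered here.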
27. Abstraction stack, from most to least abstract:
● VEGAS: a Scala DSL for Vega-Lite. The Scala DSL emits type-checked Vega-Lite JSON.
● VEGA-LITE*: converts internally to a Vega JSON spec.
● VEGA: translates JSON to D3JS code, which can be very verbose.
● D3JS
* Vega-Lite
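The idea of a Scala DSL that emits type-checked Vega-Lite JSON can be sketched with a small fluent builder. This is a toy written for illustration, not Vegas itself: `Plot`, its methods, and the data path are hypothetical, and only the `data.url`, `mark`, and `encoding` keys of a Vega-Lite spec are produced. Because marks and field types are sealed Scala types, a misspelled mark fails at compile time instead of producing a broken chart at render time.

```scala
object SpecBuilderSketch {
  // Only these marks exist, so mark(Barr) would not compile.
  sealed abstract class Mark(val name: String)
  case object Bar extends Mark("bar")
  case object Point extends Mark("point")

  sealed abstract class FieldType(val name: String)
  case object Quant extends FieldType("quantitative")
  case object Nom extends FieldType("nominal")

  // Wrap a string in double quotes (assumes no characters that need escaping).
  private def q(s: String): String = "\"" + s + "\""

  // Immutable builder: every call returns a new Plot with one more setting.
  final case class Plot(
    dataUrl: String,
    markType: Mark = Point,
    x: Option[(String, FieldType)] = None,
    y: Option[(String, FieldType)] = None
  ) {
    def mark(m: Mark): Plot = copy(markType = m)
    def encodeX(field: String, t: FieldType): Plot = copy(x = Some(field -> t))
    def encodeY(field: String, t: FieldType): Plot = copy(y = Some(field -> t))

    // Serialize the settings to a compact Vega-Lite-style JSON string.
    def toJson: String = {
      val enc = Seq("x" -> x, "y" -> y).collect {
        case (channel, Some((field, t))) =>
          q(channel) + ":{" + q("field") + ":" + q(field) + "," +
            q("type") + ":" + q(t.name) + "}"
      }
      "{" + q("data") + ":{" + q("url") + ":" + q(dataUrl) + "}," +
        q("mark") + ":" + q(markType.name) + "," +
        q("encoding") + enc.mkString(":{", ",", "}") + "}"
    }
  }

  // Placeholder data path and field names, chosen only for the example.
  val spec: String =
    Plot("data/cars.json")
      .mark(Bar)
      .encodeX("Origin", Nom)
      .encodeY("Horsepower", Quant)
      .toJson
}
```

The resulting string is what the layer below would consume; everything above it stays ordinary, compiler-checked Scala.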