SlideShare uma empresa Scribd logo
1 de 69
Build, Scale, and Deploy
Deep Learning Pipelines
Using Apache Spark
Sue Ann Hong, Databricks
Tim Hunter, Databricks
#EUdd3
About Us
Sue Ann Hong
• Software engineer @ Databricks
• Ph.D. from CMU in Machine Learning
Tim Hunter
• Software engineer @ Databricks
• Ph.D. from UC Berkeley in Machine Learning
• Very early Spark user
This talk
• Deep Learning at scale: current state
• Deep Learning Pipelines: the vision
• End-to-end workflow with DL Pipelines
• Future
Deep Learning at Scale
: current state
4put your #assignedhashtag here by setting the
What is Deep Learning?
• A set of machine learning techniques that use layers that
transform numerical inputs
• Classification
• Regression
• Arbitrary mapping
• Popular in the 80’s as Neural Networks
• Recently came back thanks to advances in data collection,
computation techniques, and hardware.
t
Success of Deep Learning
Tremendous success for applications with complex data
• AlphaGo
• Image interpretation
• Automatic translation
• Speech recognition
But requires a lot of effort
• No exact science around deep learning
• Success requires many engineer-hours
• Low level APIs with steep learning curve
• Not well integrated with other enterprise tools
• Tedious to distribute computations
What does Spark offer?
Very little in Apache Spark MLlib itself (multilayer perceptron)
Many Spark packages
Integrations with existing DL libraries
• Deep Learning Pipelines (from Databricks)
• Caffe (CaffeOnSpark)
• Keras (Elephas)
• mxnet
• Paddle
• TensorFlow (TensorFlow on Spark, TensorFrames)
• CNTK (mmlspark)
Implementations of DL on
Spark
• BigDL
• DeepDist
• DeepLearning4J
• MLlib
• SparkCL
• SparkNet
Deep Learning in industry
• Currently limited adoption
• Huge potential beyond the industrial giants
• How do we accelerate the road to massive availability?
Deep Learning Pipelines
10put your #assignedhashtag here by setting the
Deep Learning Pipelines:
Deep Learning with Simplicity
• Open-source Databricks library
• Focuses on ease of use and integration
• without sacrificing performance
• Primary language: Python
• Uses Apache Spark for scaling out common tasks
• Integrates with MLlib Pipelines to capture the ML workflow
concisely
s
A typical Deep Learning workflow
• Load data (images, text, time series, …)
• Interactive work
• Train
• Select an architecture for a neural network
• Optimize the weights of the NN
• Evaluate results, potentially re-train
• Apply:
• Pass the data through the NN to produce new features or
output
Load data
Interactive work
Train
Evaluate
Apply
A typical Deep Learning workflow
Load data
Interactive work
Train
Evaluate
Apply
• Image loading in Spark
• Distributed batch prediction
• Deploying models in SQL
• Transfer learning
• Distributed tuning
• Pre-trained models
Part 1
Part 2
End-to-End Workflow
with Deep Learning Pipelines
14put your #assignedhashtag here by setting the
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
t
Adds support for images in Spark
• ImageSchema, reader, conversion functions to/from numpy arrays
• Most of the tools we’ll describe work on ImageSchema columns
from sparkdl import readImages
image_df = readImages(sample_img_dir)
Upcoming: built-in support in Spark
• Spark-21866
• Contributing image format & reading to Spark
• Targeted for Spark 2.3
• Joint work with Microsoft
17put your #assignedhashtag here by setting the
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Applying popular models
• Popular pre-trained models accessible through MLlib
Transformers
predictor = DeepImagePredictor(inputCol="image",
outputCol="predicted_labels",
modelName="InceptionV3")
predictions_df = predictor.transform(image_df)
Applying popular models
predictor = DeepImagePredictor(inputCol="image",
outputCol="predicted_labels",
modelName="InceptionV3")
predictions_df = predictor.transform(image_df)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Hyperparameter tuning
Transfer learning
s
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Hyperparameter tuning
Transfer learning
Transfer learning
• Pre-trained models may not be directly applicable
• New domain, e.g. shoes
• Training from scratch requires
• Enormous amounts of data
• A lot of compute resources & time
• Idea: intermediate representations learned for one task may be
useful for other related tasks
Transfer Learning
SoftMax
GIANT PANDA
0.9
RACCOON
0.05
RED PANDA
0.01
…
Transfer Learning
0.01
…
Transfer Learning
0.01
…
Classifie
r
Transfer Learning
0.01
…
Classifie
r
Rose: 0.7
Daisy: 0.3
MLlib Pipelines primer
• MLlib: the machine learning library included with Spark
• Transformer
• Takes in a Spark dataframe
• Returns a Spark dataframe with new column(s) containing “transformed”
data
• e.g. a Model is a Transformer
• Estimator
• A learning algorithm, e.g. lr = LogisticRegression()
• Produces a Model via lr.fit()
• Pipeline: a sequence of Transformers and Estimators
Transfer Learning as a Pipeline
0.01
…
Classifie
rDeepImageFeaturiz
er
Rose / Daisy
Transfer Learning as a Pipeline
0.01
…
DeepImageFeaturiz
er
Image
Loading
Preprocessi
ng
Logistic
Regression
MLlib
Pipeline
Transfer Learning as a Pipeline
31put your #assignedhashtag here by setting the
featurizer = DeepImageFeaturizer(inputCol="image",
outputCol="features",
modelName="InceptionV3")
lr = LogisticRegression(labelCol="label")
p = Pipeline(stages=[featurizer, lr])
p_model = p.fit(train_df)
Transfer Learning
• Usually for classification tasks
• Similar task, new domain
• But other forms of learning leveraging learned
representations can be loosely considered transfer learning
Featurization for similarity-based ML
0.01
…
DeepImageFeaturiz
er
Image
Loading
Preprocessi
ng
Logistic
Regression
Featurization for similarity-based ML
0.01
…
DeepImageFeaturiz
er
Image
Loading
Preprocessi
ng
Clustering
KMeans
GaussianMixture
Nearest
Neighbor
KNN LSH
Distance
computation
Featurization for similarity-based ML
0.01
…
DeepImageFeaturiz
er
Image
Loading
Preprocessi
ng
Clustering
KMeans
GaussianMixture
Nearest
Neighbor
KNN LSH
Distance
computation
Duplicate
Detection
Recommendatio
n
Anomaly
Detection
Search result
diversification
Break?
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Hyperparameter tuning
Transfer learning
t
Distributed Hyperparameter Tuning
…
Spark driver
{model = ResNet50,
learning_rate =
0.01,
batch_size = 64}
{model =
InceptionV3,
learning_rate = 0.1,
batch_size = 128}
MLlib Components
• Learning algorithms
• Implement
model = fit(train_df, )
• Metric used to measure model’s goodness on validation data
• e.g. BinaryClassificationEvaluator computes accuracy
40put your #assignedhashtag here by setting the
e.g. learning rate
Estimators
Evaluators
params
Model Selection (Parameter Search)
41put your #assignedhashtag here by setting the
Estimators
Evaluators
paramsMap of
cv = CrossValidator(
estimator=pipeline,
estimatorParamMaps=paramGrid,
evaluator=BinaryClassificationEvaluator())
best_model = cv.fit(train_df)
paramGrid = ( ParamGridBuilder()
.addGrid(hashingTF.numFeatures, [10, 100])
.addGrid(lr.regParam, [0.1, 0.01])
.build() )
Deep Learning Estimators
• Transfer learning Pipelines
• KerasImageFileEstimator
Deep Learning Estimators
• Transfer learning Pipelines
• KerasImageFileEstimator
Distributed Tuning Transfer Learning
Logistic
Regressio
n
+
ResNet50
…
Logistic
Regressio
n
+
InceptionV3
Spark driver
pipeline = Pipeline([DeepImageFeaturizer(), LogisticRegression()])
paramGrid = ( ParamGridBuilder()
.addGrid(modelName=[“ “, “ “]) )
cv = CrossValidator(estimator= pipeline,
estimatorParamMaps=paramGrid,
evaluator=BinaryClassificationEvaluator(),
numFolds=3)
best_model = cv.fit(train_df)
Distributed Tuning Transfer Learning
ResNet50 InceptionV3
Deep Learning Estimators
• Transfer learning Pipelines
• KerasImageFileEstimator
Keras
47
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
• A popular, declarative interface to build DL models
• High level, expressive API in python
• Executes on TensorFlow, Theano, CNTK
model = Sequential()
model.add(...)
model.save(model_filename)
estimator = KerasImageFileEstimator(
kerasOptimizer=“adam“,
kerasLoss=“categorical_crossentropy“,
kerasFitParams={“batch_size“:100},
modelFile=model_filename)
model = model.fit(dataframe)
48
Keras Estimator
model = Sequential()
model.add(...)
model.save(model_filename)
estimator = KerasImageFileEstimator(
kerasOptimizer=“adam“,
kerasLoss=“categorical_crossentropy“,
kerasFitParams={“batch_size“:100},
modelFile=model_filename)
model = model.fit(dataframe)
49
Keras Estimator in Model Selection
estimator = KerasImageFileEstimator(
kerasOptimizer=“adam“,
kerasLoss=“categorical_crossentropy“)
paramGrid = ( ParamGridBuilder()
.addGrid(kerasFitParams=[{“batch_size“:100}, {“batch_size“:200}])
.addGrid(modelFile=[model1, model2]) )
50
Keras Estimator in Model Selection
estimator = KerasImageFileEstimator(
kerasOptimizer=“adam“,
kerasLoss=“categorical_crossentropy“)
paramGrid = ( ParamGridBuilder()
.addGrid(kerasFitParams=[{“batch_size“:100}, {“batch_size“:200}])
.addGrid(modelFile=[model1, model2]) )
cv = CrossValidator(estimator=estimator,
estimatorParamMaps=paramGrid,
evaluator=BinaryClassificationEvaluator(),
numFolds=3)
best_model = cv.fit(train_df)
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Spark SQL
Batch prediction
s
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Spark SQL
Batch prediction
Batch prediction as an MLlib Transformer
• Recall a model is a Transformer in MLlib
predictor = XXTransformer(inputCol="image",
outputCol=”predictions",
modelSpecification={…})
predictions = predictor.transform(test_df)
Hierarchy of DL transformers for images
55
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizerDeepImagePredictor
NamedImageTransformer(model_name)
(keras.Model)
(tf.Graph)
Input
(Image)
Output
(vector)
Hierarchy of DL transformers for images
56
TFImageTransformer
KerasImageTransformer
DeepImageFeaturizerDeepImagePredictor
NamedImageTransformer(model_name)
(keras.Model)
(tf.Graph)
Input
(Image)
Output
(vector)
model_namemodel_name
Keras.Model
tf.Graph
Hierarchy of DL transformers
57
TFImageTransformer
KerasImageTransformer
DeepImageF
eaturizer
DeepImageP
redictor
NamedImageTransformer
)
Input
(Image
)
Output
(vector
)
model_namemodel_name
Keras.Model
tf.Graph
TFTextTransformer
KerasTextTransformer
DeepTextFeat
urizer
DeepTextPre
dictor
NamedTextTransformerInput
(Text)
Output
(vector)
model_namemodel_name
Keras.Model
tf.Graph
TFTransformer
58
TFImageTransformer
Defined by tf.Graph
1. Convert ImageSchema data into a vector
2. Use tensorframes to
• Efficiently distribute tf.Graph to workers
• Apply the graph to the partitioned data
3. Return the result as a spark.ml.Vector
Aside: high performance with Spark
• TensorFlow (and other frameworks) have 2 mechanisms to
ingest data:
• memory-based API (tensors)
• file-based API (Queue, …)
• DLP makes all data transfers in memory
• Spark responsible for reading and assembling data
• Low-level transformations handled by TensorFrames
59put your #assignedhashtag here by setting the
t
High performance with Spark
• Every DL transform is a
TensorFlow graph
60put your #assignedhashtag here by setting the
x:
int32
y:
int32
mul 3
z
High performance with Spark
• Every DL transform is a
TensorFlow graph
• Applied on each of the
rows of the dataframe
61
output_df:
DataFrame[x: int, y: int, z: int]
x:
int32
y:
int32
mul 3
z
df: DataFrame[x: int, y: int]
Example: running batch inference
Spark driver Spark worker
Deep
Learning
Pipelines
Broadcast
(protocol buffers)TensorFlow
Program
(tf.Graph)
TensorFlow
runtime
(C++)
Worker JVM
Driver JVM
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
Spark SQL
Batch prediction
s
Shipping predictors in SQL
Take a trained model / Pipeline, register a SQL UDF usable
by anyone in the organization
In Spark SQL:
registerKerasUDF(”my_object_recognition_function",
keras_model_file="/mymodels/007model.h5")
select image, my_object_recognition_function(image) as objects
from traffic_imgs
This means you can apply deep learning models in streaming!
Almost done
65put your #assignedhashtag here by setting the
Deep Learning Pipelines : Future
In progress
• Scala API for DeepImageFeaturizer
• Text featurization (embeddings)
• TFTransformer for arbitrary vectors
Future
• Distributed training
• Support for more backends, e.g. MXNet, PyTorch, BigDL
Deep Learning without Deep
Pockets
• Simple API for Deep Learning, integrated with MLlib
• Scales common tasks with transformers and estimators
• Embeds Deep Learning models in MLlib and SparkSQL
• Check out https://github.com/databricks/spark-deep-
learning !
Questions?
Thank you!
68put your #assignedhashtag here by setting the
Resources
Blog posts & webinars (http://databricks.com/blog)
• Deep Learning Pipelines
• GPU acceleration in Databricks
• BigDL on Databricks
• Deep Learning and Apache Spark
Docs for Deep Learning on Databricks (http://docs.databricks.com)
• Getting started
• Deep Learning Pipelines Example
• Spark integration

Mais conteúdo relacionado

Mais procurados

What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
Simplilearn
 

Mais procurados (20)

Data Augmentation
Data AugmentationData Augmentation
Data Augmentation
 
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
 
How to add security in dataops and devops
How to add security in dataops and devopsHow to add security in dataops and devops
How to add security in dataops and devops
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
 
Machine learning with graph
Machine learning with graphMachine learning with graph
Machine learning with graph
 
Hadoop Distributed file system.pdf
Hadoop Distributed file system.pdfHadoop Distributed file system.pdf
Hadoop Distributed file system.pdf
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
Temporal based Recommendation System
Temporal based Recommendation SystemTemporal based Recommendation System
Temporal based Recommendation System
 
Transfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine LearningTransfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine Learning
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?
 
Basic Generative Adversarial Networks
Basic Generative Adversarial NetworksBasic Generative Adversarial Networks
Basic Generative Adversarial Networks
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Graph neural networks overview
Graph neural networks overviewGraph neural networks overview
Graph neural networks overview
 
Image anomaly detection with generative adversarial networks
Image anomaly detection with generative adversarial networksImage anomaly detection with generative adversarial networks
Image anomaly detection with generative adversarial networks
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
(2017/06)Practical points of deep learning for medical imaging
(2017/06)Practical points of deep learning for medical imaging(2017/06)Practical points of deep learning for medical imaging
(2017/06)Practical points of deep learning for medical imaging
 
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
 

Destaque

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 

Destaque (8)

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Building Custom ML PipelineStages for Feature Selection with Marc Kaminski
Building Custom ML PipelineStages for Feature Selection with Marc KaminskiBuilding Custom ML PipelineStages for Feature Selection with Marc Kaminski
Building Custom ML PipelineStages for Feature Selection with Marc Kaminski
 
Best Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangBest Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene Pang
 
Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
 
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard MaasSpark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard Maas
 

Semelhante a Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter

Semelhante a Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter (20)

Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with EaseBuild, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with Ease
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create
 
transferlearning.pptx
transferlearning.pptxtransferlearning.pptx
transferlearning.pptx
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
 
Machine learning systems for engineers
Machine learning systems for engineersMachine learning systems for engineers
Machine learning systems for engineers
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure application
 
Machine learning
Machine learningMachine learning
Machine learning
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
 
Enterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4JEnterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4J
 

Mais de Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
gajnagarg
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter

  • 1. Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark Sue Ann Hong, Databricks Tim Hunter, Databricks #EUdd3
  • 2. About Us Sue Ann Hong • Software engineer @ Databricks • Ph.D. from CMU in Machine Learning Tim Hunter • Software engineer @ Databricks • Ph.D. from UC Berkeley in Machine Learning • Very early Spark user
  • 3. This talk • Deep Learning at scale: current state • Deep Learning Pipelines: the vision • End-to-end workflow with DL Pipelines • Future
  • 4. Deep Learning at Scale : current state 4put your #assignedhashtag here by setting the
  • 5. What is Deep Learning? • A set of machine learning techniques that use layers that transform numerical inputs • Classification • Regression • Arbitrary mapping • Popular in the 80’s as Neural Networks • Recently came back thanks to advances in data collection, computation techniques, and hardware. t
  • 6. Success of Deep Learning Tremendous success for applications with complex data • AlphaGo • Image interpretation • Automatic translation • Speech recognition
  • 7. But requires a lot of effort • No exact science around deep learning • Success requires many engineer-hours • Low level APIs with steep learning curve • Not well integrated with other enterprise tools • Tedious to distribute computations
  • 8. What does Spark offer? Very little in Apache Spark MLlib itself (multilayer perceptron) Many Spark packages Integrations with existing DL libraries • Deep Learning Pipelines (from Databricks) • Caffe (CaffeOnSpark) • Keras (Elephas) • mxnet • Paddle • TensorFlow (TensorFlow on Spark, TensorFrames) • CNTK (mmlspark) Implementations of DL on Spark • BigDL • DeepDist • DeepLearning4J • MLlib • SparkCL • SparkNet
  • 9. Deep Learning in industry • Currently limited adoption • Huge potential beyond the industrial giants • How do we accelerate the road to massive availability?
  • 10. Deep Learning Pipelines 10put your #assignedhashtag here by setting the
  • 11. Deep Learning Pipelines: Deep Learning with Simplicity • Open-source Databricks library • Focuses on ease of use and integration • without sacrificing performance • Primary language: Python • Uses Apache Spark for scaling out common tasks • Integrates with MLlib Pipelines to capture the ML workflow concisely s
  • 12. A typical Deep Learning workflow • Load data (images, text, time series, …) • Interactive work • Train • Select an architecture for a neural network • Optimize the weights of the NN • Evaluate results, potentially re-train • Apply: • Pass the data through the NN to produce new features or output Load data Interactive work Train Evaluate Apply
  • 13. A typical Deep Learning workflow Load data Interactive work Train Evaluate Apply • Image loading in Spark • Distributed batch prediction • Deploying models in SQL • Transfer learning • Distributed tuning • Pre-trained models Part 1 Part 2
  • 14. End-to-End Workflow with Deep Learning Pipelines 14put your #assignedhashtag here by setting the
  • 15. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply t
  • 16. Adds support for images in Spark • ImageSchema, reader, conversion functions to/from numpy arrays • Most of the tools we’ll describe work on ImageSchema columns from sparkdl import readImages image_df = readImages(sample_img_dir)
  • 17. Upcoming: built-in support in Spark • Spark-21866 • Contributing image format & reading to Spark • Targeted for Spark 2.3 • Joint work with Microsoft 17put your #assignedhashtag here by setting the
  • 18. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply
  • 19. Applying popular models • Popular pre-trained models accessible through MLlib Transformers predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3") predictions_df = predictor.transform(image_df)
  • 20. Applying popular models predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3") predictions_df = predictor.transform(image_df)
  • 21. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Hyperparameter tuning Transfer learning s
  • 22. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Hyperparameter tuning Transfer learning
  • 23. Transfer learning • Pre-trained models may not be directly applicable • New domain, e.g. shoes • Training from scratch requires • Enormous amounts of data • A lot of compute resources & time • Idea: intermediate representations learned for one task may be useful for other related tasks
  • 28. MLlib Pipelines primer • MLlib: the machine learning library included with Spark • Transformer • Takes in a Spark dataframe • Returns a Spark dataframe with new column(s) containing “transformed” data • e.g. a Model is a Transformer • Estimator • A learning algorithm, e.g. lr = LogisticRegression() • Produces a Model via lr.fit() • Pipeline: a sequence of Transformers and Estimators
  • 29. Transfer Learning as a Pipeline 0.01 … Classifie rDeepImageFeaturiz er Rose / Daisy
  • 30. Transfer Learning as a Pipeline 0.01 … DeepImageFeaturiz er Image Loading Preprocessi ng Logistic Regression MLlib Pipeline
  • 31. Transfer Learning as a Pipeline 31put your #assignedhashtag here by setting the featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3") lr = LogisticRegression(labelCol="label") p = Pipeline(stages=[featurizer, lr]) p_model = p.fit(train_df)
  • 32. Transfer Learning • Usually for classification tasks • Similar task, new domain • But other forms of learning leveraging learned representations can be loosely considered transfer learning
  • 33.
  • 34. Featurization for similarity-based ML 0.01 … DeepImageFeaturiz er Image Loading Preprocessi ng Logistic Regression
  • 35. Featurization for similarity-based ML 0.01 … DeepImageFeaturiz er Image Loading Preprocessi ng Clustering KMeans GaussianMixture Nearest Neighbor KNN LSH Distance computation
  • 36. Featurization for similarity-based ML 0.01 … DeepImageFeaturiz er Image Loading Preprocessi ng Clustering KMeans GaussianMixture Nearest Neighbor KNN LSH Distance computation Duplicate Detection Recommendatio n Anomaly Detection Search result diversification
  • 38. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Hyperparameter tuning Transfer learning t
  • 39. Distributed Hyperparameter Tuning … Spark driver {model = ResNet50, learning_rate = 0.01, batch_size = 64} {model = InceptionV3, learning_rate = 0.1, batch_size = 128}
  • 40. MLlib Components • Learning algorithms • Implement model = fit(train_df, ) • Metric used to measure model’s goodness on validation data • e.g. BinaryClassificationEvaluator computes accuracy 40put your #assignedhashtag here by setting the e.g. learning rate Estimators Evaluators params
  • 41. Model Selection (Parameter Search) 41put your #assignedhashtag here by setting the Estimators Evaluators paramsMap of cv = CrossValidator( estimator=pipeline, estimatorParamMaps=paramGrid, evaluator=BinaryClassificationEvaluator()) best_model = cv.fit(train_df) paramGrid = ( ParamGridBuilder() .addGrid(hashingTF.numFeatures, [10, 100]) .addGrid(lr.regParam, [0.1, 0.01]) .build() )
  • 42. Deep Learning Estimators • Transfer learning Pipelines • KerasImageFileEstimator
  • 43. Deep Learning Estimators • Transfer learning Pipelines • KerasImageFileEstimator
  • 44. Distributed Tuning Transfer Learning Logistic Regressio n + ResNet50 … Logistic Regressio n + InceptionV3 Spark driver
  • 45. pipeline = Pipeline([DeepImageFeaturizer(), LogisticRegression()]) paramGrid = ( ParamGridBuilder() .addGrid(modelName=[“ “, “ “]) ) cv = CrossValidator(estimator= pipeline, estimatorParamMaps=paramGrid, evaluator=BinaryClassificationEvaluator(), numFolds=3) best_model = cv.fit(train_df) Distributed Tuning Transfer Learning ResNet50 InceptionV3
  • 46. Deep Learning Estimators • Transfer learning Pipelines • KerasImageFileEstimator
  • 47. Keras 47 model = Sequential() model.add(Dense(32, input_dim=784)) model.add(Activation('relu')) • A popular, declarative interface to build DL models • High level, expressive API in python • Executes on TensorFlow, Theano, CNTK
  • 48. model = Sequential() model.add(...) model.save(model_filename) estimator = KerasImageFileEstimator( kerasOptimizer=“adam“, kerasLoss=“categorical_crossentropy“, kerasFitParams={“batch_size“:100}, modelFile=model_filename) model = model.fit(dataframe) 48 Keras Estimator
  • 49. model = Sequential() model.add(...) model.save(model_filename) estimator = KerasImageFileEstimator( kerasOptimizer=“adam“, kerasLoss=“categorical_crossentropy“, kerasFitParams={“batch_size“:100}, modelFile=model_filename) model = model.fit(dataframe) 49 Keras Estimator in Model Selection estimator = KerasImageFileEstimator( kerasOptimizer=“adam“, kerasLoss=“categorical_crossentropy“) paramGrid = ( ParamGridBuilder() .addGrid(kerasFitParams=[{“batch_size“:100}, {“batch_size“:200}]) .addGrid(modelFile=[model1, model2]) )
  • 50. 50 Keras Estimator in Model Selection estimator = KerasImageFileEstimator( kerasOptimizer=“adam“, kerasLoss=“categorical_crossentropy“) paramGrid = ( ParamGridBuilder() .addGrid(kerasFitParams=[{“batch_size“:100}, {“batch_size“:200}]) .addGrid(modelFile=[model1, model2]) ) cv = CrossValidator(estimator=estimator, estimatorParamMaps=paramGrid, evaluator=BinaryClassificationEvaluator(), numFolds=3) best_model = cv.fit(train_df)
  • 51. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply
  • 52. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Spark SQL Batch prediction s
  • 53. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Spark SQL Batch prediction
  • 54. Batch prediction as an MLlib Transformer • Recall a model is a Transformer in MLlib predictor = XXTransformer(inputCol="image", outputCol=”predictions", modelSpecification={…}) predictions = predictor.transform(test_df)
  • 55. Hierarchy of DL transformers for images 55 TFImageTransformer KerasImageTransformer DeepImageFeaturizerDeepImagePredictor NamedImageTransformer(model_name) (keras.Model) (tf.Graph) Input (Image) Output (vector)
  • 56. Hierarchy of DL transformers for images 56 TFImageTransformer KerasImageTransformer DeepImageFeaturizerDeepImagePredictor NamedImageTransformer(model_name) (keras.Model) (tf.Graph) Input (Image) Output (vector) model_namemodel_name Keras.Model tf.Graph
  • 57. Hierarchy of DL transformers 57 TFImageTransformer KerasImageTransformer DeepImageF eaturizer DeepImageP redictor NamedImageTransformer ) Input (Image ) Output (vector ) model_namemodel_name Keras.Model tf.Graph TFTextTransformer KerasTextTransformer DeepTextFeat urizer DeepTextPre dictor NamedTextTransformerInput (Text) Output (vector) model_namemodel_name Keras.Model tf.Graph TFTransformer
  • 58. 58 TFImageTransformer Defined by tf.Graph 1. Convert ImageSchema data into a vector 2. Use tensorframes to • Efficiently distribute tf.Graph to workers • Apply the graph to the partitioned data 3. Return the result as a spark.ml.Vector
  • 59. Aside: high performance with Spark • TensorFlow (and other frameworks) have 2 mechanisms to ingest data: • memory-based API (tensors) • file-based API (Queue, …) • DLP makes all data transfers in memory • Spark responsible for reading and assembling data • Low-level transformations handled by TensorFrames 59put your #assignedhashtag here by setting the t
  • 60. High performance with Spark • Every DL transform is a TensorFlow graph 60put your #assignedhashtag here by setting the x: int32 y: int32 mul 3 z
  • 61. High performance with Spark • Every DL transform is a TensorFlow graph • Applied on each of the rows of the dataframe 61 output_df: DataFrame[x: int, y: int, z: int] x: int32 y: int32 mul 3 z df: DataFrame[x: int, y: int]
  • 62. Example: running batch inference Spark driver Spark worker Deep Learning Pipelines Broadcast (protocol buffers)TensorFlow Program (tf.Graph) TensorFlow runtime (C++) Worker JVM Driver JVM
  • 63. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Spark SQL Batch prediction s
  • 64. Shipping predictors in SQL Take a trained model / Pipeline, register a SQL UDF usable by anyone in the organization In Spark SQL: registerKerasUDF(”my_object_recognition_function", keras_model_file="/mymodels/007model.h5") select image, my_object_recognition_function(image) as objects from traffic_imgs This means you can apply deep learning models in streaming!
  • 65. Almost done 65put your #assignedhashtag here by setting the
  • 66. Deep Learning Pipelines : Future In progress • Scala API for DeepImageFeaturizer • Text featurization (embeddings) • TFTransformer for arbitrary vectors Future • Distributed training • Support for more backends, e.g. MXNet, PyTorch, BigDL
  • 67. Deep Learning without Deep Pockets • Simple API for Deep Learning, integrated with MLlib • Scales common tasks with transformers and estimators • Embeds Deep Learning models in MLlib and SparkSQL • Check out https://github.com/databricks/spark-deep- learning !
  • 68. Questions? Thank you! 68put your #assignedhashtag here by setting the
  • 69. Resources Blog posts & webinars (http://databricks.com/blog) • Deep Learning Pipelines • GPU acceleration in Databricks • BigDL on Databricks • Deep Learning and Apache Spark Docs for Deep Learning on Databricks (http://docs.databricks.com) • Getting started • Deep Learning Pipelines Example • Spark integration

Notas do Editor

  1. Sue Ann is not only the product manager for this effort, but also the author of this libarry. Joining her is also …. ->
  2. TIM End-to-end workflow with DL Pipelines: Data ingestion Featurization Training Cross-validation and evaluation Model deployment Deep Learning easily and at scale: the vision Challenges Overview of a DL pipeline How Spark can help: starting from an example Processing images with DLP the API and an example some tips for Databricks users Building simple Deep Learning models with transfer learning What is transfer learning? Example from demo Trying different combinations. Tips: memory, GPU, etc. Integration with SQL Why is it useful? Example with Keras
  3. It’s a machine learning technique after all… TODO: might want to show GAN / cat generators as well
  4. So mostly large companies like google and facebook have been making strides in deep learning and solving complex problems. TODO: give a sense of how long it takes to train / use these models
  5. Still requires a lot of effort with low-level APIs
  6. Let’s look at a typical deep learning workflow
  7. SUE ANN
  8. Most existing frameworks are low level
  9. Like any ML workflow With an emphasis on the models often being used for feature transform
  10. Like any ML workflow With an emphasis on the models often being used for feature transform
  11. TIM
  12. TODO: probably animate
  13. - Image tasks were the first success stories in deep learning. - We have no native support for images in Spark, so we added one in DL Pipelines. - Numpy arrays – since most deep learning libraries support Python, they tend to work with numpy arrays
  14. TODO: probably animate
  15. There are a bunch of pre-trained deep learning models in the wild. We put them inside a Mllib Transformer called “DeepImagePredictor” for images. Without having to write any tensorflow code, user can try applying the model to their data. And underneath, the transformer is using Spark to massively scale out the prediction. Focus on image classification for now.
  16. This is great for looking at what’s in the data set, or if your problem at hand matches exactly what the pre-trained model provides. But this is rarely the case in real life, and most of the time you need to train your own model. Unfortunately, training a deep learning model often takes days or weeks.
  17. SUE ANN
  18. To talk about how this transfer learning workflow is done in Deep Learning pipelines, we need a short primer on MLlib.
  19. TIM
  20. For general hyperparameter tuning, MLlib offers …
  21. For general hyperparameter tuning, MLlib offers …
  22. KerasImageFileEstimator for when you want to train a model for a task which the transfer learning workflow isn’t satisfactory, e.g. There is no great model to transfer learn off of Want to try getting better accuracy by re-training or fine-tuning a model with your own data
  23. TODO: probably animate
  24. SUE ANN
  25. TODO: probably animate
  26. TIM
  27. SUE ANN
  28. Deep learning in the hands of everyone. Currently we have SQL UDF creation for Keras models
  29. Plz shoot me a feature jira