Deep Learning - Luca Grazioli, ICTEAM
1. Image Caption Generation: Intro to Distributed Tensorflow and Distributed Scoring with Apache Spark
Luca Grazioli, Data Scientist @ ICTEAM
Data Science Milan, 15th May 2017
2. ICTeam S.p.A. – Presentation of the Design Division
Agenda
Who am I
ICTeam Big Data Lab
What’s Deep Learning?
Deep Learning Challenges
Tensorflow
Distributed Tensorflow
Image caption generation
Distributed scoring with APACHE SPARK
3.
Who am I
MSc computer science
• University of Milan-Bicocca
• Definition of a Knowledge Engineer ML model
Academic research
• Modeling and understanding time-evolving scenario
(http://www.iiisci.org/journal/CV$/sci/pdfs/SA268SN15.pdf)
Data Scientist @ ICTeam
• Big Data Science
• Data Engineering (a bit!)
• Deep Learning
More at: http://luca-grazioli.it or on LinkedIn
4.
ICTeam Big Data Lab
[Diagram: BIG DATA CLUSTER; cluster nodes 1 to 4; edge node; web client tools; GPU node 1; GPU node 2]
5.
Deep Learning
Credits: Lukas Masuch
6.
Deep Learning
Computer vision
Natural language processing
Speech recognition
7.
Deep Learning Challenges
Data Volume
CPU usage
Graph complexity
Parameter Space
8.
Tensorflow
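Graph: x, W → MatMul → Add (with b) → ReLU → C

import tensorflow as tf  # TensorFlow 1.x graph API

b = tf.Variable(tf.zeros([100]))          # bias vector
x = tf.placeholder(tf.float32, name='x')  # input, fed at run time
W = tf.Variable(. . .)                    # weights (initial value elided on the slide)
regr = tf.matmul(W, x) + b
relu = tf.nn.relu(regr)
C = [. . .]                               # elided on the slide

# Session
s = tf.Session()
s.run(tf.global_variables_initializer())  # variables must be initialized before use
for step in range(0, 10):
    input = . . .                         # elided on the slide
    result = s.run(C, feed_dict={x: input})
    …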
"[...] Tensorflow takes computation described [...] and maps it onto a wide variety of different HW platforms, ranging from [...] mobile device platforms such as Android and iOS to [...] large scale computing systems [...]"
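The "describe the computation first, run it later" idea on this slide can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, under the assumption that a graph is just a tree of nodes evaluated on demand; the `Node` class and the `placeholder`, `constant`, `matmul`, `add`, `relu`, and `run` helpers are hypothetical names for illustration and are not TensorFlow APIs.

```python
class Node:
    """One operation in a tiny dataflow graph (a stand-in for a TensorFlow op)."""
    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op
        self.inputs = inputs
        self.value = value
        self.name = name

def placeholder(name):
    return Node("placeholder", name=name)

def constant(value):
    return Node("const", value=value)

def matmul(W, x):
    return Node("matmul", (W, x))

def add(a, b):
    return Node("add", (a, b))

def relu(a):
    return Node("relu", (a,))

def run(node, feed_dict):
    """Evaluate `node` by recursively evaluating its inputs first."""
    if node.op == "placeholder":
        return feed_dict[node.name]
    if node.op == "const":
        return node.value
    args = [run(i, feed_dict) for i in node.inputs]
    if node.op == "matmul":  # matrix (list of rows) times vector
        W, x = args
        return [sum(w * v for w, v in zip(row, x)) for row in W]
    if node.op == "add":
        return [p + q for p, q in zip(args[0], args[1])]
    if node.op == "relu":
        return [max(0.0, v) for v in args[0]]
    raise ValueError("unknown op: %s" % node.op)

# Build the graph first; nothing is computed at this point.
x = placeholder("x")
W = constant([[1.0, -1.0], [2.0, 0.0]])
b = constant([0.5, -1.0])
out = relu(add(matmul(W, x), b))

# Evaluate later with a concrete input, loosely mirroring Session.run(..., feed_dict=...).
# Note: this toy keys the feed by placeholder name, an assumption made for brevity.
result = run(out, {"x": [1.0, 2.0]})
```

Because the graph is only a description, a runtime is free to decide where each node executes, which is the property the quote above refers to.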
9.
Distributed TF: concepts
• Multi-device execution
• Data parallel training
• In-graph processing
• Between-graph processing
• Model parallel training
• Model computation pipelining
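Of the concepts listed above, data parallel training is the easiest to show in a few lines. What follows is a minimal sketch that simulates synchronous data-parallel gradient descent in plain Python rather than TensorFlow: each simulated worker computes gradients on its own shard, and the averaged gradient is applied to shared parameters. The function names, the linear model, and the hyperparameters are assumptions chosen for illustration only.

```python
def gradient(w, b, shard):
    """Mean-squared-error gradient of y ≈ w*x + b over one data shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def train_data_parallel(data, num_workers=2, steps=2000, lr=0.05):
    # Split the training set into one shard per (simulated) worker.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0  # shared parameters, as a parameter server would hold them
    for _ in range(steps):
        # Every worker computes gradients on its shard using the same parameters.
        grads = [gradient(w, b, shard) for shard in shards]
        # Synchronous update: average the workers' gradients, then apply once.
        gw = sum(g[0] for g in grads) / num_workers
        gb = sum(g[1] for g in grads) / num_workers
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data drawn from y = 3x + 1 (hypothetical, for illustration).
data = [(i / 10.0, 3.0 * (i / 10.0) + 1.0) for i in range(20)]
w, b = train_data_parallel(data)
```

With equally sized shards, the averaged gradient equals the gradient over the whole data set, so the workers together reproduce single-machine training while each one touches only part of the data.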
10.
Distributed TF: concepts
Multi-device execution
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
11.
Distributed TF: concepts
Model parallel training
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
12.
Distributed TF: concepts
Model computation pipelining
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
15.
Image caption generation
Credits: https://github.com/tensorflow/models/tree/master/im2txt http://press.liacs.nl/mirflickr/
• A couple of dogs standing next to each other.
• A couple of dogs are standing in a field.
• A couple of dogs standing next to each other on a field.
• A scenic view of a lake with mountains in the background.
• A scenic view of a lake with mountains in the distance.
• A scenic view of a lake with a mountain in the background.
• A city street at night with traffic lights.
• A city street at night with a red light.
• A city street at night with a red light.
Try it yourself! http://bit.ly/2r8jU1q
17.
Distributed scoring with APACHE SPARK
Phase 1 – Ingestion
Phase 2 – Distributed Scoring
[Diagram: File System; cluster edge node; Spark driver; Data Node . . . Data Node]
18.
Distributed scoring with APACHE SPARK
Read images
• images_df = spark.read.parquet('/user/lgrazioli/flickrTestImageBin/')
Define scoring function
• Restore last training checkpoint
• Define iterator function to yield scored record from a partition
Let’s score!
• scored_sample_rdd = images_df.rdd.mapPartitions(score_partition).flatMap(lambda x: x)
• scored_df = spark.createDataFrame(scored_sample_rdd, schema)
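The slide does not show `score_partition` itself. A minimal sketch of what such an iterator function could look like is given below, in plain Python so it runs without Spark or TensorFlow. It assumes each row unpacks into an image id and the image bytes, and it yields lists of scored records, which is one reading of why the pipeline above ends with `flatMap(lambda x: x)`. The `load_model` helper, the `batch_size` parameter, and the row layout are hypothetical stand-ins, not the author's code.

```python
def load_model():
    """Hypothetical stand-in for restoring the last training checkpoint."""
    def caption(image_bytes):
        # A real model would decode the image and generate a caption here.
        return "caption for %d bytes" % len(image_bytes)
    return caption

def score_partition(rows, batch_size=2):
    """Yield lists of (image_id, caption) records for the rows of one partition."""
    model = load_model()  # restored once per partition, not once per record
    batch = []
    for image_id, image_bytes in rows:
        batch.append((image_id, model(image_bytes)))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller batch
        yield batch

# Local stand-in for one Spark partition: an iterator over rows.
partition = iter([("a.jpg", b"123"), ("b.jpg", b"12345"), ("c.jpg", b"1")])
# Flattening the yielded lists plays the role of flatMap(lambda x: x).
scored = [record for batch in score_partition(partition) for record in batch]
```

Loading the model inside the function keeps the expensive restore step on the executors and amortizes it over every record in the partition, which is the usual motivation for `mapPartitions` over `map`.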
19.
Conclusion
Today’s goals:
• Understand Deep Learning technological challenges
• How to distribute a deep learning training algorithm
• How to score in distributed fashion
• How a big data ecosystem can help
Future work:
• Tensorframes https://github.com/databricks/tensorframes
• New technologies (e.g. TPU)
• Tensorflow improvements
• High-level API
20.
Bibliography
1. Deep Learning - The Past, Present and Future of Artificial Intelligence (Lukas Masuch)
2. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (Martín Abadi, Ashish Agarwal, et al.)
3. https://github.com/tensorflow/models/tree/master/im2txt
4. http://press.liacs.nl/mirflickr/