Image Caption Generation: Intro to Distributed
Tensorflow and Distributed Scoring with Apache Spark
Luca Grazioli, Data Scientist @ ICTEAM
Data Science Milan, 15th May 2017
2 ICTeam S.p.A. – Presentazione della Divisione Progettazione
Agenda
• Who am I
• ICTeam Big Data Lab
• What’s Deep Learning?
• Deep Learning Challenges
• Tensorflow
• Distributed Tensorflow
• Image caption generation
• Distributed scoring with APACHE SPARK
Who am I
MSc computer science
• University of Milan-Bicocca
• Definition of a Knowledge Engineer ML model
Academic research
• Modeling and understanding time-evolving scenarios
(http://www.iiisci.org/journal/CV$/sci/pdfs/SA268SN15.pdf)
Data Scientist @ ICTeam
• Big Data Science
• Data Engineering (a bit!)
• Deep Learning
More at: http://luca-grazioli.it or on LinkedIn
ICTeam Big Data Lab
[Diagram: BIG DATA CLUSTER with CLUSTER NODE 1 to 4, EDGE NODE, WEB CLIENT TOOLS, GPU NODE 1 and GPU NODE 2]
Deep Learning
Credit by Lukas Masuch
Deep Learning
Computer vision
Natural Language Processing
Speech recognition
Deep Learning Challenges
Data Volume
CPU usage
Graph complexity
Parameter Space
Tensorflow
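[Graph: x and W feed MatMul, b is added (Add), then ReLU, leading to C]
import tensorflow as tf

# Build the graph (TensorFlow 1.x style API)
b = tf.Variable(tf.zeros([100]))
x = tf.placeholder(tf.float32, name='x')
W = tf.Variable(...)
regr = tf.matmul(W, x) + b
relu = tf.nn.relu(regr)
C = [...]

# Session: run the graph, feeding a value for x at each step
s = tf.Session()
for step in range(0, 10):
    input = ...
    result = s.run(C, feed_dict={x: input})
    ...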
"[...] Tensorflow takes computation described [...] and maps it onto a wide variety of different HW platforms, ranging from [...] mobile device platforms such as Android and iOS to [...] large scale computing systems [...]"
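The deferred-execution idea behind the slide's graph (declare nodes first, evaluate them only when a session-like call supplies the inputs) can be illustrated without TensorFlow. The following is a toy sketch in plain Python, not TensorFlow's API: the `Node`, `placeholder`, `constant` and `run` names are hypothetical, and the numeric values are made up for illustration.

```python
class Node:
    """A graph node: an operation plus the nodes it depends on."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)


def placeholder():
    # No operation attached: the value must be fed at run time.
    return Node(op=None)


def constant(value):
    return Node(op=lambda: value)


def matmul(w, x):
    # Matrix (list of rows) times vector.
    return Node(lambda W, v: [sum(wi * vi for wi, vi in zip(row, v)) for row in W], (w, x))


def add(a, b):
    return Node(lambda u, v: [ui + vi for ui, vi in zip(u, v)], (a, b))


def relu(a):
    return Node(lambda u: [max(0.0, ui) for ui in u], (a,))


def run(node, feed_dict):
    """Evaluate a node recursively, reading placeholders from feed_dict."""
    if node.op is None:
        return feed_dict[node]
    return node.op(*(run(i, feed_dict) for i in node.inputs))


# Graph construction: nothing is computed yet.
W = constant([[1.0, -2.0], [0.5, 0.5]])
b = constant([0.5, -3.0])
x = placeholder()
out = relu(add(matmul(W, x), b))

# Execution: only now is the graph evaluated, with x fed in.
result = run(out, {x: [3.0, 1.0]})
print(result)
```

Because the graph is only a description, the same `out` node can be evaluated repeatedly with different fed values, which is what the `s.run(C, feed_dict=...)` loop on the slide does.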
Distributed TF: concepts
• Multi-device execution
• Data parallel training
• In-graph processing
• Between-graph processing
• Model parallel training
• Model computation pipelining
Distributed TF: concepts
Multi-device execution
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
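As I read the credited whitepaper, multi-device execution assigns each graph node to a device and replaces every edge that crosses a device boundary with a send/receive pair. The sketch below mimics that partitioning step in plain Python. The `partition` function, the node names and the chosen placement are all hypothetical, and a real placement would come from a cost model rather than a hand-written dictionary.

```python
def partition(edges, placement):
    """Split edges into per-device subgraphs.

    Edges whose endpoints sit on different devices are replaced by a
    send node on the source device and a recv node on the destination.
    """
    subgraphs = {dev: [] for dev in set(placement.values())}
    transfers = []
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            subgraphs[src_dev].append((src, dst))
        else:
            subgraphs[src_dev].append((src, "send(%s)" % src))
            subgraphs[dst_dev].append(("recv(%s)" % src, dst))
            transfers.append((src, src_dev, dst_dev))
    return subgraphs, transfers


# The small graph from the Tensorflow slide: relu(matmul(W, x) + b).
edges = [("x", "matmul"), ("W", "matmul"), ("matmul", "add"),
         ("b", "add"), ("add", "relu")]

# Assumed placement: the input lives on the CPU, everything else on the GPU.
placement = {"x": "/cpu:0", "W": "/gpu:0", "b": "/gpu:0",
             "matmul": "/gpu:0", "add": "/gpu:0", "relu": "/gpu:0"}

subgraphs, transfers = partition(edges, placement)
print(transfers)
```

Only the `x` to `matmul` edge crosses devices here, so a single transfer is needed per step.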
Distributed TF: concepts
Model parallel training
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
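One simple form of model parallelism is to split a single layer's weights across devices, let each device compute its slice of the output for the same input, and concatenate the slices. The sketch below shows this in plain Python under stated assumptions: the "devices" are just dictionary keys, the weights are made up, and the whitepaper's own example may split the model differently (for instance by layer).

```python
def matvec(W, x):
    """Matrix (list of rows) times vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


# A 4x3 weight matrix and one input vector (illustrative values).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [3.0, 1.0, 1.0],
     [0.0, 0.0, 1.0]]
x = [1.0, 2.0, 3.0]

# Each hypothetical device owns half of the output units (rows of W).
device_shards = {"device_a": W[:2], "device_b": W[2:]}

# Every device sees the same input but computes only its own rows.
partial = {dev: matvec(rows, x) for dev, rows in device_shards.items()}

# Concatenating the partial outputs reproduces the single-device result.
y_parallel = partial["device_a"] + partial["device_b"]
y_single = matvec(W, x)
print(y_parallel, y_single)
```

Unlike data parallelism, every device here processes the same example; what is divided is the set of parameters.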
Distributed TF: concepts
Model computation pipelining
Credits: http://download.tensorflow.org/paper/whitepaper2015.pdf
Distributed TF: concepts
In-graph Processing
• Client
• Worker 1 (CPU:0, GPU:0), Worker 2 (CPU:0)
• PARAMETER SERVERS: PS1, PS2, PS3
Between-graph Processing
• Client 1: Worker 1.1 (CPU:0, GPU:0), Worker 1.2 (CPU:0)
• Client 2: Worker 2.1 (CPU:0, GPU:0), Worker 2.2 (CPU:0)
• PARAMETER SERVERS: PS1, PS2, PS3
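The layout above (workers plus three parameter servers) is typically described to TensorFlow 1.x as a cluster of named jobs, with devices addressed by strings such as `/job:ps/task:0`. The sketch below builds such a description in plain Python and spreads variables over the parameter servers round-robin, similar in spirit to what `tf.train.replica_device_setter` does. The host names, ports, variable names and the `place_variables` helper are hypothetical.

```python
# Job name -> list of task addresses (hypothetical hosts and ports).
cluster = {
    "ps": ["ps1.example.com:2222", "ps2.example.com:2222", "ps3.example.com:2222"],
    "worker": ["worker1.example.com:2222", "worker2.example.com:2222"],
}


def device_name(job, task, device=None):
    """Build a TensorFlow-style device string for a task in a job."""
    name = "/job:%s/task:%d" % (job, task)
    return name if device is None else "%s/%s" % (name, device)


def place_variables(var_names, cluster):
    """Assign variables to parameter-server tasks in round-robin order."""
    n_ps = len(cluster["ps"])
    return {name: device_name("ps", i % n_ps) for i, name in enumerate(var_names)}


placement = place_variables(["W1", "b1", "W2", "b2"], cluster)
for var, dev in placement.items():
    print(var, "->", dev)
print(device_name("worker", 0, "gpu:0"))
```

In the in-graph style a single client would place operations on every worker device itself; in the between-graph style each worker runs its own client and builds its own copy of the graph, while the variables still live on the shared parameter servers.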
Distributed TF: concepts
[Diagram: In-graph processing: one Client; Worker 1 and Worker 2 each hold Layer 1 and Layer 2; data shards 1 to 4; PARAMETER SERVERS PS1, PS2, PS3. Between-graph processing: Client 1 (Worker 1.1, Worker 1.2) and Client 2 (Worker 2.1, Worker 2.2); model replicas Layer 1 / Layer 2 and Layer 1' / Layer 2'; data shards 1 to 4; PARAMETER SERVERS PS1, PS2, PS3.]
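The data shards and parameter servers in this diagram correspond to data-parallel training: each worker computes gradients on its own shard, and a shared set of parameters is updated from the combined result. The following is a minimal synchronous sketch in plain Python, assuming a one-parameter linear model and made-up data; the workers run sequentially here, whereas a real cluster would run them concurrently.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error of y_hat = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


# Synthetic data generated by the "true" model y = 3 * x.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]]

# Four data shards, one per worker, as in the diagram.
shards = [data[i::4] for i in range(4)]

w = 0.0      # the parameter held by the parameter server
lr = 0.01    # learning rate

for step in range(50):
    # Each worker computes a gradient on its own shard.
    grads = [shard_gradient(w, shard) for shard in shards]
    # The parameter server averages the gradients and applies the update.
    w -= lr * sum(grads) / len(grads)

print(w)
```

Averaging over equally sized shards gives the same update as a single pass over the full batch, so the result matches single-machine gradient descent while the gradient work is split four ways.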
Image caption generation
Credits: https://github.com/tensorflow/models/tree/master/im2txt http://press.liacs.nl/mirflickr/
• A couple of dogs standing next to each other.
• A couple of dogs are standing in a field.
• A couple of dogs standing next to each other on a field.
• A scenic view of a lake with mountains in the background.
• A scenic view of a lake with mountains in the distance.
• A scenic view of a lake with a mountain in the background.
• A city street at night with traffic lights.
• A city street at night with a red light.
• A city street at night with a red light.
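Several ranked candidate captions per image, as listed above, are commonly produced with beam search over the model's next-word probabilities. The sketch below shows the technique with a toy bigram table standing in for the caption model. The table, its probabilities and the `beam_search` signature are assumptions for illustration and are not taken from the im2txt code.

```python
import math

# Toy next-word probabilities keyed by the previous word (made-up values).
TABLE = {
    "<s>": {"a": 1.0},
    "a": {"couple": 0.6, "city": 0.4},
    "couple": {"of": 1.0},
    "of": {"dogs": 1.0},
    "dogs": {"standing": 0.7, "</s>": 0.3},
    "standing": {"</s>": 1.0},
    "city": {"street": 1.0},
    "street": {"</s>": 1.0},
}


def next_word_probs(words):
    return TABLE[words[-1]]


def beam_search(next_word_probs, start="<s>", end="</s>", beam_size=3, max_len=10):
    """Return up to beam_size finished sequences, best log-probability first."""
    beams = [([start], 0.0)]
    complete = []
    for _ in range(max_len):
        # Extend every open sequence by every possible next word.
        candidates = []
        for words, logp in beams:
            for word, p in next_word_probs(words).items():
                candidates.append((words + [word], logp + math.log(p)))
        # Keep only the best beam_size extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for words, logp in candidates[:beam_size]:
            if words[-1] == end:
                complete.append((words, logp))
            else:
                beams.append((words, logp))
        if not beams:
            break
    complete.sort(key=lambda c: c[1], reverse=True)
    return complete[:beam_size]


captions = beam_search(next_word_probs)
for words, logp in captions:
    print(" ".join(words[1:-1]), round(math.exp(logp), 2))
```

With a beam of three, the search returns three finished captions ordered by probability, which mirrors the three candidates shown per image.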
Try it yourself! http://bit.ly/2r8jU1q
Image caption generation
Distributed scoring with APACHE SPARK
Phase 1 – Ingestion
Phase 2 – Distributed Scoring
[Diagram: File System, CLUSTER EDGE NODE with SPARK DRIVER, Data Node . . . Data Node]
Distributed scoring with APACHE SPARK
Read images
• images_df = spark.read.parquet('/user/lgrazioli/flickrTestImageBin/')
Define scoring function
• Restore last training checkpoint
• Define iterator function to yield scored records from a partition
Let’s score!
• scored_sample_rdd = images_df.rdd.mapPartitions(score_partition).flatMap(lambda x: x)
• scored_df = spark.createDataFrame(scored_sample_rdd, schema)
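The steps above can be sketched as a partition-level scoring function: the model is restored once per partition and then applied to every record that partition holds. This is a minimal sketch that runs without Spark. The `load_model` stub stands in for restoring the last training checkpoint, the column names `name` and `image` are hypothetical, and each yielded element is a list so that the `flatMap(lambda x: x)` on the slide has something to flatten.

```python
from itertools import chain


def load_model():
    """Stand-in for restoring the last training checkpoint (hypothetical)."""
    def caption(image_bytes):
        return "caption for %d bytes" % len(image_bytes)
    return caption


def score_partition(rows):
    """Yield a list of scored records for every row of one partition."""
    model = load_model()  # restored once per partition, not once per row
    for row in rows:
        yield [(row["name"], model(row["image"]))]


# Local stand-in for one partition of images_df.rdd.
partition = [
    {"name": "a.jpg", "image": b"123"},
    {"name": "b.jpg", "image": b"12345"},
]

# chain.from_iterable plays the role of flatMap(lambda x: x).
scored = list(chain.from_iterable(score_partition(iter(partition))))
print(scored)
```

Loading the model inside `score_partition` rather than per record keeps the expensive checkpoint restore proportional to the number of partitions, which is the main reason to prefer `mapPartitions` over `map` for this kind of scoring.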
Conclusion
Today’s goals:
• Understand Deep Learning technological challenges
• How to distribute a deep learning training algorithm
• How to score in distributed fashion
• How a big data ecosystem can help
Future work:
• Tensorframes https://github.com/databricks/tensorframes
• New technologies (e.g. TPU)
• Tensorflow improvements
• High-level API
Bibliography
1. Deep Learning - The Past, Present and Future of Artificial
Intelligence (Lukas Masuch)
2. TensorFlow: Large-Scale Machine Learning on Heterogeneous
Distributed Systems (Martin Abadi, Ashish Agarwal, et al.)
3. https://github.com/tensorflow/models/tree/master/im2txt
4. http://press.liacs.nl/mirflickr/
Thanks!

Mais conteúdo relacionado

Mais procurados

SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...Impetus Technologies
 
Dataiku data science studio
Dataiku data science studioDataiku data science studio
Dataiku data science studioNorman Poh
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission RiskTuri, Inc.
 
MATLAB Simulink Research Help
MATLAB Simulink Research HelpMATLAB Simulink Research Help
MATLAB Simulink Research HelpPhD Direction
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
Machinel Learning with spark
Machinel Learning with spark Machinel Learning with spark
Machinel Learning with spark Ons Dridi
 
Driverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.aiDriverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.aiSri Ambati
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsSeldon
 
CD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systemsCD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systemsSeldon
 
Hadoop Mapreduce Projects
Hadoop Mapreduce ProjectsHadoop Mapreduce Projects
Hadoop Mapreduce ProjectsPhD Direction
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkDatabricks
 
DeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoTDeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoTRomeo Kienzler
 
IBM Middle East Data Science Connect 2016 - Doha, Qatar
IBM Middle East Data Science Connect 2016 - Doha, QatarIBM Middle East Data Science Connect 2016 - Doha, Qatar
IBM Middle East Data Science Connect 2016 - Doha, QatarRomeo Kienzler
 
Adopting software design practices for better machine learning
Adopting software design practices for better machine learningAdopting software design practices for better machine learning
Adopting software design practices for better machine learningMLconf
 
International Journal of Computer Science, Engineering and Information Techn...
International Journal of Computer Science, Engineering and  Information Techn...International Journal of Computer Science, Engineering and  Information Techn...
International Journal of Computer Science, Engineering and Information Techn...ijcseit
 
MATLAB PhD Research Thesis Guidance
MATLAB PhD Research Thesis GuidanceMATLAB PhD Research Thesis Guidance
MATLAB PhD Research Thesis GuidanceMatlab Simulation
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorchgeetachauhan
 
Basic Data Engineering
Basic Data EngineeringBasic Data Engineering
Basic Data EngineeringNovita Sari
 

Mais procurados (20)

SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
 
Dataiku data science studio
Dataiku data science studioDataiku data science studio
Dataiku data science studio
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission Risk
 
MATLAB Simulink Research Help
MATLAB Simulink Research HelpMATLAB Simulink Research Help
MATLAB Simulink Research Help
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Machinel Learning with spark
Machinel Learning with spark Machinel Learning with spark
Machinel Learning with spark
 
Driverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.aiDriverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.ai
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative models
 
CD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systemsCD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systems
 
Hadoop Mapreduce Projects
Hadoop Mapreduce ProjectsHadoop Mapreduce Projects
Hadoop Mapreduce Projects
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
TigerGraph.js
TigerGraph.jsTigerGraph.js
TigerGraph.js
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache Spark
 
DeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoTDeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoT
 
IBM Middle East Data Science Connect 2016 - Doha, Qatar
IBM Middle East Data Science Connect 2016 - Doha, QatarIBM Middle East Data Science Connect 2016 - Doha, Qatar
IBM Middle East Data Science Connect 2016 - Doha, Qatar
 
Adopting software design practices for better machine learning
Adopting software design practices for better machine learningAdopting software design practices for better machine learning
Adopting software design practices for better machine learning
 
International Journal of Computer Science, Engineering and Information Techn...
International Journal of Computer Science, Engineering and  Information Techn...International Journal of Computer Science, Engineering and  Information Techn...
International Journal of Computer Science, Engineering and Information Techn...
 
MATLAB PhD Research Thesis Guidance
MATLAB PhD Research Thesis GuidanceMATLAB PhD Research Thesis Guidance
MATLAB PhD Research Thesis Guidance
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
 
Basic Data Engineering
Basic Data EngineeringBasic Data Engineering
Basic Data Engineering
 

Semelhante a Deep Learning - Luca Grazioli, ICTEAM

The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...ITCamp
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCFacultad de Informática UCM
 
Creating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the CloudCreating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the CloudAlexander Al Basosi
 
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloAzure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloITCamp
 
Resume-Rohit_Vijay_Bapat_December_2016
Resume-Rohit_Vijay_Bapat_December_2016Resume-Rohit_Vijay_Bapat_December_2016
Resume-Rohit_Vijay_Bapat_December_2016Rohit Bapat
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCinside-BigData.com
 
Third Gen Production ML Architectures: Lessons from History, Experiences with...
Third Gen Production ML Architectures: Lessons from History, Experiences with...Third Gen Production ML Architectures: Lessons from History, Experiences with...
Third Gen Production ML Architectures: Lessons from History, Experiences with...M Waleed Kadous
 
Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...CARLOS III UNIVERSITY OF MADRID
 
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...Databricks
 
Creating a custom Machine Learning Model for your applications - Java Dev Day...
Creating a custom Machine Learning Model for your applications - Java Dev Day...Creating a custom Machine Learning Model for your applications - Java Dev Day...
Creating a custom Machine Learning Model for your applications - Java Dev Day...Isabel Palomar
 
License Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVLicense Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVVishal Polley
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectMatthew Gerring
 
Network Analyzer and Report Generation Tool for NS-2 using TCL Script
Network Analyzer and Report Generation Tool for NS-2 using TCL ScriptNetwork Analyzer and Report Generation Tool for NS-2 using TCL Script
Network Analyzer and Report Generation Tool for NS-2 using TCL ScriptIRJET Journal
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Iulian Pintoiu
 
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAMSparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAMData Science Milan
 

Semelhante a Deep Learning - Luca Grazioli, ICTEAM (20)

The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Creating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the CloudCreating a Machine Learning Model on the Cloud
Creating a Machine Learning Model on the Cloud
 
CV Jens Grunert
CV Jens GrunertCV Jens Grunert
CV Jens Grunert
 
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloAzure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
 
Resume-Rohit_Vijay_Bapat_December_2016
Resume-Rohit_Vijay_Bapat_December_2016Resume-Rohit_Vijay_Bapat_December_2016
Resume-Rohit_Vijay_Bapat_December_2016
 
HPC in higher education
HPC in higher educationHPC in higher education
HPC in higher education
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCC
 
Third Gen Production ML Architectures: Lessons from History, Experiences with...
Third Gen Production ML Architectures: Lessons from History, Experiences with...Third Gen Production ML Architectures: Lessons from History, Experiences with...
Third Gen Production ML Architectures: Lessons from History, Experiences with...
 
Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...
 
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
A Journey to Building an Autonomous Streaming Data Platform—Scaling to Trilli...
 
Creating a custom Machine Learning Model for your applications - Java Dev Day...
Creating a custom Machine Learning Model for your applications - Java Dev Day...Creating a custom Machine Learning Model for your applications - Java Dev Day...
Creating a custom Machine Learning Model for your applications - Java Dev Day...
 
License Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCVLicense Plate Recognition System using Python and OpenCV
License Plate Recognition System using Python and OpenCV
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Computer graphics by bahadar sher
Computer graphics by bahadar sherComputer graphics by bahadar sher
Computer graphics by bahadar sher
 
Network Analyzer and Report Generation Tool for NS-2 using TCL Script
Network Analyzer and Report Generation Tool for NS-2 using TCL ScriptNetwork Analyzer and Report Generation Tool for NS-2 using TCL Script
Network Analyzer and Report Generation Tool for NS-2 using TCL Script
 
Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAMSparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
 

Mais de Data Science Milan

ML & Graph algorithms to prevent financial crime in digital payments
ML & Graph  algorithms to prevent  financial crime in  digital paymentsML & Graph  algorithms to prevent  financial crime in  digital payments
ML & Graph algorithms to prevent financial crime in digital paymentsData Science Milan
 
How to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plansHow to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plansData Science Milan
 
Robustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsRobustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsData Science Milan
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companiesData Science Milan
 
Question generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AIQuestion generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AIData Science Milan
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSData Science Milan
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
Reinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del PraReinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del PraData Science Milan
 
Time Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del PraTime Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del PraData Science Milan
 
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AILudwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AIData Science Milan
 
Audience projection of target consumers over multiple domains a ner and baye...
Audience projection of target consumers over multiple domains  a ner and baye...Audience projection of target consumers over multiple domains  a ner and baye...
Audience projection of target consumers over multiple domains a ner and baye...Data Science Milan
 
Weak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina KhvatovaWeak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina KhvatovaData Science Milan
 
GANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex HoncharGANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex HoncharData Science Milan
 
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo LomonacoContinual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo LomonacoData Science Milan
 
3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep LearningData Science Milan
 
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...Data Science Milan
 
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...Data Science Milan
 
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data ReplyPricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data ReplyData Science Milan
 
A view of graph data usage by Cerved
A view of graph data usage by CervedA view of graph data usage by Cerved
A view of graph data usage by CervedData Science Milan
 

Mais de Data Science Milan (20)

ML & Graph algorithms to prevent financial crime in digital payments
ML & Graph  algorithms to prevent  financial crime in  digital paymentsML & Graph  algorithms to prevent  financial crime in  digital payments
ML & Graph algorithms to prevent financial crime in digital payments
 
How to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plansHow to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plans
 
Robustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsRobustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning Methods
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies
 
Deep Learning - Luca Grazioli, ICTEAM

  • 1. Image Caption Generation: Intro to Distributed Tensorflow and Distributed Scoring with Apache Spark. Luca Grazioli, Data Scientist @ ICTEAM. Data Science Milan, 15th May 2017
  • 2. ICTeam S.p.A. – Presentation of the Design Division. Agenda: Who am I; ICTeam Big Data Lab; What’s Deep Learning?; Deep Learning Challenges; Tensorflow; Distributed Tensorflow; Image caption generation; Distributed scoring with Apache Spark
  • 3. Who am I. MSc computer science: University of Milan-Bicocca; Definition of a Knowledge Engineer ML model. Academic research: Modeling and understanding time-evolving scenario (http://www.iiisci.org/journal/CV$/sci/pdfs/SA268SN15.pdf). Data Scientist @ ICTeam: Big Data Science; Data Engineering (a bit!); Deep Learning. More at: http://luca-grazioli.it or on LinkedIn
  • 4. ICTeam Big Data Lab [diagram: big data cluster (cluster nodes 1 to 4, edge node), web client tools, GPU node 1, GPU node 2]
  • 5. Deep Learning (figure credit: Lukas Masuch)
  • 6. Deep Learning: computer vision, natural language processing, speech recognition
  • 7. Deep Learning Challenges: data volume, CPU usage, graph complexity, parameter space
  • 8. Tensorflow. Graph nodes: x, W, MatMul, b, Add, ReLU, C
      import tensorflow as tf
      b = tf.Variable(tf.zeros([100]))
      x = tf.placeholder(tf.float32, name='x')
      W = tf.Variable(. . .)
      regr = tf.matmul(W, x) + b
      relu = tf.nn.relu(regr)
      C = [. . .]
      # Session
      s = tf.Session()
      for step in range(10):
          input = . . .
          result = s.run(C, feed_dict={x: input})
      …
    [...] Tensorflow takes computation described [...] and maps it onto a wide variety of different HW platforms, ranging from [...] mobile device platforms such as Android and iOS to [...] large scale computing systems [...]
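The slide's pattern (build the graph first, then evaluate it with `s.run(C, feed_dict={x: input})`) can be illustrated without TensorFlow. The following is a toy sketch in plain Python, not TensorFlow's implementation: `Node`, `placeholder`, `constant`, `matvec`, `add`, `relu` and `run` are hypothetical names, and the weights are made-up values.

```python
# Toy dataflow graph: nodes are declared first, values are computed only in run().
# This is NOT TensorFlow; every name below is hypothetical and for illustration only.

class Node:
    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op          # "placeholder", "const", "matvec", "add" or "relu"
        self.inputs = inputs  # upstream nodes
        self.value = value    # only used by "const" nodes
        self.name = name

def placeholder(name):
    return Node("placeholder", name=name)

def constant(value):
    return Node("const", value=value)

def matvec(W, x):
    return Node("matvec", (W, x))

def add(a, b):
    return Node("add", (a, b))

def relu(a):
    return Node("relu", (a,))

def run(node, feed_dict):
    """Evaluate `node` recursively, reading placeholder values from feed_dict."""
    if node.op == "placeholder":
        return feed_dict[node]
    if node.op == "const":
        return node.value
    args = [run(i, feed_dict) for i in node.inputs]
    if node.op == "matvec":
        W, x = args
        return [sum(w * v for w, v in zip(row, x)) for row in W]
    if node.op == "add":
        return [p + q for p, q in zip(*args)]
    if node.op == "relu":
        return [max(0.0, v) for v in args[0]]
    raise ValueError("unknown op: " + node.op)

# Build the graph once; no arithmetic happens here.
x = placeholder("x")
W = constant([[1.0, -2.0], [0.5, 0.5]])
b = constant([0.0, -3.0])
out = relu(add(matvec(W, x), b))

# Evaluate it with a concrete input, in the spirit of s.run(C, feed_dict={x: input}).
result = run(out, {x: [3.0, 1.0]})
```

Separating graph construction from execution is what lets a runtime decide later where each node runs, which is the property the distributed slides build on.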
  • 9. Distributed TF: concepts. Multi-device execution; Data parallel training; In-graph processing; Between-graph processing; Model parallel training; Model computation pipelining
  • 10. Distributed TF: concepts. Multi-device execution (figure credit: http://download.tensorflow.org/paper/whitepaper2015.pdf)
  • 11. Distributed TF: concepts. Model parallel training (figure credit: http://download.tensorflow.org/paper/whitepaper2015.pdf)
  • 12. Distributed TF: concepts. Model computation pipelining (figure credit: http://download.tensorflow.org/paper/whitepaper2015.pdf)
  • 13. Distributed TF: concepts [diagram]. In-graph processing: Client; Worker 1 (CPU:0, GPU:0); Worker 2 (CPU:0); parameter servers PS1, PS2, PS3. Between-graph processing: Client 1 with Worker 1.1 (CPU:0, GPU:0) and Worker 1.2 (CPU:0); Client 2 with Worker 2.1 (CPU:0, GPU:0) and Worker 2.2 (CPU:0); parameter servers PS1, PS2, PS3
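A cluster like the one in the diagram is usually described as a mapping from job names to task addresses, and a dict of this shape is what `tf.train.ClusterSpec` accepts in TensorFlow 1.x. The sketch below is plain Python under stated assumptions: the host names and ports are invented, and `place_variables` and `worker_device` are hypothetical helpers that only mimic a round-robin placement of variables over the parameter servers.

```python
# Hypothetical cluster layout: three parameter servers and two workers.
# Host names and ports are placeholders, not taken from the talk.
cluster = {
    "ps": ["ps1:2222", "ps2:2222", "ps3:2222"],
    "worker": ["worker1:2222", "worker2:2222"],
}

def place_variables(var_names, cluster):
    """Assign each variable to a parameter-server task in round-robin order."""
    n_ps = len(cluster["ps"])
    return {name: "/job:ps/task:%d" % (i % n_ps) for i, name in enumerate(var_names)}

def worker_device(task_index, device="cpu:0"):
    """Device string for the compute ops of one worker task."""
    return "/job:worker/task:%d/%s" % (task_index, device)

placement = place_variables(["W1", "b1", "W2", "b2"], cluster)
```

In the in-graph setup a single client would build one graph that names all of these devices; in the between-graph setup each client would build its own copy of the graph for its workers while sharing the same parameter-server variables.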
  • 14. Distributed TF: concepts [diagram]. In-graph processing: Client; Worker 1 and Worker 2, each with Layer 1 and Layer 2; data shards 1 to 4; parameter servers PS1, PS2, PS3. Between-graph processing: Client 1 (Worker 1.1, Worker 1.2) and Client 2 (Worker 2.1, Worker 2.2); model copies Layer 1, Layer 2 and Layer 1’, Layer 2’; data shards 1 to 4; parameter servers PS1, PS2, PS3
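The data-shard diagram corresponds to data parallel training: every worker holds a copy of the model, computes gradients on its own shard, and the parameter servers hold the shared weights. A minimal sketch in plain Python follows, assuming synchronous updates (the slides do not say whether updates are synchronous or asynchronous) and a one-parameter model y = w * x; the data, learning rate and step count are illustrative.

```python
# Toy synchronous data-parallel SGD for y = w * x with a single scalar parameter.
# Shards, learning rate and step count are illustrative assumptions.

def shard_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)^2 over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def train(shards, w=0.0, lr=0.02, steps=200):
    for _ in range(steps):
        # Each worker reads the current parameter and computes a gradient on its own shard.
        grads = [shard_gradient(w, shard) for shard in shards]
        # The parameter server averages the gradients and applies one update.
        w -= lr * sum(grads) / len(grads)
    return w

# Four data shards drawn from y = 3x, matching the four shards in the diagram.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w = train(shards)
```

Because the shards have equal size, averaging the per-shard gradients gives the same update as one gradient step over the full dataset, so `w` converges towards 3.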
  • 15. Image caption generation. Credits: https://github.com/tensorflow/models/tree/master/im2txt, http://press.liacs.nl/mirflickr/
      Dogs: A couple of dogs standing next to each other. / A couple of dogs are standing in a field. / A couple of dogs standing next to each other on a field.
      Lake: A scenic view of a lake with mountains in the background. / A scenic view of a lake with mountains in the distance. / A scenic view of a lake with a mountain in the background.
      Street: A city street at night with traffic lights. / A city street at night with a red light. / A city street at night with a red light.
      Try it yourself! http://bit.ly/2r8jU1q
  • 16. Image caption generation
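The slide shows three ranked captions per image. Assuming these come from a beam search over the decoder's next-word probabilities (the slide itself does not state this), the idea can be sketched in plain Python. The `NEXT` table below is a made-up stand-in for a trained model and `beam_search` is a hypothetical helper.

```python
import math

# Hypothetical next-word probabilities; a real model would compute these from the image.
NEXT = {
    "<s>": {"a": 1.0},
    "a": {"dog": 0.6, "cat": 0.4},
    "dog": {"runs": 0.7, "sits": 0.3},
    "cat": {"sits": 0.9, "runs": 0.1},
    "runs": {"</s>": 1.0},
    "sits": {"</s>": 1.0},
}

def beam_search(beam_size=3, max_len=5):
    """Keep the beam_size most probable partial captions at every step."""
    beams = [(0.0, ["<s>"])]  # (log-probability, words so far)
    done = []
    for _ in range(max_len):
        candidates = []
        for logp, words in beams:
            for word, p in NEXT[words[-1]].items():
                candidates.append((logp + math.log(p), words + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, words in candidates[:beam_size]:
            # Finished captions leave the beam; the rest are extended next step.
            (done if words[-1] == "</s>" else beams).append((logp, words))
        if not beams:
            break
    done.sort(key=lambda c: c[0], reverse=True)
    # Strip the start and end markers before returning the text.
    return [" ".join(words[1:-1]) for _, words in done[:beam_size]]

captions = beam_search()
```

With a beam of three, the search returns three complete captions ordered by probability, which is why near-duplicates that differ only in their last words can appear side by side.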
  • 17. Distributed scoring with Apache Spark [diagram]. Phase 1 – Ingestion: file system. Phase 2 – Distributed scoring: cluster with edge node (Spark driver) and data nodes (Data Node . . . Data Node)
  • 18. Distributed scoring with Apache Spark
      Read images:
          images_df = spark.read.parquet('/user/lgrazioli/flickrTestImageBin/')
      Define scoring function:
          Restore last training checkpoint
          Define iterator function to yield scored records from a partition
      Let’s score!
          scored_sample_rdd = images_df.rdd.mapPartitions(score_partition).flatMap(lambda x: x)
          scored_df = spark.createDataFrame(scored_sample_rdd, schema)
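The slide names `score_partition` but does not show its body. The sketch below is one plausible shape for such a function, written in plain Python so it runs without Spark: `ImageRow`, `DummyCaptionModel` and `load_model` are hypothetical stand-ins, and the real job would restore a TensorFlow checkpoint where `load_model` is called.

```python
from collections import namedtuple

# Hypothetical record type; the real rows come from the Parquet DataFrame.
ImageRow = namedtuple("ImageRow", ["path", "image_bytes"])

class DummyCaptionModel:
    """Stand-in for the restored captioning model (illustration only)."""
    def caption(self, image_bytes):
        return "image of %d bytes" % len(image_bytes)

def load_model():
    # Placeholder for "restore last training checkpoint".
    return DummyCaptionModel()

def score_partition(rows):
    """Load the model once per partition, then yield one scored record per row."""
    model = load_model()
    for row in rows:
        yield (row.path, model.caption(row.image_bytes))

# Local simulation: each inner list plays the role of one Spark partition.
partitions = [
    [ImageRow("a.jpg", b"\x00" * 10), ImageRow("b.jpg", b"\x00" * 20)],
    [ImageRow("c.jpg", b"\x00" * 30)],
]
scored = [rec for part in partitions for rec in score_partition(iter(part))]
```

Loading the model inside the partition function means the expensive restore happens once per partition rather than once per image, which is the usual reason to prefer `mapPartitions` over `map` for scoring. Since this sketch yields flat records, it would presumably not need the extra `flatMap(lambda x: x)` step that the slide applies.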
  • 19. Conclusion. Today’s goals: understand Deep Learning technological challenges; how to distribute a deep learning training algorithm; how to score in a distributed fashion; how a big data ecosystem can help. Future work: Tensorframes (https://github.com/databricks/tensorframes); new technologies (e.g. TPU); Tensorflow improvements; high-level API
  • 20. Bibliography. 1. Deep Learning - The Past, Present and Future of Artificial Intelligence (Lukas Masuch). 2. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (Martín Abadi, Ashish Agarwal, et al.). 3. https://github.com/tensorflow/models/tree/master/im2txt 4. http://press.liacs.nl/mirflickr/