Deep Learning in
Spark with BigDL
Petar Zečević
petar.zecevic@svgroup.hr
https://hr.linkedin.com/in/pzecevic
Apache Spark Zagreb Meetup group
http://www.meetup.com/Apache-Spark-Zagreb-Meetup
http://sparkinaction.com
Giving away 3 e-books!
(Come to my talk tomorrow)
40% off all Manning books!
Use the code: ctwbds17
Agenda for today
Briefly about Spark
Briefly about Deep Learning
Different options for DL on Spark
Intel BigDL library
Q & A
Show of hands
I've never used Apache Spark
I've played around with it
I'm planning to use Spark in production, or I already do
Show of hands 2
I'm a beginner at deep learning
I've built a few DL models
I build DL models for a living
Apache Spark
A distributed data processing engine
Why Spark?
Spark is fast
Simple and concise API
Spark is a unifying platform
Spark has gone mainstream
Basic architecture
Spark API components
About deep learning
Family of machine learning methods
Inspired by the functioning of the nervous system
Learning units are organized in layers
Started in the 60's
Again popular due to algorithmic advances and
rise of computing resources
Every month brings new advances (e.g. "capsule
networks")
Deep learning applications
Computer vision
Speech recognition
Natural language processing
Handwriting transcription
Recommendation systems
Better ad targeting
Google Home, Amazon Alexa
Types of neural networks
Convolutional NNs (CNNs)
Region-based CNNs (R-CNNs)
Single Shot MultiBox Detector (SSDs)
Recurrent NNs (RNNs)
Long short-term memory (LSTMs)
Autoencoders
Generative Adversarial Networks (GANs)
Many other types
General principle
Adapted from Deep Learning with Python by F. Chollet
A typical CNN (LeNet)
Source: Wikipedia
Convolutional layer
Source: Wikipedia
Maxpooling layer
Source: Wikipedia
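As a rough illustration of what a max pooling layer computes, here is a minimal plain-Scala sketch (no BigDL involved). The name `maxPool2x2` and the array-of-arrays image representation are assumptions made for this example; it takes the maximum of each non-overlapping 2x2 window and assumes an even height and width.

```scala
object MaxPoolSketch {
  // Max pooling with a 2x2 window and stride 2 over a single-channel image.
  // Assumes the height and width are both even.
  def maxPool2x2(img: Array[Array[Float]]): Array[Array[Float]] = {
    val h = img.length / 2
    val w = img(0).length / 2
    Array.tabulate(h, w) { (i, j) =>
      val r = 2 * i
      val c = 2 * j
      // Largest of the four values in the window whose top-left corner is (r, c)
      math.max(
        math.max(img(r)(c), img(r)(c + 1)),
        math.max(img(r + 1)(c), img(r + 1)(c + 1)))
    }
  }
}
```

Each output value summarises four input values, so the spatial size halves in both dimensions.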
Fully connected / Dense /
Linear layer
Source: Wikipedia
Sigmoid activation
Source: Wikipedia
ReLU activation
Source: Wikipedia
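The two activation functions named above can be sketched in a few lines of plain Scala. This is only an illustration of the math, not BigDL code; the object and function names are chosen for this example.

```scala
object Activations {
  // Sigmoid squashes any real input into the open interval (0, 1).
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // ReLU passes positive inputs through and clamps negative inputs to zero.
  def relu(x: Double): Double = math.max(0.0, x)
}
```

Sigmoid is a natural fit for the final layer of a two-class model because its output can be read as a probability, while ReLU is a common choice between hidden layers.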
Convolutional Network
example
AlexNet
(Krizhevsky, Sutskever, Hinton)
VGG
(K. Simonyan and A. Zisserman)
Inception
Deep learning on Apache
Spark
Available frameworks
Intel BigDL
TensorFlow on Spark
Databricks Deep Learning Pipelines
Caffe on Spark
Elephas (Keras)
MXNet
mmlspark (CNTK)
Eclipse Deeplearning4j
SparkCL
SparkNet
...
About Intel BigDL
Open-sourced in February 2017
Uses Intel MKL for fast computations
Integrated into Spark
No GPU execution
Python and Scala APIs
Load/save Caffe, TF, Torch models
A wide variety of layers, optim methods, loss
functions
BigDL building blocks
BigDL architecture
Starting BigDL in local mode
Add BigDL jar to the classpath
Then...
import com.intel.analytics.bigdl.utils.Engine
// System properties are strings, so the core count is passed as "8"
System.setProperty("bigdl.localMode", "true")
System.setProperty("bigdl.coreNumber", "8")
Engine.init
Starting BigDL on Spark
Add BigDL jar to the classpath (--jars)
Set cmdline parameters (standalone and Mesos):
spark-submit --master spark...
--executor-cores --total-executor-cores
Set cmdline parameters (YARN):
spark-submit --master yarn
--executor-cores --num-executors
In your code...
import org.apache.spark.SparkContext
import com.intel.analytics.bigdl.utils.Engine
// createSparkConf() returns a SparkConf with the settings BigDL needs
val conf = Engine.createSparkConf()
val sc = new SparkContext(conf)
Engine.init
Creating a model
Sequential model:
import com.intel.analytics.bigdl.nn._
val model = Sequential[Float]()
model.add(SpatialConvolution[Float](...))
model.add(Tanh[Float]())
model.add(SpatialMaxPooling[Float](...))
model.add(Sigmoid[Float]())
Graph model:
val input = Input[Float]()
val conv = SpatialConvolution[Float](...).inputs(input)
val tanh = Tanh[Float]().inputs(conv)
val maxp = SpatialMaxPooling[Float](...).inputs(tanh)
val sigm = Sigmoid[Float]().inputs(maxp)
val model = Graph(input, sigm)
Example model output
73% dog, 27% cat
82% cat, 18% dog
Example model
val model = Sequential[Float]()
model.add(SpatialConvolution[Float](3, 32, 3, 3, 1, 1, 1, 1).
setInitMethod(Xavier, Xavier))
model.add(ReLU(true))
model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
model.add(SpatialConvolution[Float](32, 64, 3, 3, 1, 1, 1, 1).
setInitMethod(Xavier, Xavier))
model.add(ReLU(true))
model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
model.add(SpatialConvolution[Float](64, 128, 3, 3, 1, 1, 1, 1).
setInitMethod(Xavier, Xavier))
model.add(ReLU(true))
model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
Example model - continued
model.add(SpatialConvolution[Float](128, 128, 3, 3, 1, 1, 1, 1).
setInitMethod(Xavier, Xavier))
model.add(ReLU(true))
model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
model.add(View(128*7*7))
model.add(Dropout(0.4))
model.add(Linear[Float](inputSize=128*7*7, outputSize=512).
setInitMethod(Xavier, Xavier))
model.add(ReLU(true))
model.add(Linear(inputSize=512, outputSize=1))
model.add(Sigmoid())
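The `128*7*7` passed to `View` and `Linear` follows from the layer shapes. The plain-Scala sketch below checks that arithmetic under two assumptions: the input images are 100x100 (the size given by `featureSize` on the later Spark ML slide), and the usual output-size formulas for convolution and ceil-mode pooling apply. The helper names are invented for this example.

```scala
object ShapeSketch {
  // Output size of a convolution along one dimension (floor rounding).
  def convOut(in: Int, kernel: Int, stride: Int, pad: Int): Int =
    (in + 2 * pad - kernel) / stride + 1

  // Output size of an unpadded pooling layer in ceil mode along one dimension.
  def poolOutCeil(in: Int, kernel: Int, stride: Int): Int =
    math.ceil((in - kernel).toDouble / stride).toInt + 1

  // Apply the conv (3x3, stride 1, pad 1) + pool (2x2, stride 2, ceil) pair n times.
  def afterBlocks(in: Int, n: Int): Int =
    (1 to n).foldLeft(in) { (size, _) =>
      poolOutCeil(convOut(size, kernel = 3, stride = 1, pad = 1), kernel = 2, stride = 2)
    }
}
```

With these formulas the side length goes 100, 50, 25, 13, 7 across the four blocks, and the last convolution has 128 output planes, which gives 128 x 7 x 7 values per image. A different input size changes this number, so the `View` and `Linear` sizes would have to change with it.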
Preparing the data - official example
val trainSet = DataSet.array(load(trainData, trainLabel)) ->
SampleToGreyImg(28, 28) ->
GreyImgNormalizer(trainMean, trainStd) ->
GreyImgToBatch(batchSize)
But... load is a private method!
Preparing the data -
transformers
Preparing the data
val bytes:RDD[Sample[Float]] = sc.binaryFiles(folder).
map(pathbytes => {
// decode each file and resize it to the network's input size
val buffImage = ImageIO.read(pathbytes._2.open())
BGRImage.resizeImage(buffImage, SCALE_WIDTH, SCALE_HEIGHT)
}).map(b =>
new LabeledBGRImage().copy(b, 255f).setLabel(label)
).mapPartitions(iter =>
new BGRImgToSample()(iter)
)
Create an optimizer
val optimizer = Optimizer(module,
trainRdd,
BCECriterion[Float](),
batchSize)
optimizer.setEndWhen(Trigger.maxEpoch(10))
optimizer.setOptimMethod(new Adam[Float](1e-4))
optimizer.setValidation(Trigger.severalIteration(10),
testRdd,
Array(new Loss[Float](new BCECriterion[Float]),
new Top1Accuracy[Float]),
batchSize)
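`BCECriterion` is a binary cross-entropy loss. To show what that loss measures, here is a small plain-Scala sketch of the formula; it is not BigDL's implementation, and the `eps` guard and the names are assumptions of this example.

```scala
object BceSketch {
  // Mean binary cross-entropy over a batch.
  // preds are probabilities in (0, 1); labels are 0.0 or 1.0.
  def bce(preds: Seq[Double], labels: Seq[Double]): Double = {
    require(preds.nonEmpty && preds.length == labels.length)
    val eps = 1e-12 // keeps log() away from zero
    val losses = preds.zip(labels).map { case (p, y) =>
      -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    }
    losses.sum / losses.length
  }
}
```

A model that always predicts 0.5 scores ln 2 (about 0.693) under this formula, which is a useful baseline when reading loss values for a two-class problem.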
Tensorboard visualisation
setup
val trainSumm = TrainSummary("/tensorboard/logdir", "train")
val testSumm = ValidationSummary("/tensorboard/logdir", "test")
optimizer.setTrainSummary(trainSumm)
optimizer.setValidationSummary(testSumm)
//start the optimization process:
val trainedModule = optimizer.optimize()
Optimization running
[Epoch 2 18432/20000][Iteration 267][Wall Clock 888.091331139s]
Trained 144 records in 4.064710098 seconds.
Throughput is 35.42688 records/second. Loss is 0.6683233.
========== Metrics Summary ==========
get weights average : 0.2731059603333333 s
computing time average : 0.742136533 s
send weights average : 0.004483678833333333 s
put gradient : 0.0018473921666666668 s
aggregate gradient time : 0.004468877833333333 s
aggregrateGradientParition average executor : 0.4345159966666667 s
compute weight average : 0.006117520333333333 s
get weights for each node : 0.03519228 0.03964764 0.027415565 0.040467617
computing time for each node : 0.550181791 0.765139897 0.894009244 0.89169 1
=====================================
DEBUG DistriOptimizer$: Dropped modules: 0
Optimization running
[Wall Clock 857.896149222s] Validate model...
Loss is (Loss: 80.587006, count: 126, Average Loss: 0.6395794)
Top1Accuracy is Accuracy(correct: 634, count: 1000, accuracy: 0.634)
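The derived figures in these log lines can be reproduced from the raw numbers printed next to them. The following plain-Scala sketch shows the arithmetic; the function names are made up for this example.

```scala
object LogArithmetic {
  // Records processed per second in one iteration.
  def throughput(records: Int, seconds: Double): Double = records / seconds

  // Summed loss divided by the count it was accumulated over.
  def averageLoss(totalLoss: Double, count: Int): Double = totalLoss / count

  // Fraction of validation records classified correctly.
  def accuracy(correct: Int, total: Int): Double = correct.toDouble / total
}
```

Plugging in the logged values, 144 records in 4.064710098 seconds gives about 35.43 records per second, 80.587006 over a count of 126 gives an average loss of about 0.6396, and 634 correct out of 1000 gives an accuracy of 0.634.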
Tensorboard output - accuracy
Tensorboard output - loss
Data augmentation
def readAugmentedSamples(folder:String, label:Float,
scaleHeight:Int=96, scaleWidth:Int=128,
includeOriginal:Boolean=true, flip:Boolean=false,
minRotate:Int=0, maxRotate:Int=40, rotatedInstances:Int=0,
minShear:Double=0, maxShear:Double=0.2, shearedInstances:Int=0,
minZoom:Double=0, maxZoom:Double=0.2, zoomedInstances:Int=0,
minTranslate:Int=0, maxTranslate:Int=0, translatedInstances:Int=0) :
RDD[Array[Byte]] = { ... }
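The body of `readAugmentedSamples` is elided on the slide. As a hedged illustration of what a single augmentation such as `flip` might do, the plain-Scala sketch below mirrors an image stored as rows of pixel values. It is not the author's implementation, and the image representation and names are assumptions of this example.

```scala
object FlipSketch {
  // Horizontal flip: reverse the pixel order within each row.
  def flipHorizontal(img: Array[Array[Int]]): Array[Array[Int]] =
    img.map(_.reverse)

  // Return the original image plus, optionally, its mirrored copy.
  def augment(img: Array[Array[Int]], flip: Boolean): Seq[Array[Array[Int]]] =
    if (flip) Seq(img, flipHorizontal(img)) else Seq(img)
}
```

Rotation, shear, zoom and translation would follow the same pattern: each enabled option contributes extra transformed copies of the original image to the training set.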
Data augmentation
var (resModule, resOptim) = runOptimizations(model, None,
trainCats.union(trainDogs), testCats.union(testDogs), 24*6, 2, 1)
var optimizedModule : Module[Float] = resModule
var optimMethod : Option[OptimMethod[Float]] = Some(resOptim)
for(c <- 1 to 20) {
trainCats.unpersist()
trainDogs.unpersist()
trainCats = readSamplesFromHDFSImages(...)
trainDogs = readSamplesFromHDFSImages(...)
val (mod, optim) = runOptimizations(optimizedModule, optimMethod,
trainCats.union(trainDogs), testCats.union(testDogs), 24*6, 2, 1)
optimizedModule = mod
optimMethod = Some(optim)
}
Tensorboard output - accuracy
Tensorboard output - loss
Using the model
trainedModule.saveModule(path)
val quantizedModel = trainedModule.quantize()
val validPredicts = quantizedModel.predict(validationSet)
validPredicts.filter(a => a.toTensor[Float].value > 0.5).count()
quantizedModel.evaluate(validationSet,
Array(new Loss[Float](new BCECriterion[Float]),
new Top1Accuracy[Float]),
batchSize)
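The `> 0.5` filter above turns the sigmoid output into a class decision. A minimal plain-Scala sketch of that thresholding step, independent of BigDL, could look like this; the names, the default threshold, and the 0/1 label encoding are assumptions of the example.

```scala
object ThresholdSketch {
  // Map a predicted probability to a class label (1 if above the threshold, else 0).
  def toLabel(p: Float, threshold: Float = 0.5f): Int =
    if (p > threshold) 1 else 0

  // Fraction of predictions whose thresholded label matches the true label.
  def accuracy(probs: Seq[Float], labels: Seq[Int]): Double = {
    require(probs.nonEmpty && probs.length == labels.length)
    probs.zip(labels).count { case (p, y) => toLabel(p) == y }.toDouble / probs.length
  }
}
```

For example, `ThresholdSketch.accuracy(Seq(0.9f, 0.2f, 0.6f, 0.4f), Seq(1, 0, 0, 0))` counts three matches out of four predictions.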
Transfer learning - model
freeze
Spark ML integration
Spark ML integration
val dataSet = DataSet.rdd(byteRecordRdd) ->
BytesToBGRImg(normalize=255f) ->
BGRImgToBatch(batchSize, toRGB = false)
val rdd = dataSet.asInstanceOf[DistributedDataSet[MiniBatch[Float]]].
data(false).map(batch => {
val feature = batch.getInput().asInstanceOf[Tensor[Float]]
val labels = batch.getTarget().asInstanceOf[Tensor[Float]]
(feature.storage().array(), labels.storage().array())
})
spark.createDataFrame(rdd).toDF("features", "labels")
Spark ML integration
val criterion = BCECriterion[Float]()
val featureSize = Array(3, 100, 100)
val estimator = new DLClassifier[Float](model, criterion, featureSize).
setFeaturesCol("features").
setLabelCol("labels").
setBatchSize(24*6).
setLearningRate(1e-4).
setMaxEpoch(20).
setOptimMethod(new Adam[Float](1e-4))
val dlmodel:DLModel[Float] = estimator.fit(trainSet)
Spark ML integration
Can be used inside Spark ML Pipelines
But... no access to Optimizer
no validation
no visualization
not really useful yet
BigDL Performance
benchmarks
(https://software.intel.com/en-us/mkl/features/benchmarks)
Conclusion
+ Interesting and clever concept
+ Good engineering
+ Well optimized code
+ Lots of layers, optim methods etc.
- Missing GPU support
- Illogical package/class naming choices
- API debug and data conversion options
- Documentation could be better
Questions?
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017

Mais conteúdo relacionado

Mais procurados

Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...Databricks
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On SparkSpark Summit
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesDataWorks Summit
 
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...Spark Summit
 
A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...Big Data Spain
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySparkSpark Summit
 
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Databricks
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache SparkJen Aman
 
Apache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentApache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentDatabricks
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Databricks
 
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark... Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...Databricks
 
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...Spark Summit
 
Lambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie StricklandLambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie StricklandSpark Summit
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Spark Summit
 
Spark Meetup TensorFrames
Spark Meetup TensorFramesSpark Meetup TensorFrames
Spark Meetup TensorFramesJen Aman
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneDatabricks
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanDatabricks
 
GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014StampedeCon
 

Mais procurados (20)

Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL Releases
 
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
 
A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Apache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentApache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and Present
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark... Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ...
 
Lambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie StricklandLambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie Strickland
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
 
Spark Meetup TensorFrames
Spark Meetup TensorFramesSpark Meetup TensorFrames
Spark Meetup TensorFrames
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
 
GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014
 

Semelhante a Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017

Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning serviceRuth Yakubu
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Databricks
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafkaNitin Kumar
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCsw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCanSecWest
 
Spark ml streaming
Spark ml streamingSpark ml streaming
Spark ml streamingAdam Doyle
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionEmanuele Bezzi
 
I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)AZUG FR
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleAaron (Ari) Bornstein
 
Introduction to Spark ML
Introduction to Spark MLIntroduction to Spark ML
Introduction to Spark MLHolden Karau
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterDatabricks
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 
Spark Meetup TensorFrames
Spark Meetup TensorFramesSpark Meetup TensorFrames
Spark Meetup TensorFramesJen Aman
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wyciekówKonrad Kokosa
 
High Performance Django 1
High Performance Django 1High Performance Django 1
High Performance Django 1DjangoCon2008
 
High Performance Django
High Performance DjangoHigh Performance Django
High Performance DjangoDjangoCon2008
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak PROIDEA
 

Semelhante a Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017 (20)

Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCsw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
 
Spark ml streaming
Spark ml streamingSpark ml streaming
Spark ml streaming
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an Introduction
 
I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
 
Introduction to Spark ML
Introduction to Spark MLIntroduction to Spark ML
Introduction to Spark ML
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Spark Meetup TensorFrames
Spark Meetup TensorFramesSpark Meetup TensorFrames
Spark Meetup TensorFrames
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
 
High Performance Django 1
High Performance Django 1High Performance Django 1
High Performance Django 1
 
High Performance Django
High Performance DjangoHigh Performance Django
High Performance Django
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 

Mais de Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...




Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017

  • 2. Deep Learning in Spark with BigDL Petar Zečević petar.zecevic@svgroup.hr https://hr.linkedin.com/in/pzecevic
  • 3. Apache Spark Zagreb Meetup group http://www.meetup.com/Apache-Spark-Zagreb-Meetup
  • 4. http://sparkinaction.com Giving away 3 e-books! (Come to my talk tomorrow)
  • 5. 40% off all Manning books! Use the code: ctwbds17
  • 6. Agenda for today: Briefly about Spark; Briefly about Deep Learning; Different options for DL on Spark; Intel BigDL library; Q & A
  • 7. Show of hands: I've never used Apache Spark; I've played around with it; I'm planning to use or am already using Spark in production
  • 8. Show of hands 2: I'm a beginner at deep learning; I've built a few DL models; I build DL models for a living
  • 9. Apache Spark: A distributed data processing engine
  • 10. Why Spark? Spark is fast; Simple and concise API; Spark is a unifying platform; Spark has gone mainstream
  • 13. About deep learning: Family of machine learning methods; Inspired by the functioning of the nervous system; Learning units are organized in layers; Started in the 1960s; Popular again due to algorithmic advances and the rise of computing resources; Every month brings new advances (e.g. "capsule networks")
  • 14. Deep learning applications: Computer vision; Speech recognition; Natural language processing; Handwriting transcription; Recommendation systems; Better ad targeting; Google Home, Amazon Alexa
  • 15. Types of neural networks: Convolutional NNs (CNNs); Region-based CNNs (R-CNNs); Single Shot MultiBox Detector (SSDs); Recurrent NNs (RNNs); Long short-term memory (LSTMs); Autoencoders; Generative Adversarial Networks (GANs); Many other types
  • 16. General principle. Adapted from Deep Learning with Python by F. Chollet
  • 17. A typical CNN (LeNet). Source: Wikipedia
  • 20. Fully connected / Dense / Linear layer. Source: Wikipedia
  • 25. VGG (K. Simonyan and A. Zisserman)
  • 28. Deep learning on Apache Spark
  • 29. Available frameworks: Intel BigDL; TensorFlow on Spark; Databricks Deep Learning Pipelines; Caffe on Spark; Elephas (Keras); MXNet; mmlspark (CNTK); Eclipse Deeplearning4j; SparkCL; SparkNet; ...
  • 30. About Intel BigDL: Open-sourced in February 2017; Uses Intel MKL for fast computations; Integrated into Spark; No GPU execution; Python and Scala APIs; Load/save Caffe, TF, Torch models; A wide variety of layers, optim methods, loss functions
  • 33. Starting BigDL in local mode. Add the BigDL jar to the classpath. Then...
        import com.intel.analytics.bigdl.utils.Engine
        System.setProperty("bigdl.localMode", "true")
        System.setProperty("bigdl.coreNumber", "8")  // property values must be strings
        Engine.init
  • 34. Starting BigDL on Spark. Add the BigDL jar to the classpath (--jars).
        Set cmdline parameters (standalone and Mesos):
            spark-submit --master spark... --executor-cores --total-executor-cores
        Set cmdline parameters (YARN):
            spark-submit --master yarn --executor-cores --num-executors
        In your code...
            import com.intel.analytics.bigdl.utils.Engine
            import org.apache.spark.SparkContext
            val conf = Engine.createSparkConf()
            val sc = new SparkContext(conf)
            Engine.init
  • 35. Creating a model
        Sequential model:
            import com.intel.analytics.bigdl.nn._
            val model = Sequential[Float]()
            model.add(SpatialConvolution[Float](...))
            model.add(Tanh[Float]())
            model.add(SpatialMaxPooling[Float](...))
            model.add(Sigmoid[Float]())
        Graph model:
            val input = Input[Float]()
            val conv = SpatialConvolution[Float](...).inputs(input)
            val tanh = Tanh[Float]().inputs(conv)
            val maxp = SpatialMaxPooling[Float](...).inputs(tanh)
            val sigm = Sigmoid[Float]().inputs(maxp)
            val model = Graph(input, sigm)
  • 36. Example model output: 73% dog, 27% cat; 82% cat, 18% dog
  • 37. Example model
        val model = Sequential[Float]()
        model.add(SpatialConvolution[Float](3, 32, 3, 3, 1, 1, 1, 1).
          setInitMethod(Xavier, Xavier))
        model.add(ReLU(true))
        model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
        model.add(SpatialConvolution[Float](32, 64, 3, 3, 1, 1, 1, 1).
          setInitMethod(Xavier, Xavier))
        model.add(ReLU(true))
        model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
        model.add(SpatialConvolution[Float](64, 128, 3, 3, 1, 1, 1, 1).
          setInitMethod(Xavier, Xavier))
        model.add(ReLU(true))
        model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
  • 38. Example model - continued
        model.add(SpatialConvolution[Float](128, 128, 3, 3, 1, 1, 1, 1).
          setInitMethod(Xavier, Xavier))
        model.add(ReLU(true))
        model.add(SpatialMaxPooling[Float](kW=2, kH=2, dW=2, dH=2).ceil())
        model.add(View(128*7*7))  // flatten the feature maps for the dense layers
        model.add(Dropout(0.4))
        model.add(Linear[Float](inputSize=128*7*7, outputSize=512).
          setInitMethod(Xavier, Xavier))
        model.add(ReLU(true))
        model.add(Linear(inputSize=512, outputSize=1))
        model.add(Sigmoid())  // single output: probability of the positive class
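The `128*7*7` size passed to `View` on slide 38 can be checked by hand. The sketch below assumes a 100×100 input (the `featureSize` of `Array(3, 100, 100)` that appears on slide 57) and the usual output-size formulas for a padded convolution and for ceil-mode max pooling. The object and helper names are hypothetical and are not part of BigDL.

```scala
object ShapeCheck {
  // Output size of a convolution along one spatial dimension (floor mode).
  def convOut(in: Int, kernel: Int, stride: Int, pad: Int): Int =
    (in + 2 * pad - kernel) / stride + 1

  // Output size of max pooling along one dimension when rounding up (ceil mode).
  def poolOutCeil(in: Int, kernel: Int, stride: Int, pad: Int = 0): Int =
    math.ceil((in + 2 * pad - kernel).toDouble / stride).toInt + 1

  // Applies `blocks` repetitions of: 3x3 conv (stride 1, pad 1) then 2x2 ceil-mode pool (stride 2).
  def afterBlocks(in: Int, blocks: Int): Int =
    (1 to blocks).foldLeft(in) { (size, _) =>
      poolOutCeil(convOut(size, kernel = 3, stride = 1, pad = 1), kernel = 2, stride = 2)
    }

  def main(args: Array[String]): Unit = {
    val side = afterBlocks(100, 4)                     // 100 -> 50 -> 25 -> 13 -> 7
    println(s"spatial size after 4 blocks: $side x $side")
    println(s"flattened size: ${128 * side * side}")   // 128 channels in the last conv layer
  }
}
```

Under these assumptions the spatial size shrinks 100 → 50 → 25 → 13 → 7, so the flattened vector has 128 × 7 × 7 = 6272 elements. The convolutions keep the size unchanged because a 3×3 kernel with padding 1 and stride 1 preserves it; only the pooling layers halve it, rounding up.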
  • 39. Preparing the data - official example
        val trainSet = DataSet.array(load(trainData, trainLabel)) ->
          SampleToGreyImg(28, 28) ->
          GreyImgNormalizer(trainMean, trainStd) ->
          GreyImgToBatch(batchSize)
        But... load is a private method!
  • 40. Preparing the data - transformers
  • 41. Preparing the data
        val bytes:RDD[Sample[Float]] = sc.binaryFiles(folder).
          map(pathbytes => {
            // decode each file and resize it to the network input size
            val buffImage = ImageIO.read(pathbytes._2.open())
            BGRImage.resizeImage(buffImage, SCALE_WIDTH, SCALE_HEIGHT)
          }).map(b =>
            new LabeledBGRImage().copy(b, 255f).setLabel(label)
          ).mapPartitions(iter =>
            new BGRImgToSample()(iter)
          )
  • 42. Create an optimizer
        val optimizer = Optimizer(module, trainRdd, BCECriterion[Float](), batchSize)
        optimizer.setEndWhen(Trigger.maxEpoch(10))
        optimizer.setOptimMethod(new Adam[Float](1e-4))
        optimizer.setValidation(Trigger.severalIteration(10), testRdd,
          Array(new Loss[Float](new BCECriterion[Float]),
            new Top1Accuracy[Float]), batchSize)
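The optimizer on slide 42 minimizes a binary cross-entropy criterion. As a rough illustration of what such a loss measures, here is the standard mean binary cross-entropy formula in plain Scala. This is a sketch only: the names are hypothetical, the `eps` clipping value is an arbitrary choice, and whether BigDL's `BCECriterion` matches it in every detail (averaging, clipping) is an assumption.

```scala
object BceSketch {
  // Mean binary cross-entropy over paired predicted probabilities and 0/1 labels.
  // eps keeps log() away from zero; its value is an arbitrary choice for this sketch.
  def bce(probs: Seq[Double], labels: Seq[Double], eps: Double = 1e-12): Double = {
    require(probs.length == labels.length && probs.nonEmpty, "inputs must be non-empty and aligned")
    val terms = probs.zip(labels).map { case (p, y) =>
      val q = math.min(math.max(p, eps), 1.0 - eps)
      -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))
    }
    terms.sum / terms.length
  }

  def main(args: Array[String]): Unit = {
    val labels = Seq(1.0, 0.0, 1.0, 0.0)
    // A model that always answers 0.5 scores ln 2, roughly 0.693.
    println(bce(Seq(0.5, 0.5, 0.5, 0.5), labels))
    // Confident and correct predictions score lower.
    println(bce(Seq(0.9, 0.1, 0.8, 0.2), labels))
  }
}
```

The ln 2 ≈ 0.693 value is a useful reference point when reading training logs for a two-class problem: a loss near it suggests the model is still close to guessing.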
  • 43. Tensorboard visualisation setup
        val trainSumm = TrainSummary("/tensorboard/logdir", "train")
        val testSumm = ValidationSummary("/tensorboard/logdir", "test")
        optimizer.setTrainSummary(trainSumm)
        optimizer.setValidationSummary(testSumm)
        //start the optimization process:
        val trainedModule = optimizer.optimize()
  • 44. Optimization running
        [Epoch 2 18432/20000][Iteration 267][Wall Clock 888.091331139s] Trained 144 records in 4.064710098 seconds. Throughput is 35.42688 records/second. Loss is 0.6683233.
        ========== Metrics Summary ==========
        get weights average : 0.2731059603333333 s
        computing time average : 0.742136533 s
        send weights average : 0.004483678833333333 s
        put gradient : 0.0018473921666666668 s
        aggregate gradient time : 0.004468877833333333 s
        aggregrateGradientParition average executor : 0.4345159966666667 s
        compute weight average : 0.006117520333333333 s
        get weights for each node : 0.03519228 0.03964764 0.027415565 0.040467617
        computing time for each node : 0.550181791 0.765139897 0.894009244 0.89169 1
        =====================================
        DEBUG DistriOptimizer$: Dropped modules: 0
  • 45. Optimization running
        [Wall Clock 857.896149222s] Validate model...
        Loss is (Loss: 80.587006, count: 126, Average Loss: 0.6395794)
        Top1Accuracy is Accuracy(correct: 634, count: 1000, accuracy: 0.634)
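The numbers in the log output on slides 44 and 45 can be cross-checked with a little arithmetic. The sketch below recomputes the reported throughput, the number of iterations needed for one pass over 20000 records, and the average validation loss. It assumes the batch size is the `24*6` value used elsewhere in the deck and that a final partial batch counts as one iteration; the helper names are hypothetical.

```scala
object LogArithmetic {
  // Records processed per second for one iteration.
  def throughput(records: Int, seconds: Double): Double = records / seconds

  // Iterations needed to cover the dataset once, rounding up for a partial last batch.
  def iterationsPerEpoch(datasetSize: Int, batchSize: Int): Int =
    (datasetSize + batchSize - 1) / batchSize

  def main(args: Array[String]): Unit = {
    val batchSize = 24 * 6 // 144, matching "Trained 144 records" in the log
    println(f"throughput: ${throughput(batchSize, 4.064710098)}%.3f records/second")
    println(s"iterations per epoch: ${iterationsPerEpoch(20000, batchSize)}")
    // Total validation loss divided by the count reported next to it.
    println(f"average validation loss: ${80.587006 / 126}%.4f")
  }
}
```

The results (about 35.43 records/second, 139 iterations per epoch, and an average loss of about 0.6396) agree with the figures printed in the log.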
  • 49. Data augmentation
        def readAugmentedSamples(folder:String, label:Float,
            scaleHeight:Int=96, scaleWidth:Int=128,
            includeOriginal:Boolean=true,
            flip:Boolean=false,
            minRotate:Int=0, maxRotate:Int=40, rotatedInstances:Int=0,
            minShear:Double=0, maxShear:Double=0.2, shearedInstances:Int=0,
            minZoom:Double=0, maxZoom:Double=0.2, zoomedInstances:Int=0,
            minTranslate:Int=0, maxTranslate:Int=0, translatedInstances:Int=0)
          : RDD[Array[Byte]] = { ... }
  • 50. Data augmentation
        var (resModule, resOptim) = runOptimizations(model, None,
          trainCats.union(trainDogs), testCats.union(testDogs), 24*6, 2, 1)
        var optimizedModule : Module[Float] = resModule
        var optimMethod : Option[OptimMethod[Float]] = Some(resOptim)
        for(c <- 1 to 20) {
          // drop the cached samples and read a fresh set before each round
          trainCats.unpersist()
          trainDogs.unpersist()
          trainCats = readSamplesFromHDFSImages(...)
          trainDogs = readSamplesFromHDFSImages(...)
          val (mod, optim) = runOptimizations(optimizedModule, optimMethod,
            trainCats.union(trainDogs), testCats.union(testDogs), 24*6, 2, 1)
          optimizedModule = mod
          optimMethod = Some(optim)
        }
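The body of `readAugmentedSamples` is not shown in the slides. To illustrate the simplest of the listed augmentations, the `flip` option, here is a minimal sketch of a horizontal flip on a raw pixel buffer. It assumes a row-major layout with interleaved channels, which may differ from how the talk's code stores images; `flipHorizontal` is a hypothetical helper, not a BigDL function.

```scala
object FlipSketch {
  // Mirrors an image left to right.
  // Assumed layout: row-major, channels interleaved per pixel.
  def flipHorizontal(pixels: Array[Byte], width: Int, height: Int, channels: Int): Array[Byte] = {
    require(pixels.length == width * height * channels, "buffer size must match dimensions")
    val out = new Array[Byte](pixels.length)
    for (y <- 0 until height; x <- 0 until width; c <- 0 until channels) {
      val src = (y * width + x) * channels + c
      val dst = (y * width + (width - 1 - x)) * channels + c
      out(dst) = pixels(src)
    }
    out
  }

  def main(args: Array[String]): Unit = {
    // 2x2 single-channel image with rows [1 2] and [3 4]
    val flipped = flipHorizontal(Array[Byte](1, 2, 3, 4), width = 2, height = 2, channels = 1)
    println(flipped.mkString(", ")) // 2, 1, 4, 3
  }
}
```

A flipped copy carries the same label as the original, so each flip adds a training sample without any extra labelling work.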
  • 53. Using the model
        trainedModule.saveModule(path)
        val quantizedModel = trainedModule.quantize()
        val validPredicts = quantizedModel.predict(validationSet)
        // count the predictions above the 0.5 decision threshold
        validPredicts.filter(a => a.toTensor[Float].value > 0.5).count
        quantizedModel.evaluate(validationSet,
          Array(new Loss[Float](new BCECriterion[Float]),
            new Top1Accuracy[Float]), batchSize)
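The filter on slide 53 turns sigmoid outputs into class decisions by comparing them with 0.5. The same idea in plain Scala, independent of BigDL, might look like the sketch below. The scores and labels are made-up illustration data, the names are hypothetical, and treating label 1 as the positive class is an assumption.

```scala
object ThresholdSketch {
  // Number of scores classified as positive.
  def positives(scores: Seq[Float], threshold: Float = 0.5f): Int =
    scores.count(_ > threshold)

  // Fraction of scores whose thresholded class equals the 0/1 label.
  def accuracy(scores: Seq[Float], labels: Seq[Int], threshold: Float = 0.5f): Double = {
    require(scores.length == labels.length && scores.nonEmpty, "inputs must be non-empty and aligned")
    val correct = scores.zip(labels).count { case (s, y) =>
      (if (s > threshold) 1 else 0) == y
    }
    correct.toDouble / scores.length
  }

  def main(args: Array[String]): Unit = {
    val scores = Seq(0.73f, 0.18f, 0.55f, 0.40f) // made-up sigmoid outputs
    val labels = Seq(1, 0, 0, 0)                 // made-up ground truth
    println(positives(scores))        // 2
    println(accuracy(scores, labels)) // 0.75
  }
}
```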
  • 54. Transfer learning - model freeze
  • 56. Spark ML integration
        val dataSet = DataSet.rdd(byteRecordRdd) ->
          BytesToBGRImg(normalize=255f) ->
          BGRImgToBatch(batchSize, toRGB = false)
        val rdd = dataSet.asInstanceOf[DistributedDataSet[MiniBatch[Float]]].
          data(false).map(batch => {
            val feature = batch.getInput().asInstanceOf[Tensor[Float]]
            val labels = batch.getTarget().asInstanceOf[Tensor[Float]]
            (feature.storage().array(), labels.storage().array())
          })
        spark.createDataFrame(rdd).toDF("features", "labels")
  • 57. Spark ML integration
        val criterion = BCECriterion[Float]()
        val featureSize = Array(3, 100, 100)
        val estimator = new DLClassifier[Float](model, criterion, featureSize).
          setFeaturesCol("features").
          setLabelCol("labels").
          setBatchSize(24*6).
          setLearningRate(1e-4).
          setMaxEpoch(20).
          setOptimMethod(new Adam[Float](1e-4))
        val dlmodel:DLModel[Float] = estimator.fit(trainSet)
  • 58. Spark ML integration: Can be used inside Spark ML Pipelines. But... no access to Optimizer; no validation; no visualization; not really useful yet
  • 60. Conclusion: + Interesting and clever concept; + Good engineering; + Well optimized code; + Lots of layers, optim methods etc.; - Missing GPU support; - Illogical package/class naming choices; - API debug and data conversion options; - Documentation could be better
  • 61. Giving away 3 e-books! Come to the lecture tomorrow
  • 62. 40% off all Manning books! Use the code: ctwbds17