Machine learning and deep learning are both available in Java through various libraries. Deep learning lets neural networks learn from vast amounts of data through multilayer architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The talk surveys Java libraries for traditional machine learning and the main deep learning platforms, including DL4J, TensorFlow, Keras, and H2O. It gives examples of training deep learning models on the MNIST and CIFAR-10 datasets and compares the CPU performance of DL4J with TensorFlow + Keras.
SJUG #26 - ML is in Java, but is DL too? - ver. 1.04 - Tomasz Sikora, 2018-03-23
1. Yes,
Machine Learning is present in Java,
but is Deep Learning too?
Tomasz Sikora
SJUG #26, Katowice, 2018-03-23
2. How to present DL at a JUG, for a technical audience?
What would be most important to you?
Dilemma
4. From Machine Learning to Deep Learning
Deep Learning: multilayer neural networks learning from vast amounts of data
Machine Learning: algorithms whose output improves as they are exposed to more data
Artificial Intelligence: a program that can sense, reason, act and adapt
Intelligence Explosion (Good, 1965)
6. From Programming to Building a Model
Traditional Programming: Data + Program --> Computer --> Result
Machine Learning: Data + Result --> Computer --> Program/Model
Machine learning is used when:
(1) the task is complex or the amount of data is large
(2) the rules are difficult to define or the program would be huge
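The contrast above can be sketched in a few lines of plain Java. This is only an illustration under assumptions: the rule y = 2x + 1, the class name ProgramVsModel and the sample data are made up for the example, and ordinary least squares stands in for "learning".

```java
import java.util.function.DoubleUnaryOperator;

class ProgramVsModel {
    // Traditional programming: a developer writes the rule by hand.
    static double handWrittenRule(double x) {
        return 2.0 * x + 1.0;
    }

    // Machine learning: slope and intercept are derived from (data, result) pairs
    // using ordinary least squares for a single input variable.
    static DoubleUnaryOperator fitLine(double[] xs, double[] ys) {
        int n = xs.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += xs[i];
            meanY += ys[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, var = 0;
        for (int i = 0; i < n; i++) {
            cov += (xs[i] - meanX) * (ys[i] - meanY);
            var += (xs[i] - meanX) * (xs[i] - meanX);
        }
        double slope = cov / var;
        double intercept = meanY - slope * meanX;
        return x -> slope * x + intercept; // the learned "program"
    }

    public static void main(String[] args) {
        double[] data = {0, 1, 2, 3, 4};     // hypothetical inputs
        double[] result = {1, 3, 5, 7, 9};   // hypothetical known outputs
        DoubleUnaryOperator model = fitLine(data, result);
        System.out.println("rule:  " + handWrittenRule(10));
        System.out.println("model: " + model.applyAsDouble(10));
    }
}
```

Both calls agree on this toy data; the point is only that the second "program" came out of the computer rather than going into it.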
7. Categories of Machine Learning
Supervised: the algorithm has training data with a known expected output.
Unsupervised: the algorithm identifies patterns in the data without being told the expected outcome.
Reinforcement Learning: the algorithm learns from interactions with the environment, using trial and error, and memorizes the strategy for further improvement.
Anomaly detection: analyzes patterns.
Classification: given a set of data, the answer is one of a fixed set of categories (discrete target).
Regression: used to find numbers (numeric value, continuous target).
Clustering: used if we need to know about structure; forms groups to interpret the data.
Reinforcement: used when a decision needs to be made based on past experience and the environment.
8. “Traditional” ML in Java, part 1
(many libraries & algorithms for similar tasks)
| Name | Licence | Short | Algorithms | Other, ANN |
|---|---|---|---|---|
| WEKA | GPLv3 | Collection of ML algorithms for DM | Classification, Regression, Clustering, Association Rules, Cross-validation, Bayesian Networks, Ensemble Learning, Visualization, Deep Learning | MLP, and wrapper to DL4J |
| H2O | Apache 2.0 | Distributed and scalable ML and predictive analytics platform | Deep Learning, Distributed Random Forest, Generalized Linear Model, Gradient Boosting Machine (GBM), Naïve Bayes Classifier, Stacked Ensembles, XGBoost, Generalized Low Rank Models, K-Means Clustering, Principal Component Analysis | MLP, RNN, CNN; Deep Water: TF, Caffe, MXNet; Sparkling Water (for Spark) |
| MOA | GPLv3 | Mining data streams | Unsupervised methods in Cluster Analysis and Outlier Detection, Decision Trees, Meta Classifiers, Naive Bayes | Weka |
| ELKI | AGPLv3 | Clustering and Outlier Detection | | |
| MLlib (Spark) | Apache 2.0 | Apache Spark's scalable ML library | Distributed Linear Algebra, SVD, PCA, Logistic Regression, Naive Bayes, Generalized Linear Regression, Decision trees, Random Forests, Gradient-boosted trees, Clustering, K-means, Gaussian | Apache Spark's scalable machine learning library |
9. “Traditional” ML in Java, part 2
(many libraries & algorithms for similar tasks)
| Name | Licence | Short | Algorithms | Other |
|---|---|---|---|---|
| Mahout | Apache 2.0 | Java libs for distributed / scalable ML algorithms | Distributed Linear Algebra, SVD, PCA, Collaborative Filtering, Canopy Clustering and Classification on top of Hadoop using map/reduce | Apache Hadoop, Spark, Flink and H2O |
| YALE | GNU Affero | RapidMiner | Linear Algebra, PCA, Clustering, ... | Extended as a proprietary software |
| Shogun | GPLv3 | General ML | Binary and Multiclass Classifier, Regressors, Random Forest, SVM, Clustering, ... | NNs |
| JDMP | LGPLv3 | Data mining and ML | Java Data Mining Package, a Library for Machine Learning and Big Data Analytics | |
| Yooreka | Apache 2.0 | General ML | Clustering, Classification, Bayesian, Decision trees, Neural Networks, Collaborative filtering | NNs |
| SAMOA | Apache Incubator | Distributed streaming ML algorithms | Framework for multiple DSPEs that contains a programming abstraction for distributed streaming ML algorithms | DSPEs, such as Apache Storm, Apache S4, and Apache Samza |
| Java-ML | GPLv2 | Java API | Java API with a collection of machine learning algorithms | |
10. “Traditional” NN
(No GPU/CUDA Support)
| Name (Leader) | License | Architectures and Training | Other |
|---|---|---|---|
| Neuroph (Zoran Sevarac) | Apache 2.0 | Architectures: Perceptron, Adaline, Multi Layer Perceptron, Hopfield network, Bidirectional Associative Memory, Kohonen network, Hebbian network, Maxnet, Competitive network, Instar, Outstar, RBF network, Neuro Fuzzy Reasoner. Training: Backpropagation, Momentum on Resilient Propagation... | CNNs! |
| Encog (Jeff Heaton) | Apache 2.0 | Architectures: Perceptron, Adaline, Adaptive Resonance Theory 1 (ART1), Bidirectional Associative Memory (BAM), Boltzmann Machine, Counterpropagation NN (CPN), Elman Recurrent NN, Hopfield Neural Network, Jordan Recurrent NN, Radial Basis Function Network, Recurrent Self Organizing Map (RSOM), Self Organizing Map (Kohonen). Training: Backpropagation, Resilient Propagation, Genetic Algorithm Training... | Neuroevolution of augmenting topologies, NEAT and HyperNEAT; Bayesian Networks, Hidden Markov Models and Support Vector Machines |
11. ML vs DL Performance and Scale
(Andrew Ng, 2016)
[Figure: performance (y-axis) vs amount of data (x-axis), with curves for traditional ML algorithms, shallow NN, medium NN and deep NN]
13. Deep Learning Area
Focus is on end-to-end:
Vision: image --> object/face --> caption/person
image --> ????? --> caption/person
Audio: wave --> phoneme --> transcript
wave --> ????? --> transcript
Instead of human/designer guidance, we need lots of labeled data
Natural language processing (NLP): English --> Polish (spoken language understanding)
Market segmentation, e.g. predict whether a customer will respond to a promotion
16. Supervised Learning, Steps
Prepare the data: Get the raw data and structure it.
Train the model: Use the data to train the model.
Test the model with some test data; do the model fitting and test it again.
Deploy the model: Once satisfied with the model, deploy it for use.
Validate: Review the success of the model applied to real conditions.
Training set: Train the model (60% of the data)
Cross-validation set: 20%
Test set: Test the model (20%)
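The 60/20/20 split above can be sketched with the Java standard library alone. This is a minimal sketch, not the speaker's code: the class name DataSplit, the fixed seed and the use of integers as stand-in samples are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class DataSplit {
    // Shuffles a copy of the samples and returns three lists:
    // index 0 = training (60%), 1 = cross-validation (20%), 2 = test (the rest).
    static <T> List<List<T>> split(List<T> samples, long seed) {
        List<T> shuffled = new ArrayList<>(samples);
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed keeps the split repeatable
        int n = shuffled.size();
        int trainEnd = n * 60 / 100; // integer arithmetic avoids rounding surprises
        int cvEnd = n * 80 / 100;
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainEnd)));
        parts.add(new ArrayList<>(shuffled.subList(trainEnd, cvEnd)));
        parts.add(new ArrayList<>(shuffled.subList(cvEnd, n)));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            samples.add(i); // stand-in for real rows
        }
        List<List<Integer>> parts = split(samples, 42L);
        System.out.println("train=" + parts.get(0).size()
                + " cv=" + parts.get(1).size()
                + " test=" + parts.get(2).size());
    }
}
```

Shuffling before cutting matters when the raw data is ordered (for example by date); for time-dependent data a chronological split may be the better choice.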
17. Pursuit of Good Generalisation...
[Figure: error (y-axis) vs model complexity (x-axis), with a training error curve (training sample) and a cross-validation error curve (test sample); regions labelled underfitting, best generalization and overfitting]
18. Use cases - we will NOT be focusing on details of...
Training details and backpropagation
Hyperparameter tuning
Activation functions - Sigmoid, Tanh, ReLU, Maxout, ELU...
Architectures - http://www.asimovinstitute.org/neural-network-zoo/
Fighting the vanishing gradient problem
Testing approaches - sampling, k-fold cross-validation
Regularisation (L1, L2) to avoid overfitting during training; adaptive learning rate, rate annealing, momentum training, dropout, checkpointing, and grid search enable high predictive accuracy...
25. Canx Bookings Predictor
Data set - 10 years of operation, ~2.3M samples, 1.1 GB CSV
PoC - 2 months of operation, 22 attributes, R nn --> NN accuracy was 92% (whilst LM was 84%)
352 booking and passenger (pax) attributes --> sparse matrix of 1386 elements
TF + Keras, 3.5 h of training on AWS t2.large --> NN accuracy was 97.2%
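The slide does not say how the 352 attributes became a sparse matrix of 1386 elements. One common way to get such a widening is one-hot encoding of categorical attributes, sketched below as an assumption rather than as the method actually used. The attribute names (cabin, channel) and the class OneHotEncoder are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OneHotEncoder {
    // Maps every distinct "attribute=value" pair to one output column.
    private final Map<String, Integer> columns = new LinkedHashMap<>();

    OneHotEncoder(List<Map<String, String>> rows) {
        for (Map<String, String> row : rows) {
            for (Map.Entry<String, String> e : row.entrySet()) {
                columns.putIfAbsent(e.getKey() + "=" + e.getValue(), columns.size());
            }
        }
    }

    int width() {
        return columns.size();
    }

    // Encodes one row as a 0/1 vector; most entries stay 0, which is why the result is sparse.
    double[] encode(Map<String, String> row) {
        double[] vector = new double[columns.size()];
        for (Map.Entry<String, String> e : row.entrySet()) {
            Integer column = columns.get(e.getKey() + "=" + e.getValue());
            if (column != null) {
                vector[column] = 1.0;
            }
        }
        return vector;
    }

    // Helper that builds a row from alternating attribute names and values.
    static Map<String, String> row(String... namesAndValues) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            row.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("cabin", "economy", "channel", "web"));
        rows.add(row("cabin", "business", "channel", "web"));
        rows.add(row("cabin", "economy", "channel", "agent"));
        OneHotEncoder encoder = new OneHotEncoder(rows);
        System.out.println("2 attributes --> " + encoder.width() + " columns");
        System.out.println(Arrays.toString(encoder.encode(rows.get(1))));
    }
}
```

In this toy case two attributes with two values each expand to four columns, and each encoded row has exactly two non-zero entries.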
27. Main Platforms - Big Fight (and Firms)
| Name / Site | Licence | Written In | Interfaces | NN | Notes |
|---|---|---|---|---|---|
| DL4J (Skymind) | Apache 2.0 | Java, C++ | Java, Scala, Clojure, Kotlin, Python (Keras) | CNN, RNN, LSTM | ND4J, Hadoop, Spark |
| TensorFlow (Google Brain) | Apache 2.0 | Python, C++ | Python (Keras), C/C++, Java, Go, R | CNN, RNN, LSTM | H2O DW |
| Theano (U Montreal) | BSD | Python | Python (Keras) | CNN, RNN, LSTM | H2O DW |
| Keras (François Chollet, Google) | MIT | Python | Python, R | Interface to TF, MXNet, Theano | TensorFlow, Theano, MXNet |
| Caffe (U Berkeley), Caffe 2 (FB) | BSD, Apache 2.0 | C++ | Python | CNN, RNN, LSTM | CaffeOnSpark (Yahoo), H2O DW |
| DAAL (Intel) | Apache 2.0 | Python, Java, C++ | Python, C++, Java, R, Matlab | | Hadoop, Spark |
| MXNet (Apache) | Apache 2.0 | C++ | C++, Python, Scala, Julia, Matlab, JavaScript, Go, R, Perl | CNN, LSTM | AWS, H2O DW |
| Torch | BSD | C, Lua | C, Lua | CNN, RNN, LSTM | See PyTorch |
28. Benchmarks
| Name | Desc | A K 2016 | Libs |
|---|---|---|---|
| MNIST-10 | MNIST database of handwritten digits; has a training set of 60k examples (a subset of the larger NIST set) and a test set of 10k images. The digits have been size-normalized and centered in a fixed-size 28x28 image. | https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html | DL4J: https://github.com/deeplearning4j/dl4j-examples ; Keras: https://github.com/keras-team/keras/tree/master/examples ; TF: https://www.tensorflow.org/tutorials/layers ; H2O: https://github.com/h2oai/h2o-3/tree/master/examples/deeplearning/notebooks |
| CIFAR | CIFAR-10 dataset consists of 60k 32x32 colour images in 10 classes, with 6k images per class. There are 50k training images and 10k test images. Run with 100 epochs training. | https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html | |
37. Benchmarks
MNIST-10: MNIST database of handwritten digits (60k training examples, 10k test images, 28x28 pixels).
CIFAR: CIFAR-10 dataset (50k training images, 10k test images, 32x32 colour, 10 classes). Run with 100 epochs training.
| Name | Model | DL4J (CPU) | TF + Keras (CPU) |
|---|---|---|---|
| MNIST-10 | MLP(h:1x1k) | 241s, acc: 0.9729 | 172s, acc: 0.9827 |
| MNIST-10 | MLP(h:2x500) | 193s, acc: 0.9808 | 182s, acc: 0.9835 |
| MNIST-10 | CNN (l6) | 126s, acc: 0.9917 | 860s, acc: 0.9955 |
| MNIST-10 | LeNet | 100s, acc: 0.9750 | |
| CIFAR | CNN AlexNet (c64c64m,c96c96m,c128c128m,d1024d1024s) | 180ks, acc: 0.4568 | 9.9ks, acc: 0.4313 |
| CIFAR | CNN (c32c32m,c64c64m,d512s) | 90ks, acc: 0.3437 | 2950s, acc: 0.4616 |
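To get a feel for the size of the MLP models in the benchmark, the number of trainable parameters can be worked out by hand. The sketch below assumes that "h:1x1k" means one hidden layer of 1000 units and "h:2x500" means two hidden layers of 500 units, with 784 inputs (28x28 pixels) and 10 output classes; the slides do not spell this notation out, and the class name is made up for the example.

```java
class MlpParameterCount {
    // Counts weights and biases of a fully connected network.
    // layerSizes lists the input size, each hidden layer size, then the output size.
    static long countParameters(int... layerSizes) {
        long total = 0;
        for (int i = 1; i < layerSizes.length; i++) {
            total += (long) layerSizes[i - 1] * layerSizes[i]; // weights between layer i-1 and i
            total += layerSizes[i];                            // one bias per unit in layer i
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed reading of MLP(h:1x1k): 784 -> 1000 -> 10
        System.out.println(countParameters(784, 1000, 10));     // prints 795010
        // Assumed reading of MLP(h:2x500): 784 -> 500 -> 500 -> 10
        System.out.println(countParameters(784, 500, 500, 10)); // prints 648010
    }
}
```

Under these assumptions the two-layer network has fewer parameters than the single wide layer, which fits with it scoring at least as well in the table without taking much longer to train.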
38. Deep Learning in H2O
https://htmlpreview.github.io/?https://github.com/ledell/sldm4-h2o/blob/master/sldm4-deeplearning-h2o.html
43. General Tips ‘n Tricks
Always use the simplest architecture for a problem
Data prep is key!
Reduce the feature set -- covariance and PCA
The more layers, the more features you can manage (dense MLP), but prune weights
Train and validate with a test dataset --- use a cross-validation method
Tune tune tune ;) --- or use hyper-parameter optimization
Experiment with other platforms -- integrate
Often we have not got to E2E DL yet!
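The cross-validation tip can be illustrated with a k-fold index split in plain Java. This is a sketch under assumptions: the class name KFold is made up, samples are addressed by index only, and the data is assumed to be shuffled beforehand.

```java
import java.util.ArrayList;
import java.util.List;

class KFold {
    // Returns, for each of the k folds, the sample indices held out for validation.
    // All other indices form the training set for that fold.
    static List<int[]> validationFolds(int sampleCount, int k) {
        List<int[]> folds = new ArrayList<>();
        int start = 0;
        for (int fold = 0; fold < k; fold++) {
            // Spread the remainder over the first folds so sizes differ by at most one.
            int size = sampleCount / k + (fold < sampleCount % k ? 1 : 0);
            int[] indices = new int[size];
            for (int i = 0; i < size; i++) {
                indices[i] = start + i;
            }
            folds.add(indices);
            start += size;
        }
        return folds;
    }

    public static void main(String[] args) {
        List<int[]> folds = validationFolds(10, 3);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + ": " + folds.get(f).length + " validation samples");
        }
    }
}
```

Each sample is used for validation exactly once, so the averaged validation score depends less on one lucky or unlucky split than a single hold-out set does.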
49. What can a Java dev use DL for?
Pre-trained models
http://pretrained.ml/
https://github.com/fchollet/deep-learning-models
Architectures
https://www.slideshare.net/xavigiro/deep-learning-architectures-d2l2-insightdcu-machine-learning-workshop-2017
http://www.asimovinstitute.org/neural-network-zoo/
Performance Management