2019 HPCC
Systems®
Community Day
Challenge Yourself –
Challenge the Status Quo
Robert Kennedy, PhD Candidate at Florida Atlantic University
Taghi M. Khoshgoftaar, PhD | Advisor
Timothy Humphrey | LexisNexis Mentor
Expanding HPCC Systems Deep Neural Network
Capabilities
Overview
• Both topics covered here are a result of my Summer Internship
• Work is available on GitHub
• Tool for creating “Standard” HPCC Systems Platform Virtual Machines
• Hyper-V, AWS, Azure, VirtualBox, etc…
• https://github.com/xwang2713/cloud-image-build
• In addition, used for creating NVIDIA GPU Enabled VMs (AWS AMI)
• Started a GPU Enabled Deep Learning Bundle
• Demonstrating GPU accelerated Deep Learning on HPCC Systems
• https://github.com/hpcc-systems/GPU-Deep-Learning
GPU Accelerated HPCC Systems | Robert Kennedy 2
HPCC Systems on Hyper-V
• Used Packer.io to generate machine images
• To create a Hyper-V Image:
• https://github.com/xwang2713/cloud-image-build/tree/master/packer/hyper-v
• Hyper-V VMs can be used similarly to the VirtualBox VMs you might already be
using
• Hyper-V Images build locally, on a Hyper-V enabled machine
• The list of installed programs can easily be modified in the .json config
• The HPCC Systems Platform running on Hyper-V allows Docker Desktop (Windows) to be used
• Docker Desktop requires Hyper-V, and Hyper-V and VirtualBox can't run concurrently
GPU Accelerated HPCC Systems | Robert Kennedy 3
Config File
• Packer.io uses a .json file as its config
• Defines network (ex. for VirtualBox)
• Defines size of machine (for cloud
providers)
• Config defines which software gets installed, via standard Linux commands
GPU Accelerated HPCC Systems | Robert Kennedy 4
GPU Enabled Virtual Machines
• Using the same tool, GPU enabled VMs can be created
• Cloud images build in the cloud; local images build locally
• This work supports the use of Python 3.6, CUDA 10.0, TensorFlow 1.14, and
PyTorch 1.1
• AWS GPU Instances:
• K80s, V100s
• Azure GPU Instances:
• K80s [12 GB VRAM]
• V100s [16 GB VRAM] (with and without NVLink)
• P100s [16 GB VRAM]
GPU Accelerated HPCC Systems | Robert Kennedy 5
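As a rough illustration (not part of the slides), a GPU-enabled VM built this way could be sanity-checked from Python; the version numbers below simply reflect the stack listed above:

```python
# Quick sanity check on a GPU-enabled VM (assumes the Python 3.6 / CUDA 10.0 /
# TensorFlow 1.14 / PyTorch 1.1 stack described above is installed).
import tensorflow as tf
import torch

print("TensorFlow:", tf.__version__)                      # expected 1.14.x
print("GPU visible to TensorFlow:", tf.test.is_gpu_available())
print("PyTorch:", torch.__version__)                       # expected 1.1.x
print("GPU visible to PyTorch:", torch.cuda.is_available())
```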
Bundle
Implementation
HPCC Systems and GPU Accelerated Deep Learning
• The current HPCC Systems Platform is CPU only, and so are its DL runtimes
• My previous work was with Distributed DL on HPCC Systems using only
CPUs
• Traditional HPCC Systems use commodity computers connected via standard
network protocols
• With respect to Deep Learning, this presents a large communication bottleneck, partly due to DL's iterative nature
• Graphics Processing Units (GPUs) are used to decrease the computation time for Neural Networks
• Single or Multiple GPUs are connected to the CPU (central node) via much
faster hardware connections
• A new bundle was started to enable GPU accelerated Deep Learning on HPCC
Systems Platform
GPU Accelerated HPCC Systems | Robert Kennedy 7
GPU Accelerated Deep Learning
• With this bundle, you can train NN models on the GPU
• Sprayed data is used as training data
• Bundle is in its infancy, but you can build, train, and use neural networks
• Using only ECL
• Using ECL and Python allows for more customized NN architectures and training routines
• A trained model (either in ECL or ECL+Python) can be used to predict on sprayed
data
• It returns its predictions via records in a one-hot-encoded format
GPU Accelerated HPCC Systems | Robert Kennedy 8
Bundle Implementation Overview
• Current work uses only one Thor node
• Single Thor node still can use multiple GPUs
• ECL/HPCC Systems handles the data storage and execution of the NN runtimes
• The implementation uses data parallelism across one or more GPUs
• Currently limited to only a single physical computer
• The pyembed plugin allows for Python to run on HPCC Systems Platform
• We use Python 3, as Python 2 is nearing EOL
• Python code handles the NN training and interfaces with the GPUs directly via NVIDIA's CUDA
GPU Accelerated HPCC Systems | Robert Kennedy 9
TensorFlow | Keras
• The Python code is written using TensorFlow
• TensorFlow
• Google’s Popular Deep Learning
Library
• Keras
• Deep Learning Library API – uses
TensorFlow or other ‘backend’
• Much less code to produce same
model
10
Artificial Neural
Networks
Biological Neuron
• Basis for artificial neural networks
• Such as the ones in deep learning
• Dendrites
• Input vector, from previous
neurons
• Weights
• Soma
• Summation Function
• Axon
• Activation Function
• A neuron 'fires' when there is enough of an input stimulus
GPU Accelerated HPCC Systems | Robert Kennedy 12
(Figure: biological neuron, with the dendrites, soma, and axon labeled)
Artificial Neuron
• First conceptualized in 1943 (McCulloch and Pitts)
• Inputs of the neuron are the outputs
of the previous layer’s neurons
• The weighted inputs are summed with a bias
• Then passed into an activation function
• Activation functions are like the biological neuron 'deciding' to fire
• ReLU activation – outputs x if x > 0 and 0 otherwise, where x is the input (see the sketch below)
GPU Accelerated HPCC Systems | Robert Kennedy 13
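As a minimal sketch of the artificial neuron described above (weighted sum plus bias, passed through ReLU), with made-up numbers:

```python
import numpy as np

def relu(x):
    # ReLU: output x when x > 0, otherwise output 0
    return np.maximum(0.0, x)

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, passed through the activation function
    return relu(np.dot(weights, inputs) + bias)

# Toy example with three inputs from the previous layer
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
b = 0.2
print(neuron(x, w, b))   # relu(-1.72 + 0.2) = relu(-1.52) = 0.0
```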
A Fully Connected Network
• Fully Connected Network
• Each neuron is connected to
every neuron in the subsequent
layer
• Neural Network Visualization
• 2 hidden layers, fully connected, 3-class classification output
• A Multi-Layer Perceptron is an example
GPU Accelerated HPCC Systems | Robert Kennedy 14
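A fully connected network like the one visualized (two hidden layers, 3-class output) can be sketched in a few lines of Keras; the hidden-layer widths and input size here are arbitrary choices, not taken from the slides:

```python
import tensorflow as tf

# Illustrative Multi-Layer Perceptron: two fully connected hidden layers
# and a 3-class softmax output.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),  # hidden layer 1
    tf.keras.layers.Dense(16, activation='relu'),                    # hidden layer 2
    tf.keras.layers.Dense(3, activation='softmax'),                  # one output per class
])
model.summary()
```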
Neural Network Training
• Forward propagation
• Backpropagation
• Optimize Model with respect to Loss
Function
• Quantification of how "right or wrong" the model is for any given datum
• Gradient Descent
• Stochastic Gradient Descent (SGD)
• Mini-batch SGD
• Right: visualization of gradient
descent over an example loss
function
GPU Accelerated HPCC Systems | Robert Kennedy 15
Gradient Descent In Action
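To make the optimization loop concrete, here is a minimal mini-batch SGD sketch in plain NumPy on a least-squares loss; the data, learning rate, and batch size are made up for the example:

```python
import numpy as np

# Synthetic regression data: y = X @ true_w plus a little noise.
rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + 0.1 * rng.randn(1000)

w = np.zeros(5)
lr, batch_size = 0.1, 128
for epoch in range(20):
    idx = rng.permutation(len(X))                 # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2.0 / len(batch) * Xb.T @ (Xb @ w - yb)  # gradient of the mean squared error
        w -= lr * grad                                   # step opposite the gradient
print(w)   # approaches true_w
```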
Where Exactly Do the GPUs Come Into Play?
• Training a NN model is the most time-consuming part; this is where the GPU is used to dramatically reduce computation time
• Two main training steps
• Forward pass – weights and
errors
• Backward pass – gradients and
weight updates
• Computationally expensive
convolutions are offloaded onto
GPUs
• These steps are done for each data point, multiple times
GPU Accelerated HPCC Systems | Robert Kennedy 16
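For illustration, this is roughly how TensorFlow 1.x lets you pin the heavy tensor math onto a GPU; the matrix sizes below are arbitrary:

```python
import tensorflow as tf

# TensorFlow 1.x style: place an expensive matrix multiply on the first GPU.
# The forward and backward passes of NN training are offloaded the same way.
with tf.device('/GPU:0'):
    a = tf.random.normal([4096, 4096])
    b = tf.random.normal([4096, 4096])
    c = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(c)   # executes on the GPU if one is visible
```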
Parallel Paradigms
• Data Parallelism
• Model Parallelism
• Synchronous and
Asynchronous
• Parallel SGD
GPU Accelerated HPCC Systems | Robert Kennedy 17
(Figure: data parallelism vs. model parallelism)
Model Parallelism
• Neural Network Model is split across
nodes
• For models larger than a GPU’s
memory
• Requires significantly higher
communication bandwidths between
nodes
• Not well suited for a cluster system
• However, this paradigm is feasible for a
multi-GPU system due to faster hardware
speeds
GPU Accelerated HPCC Systems | Robert Kennedy 18
Data Parallelism
• Data is partitioned and distributed to
nodes
• A single NN model is replicated onto each node
• Only weight updates are communicated
and aggregated
• As defined by the specific parallel
training method
• Suitable for parallelizing across multiple nodes in an HPCC Systems cluster or across GPUs in a single system
• This is the paradigm that is used
GPU Accelerated HPCC Systems | Robert Kennedy 19
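For a single-machine, multi-GPU setup like the one used here, data parallelism can be sketched with the Keras utility available in TensorFlow 1.14; this is illustrative and not necessarily how the bundle implements it, and `build_model()` stands in for any Keras model definition:

```python
import tensorflow as tf

def build_model():
    # Placeholder model definition; any Keras model works here.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

with tf.device('/CPU:0'):
    base_model = build_model()   # master weights live on the CPU (parameter-server role)

# Replicate the model onto 4 GPUs; each batch is split evenly across them
# and only the weight updates are merged back.
parallel_model = tf.keras.utils.multi_gpu_model(base_model, gpus=4)
parallel_model.compile(optimizer='sgd', loss='categorical_crossentropy')
# parallel_model.fit(x_train, y_train, batch_size=512)   # 128 examples per GPU
```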
Not Your Average HPCC Systems
• Slightly different from traditional HPCC Systems topologies
• Whole figure represents a single physical
computer and Thor Node
• Parameter Server
• This is the CPU on the system
• Nodes (blue)
• Each node represents a single
physical GPU
• Connections are high speed
hardware
• PCI Express Gen 3 provides roughly 985 MB/s per lane (about 15.8 GB/s across 16 lanes)
• NVLINK is roughly 10x faster
than PCIe Gen 3
GPU Accelerated HPCC Systems | Robert Kennedy 20
Workflow
Example
• We will create a Convolutional Neural Network (CNN) and train on the MNIST
Dataset
• MNIST is a 10-class image classification dataset, handwritten digits 0-9
• The CNN takes 784 pixels as an input (each with range 0-255)
• Two Convolutional Layers
• One fully connected layer with 128 neurons
• 10 Output neurons (one for each class)
• Total of 1,199,882 trainable parameters
• Processing through 720,000 MNIST images
Bundle Usage Example Architecture
GPU Accelerated HPCC Systems | Robert Kennedy 22
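A Keras sketch of an architecture matching this description (two convolutional layers, one 128-neuron fully connected layer, 10 outputs) is shown below; this particular choice of kernel sizes and pooling reproduces the 1,199,882 trainable parameters, though the bundle's example may differ in its details:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # conv layer 1
    layers.Conv2D(64, (3, 3), activation='relu'),                           # conv layer 2
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),     # fully connected layer, 128 neurons
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),   # 10 output neurons, one per class
])
model.summary()   # reports 1,199,882 trainable parameters
```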
Spray MNIST Dataset
• MNIST included in bundle
• Test and train sets, fixed-length records of 785 (784 pixels plus 1 label)
• Train: 60,000 28x28 grayscale images
• Test: 10,000 28x28 grayscale images
• Both are labeled as one of 10
classes, 0-9
GPU Accelerated HPCC Systems | Robert Kennedy 23
Image Visualization
• Imported RAW MNIST
Data
• Visualization of a single
MNIST image in the
“data” format
• Each pixel has a value between 0 and 255, represented as a 2-digit hex number
• Each pixel is a feature
GPU Accelerated HPCC Systems | Robert Kennedy 24
Preparing the Data
• Currently, the bundle demonstrates how to train on image data
• Includes an example NN and the example datasets (MNIST and Fashion MNIST)
• Training data and labels are molded into NumPy arrays with a specified shape before training
• Here, shape is the dimensions of the image
• i.e. the dimensions of the input features
• These get flattened to an array of 784 inputs for 784 input neurons
GPU Accelerated HPCC Systems | Robert Kennedy 25
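As a sketch of that reshaping step (the variable names and the random stand-in data are mine, not the bundle's):

```python
import numpy as np
import tensorflow as tf

# `raw_pixels` stands in for the decoded sprayed data: one row of 784 byte
# values (0-255) per image, plus a separate label per image.
raw_pixels = np.random.randint(0, 256, size=(60000, 784), dtype=np.uint8)
raw_labels = np.random.randint(0, 10, size=(60000,))

x_train = raw_pixels.reshape(-1, 28, 28, 1).astype('float32') / 255.0  # image-shaped, scaled to [0, 1]
y_train = tf.keras.utils.to_categorical(raw_labels, num_classes=10)    # one-hot labels
print(x_train.shape, y_train.shape)   # (60000, 28, 28, 1) (60000, 10)
```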
Creating a CNN – model.add() method
• First, we define the optimizer and its
parameters
• Next, we define the training scheme
• Batch size = 128
• We’ll train for 20 epochs
GPU Accelerated HPCC Systems | Robert Kennedy 26
Creating a CNN – model.add() method
• Next, we define the NN architecture
• Input shape, 28x28x1 grayscale
images
• Initialize the model
• The “nnOutputLayer” is the final layer
and is, at this point, the entire NN
model thus far
GPU Accelerated HPCC Systems | Robert Kennedy 27
• “nnOutputLayer” is passed into model.train() along with hyperparameters and
training data
Train the CNN – model.train() method
GPU Accelerated HPCC Systems | Robert Kennedy 28
GPU:
CPU:
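In plain Keras terms, the training call corresponds roughly to the snippet below, reusing the model and data from the earlier sketches; the optimizer choice is illustrative, but the batch size (128) and epoch count (20) match the slides:

```python
import tensorflow as tf

# Continues from the CNN (`model`) and prepared data (`x_train`, `y_train`)
# sketched earlier; only the training scheme is shown here.
model.compile(optimizer=tf.keras.optimizers.Adadelta(),   # illustrative optimizer choice
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=1)
```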
Create CNN – ECL and Python
GPU Accelerated HPCC Systems | Robert Kennedy 29
Example Input and Output
GPU Accelerated HPCC Systems | Robert Kennedy 30
Image Input
One-Hot-Encoded Output
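Since predictions come back one-hot encoded, recovering the class label is just an argmax over each output record; the numbers below are made up:

```python
import numpy as np

# Example model outputs (one row per image, one column per digit class).
predictions = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.05, 0.80, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01],
])
print(np.argmax(predictions, axis=1))   # [2 1] -> predicted digits
```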
Performance
Performance Evaluation
• A case study was performed to measure the performance improvements
• 5 identical Convolutional Neural Networks are trained on the MNIST dataset
• 10 times each to provide statistical significance
• Measuring the required training time for the same model on same data using fixed
training parameters
• Faster training time is desired
• CPU Alone, 1, 2, 3, and 4 GPUs
• Older K80s are used
• Newer GPUs will only increase performance and efficiency
• Compared against each other and against the "optimal" speedup
• i.e. linear speedup
GPU Accelerated HPCC Systems | Robert Kennedy 32
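A minimal sketch of how such a timing run could be scripted, assuming a hypothetical `build_cnn()` helper that returns a fresh copy of the CNN above and the prepared `x_train` / `y_train` arrays:

```python
import time
import numpy as np

times = []
for run in range(10):                      # 10 repetitions per configuration
    model = build_cnn()                    # hypothetical helper: fresh copy of the CNN
    model.compile(optimizer='sgd', loss='categorical_crossentropy')
    start = time.time()
    model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0)
    times.append(time.time() - start)      # wall-clock training time for this run

print(np.mean(times), np.std(times))       # mean and spread, in seconds
```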
Performance Boost: GPU vs. CPU
• Time, in seconds, to train a CNN on
MNIST dataset
• Training is 5.4x faster on a K80 GPU than on a Xeon CPU
• Speedup is large, even for a
simple model on small and simple
data
• The measured time covers NN training only, not HPCC-specific computations, which would be the same whether running on CPU or GPU
GPU Accelerated HPCC Systems | Robert Kennedy 33
Performance Boost: CPU vs. GPU vs Optimal Speedup
• Optimal speedup is linear
• i.e. twice the nodes is twice as fast
• Speedup is not expected to be linear
due to communication overheads
• Results show that additional GPUs
have minimal cost
GPU Accelerated HPCC Systems | Robert Kennedy 34
Conclusion
• A tool was used to create HPCC Systems virtual machine images on various new platforms
• A good use case is creating GPU-enabled images
• Brief overview of Neural Networks and their optimization
• Demonstrated that GPU accelerated deep learning is possible on HPCC Systems
Platform
• Demonstrated that GPUs provide a significant performance increase, even on a non-traditional cluster
GPU Accelerated HPCC Systems | Robert Kennedy 35
Future Work
• Implementing generalizable data loaders
• To allow training on data with less knowledge of NumPy (Python)
• Continue adding to the supported methods and ECL modeling functions
• Research and Development on integrating model parallelism
• Research on NN training on multi-node clusters where each node can have one
or more GPUs
GPU Accelerated HPCC Systems | Robert Kennedy 36
Links
• GitHub
• https://github.com/hpcc-systems/GPU-Deep-Learning
• https://github.com/xwang2713/cloud-image-build
• NVIDIA CUDA
• https://developer.nvidia.com/cuda-toolkit
• TensorFlow
• https://www.tensorflow.org/
• Keras
• https://keras.io/
• NumPy
• https://numpy.org/
GPU Accelerated HPCC Systems | Robert Kennedy 37
GPU Accelerated HPCC Systems | Robert Kennedy 38
Robert Kennedy
PhD Candidate, Florida Atlantic
University
rkennedy@fau.edu
Questions?
GPU Accelerated HPCC Systems | Robert Kennedy 39
View this presentation on YouTube:
https://www.youtube.com/watch?v=GMt-_Io4Jys&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=8&t=0s (4:02)