SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
Anil Thomas
Recurrent Neural Hacks Meetup
July 16, 2016
MAKING MACHINES SMARTER.™
Using neon for pattern recognition
in audio data
Outline
2
•  Intro to neon
•  Workshop environment setup
•  CNN theory
•  CNN hands-on
•  RNN theory
•  RNN hands-on
NEON
3
Neon
4
Backends
NervanaCPU, NervanaGPU
NervanaMGPU, NervanaEngine (internal)
Datasets
Images: ImageNet, CIFAR-10, MNIST
Captions: flickr8k, flickr30k, COCO; Text: Penn Treebank, hutter-prize, IMDB, Amazon
Initializers Constant, Uniform, Gaussian, Glorot Uniform
Learning rules
Gradient Descent with Momentum
RMSProp, AdaDelta, Adam, Adagrad
Activations Rectified Linear, Softmax, Tanh, Logistic
Layers
Linear, Convolution, Pooling, Deconvolution, Dropout
Recurrent, Long Short-Term Memory, Gated Recurrent Unit, Recurrent Sum,
LookupTable
Costs Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
Metrics Misclassification, TopKMisclassification, Accuracy
•  Modular components
•  Extensible, OO design
•  Documentation
•  neon.nervanasys.com
Why neon?
5
•  Fastest according to third-party benchmarks
•  Advanced data-loading capabilities
•  Open source (including the optimized GPU kernels!)
•  Support for distributed computing
6
Benchmarks for RNNs1
Data loading in neon
7
•  Multithreaded
•  Non-blocking IO
•  Non-blocking decompression and augmentation
•  Supports different types of data
(We will focus on audio in this workshop)
•  Automatic ingest
•  Can handle huge datasets
WORKSHOP ENV SETUP
8
Github repo with source code
9
•  https://github.com/anlthms/meetup2
•  Music classification examples
•  To clone locally:
git clone https://github.com/anlthms/meetup2.git
Configuring EC2 instance
10
•  Use your own EC2 account at http://aws.amazon.com/ec2/
•  Select US West (N. California) zone
•  Search for nervana-neon10 within Community AMIs
•  Select g2.2xlarge as instance type
•  After launching the instance, log in and activate virtual env:
source ~/neon/.venv/bin/activate
Datasets
11
•  GTZAN music genre dataset
10 genres
100 clips in each genre
Each clip is 30 seconds long
Preloaded in the AMI at /home/ubuntu/nervana/music/
•  Whale calls dataset from Kaggle
https://www.kaggle.com/c/whale-detection-challenge/data
30,000 2-second sound clips
Preloaded in the AMI at /home/ubuntu/nervana/wdc/
CNN THEORY
12
Convolution
0 1 2
3 4 5
6 7 8
0 1
2 3
19 25
37 43
0 1 3 4 0 1 2 3 19
13
•  Each element in the output is the result of a dot
product between two vectors
Convolutional layer
14
0
1
2
3
4
5
6
7
8
19
14
0 1 2
3 4 5
6 7 8
0 1
2 3
19 25
37 43
0
2
3
1
0
2
3
1
0
2
3
1
0
2
3
1
25
37
43
Convolutional layer
15
x +
x +
x +
x =
00
11
32
43
The weights are shared among
the units.
0
1
2
3
4
5
6
7
8
0
2
3
1
0
2
3
1
0
2
3
1
0
2
3
1
19
19
Recognizing patterns
16
Detected the pattern!
17
B0 B1 B2
B3 B4 B5
B6 B7 B8
G0 G1 G2
G3 G4 G5
G6 G7 G8
R0 R1 R2
R3 R4 R5
R6 R7 R8
18
B0 B1 B2
B3 B4 B5
B6 B7 B8
G0 G1 G2
G3 G4 G5
G6 G7 G8
R0 R1 R2
R3 R4 R5
R6 R7 R8
B0 B1 B2
B3 B4 B5
B6 B7 B8
19
G0 G1 G2
G3 G4 G5
G6 G7 G8
R0 R1 R2
R3 R4 R5
R6 R7 R8
B0 B1 B2
B3 B4 B5
B6 B7 B8
20
G0 G1 G2
G3 G4 G5
G6 G7 G8
R0 R1 R2
R3 R4 R5
R6 R7 R8
Max pooling
0 1 2
3 4 5
6 7 8
4 5
7 8
0 1 3 4 4
21
•  Each element in the output is the maximum value
within the pooling window
Max( )
CNN HANDS-ON
22
CNN example
23
•  Source at https://github.com/anlthms/meetup2/blob/master/cnn1.py
•  Command line:
./cnn1.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
Sample spectrograms
24
Sample output of conv layer #1
25
Sample output of conv layer #2
26
RNN THEORY
27
Recurrent layer
28
x
h
WI
WR
ht = g ( WI xt + WR h(t-1) )
Recurrent layer (unrolled)
29
x1 x2 x3
h1 h2 h3
WI WI WI
WR WR WR
ht = g ( WI xt + WR h(t-1) )
•  x represents the input
sequence
•  h is the memory (as well as
the output of the recurrent
layer)
•  h is computed from the
past memory and the
current input
•  h summarizes the sequence
up to the current time steptime
Training RNNs
30
•  Exploding/vanishing gradients
•  Initialization
•  Gradient clipping
•  Optimizers with Adaptive learning rates (e.g. Adagrad)
•  LSTM to capture long term dependencies
31
x!
tanh!
x! +!
x!
output gate!
forget gate! Input gate!
input!
LSTM
Applications of RNNs
32
•  Image captioning
•  Speech recognition
•  Machine translation
•  Time-series analysis
RNN HANDS-ON
33
RNN example 1
34
•  Source at https://github.com/anlthms/meetup2/blob/master/rnn1.py
•  Command line:
./rnn1.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
(Does not work well! This example demonstrates challenges of training RNNs)
RNN example 2
35
•  Source at https://github.com/anlthms/meetup2/blob/master/rnn2.py
•  Command line:
./rnn2.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
(Uses Glorot init, gradient clipping, Adagrad and LSTM)
RNN example 3
36
•  Source at https://github.com/anlthms/meetup2/blob/master/rnn3.py
•  Command line:
./rnn3.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
(Uses multiple bi-rnn layers)
Final example
37
•  Source at https://github.com/NervanaSystems/neon/blob/master/examples/whale_calls.py
•  Command line:
./whale_calls.py -e 16 -w /home/ubuntu/nervana/wdc -r 0 -s whales.pkl -v
Network structure of final example
38
Convolution Layer 'Convolution_0': 1 x (81x49) inputs, 128 x (79x24) outputs, 0,0 padding, 1,2 stride
Activation Layer 'Convolution_0_Rectlin': Rectlin
Convolution Layer 'Convolution_1': 128 x (79x24) inputs, 256 x (77x22) outputs, 0,0 padding, 1,1 stride
BatchNorm Layer 'Convolution_1_bnorm': 433664 inputs, 1 steps, 256 feature maps
Activation Layer 'Convolution_1_Rectlin': Rectlin
Pooling Layer 'Pooling_0': 256 x (77x22) inputs, 256 x (38x11) outputs
Convolution Layer 'Convolution_2': 256 x (38x11) inputs, 512 x (37x10) outputs, 0,0 padding, 1,1 stride
BatchNorm Layer 'Convolution_2_bnorm': 189440 inputs, 1 steps, 512 feature maps
Activation Layer 'Convolution_2_Rectlin': Rectlin
BiRNN Layer 'BiRNN_0': 18944 inputs, (256 outputs) * 2, 10 steps
BiRNN Layer 'BiRNN_1': (256 inputs) * 2, (256 outputs) * 2, 10 steps
BiRNN Layer 'BiRNN_2': (256 inputs) * 2, (256 outputs) * 2, 10 steps
RecurrentOutput choice RecurrentLast : (512, 10) inputs, 512 outputs
Linear Layer 'Linear_0': 512 inputs, 32 outputs
BatchNorm Layer 'Linear_0_bnorm': 32 inputs, 1 steps, 32 feature maps
Activation Layer 'Linear_0_Rectlin': Rectlin
Linear Layer 'Linear_1': 32 inputs, 2 outputs
Activation Layer 'Linear_1_Softmax': Softmax
More info
39
Nervana’s deep learning tutorials:
https://www.nervanasys.com/deep-learning-tutorials/
Github page:
https://github.com/NervanaSystems/neon
For more information, contact:
info@nervanasys.com
40
NERVANA

Mais conteúdo relacionado

Mais procurados

Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Spark Summit
 

Mais procurados (20)

Urs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonUrs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in Boston
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
ODSC West
ODSC WestODSC West
ODSC West
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with compute
 
The rendering technology of 'lords of the fallen' philip hammer
The rendering technology of 'lords of the fallen'   philip hammerThe rendering technology of 'lords of the fallen'   philip hammer
The rendering technology of 'lords of the fallen' philip hammer
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17
 
Scope Stack Allocation
Scope Stack AllocationScope Stack Allocation
Scope Stack Allocation
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
 
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
 
Borderless Per Face Texture Mapping
Borderless Per Face Texture MappingBorderless Per Face Texture Mapping
Borderless Per Face Texture Mapping
 
Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC Systems
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
 
An Analysis of Convolution for Inference
An Analysis of Convolution for InferenceAn Analysis of Convolution for Inference
An Analysis of Convolution for Inference
 

Destaque

Destaque (15)

Recurrent Neural Network tutorial (2nd)
Recurrent Neural Network tutorial (2nd) Recurrent Neural Network tutorial (2nd)
Recurrent Neural Network tutorial (2nd)
 
Video Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model ExampleVideo Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model Example
 
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
Introduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will ConstableIntroduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will Constable
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
High-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep LearningHigh-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep Learning
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
 
Machine Translation Introduction
Machine Translation IntroductionMachine Translation Introduction
Machine Translation Introduction
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 

Semelhante a Using neon for pattern recognition in audio data

“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
Edge AI and Vision Alliance
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
NVIDIA Taiwan
 

Semelhante a Using neon for pattern recognition in audio data (20)

Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentation
 
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
 
Recent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionRecent Object Detection Research & Person Detection
Recent Object Detection Research & Person Detection
 
CIFAR-10 for DAWNBench: Wide ResNets, Mixup Augmentation and "Super Convergen...
CIFAR-10 for DAWNBench: Wide ResNets, Mixup Augmentation and "Super Convergen...CIFAR-10 for DAWNBench: Wide ResNets, Mixup Augmentation and "Super Convergen...
CIFAR-10 for DAWNBench: Wide ResNets, Mixup Augmentation and "Super Convergen...
 
Deep Learning in theano
Deep Learning in theanoDeep Learning in theano
Deep Learning in theano
 
Wuala, P2P Online Storage
Wuala, P2P Online StorageWuala, P2P Online Storage
Wuala, P2P Online Storage
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoorters
 
Lecture 06 marco aurelio ranzato - deep learning
Lecture 06   marco aurelio ranzato - deep learningLecture 06   marco aurelio ranzato - deep learning
Lecture 06 marco aurelio ranzato - deep learning
 
Killzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo PostmortemKillzone Shadow Fall Demo Postmortem
Killzone Shadow Fall Demo Postmortem
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow
[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow
[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
Performance and scalability for machine learning
Performance and scalability for machine learningPerformance and scalability for machine learning
Performance and scalability for machine learning
 
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Inferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on SparkInferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on Spark
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
 

Último

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Último (20)

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

Using neon for pattern recognition in audio data

  • 1. Anil Thomas Recurrent Neural Hacks Meetup July 16, 2016 MAKING MACHINES SMARTER.™ Using neon for pattern recognition in audio data
  • 2. Outline 2 •  Intro to neon •  Workshop environment setup •  CNN theory •  CNN hands-on •  RNN theory •  RNN hands-on
  • 4. Neon 4 Backends NervanaCPU, NervanaGPU NervanaMGPU, NervanaEngine (internal) Datasets Images: ImageNet, CIFAR-10, MNIST Captions: flickr8k, flickr30k, COCO; Text: Penn Treebank, hutter-prize, IMDB, Amazon Initializers Constant, Uniform, Gaussian, Glorot Uniform Learning rules Gradient Descent with Momentum RMSProp, AdaDelta, Adam, Adagrad Activations Rectified Linear, Softmax, Tanh, Logistic Layers Linear, Convolution, Pooling, Deconvolution, Dropout Recurrent, Long Short-Term Memory, Gated Recurrent Unit, Recurrent Sum, LookupTable Costs Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error Metrics Misclassification, TopKMisclassification, Accuracy •  Modular components •  Extensible, OO design •  Documentation •  neon.nervanasys.com
  • 5. Why neon? 5 •  Fastest according to third-party benchmarks •  Advanced data-loading capabilities •  Open source (including the optimized GPU kernels!) •  Support for distributed computing
  • 7. Data loading in neon 7 •  Multithreaded •  Non-blocking IO •  Non-blocking decompression and augmentation •  Supports different types of data (We will focus on audio in this workshop) •  Automatic ingest •  Can handle huge datasets
  • 9. Github repo with source code 9 •  https://github.com/anlthms/meetup2 •  Music classification examples •  To clone locally: git clone https://github.com/anlthms/meetup2.git
  • 10. Configuring EC2 instance 10 •  Use your own EC2 account at http://aws.amazon.com/ec2/ •  Select US West (N. California) zone •  Search for nervana-neon10 within Community AMIs •  Select g2.2xlarge as instance type •  After launching the instance, log in and activate virtual env: source ~/neon/.venv/bin/activate
  • 11. Datasets 11 •  GTZAN music genre dataset 10 genres 100 clips in each genre Each clip is 30 seconds long Preloaded in the AMI at /home/ubuntu/nervana/music/ •  Whale calls dataset from Kaggle https://www.kaggle.com/c/whale-detection-challenge/data 30,000 2-second sound clips Preloaded in the AMI at /home/ubuntu/nervana/wdc/
  • 13. Convolution 0 1 2 3 4 5 6 7 8 0 1 2 3 19 25 37 43 0 1 3 4 0 1 2 3 19 13 •  Each element in the output is the result of a dot product between two vectors
  • 14. Convolutional layer 14 0 1 2 3 4 5 6 7 8 19 14 0 1 2 3 4 5 6 7 8 0 1 2 3 19 25 37 43 0 2 3 1 0 2 3 1 0 2 3 1 0 2 3 1 25 37 43
  • 15. Convolutional layer 15 x + x + x + x = 00 11 32 43 The weights are shared among the units. 0 1 2 3 4 5 6 7 8 0 2 3 1 0 2 3 1 0 2 3 1 0 2 3 1 19 19
  • 17. 17 B0 B1 B2 B3 B4 B5 B6 B7 B8 G0 G1 G2 G3 G4 G5 G6 G7 G8 R0 R1 R2 R3 R4 R5 R6 R7 R8
  • 18. 18 B0 B1 B2 B3 B4 B5 B6 B7 B8 G0 G1 G2 G3 G4 G5 G6 G7 G8 R0 R1 R2 R3 R4 R5 R6 R7 R8
  • 19. B0 B1 B2 B3 B4 B5 B6 B7 B8 19 G0 G1 G2 G3 G4 G5 G6 G7 G8 R0 R1 R2 R3 R4 R5 R6 R7 R8
  • 20. B0 B1 B2 B3 B4 B5 B6 B7 B8 20 G0 G1 G2 G3 G4 G5 G6 G7 G8 R0 R1 R2 R3 R4 R5 R6 R7 R8
  • 21. Max pooling 0 1 2 3 4 5 6 7 8 4 5 7 8 0 1 3 4 4 21 •  Each element in the output is the maximum value within the pooling window Max( )
  • 23. CNN example 23 •  Source at https://github.com/anlthms/meetup2/blob/master/cnn1.py •  Command line: ./cnn1.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
  • 25. Sample output of conv layer #1 25
  • 26. Sample output of conv layer #2 26
  • 28. Recurrent layer 28 x h WI WR ht = g ( WI xt + WR h(t-1) )
  • 29. Recurrent layer (unrolled) 29 x1 x2 x3 h1 h2 h3 WI WI WI WR WR WR ht = g ( WI xt + WR h(t-1) ) •  x represents the input sequence •  h is the memory (as well as the output of the recurrent layer) •  h is computed from the past memory and the current input •  h summarizes the sequence up to the current time steptime
  • 30. Training RNNs 30 •  Exploding/vanishing gradients •  Initialization •  Gradient clipping •  Optimizers with Adaptive learning rates (e.g. Adagrad) •  LSTM to capture long term dependencies
  • 31. 31 x! tanh! x! +! x! output gate! forget gate! Input gate! input! LSTM
  • 32. Applications of RNNs 32 •  Image captioning •  Speech recognition •  Machine translation •  Time-series analysis
  • 34. RNN example 1 34 •  Source at https://github.com/anlthms/meetup2/blob/master/rnn1.py •  Command line: ./rnn1.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v (Does not work well! This example demonstrates challenges of training RNNs)
  • 35. RNN example 2 35 •  Source at https://github.com/anlthms/meetup2/blob/master/rnn2.py •  Command line: ./rnn2.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v (Uses Glorot init, gradient clipping, Adagrad and LSTM)
  • 36. RNN example 3 36 •  Source at https://github.com/anlthms/meetup2/blob/master/rnn3.py •  Command line: ./rnn3.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v (Uses multiple bi-rnn layers)
  • 37. Final example 37 •  Source at https://github.com/NervanaSystems/neon/blob/master/examples/whale_calls.py •  Command line: ./whale_calls.py -e 16 -w /home/ubuntu/nervana/wdc -r 0 -s whales.pkl -v
  • 38. Network structure of final example 38 Convolution Layer 'Convolution_0': 1 x (81x49) inputs, 128 x (79x24) outputs, 0,0 padding, 1,2 stride Activation Layer 'Convolution_0_Rectlin': Rectlin Convolution Layer 'Convolution_1': 128 x (79x24) inputs, 256 x (77x22) outputs, 0,0 padding, 1,1 stride BatchNorm Layer 'Convolution_1_bnorm': 433664 inputs, 1 steps, 256 feature maps Activation Layer 'Convolution_1_Rectlin': Rectlin Pooling Layer 'Pooling_0': 256 x (77x22) inputs, 256 x (38x11) outputs Convolution Layer 'Convolution_2': 256 x (38x11) inputs, 512 x (37x10) outputs, 0,0 padding, 1,1 stride BatchNorm Layer 'Convolution_2_bnorm': 189440 inputs, 1 steps, 512 feature maps Activation Layer 'Convolution_2_Rectlin': Rectlin BiRNN Layer 'BiRNN_0': 18944 inputs, (256 outputs) * 2, 10 steps BiRNN Layer 'BiRNN_1': (256 inputs) * 2, (256 outputs) * 2, 10 steps BiRNN Layer 'BiRNN_2': (256 inputs) * 2, (256 outputs) * 2, 10 steps RecurrentOutput choice RecurrentLast : (512, 10) inputs, 512 outputs Linear Layer 'Linear_0': 512 inputs, 32 outputs BatchNorm Layer 'Linear_0_bnorm': 32 inputs, 1 steps, 32 feature maps Activation Layer 'Linear_0_Rectlin': Rectlin Linear Layer 'Linear_1': 32 inputs, 2 outputs Activation Layer 'Linear_1_Softmax': Softmax
  • 39. More info 39 Nervana’s deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/ Github page: https://github.com/NervanaSystems/neon For more information, contact: info@nervanasys.com