AWS re:Invent: Role of Tensors in Large-Scale Machine Learning
Anima Anandkumar
Principal Scientist, AWS
Bren Professor, Caltech
Tensors are Multi-Dimensional
Modern data is inherently multi-dimensional
Tensors to encode multi-dimensional data
• Images: 3 dimensions
• Videos: 4 dimensions
• Visual Q&A ("Are there trees in the image?"): ? dimensions
Tensors to encode higher order moments
• Pairwise correlations
• Third order correlations
Operations on Tensors: Tensor Contraction
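As a concrete supplement to the slide (a toy example of my own, not from the deck), here is a mode-1 tensor contraction in NumPy: the first mode of a 3-way tensor is contracted with a matrix, the basic operation the following slides build on.

import numpy as np

# Mode-1 contraction: S[j, m, k] = sum_i U[j, i] * T[i, m, k]
T = np.random.rand(4, 5, 6)   # 3-way tensor
U = np.random.rand(3, 4)      # matrix applied along mode 1

S = np.einsum('ji,imk->jmk', U, T)
print(S.shape)   # (3, 5, 6)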
Tensorized Neural Networks
https://github.com/awslabs/amazon-sagemaker-examples
Deep Neural Nets: Transforming Tensors
Deep Tensorized Networks
Tensor Contraction & Regression Networks, by J. Kossaifi, Z. C. Lipton, A. Khanna, T. Furlanello, and A. Anandkumar, 2017.
Space Saving in Deep Tensorized Networks
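The space saving comes from replacing flatten-and-dense layers with mode-wise contractions. Below is a hand-rolled PyTorch sketch of a tensor contraction layer (toy sizes of my choosing, not the authors' implementation):

import torch

# Tensor contraction layer sketch: contract each non-batch mode of the
# activation tensor (batch, C, H, W) with a small factor matrix, shrinking
# it to (batch, c, h, w) without ever flattening. Sizes are illustrative.
C, H, W = 64, 8, 8
c, h, w = 16, 4, 4
U1, U2, U3 = torch.randn(c, C), torch.randn(h, H), torch.randn(w, W)

def tcl(x):
    x = torch.einsum('bCHW,cC->bcHW', x, U1)  # contract channel mode
    x = torch.einsum('bcHW,hH->bchW', x, U2)  # contract height mode
    return torch.einsum('bchW,wW->bchw', x, U3)  # contract width mode

y = tcl(torch.randn(32, C, H, W))
print(y.shape)  # torch.Size([32, 16, 4, 4])

The three factor matrices hold c*C + h*H + w*W = 1,088 parameters here, versus 4,096 weights per output unit for a dense layer on the flattened 64*8*8 activation.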
TensorLy: Framework for Tensor Algebra
• Python programming
• User-friendly API
• Multiple backends: flexible + scalable
• Example notebooks in repository
TensorLy with PyTorch backend

import numpy as np
import torch
import tensorly as tl
from tensorly import tucker_to_tensor
from tensorly.random import tucker_tensor
from torch.autograd import Variable  # legacy autograd wrapper, as used at the time

tl.set_backend('pytorch')  # set PyTorch backend

# target tensor to approximate (assumed given on the slide; random here)
tensor = tl.tensor(np.random.rand(5, 5, 5), dtype=tl.float32)

# random Tucker form: core tensor plus factor matrices
core, factors = tucker_tensor((5, 5, 5), rank=(3, 3, 3))

# attach gradients
core = Variable(core, requires_grad=True)
factors = [Variable(f, requires_grad=True) for f in factors]

# set optimizer (lr and n_iter were unspecified on the slide)
lr, n_iter = 1e-2, 1000
optimiser = torch.optim.Adam([core] + factors, lr=lr)

for i in range(1, n_iter):
    optimiser.zero_grad()
    rec = tucker_to_tensor(core, factors)    # reconstruct from Tucker form
    loss = (rec - tensor).pow(2).sum()       # squared reconstruction error
    for f in factors:
        loss = loss + 0.01 * f.pow(2).sum()  # L2 penalty on the factors
    loss.backward()
    optimiser.step()
Speeding up Tensor Contractions
https://github.com/awslabs/amazon-sagemaker-examples
New Primitive for Tensor Contractions
Performance of StridedBatchedGEMM
Performance on par with pure GEMM (P100 and beyond).
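A rough illustration of the idea (mine, not the deck's): a single-mode tensor contraction is exactly many equally-strided GEMMs, which cuBLAS exposes as cublas<t>gemmStridedBatched. In PyTorch the mapping looks like this (toy sizes):

import torch

# C[m, n, p] = sum_k A[m, k, p] * B[k, n], batched over the last mode p
m, k, n, p = 32, 64, 48, 16
A, B = torch.randn(m, k, p), torch.randn(k, n)

C_ref = torch.einsum('mkp,kn->mnp', A, B)

# The same contraction as one batched GEMM: view A as p matrices of shape
# (m, k) and broadcast B across the batch; no explicit copies of B are made.
C_bmm = torch.bmm(A.permute(2, 0, 1), B.expand(p, k, n)).permute(1, 2, 0)

print(torch.allclose(C_ref, C_bmm, atol=1e-4))  # True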
Tensors in Time Series
https://github.com/awslabs/amazon-sagemaker-examples
Tensors for long-term forecasting
Difficulties in long-term forecasting:
• Long-term dependencies
• High-order correlations
• Error propagation
RNNs: First-Order Markov Models
Input state $x_t$, hidden state $h_t$, output $y_t$:
$h_t = f(x_t, h_{t-1}; \theta)$, $y_t = g(h_t; \theta)$
Tensorized RNNs
L-order Markov:
$h_t = f(x_t, h_{t-1}, h_{t-2}, \ldots, h_{t-L}; \theta)$, $y_t = g(h_t; \theta)$
Polynomial interactions:
$s_{t-1} = [1, h_{t-1}, h_{t-2}, \ldots, h_{t-L}]$
$h_{t,\alpha} = f\big( W^{hx} x_t + \sum_{i_1, \ldots, i_P} \mathcal{W}_{\alpha, i_1, \ldots, i_P}\, s_{t-1; i_1} \otimes \cdots \otimes s_{t-1; i_P};\ \theta \big)$
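A minimal sketch of one such update for polynomial degree P = 2 (toy dimensions, my own simplification of the formula above):

import torch

# One tensorized-RNN step with Markov order L and degree-2 interactions
d, L = 8, 3
W_hx = torch.randn(d, d)                  # input-to-hidden weights
W = torch.randn(d, 1 + L * d, 1 + L * d)  # core tensor for the degree-2 term

def step(x_t, h_hist):
    # s_{t-1} = [1, h_{t-1}, ..., h_{t-L}], concatenated into one vector
    s = torch.cat([torch.ones(1)] + h_hist)
    # contract the core tensor with s outer s (the degree-2 polynomial term)
    poly = torch.einsum('aij,i,j->a', W, s, s)
    return torch.tanh(W_hx @ x_t + poly)

h_hist = [torch.zeros(d) for _ in range(L)]
h_t = step(torch.randn(d), h_hist)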
Tensor-Train Decomposition
• Reduce # of parameters
• Linear tensor network
• Dimension-free decomposition
$\mathcal{W}_{i_1 \cdots i_P} = \sum_{\alpha_0, \ldots, \alpha_P} \mathcal{A}_{\alpha_0 i_1 \alpha_1} \cdots \mathcal{A}_{\alpha_{P-1} i_P \alpha_P}$
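To make the parameter saving concrete, a small NumPy sketch (toy shapes of my choosing) that rebuilds the full tensor from TT cores and compares entry counts:

import numpy as np

# Reconstruct a full tensor from tensor-train cores A_p of shape
# (r_{p-1}, n_p, r_p), with boundary ranks r_0 = r_P = 1
def tt_to_full(cores):
    full = cores[0]
    for core in cores[1:]:
        # contract trailing rank index with the next core's leading rank index
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(0).squeeze(-1)  # drop the boundary ranks (both 1)

cores = [np.random.rand(1, 4, 2), np.random.rand(2, 5, 3), np.random.rand(3, 6, 1)]
W = tt_to_full(cores)
print(W.shape, W.size, sum(c.size for c in cores))  # (4, 5, 6) 120 56

Here the cores hold 56 parameters versus 120 entries in the full tensor, and the gap grows exponentially with the order P.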
Tensor-Train RNNs and LSTMs
• Seq2seq architecture
• TT-LSTM cells

Tensor LSTM for Long-term Forecasting
[Figure: forecasting results on the Traffic and Climate datasets]
Approximation Guarantees
Theorem 1 [Yu et al. 2017]: Let the state transition function $f \in \mathcal{H}_\mu^k$ be Hölder continuous, with bounded derivatives up to order $k$ and finite Fourier magnitude distribution $C_f$. Then a single-layer Tensor-Train RNN can approximate $f$ with an estimation error of $\varepsilon$ using $h$ hidden units, where
$h \le \frac{C_f^2}{\varepsilon}\,(d - 1)\,\frac{(r + 1)^{-(k - 1)}}{k - 1} + \frac{C_f^2}{\varepsilon}\,C(k)\,p^{-k}$
and $d$ is the size of the state space, $r$ is the tensor-train rank, and $p$ is the degree of the higher-order polynomials, i.e., the order of the tensor.
• Easier to approximate if the function is smooth
• Polynomial interactions are more efficient
Long-term Forecasting using Tensor-Train RNNs, by R. Yu, S. Zheng, A. Anandkumar, and Y. Yue, 2017.
Visual Question & Answering
Tensors for multiple modalities

Visual Question & Answering
Tensor Sketching Algorithms
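The deck gives no code here; as an illustration, below is the classic count-sketch trick (in the style of Pham & Pagh, as used in multimodal compact bilinear pooling) for sketching the outer product of two modality features without materializing it. Sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d, D = 128, 512  # feature dimension and sketch dimension

def make_count_sketch(d, D, rng):
    h = rng.integers(0, D, size=d)        # random hash bucket per coordinate
    s = rng.choice([-1.0, 1.0], size=d)   # random sign per coordinate
    def sketch(x):
        out = np.zeros(D)
        np.add.at(out, h, s * x)          # signed scatter-add into buckets
        return out
    return sketch

sk_img, sk_txt = make_count_sketch(d, D, rng), make_count_sketch(d, D, rng)
v_img, v_txt = rng.standard_normal(d), rng.standard_normal(d)

# Sketch of the outer product v_img (x) v_txt, computed as a circular
# convolution of the two sketches via FFT, never forming the d*d matrix
z = np.fft.irfft(np.fft.rfft(sk_img(v_img)) * np.fft.rfft(sk_txt(v_txt)), n=D)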
Tensors for Topic Modeling
https://github.com/awslabs/amazon-sagemaker-examples
Tensors for Topic Detection
[Figure: example topics include Government, Information Technology, and Politics]
LDA Topic Model
[Figure: documents as mixtures of topics such as Justice, Education, and Sports, with stemmed topic words like brain, comput, data, evolve, gene, neuron]
Learning LDA Model
Topic-word matrix: P[word = i | topic = j]
Topic proportions: P[topic = j | document]
Moment tensor: co-occurrence of word triplets
[Diagram: moment tensor decomposed as a sum of rank-1 components, one per topic (e.g., Crime, Sports, Education)]
Spectral Decomposition
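A minimal sketch of this spectral pipeline (synthetic data, and omitting the LDA-specific moment corrections, so purely illustrative), assuming a recent TensorLy where parafac returns a (weights, factors) pair:

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Toy corpus: 500 documents over a 20-word vocabulary (synthetic counts)
X = np.random.poisson(1.0, size=(500, 20)).astype(float) + 1e-6
P = X / X.sum(axis=1, keepdims=True)       # per-document word frequencies

# Empirical third-order moment: triple co-occurrence of words
M3 = np.einsum('di,dj,dk->ijk', P, P, P) / X.shape[0]

# CP decomposition: each rank-1 component estimates one topic
weights, factors = parafac(tl.tensor(M3), rank=5)
topic_word = factors[0]   # each column ~ one topic's word distribution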
Why Tensors?
Statistical reasons:
• Incorporate higher order relationships in data
• Discover hidden topics (not possible with matrix methods)
Computational reasons:
• Tensor algebra is parallelizable, like linear algebra.
• Faster than other algorithms for LDA
• Flexible: training and inference decoupled
• Guaranteed in theory to converge to the global optimum
A. Anandkumar et al., Tensor Decompositions for Learning Latent Variable Models, JMLR 2014.
Spectral LDA on AWS SageMaker
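For reference, a rough sketch of launching the built-in LDA estimator with the v1-era SageMaker Python SDK (the role ARN and training matrix are placeholders; newer SDK versions rename the train_instance_* parameters to instance_*):

import sagemaker
from sagemaker import LDA  # built-in LDA estimator (SDK v1 naming)

session = sagemaker.Session()

lda = LDA(role='<your-iam-role-arn>',        # placeholder IAM role
          train_instance_count=1,
          train_instance_type='ml.c4.2xlarge',
          num_topics=10,
          sagemaker_session=session)

# train_matrix would be your (documents x vocabulary) bag-of-words array;
# record_set converts it to the RecordIO format the algorithm expects:
# lda.fit(lda.record_set(train_matrix))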
SageMaker LDA training is faster
[Chart: training time for NYTimes (300,000 documents), 5 to 100 topics, Spectral vs. Mallet time in minutes; 22x faster on average]
[Chart: training time for PubMed (8 million documents), 5 to 100 topics, Spectral vs. Mallet time in minutes; 12x faster on average]
• Mallet is an open-source framework for topic modeling
• Mallet does training and inference together
• Benchmarks run on the AWS SageMaker platform
Introducing Amazon SageMaker
The quickest and easiest way to get ML models from idea to production
• End-to-end machine learning platform
• Zero setup
• Flexible model training
• Pay by the second
Many optimized algorithms on SageMaker
• XGBoost, FM, and Linear for classification and regression
• K-means and PCA for clustering and dimensionality reduction
• Image classification with convolutional neural networks
• LDA and NTM for topic modeling, seq2seq for translation
AWS ML Stack
Application Services
• Vision: Rekognition Image, Rekognition Video
• Speech: Polly, Transcribe
• Language: Lex, Translate, Comprehend
Platform Services
• Amazon Machine Learning, Mechanical Turk, Spark & EMR, Amazon SageMaker, AWS DeepLens
Frameworks & Infrastructure
• AWS Deep Learning AMI
• Frameworks: Apache MXNet, PyTorch, Cognitive Toolkit, Keras, Caffe2 & Caffe, TensorFlow, Gluon
• Hardware: GPU (P3 Instances), CPU (C5 Instances), Mobile, IoT (Greengrass)
Conclusion
• Tensors are higher-order extensions of matrices.
• They encode data dimensions, modalities, and higher order relationships.
• Tensor algebra is richer than matrix algebra: richer neural network architectures, and more compact networks with better accuracy.
• Tensor operations are embarrassingly parallel, motivating new primitives for tensor operations.
THANK YOU!
Editor's Notes
1. End-to-End Machine Learning Platform: Amazon SageMaker offers a familiar integrated development environment so that you can start processing your training dataset and developing your algorithms immediately. With one-click training, Amazon SageMaker provides a distributed training environment complete with high-performance machine learning algorithms and built-in hyperparameter optimization for auto-tuning your models. When you're ready to deploy, launching a secure and elastically scalable production environment is as simple as clicking a button in the Amazon SageMaker management console.
Zero Setup: Amazon SageMaker provides hosted Jupyter notebooks that require no setup, so you can begin processing your training datasets and developing your algorithms immediately. With a few clicks in the Amazon SageMaker console, you can create a fully managed notebook instance, pre-loaded with useful libraries for machine learning and deep learning frameworks like TensorFlow and Apache MXNet. You need only add your data.
Flexible Model Training: With native support for bring-your-own algorithms and frameworks, model training in Amazon SageMaker is flexible. Amazon SageMaker provides native Apache MXNet and TensorFlow support, and offers a range of built-in, high-performance machine learning algorithms, in addition to supporting popular open-source algorithms. If you want to train against another algorithm or with an alternative deep learning framework, you simply bring your own algorithms or deep learning frameworks via a Docker container.
Pay by the Second: With Amazon SageMaker, you pay only for what you use. Authoring, training, and hosting are billed by the second, with no minimum fees and no upfront commitments. Pricing within Amazon SageMaker is broken down by on-demand ML instances, ML storage, and fees for data processing in notebooks and hosting instances.
2. The result of this is:
1) Linear Learner - Regression
2) Linear Learner - Classification
3) K-means
4) Principal Component Analysis
5) Factorization Machines
6) Neural Topic Modeling
7) Latent Dirichlet Allocation
8) XGBoost
9) Seq2Seq
10) Image classification (ResNet)