oneDNN Graph API extends oneDNN with a graph interface that reduces deep learning integration costs and maximizes compute efficiency across a variety of AI hardware, including AI accelerators. Get started on your AI Developer Journey @ software.intel.com/ai.
2. Deep Learning Trends
Diverse and rapidly evolving:
• Deep learning steps: Training, Inference
• Data precision: FP32, BFloat16, INT8
• Topologies: Computer Vision (ResNet-50, SqueezeNet, MobileNet), Natural Language Processing (GNMT, BERT), Recommendation Systems (NCF, Wide & Deep), Reinforcement Learning (MiniGo)
• Frameworks
3. The driving forces of AI Optimization
• Diversifying AI applications: Computer Vision, Natural Language Processing, and Recommendation Engines all build on conv operations (conv: General Matrix Multiply)
• Hardware acceleration for AI: CPU + DL acceleration, GPU + DL acceleration, and dedicated accelerators
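The note "conv: General Matrix Multiply" refers to lowering convolution onto a matrix multiply, which is why GEMM hardware also accelerates conv. A minimal im2col sketch (NumPy, no padding, stride 1; the function name is illustrative and not from oneDNN):

```python
import numpy as np

def conv2d_as_gemm(x, w):
    """Direct 2D convolution (no padding, stride 1) lowered to one GEMM
    via im2col, illustrating why conv inherits GEMM acceleration."""
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    # im2col: each output position becomes one row of flattened patch values
    cols = np.empty((oh * ow, kh * kw), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    # The convolution is now a single matrix multiply with the flat kernel.
    return (cols @ w.ravel()).reshape(oh, ow)
```

Real libraries avoid materializing the im2col buffer, but the mapping to GEMM is the same.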
4. Deep learning workload time breakdown
• Accelerating matrix multiplication alone doesn't solve the problem
• Conv and MatMul operations are less dominant beyond computer vision applications
• Low precision introduces memory-bound quantize operations
• Amdahl's law: overall speedup is limited by the time spent in unaccelerated operations
• Need to have aggressive fusion
*Profiling data collected from internal performance study
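The Amdahl's-law point can be made concrete with the standard formula (not from the slides): if a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 − p) + p / s).

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# If matmul is 60% of the time and is accelerated 10x, the workload as a
# whole only speeds up by about 2.17x -- the remaining 40% dominates.
print(amdahl_speedup(0.6, 10.0))  # ≈ 2.17
```

This is why fusing the surrounding memory-bound ops matters as much as accelerating the matmul itself.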
5. Accelerating Matrix Multiplication
[Figure: Matrix A (M×K) multiplied by Matrix B (K×N) produces Matrix C (M×N), shown first as scalar dot products, then as dot products computed with hardware matrix operations, with a potential fusion function applied to the output.]
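As a sketch of the "potential fusion function" idea, here is a naive matmul with a ReLU fused into the same loop, so C is written to memory only once instead of being re-read for a separate activation pass (illustrative only; real kernels are blocked and vectorized):

```python
import numpy as np

def matmul_fused_relu(A, B):
    """C = relu(A @ B), with the activation fused into the accumulation
    loop so the M x N output is written exactly once."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.empty((M, N), dtype=A.dtype)
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[m, k] * B[k, n]
            C[m, n] = max(acc, 0.0)  # fused epilogue (ReLU as the example)
    return C
```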
7. Limitation of Pattern Match
• Different frameworks pass different graph representations for Gelu, so a Gelu fusion pattern that matches one framework's passing graph is too rigid to match another's.
[Figure: two frameworks' passing-graph representations of Gelu, each lowered to a different operator subgraph.]
• Small patterns miss optimizations in large graphs: matching conv + relu pairs one at a time forces each intermediate output (Output0, Output1) back to NHWC, while matching the whole conv → relu → conv → relu → conv → relu chain lets the intermediates stay in an optimized blocked layout, converting only the final Output2 back to NHWC.
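The Gelu mismatch arises because frameworks lower Gelu to different operator subgraphs. Two common decompositions (standard formulas, not taken from the slides) illustrate this:

```python
import math

def gelu_erf(x):
    # "Exact" form: 0.5 * x * (1 + erf(x / sqrt(2)))
    # Lowers to a graph of: div, erf, add, mul, mul.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3)))
    # Lowers to a different graph: pow, mul, add, mul, tanh, add, mul, mul.
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Both subgraphs compute nearly identical values, but a fusion pass keyed to one decomposition will not recognize the other — exactly the rigidity the slide describes.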
8. oneDNN is evolving…
• Graph API allows the HW backend to maximize performance
• Same integration for multiple AI HW: CPU, GPU, and accelerators
• Today: deep learning frameworks integrate oneDNN through the Primitives API, which covers CPU + DL acceleration and GPU + DL acceleration; dedicated HW accelerators require separate integration.
• Future: deep learning frameworks integrate oneDNN through the Primitives API + Graph API, covering CPU + DL acceleration, GPU + DL acceleration, and HW accelerators through one interface.
10. oneDNN Graph API Usage
[Diagram: a DL framework passes its graph to the oneDNN Graph API, which rewrites the framework graph into partitions that execute within the framework runtime context.]
• Leverage oneDNN-based framework integration and the oneDNN implementation (CPU: Intel®, Arm; GPU: Intel®, NVIDIA GPU)
• Leverage oneDNN-based framework integration and bring your own implementation based on the oneDNN Graph backend API (other implementations for accelerators)
• Unified API for DL acceleration libraries targeting AI HWs
* Other names and brands may be claimed as the property of others.
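The flow on this slide — framework graph in, partitions out, graph rewrite, execution in the framework runtime context — can be sketched with a toy partitioner. This is conceptual Python only; the names and the single conv + relu fusion rule are illustrative, not the real oneDNN Graph API:

```python
# Conceptual sketch of the usage flow: (1) the framework passes its ops to
# the library, (2) the library groups fusible ops into partitions, (3) the
# framework rewrites its graph around those partitions, (4) each partition
# is compiled and executed in the framework runtime context.

FUSIBLE_CHAINS = [("conv", "relu")]  # assumed fusion rule, for illustration

def get_partitions(ops):
    """Group a linear op list into partitions, fusing known chains."""
    partitions, i = [], 0
    while i < len(ops):
        fused = False
        for chain in FUSIBLE_CHAINS:
            if tuple(ops[i:i + len(chain)]) == chain:
                partitions.append(list(chain))  # one fused partition
                i += len(chain)
                fused = True
                break
        if not fused:
            partitions.append([ops[i]])  # single-op partition
            i += 1
    return partitions
```

Because the backend decides what lands in each partition, the same framework integration works whether the backend is a CPU, a GPU, or a vendor accelerator.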
11. Industry Momentum
• oneDNN implementation ported to the Fugaku A64FX CPU
• Optimized for the Armv8-A architecture and the SVE instruction set
• 9.3x speedup for TensorFlow ResNet-50 training and 7.8x for inference on A64FX
https://github.com/oneapi-src/oneDNN
https://blog.fltech.dev/entry/2020/11/19/fugaku-onednn-deep-dive-en
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
12. Call to action
• Join us on this journey:
• Hardware developers: read, provide feedback, and adopt oneDNN Graph for XPU computing!
https://spec.oneapi.com/onednn-graph/latest/
https://github.com/oneapi-src/oneDNN/tree/dev-graph
• Check out www.oneAPI.com for the oneAPI specification
• Software developers: try out oneAPI in the Intel DevCloud
https://software.intel.com/content/www/us/en/develop/tools/devcloud.html