The future of model efficiency for edge AI
September 21, 2022
Chirag Patel, Engineer, Principal/Manager, Qualcomm AI Research
Tijmen Blankevoort, Director of Engineering, Qualcomm AI Research
@QCOMResearch
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
Our presenters
Chirag Patel, Engineer, Principal/Manager, Qualcomm AI Research
Tijmen Blankevoort, Director, Engineering, Qualcomm AI Research
Agenda
1 Why model efficiency is important for on-device AI
2 Overview of integer quantization (INT) versus floating point (FP)
3 Improving low-bit quantization
4 Open-source tools: AI Model Efficiency Toolkit (AIMET) and AIMET Model Zoo
5 Questions?
AIMET and AIMET Model Zoo are products of Qualcomm Innovation Center, Inc.
AI is being used all around us
increasing productivity, enhancing collaboration, and transforming industries
• Smartphone
• Smart homes
• Video conferencing
• Video monitoring
• Extended reality
• Smart cities
• Smart factories
• Autonomous vehicles
Deep neural networks are energy hungry and growing fast
AI is being powered by the explosive growth of deep neural networks
Figure: weight parameter count (log scale, 10^0 to 10^14) versus year (1940 to 2030). Source: Welling
• 1943: First NN (N ≈ 10)
• 1988: NetTalk (N ≈ 20K)
• 2009: Hinton’s Deep Belief Net (N ≈ 10M)
• 2013: Google/Y! (N ≈ 1B)
• 2017: Very large neural networks (N = 137B)
• 2021: Extremely large neural networks (N = 1.6T)
• 2025: N = 100T = 10^14
Will we have reached the capacity of the human brain?
Energy efficiency of the human brain is estimated to be 100,000x better than current hardware
Power and thermal efficiency are essential for on-device AI
The challenge of AI workloads
• Very compute intensive
• Large, complicated neural network models
• Complex concurrencies
• Always-on
• Real-time
Constrained mobile environment
• Must be thermally efficient for sleek, ultra-light designs
• Storage/memory bandwidth limitations
• Requires long battery life for all-day use
Holistic model efficiency research
Multiple axes to shrink AI models and efficiently run them on hardware
• Quantization: learning to reduce bit-precision while keeping desired accuracy
• Conditional compute: learning to execute only parts of a large inference model based on the input
• Neural architecture search: learning to design smaller neural networks that are on par with or outperform hand-designed architectures on real hardware
• Compilation: learning to compile AI models for efficient hardware execution
Leading AI research and fast commercialization
Driving the industry towards integer inference and power-efficient AI
Quantization research
• Relaxed Quantization (ICLR 2019)
• Data-free Quantization (ICCV 2019)
• AdaRound (ICML 2020)
• Bayesian Bits (NeurIPS 2020)
• Joint Pruning and Quantization (ECCV 2020)
• Transformer Quantization (EMNLP 2021)
• Overcoming Oscillations (ICML 2022)
• FP8 Quantization (NeurIPS 2022)
Quantization open-sourcing
• AI Model Efficiency Toolkit (AIMET)
• AIMET Model Zoo
Leading research to efficiently quantize AI models
Promising results show that low-precision integer inference can become widespread
Virtually the same accuracy between an FP32 and a quantized AI model through:
• Automated, data-free, post-training methods
• Automated training-based mixed-precision method
Significant performance per watt improvements through quantization
Automated reduction in precision of weights and activations while maintaining accuracy
Models trained at high precision: 32-bit floating point (example value 3452.3194)
Inference at lower precision, with the increase in performance per watt from savings in memory and compute (1):
• 16-bit integer (example value 3452): up to 4X
• 8-bit integer (example value 255): up to 16X
• 4-bit integer (example value 15): up to 64X
1: FP32 model compared to quantized model
What does it mean to quantize a neural network?
Weight and activation quantization can have different bit-precisions to maintain accuracy
• Simulated quantization ops are added in the neural network after each usage of weights and activations, and after every ‘operation’
• Quantization is generally simulated in floating point instead of running in integer math
• Weights and activations can be quantized with the same or different precisions within a model layer
• For example, W8A16 uses quantized 8-bit weights and 16-bit activations. INT8 means quantized 8-bit weights and 8-bit activations.
Diagram labels: Input, Conv / FC with Weights (Wt quant) and Biases, + ReLU, Act quant, Output; a quantization bit-width is chosen for the weights and for the activations
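The simulated quantization described on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch, not AIMET code: the function names `fake_quantize` and `linear_relu_sim` are hypothetical, the range is set with a simple symmetric min-max rule, and one scale is shared by the whole weight tensor.

```python
def fake_quantize(values, num_bits):
    """Quantize floats to a signed integer grid, then map them back to floats."""
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0  # symmetric min-max range (assumption)
    return [min(max(round(v / scale), q_min), q_max) * scale for v in values]


def linear_relu_sim(x, weights, w_bits=8, a_bits=16):
    """Fully connected layer + ReLU with simulated W8A16 quantization."""
    n_in = len(x)
    x_q = fake_quantize(x, a_bits)  # act quant on the layer input
    flat_w = fake_quantize([w for row in weights for w in row], w_bits)  # wt quant
    w_q = [flat_w[i * n_in:(i + 1) * n_in] for i in range(len(weights))]
    y = [max(0.0, sum(w * a for w, a in zip(row, x_q))) for row in w_q]
    return fake_quantize(y, a_bits)  # act quant on the layer output


x = [0.5, -1.25, 2.0]
weights = [[0.1, -0.2, 0.3], [0.7, 0.05, -0.4]]
y_sim = linear_relu_sim(x, weights)
y_ref = [max(0.0, sum(w * a for w, a in zip(row, x))) for row in weights]
print("simulated W8A16:", y_sim)
print("float reference:", y_ref)
```

All arithmetic stays in floating point; only the values are snapped to the integer grid, which is what "simulated in floating point instead of running in integer math" refers to.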
What algorithm to choose to improve accuracy?
Post-training quantization (PTQ)
• Take pre-trained FP32 model and convert it directly into fixed-point network
• No need for the original training pipeline
• Data-free or small (unlabeled) calibration set needed
• Simple usage (⇔ single API call)
• Might not reach as high accuracy as QAT
Quantization-aware training (QAT)
• Train/fine-tune the network with the simulated quantization operations in place
• Requires access to the training pipeline and labelled data
• Longer training times
• Hyper-parameter tuning
• Achieves higher accuracy, especially for lower bit-widths
Which is the better format for quantizing neural networks?
Floating point vs integer
INT8 and FP8 have the same number of values but different distributions
Multiple FP8 formats exist, and they consume more power than INT8
𝑠: scale; 𝑚: mantissa; 𝑒: exponent; S: sign
INT: 𝑧 = 𝑠 ⋅ 𝑚
FP: 𝑧 = 𝑠 ⋅ 𝑚 ⋅ 2^𝑒
Formats: bit layouts shown as mantissa/exponent bits after the sign bit S (7/0, 6/1, 5/2, 4/3, 3/4, 2/5), labeled INT8, FP8 5/2, FP8 4/3, FP8 3/4, and FP8 2/5
The slide marks the formats most commonly proposed in the industry
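To make "same number of values but different distributions" concrete, the sketch below enumerates the values of an 8-bit float with a given mantissa/exponent split and compares them with the INT8 grid. The encoding is a simplifying assumption for illustration (scale 𝑠 = 1, exponent bias 2^(e−1), subnormals included, no codes reserved for infinity or NaN); real FP8 proposals differ in these details, and `fp8_values` is a hypothetical helper name.

```python
def fp8_values(mantissa_bits, exponent_bits):
    """Enumerate all values of a sign + exponent + mantissa 8-bit float.

    Assumed encoding: bias = 2**(exponent_bits - 1), subnormals when the
    exponent field is zero, no infinities or NaNs.
    """
    bias = 2 ** (exponent_bits - 1)
    values = set()
    for sign in (1.0, -1.0):
        for e in range(2 ** exponent_bits):
            for m in range(2 ** mantissa_bits):
                frac = m / 2 ** mantissa_bits
                if e == 0:
                    magnitude = frac * 2.0 ** (1 - bias)        # subnormal
                else:
                    magnitude = (1.0 + frac) * 2.0 ** (e - bias)  # normal
                values.add(sign * magnitude)
    return sorted(values)


int8_grid = list(range(-128, 128))
fp8_4m3e = fp8_values(mantissa_bits=4, exponent_bits=3)

print("INT8 values:", len(int8_grid), "FP8 4/3 values:", len(fp8_4m3e))
print("FP8 4/3 gap near zero:", fp8_4m3e[128] - fp8_4m3e[127])
print("FP8 4/3 gap near max: ", fp8_4m3e[-1] - fp8_4m3e[-2])
```

Under these assumptions the float grid has 255 distinct values (+0 and −0 coincide) against 256 for INT8, but its spacing grows with magnitude, whereas the integer grid is uniform.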
Most layers of models do not have large outliers
FP8 may be useful for model layers with large outliers
Figure: signal-to-noise ratio (SNR, higher is better) of INT8, 5M2E, 4M3E, 3M4E, and 2M5E for three weight distributions: Uniform, Normal (some outliers), and Outlier-heavy
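A rough illustration of why outliers matter for a uniform grid: the sketch below measures the SNR of INT8 min-max quantization on normally distributed samples, with and without a single large outlier. It is a simplified stand-in and does not reproduce the figure; the sample count, seed, and outlier value are arbitrary assumptions.

```python
import math
import random


def int8_snr_db(samples):
    """SNR in dB of symmetric min-max INT8 quantization of the samples."""
    scale = max(abs(v) for v in samples) / 127
    noise = sum((v - round(v / scale) * scale) ** 2 for v in samples)
    signal = sum(v * v for v in samples)
    return 10 * math.log10(signal / noise)


rng = random.Random(0)
normal = [rng.gauss(0.0, 1.0) for _ in range(1000)]
with_outlier = normal + [100.0]  # one large outlier stretches the range

snr_normal = int8_snr_db(normal)
snr_outlier = int8_snr_db(with_outlier)
print(f"SNR without outlier: {snr_normal:.1f} dB")
print(f"SNR with outlier:    {snr_outlier:.1f} dB")
```

The outlier forces a much larger step size, so the bulk of the values lose resolution. A float grid keeps small steps near zero, which is the case where FP8 may help.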
Several FP8 formats are required to get the best PTQ inference results
For different networks, different formats are better — it depends on the amount of outliers
Supporting multiple formats in hardware is expensive
Model                     FP32     Best FP8 result   Worst FP8 result
ResNet18                  69.72%   69.66%            64.92%
MobileNetV2               71.70%   71.06%            49.51%
BERT                      83.06    82.80             71.56
SalsaNext                 55.80    55.67             55.12
HRNet                     81.05    81.04             80.77
DeepLabV3 (MobileNetV2)   72.91    72.58             37.93
ViT                       77.75%   77.71%            76.69%
Best and worst FP8 formats per model, shown on the slide as mantissa/exponent bit-count icons (in extraction order): 5/2, 5/2, 3/4, 4/3, 5/2, 4/3, 4/3, 2/5, 3/4, 3/4, 3/4, 2/5, 5/2, 2/5
“FP8 Quantization: The Power of the Exponent”, NeurIPS 2022
INT8 has similar results as FP8 with QAT
Outliers can be suitably trained with QAT
Model          FP32     PTQ INT8   PTQ Best FP8*   QAT INT8   QAT Best FP8*
ResNet         69.72%   69.55%     69.66%          70.43%     69.82%
MobileNet V2   71.70%   70.94%     71.06%          71.82%     71.54%
BERT           83.06    71.03      82.80           83.26      83.70
DeepLabV3      72.91    71.24      72.58           73.99      72.41
*: Best FP8 is the best result from testing the different FP8 formats.
• No PTQ tricks – Per-channel
• All QAT results per-tensor quantization
• FP8 mantissa and exponent format was optimized for this comparison
“FP8 Quantization: The Power of the Exponent”, NeurIPS 2022
INT W8A16 accuracy is better than FP8 for all models with PTQ
No real gap FP8/INT8
Model         FP32     INT (W8A8)   INT (W8A16)   Best FP8 result
ResNet18      69.72%   69.55%       69.75%        69.66%
HRNet         81.05    80.93        81.08         81.04
Remaining models
Model         FP32     INT (W8A8)   INT (W8A16)   Best FP8 result
BERT          83.06    71.03        82.90         82.80
SalsaNext     55.80    54.22        55.82         55.67
ViT           77.75%   76.39%       77.73%        77.71%
MobileNetV2   69.72%   69.55%       69.75%        69.66%
• Min-max range setting
• Per-channel quantization
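The two settings named above, min-max range setting and per-channel quantization, can be illustrated with a small standard-library sketch. The weights are synthetic and the helper names (`minmax_scale`, `quantize_row`) are hypothetical; the point is only that one scale per output channel helps when channel ranges differ widely.

```python
import random


def minmax_scale(values, q_max=127):
    """Symmetric min-max range setting: map the largest magnitude to q_max."""
    return max(abs(v) for v in values) / q_max


def quantize_row(values, scale, q_max=127):
    """Round to the INT8 grid defined by scale, then map back to floats."""
    return [max(-q_max - 1, min(q_max, round(v / scale))) * scale for v in values]


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


rng = random.Random(0)
# Two output channels with very different weight ranges (synthetic data).
weights = [[rng.gauss(0.0, 0.01) for _ in range(64)],
           [rng.gauss(0.0, 1.0) for _ in range(64)]]
flat = [w for row in weights for w in row]

tensor_scale = minmax_scale(flat)                        # one scale for all weights
per_tensor = [quantize_row(row, tensor_scale) for row in weights]
per_channel = [quantize_row(row, minmax_scale(row)) for row in weights]  # one scale per row

err_tensor = mse(flat, [w for row in per_tensor for w in row])
err_channel = mse(flat, [w for row in per_channel for w in row])
print("per-tensor MSE: ", err_tensor)
print("per-channel MSE:", err_channel)
```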
INT16 performs better than FP16 unless there are large outliers
• 1,000 samples of 𝑋 ~ 𝑁𝑜𝑟𝑚𝑎𝑙(0, 1)
• We add one outlier and vary its value
• INT13 performs comparably to FP16 in terms of MSE
Figure: MSE (log scale, 10^−7 to 10^−4) versus outlier value (0 to 2000) for int16 and float16; MSE without activation function, sigma = 1.0, 100 neurons
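A simplified version of this experiment can be run with the Python standard library, using `struct`'s half-precision format code `e` to round values to FP16. Unlike the figure, the sketch measures the error directly on the samples rather than on the output of a 100-neuron layer; the seed and the chosen outlier values are assumptions.

```python
import random
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def compare(outlier, n=1000, seed=0):
    """Return (MSE of INT16, MSE of FP16) for N(0, 1) samples plus one outlier."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)] + [outlier]
    scale = max(abs(v) for v in x) / 32767  # symmetric min-max range for INT16
    x_int16 = [round(v / scale) * scale for v in x]
    x_fp16 = [to_fp16(v) for v in x]
    return mse(x, x_int16), mse(x, x_fp16)


for outlier in (5.0, 250.0, 2000.0):
    int16_mse, fp16_mse = compare(outlier)
    print(f"outlier={outlier:7.1f}  int16 MSE={int16_mse:.3e}  fp16 MSE={fp16_mse:.3e}")
```

With a small outlier the uniform INT16 grid has the lower error; as the outlier grows, the INT16 step size grows with it and FP16 takes over.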
INT16 outperforms FP16 in accuracy and runs faster in hardware
Model             FP32     FP16     INT16
MobileNetV2       71.74%   71.69%   71.74%
EfficientDet-D1   40.08    40.07    40.07
Integer quantization is the way to do AI inference
Enabled through PTQ and QAT techniques
Mixed precision gives the best of both worlds, using extra precision only when necessary
                               INT4   INT8     INT16
Power efficiency and latency   Best   Better   Good
Accuracy                       Good   Better   Best
Overcoming oscillations in quantization-aware training
Improving quantization-aware
training at lower bit-widths
Why do we see the validation accuracy drop for 4-bit QAT?
Validation accuracy for QAT is typically unstable
Poor validation accuracy is consistent across various learning rates and epochs during QAT
Figure: training accuracy and validation accuracy versus epoch (2 to 20) for different learning rates (LR)
“Overcoming Oscillations in Quantization-Aware Training” (ICML 2022)
Oscillations are present in QAT
Example of MobileNetV2 training (last 1000 iterations of training)
Figure, top panel: quantized weights 𝑞(𝑤) (sign and lowest bit, Bit0, of a 4-bit weight) versus iterations during QAT
Figure, bottom panel: latent FP weights 𝑤 (positive FP values zoomed in on 0.5, range 0.4996 to 0.5004) versus iterations during QAT
Oscillating weights are harmful when training a model
Corrupts batch norm statistics
• At inference, BN uses running statistics from training
• Oscillations lead to big changes in statistics, so the running statistics are not a good estimate
• Solution: BN re-estimation
Network       Bits   Pre-BN Acc.    Post-BN Acc.   Gain
MobileNetV2   8      71.79 ± 0.07   71.89 ± 0.05
MobileNetV2   4      68.99 ± 0.44   71.01 ± 0.05   +2.02
MobileNetV2   3      64.97 ± 1.23   69.50 ± 0.04   +4.53
Disrupts model convergence
• At the end of training, oscillating weights may not be on the correct ‘side’
• Stochastic rounding (SR) and binary optimization (AdaRound) show that they are indeed not in the best possible state
• Oscillations prevent the network from converging to the best local minimum
Method            Train Loss        Val. Acc. (%)
Baseline          1.3566            69.50
SR (mean ± std)   1.3547 ± 0.0053   69.58 ± 0.09
SR (best)         1.3391            69.85
AdaRound          1.3070            70.12
Higher oscillation frequencies during QAT negatively impact accuracy
An oscillation occurs when the integer value of a weight changes and the direction of the change is opposite to that of its previous change
The oscillation frequency is tracked with an EMA (EMA = Exponential Moving Average)
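The oscillation criterion above, together with an EMA of how often it fires, can be sketched as a small tracker. This is a minimal illustration based only on the definition on this slide: the class name `OscillationTracker` and the `momentum` value are assumptions, and the tracker works on plain lists of integer weights.

```python
class OscillationTracker:
    """Track per-weight oscillation frequency with an exponential moving average."""

    def __init__(self, num_weights, momentum=0.01):
        self.momentum = momentum             # EMA momentum (hypothetical default)
        self.prev_int = None                 # integer weights from the previous step
        self.prev_dir = [0] * num_weights    # direction of the last change (0 = none yet)
        self.freq = [0.0] * num_weights      # EMA of the oscillation indicator

    def update(self, w_int):
        """Feed the current integer weights; return per-weight oscillation flags."""
        if self.prev_int is None:
            self.prev_int = list(w_int)
            return [False] * len(w_int)
        flags = []
        for i, (new, old) in enumerate(zip(w_int, self.prev_int)):
            direction = (new > old) - (new < old)  # +1, -1, or 0
            # Oscillation: the integer value changed, opposite to its previous change.
            oscillated = direction != 0 and direction == -self.prev_dir[i]
            self.freq[i] = self.momentum * oscillated + (1 - self.momentum) * self.freq[i]
            if direction != 0:
                self.prev_dir[i] = direction
            flags.append(oscillated)
        self.prev_int = list(w_int)
        return flags


tracker = OscillationTracker(num_weights=2, momentum=0.5)
# Weight 0 flips between 0 and 1; weight 1 moves steadily upward.
for step in ([0, 0], [1, 1], [0, 2], [1, 3]):
    flags = tracker.update(step)
print("last flags:", flags)
print("frequencies:", tracker.freq)
```

Only the flipping weight accumulates a non-zero frequency; a weight that keeps moving in one direction never counts as oscillating.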
“Overcoming Oscillations in Quantization-Aware Training” (ICML 2022)
Oscillation dampening and iterative freezing fix the QAT issue
Dampening takes a regularizing approach: the weights are forced closer to the bin center
Freezing the oscillating weights stabilizes training and mitigates the unwanted effects of oscillations
Figure: histograms of the distance to the bin center, w_int − w/s (range −0.4 to 0.4), for dampening and for freezing (frozen versus not frozen weights)
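Both remedies can be sketched in plain Python. The dampening term below penalizes the squared distance between each latent weight and its quantized value (its bin center), with the latent weight clipped to the quantization range so that clipped weights are not penalized. The freezing rule simply marks weights whose oscillation frequency exceeds a threshold. Function names, the threshold, and the example numbers are illustrative assumptions, not the exact formulation or hyper-parameters of the cited paper.

```python
def dampening_loss(weights, scale, q_min, q_max):
    """Sum of squared distances between latent weights and their bin centers."""
    loss = 0.0
    for w in weights:
        w_int = min(max(round(w / scale), q_min), q_max)       # integer bin index
        w_clipped = min(max(w, scale * q_min), scale * q_max)  # stay inside the grid
        loss += (w_int * scale - w_clipped) ** 2
    return loss


def freeze_mask(frequencies, threshold):
    """Mark weights whose oscillation frequency exceeds the threshold."""
    return [f > threshold for f in frequencies]


# 4-bit signed grid with step 0.1 (illustrative values).
scale, q_min, q_max = 0.1, -8, 7
print(dampening_loss([0.1, -0.3], scale, q_min, q_max))  # weights at bin centers
print(dampening_loss([0.13], scale, q_min, q_max))       # weight off-center by 0.03
print(freeze_mask([0.75, 0.0], threshold=0.1))
```

In training, the dampening term would be added to the task loss with a weighting factor, and frozen weights would be held at their integer value for the remaining iterations.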
We achieve SOTA results for INT4 quantization (1)
• Train with learned step-size quantization (LSQ) (2) and re-estimation
• Dampening and freezing perform on par with each other
MobileNetV3
Method                         W/A     Val. Acc. (%)
Full-precision                 32/32   65.1
LSQ* (Esser et al., 2020)      4/4     61.0 (−4.1)
LSQ + BR (Han et al., 2021)    4/4     61.5 (−3.6)
LSQ + Dampen (ours)            4/4     63.7 (−1.4)
LSQ + Freeze (ours)            4/4     63.6 (−1.5)
MobileNetV2
Method                         W/A     Val. Acc. (%)
Full-precision                 32/32   71.7
LSQ* (Esser et al., 2020)      4/4     69.5 (−2.3)
LSQ + BR (Han et al., 2021)    4/4     70.4 (−1.4)
LSQ + Dampen (ours)            4/4     70.5 (−1.2)
LSQ + Freeze (ours)            4/4     70.6 (−1.1)
1: “Overcoming Oscillations in Quantization-Aware Training” (ICML 2022)
2: “Learned step size quantization” (ICLR 2020)
Open-source projects to scale energy-efficient AI to the masses
AIMET &
AIMET Model Zoo
AIMET makes AI models small
Open-sourced GitHub project that includes state-of-the-art quantization and compression techniques from Qualcomm AI Research
Features:
• State-of-the-art network compression tools
• State-of-the-art quantization tools
• Support for both TensorFlow and PyTorch
• Benchmarks and tests for many models
• Developed by professional software developers
If interested, please join the AIMET GitHub project: https://github.com/quic/aimet
AIMET: providing advanced model efficiency features and benefits
Workflow: Trained AI model (TensorFlow or PyTorch) → AI Model Efficiency Toolkit (AIMET): compression, quantization → Optimized AI model → Deployed AI model
Features
Quantization: state-of-the-art INT8 and INT4 performance
• Quantization simulation
• Quantization-aware training (QAT)
• Post-training quantization (PTQ) methods: Data-Free Quantization, Adaptive Rounding (AdaRound), Automatic Mixed Precision (AMP), AutoQuant
Compression: efficient tensor decomposition and removal of redundant channels in convolution layers
• Spatial singular value decomposition (SVD)
• Channel pruning
Visualization: analysis tools for drawing insights for quantization and compression
• Weight ranges
• Per-layer compression sensitivity
Benefits
• Lower memory bandwidth
• Lower power
• Lower storage
• Higher performance
• Maintains model accuracy
• Simple ease of use
AIMET features and APIs are easy to use
Designed to fit naturally in the AI model development workflow for researchers, developers, and ISVs
Typical model training workflow: PyTorch model → Train → Evaluate
User-friendly QAT workflow in AIMET: PyTorch model → Create QuantSim → Train (no change, same API) → Evaluate (no change, same API)
User-friendly APIs invoked directly from the existing model pipeline
Example Jupyter notebooks on AIMET GitHub
Architecture diagram labels: AIMET; extensions; model optimization library (techniques to compress & quantize models); framework-specific API; algorithm API; other frameworks
Low resolution Super resolution
First 4K
super-resolution
demo at 100+ FPS
on mobile
Our new machine-learning
based super-resolution method
8-bit quantized model created
using AIMET QAT
AIMET enables accurate INT W4A8 for a wide range of use cases
With better PTQ and QAT techniques, more models will achieve better power efficiency
Task               Model                  FP32       INT W4A8
Classification     ResNet50               76.10%     75.4%
Classification     ResNet18               69.75%     68.96%
Classification     EfficientNet-Lite      75.31%     74.33%
Classification     Regnext                78.3%      77.2%
Segmentation       Deeplabv3 (RN-50)      76.07%     75.91%
Super-resolution   ABPN                   31.97 dB   31.67 dB
Pose detection     PoseNet (HRNet-32)     0.765      0.763
AIMET quantizes transformers with high accuracy, comparable to FP32
Model                       Metric           FP32    Quantized
ViT base                    Top-1 accuracy   81.30   80.88 (INT8/W8A16, PTQ)
RoBERTa base                GLUE             84.99   84.60 (INT8, QAT)
BERT base (uncased)         GLUE             82.73   81.95 (INT8, QAT)
DistilBERT base (uncased)   GLUE             79.21   78.61 (INT8, QAT)
Comparison between FP32 model and model quantized with AIMET
AIMET Model Zoo
Accurate pre-trained 8-bit quantized models
• Image classification
• Semantic segmentation
• Pose estimation
• Speech recognition
• Object detection
• Super resolution
AIMET Model Zoo includes popular quantized AI models
Accuracy is maintained for INT8 models: less than 1% loss in accuracy*
Model                  Metric*          FP32     INT8
ResNet-50 (v1)         Top-1 accuracy   75.21%   74.96%
MobileNet-v2-1.4       Top-1 accuracy   75%      74.21%
EfficientNet Lite      Top-1 accuracy   74.93%   74.99%
ResNet-50 (v1)         mAP              0.2469   0.2456
RetinaNet              mAP              0.35     0.349
Pose estimation        mAP              0.383    0.379
SRGAN                  PSNR             25.45    24.78
MobileNetV2            Top-1 accuracy   71.67%   71.14%
EfficientNet-lite0     Top-1 accuracy   75.42%   74.44%
DeepLabV3+             mIoU             72.62%   72.22%
MobileNetV2-SSD-Lite   mAP              68.7%    68.6%
Pose estimation        mAP              0.364    0.359
SRGAN                  PSNR             25.51    25.5
DeepSpeech2            WER              9.92%    10.22%
ABPN                   PSNR             32.75    32.69
*: Comparison between FP32 model and INT8 model quantized with AIMET. For further details, check out: https://github.com/quic/aimet-model-zoo/
Super resolution model suite
Wide variety of models, suited for fast, energy-efficient INT8 inference
• Virtually no accuracy loss compared to FP32
• Simple and convenient for developer integration
• Useful across diverse applications, from gaming and photography to XR and autonomous driving
INT8 PSNR and visual quality comparable to FP32*
Model           PSNR (dB), FP32   PSNR (dB), INT8
ABPN (1)        32.71             32.64
XLSR (2)        32.57             32.30
SESR-M3 (3)     32.41             32.25
SESR-M7 (3)     32.66             32.58
SESR-XL (3)     33.03             32.92
1 Anchor-based Plain Net (ABPN)
2 Robust Real-Time Single-Image Super Resolution (XLSR)
3 Super-Efficient Super Resolution (SESR)
*: Comparison between FP32 model and INT8 model quantized with AIMET. For further details, check out: https://github.com/quic/aimet-model-zoo/
Explore our open-source projects and tools
AIMET
State-of-the-art quantization
and compression techniques
github.com/quic/aimet
AIMET Model Zoo
Accurate pre-trained
8-bit quantized models
github.com/quic/aimet-model-zoo
Quantization
whitepaper
arxiv.org/abs/2201.08442
AI Frameworks
AI Runtimes: Qualcomm® Neural Processing SDK, TF Lite, TF Lite Micro, Direct ML
Qualcomm® AI Engine Direct (QNN)
Infrastructure: Programming Languages, Virtual platforms, Core Libraries, Math Libraries, Profilers & Debuggers, Compilers, System Interface (SoC, accelerator drivers), Emulation Support
Tools: Qualcomm AI Model Studio, AIMET, AIMET Model Zoo, NAS, Model analyzers
Platforms: Smartphones, Auto, XR, Robotics, IoT, ACPC, Cloud
Qualcomm Neural Processing SDK, Qualcomm AI Model Studio, and Qualcomm AI Engine Direct are products of Qualcomm Technologies, Inc. and/or its subsidiaries
• Model efficiency is key for enabling on-device AI and accelerating the growth of the connected intelligent edge
• INT8/16 perform better than FP8/16
• Qualcomm AI Research is enabling 4-bit integer models
• AIMET is making fixed-point quantization possible at scale without sacrificing accuracy
Questions
Connect with us
@QCOMResearch
www.qualcomm.com/research/artificial-intelligence
www.qualcomm.com/news/onq
www.youtube.com/c/QualcommResearch
www.slideshare.net/qualcommwirelessevolution
Thank you
For more information, visit us at: qualcomm.com & qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2018-2022 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm, Hexagon, and Snapdragon are trademarks or registered trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.

Mais conteúdo relacionado

Mais procurados

Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Rakuten Group, Inc.
 
Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationQualcomm Research
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxDeep Learning Italia
 
Bringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingBringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingQualcomm Research
 
3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolutionQualcomm Research
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...Edge AI and Vision Alliance
 
Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16Qualcomm Research
 
Cellular V2X is Gaining Momentum
Cellular V2X is Gaining MomentumCellular V2X is Gaining Momentum
Cellular V2X is Gaining MomentumQualcomm Research
 
HPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのHPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのNVIDIA Japan
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware LandscapeGrigory Sapunov
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkNader Karimi
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 
Next Generation V2X Technology
Next Generation V2X TechnologyNext Generation V2X Technology
Next Generation V2X TechnologyMalik Saad
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]Dongmin Choi
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN FrameworkKeymate.AI
 
Onnx and onnx runtime
Onnx and onnx runtimeOnnx and onnx runtime
Onnx and onnx runtimeVishwas N
 

Mais procurados (20)

Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)
 
Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through Quantization
 
NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21 NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptx
 
Bringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingBringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensing
 
3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
 
Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16
 
Cellular V2X is Gaining Momentum
Cellular V2X is Gaining MomentumCellular V2X is Gaining Momentum
Cellular V2X is Gaining Momentum
 
HPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなのHPC+AI ってよく聞くけど結局なんなの
HPC+AI ってよく聞くけど結局なんなの
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 
NVIDIA @ AI FEST
NVIDIA @ AI FESTNVIDIA @ AI FEST
NVIDIA @ AI FEST
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 
Next Generation V2X Technology
Next Generation V2X TechnologyNext Generation V2X Technology
Next Generation V2X Technology
 
FPGAs and Machine Learning
FPGAs and Machine LearningFPGAs and Machine Learning
FPGAs and Machine Learning
 
Cloud, Fog & Edge Computing
Cloud, Fog & Edge ComputingCloud, Fog & Edge Computing
Cloud, Fog & Edge Computing
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
 
Onnx and onnx runtime
Onnx and onnx runtimeOnnx and onnx runtime
Onnx and onnx runtime
 

Semelhante a The future of model efficiency for edge AI: Overcoming oscillations in quantization-aware training

Technology overview
Technology overviewTechnology overview
Technology overviewvirtuehm
 
Introduction to embedded system & density based traffic light system
Introduction to embedded system & density based traffic light systemIntroduction to embedded system & density based traffic light system
Introduction to embedded system & density based traffic light systemRani Loganathan
 
IRJET- Android based Home Automation System with Power Optimization Modes
IRJET-  	  Android based Home Automation System with Power Optimization ModesIRJET-  	  Android based Home Automation System with Power Optimization Modes
IRJET- Android based Home Automation System with Power Optimization ModesIRJET Journal
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGATO project
 
Design Efficient Wireless Monitoring Platform for Recycling Point Spots
Design Efficient Wireless Monitoring Platform for Recycling Point SpotsDesign Efficient Wireless Monitoring Platform for Recycling Point Spots
Design Efficient Wireless Monitoring Platform for Recycling Point SpotsIJMTST Journal
 
IRJET- Design & Implementation of Black Box in Automobiles System
IRJET-  	  Design & Implementation of Black Box in Automobiles SystemIRJET-  	  Design & Implementation of Black Box in Automobiles System
IRJET- Design & Implementation of Black Box in Automobiles SystemIRJET Journal
 
IRJET- Data Acquisition using Tensile Strength Testing Machine
IRJET- Data Acquisition using Tensile Strength Testing MachineIRJET- Data Acquisition using Tensile Strength Testing Machine
IRJET- Data Acquisition using Tensile Strength Testing MachineIRJET Journal
 
Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...
Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...
Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...iosrjce
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Report on Embedded Based Home security system
Report on Embedded Based Home security systemReport on Embedded Based Home security system
Report on Embedded Based Home security systemNIT srinagar
 
09.50 Ernst Vrolijks
09.50 Ernst Vrolijks09.50 Ernst Vrolijks
09.50 Ernst VrolijksThemadagen
 
Arduino in TinyML with Edge Impulse
Arduino in TinyML with Edge ImpulseArduino in TinyML with Edge Impulse
Arduino in TinyML with Edge ImpulseRobocraze
 
Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...IOSR Journals
 
Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...IOSR Journals
 
Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...IOSR Journals
 
Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...Implementation of an Improved Microcontroller Based Moving Message Display Sy...
Implementation of an Improved Microcontroller Based Moving Message Display Sy...IOSR Journals
 
Proteus Simulation Based Pic Projects _ PIC Microcontroller.pdf
Proteus Simulation Based Pic Projects _ PIC Microcontroller.pdfProteus Simulation Based Pic Projects _ PIC Microcontroller.pdf
Proteus Simulation Based Pic Projects _ PIC Microcontroller.pdfIsmailkhan77481
 

The future of model efficiency for edge AI: Overcoming oscillations in quantization-aware training

  • 1. The future of model efficiency for edge AI. Presenters: Chirag Patel (Engineer, Principal/Manager, Qualcomm AI Research) and Tijmen Blankevoort (Director of Engineering, Qualcomm AI Research). September 21, 2022. @QCOMResearch. Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
  • 2. Our presenters: Chirag Patel (Engineer, Principal/Manager, Qualcomm AI Research) and Tijmen Blankevoort (Director, Engineering, Qualcomm AI Research). Agenda: (1) Why model efficiency is important for on-device AI; (2) Overview of integer quantization (INT) versus floating point (FP); (3) Improving low-bit quantization; (4) Open-source tools: AI Model Efficiency Toolkit (AIMET) and AIMET Model Zoo; (5) Questions? AIMET and AIMET Model Zoo are products of Qualcomm Innovation Center, Inc.
  • 3. AI is being used all around us, increasing productivity, enhancing collaboration, and transforming industries: smartphone, smart homes, video conferencing, autonomous vehicles, smart factories, smart cities, extended reality, video monitoring.
  • 4. AI is being powered by the explosive growth of deep neural networks. Deep neural networks are energy hungry and growing fast. Chart (source: Welling): weight parameter count (log scale, 10^0 to 10^14) versus year (1940 to 2030). 1943: first NN (N ≈ 10); 1988: NetTalk (N ≈ 20K); 2009: Hinton’s Deep Belief Net (N ≈ 10M); 2013: Google/Y! (N ≈ 1B); 2017: very large neural networks (N = 137B); 2021: extremely large neural networks (N = 1.6T); 2025: N = 100T = 10^14. Will we have reached the capacity of the human brain? Energy efficiency of the human brain is estimated to be 100,000x better than current hardware.
  • 5. Power and thermal efficiency are essential for on-device AI. The challenge of AI workloads: very compute intensive; large, complicated neural network models; complex concurrencies; always-on; real-time. Constrained mobile environment: must be thermally efficient for sleek, ultra-light designs; requires long battery life for all-day use; storage/memory bandwidth limitations.
  • 6. Holistic model efficiency research: multiple axes to shrink AI models and efficiently run them on hardware.
      • Quantization: learning to reduce bit-precision while keeping desired accuracy
      • Conditional compute: learning to execute only parts of a large inference model based on the input
      • Neural architecture search: learning to design smaller neural networks that are on par with or outperform hand-designed architectures on real hardware
      • Compilation: learning to compile AI models for efficient hardware execution
  • 7. Leading AI research and fast commercialization: driving the industry towards integer inference and power-efficient AI.
      • Quantization research: Relaxed Quantization (ICLR 2019), Data-free Quantization (ICCV 2019), AdaRound (ICML 2020), Bayesian Bits (NeurIPS 2020), Joint Pruning and Quantization (ECCV 2020), Transformer Quantization (EMNLP 2021), Overcoming Oscillations (ICML 2022), FP8 Quantization (NeurIPS 2022)
      • Quantization open-sourcing: AI Model Efficiency Toolkit (AIMET), AIMET Model Zoo
  • 8. Leading research to efficiently quantize AI models. Automated reduction in precision of weights and activations while maintaining accuracy. Models are trained at high precision (32-bit floating point, e.g. 3452.3194); inference runs at lower precision. Increase in performance per watt from savings in memory and compute, for an FP32 model compared to the quantized model:
      • 16-bit integer (e.g. 3452): up to 4X
      • 8-bit integer (e.g. 255): up to 16X
      • 4-bit integer (e.g. 15): up to 64X
      Significant performance per watt improvements through quantization. Promising results show that low-precision integer inference can become widespread. Virtually the same accuracy between an FP32 and a quantized AI model through:
      • Automated, data-free, post-training methods
      • Automated training-based mixed-precision method
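The "up to 4X / 16X / 64X" figures on slide 8 are consistent with a gain that grows with the square of the bit-width reduction relative to 32 bits. That quadratic reading is an assumption made here for illustration; the slide only attributes the gain to savings in memory and compute. A minimal sketch (the function name `quadratic_gain` is hypothetical):

```python
def quadratic_gain(bits, baseline_bits=32):
    """Upper-bound gain under the ASSUMPTION that cost scales with bits squared."""
    ratio = baseline_bits / bits  # how many times narrower than the baseline
    return ratio ** 2


for bits in (16, 8, 4):
    print(f"INT{bits}: up to {quadratic_gain(bits):.0f}X")
```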
  • 9. What does it mean to quantize a neural network? Weight and activation quantization can have different bit-precisions to maintain accuracy.
      • Simulated quantization ops are added in the neural network after each usage of weights and activations, and after every ‘operation’
      • Quantization is generally simulated in floating point instead of running in integer math
      • Weights and activations can be quantized with the same or different precisions within a model layer
      • For example, W8A16 uses quantized 8-bit weights and 16-bit activations. INT8 means quantized 8-bit weights and 8-bit activations.
      Diagram: the input, the weights (passed through a weight quantizer, "Wt quant"), and the biases feed a Conv/FC layer followed by ReLU; an activation quantizer ("Act quant") is applied to the output. A quantization bit-width is chosen for each quantizer.
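The simulated quantization described on slide 9 can be sketched as a quantize-dequantize step carried out in floating point. This is a minimal illustration, not AIMET's implementation: the helper names, the symmetric signed grid, and the min-max scale are all assumptions.

```python
def fake_quantize(x, scale, bits):
    """Round x to a signed integer grid, clamp, and map back to floating point."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def minmax_scale(values, bits):
    """Symmetric scale so that the largest magnitude maps to the top integer."""
    max_abs = max(abs(v) for v in values)
    return max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0


def quantized_dot(weights, activations, w_bits=8, a_bits=16):
    """Dot product with simulated W8A16 quantization (8-bit weights, 16-bit activations)."""
    w_scale = minmax_scale(weights, w_bits)
    a_scale = minmax_scale(activations, a_bits)
    wq = [fake_quantize(w, w_scale, w_bits) for w in weights]
    aq = [fake_quantize(a, a_scale, a_bits) for a in activations]
    return sum(w * a for w, a in zip(wq, aq))


weights = [0.5, -0.25, 0.125]
activations = [1.0, 2.0, -3.0]
exact = sum(w * a for w, a in zip(weights, activations))
print(exact, quantized_dot(weights, activations))
```

Changing `w_bits` and `a_bits` independently mirrors the W8A16 versus INT8 distinction made on the slide.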
  • 10. What algorithm to choose to improve accuracy?
      Post-training quantization (PTQ): take a pre-trained FP32 model and convert it directly into a fixed-point network
        • No need for the original training pipeline
        • Data-free or small (unlabeled) calibration set needed
        • Simple usage (a single API call)
        • Might not reach as high accuracy as QAT
      Quantization-aware training (QAT): train/fine-tune the network with the simulated quantization operations in place
        • Requires access to the training pipeline and labelled data
        • Longer training times
        • Hyper-parameter tuning
        • Achieves higher accuracy, especially for lower bit-widths
  • 11. Floating point vs integer: which is the better format for quantizing neural networks?
  • 12. INT8 and FP8 have the same number of values but different distributions. Multiple FP8 formats exist, and they consume more power than INT8.
      INT: z = s ⋅ m
      FP: z = s ⋅ m ⋅ 2^e
      (s: scale; m: mantissa; e: exponent; S: sign bit)
      Formats shown: INT8 (sign bit plus 7 mantissa bits), FP8 5/2, FP8 4/3, FP8 3/4, FP8 2/5, where FP8 M/E has M mantissa bits, E exponent bits, and one sign bit. The slide also marks the formats most commonly proposed in the industry.
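To make the "same number of values, different distributions" point on slide 12 concrete, the sketch below enumerates every code of an 8-bit float with a given mantissa/exponent split and compares it with INT8. The encoding details are assumptions for illustration (IEEE-style subnormals, bias 2^(E-1) - 1, no codes reserved for infinity or NaN); actual FP8 proposals differ on these points.

```python
def fp8_values(mantissa_bits, exponent_bits):
    """Decode all 256 codes of a sign + exponent + mantissa 8-bit float."""
    assert 1 + mantissa_bits + exponent_bits == 8
    bias = 2 ** (exponent_bits - 1) - 1  # assumed IEEE-style bias
    values = []
    for sign in (1.0, -1.0):
        for e in range(2 ** exponent_bits):
            for m in range(2 ** mantissa_bits):
                frac = m / 2 ** mantissa_bits
                if e == 0:  # subnormal range: no implicit leading one
                    mag = frac * 2.0 ** (1 - bias)
                else:
                    mag = (1.0 + frac) * 2.0 ** (e - bias)
                values.append(sign * mag)
    return values


def share_near_zero(values, fraction=0.1):
    """Fraction of codes whose magnitude is within `fraction` of the largest one."""
    limit = fraction * max(abs(v) for v in values)
    return sum(abs(v) <= limit for v in values) / len(values)


int8 = [float(v) for v in range(-128, 128)]
fp8_4m3e = fp8_values(mantissa_bits=4, exponent_bits=3)
print(len(int8), len(fp8_4m3e))
print(share_near_zero(int8), share_near_zero(fp8_4m3e))
```

Both grids have 256 codes, but the floating-point grid spends far more of them close to zero, while INT8 spaces them uniformly.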
  • 13. Most layers of models do not have large outliers. FP8 may be useful for model layers with large outliers. Figure: signal-to-noise ratio (SNR, higher is better) of INT8, 5M2E, 4M3E, 3M4E, and 2M5E for three weight distributions: uniform, normal (some outliers), and outlier-heavy.
  • 14. Several FP8 formats are required to get the best PTQ inference results. For different networks, different formats are better; it depends on the amount of outliers. Supporting multiple formats in hardware is expensive. The slide also lists the best and worst FP8 format (mantissa/exponent bits) for each model. Source: “FP8 Quantization: The Power of the Exponent”, NeurIPS 2022.
      Model                    FP32    Best FP8 result  Worst FP8 result
      ResNet18                 69.72%  69.66%           64.92%
      MobileNetV2              71.70%  71.06%           49.51%
      BERT                     83.06   82.80            71.56
      SalsaNext                55.80   55.67            55.12
      HRNet                    81.05   81.04            80.77
      DeepLabV3 (MobileNetV2)  72.91   72.58            37.93
      ViT                      77.75%  77.71%           76.69
  • 15. INT8 has similar results as FP8 with QAT. Outliers can be suitably trained with QAT. Source: “FP8 Quantization: The Power of the Exponent”, NeurIPS 2022.
      Model         FP32    PTQ INT8  PTQ best FP8*  QAT INT8  QAT best FP8*
      ResNet        69.72%  69.55%    69.66%         70.43%    69.82%
      MobileNetV2   71.70%  70.94%    71.06%         71.82%    71.54%
      BERT          83.06   71.03     82.80          83.26     83.70
      DeepLabV3     72.91   71.24     72.58          73.99     72.41
      *: Best FP8 is the best result from testing the different FP8 formats.
      • PTQ: no PTQ tricks, per-channel quantization
      • QAT: all results use per-tensor quantization
      • FP8 mantissa and exponent format was optimized for this comparison
  • 16. INT W8A16 accuracy is better than FP8 for all models with PTQ. Settings: min-max range setting, per-channel quantization.
      No real gap FP8/INT8:
      Model        FP32    INT (W8A8)  INT (W8A16)  Best FP8 result
      ResNet18     69.72%  69.55%      69.75%       69.66%
      HRNet        81.05   80.93       81.08        81.04
      Other models:
      Model        FP32    INT (W8A8)  INT (W8A16)  Best FP8 result
      BERT         83.06   71.03       82.90        82.80
      SalsaNext    55.80   54.22       55.82        55.67
      ViT          77.75%  76.39%      77.73%       77.71%
      MobileNetV2  69.72%  69.55%      69.75%       69.66%
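Slide 16 names min-max range setting and per-channel quantization as the PTQ settings. A minimal sketch of what those two choices mean for a weight matrix follows; treating each row as one output channel and using a symmetric signed grid are assumptions for illustration, and the function names are hypothetical.

```python
def quantize_rows(weights, bits=8, per_channel=True):
    """Min-max quantize-dequantize with one scale per row or one for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    tensor_max = max(abs(w) for row in weights for w in row)
    result = []
    for row in weights:
        # Min-max range setting: the largest magnitude defines the grid range.
        max_abs = max(abs(w) for w in row) if per_channel else tensor_max
        scale = max_abs / qmax if max_abs > 0 else 1.0
        result.append(
            [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in row]
        )
    return result


def mse(a, b):
    """Mean squared error between two equally shaped nested lists."""
    pairs = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


# Two channels whose ranges differ by a factor of 100.
weights = [[0.01, -0.02, 0.015], [1.0, -2.0, 1.5]]
print(mse(weights, quantize_rows(weights, per_channel=True)))
print(mse(weights, quantize_rows(weights, per_channel=False)))
```

With a single per-tensor scale the small-range channel is left with only a couple of usable grid points, which is why its error is larger in that setting.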
  • 17. INT16 performs better than FP16 unless there are large outliers.
      • 1,000 samples of X ~ Normal(0, 1)
      • We add one outlier and vary its value
      • INT13 performs comparably to FP16 in terms of MSE
      Figure: MSE (log scale, 10^−7 to 10^−4) versus outlier value (0 to 2000) for int16 and float16. Plot title: “MSE Error for w/o activation function for sigma = 1.0, 100 neurons”.
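The experiment on slide 17 can be approximated with the standard library alone. This sketch quantizes the samples directly, so it will not reproduce the slide's curve or its crossover point (the plot title suggests the slide measures error through a 100-neuron layer). The symmetric min-max INT grid, the seed, and the function names are assumptions.

```python
import random
import struct


def to_fp16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_int(x, scale, bits=16):
    """Quantize-dequantize on a symmetric signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale


def mse_with_outlier(outlier, n=1000, bits=16, seed=0):
    """Return (integer MSE, FP16 MSE) for n Normal(0, 1) samples plus one outlier."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)] + [float(outlier)]
    scale = max(abs(x) for x in xs) / (2 ** (bits - 1) - 1)  # min-max range
    int_mse = sum((x - to_int(x, scale, bits)) ** 2 for x in xs) / len(xs)
    fp_mse = sum((x - to_fp16(x)) ** 2 for x in xs) / len(xs)
    return int_mse, fp_mse


for outlier in (5, 50, 500, 2000):
    print(outlier, mse_with_outlier(outlier))
```

Passing `bits=13` gives a rough way to explore the slide's INT13 remark in this simplified setting.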
  • 18. INT16 outperforms FP16 in accuracy and runs faster in hardware.
      Model            FP32    FP16    INT16
      MobileNetV2      71.74%  71.69%  71.74%
      EfficientDet-D1  40.08   40.07   40.07
  • 19. Integer quantization is the way to do AI inference, enabled through PTQ and QAT techniques. Mixed precision gives the best of both worlds, using extra precision only when necessary.
      • Power efficiency and latency: INT4 best, INT8 better, INT16 good
      • Accuracy: INT16 best, INT8 better, INT4 good
  • 20. Overcoming oscillations in quantization-aware training: improving quantization-aware training at lower bit-widths
  • 21. Validation accuracy for QAT is typically unstable. Why do we see the validation accuracy drop for 4-bit QAT? Poor validation accuracy is consistent across various learning rates and epochs during QAT. Figure: training accuracy (roughly 62% to 71%) and validation accuracy (roughly 42% to 70%) versus epoch (2 to 20) for different learning rates (LR). Source: “Overcoming Oscillations in Quantization-Aware Training” (ICML 2022).
  • 22. Oscillations are present in QAT. Example of MobileNetV2 training (last 1000 iterations of training). Figure: quantized weights q(w) (sign and lowest bit of the 4-bit weight) and latent FP weights w (positive values, zoomed in around 0.5, from 0.4996 to 0.5004) versus iterations during QAT.
  • 23. Oscillating weights are harmful when training a model.
      Corrupts batch norm statistics:
        • At inference, BN uses running statistics from training
        • Oscillations lead to big changes in statistics, so the running statistics are not a good estimate
        • Solution: BN re-estimation
        Network      Bits  Pre-BN Acc.   Post-BN Acc.
        MobileNetV2  8     71.79 ± 0.07  71.89 ± 0.05
        MobileNetV2  4     68.99 ± 0.44  71.01 ± 0.05 (+2.02)
        MobileNetV2  3     64.97 ± 1.23  69.50 ± 0.04 (+4.53)
      Disrupts model convergence:
        • At the end of training, oscillating weights may not be on the correct ‘side’
        • Stochastic rounding (SR) and binary optimization (AdaRound) show that they are indeed not in the best possible state
        • Oscillations prevent the network from converging to the best local minimum
        Method           Train Loss       Val. Acc. (%)
        Baseline         1.3566           69.50
        SR (mean + std)  1.3547 ± 0.0053  69.58 ± 0.09
        SR (best)        1.3391           69.85
        AdaRound         1.3070           70.12
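The BN re-estimation remedy on slide 23 amounts to discarding the running statistics gathered during training and recomputing them with the final quantized weights held fixed. A minimal sketch for a single channel, assuming the statistics are aggregated by averaging per-batch means and variances over a few batches (the slide does not specify the aggregation, and the function name is hypothetical):

```python
def reestimate_bn_stats(batches):
    """Recompute one channel's BN mean and variance from a few activation batches."""
    means, variances = [], []
    for batch in batches:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        means.append(mean)
        variances.append(var)
    # Average the per-batch statistics (an assumed aggregation rule).
    return sum(means) / len(means), sum(variances) / len(variances)


# Activations of one channel, produced with the final quantized weights.
batches = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
mean, var = reestimate_bn_stats(batches)
print(mean, var)
```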
  • 24. Higher oscillation frequencies during QAT negatively impact accuracy. An oscillation occurs when the integer value of a weight changes and the direction of that change is opposite to the direction of its previous change. (EMA = exponential moving average.)
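The oscillation criterion on this slide can be sketched for a single weight as follows. The class name, the momentum value, and the exact form of the EMA update are assumptions made for illustration; the slide only states the criterion and that an EMA is used.

```python
class OscillationTracker:
    """Tracks oscillations of one weight's integer value (illustrative sketch)."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.prev_int = None      # integer value at the previous iteration
        self.prev_direction = 0   # sign of the last integer change (0 = none yet)
        self.frequency = 0.0      # EMA of the oscillation indicator

    def update(self, w_int):
        oscillated = False
        if self.prev_int is not None and w_int != self.prev_int:
            direction = 1 if w_int > self.prev_int else -1
            # Oscillation: the integer changed AND it moved opposite to its last change.
            oscillated = self.prev_direction != 0 and direction == -self.prev_direction
            self.prev_direction = direction
        self.prev_int = w_int
        self.frequency = (
            self.momentum * float(oscillated) + (1 - self.momentum) * self.frequency
        )
        return oscillated


tracker = OscillationTracker(momentum=0.1)
events = [tracker.update(v) for v in [3, 4, 3, 4, 3]]
print(events, tracker.frequency)
```

A weight that moves steadily in one direction never triggers the criterion, so its tracked frequency stays at zero.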
  • 25. Oscillation dampening and iterative freezing fix the QAT issue. Dampening takes a regularizing approach: the weights are forced closer to the bin center. Freezing the oscillating weights stabilizes training and mitigates the unwanted effects of oscillations. [Figure: histograms of the distance w_int - w/s (from -0.4 to 0.4) under dampening and under freezing (frozen versus not frozen weights).] Source: “Overcoming Oscillations in Quantization-Aware Training” (ICML 2022).
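Both remedies can be sketched in a few lines. This is a simplified reading of the slide, not the paper's exact formulation: the dampening term is assumed to be a squared distance between each latent weight and its bin center, and freezing is assumed to snap a weight to its current rounded value once its oscillation frequency exceeds a threshold. The function names, the scale, and the threshold are hypothetical.

```python
def dampening_loss(weights, scale, qmin, qmax):
    """Regularizer pulling each latent weight toward the center of its quantization bin."""
    total = 0.0
    for w in weights:
        w_int = max(qmin, min(qmax, round(w / scale)))
        w_clipped = max(qmin * scale, min(qmax * scale, w))  # no penalty beyond the grid range
        total += (w_int * scale - w_clipped) ** 2
    return total


def freeze_oscillating(weights, frequencies, frozen, scale, qmin, qmax, threshold):
    """Pin weights whose oscillation frequency exceeds `threshold` to their grid value."""
    for i, freq in enumerate(frequencies):
        if not frozen[i] and freq > threshold:
            w_int = max(qmin, min(qmax, round(weights[i] / scale)))
            weights[i] = w_int * scale
            frozen[i] = True  # a training loop would skip updates for this weight
    return weights, frozen


# 4-bit signed grid with step 0.5: integers in [-8, 7].
loss = dampening_loss([0.5, 0.7], scale=0.5, qmin=-8, qmax=7)
weights, frozen = freeze_oscillating(
    [0.7, 1.1], frequencies=[0.05, 0.001], frozen=[False, False],
    scale=0.5, qmin=-8, qmax=7, threshold=0.02,
)
print(loss, weights, frozen)
```

In a training loop the dampening term would be added to the task loss with a weighting coefficient, and frozen weights would be excluded from further gradient updates.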
  • 26. We achieve SOTA results for INT4 quantization [1]
      • Train with learned step-size quantization (LSQ [2]) and re-estimation
      • Dampening and freezing perform on par with each other
      MobileNetV3
      Method                        W/A     Val. Acc. (%)
      Full-precision                32/32   65.1
      LSQ* (Esser et al., 2020)     4/4     61.0 (-4.1)
      LSQ + BR (Han et al., 2021)   4/4     61.5 (-3.6)
      LSQ + Dampen (ours)           4/4     63.7 (-1.4)
      LSQ + Freeze (ours)           4/4     63.6 (-1.5)
      MobileNetV2
      Method                        W/A     Val. Acc. (%)
      Full-precision                32/32   71.7
      LSQ* (Esser et al., 2020)     4/4     69.5 (-2.3)
      LSQ + BR (Han et al., 2021)   4/4     70.4 (-1.4)
      LSQ + Dampen (ours)           4/4     70.5 (-1.2)
      LSQ + Freeze (ours)           4/4     70.6 (-1.1)
      [1] “Overcoming Oscillations in Quantization-Aware Training” (ICML 2022)
      [2] “Learned step size quantization” (ICLR 2020)
  • 27. AIMET & AIMET Model Zoo: open-source projects to scale energy-efficient AI to the masses
  • 28. AIMET makes AI models small. Open-sourced GitHub project that includes state-of-the-art quantization and compression techniques from Qualcomm AI Research.
      Features:
      • State-of-the-art network compression tools
      • State-of-the-art quantization tools
      • Support for both TensorFlow and PyTorch
      • Benchmarks and tests for many models
      • Developed by professional software developers
      Workflow: trained AI model (TensorFlow or PyTorch) -> AI Model Efficiency Toolkit (AIMET): compression and quantization -> optimized AI model -> deployed AI model
      If interested, please join the AIMET GitHub project: https://github.com/quic/aimet
  • 29. AIMET: providing advanced model efficiency features and benefits
      Benefits:
      • Lower memory bandwidth
      • Lower power
      • Lower storage
      • Higher performance
      • Maintains model accuracy
      • Ease of use
      Features:
      • Quantization: state-of-the-art INT8 and INT4 performance
          - Quantization simulation
          - Quantization-aware training (QAT)
          - Post-training quantization (PTQ) methods: Data-Free Quantization, Adaptive Rounding (AdaRound), Automatic Mixed Precision (AMP), AutoQuant
      • Compression: efficient tensor decomposition and removal of redundant channels in convolution layers
          - Spatial singular value decomposition (SVD)
          - Channel pruning
      • Visualization: analysis tools for drawing insights for quantization and compression
          - Weight ranges
          - Per-layer compression sensitivity
  • 30. AIMET features and APIs are easy to use. Designed to fit naturally in the AI model development workflow for researchers, developers, and ISVs.
      • User-friendly APIs invoked directly from the existing model pipeline
      • Example Jupyter notebooks on AIMET GitHub
      Typical model training workflow: PyTorch model -> Train -> Evaluate
      User-friendly QAT workflow in AIMET: PyTorch model -> Create QuantSim -> Train (no change, same API) -> Evaluate (no change, same API)
      Architecture: framework-specific API (AIMET extensions, other frameworks), algorithm API, and model optimization library (techniques to compress and quantize models)
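What the "Create QuantSim" step buys can be illustrated with a toy stand-in. `ToyQuantSim` below is a hypothetical class written for this sketch and is not AIMET's API; it assumes a single dot-product "model", min/max range calibration, and quantize-dequantize ("fake quantization") of weights and outputs. The point it shows is that, once wrapped, the model is called exactly as before, which is why the training and evaluation steps need no changes.

```python
class ToyQuantSim:
    """Hypothetical quantization simulator for a dot-product model (not AIMET's API)."""

    def __init__(self, weights, bits=8):
        self.weights = list(weights)
        self.bits = bits
        self.act_min = None  # activation range, set by compute_encodings()
        self.act_max = None

    def _fake_quant(self, x, lo, hi):
        """Quantize x onto a uniform grid over [lo, hi], then map back to float."""
        if hi <= lo:
            return lo
        scale = (hi - lo) / (2 ** self.bits - 1)
        q = round((min(max(x, lo), hi) - lo) / scale)
        return lo + q * scale

    def _dot(self, inputs):
        w_lo, w_hi = min(self.weights), max(self.weights)
        quantized = [self._fake_quant(w, w_lo, w_hi) for w in self.weights]
        return sum(w * x for w, x in zip(quantized, inputs))

    def compute_encodings(self, calibration_inputs):
        """Calibrate the output range from representative inputs."""
        outputs = [self._dot(x) for x in calibration_inputs]
        self.act_min, self.act_max = min(outputs), max(outputs)

    def __call__(self, inputs):
        if self.act_min is None:
            raise RuntimeError("call compute_encodings() before evaluating")
        return self._fake_quant(self._dot(inputs), self.act_min, self.act_max)


def float_model(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))


weights = [0.3, -0.7, 1.2]
calibration = [[1.0, 0.0, -1.0], [0.5, 2.0, 1.5], [-2.0, 1.0, 0.0], [2.0, -1.0, 1.0]]

sim = ToyQuantSim(weights, bits=8)
sim.compute_encodings(calibration)
max_error = max(abs(sim(x) - float_model(weights, x)) for x in calibration)
print(max_error)
```

For the real interface, the example Jupyter notebooks in the AIMET GitHub repository are the authoritative reference.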
  • 31. First 4K super-resolution demo at 100+ FPS on mobile. Our new machine-learning-based super-resolution method, with an 8-bit quantized model created using AIMET QAT. [Figure: low resolution versus super resolution.]
  • 32. AIMET enables accurate INT W4A8 for a wide range of use cases. With better PTQ and QAT techniques, more models will achieve better power efficiency.
      Task               Model                 FP32       INT W4A8
      Classification     ResNet50              76.10%     75.4%
      Classification     ResNet18              69.75%     68.96%
      Classification     EfficientNet-Lite     75.31%     74.33%
      Classification     Regnext               78.3%      77.2%
      Segmentation       Deeplabv3 (RN-50)     76.07%     75.91%
      Super-resolution   ABPN                  31.97 dB   31.67 dB
      Pose detection     PoseNet (HRNet-32)    0.765      0.763
  • 33. AIMET quantizes transformers with high accuracy, comparable to FP32. Comparison between FP32 model and model quantized with AIMET:
      Model                       Metric           FP32    Quantized
      ViT base                    Top-1 accuracy   81.30   80.88 (INT8/W8A16, PTQ)
      RoBERTa base                GLUE             84.99   84.60 (INT8, QAT)
      BERT base (uncased)         GLUE             82.73   81.95 (INT8, QAT)
      DistilBERT base (uncased)   GLUE             79.21   78.61 (INT8, QAT)
  • 34. AIMET Model Zoo: accurate pre-trained 8-bit quantized models for image classification, semantic segmentation, pose estimation, speech recognition, object detection, and super resolution
  • 35. AIMET Model Zoo includes popular quantized AI models. Accuracy is maintained for INT8 models: less than 1% loss in accuracy*
      Model                  Metric*          FP32     INT8
      ResNet-50 (v1)         Top-1 accuracy   75.21%   74.96%
      MobileNet-v2-1.4       Top-1 accuracy   75%      74.21%
      EfficientNet Lite      Top-1 accuracy   74.93%   74.99%
      ResNet-50 (v1)         mAP              0.2469   0.2456
      RetinaNet              mAP              0.35     0.349
      Pose estimation        mAP              0.383    0.379
      SRGAN                  PSNR             25.45    24.78
      MobileNetV2            Top-1 accuracy   71.67%   71.14%
      EfficientNet-lite0     Top-1 accuracy   75.42%   74.44%
      DeepLabV3+             mIoU             72.62%   72.22%
      MobileNetV2-SSD-Lite   mAP              68.7%    68.6%
      Pose estimation        mAP              0.364    0.359
      SRGAN                  PSNR             25.51    25.5
      DeepSpeech2            WER              9.92%    10.22%
      ABPN                   PSNR             32.75    32.69
      *: Comparison between FP32 model and INT8 model quantized with AIMET. For further details, check out: https://github.com/quic/aimet-model-zoo/
  • 36. Super resolution model suite. Wide variety of models, suited for fast, energy-efficient INT8 inference. INT8 PSNR and visual quality comparable to FP32*
      • Virtually no accuracy loss compared to FP32
      • Simple and convenient for developer integration
      • Useful across diverse applications, from gaming and photography to XR and autonomous driving
      Model         PSNR (dB), FP32   PSNR (dB), INT8
      SESR-M7 [3]   32.66             32.58
      SESR-M3 [3]   32.41             32.25
      ABPN [1]      32.71             32.64
      XLSR [2]      32.57             32.30
      SESR-XL [3]   33.03             32.92
      [1] Anchor-based Plain Net (ABPN)
      [2] Robust Real-Time Single-Image Super Resolution (XLSR)
      [3] Super-Efficient Super Resolution (SESR)
      *: Comparison between FP32 model and INT8 model quantized with AIMET. For further details, check out: https://github.com/quic/aimet-model-zoo/
  • 37. Explore our open-source projects and tools
      • AIMET: state-of-the-art quantization and compression techniques (github.com/quic/aimet)
      • AIMET Model Zoo: accurate pre-trained 8-bit quantized models (github.com/quic/aimet-model-zoo)
      • Quantization whitepaper: arxiv.org/abs/2201.08442
  • 38. [Diagram: Qualcomm AI software stack. Labels: AI Frameworks; Qualcomm® Neural Processing SDK; TF Lite; TF Lite Micro; Direct ML; AI Runtimes; Programming Languages; Virtual platforms; Core Libraries; Math Libraries; Profilers & Debuggers; Compilers; System Interface; SoC, accelerator drivers; Auto; XR; Robotics; IoT; ACPC; Smartphones; Cloud; Platforms; Qualcomm® AI Engine Direct (QNN); Tools; Emulation Support; Qualcomm AI Model Studio; AIMET; AIMET Model Zoo; NAS; Model analyzers; Infrastructure.] Qualcomm Neural Processing SDK, Qualcomm AI Model Studio, and Qualcomm AI Engine Direct are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
  • 39. Model efficiency is key for enabling on-device AI and accelerating the growth of the connected intelligent edge
      • INT8/16 perform better than FP8/16
      • Qualcomm AI Research is enabling 4-bit integer models
      • AIMET is making fixed-point quantization possible at scale without sacrificing accuracy
  • 41. Thank you. For more information, visit us at: qualcomm.com & qualcomm.com/blog
      Nothing in these materials is an offer to sell any of the components or devices referenced herein. ©2018-2022 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm, Hexagon, and Snapdragon are trademarks or registered trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.
      References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.