Co-design of DL Accelerators in VEDLIoT. Muhammad Waqar Azhar. Workshop on Deep Learning for IoT (DL4IoT), co-located with HiPEAC 2022, Budapest, Hungary, June 2022.
Slide 8
Co-Design Example - Motivation
▪ Model case study: MobileNet
▪ Observation: generic hardware is not efficient
▪ Challenge: depthwise convolution
● Heterogeneity at different levels:
○ Model layers of different types (e.g. depthwise and pointwise convolution)
○ Within the same layer type (e.g. activation and filter sizes and shapes)
○ This heterogeneity determines buffer sizes, data reuse, and parallelism
Layer-specific hardware is needed to capture this heterogeneity!
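The heterogeneity argument can be made concrete with a back-of-envelope MAC count. The sketch below (illustrative only, not VEDLIoT code; the layer shape is a hypothetical MobileNet-like example) shows why a depthwise layer has a very different compute-to-data profile than a standard or pointwise convolution, which is what drives the differing buffer, reuse, and parallelism needs.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

def depthwise_conv_macs(h, w, c, k):
    """MACs for a depthwise k x k convolution (one filter per channel)."""
    return h * w * c * k * k

def pointwise_conv_macs(h, w, c_in, c_out):
    """MACs for a 1 x 1 (pointwise) convolution."""
    return h * w * c_in * c_out

# Hypothetical MobileNet-like layer: 56x56 feature map, 128 channels, 3x3 kernel
h = w = 56
c = 128
std = standard_conv_macs(h, w, c, c, 3)
dw = depthwise_conv_macs(h, w, c, 3)
pw = pointwise_conv_macs(h, w, c, c)

# Depthwise has c_out times fewer MACs than the standard convolution, but
# also far less filter reuse per input pixel -- a PE array sized for
# standard convolutions sits largely idle on it.
print(f"standard: {std:,} MACs, depthwise: {dw:,}, pointwise: {pw:,}")
print(f"depthwise share of separable MACs: {dw / (dw + pw):.1%}")
```

Note that in the depthwise-separable pair, the depthwise part contributes only a small fraction of the MACs but accesses every activation, so a single fixed datapath cannot be efficient for both parts.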
Slide 9
Co-Design Example: Open Questions & Approaches
Approach A: one-HW-for-all
+ Runs any model
- Suboptimal efficiency
Approach B: one-HW-per-layer-type
+ Matches layer types
- Suboptimal utilization
Approach C: one-HW-per-layer
+ Best efficiency
- Resource-hungry
Slide 10
Co-Design Example: Open Questions & Approaches
Approach A: one-HW-for-all (TVM-VTA, PYNQ-Z2)
• ResNet-34
• Performance: approx. 8 GOPS (for reference, DPU performance > 20 GOPS)
Approach B: one-HW-per-layer-type (unique kernels, ZCU102)
• Su, Jiang, et al. "Redundancy-reduced MobileNet acceleration on reconfigurable logic for ImageNet classification."
• Performance: approx. 90 GOPS
Approach C: one-HW-per-layer (Xilinx FINN, ZCU102)
• MobileNet requires aggressive quantization (4-bit)
• Performance: 35 GOPS and 68 GOPS using MobileNetV1 1x and 0.5x
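To put the reported GOPS figures in perspective, they can be converted to an approximate inference rate. The sketch below is a back-of-envelope estimate, not from the slides: the per-inference workloads are the commonly cited MAC counts for these models at 224x224 input (counting one MAC as two ops), so the resulting frames/s are rough.

```python
# Approximate per-inference workload in GOPs (1 MAC = 2 ops).
# MAC counts are the commonly cited figures, not from the slide.
workload_gops = {
    "MobileNetV1 1.0x": 2 * 0.569,  # ~569 MMACs per 224x224 inference
    "MobileNetV1 0.5x": 2 * 0.150,  # ~150 MMACs
    "ResNet-34": 2 * 3.6,           # ~3.6 GMACs
}

# (model, sustained GOPS) pairs as reported on the slide
reported = [
    ("MobileNetV1 1.0x", 35),
    ("MobileNetV1 0.5x", 68),
    ("ResNet-34", 8),
]

for model, gops in reported:
    fps = gops / workload_gops[model]
    print(f"{model}: {gops} GOPS -> ~{fps:.0f} inferences/s")
```

Under these assumptions, 8 GOPS on ResNet-34 is only on the order of one inference per second, which illustrates why the generic approach falls below a real-time threshold.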
Slide 11
Proposed Solution
Co-design:
▪ Approach B:
▪ The mapping matches layer types well, but throughput is below the threshold…
▪ Approach C:
▪ FINN requires large HW to support the original model -> more aggressive quantization
▪ The quantized model fits in the HW, but accuracy is below the threshold…
Proposed: combine B + C
Slide 12
Conclusions
▪ Current situation:
▪ A zoo of DNN models
▪ A zoo of HW accelerators
▪ Heterogeneity in the model -> heterogeneity in the hardware
The need for Co-Design!
Co-design with both generic and layer-specific HW modules