Co-design of DL Accelerators in VEDLIoT. Muhammad Waqar Azhar. Workshop on Deep Learning for IoT (DL4IoT), co-located with HiPEAC 2022, Budapest, Hungary, June 2022.
Slide 8
Co-Design Example - Motivation
▪ Model case study: MobileNet
▪ Observation: generic hardware is not efficient
▪ Challenge: depthwise convolution
● Heterogeneity at different levels:
○ Model layers of different types (e.g. depthwise and pointwise convolution)
○ Within the same layer type (e.g. activation and filter sizes and shapes)
○ This heterogeneity determines buffer sizes, data reuse, and parallelism
Layer-specific hardware is needed to capture this heterogeneity!
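The heterogeneity argument can be made concrete with a back-of-envelope MAC count. The sketch below (illustrative only, not VEDLIoT code; the layer shape is a hypothetical MobileNet-like example) shows why a depthwise layer has a very different compute-to-data profile than a standard or pointwise convolution, which is what drives the differing buffer, reuse, and parallelism needs.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

def depthwise_conv_macs(h, w, c, k):
    """MACs for a depthwise k x k convolution (one filter per channel)."""
    return h * w * c * k * k

def pointwise_conv_macs(h, w, c_in, c_out):
    """MACs for a 1 x 1 (pointwise) convolution."""
    return h * w * c_in * c_out

# Hypothetical MobileNet-like layer: 56x56 feature map, 128 channels, 3x3 kernel
h = w = 56
c = 128
std = standard_conv_macs(h, w, c, c, 3)
dw = depthwise_conv_macs(h, w, c, 3)
pw = pointwise_conv_macs(h, w, c, c)

# Depthwise has c_out times fewer MACs than the standard convolution, but
# also far less filter reuse per input pixel -- a PE array sized for
# standard convolutions sits largely idle on it.
print(f"standard: {std:,} MACs, depthwise: {dw:,}, pointwise: {pw:,}")
print(f"depthwise share of separable MACs: {dw / (dw + pw):.1%}")
```

Note that in the depthwise-separable pair, the depthwise part contributes only a small fraction of the MACs but accesses every activation, so a single fixed datapath cannot be efficient for both parts.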
Slide 9
Co-Design Example: Open Questions & Approaches
Approach A: one-HW-for-all
+ Runs any model
- Suboptimal efficiency
Approach B: one-HW-per-layer-type
+ Matches layer types
- Suboptimal utilization
Approach C: one-HW-per-layer
+ Best efficiency
- Resource-hungry
Slide 10
Co-Design Example: Open Questions & Approaches
Approach A: one-HW-for-all (TVM-VTA, PYNQ-Z2)
• ResNet-34
• Performance: approx. 8 GOPS (for reference, DPU performance > 20 GOPS)
Approach B: one-HW-per-layer-type (unique kernels, ZCU102)
• Su, Jiang, et al. "Redundancy-reduced MobileNet acceleration on reconfigurable logic for ImageNet classification."
• Performance: approx. 90 GOPS
Approach C: one-HW-per-layer (Xilinx FINN, ZCU102)
• MobileNet requires aggressive quantization (4-bit)
• Performance: 35 GOPS and 68 GOPS using MobileNetV1 1x and 0.5x
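To put the reported GOPS figures in perspective, they can be converted to an approximate inference rate. The sketch below is a back-of-envelope estimate, not from the slides: the per-inference workloads are the commonly cited MAC counts for these models at 224x224 input (counting one MAC as two ops), so the resulting frames/s are rough.

```python
# Approximate per-inference workload in GOPs (1 MAC = 2 ops).
# MAC counts are the commonly cited figures, not from the slide.
workload_gops = {
    "MobileNetV1 1.0x": 2 * 0.569,  # ~569 MMACs per 224x224 inference
    "MobileNetV1 0.5x": 2 * 0.150,  # ~150 MMACs
    "ResNet-34": 2 * 3.6,           # ~3.6 GMACs
}

# (model, sustained GOPS) pairs as reported on the slide
reported = [
    ("MobileNetV1 1.0x", 35),
    ("MobileNetV1 0.5x", 68),
    ("ResNet-34", 8),
]

for model, gops in reported:
    fps = gops / workload_gops[model]
    print(f"{model}: {gops} GOPS -> ~{fps:.0f} inferences/s")
```

Under these assumptions, 8 GOPS on ResNet-34 is only on the order of one inference per second, which illustrates why the generic approach falls below a real-time threshold.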
Slide 11
Proposed Solution
Co-design:
▪ Approach B:
▪ The mapping matches layer types well, but throughput is below the threshold…
▪ Approach C:
▪ FINN requires large HW to support the original model -> more aggressive quantization
▪ The quantized model fits in the HW, but accuracy is below the threshold…
Proposed: combine B + C
Slide 12
Conclusions
▪ Current situation:
▪ A zoo of DNN models
▪ A zoo of HW accelerators
▪ Heterogeneity in the model -> heterogeneity in the hardware
The need for Co-Design!
Co-design with both generic and layer-specific HW modules