Pedro Trancoso
Chalmers University of Technology,
Gothenburg, Sweden
VEDLIoT Cognitive IoT Hardware Platform, Accelerators and Co-Design
26 September 2023
F. Qararyah, S. Zouzoula, M. Waqar, P. Trancoso, M. Rothmann, M. Tassemeier, M. Porrmann, D. Ödman, H. Salomonsson, F. Porrmann, R. Griessl, N. Kucza, K. Mika, C. Stollenwerk, M. Kaiserm, L. Tigges, J. Hagemeyer
2
Context
3
▪ Heterogeneous hardware platform
▪ Resource Efficient Cluster Server (RECS) platform: cloud to edge
Cognitive IoT HW Platform
5
▪ Heterogeneous hardware platform
▪ Resource Efficient Cluster Server (RECS) platform: cloud to edge
▪ u.RECS for far edge with 3 slots:
▪ NVIDIA NX – embedded GPU
▪ SMARC 2.1 – FPGA, CPU
▪ M.2 – dedicated accelerator
Cognitive IoT HW Platform
6
▪ Accelerators “landscape”
▪ Evaluated a multitude of accelerators (CPUs, GPUs, FPGAs, ASICs)
▪ Same model (YoloV4), different batch sizes (1, 4, 8)
▪ Efficiency from 100 GOPS/W to 1250 GOPS/W
Accelerators
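The efficiency figures on this slide are throughput per watt. As a minimal sketch of how such a metric can be computed and used to rank devices (the device names and numbers below are made-up placeholders, not measurements from the evaluation):

```python
def efficiency_gops_per_watt(throughput_gops: float, power_watts: float) -> float:
    """Return energy efficiency in GOPS/W (throughput divided by power)."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput_gops / power_watts


# Hypothetical (throughput in GOPS, power in W) pairs, for illustration only.
measurements = {
    "device_a": (2000.0, 20.0),
    "device_b": (5000.0, 4.0),
}

# Rank devices from most to least efficient.
ranking = sorted(
    measurements,
    key=lambda name: efficiency_gops_per_watt(*measurements[name]),
    reverse=True,
)
```

A device with lower raw throughput can still rank first once power is taken into account, which is why the landscape is compared on GOPS/W rather than GOPS alone.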
7
Accelerators
High-performance
Low power
Efficiency
▪ Accelerators “landscape”
▪ Evaluated a multitude of accelerators (CPUs, GPUs, FPGAs, ASICs)
▪ Same model (YoloV4), different batch sizes (1, 4, 8)
▪ Efficiency from 100 GOPS/W to 1250 GOPS/W
▪ Identify different categories
FPGA-based accelerators:
• Flexibility
• Reconfigurability
• Efficiency
8
Accelerators – Xilinx DPU
● Baseline for evaluation of FPGA accelerators developed in VEDLIoT
● Xilinx Deep Learning Processor Unit (DPU)
○ Programmable engine for convolutional neural networks
○ Easy integration as an IP core in Xilinx UltraScale+ and Versal MPSoCs
○ Configurable hardware architecture (e.g., parallelism, memory/DSP usage)
● Large design space
○ Goal: Find the most suitable implementation for your requirements
9
Dynamic reconfiguration of Xilinx DPU
● Change the characteristics of the DL accelerator at run-time (e.g., change performance-power trade-off or performance-accuracy trade-off)
Different modes of operation:
• High-performance versus low-power
• City versus highway driving
• …
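One way to picture switching between operating modes is as a selection among pre-built accelerator configurations. The sketch below is an assumption for illustration only: the configuration names, throughput and power values, and the `select_config` helper are hypothetical and are not part of the Xilinx DPU tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorConfig:
    name: str              # hypothetical configuration label
    throughput_fps: float  # assumed frames per second
    power_watts: float     # assumed power draw


# Placeholder values, not measured DPU figures.
CONFIGS = [
    AcceleratorConfig("small", 15.0, 3.0),
    AcceleratorConfig("medium", 30.0, 6.0),
    AcceleratorConfig("large", 60.0, 12.0),
]


def select_config(configs, power_budget_watts, min_fps):
    """Pick the lowest-power configuration meeting both constraints, or None."""
    feasible = [
        c for c in configs
        if c.power_watts <= power_budget_watts and c.throughput_fps >= min_fps
    ]
    return min(feasible, key=lambda c: c.power_watts, default=None)


# Two hypothetical operating modes with different requirements.
city = select_config(CONFIGS, power_budget_watts=8.0, min_fps=25.0)
highway = select_config(CONFIGS, power_budget_watts=15.0, min_fps=50.0)
```

At run-time, a mode change would then trigger reconfiguration to the selected design; how that reconfiguration is performed is outside the scope of this sketch.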
10
STANN – Synthesis Templates for ANNs (1/2)
● Library for simple yet efficient generation of DL accelerators on FPGAs
● Templates for common layers
○ Network architecture parameterizable, e.g., number of neurons
○ Hardware implementation parameterizable, e.g., parallelism of processing units
● Resource efficiency by flexible quantization
○ Floating point and integer from 32 bit to 8 bit
● High-level synthesis enables fast design space exploration
○ Automatic code generation based on ONNX description
○ Highly parameterizable: reuse of hardware blocks vs. parallel execution
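The trade-off between reusing hardware blocks and executing in parallel can be illustrated with a simple cost model. This is a sketch under stated assumptions, not STANN's actual template code: it assumes each processing unit performs one multiply-accumulate per cycle and computes one neuron at a time, and the function name is hypothetical.

```python
import math


def dense_layer_cost(inputs: int, neurons: int, parallelism: int) -> dict:
    """Estimate cycles and processing units for one fully connected layer.

    Simplified model: `parallelism` processing units each compute one neuron
    at a time, using one multiply-accumulate per cycle.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    passes = math.ceil(neurons / parallelism)  # how often the units are reused
    return {"processing_units": parallelism, "cycles": passes * inputs}


# Sweep from full reuse (one unit) to full parallelism (one unit per neuron).
sweep = [dense_layer_cost(inputs=128, neurons=64, parallelism=p) for p in (1, 8, 64)]
```

Under this model, more processing units reduce latency proportionally while increasing resource use, which is the kind of design space a template parameter would expose.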
11
STANN – Synthesis Templates for ANNs (2/2)
● STANN enables inference and training on FPGAs
● Training with dataflow architecture
○ Forward path similar to inference, but needs to store more intermediate values
○ Backpropagation and weight update module for each layer
● Fast, but uses a lot of resources
● Well suited for small networks, used, e.g., in deep reinforcement learning
● Application example: Motor control
○ DQN (Deep Q-Network) replaces manual parameter tuning
○ Used in OC Project Power Edge RL
Rothmann, M.; Porrmann, M.: STANN – Synthesis Templates for Artificial Neural Network Inference and Training. In: 17th International Work-Conference on Artificial Neural Networks, IWANN 2023, Ponta Delgada, Azores, Portugal, June 19-21, 2023
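The per-layer structure described on this slide (forward path that stores intermediate values, plus a backpropagation and weight-update step for each layer) can be sketched in software. The class below is an illustrative assumption, not STANN code: it models a single linear layer without bias or activation and uses plain gradient descent.

```python
class DenseLayer:
    """Single linear layer y = W x, trained with plain gradient descent."""

    def __init__(self, weights):
        self.weights = [row[:] for row in weights]  # shape: outputs x inputs
        self.saved_input = None

    def forward(self, x):
        # Training must keep the input, since the weight gradient depends on it.
        self.saved_input = list(x)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def backward(self, grad_out, lr):
        x = self.saved_input
        # Gradient passed to the previous layer, using the pre-update weights.
        grad_in = [
            sum(self.weights[j][i] * grad_out[j] for j in range(len(grad_out)))
            for i in range(len(x))
        ]
        # Weight update: dL/dW[j][i] = grad_out[j] * x[i].
        for j, g in enumerate(grad_out):
            for i, xi in enumerate(x):
                self.weights[j][i] -= lr * g * xi
        return grad_in


layer = DenseLayer([[1.0, 2.0]])
y = layer.forward([3.0, 4.0])
grad_in = layer.backward([1.0], lr=0.5)
```

The stored input is the extra intermediate value that inference alone would not need to keep.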
12
Accelerators – FiBHA (1/3)
"FiBHA: Fixed Budget Hybrid CNN Accelerator", Fareed Qararyah, Muhammad Waqar Azhar, Pedro Trancoso, IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2022), Bordeaux, France, November 2–5 2022
?
Generic ↔ Dedicated
13
Accelerators – FiBHA (2/3)
Monolithic design
● One engine computes all the core layers
● E.g. TPU
SEML
● One engine computes all layers of the same type
● Pointwise (PW) engine, depthwise (DW) engine
SESL
● One engine per layer
● E.g. FINN
FiBHA
● SESL + SEML
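The four organizations can be compared by how they map layers to engines. The sketch below is a simplified illustration, not the FiBHA implementation: the function name and engine labels are hypothetical, and treating the hybrid as "first `split` layers SESL, remaining layers SEML" is an assumption, since the slide only states "SESL + SEML".

```python
def assign_engines(layer_types, style, split=0):
    """Return one engine label per layer for a given architecture style.

    layer_types: list of layer kinds, e.g. ["pw", "dw", ...]
    style: "monolithic", "seml", "sesl", or "hybrid"
    split: for "hybrid" only, number of leading layers that each get their
           own engine (hypothetical parameter)
    """
    if style == "monolithic":
        return ["engine" for _ in layer_types]
    if style == "seml":
        return [f"{t}_engine" for t in layer_types]
    if style == "sesl":
        return [f"layer{i}_engine" for i, _ in enumerate(layer_types)]
    if style == "hybrid":
        head = [f"layer{i}_engine" for i in range(min(split, len(layer_types)))]
        tail = [f"{t}_engine" for t in layer_types[split:]]
        return head + tail
    raise ValueError(f"unknown style: {style}")


layers = ["pw", "dw", "pw", "dw"]
plan = assign_engines(layers, "hybrid", split=2)
```

Counting the distinct labels in each plan shows how the styles differ in engine count for the same network: one for monolithic, one per layer type for SEML, one per layer for SESL, and something in between for the hybrid.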
14
Accelerators - FiBHA (3/3)
● FiBHA compared to both alternatives
○ Up to 4x throughput improvement compared to SESL (FINN)
■ Better use of the resource budget
○ Up to 1.7x throughput improvement compared to SEML
■ Capturing more heterogeneity
● FiBHA compared to SEML
○ Representative set of heterogeneous CNNs
○ Various resource budgets
■ 1024 PEs - 4096 PEs
○ FiBHA consistently outperforms SEML
15
Memory Analysis/Recommendation Tool - Rainbow
● Set of different analyses for on-chip memory and off-chip data transfers
● Optimizers
○ Optimal data reuse
○ Co-design for multi-precision model quantization
○ Data reuse with batch execution
● Heterogeneous execution plans
"RAINBOW: Multi-Dimensional Hardware-Software Co-Design for DL Accelerator On-Chip Memory", S. Zouzoula, M. W. Azhar, P. Trancoso, 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2023), pp. 1-3, April 2023
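A basic building block of any on-chip memory analysis is the per-layer data footprint. The sketch below is not Rainbow's analysis; it is a minimal illustration assuming a square convolution layer, with hypothetical function names and an arbitrary example layer, that checks whether weights and feature maps fit a given on-chip budget at a chosen precision.

```python
def conv_footprint_bytes(in_ch, out_ch, kernel, in_hw, out_hw, bytes_per_value=1):
    """Return the byte footprint of weights, input and output feature maps."""
    return {
        "weights": out_ch * in_ch * kernel * kernel * bytes_per_value,
        "input_fm": in_ch * in_hw * in_hw * bytes_per_value,
        "output_fm": out_ch * out_hw * out_hw * bytes_per_value,
    }


def fits_on_chip(footprint, budget_bytes):
    """True if all three buffers fit in the on-chip budget at the same time."""
    return sum(footprint.values()) <= budget_bytes


# Example layer (arbitrary sizes): 3 -> 16 channels, 3x3 kernel, 32x32 maps, 8-bit values.
fp8 = conv_footprint_bytes(in_ch=3, out_ch=16, kernel=3, in_hw=32, out_hw=32)
# The same layer with 32-bit values needs four times the space.
fp32 = conv_footprint_bytes(3, 16, 3, 32, 32, bytes_per_value=4)
```

When a layer does not fit, data has to be tiled and moved off-chip, which is where data-reuse and precision choices start to matter.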
16
▪ Optimizing DL models
▪ Hardware-aware optimizations
▪ Model compression without loss of accuracy
Model-Accelerator Co-Design
17
▪ Optimizing DL models
▪ Hardware-aware optimizations
▪ Model compression without loss of accuracy
▪ Hardware-software co-design
▪ Reconfigurable (FPGA) accelerators
▪ Template-based description
▪ Heterogeneous engines
Model-Accelerator Co-Design
Co-design
18
Integration of Deep Learning into IoT devices with restricted computing capabilities and minimal power consumption requirements – energy-efficient computing
▪ Cognitive IoT hardware platform with tailored hardware components and accelerators: from embedded systems to edge computing and cloud platforms
▪ Wide range of accelerator designs from off-the-shelf to FPGA-based generic and dedicated engines
▪ Dynamic reconfiguration for increased efficiency
▪ Memory analysis and recommendation for design space exploration, configuration, and execution plans
▪ Model-hardware co-design loop for optimized solutions
Summary
Co-design with a wide range of options for accelerator designs
Most energy-efficient solution for a particular application and constraints
19
Thank you for your attention.

IoT Tech Expo 2023_Pedro Trancoso presentation

  • 1. Pedro Trancoso Chalmers University of Technology, Gothenburg, Sweden VEDLIoT Cognitive IoT Hardware Platform, Accelerators and Co-Design 26. September 2023 F. Qararyah, S. Zouzoula, M. Waqar, P. Trancoso, M. Rothmann, M. Tassemeier, M. Porrmann, D. Ödman, H. Salomonsson, F. Porrmann, R. Griessl, N. Kucza, K. Mika, C. Stollenwerk, M. Kaiserm, L. Tigges, J. Hagemeyer
  • 3. Cognitive IoT HW Platform ▪ Heterogeneous hardware platform ▪ Resource Efficient Cluster Server (RECS) platform: cloud to edge
  • 4. Cognitive IoT HW Platform ▪ Heterogeneous hardware platform ▪ Resource Efficient Cluster Server (RECS) platform: cloud to edge (figure labels: cloud, edge)
  • 5. Cognitive IoT HW Platform ▪ Heterogeneous hardware platform ▪ Resource Efficient Cluster Server (RECS) platform: cloud to edge ▪ u.RECS for the far edge, with 3 slots: NVIDIA NX (embedded GPU); SMARC 2.1 (FPGA, CPU); M.2 (dedicated accelerator)
  • 6. Accelerators ▪ Accelerators “landscape” ▪ Evaluated a multitude of accelerators (CPUs, GPUs, FPGAs, ASICs) ▪ Same model (YoloV4), different batch sizes (1, 4, 8) ▪ Efficiency from 100 GOPS/W to 1250 GOPS/W
  • 7. Accelerators ▪ Accelerators “landscape” ▪ Evaluated a multitude of accelerators (CPUs, GPUs, FPGAs, ASICs) ▪ Same model (YoloV4), different batch sizes (1, 4, 8) ▪ Efficiency from 100 GOPS/W to 1250 GOPS/W ▪ Identify different categories (figure labels: High-performance, Low power, Efficiency) ▪ FPGA-based accelerators: flexibility, reconfigurability, efficiency
  • 8. Accelerators: Xilinx DPU ● Baseline for evaluation of FPGA accelerators developed in VEDLIoT ● Xilinx Deep Learning Processor Unit (DPU) ○ Programmable engine for convolutional neural networks ○ Easy integration as an IP core in Xilinx UltraScale+ and Versal MPSoCs ○ Configurable hardware architecture (e.g., parallelism, memory/DSP usage) ● Large design space ○ Goal: find the most suitable implementation for your requirements
  • 9. Dynamic reconfiguration of Xilinx DPU ● Change the characteristics of the DL accelerator at run-time (e.g., change the performance-power trade-off or the performance-accuracy trade-off) ● Different modes of operation: high-performance versus low-power; city versus highway driving; …
  • 10. STANN: Synthesis Templates for ANNs (1/2) ● Library for simple yet efficient generation of DL accelerators on FPGAs ● Templates for common layers ○ Network architecture parameterizable, e.g., number of neurons ○ Hardware implementation parameterizable, e.g., parallelism of processing units ● Resource efficiency through flexible quantization ○ Floating point and integer, from 32-bit to 8-bit ● High-level synthesis enables fast design space exploration ○ Automatic code generation based on ONNX description ○ Highly parameterizable: reuse of hardware blocks vs. parallel execution
  • 11. STANN: Synthesis Templates for ANNs (2/2) ● STANN enables inference and training on FPGAs ● Training with dataflow architecture ○ Forward path similar to inference, but needs to store more intermediate values ○ Backpropagation and weight update module for each layer ● Fast, but uses a lot of resources ● Well suited for small networks, as used, e.g., in deep reinforcement learning ● Application example: motor control ○ DQN (Deep Q-Network) replaces manual parameter tuning ○ Used in OC Project Power Edge RL. Reference: Rothmann, M.; Porrmann, M.: STANN – Synthesis Templates for Artificial Neural Network Inference and Training. In: 17th International Work-Conference on Artificial Neural Networks, IWANN 2023, Ponta Delgada, Azores, Portugal, June 19-21, 2023
  • 12. Accelerators: FiBHA (1/3) ● Generic ↔ Dedicated? Reference: "FiBHA: Fixed Budget Hybrid CNN Accelerator", Fareed Qararyah, Muhammad Waqar Azhar, Pedro Trancoso, IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2022), Bordeaux, France, November 2–5 2022
  • 13. Accelerators: FiBHA (2/3) ● Monolithic design: one engine computes all the core layers (e.g., TPU) ● SEML: one engine computes all layers of the same type (PW engine, DW engine) ● SESL: one engine per layer (e.g., FINN) ● FiBHA: SESL + SEML. Reference: "FiBHA: Fixed Budget Hybrid CNN Accelerator", Qararyah, Azhar, Trancoso, SBAC-PAD 2022
  • 14. Accelerators: FiBHA (3/3) ● FiBHA compared to both alternatives ○ Up to 4x throughput improvement compared to SESL (FINN): better use of the resource budget ○ Up to 1.7x throughput improvement compared to SEML: captures more heterogeneity ● FiBHA compared to SEML ○ Representative set of heterogeneous CNNs ○ Various resource budgets: 1024 PEs to 4096 PEs ○ FiBHA consistently outperforms SEML. Reference: "FiBHA: Fixed Budget Hybrid CNN Accelerator", Qararyah, Azhar, Trancoso, SBAC-PAD 2022
  • 15. Memory Analysis/Recommendation Tool: Rainbow ● Set of different analyses for on-chip memory and off-chip data transfers ● Optimizers ○ Optimal data reuse ○ Co-design for multi-precision model quantization ○ Data reuse with batch execution ● Heterogeneous execution plans. Reference: "RAINBOW: Multi-Dimensional Hardware-Software Co-Design for DL Accelerator On-Chip Memory", S. Zouzoula, M. W. Azhar, P. Trancoso, 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2023), pp. 1-3, April 2023
  • 16. Model-Accelerator Co-Design ▪ Optimizing DL models: hardware-aware optimizations; model compression without loss of accuracy
  • 17. Model-Accelerator Co-Design ▪ Optimizing DL models: hardware-aware optimizations; model compression without loss of accuracy ▪ Hardware-software co-design: reconfigurable (FPGA) accelerators; template-based description; heterogeneous engines
  • 18. Summary: Integration of Deep Learning into IoT devices with restricted computing capabilities and minimal power consumption requirements (energy-efficient computing) ▪ Cognitive IoT hardware platform with tailored hardware components and accelerators: from embedded systems to edge computing and cloud platforms ▪ Wide range of accelerator designs, from off-the-shelf to FPGA-based generic and dedicated engines ▪ Dynamic reconfiguration for increased efficiency ▪ Memory analysis and recommendation for design space exploration, configuration, and execution plans ▪ Model-hardware co-design loop for optimized solutions ▪ Co-design with a wide range of options for accelerator designs ▪ Most energy-efficient solution for a particular application and its constraints
  • 19. Thank you for your attention.
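
Slides 6 and 7 quote accelerator efficiency in GOPS/W, that is, sustained throughput divided by average power. A minimal sketch of that metric follows. The device names and the throughput and power figures are made up for illustration and are not measurements from the talk; they are only chosen so the results span the 100 to 1250 GOPS/W range mentioned on the slides.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    name: str     # hypothetical device label
    gops: float   # sustained throughput in giga-operations per second
    watts: float  # average power draw during inference

    @property
    def efficiency(self) -> float:
        """Energy efficiency in GOPS/W."""
        if self.watts <= 0:
            raise ValueError("power must be positive")
        return self.gops / self.watts


def rank_by_efficiency(measurements):
    """Return the measurements sorted from most to least efficient."""
    return sorted(measurements, key=lambda m: m.efficiency, reverse=True)


# Illustrative values only (assumptions, not data from the slides).
samples = [
    Measurement("device_a", gops=2000.0, watts=20.0),
    Measurement("device_b", gops=5000.0, watts=4.0),
    Measurement("device_c", gops=900.0, watts=3.0),
]
ranked = rank_by_efficiency(samples)
for m in ranked:
    print(f"{m.name}: {m.efficiency:.0f} GOPS/W")
```

Ranking by GOPS/W rather than raw GOPS is what separates the "high-performance" and "low power" categories: a device with lower absolute throughput can still come out ahead if it draws much less power.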
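
Slide 9 describes switching the accelerator between operating modes at run-time. One way to picture the selection step is sketched below, assuming each available configuration has a known throughput and power draw and each mode maps to a power cap. The configuration names, numbers, and mode table are hypothetical; the slides do not describe how VEDLIoT selects or loads a configuration.

```python
# Hypothetical accelerator configurations (assumed values, not DPU data).
CONFIGS = [
    {"name": "small", "fps": 30.0, "watts": 4.0},
    {"name": "medium", "fps": 60.0, "watts": 9.0},
    {"name": "large", "fps": 110.0, "watts": 18.0},
]

# Hypothetical mapping from operating mode to a power cap in watts.
MODE_POWER_CAP = {"low_power": 5.0, "high_performance": 20.0}


def pick_config(configs, power_cap_w):
    """Return the fastest configuration that fits the power cap, or None."""
    feasible = [c for c in configs if c["watts"] <= power_cap_w]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["fps"])


for mode, cap in MODE_POWER_CAP.items():
    chosen = pick_config(CONFIGS, cap)
    print(mode, "->", chosen["name"] if chosen else "no feasible configuration")
```

The same pattern would apply to a performance-accuracy trade-off by swapping the power cap for an accuracy floor.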
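
Slide 10 lists flexible quantization, from 32-bit down to 8-bit, as a source of resource efficiency. The sketch below shows generic symmetric linear quantization of floating-point weights to signed integers. It is a textbook scheme used here for illustration, not STANN's implementation, and the function names are assumptions.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_symmetric(weights, bits=8)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for narrower multipliers and smaller on-chip buffers.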
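
Slide 13 describes FiBHA as a combination of SESL (one engine per layer) and SEML (one engine shared by all layers of the same type). The toy sketch below only illustrates that structure: the first `split` layers each get a dedicated engine and the remaining layers are grouped by type onto shared engines. The layer names, the type labels, and the choice of split point are assumptions; how FiBHA actually partitions a network under a fixed resource budget is described in the cited paper, not in the slides.

```python
def hybrid_plan(layers, split):
    """Build an engine list from (name, layer_type) tuples in execution order.

    Layers before `split` get one engine each (SESL); the rest share one
    engine per layer type (SEML).
    """
    if not 0 <= split <= len(layers):
        raise ValueError("split must be between 0 and len(layers)")
    front, back = layers[:split], layers[split:]
    plan = [{"kind": "SESL", "type": t, "layers": [n]} for n, t in front]
    shared = {}  # layer type -> names of layers mapped to that shared engine
    for name, layer_type in back:
        shared.setdefault(layer_type, []).append(name)
    plan.extend(
        {"kind": "SEML", "type": t, "layers": names} for t, names in shared.items()
    )
    return plan


# Hypothetical network: a standard convolution followed by DW/PW pairs.
network = [
    ("conv1", "std"),
    ("dw2", "dw"),
    ("pw3", "pw"),
    ("dw4", "dw"),
    ("pw5", "pw"),
]
plan = hybrid_plan(network, split=2)
for engine in plan:
    print(engine["kind"], engine["type"], engine["layers"])
```

Setting `split` to 0 gives a pure SEML design and setting it to the number of layers gives a pure SESL design, so the hybrid sits between the two extremes listed on the slide.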