HiPEAC-CSW 2022_Pedro Trancoso presentation

HAccIoT: Heterogeneous Hardware Acceleration for Edge and IoT
Agenda
▪ 11:30 – 12:00 EEST (10:30 – 11:00 CEST)
Introduction to VEDLIoT
Pedro Trancoso (Chalmers University of Technology)
▪ 12:00 – 12:25 EEST (11:00 – 11:25 CEST)
VEDLIoT Hardware Platforms
Kevin Mika (Bielefeld University)
▪ 12:25 – 12:45 EEST (11:25 – 11:45 CEST)
Performance Evaluation and Benchmarking in VEDLIoT
Mario Pormann (Osnabrueck University)

Project Overview
Pedro Trancoso
Chalmers University of Technology
27 April 2022

Motivation
FUTURE…
▪ Deep Learning: Solve more challenging & complex problems
▪ Everywhere: transportation & industry & home
▪ Systems: Performance + security & privacy & robustness
VEDLIoT offers a framework for the next-generation Internet based on IoT devices that collaboratively solve complex DL applications across a distributed system

Very Efficient Deep Learning for IoT – VEDLIoT
▪ Platform
▪ Hardware: Scalable, heterogeneous, distributed
▪ Accelerators: Efficiency boost by FPGA and ASIC technology
▪ Toolchain: Optimizing Deep Learning for IoT
▪ Use cases
▪ Industrial IoT
▪ Automotive
▪ Smart Home
▪ Open call
▪ At project mid-term
▪ Early use and evaluation of VEDLIoT technology
▪ Call: H2020-ICT2020-1
▪ Topic: ICT-56-2020 Next Generation Internet of Things
▪ Duration: 1 November 2020 – 31 October 2023
▪ Coordinator: Bielefeld University (Germany)
▪ Overall budget: 7 996 646.25 €
▪ Consortium: 12 partners from 4 EU countries (Germany, Poland, Portugal and Sweden) and one associated country (Switzerland).
More info:
⇒ https://www.vedliot.eu/
⇒ https://twitter.com/VEDLIoT
⇒ https://www.linkedin.com/company/vedliot/

Use case: Automotive
▪ Focus on collision detection/avoidance scenario
▪ Improve performance/cost ratio – AI processing hardware distributed over the entire chain
Challenge: Distribution of work

Use case: Industrial IoT – drive condition classification
▪ Control applications need DL-based condition classification
▪ On the edge device for low power consumption
▪ Suggestions for control and maintenance
▪ DL methods on all communication layers
▪ DL in a distributed architecture
▪ Dynamically configured systems
▪ Sensor-equipped test bench with 2 motors
▪ Acceleration, magnetic field, temperature, IR camera (temperature), current sensors, torque
▪ On/off detection without motor current or voltage
▪ Cooling fault detection
▪ Bearing fault detection
Challenge: Low power / efficiency

Use case: Industrial IoT – Arc detection
▪ AI-based pattern recognition for different local sensor data
▪ Current, magnetic field, vibration, temperature, low-resolution infrared image
▪ Safety-critical nature
▪ Response time should be < 10 ms
▪ AI-based or AI-supported decision made by the sensor node itself or by a local part of the sensor network
Challenge: Accuracy

Use case: Smart Home / Assisted Living
▪ Increase safety, health and well-being of residents – acceleration of AI methods for demand-oriented user-home interaction
▪ Smart Mirror as central user interface
▪ Own mirror image can be seen normally
▪ Intuitive control via gesture and voice
▪ Shows personalized information
▪ Data privacy as the highest priority
▪ Edge computation of many neural networks
Challenge: Data privacy

Use case: Smart Mirror – Neural Networks
▪ Face recognition
▪ MobileNet SSD trained on the WIDER FACE dataset
▪ Object detection
▪ YOLOv3, EfficientNet, YOLOv4-tiny
▪ Gesture detection
▪ YOLOv4-tiny with 3 YOLO layers (usually: 2 layers)
▪ Speech recognition
▪ Mozilla DeepSpeech
▪ AI Art: StyleGAN trained on works of art
▪ Collect usage data in situation memory
Challenge: Multi-model

VEDLIoT’s Deep Learning Toolchain
• Image Classification
• Object Detection
• Semantic Segmentation
• Instance Segmentation
• Extractive Question Answering
[Figure: toolchain blocks labeled Model Zoo, Optimization Engine, Compilers & Runtime APIs, Heterogeneous Hardware Platforms]

DL Accelerators
▪ End of Moore’s law & dark silicon – Domain Specific Architectures (DSA)
▪ Efficient, flexible, scalable accelerators for the compute continuum
▪ Algotecture – DL algorithm + computer architecture co-design
Evaluation of existing accelerators in Kevin’s presentation!
Reconfigurable and Dynamically Reconfigurable DL accelerators in Mario’s presentation!
Now!

What is Co-design?
● Considering the characteristics of the algorithm (SW) when designing the accelerator (HW) and vice versa

Partial Co-design
● Designing the algorithm (SW) while considering accelerator (HW) characteristics

Partial Co-design (HW → SW)
● DNN compression
○ Lossless compression (e.g. Huffman coding)
○ Quantization
○ Pruning
● Compact models with efficiency-oriented layers (e.g. EfficientNet)
● Neural Architecture Search (NAS)
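As a rough illustration of two of the compression techniques listed on this slide, the sketch below applies magnitude pruning followed by symmetric 8-bit quantization to a short weight list. It is a minimal sketch in plain Python, not the VEDLIoT toolchain: the function names, the weight values, and the 50% sparsity target are all assumptions made for the example.

```python
from typing import List, Tuple


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero the smallest-magnitude weights so that `sparsity` of them are zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by |w|, smallest first; the first n_prune get pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to the signed range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -1.27, 0.64, -0.01, 0.30, 0.0]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
print(pruned)
print(q, scale)
```

Pruning first and quantizing second is only one possible ordering; the point is that both steps shrink the model in ways an accelerator can exploit (zeros can be skipped, 8-bit values need narrower datapaths).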

Partial Co-design
● Designing the accelerator (HW) while considering algorithm (SW) characteristics

Partial Co-design (SW → HW)
● Domain specific accelerators
● Optimize the common flow (DNN-oblivious)
● Optimize per model (DNN-specific)
● Neural Architecture Search (NAS)

Fully Simultaneous Co-design
● Using techniques from both directions of partial co-design iteratively

Model-Oblivious or Model-Specific?
● DNN Model-Oblivious
+ Generic
+ Design once, or very infrequently
- Less ability to harness model-based optimizations
● DNN Model-Specific
+ Model-tailored accelerator: exploit model characteristics (layer types, parallelism patterns, degree of sparsity) to gain more efficiency (speed, energy, area)
- Per-model design/redesign

Model-Oblivious Accelerator: IP-Cores
● Evaluation of TVM-VTA
○ DNN models (e.g. ResNet-50)
○ Deploy to FPGA (CPU and CPU+VTA)
○ Collect metrics (e.g. energy, inference time)
https://tvm.apache.org/docs/topic/vta/index.html
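The metric-collection step could look roughly like the sketch below, which times repeated calls to an inference function and derives energy per inference as average power multiplied by latency. This is a generic harness under stated assumptions, not the TVM or VTA API: `benchmark`, `fake_inference`, and the 5 W power figure are hypothetical stand-ins, and on a real board the power value would come from an external measurement.

```python
import time
from statistics import mean
from typing import Callable, Dict


def benchmark(run_inference: Callable[[], object],
              avg_power_watts: float,
              warmup: int = 2,
              repeats: int = 10) -> Dict[str, float]:
    """Return mean latency (s) and energy per inference (J) for a callable."""
    # Warm-up runs are discarded so one-time setup cost is not measured.
    for _ in range(warmup):
        run_inference()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference()
        timings.append(time.perf_counter() - start)
    latency_s = mean(timings)
    # Energy = average power x time, assuming power is constant during the run.
    return {"latency_s": latency_s, "energy_j": avg_power_watts * latency_s}


def fake_inference() -> int:
    """Hypothetical stand-in for a deployed model's forward pass."""
    return sum(i * i for i in range(10_000))


result = benchmark(fake_inference, avg_power_watts=5.0)
print(result)
```

Running the same harness once for the CPU-only deployment and once for CPU+VTA would give directly comparable latency and energy figures.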

Model-Specific Accelerator
● In progress
○ Xilinx FINN
■ Dataflow-style architectures
■ Customized for each network
■ Per-layer parallelization strategy
■ Supports low bit-widths
● Future directions
○ Sparsity-aware and low bit-width DNN-specific accelerator
○ Mixed precision
■ < 8 bits quantization
■ Intra-layer or inter-layer or both?
[Figure: Layer 1, Layer 2, …, Layer n]

Memory characterization
● DNN models (e.g. MobileNetV3-Small)
● Systolic array with support for different dataflows (weight-stationary, ws; output-stationary, os)
● DRAM accesses for inputs / weights / outputs
● Outcomes:
○ Buffer size to minimize DRAM accesses
○ Memory technology based on bandwidth
https://github.com/scalesim-project/scale-sim-v2
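To see why buffer size drives DRAM accesses, the sketch below counts off-chip words moved for a tiled matrix multiplication under an output-stationary schedule. It is a simplified analytical model written for this example, not SCALE-Sim and not its API: the function name, the square tile, and the assumption that inputs are streamed rather than cached on chip are all illustrative.

```python
import math
from typing import Dict


def dram_traffic(m: int, n: int, k: int, tile: int) -> Dict[str, int]:
    """Count DRAM words moved for C[m x n] = A[m x k] * B[k x n].

    Assumed schedule: one tile x tile block of C stays on chip (output
    stationary) while the matching rows of A and columns of B stream in.
    """
    tiles_m = math.ceil(m / tile)
    tiles_n = math.ceil(n / tile)
    reads_a = tiles_n * m * k  # all of A is re-read once per column of tiles
    reads_b = tiles_m * k * n  # all of B is re-read once per row of tiles
    writes_c = m * n           # each output word is written exactly once
    return {
        "buffer_words": tile * tile,
        "reads": reads_a + reads_b,
        "writes": writes_c,
    }


for tile in (8, 32, 64):
    print(tile, dram_traffic(64, 64, 64, tile))
```

In this model, growing the tile from 8 to 64 for a 64 x 64 x 64 product cuts reads from 65,536 to 8,192 words while writes stay at 4,096, which is the trade-off between on-chip buffer size and required memory bandwidth that the characterization explores.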

VEDLIoT Hardware Platform
▪ Heterogeneous, modular, scalable microserver system
▪ Supporting the full spectrum of IoT from embedded over the edge towards the cloud
[Figure: VEDLIoT Cognitive IoT Platform, with modules labeled x86, GPU, ML-ASIC, ARM v8, GPU SoC, FPGA SoC, RISC-V, FPGA]
More details in Kevin’s presentation!

Supporting co-design of distributed DL systems
• A compositional architectural framework eases co-design of different concerns, e.g., AI Design, Hardware, Communication.
• Each concern is considered at different levels of abstraction.

Co-Design of safety, security, ethics, etc.
• Quality concerns are “co-designed” as their own clusters of concerns.
• That allows for “safety-by-design”, “ethical-by-design”, “secure-by-design”, etc.

Security, Privacy and Trust (1)
▪ Leverage CPU hardware capabilities to create trusted execution environments.
▪ Today, we support Intel SGX (for cloud) and Arm TrustZone (for edge/IoT).
▪ WebAssembly is a novel binary standard for portable executable code.
▪ It can be produced by compiling software written in many programming languages.
▪ We run WebAssembly inside trusted execution environments.
▪ We provide security and privacy that is portable over cloud and edge deployments.

Security, Privacy and Trust (2)
▪ The state of IoT devices is proved genuine through remote attestation.
▪ We introduce SIRE, a new security service to attest, authenticate and authorize devices to participate in an IoT distributed application.
▪ SIRE employs Byzantine Fault Tolerant algorithms to operate correctly despite failures and intrusions.
▪ SIRE attests WebAssembly applications running on IoT devices to provide trust.
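The slide does not describe SIRE's protocol, so the following is only a generic sketch of the quorum idea behind Byzantine Fault Tolerant services: with n replicas, up to f = (n - 1) // 3 may be faulty, and a verdict is accepted only when enough replicas report the same value that any two such groups must share at least one correct replica. All function names and the verdict strings are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional


def max_faulty(n_replicas: int) -> int:
    """Largest number of Byzantine replicas tolerated by n replicas (n >= 3f + 1)."""
    return (n_replicas - 1) // 3


def quorum_size(n_replicas: int) -> int:
    """Smallest group size such that any two groups overlap in a correct replica."""
    f = max_faulty(n_replicas)
    return (n_replicas + f) // 2 + 1


def agreed_verdict(verdicts: List[Hashable], n_replicas: int) -> Optional[Hashable]:
    """Return the verdict backed by a quorum of replicas, or None if there is none."""
    if not verdicts:
        return None
    value, count = Counter(verdicts).most_common(1)[0]
    return value if count >= quorum_size(n_replicas) else None


# Hypothetical attestation verdicts from four replicas (tolerates one fault).
print(agreed_verdict(["trusted", "trusted", "trusted", "untrusted"], 4))
print(agreed_verdict(["trusted", "trusted", "untrusted", "untrusted"], 4))
```

With four replicas the quorum is three, so a single faulty or compromised replica cannot change the accepted verdict on its own.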

Robustness monitoring
● Goals
○ Detect input/output data quality issues
○ Raise warnings to trigger possible mitigation actions
● Local monitoring of data integrity
○ Consider predefined settings computed during training
○ Consider past input and output data
● Remote monitoring of model integrity
○ Periodically send data to remote service
○ Run trustworthy model on these data
○ Compare to output of local model
● Applications
○ Time series: MLP Neural Network
○ Time series: LSTM
○ Classification: YOLO, IoU and scores
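For the YOLO case, comparing the local model's output with a trusted remote model could be sketched as below: compute the Intersection over Union (IoU) of the two predicted boxes, compare the confidence scores, and raise a warning when either differs too much. This is a minimal sketch, not the VEDLIoT monitor: the box format, both thresholds, and the warning texts are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def check_detection(local_box: Box, local_score: float,
                    trusted_box: Box, trusted_score: float,
                    min_iou: float = 0.5,
                    max_score_gap: float = 0.2) -> List[str]:
    """Return warnings when the local output drifts from the trusted output."""
    warnings = []
    if iou(local_box, trusted_box) < min_iou:
        warnings.append("box mismatch")
    if abs(local_score - trusted_score) > max_score_gap:
        warnings.append("score mismatch")
    return warnings


# Hypothetical detections: the boxes overlap by only one third.
print(check_detection((0, 0, 10, 10), 0.90, (5, 0, 15, 10), 0.85))
```

An empty list means the local and trusted outputs agree within the chosen tolerances; any entry could trigger one of the mitigation actions mentioned under Goals.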

Simulation platform for IoT
• Open source framework for software/hardware co-development with CI-driven testing capabilities, as well as metrics for measuring efficiency of ML workloads
• Enables development and continuous testing of VEDLIoT’s Machine Learning solutions
• Renode is available to all project members and future users of VEDLIoT and will include a simulated model of the RISC-V-based FPGA SoC platform developed as part of the VEDLIoT project

Benchmarking Edge AI frameworks
▪ Kenning is an open source framework for deploying, running and benchmarking various AI frameworks on different hardware platforms
▪ It aims to provide wrappers for deep learning deployment steps that can be seamlessly combined regardless of the underlying deep learning frameworks and compilers
▪ Available on GitHub
▪ https://github.com/antmicro/kenning
▪ https://antmicro.github.io/kenning/
