VEDLIoT Cognitive IoT Hardware Platform. René Griessl. Workshop on Deep Learning for IoT (DL4IoT), co-located with HiPEAC 2022, Budapest, Hungary, June 2022
3. 3
VEDLIoT Hardware Platform
Heterogeneous, modular, scalable microserver system
Supporting the full spectrum of IoT from embedded over the edge towards the cloud
Different technology concepts for improving
• Performance
• Cost-effectiveness
• Maintainability
• Reliability
• Energy-Efficiency
• Safety
[Figure: VEDLIoT Cognitive IoT Platform, combining x86, GPU, ML-ASIC, ARM v8, GPU SoC, FPGA SoC, RISC-V and FPGA]
4. 4
RECS|BOX Overview
• RECS Server Backplane (up to 15 Carriers)
• Carrier (PCIe Expansion), e.g. GPU-Accelerator
• High-Performance Carrier (up to 3 microservers): Microserver (High Performance) #1 to #3
• Low-Power Carrier (up to 16 microservers): Microserver (Low Power) #1 to #16
• High-Performance Microserver (COMExpress): x86, ARM v8, FPGA SoC
• Low-Power Microserver (Apalis/Jetson): GPU SoC, FPGA SoC, ARM SoC
• High-Speed Low-Latency Network (PCIe, High-Speed Serial)
• Compute Network (up to 40 GbE)
• Management Network (KVM, Monitoring, …)
• Ext. Connectors: HDMI/USB, iPass+ HD, QSFP+, RJ45
5. 5
Server Architecture
• Microserver modules based on established Computer on Module standards
• COM Express
• Nvidia Jetson
• Toradex Apalis
[Figure: server architecture. A backplane connects up to 15 baseboards per server; each baseboard carries 3 or 16 microserver modules (CPU, Mem, I/O). Backplane links: KVM & Monitoring, Storage and I/O-Extension, Ethernet (10/40 GbE), High-Speed Low-Latency Communication (>60 Gbit/s)]
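The capacity limits on this slide (up to 15 baseboards per server, 3 or 16 microservers per baseboard) can be turned into a quick sizing check. The following is only a sketch: the names `MICROSERVERS_PER_CARRIER` and `max_microservers` are made up for illustration, and it assumes every populated slot holds either a high-performance or a low-power carrier.

```python
# Capacity limits quoted on the slides.
MAX_BASEBOARDS_PER_SERVER = 15
MICROSERVERS_PER_CARRIER = {"high_performance": 3, "low_power": 16}


def max_microservers(carriers):
    """Return the microserver count for a list of carrier types.

    `carriers` holds one entry per populated baseboard slot, e.g.
    ["low_power", "high_performance"]. Raises ValueError if the list
    needs more slots than one backplane offers.
    """
    if len(carriers) > MAX_BASEBOARDS_PER_SERVER:
        raise ValueError("a backplane holds at most 15 baseboards")
    return sum(MICROSERVERS_PER_CARRIER[c] for c in carriers)


if __name__ == "__main__":
    print(max_microservers(["low_power"] * 15))         # 240
    print(max_microservers(["high_performance"] * 15))  # 45
```

A fully populated server therefore tops out at 240 low-power or 45 high-performance microservers; mixed configurations fall in between.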
6. 6
Server Architecture
• Dedicated monitoring and control network
• iKVM to every microserver
• Fine-grained monitoring of power, voltage and temperature
• Distributed network of microcontrollers for data aggregation and pre-processing
• High-speed monitoring
[Figure: server architecture diagram (up to 15 baseboards per server, 3/16 microservers per baseboard) showing the Distributed Monitoring and KVM layer]
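The slide does not say what the microcontrollers compute when they pre-process monitoring data. One plausible form of pre-processing, shown here purely as an assumption, is reducing raw sensor readings to a (min, mean, max) summary per fixed-size window before forwarding them. The function name `summarize` and the window size are hypothetical.

```python
from statistics import mean


def summarize(samples, window):
    """Reduce raw readings to one (min, mean, max) tuple per window.

    `samples` is a list of numeric sensor readings (e.g. temperatures),
    `window` the number of readings per summary. The last window may be
    shorter than the others.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    summaries = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        summaries.append((min(chunk), mean(chunk), max(chunk)))
    return summaries


if __name__ == "__main__":
    temperatures = [40.0, 42.0, 41.0, 45.0, 44.0]
    for lo, avg, hi in summarize(temperatures, window=2):
        print(f"min={lo} mean={avg} max={hi}")
```

Summaries like these keep the peaks visible while cutting the volume of data sent over the management network.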
7. 7
Server Architecture
• Multiple 1Gb/10Gb Ethernet links per Microserver
• 40 Gb Ethernet from Baseboard to Backplane
• Internally switched on Baseboard and Backplane
[Figure: server architecture diagram (up to 15 baseboards per server, 3/16 microservers per baseboard) showing the Ethernet Communication Infrastructure alongside Distributed Monitoring and KVM]
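Because the baseboard switch funnels all microserver links into the 40 Gb uplink, it can be useful to estimate how heavily that uplink is oversubscribed. The sketch below assumes one active link per microserver and a single 40 GbE uplink per baseboard; the slides give neither number exactly, and the function name `oversubscription` is illustrative.

```python
def oversubscription(microservers, link_gbps, uplink_gbps=40.0):
    """Ratio of aggregate microserver bandwidth to uplink bandwidth.

    Values above 1.0 mean the microservers could, in total, offer more
    traffic than the baseboard-to-backplane uplink can carry.
    """
    if uplink_gbps <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return microservers * link_gbps / uplink_gbps


if __name__ == "__main__":
    # Hypothetical configurations, not taken from the slides.
    print(oversubscription(16, 1.0))   # 16 microservers with 1 GbE each
    print(oversubscription(3, 10.0))   # 3 microservers with 10 GbE each
```

In both hypothetical configurations the ratio stays below 1.0, so the uplink would not be the bottleneck; more links per microserver change that quickly.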
8. 8
Server Architecture
[Figure: server architecture diagram (up to 15 baseboards per server, 3/16 microservers per baseboard) showing the High-speed Low-latency Communication layer alongside Distributed Monitoring and KVM and the Ethernet Communication Infrastructure]
• Multiple 1Gb/10Gb Ethernet links per Microserver
• 40 Gb Ethernet from Baseboard to Backplane
• Internally switched on Baseboard and Backplane
9. 9
Server Architecture
[Figure: server architecture diagram (up to 15 baseboards per server, 3/16 microservers per baseboard) showing Storage / I/O-Ext. alongside Distributed Monitoring and KVM, the Ethernet Communication Infrastructure and High-speed Low-latency Communication]
• Connection to storage and I/O extensions
• Easy integration of PCIe-based extension cards and storage subsystems
11. 11
t.RECS
• Optimized platform for local / edge applications
• Provide interfaces for video, camera and peripheral input (USB)
• Combine FPGA and GPU acceleration
• Compact dimensions: 1 RU, E-ATX form factor (2 RU / 3 RU for special cases)
t.RECS Overview
[Figure: t.RECS overview. Microserver #1 (COM-HPC Client), Microserver #2 (COM-HPC Server) and Microserver #3 (COM-HPC Client), with external interfaces and PCIe expansion. Shared infrastructure: Switched PCIe (Host to Host), Ethernet (up to 10 GbE), Management Network (KVM, Monitoring, …), I/O (Camera, Display, Radar/Lidar, Audio)]
12. 12
t.RECS Architecture
Modular architecture
• 1x Large Form Factor (LFF)
• 2x Small Form Factor (SFF)
Communication infrastructure
• High-Speed Low-Latency via PCIe: switched & ring topology
• Support for cache-coherent accelerators (CCIX)
• Switched ETH for data (10 GbE) and management (1 GbE)
• PCIe expansion slot for additional accelerators (GPU or FPGA)
[Figure: t.RECS block diagram. Microserver Client 1 (COM-HPC Client Type A or B), Microserver Client 2 (COM-HPC Client Type A, B or C) and Microserver Server 1 (COM-HPC Server Type D) around a PCIe Switch; link labels: x8 lanes, x16, To PCIe slot]
13. 13
t.RECS Reconfigurable Communication Infrastructure
a) Classical CPU-based clustering with PCIe host-to-host communication
b) CPU-centric approach with two accelerators connected via PCIe as endpoints
c) Ring topology using PCIe
d) Ring topology using Xilinx Aurora
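In the ring configurations (c and d) each node links directly only to its two neighbours, so the distance between two nodes depends on their positions in the ring. The sketch below is a generic illustration of that property, not a model of the actual t.RECS routing; the function name `ring_hops` and the node numbering are assumptions.

```python
def ring_hops(src, dst, nodes):
    """Minimum number of links between two nodes in a bidirectional ring.

    Nodes are numbered 0 .. nodes - 1 around the ring; traffic may travel
    in either direction, so the shorter way round is taken.
    """
    if nodes < 1:
        raise ValueError("a ring needs at least one node")
    distance = abs(src - dst) % nodes
    return min(distance, nodes - distance)


if __name__ == "__main__":
    # With three nodes every pair is directly connected.
    print(ring_hops(0, 2, nodes=3))  # 1
    # With more nodes, opposite positions need intermediate hops.
    print(ring_hops(0, 2, nodes=4))  # 2
```

With only three microserver slots a ring gives every pair a direct link, which is why it can replace the switch without adding intermediate hops.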
16. 16
u.RECS Architecture
• Two module slots
• 2 acc. slots
[Figure: u.RECS block diagram. Module slots: SMARC 2.1 (FPGA, x86, ARM) and Nvidia Jetson (Xavier NX). Accelerator slots: M.2 M-Key (Accelerator / Storage) and mPCIe (Accelerator / Communication). Interconnect: PCIe x4 and PCIe x1 links, GigE Switch, USB 3 Mux. Interfaces: GPIO, CSI, USB 2, USB 3, HDMI, 2x RJ45 with PoE, Single Pair Ethernet (SpE Phy), LoRa. Power: USB-C, Barrel Plug, power sensing. Management: BMC (ESP32). Further label: COM Brick]
• PCIe x4 Gen.3
• 1 Gbit Ethernet
• USB 3.0
• Battery powered
• Advanced power measurement
• Board management controller with WiFi, BLE and LoRa
17. 17
RECS Power Measurement
• Power measurement for all microservers with 1 Hz sampling rate, accessible via Grafana or web GUI
• Oscilloscope mode available with 1 kS/s sampling rate
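Power samples taken at a fixed rate can be integrated into energy, which is what energy-aware benchmarking ultimately compares. The following sketch assumes the samples have already been exported as a plain list of watt values; how they are retrieved from the web GUI or Grafana is not covered by the slides, and the function name `energy_joules` is illustrative.

```python
def energy_joules(power_watts, sample_rate_hz):
    """Integrate evenly spaced power samples (W) into energy (J).

    Uses the trapezoidal rule, so N samples cover a time span of
    (N - 1) / sample_rate_hz seconds.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    dt = 1.0 / sample_rate_hz
    total = 0.0
    for previous, current in zip(power_watts, power_watts[1:]):
        total += 0.5 * (previous + current) * dt
    return total


if __name__ == "__main__":
    # Four samples at 1 Hz (standard mode) span three seconds.
    print(energy_joules([10.0, 12.0, 11.0, 10.0], sample_rate_hz=1.0))  # 33.0
```

The same function works for oscilloscope-mode traces by passing `sample_rate_hz=1000.0`; the higher rate mainly matters for workloads whose power draw changes faster than once per second.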
19. 19
RECS|Box microserver & architectures
RECS|Box (Deneb, Durin)
• Jetson TX2: NVIDIA Tegra X2
• COM Express: Intel Core i7 8th Gen
• COM Express: ARM v8 Server SoC Hi1616
• COM Express: Intel Stratix 10
• Jetson nano: NVIDIA Xavier NX
• COM Express: Xilinx Zynq 7045
• Apalis: Exynos (2x ARM Cortex-A15)
• Apalis: Xilinx Zynq 7020
• COM Express: AMD Ryzen V1807B
• COM Express: AMD EPYC 3451
Legend: CPU, FPGA SoC, GPU SoC
20. 20
RECS|Box microserver & architectures
t.RECS
• Supports COM Express microserver via adaptor
• COM-HPC client size B to NVIDIA Xavier AGX
• COM-HPC client size B to NVIDIA Orin AGX
• COM-HPC client size B: Xilinx Zynq UltraScale+
• COM-HPC server size D: Intel Agilex
• COM-HPC client size A: Intel Core i7 11th Gen
• COM-HPC client size C: Intel Core i9 12th Gen
21. 21
RECS|Box microserver & architectures
uRECS
• Jetson nano: NVIDIA Xavier NX
• M.2 PCIe/USB: Intel Myriad X
• SMARC: Xilinx Zynq UltraScale+
• SMARC 2.1: NXP i.MX 8M (4x Cortex-A53)
• SMARC 2.1: Intel Atom
• Raspberry Pi Compute Module 4
• Xilinx Kria K26
• M.2 PCIe: Hailo-8
• M.2 PCIe: Google Coral TPU (Dual chip)
• SMARC 2.1: Coherent Logix HX40416
• RPi CM4: ARVSOM
Legend: CPU, FPGA SoC, GPU SoC, ML Accel.
22. 22
Summary
• VEDLIoT provides a scalable, modular and heterogeneous hardware platform for next-generation AIoT applications
• Wide variety of available microservers with industry-proven form factors
• The integrated, flexible and reconfigurable communication infrastructure enables tight coupling between microservers, resulting in high energy efficiency and performance
• Integrated management and monitoring enables comprehensive application benchmarking and characterization
23. 23
Thank you for your
attention.
Contact
René Griessl
Bielefeld University, Germany
rgriessl@cit-ec.uni-bielefeld.de
24. 24
VEDLIoT Deep Learning Platforms
Supported Computer-On-Module form factors
• Raspberry Pi Compute Module 4
• Jetson Xavier NX
• SMARC
• Xilinx Kria
• Jetson AGX Xavier
• COM Express (Type 6/7)
• COM-HPC Client (Type A-C)
• COM-HPC Server (Type D/E)
[Figure: comparison of the form factors by Size (higher distance is smaller), I/O Flexibility, Performance, Supported Architectures and Market Share; platform groups: uRECS, RECS|Box & t.RECS]
Editor's Notes
Energy-aware: benchmark everything for energy in addition to performance
Since we have all these FPGA accelerators
Much Module, so Wow
You can build the system you need!