The LEGaTO project has received funding from the European Union's Horizon 2020 research and
innovation programme under the grant agreement No 780681
LEGaTO Integration
LEGaTO thematic session – HiPEAC CSW
Autumn 2020
Xavier Martorell
19/01/2020 2
• OmpSs integration with XiTAO
• OmpSs support for CUDA and OpenCL environments
• OmpSs with support for Xilinx FPGAs (integrated and discrete)
• OmpSs integration with DFiant
• OmpSs integration with Maxeler
• OmpSs integration with SGX
• Linter tool
• Eclipse plugins
• Conclusions and Future Work
Outline
• Targeting SMP and big.LITTLE environments
• Nanos6 and XiTAO runtimes coexisting to execute their own tasks
• Use taskset to separate core resources between the two models
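As a rough illustration of the core separation mentioned above, the following sketch splits a machine's cores into two disjoint ranges and renders each one as a CPU list of the form accepted by `taskset -c`. The helper name `split_cores` and the contiguous split are assumptions made for this example; the slides do not say how the cores are actually divided between Nanos6 and XiTAO.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split cores [0, total) into two contiguous,
// disjoint ranges and render each as a CPU list such as "0-3".
// first_share is the number of cores given to the first runtime.
std::pair<std::string, std::string> split_cores(int total, int first_share) {
    assert(total >= 2 && first_share >= 1 && first_share < total);
    auto range = [](int lo, int hi) {
        return lo == hi ? std::to_string(lo)
                        : std::to_string(lo) + "-" + std::to_string(hi);
    };
    return {range(0, first_share - 1), range(first_share, total - 1)};
}
```

For an 8-core machine split evenly, `split_cores(8, 4)` yields the lists `0-3` and `4-7`, one per programming model.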
OmpSs integration with XiTAO
[Figure: OmpSs@XiTAO toolchain. Labels: Application; Programmer splits the code; Nanos6 tasks; XiTAO tasks; GCC; OmpSs@XiTAO.elf (SMP); Nanos6; XiTAO; Main app]
• Source code including OmpSs and XiTAO tasks
OmpSs integration with XiTAO
OmpSs:
#pragma oss task for shared(A_omp, C_omp, B, openmp_work_size, N)
for(size_t i = 0; i < openmp_work_size; ++i) {
    for(size_t k = 0; k < N; ++k) {
        for(size_t j = 0; j < N; ++j) {
            C_omp[i * N + j] += A_omp[i * N + k] * B[k * N + j];
        }
    }
}

TAO:
/*! This TAO takes two matrices and multiplies them.
    It implements internal dynamic scheduling. */
class MatVecTAO : public AssemblyTask
{
public:
    //! Inherited pure virtual function that is called by the runtime upon executing the TAO.
    void execute(int threadid)
    {
        // int tid = threadid - leader;
        size_t li = i++;
        while(li < nrows){
            for (size_t j = 0; j < N; ++j) {
                for(size_t k = 0; k < N; ++k) {
                    C[li*N + j] += A[li*N + k] * B[k*N + j];
                }
            }
            li = i++;
        }
    }
    // (remaining members are not shown on the slide)
};
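The internal dynamic scheduling of the TAO can be sketched in standalone C++ as follows. This is only an approximation under stated assumptions: the row counter is taken to be a `std::atomic`, the worker threads are plain `std::thread` objects rather than XiTAO workers, and the function name `dynamic_matmul` is invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of a self-scheduling loop: workers claim rows through a shared
// atomic counter, so each row of C is computed by exactly one thread.
// dynamic_matmul and the explicit thread pool are illustrative assumptions,
// not XiTAO API.
void dynamic_matmul(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, std::size_t N, unsigned nthreads) {
    assert(A.size() == N * N && B.size() == N * N && C.size() == N * N);
    std::atomic<std::size_t> next_row{0};
    auto worker = [&]() {
        // fetch_add returns the previous value, like `li = i++` on the slide.
        for (std::size_t li = next_row.fetch_add(1); li < N;
             li = next_row.fetch_add(1)) {
            for (std::size_t j = 0; j < N; ++j)
                for (std::size_t k = 0; k < N; ++k)
                    C[li * N + j] += A[li * N + k] * B[k * N + j];
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

Because rows are handed out one at a time, a faster core simply claims more rows, which is the property that makes this style of scheduling attractive on asymmetric (big.LITTLE) systems.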
• AMD GPUs
• Intel/Altera FPGAs
• Using existing kernels
• “Implements” allows kernels to execute on the three architectures
OmpSs support for OpenCL and CUDA kernels
OmpSs@OpenCL running on AMD GPUs and FPGA Terasic DE5net_a7 / Attila / Stratix 10 boards
[Figure: OmpSs toolchain for OpenCL and CUDA kernels. Labels: OmpSs phase; OpenCL phase; CUDA phase; OmpSs Application; Mercurium; Code generation; Host code + Nanos calls; GCC; Object files; Nanos; Extrae Instr.; OpenCL (.cl) files; OpenCL compiler; CUDA (.cu) files; Nvidia CUDA compiler; OmpSs.elf (SMP); Acc code (Acc)]
• Allow the runtime system to execute the same functionality on diverse resources at the same time
o CPU: optimized implementation (MKL, OpenBLAS…)
o GPU: optimized kernel / cuBLAS
o FPGA: synthesized kernel (OpenCL, HLS, [Maxeler, DFiant], …)
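The idea of one logical task with several device-specific versions can be modelled with a small dispatcher. The sketch below is purely illustrative: `VersionedTask`, `add_version`, `pick` and `run` are hypothetical names, and the "first idle device wins" policy is an assumption; the slides do not describe how the Nanos runtime actually chooses a version.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative model only: one logical task with several registered
// versions, and a dispatcher that picks a version whose device is idle.
// These names are hypothetical and are not part of the Nanos API.
struct VersionedTask {
    std::map<std::string, std::function<int(int)>> versions;  // device -> impl

    void add_version(const std::string& device, std::function<int(int)> fn) {
        versions[device] = std::move(fn);
    }

    // Return the first idle device that has a version of this task,
    // or an empty string when none is available.
    std::string pick(const std::vector<std::string>& idle_devices) const {
        for (const auto& dev : idle_devices)
            if (versions.count(dev)) return dev;
        return "";
    }

    // Run the version registered for the given device.
    int run(const std::string& device, int x) const {
        auto it = versions.find(device);
        assert(it != versions.end());
        return it->second(x);
    }
};
```

All versions must compute the same result for the same input; only then can the runtime treat them as interchangeable and keep every resource busy.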
16/10/2020
“Implements” approach
#pragma omp target device(smp) copy_deps
#pragma omp task in([bsize*bsize]A, [bsize*bsize]B) inout([bsize*bsize]C)
void matrixMult(REAL *C, REAL *A, REAL *B, int wa, int bsize)
{
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, bsize, bsize, bsize,
                1.0f, A, bsize, B, bsize, 1.0f, C, bsize);
}
SMP
• Adding versions for the same task
o OpenCL
o CUDA
“Implements” approach
#pragma omp target device(opencl) ndrange(2,NB,NB,BL_SIZE,BL_SIZE) copy_deps implements(matrixMult)
#pragma omp task inout([NB*NB]C) in([NB*NB]A,[NB*NB]B)
__kernel void matrixMult_opencl(__global REAL* C,__global REAL* A, __global REAL* B,int wA, int wB);
#pragma omp target device(cuda) ndrange(2,NB,NB,BL_SIZE,BL_SIZE) copy_deps implements(matrixMult)
#pragma omp task inout([NB*NB]C) in([NB*NB]A,[NB*NB]B)
__global__ void matrixMult_cuda(REAL* C, REAL* A, REAL * B, int wA, int wB);
FPGA
GPGPU
• Matrix multiplication (blocked), using versions
“Implements” approach
void matmul( int m, int l, int n, int mDIM, int lDIM, int nDIM, REAL **tileA, REAL **tileB, REAL **tileC )
{
    int i, j, k;
    for (i = 0; i < mDIM; i++) {
        for (j = 0; j < nDIM; j++) {
            for (k = 0; k < lDIM; k++) {
                // Kernel call
                matrixMult(tileC[i*nDIM+j], tileA[i*lDIM+k], tileB[k*nDIM+j], NB, NB);
            }
        }
    }
    #pragma omp taskwait
}
SMP
FPGA
GPGPU
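To make the tile indexing of the blocked multiplication easier to check, here is a plain sequential C++ version with the same loop structure. It is a sketch under assumptions: tiles are stored as `std::vector<float>` instead of `REAL*`, the kernel is a simple triple loop named `tile_mult` rather than a BLAS, OpenCL or CUDA version, and there are no tasks or pragmas.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Tile = std::vector<float>;  // one bs x bs tile, row-major

// Plain CPU stand-in for the matrixMult kernel: C += A * B on one tile.
void tile_mult(Tile& C, const Tile& A, const Tile& B, std::size_t bs) {
    for (std::size_t i = 0; i < bs; ++i)
        for (std::size_t k = 0; k < bs; ++k)
            for (std::size_t j = 0; j < bs; ++j)
                C[i * bs + j] += A[i * bs + k] * B[k * bs + j];
}

// Same tile indexing as the slide: tileA holds mDIM x lDIM tiles,
// tileB holds lDIM x nDIM tiles, tileC holds mDIM x nDIM tiles.
void blocked_matmul(std::size_t mDIM, std::size_t lDIM, std::size_t nDIM,
                    std::size_t bs, const std::vector<Tile>& tileA,
                    const std::vector<Tile>& tileB, std::vector<Tile>& tileC) {
    assert(tileA.size() == mDIM * lDIM && tileB.size() == lDIM * nDIM &&
           tileC.size() == mDIM * nDIM);
    for (std::size_t i = 0; i < mDIM; ++i)
        for (std::size_t j = 0; j < nDIM; ++j)
            for (std::size_t k = 0; k < lDIM; ++k)
                tile_mult(tileC[i * nDIM + j], tileA[i * lDIM + k],
                          tileB[k * nDIM + j], bs);
}
```

In the task-based version, each `tile_mult` call becomes a task; calls that update the same `tileC[i*nDIM+j]` are serialized by the `inout` dependence, while calls on different output tiles can run concurrently on any device.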
• OmpSs with OpenCL on FPGA and GPU
“Implements” approach
Intel 4-core i7-7700 @3.6GHz with 2 ht/core
Nvidia GeForce GTX TITAN X
Intel Arria 10 de5net_a7
[Figure: Performance of Matrix Multiplication (2048x2048). Bar chart, vertical axis from 0 to 400; series: Gflop/s and Gflop/s max]
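For reference, a Gflop/s figure for a dense N x N matrix multiplication is conventionally derived from the operation count 2·N³ (one multiply and one add per inner iteration) divided by the elapsed time. The helper below is a minimal sketch of that arithmetic; the name `matmul_gflops` is made up for this example, and the slides do not state how their measurements were taken.

```cpp
#include <cassert>

// Gflop/s for an N x N x N matrix multiplication that took `seconds`,
// counting one multiply and one add per inner-loop iteration (2*N^3).
double matmul_gflops(double n, double seconds) {
    assert(n > 0 && seconds > 0);
    return 2.0 * n * n * n / seconds / 1e9;
}
```

For N = 2048 the multiplication performs 2·2048³ = 17,179,869,184 floating-point operations, so a run that completes in 0.1 s corresponds to roughly 172 Gflop/s.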
• Targeting Xilinx FPGAs
List of supported FPGAs:
o Zynq-7000, 32 bits (Xilinx ZC702, ZC706, Digilent Zedboard, Zybo)
o Zynq-U+, 64 bits (AXIOM board, Trenz board, Xilinx ZCU102)
o Ported to the COM Express board from LEGaTO
o Alpha-Data (discrete)
o Xilinx Alveo U200 (discrete)
o Similar implementation for Maxeler target (discrete)
OmpSs support for HLS kernels
OmpSs@FPGA Compilation environment: improvement of autoVivado
• Single source parallel programming
• FPGA and cores used at the same time
OmpSs@FPGA
“Implements” approach
#pragma omp target device(fpga) implements(matrix_multiply) num_instances(3)
#pragma omp task in(a,b) inout(c)
void matrix_multiply_fpga(float a[BS][BS], float b[BS][BS],
float c[BS][BS]);
#pragma omp target device(smp) copy_deps
#pragma omp task depend(in:a,b) depend (inout:c)
void matrix_multiply(float a[BS][BS], float b[BS][BS],
float c[BS][BS]);
SMP
FPGA
• implements() indicates equivalent functions in SMP and FPGA
• num_instances() generates the indicated number of IP accelerators
Evaluation
• Environment: Xilinx ZCU102
• 4 Cortex-A53 Arm cores
• 1 UltraScale+ FPGA
• Increasing data bus width
• 32 to 128 bits
• “Implements” allows cores to contribute to performance
• 130-150 Gflops sustained, depending on data bus width
…
for (i=0; i<NB; i++)
    for (j=0; j<NB; j++)
        for (k=0; k<NB; k++)
            matrix_multiply(A[i][k], B[k][j], C[i][j]);
…
• Targeting Xilinx FPGAs
• DFiant kernels managed as HLS kernels
Running DFiant kernels
[Figure: OmpSs@FPGA toolchain with DFiant kernels (AutoAIT tool). Labels: OmpSs phase; FPGA phase; Application; Mercurium; Code generation; Host code + Nanos calls; GCC; Nanos; FPGA dmalib; Extrae OMPT; Instrumentation; DFiant kernels; DFiant compiler; Vivado HLS; Hardware generation; Netlist; Task Manager; DMA engines; Interconnect; Vivado; OmpSs.elf (SMP); bitstream (FPGA); OS (Linux) + Platform Device Tree + FPGA DMA driver (BOOT.bin, image.ub)]
• Nanos runtime has been integrated with Maxeler using the SLiC interface
• Maxeler kernels compiled offline and invoked at runtime [similar to CUDA/OpenCL]
Running Maxeler kernels
[Figure: OmpSs@Maxeler toolchain. Labels: Application; Programmer splits the code; Source code with calls to Nanos++ runtime; GCC; Rewrite to MaxJ version; MaxJ Compiler; SLiC object file; Nanos; libmaxeleros; OmpSs.elf (SMP); bitstream (Maxeler); OS (Linux) + MaxelerOS driver]
• Nanos can invoke secure tasks through SGX
Integration with SGX tasks
[Figure: OmpSs@SGX toolchain. Labels: Application; Mercurium; Source code with calls to Nanos++ runtime; GCC; Programmer adapts tasks to SGX; Compile for SGX; Enclave kernels; Nanos; Enclave support; OmpSs.elf (SMP); OS (Linux) + iSGX driver]
• Profile support for OmpSs apps
• Collecting application memory accesses
• Based on the Pin tool
• Goals
• Identify issues with data annotations
• Generate a report with the problems found
OmpSs Linter
Improving debugging of OmpSs applications
[Figure: Linter architecture. Labels: Application; OmpSs-2 program; Debug information; Pin VM; Trace generator; Trace; Nanos; Instrument API; Events; Task primitives; Task state transitions; Trace processor; Linter]
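The kind of check the Linter performs, comparing the memory accesses collected at run time against the data annotations of each task, can be sketched as follows. The data model (`Region`, `Access`) and the function `check_task` are hypothetical and much simpler than a real Pin-based trace processor; they only illustrate the principle.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical data model: a declared dependency region and a traced access.
struct Region { std::uintptr_t begin, end; bool writable; };  // [begin, end)
struct Access { std::uintptr_t addr; bool is_write; };

// Return one message per access that the task's annotations do not cover:
// either the address lies outside every declared region, or it is a write
// to a region that was declared read-only (in).
std::vector<std::string> check_task(const std::vector<Region>& declared,
                                    const std::vector<Access>& trace) {
    std::vector<std::string> report;
    for (const auto& a : trace) {
        bool covered = false, write_ok = false;
        for (const auto& r : declared) {
            if (a.addr >= r.begin && a.addr < r.end) {
                covered = true;
                write_ok = write_ok || r.writable;
            }
        }
        if (!covered)
            report.push_back("undeclared access at " + std::to_string(a.addr));
        else if (a.is_write && !write_ok)
            report.push_back("write to in-only region at " + std::to_string(a.addr));
    }
    return report;
}
```

Accesses that escape the declared regions are exactly the ones the runtime cannot order correctly, which is why missing or wrong annotations can show up as data races.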
IDE Plugin
Support in Eclipse
• Support for OpenMP and OmpSs development in Eclipse
• Plugins developed
• Support for most of the programming models' directives and clauses
• Including small help descriptions
• Based on context, with autocompletion
• Integration of the compilation environment
• Eclipse Che
Eclipse Che
Compilation environment
Conclusions and Future Work
• LEGaTO integrations
• Tools to support GPUs, and HLS and OpenCL FPGAs
o For task offloading and application acceleration
• Task scheduling with OmpSs and XiTAO
• Prototypes supporting DFiant, Maxeler, and secure tasks
• Linter support for detecting data races
• Eclipse plug-in for programming assistance
• Now preparing the code distribution with all these contributions
• Further experimentation with the LEGaTO benchmarks

Mais conteúdo relacionado

Mais procurados

An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
 An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
An Open Discussion of RISC-V BitManip, trends, and comparisons _ CuffRISC-V International
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!Linaro
 
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...Shinya Takamaeda-Y
 
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceAdding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceSamsung Open Source Group
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...Linaro
 
Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Linaro
 
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire
 An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire
An Open Discussion of RISC-V BitManip, trends, and comparisons _ ClaireRISC-V International
 
LAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leaderLAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leaderLinaro
 
IoTivity for Automotive IoT Interoperability
IoTivity for Automotive IoT InteroperabilityIoTivity for Automotive IoT Interoperability
IoTivity for Automotive IoT InteroperabilitySamsung Open Source Group
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPThomas Graf
 
PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...
PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...
PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...Shinya Takamaeda-Y
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...Samsung Open Source Group
 
A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...
A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...
A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...Shinya Takamaeda-Y
 
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPCRISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPCGanesan Narayanasamy
 

Mais procurados (20)

An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
 An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
 
RISC-V 30908 patra
RISC-V 30908 patraRISC-V 30908 patra
RISC-V 30908 patra
 
Hands on OpenCL
Hands on OpenCLHands on OpenCL
Hands on OpenCL
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
 
Run Your Own 6LoWPAN Based IoT Network
Run Your Own 6LoWPAN Based IoT NetworkRun Your Own 6LoWPAN Based IoT Network
Run Your Own 6LoWPAN Based IoT Network
 
64-bit Android
64-bit Android64-bit Android
64-bit Android
 
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceAdding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
 
Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509
 
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire
 An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Claire
 
LAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leaderLAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leader
 
IoTivity for Automotive IoT Interoperability
IoTivity for Automotive IoT InteroperabilityIoTivity for Automotive IoT Interoperability
IoTivity for Automotive IoT Interoperability
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
 
Onnc intro
Onnc introOnnc intro
Onnc intro
 
PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...
PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...
PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern F...
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...
 
A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...
A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...
A High Performance Heterogeneous FPGA-based Accelerator with PyCoRAM (Runner ...
 
Introduction to IoT.JS
Introduction to IoT.JSIntroduction to IoT.JS
Introduction to IoT.JS
 
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPCRISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
 

Semelhante a LEGaTO Integration

DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceLEGATO project
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...
[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...
[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...OpenStack Korea Community
 
Devoxx 2015 - Building the Internet of Things with Eclipse IoT
Devoxx 2015 - Building the Internet of Things with Eclipse IoTDevoxx 2015 - Building the Internet of Things with Eclipse IoT
Devoxx 2015 - Building the Internet of Things with Eclipse IoTBenjamin Cabé
 
Introduction to FreeRTOS
Introduction to FreeRTOSIntroduction to FreeRTOS
Introduction to FreeRTOSICS
 
Challenges in GPU compilers
Challenges in GPU compilersChallenges in GPU compilers
Challenges in GPU compilersAnastasiaStulova
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGATO project
 
Tizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
Tizen RT: A Lightweight RTOS Platform for Low-End IoT DevicesTizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
Tizen RT: A Lightweight RTOS Platform for Low-End IoT DevicesSamsung Open Source Group
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAFacultad de Informática UCM
 
20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Jorisimec.archive
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCinside-BigData.com
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...LEGATO project
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGATO project
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)Kirill Tsym
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingKernel TLV
 

Semelhante a LEGaTO Integration (20)

DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...
[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...
[OpenStack Day in Korea 2015] Track 1-6 - 갈라파고스의 이구아나, 인프라에 오픈소스를 올리다. 그래서 보이...
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
Devoxx 2015 - Building the Internet of Things with Eclipse IoT
Devoxx 2015 - Building the Internet of Things with Eclipse IoTDevoxx 2015 - Building the Internet of Things with Eclipse IoT
Devoxx 2015 - Building the Internet of Things with Eclipse IoT
 
Introduction to FreeRTOS
Introduction to FreeRTOSIntroduction to FreeRTOS
Introduction to FreeRTOS
 
Challenges in GPU compilers
Challenges in GPU compilersChallenges in GPU compilers
Challenges in GPU compilers
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
Tizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
Tizen RT: A Lightweight RTOS Platform for Low-End IoT DevicesTizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
Tizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
 
RISC V in Spacer
RISC V in SpacerRISC V in Spacer
RISC V in Spacer
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
CFD on Power
CFD on Power CFD on Power
CFD on Power
 
20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCC
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 

Mais de LEGATO project

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitLEGATO project
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemLEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsLEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworkLEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edgeLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGATO project
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGATO project
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneLEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingLEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edgeLEGATO project
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyLEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingLEGATO project
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXLEGATO project
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataLEGATO project
 

Mais de LEGATO project (20)

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO Integration

  • 1. LEGaTO Integration
      LEGaTO thematic session, HiPEAC CSW Autumn 2020
      Xavier Martorell
      The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under the grant agreement No 780681
  • 2. Outline
      • OmpSs integration with XiTAO
      • OmpSs support for CUDA and OpenCL environments
      • OmpSs with support for Xilinx FPGAs (integrated and discrete)
      • OmpSs integration with DFiant
      • OmpSs integration with Maxeler
      • OmpSs integration with SGX
      • Linter tool
      • Eclipse plugins
      • Conclusions and Future Work
  • 3. OmpSs integration with XiTAO
      • Targeting SMP and big-LITTLE environments
      • Nanos6 and XiTAO runtimes coexisting to execute their own tasks
      • Use taskset to separate core resources between the two models
    [Diagram labels: OmpSs@XiTAO application; programmer splits the code; Nanos6 tasks; XiTAO tasks; GCC; OmpSs@XiTAO.elf; SMP; Nanos6; XiTAO; main app]
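The slide only says that `taskset` is used to keep the two runtimes on separate cores. As a rough illustration of what such a split looks like, the sketch below builds the kind of hexadecimal core mask that `taskset` accepts. The helper name `core_mask` and the 4+4 core split are assumptions made for the example, not something taken from the LEGaTO code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a mask with bits [lo, lo + count) set,
 * one bit per core id, in the form taskset accepts as a hex mask. */
static uint64_t core_mask(unsigned lo, unsigned count)
{
    assert(lo + count <= 64);
    if (count == 0)
        return 0;
    /* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
    uint64_t bits = (count == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << count) - 1);
    return bits << lo;
}
```

On a hypothetical 8-core machine, `core_mask(0, 4)` and `core_mask(4, 4)` give two disjoint masks, one per runtime, so neither set of worker threads is placed on the other's cores.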
  • 4. OmpSs integration with XiTAO
      • Source code including OmpSs and XiTAO tasks
    OmpSs:
        #pragma oss task for shared(A_omp, C_omp, B, openmp_work_size, N)
        for(size_t i = 0; i < openmp_work_size; ++i) {
          for(size_t k = 0; k < N; ++k) {
            for(size_t j = 0; j < N; ++j) {
              C_omp[i * N + j] += A_omp[i * N + k] * B[k * N + j];
            }
          }
        }
    TAO:
        /*! This TAO will take two matrices and multiply them.
            This TAO implements internal dynamic scheduling. */
        class MatVecTAO : public AssemblyTask
        {
          public:
            //! Inherited pure virtual function that is called by the runtime upon executing the TAO.
            void execute(int threadid)
            {
              // int tid = threadid - leader;
              size_t li = i++;
              while(li < nrows){
                for (size_t j = 0; j < N; ++j) {
                  for(size_t k = 0; k < N; ++k) {
                    C[li*N + j] += A[li*N + k] * B[k*N + j];
                  }
                }
                li = i++;
              }
            }
        // (the rest of the class is not shown on the slide)
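The "internal dynamic scheduling" in the TAO comes from the shared counter `i`: each thread that enters `execute` claims the next unprocessed row with `i++` and stops once the counter passes `nrows`. The slide does not show how `i` is declared. The sketch below assumes it is an atomic counter and restates the same pattern in plain C11; the `matmul_work` struct and `matmul_execute` function are names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work descriptor: C (nrows x n) += A (nrows x n) * B (n x n). */
typedef struct {
    atomic_size_t next_row;  /* next row index not yet claimed by a thread */
    size_t nrows, n;
    const double *A, *B;
    double *C;
} matmul_work;

/* Body every worker thread would run: claim rows until none are left.
 * atomic_fetch_add returns the old value, like the post-increment i++. */
static void matmul_execute(matmul_work *w)
{
    assert(w->A && w->B && w->C);
    size_t li = atomic_fetch_add(&w->next_row, 1);
    while (li < w->nrows) {
        for (size_t j = 0; j < w->n; ++j)
            for (size_t k = 0; k < w->n; ++k)
                w->C[li * w->n + j] += w->A[li * w->n + k] * w->B[k * w->n + j];
        li = atomic_fetch_add(&w->next_row, 1);
    }
}
```

Because each row index is handed out exactly once, several threads can call `matmul_execute` on the same descriptor without writing to the same row of `C`, and a thread that arrives late simply finds no work left.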
  • 5. OmpSs support for OpenCL and CUDA kernels
      • AMD GPUs
      • Intel/Altera FPGAs
      • Using existing kernels
      • “Implements” allows kernels to be executed on the three architectures
    OmpSs@OpenCL running on AMD GPUs and FPGA Terasic DE5net_a7 / Attila / Stratix 10 boards
    [Toolchain diagram labels: OmpSs application; Mercurium; OmpSs phase (code generation; host code + Nanos calls); OpenCL phase (OpenCL (.cl) files; OpenCL compiler); CUDA phase (CUDA (.cu) files; Nvidia CUDA compiler); GCC; object files; Nanos; Extrae instrumentation; OmpSs.elf with accelerator code; SMP; accelerator]
  • 6. “Implements” approach (slide dated 16/10/2020)
      • Allow the runtime system to execute the same functionality on diverse resources at the same time
          o CPU: optimized implementation (MKL, OpenBLAS…)
          o GPU: optimized kernel / cuBLAS
          o FPGA: synthesized kernel (OpenCL, HLS, [Maxeler, DFiant],…)
    SMP:
        #pragma omp target device(smp) copy_deps
        #pragma omp task in([bsize*bsize]A, [bsize*bsize]B) inout([bsize*bsize]C)
        void matrixMult(REAL *C, REAL *A, REAL * B, int wa, int bsize)
        {
          cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, bsize, bsize, bsize, 1.0f, A, bsize, B, bsize, 1.0f, C, bsize);
        }
  • 7. “Implements” approach
      • Adding versions for the same task
          o OpenCL
          o CUDA
    FPGA (OpenCL version):
        #pragma omp target device(opencl) ndrange(2,NB,NB,BL_SIZE,BL_SIZE) copy_deps implements(matrixMult)
        #pragma omp task inout([NB*NB]C) in([NB*NB]A,[NB*NB]B)
        __kernel void matrixMult_opencl(__global REAL* C,__global REAL* A, __global REAL* B,int wA, int wB);
    GPGPU (CUDA version):
        #pragma omp target device(cuda) ndrange(2,NB,NB,BL_SIZE,BL_SIZE) copy_deps implements(matrixMult)
        #pragma omp task inout([NB*NB]C) in([NB*NB]A,[NB*NB]B)
        __global__ void matrixMult_cuda(REAL* C, REAL* A, REAL * B, int wA, int wB);
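Conceptually, `implements(matrixMult)` registers an alternative version of the same task, and the runtime decides per task instance which version to run. The slides do not describe how Nanos makes that decision. The sketch below is only a toy model that assumes a "first idle device wins" policy; the `task_version` struct, `pick_version`, and `matmul_smp` are all hypothetical names for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*matmul_fn)(float *C, const float *A, const float *B, int bsize);

/* Hypothetical record for one registered version of a task. */
typedef struct {
    const char *device;  /* e.g. "smp", "opencl", "cuda" */
    matmul_fn   run;     /* implementation for that device */
    int         busy;    /* nonzero while the device is executing a task */
} task_version;

/* Assumed policy: return the first version whose device is idle,
 * or NULL if every device is busy. */
static task_version *pick_version(task_version *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (!v[i].busy)
            return &v[i];
    return NULL;
}

/* Reference SMP version: C += A * B on bsize x bsize row-major tiles. */
static void matmul_smp(float *C, const float *A, const float *B, int bsize)
{
    for (int i = 0; i < bsize; ++i)
        for (int k = 0; k < bsize; ++k)
            for (int j = 0; j < bsize; ++j)
                C[i * bsize + j] += A[i * bsize + k] * B[k * bsize + j];
}
```

The point of the model is that the call site never names a device: it asks for "a version of the task" and whichever resource is free does the work, which is how the cores and the accelerators can be kept busy at the same time.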
  • 8. “Implements” approach
      • Matrix multiplication (blocked), using versions
    SMP / FPGA / GPGPU:
        void matmul( int m, int l, int n, int mDIM, int lDIM, int nDIM, REAL **tileA, REAL **tileB, REAL **tileC )
        {
          int i, j, k;
          for(i = 0;i < mDIM; i++){
            for (j = 0; j < nDIM; j++){
              for (k = 0; k < lDIM; k++){
                //Kernel call
                matrixMult(tileC[i*nDIM+j], tileA[i*lDIM+k], tileB[k*nDIM+j],NB,NB);
              }
            }
          }
          #pragma omp taskwait
        }
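The tile indexing in `matmul` is easier to follow with a sequential stand-in for the task. In the sketch below the tiles are stored as row-major arrays of pointers, as on the slide, but `matrixMult` is a plain C function with a simplified three-argument signature, and the tile edge `NB` is set to 2 purely for illustration. Both choices are assumptions for the example.

```c
#include <assert.h>

#define NB 2  /* tile edge, kept tiny for illustration (assumption) */

/* Stand-in for the matrixMult task: C += A * B on NB x NB row-major tiles. */
static void matrixMult(float *C, const float *A, const float *B)
{
    for (int i = 0; i < NB; ++i)
        for (int k = 0; k < NB; ++k)
            for (int j = 0; j < NB; ++j)
                C[i * NB + j] += A[i * NB + k] * B[k * NB + j];
}

/* Same loop nest as the slide: tileA holds mDIM x lDIM tiles, tileB holds
 * lDIM x nDIM tiles, tileC holds mDIM x nDIM tiles, all in row-major order. */
static void matmul_tiles(int mDIM, int lDIM, int nDIM,
                         float **tileA, float **tileB, float **tileC)
{
    assert(mDIM > 0 && lDIM > 0 && nDIM > 0);
    for (int i = 0; i < mDIM; i++)
        for (int j = 0; j < nDIM; j++)
            for (int k = 0; k < lDIM; k++)
                matrixMult(tileC[i * nDIM + j], tileA[i * lDIM + k],
                           tileB[k * nDIM + j]);
}
```

For a fixed output tile `(i, j)`, the `k` iterations all accumulate into the same `tileC` entry. In the OmpSs version this is what the `inout` annotation on `C` expresses, so those calls are ordered one after another while calls for different output tiles are free to run in parallel.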
  • 9. “Implements” approach
      • OmpSs with OpenCL on FPGA and GPU
    [Bar chart: Performance of Matrix Multiplication (2048x2048); series: Gflop/s and Gflop/s max; axis from 0 to 400; platforms: Intel 4-core i7-7700 @3.6GHz with 2 ht/core, Nvidia GeForce GTX TITAN X, Intel Arria 10 de5net_a7]
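The slides do not state how the Gflop/s figures were computed. A common convention for dense matrix multiplication counts 2·N³ floating-point operations (one multiply and one add per inner iteration), and the sketch below assumes that convention. The function name `matmul_gflops` is made up for the example.

```c
#include <assert.h>

/* Gflop/s for an n x n dense matrix multiplication that took `seconds`,
 * assuming the usual 2 * n^3 operation count. */
static double matmul_gflops(double n, double seconds)
{
    assert(n > 0.0 && seconds > 0.0);
    return (2.0 * n * n * n) / seconds / 1e9;
}
```

Under this convention the 2048x2048 case amounts to about 17.2 Gflop of work, so a run that completes in one second would be reported as roughly 17.2 Gflop/s.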
  • 10. OmpSs support for HLS kernels (OmpSs@FPGA)
      • Targeting Xilinx FPGAs
      • List of supported FPGAs:
          o Zynq-7000, 32 bits (Xilinx ZC702, ZC706, Digilent Zedboard, Zybo)
          o Zynq-U+, 64 bits (AXIOM board, Trenz board, Xilinx ZCU102)
          o Ported to the COM Express board from LEGaTO
          o Alpha-Data (discrete)
          o Xilinx Alveo U200 (discrete)
          o Similar implementation for Maxeler target (discrete)
      • Compilation environment: improvement of autoVivado
  • 11. OmpSs@FPGA “Implements” approach
      • Single source parallel programming
      • FPGA and cores used at the same time
    FPGA:
        #pragma omp target device(fpga) implements(matrix_multiply) num_instances(3)
        #pragma omp task in(a,b) inout(c)
        void matrix_multiply_fpga(float a[BS][BS], float b[BS][BS], float c[BS][BS]);
    SMP:
        #pragma omp target device(smp) copy_deps
        #pragma omp task depend(in:a,b) depend (inout:c)
        void matrix_multiply(float a[BS][BS], float b[BS][BS], float c[BS][BS]);
      • implements() indicates equivalent functions in SMP and FPGA
      • num_instances() sets the number of IP accelerators to generate
  • 12. Evaluation
      • Environment: Xilinx ZCU102
          o 4 Cortex-A53 Arm cores
          o 1 UltraScale+ FPGA
      • Increasing data bus width
          o 32 to 128 bits
      • “Implements” allows the cores to contribute to performance
      • 130-150 Gflops sustained depending on data bus width
        …
        for (i=0; i<NB; i++)
          for (j=0; j<NB; j++)
            for (k=0; k<NB; k++)
              matrix_multiply(A[i][k], B[k][j], C[i][j]);
        …
  • 13. Running DFiant kernels (OmpSs@FPGA)
      • Targeting Xilinx FPGAs
      • DFiant kernels managed as HLS kernels
    [Toolchain diagram labels: OmpSs@FPGA application; AutoAIT tool; Mercurium; OmpSs phase (code generation; host code + Nanos calls; instrumentation); FPGA phase (Vivado HLS; hardware generation; netlist; Task Manager; DMA engines; interconnect; Vivado); DFiant kernels; DFiant compiler; GCC; Nanos; FPGA dmalib; Extrae; OMPT; OmpSs.elf; bitstream; SMP; FPGA; OS (Linux) + platform device tree + FPGA DMA driver (BOOT.bin, image.ub)]
  • 14. Running Maxeler kernels
      • Nanos runtime has been integrated with Maxeler using the SLiC interface
      • Maxeler kernels compiled offline and invoked at runtime (similar to CUDA/OpenCL)
    [Toolchain diagram labels: OmpSs@Maxeler application; programmer splits the code; source code with calls to Nanos++ runtime; rewrite to MaxJ version; GCC; MaxJ compiler; SLiC object file; Nanos; libmaxeleros; OmpSs.elf; bitstream; SMP; Maxeler; OS (Linux) + MaxelerOS driver]
  • 15. Integration with SGX tasks
      • Nanos can invoke secure tasks through SGX
    [Toolchain diagram labels: OmpSs@SGX application; programmer adapts tasks to SGX; Mercurium; source code with calls to Nanos++ runtime; GCC; compile for SGX; enclave kernels; Nanos; enclave support; OmpSs.elf; SMP; OS (Linux) + iSGX driver]
  • 16. OmpSs Linter: improving debugging of OmpSs applications
      • Profile support for OmpSs apps
          o Collecting application memory accesses
          o Based on the Pin tool
      • Goals
          o Identify issues with data annotations
          o Generate a report with the problems found
    [Architecture diagram labels: application (OmpSs-2 program, debug information); Nanos (task primitives, instrument API); Pin VM; Linter (trace generator, trace, trace processor); events; task state transitions]
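One way to picture "issues with data annotations" is a task that touches memory it did not declare in its `in`/`out`/`inout` clauses. The sketch below is not the Linter's implementation (the real tool processes traces collected with Pin). It is a minimal model, with hypothetical types `dep_region` and `mem_access`, of the check such a tool could apply to each recorded access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEP_IN, DEP_OUT, DEP_INOUT } dep_kind;

/* Hypothetical record of one declared dependency region of a task. */
typedef struct { uintptr_t start; size_t len; dep_kind kind; } dep_region;

/* Hypothetical record of one memory access observed while the task ran. */
typedef struct { uintptr_t addr; size_t len; int is_write; } mem_access;

/* Returns 1 if the access lies entirely inside a declared region whose
 * kind permits it: reads need in/inout, writes need out/inout. */
static int access_is_declared(const dep_region *deps, size_t ndeps, mem_access a)
{
    assert(deps != NULL || ndeps == 0);
    for (size_t i = 0; i < ndeps; ++i) {
        const dep_region *d = &deps[i];
        if (a.addr < d->start)
            continue;
        size_t off = (size_t)(a.addr - d->start);
        int inside = off <= d->len && a.len <= d->len - off;
        int allowed = a.is_write ? (d->kind != DEP_IN) : (d->kind != DEP_OUT);
        if (inside && allowed)
            return 1;
    }
    return 0;
}
```

An access for which this returns 0 (for example a write into a region declared only as `in`, or one that runs past the end of a declared block) is the kind of problem a report could list next to the task that caused it.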
  • 17. IDE Plugin Support in Eclipse
      • Support for OpenMP and OmpSs development in Eclipse
      • Plugins developed
          o Support for most of the programming models' directives and clauses
          o Including small help descriptions
          o Based on context, with autocompletion
      • Integration of the compilation environment
      • Eclipse Che
  • 19. Conclusions and Future Work
      • LEGaTO integrations
          o Tools to support GPUs, and HLS and OpenCL FPGAs, for task offloading and application acceleration
          o Task scheduling with OmpSs and XiTAO
          o Prototypes supporting DFiant, Maxeler, and secure tasks
          o Linter support for detecting data races
          o Eclipse plug-in for programming assistance
      • Now preparing the code distribution with all these contributions
      • Further experimentation with the LEGaTO benchmarks