The increasing demand for computing power in fields such as biology, finance, and machine learning is pushing the adoption of reconfigurable hardware in order to keep up with the required performance level at a sustainable power consumption. Within this context, FPGA devices represent an interesting solution, as they combine the benefits of power efficiency, performance and flexibility. Nevertheless, the steep learning curve and the experience needed to develop efficient FPGA-based systems represent one of the main limiting factors for a broad utilization of such devices.
In this talk, I will first present CAOS, a framework that helps the application designer identify acceleration opportunities and guides them through the implementation of the final FPGA-based system. The CAOS platform targets the full stack of the application optimization process, from the identification of the kernel functions to accelerate, through the optimization of those kernels, to the generation of the runtime management and the configuration files needed to program the FPGA. After CAOS, I will present the HUGenomics project, based on the CAOS framework. The unique genetic profile of a species is leading to the development of customized treatments, from personalized medicine to agrigenomics, but the exponential growth of available genomic data requires a computational effort that may limit the progress of these fields. The HUGenomics framework aims at facilitating the genome assembly process by means of both hardware-accelerated algorithms and scientific data visualization tools. Indeed, the system raises the level of abstraction, allowing users to easily integrate custom algorithms into the hardware pipeline without any knowledge of the underlying architecture.
1. Politecnico di Milano
WRC'2019: Workshop on Reconfigurable Computing
Valencia @ 21 Jan, 2019
Marco D. Santambrogio
<marco.santambrogio@polimi.it>
Politecnico di Milano
The NECSTLab Multi-Faceted Experience with AWS F1
Teaching, Research, Framework and Application stack
10. Heterogeneous Complex Systems
• Ryft ONE
– Big Data infrastructure based on an FPGA-accelerated architecture
– http://www.ryft.com/
• IBM Power8
– Introduces the Coherent Accelerator Processor Interface (CAPI) port, which is layered on top of PCI Express 3.0
– http://www-304.ibm.com/webapp/set2/sas/f/capi/home.html
• Microsoft Catapult
– Stratix V (Arria 10 FPGA)
– http://research.microsoft.com/en-us/projects/catapult/
• Amazon EC2 F1 Instances
– Xilinx UltraScale Plus FPGA
– https://aws.amazon.com/about-aws/whats-new/2017/04/amazon-ec2-f1-instances-customizable-fpgas-for-hardware-acceleration-are-now-generally-available/
• OpenPower Foundation
– http://openpowerfoundation.org/
29. CAOS Frontend – IR Generation
• Function extraction and generation of the application call graph
• Current implementation leverages Doxygen
[Figure: .c source files are turned into the application IR: a call graph over functions f1 to f7 plus function descriptions]
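As a rough illustration of what an application IR made of a call graph plus function descriptions could look like, here is a minimal Python sketch. It does not use Doxygen and is not the CAOS implementation: the `FunctionInfo` fields, the file names and the edges between f1 and f7 are all assumptions, since the slide only names the functions.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionInfo:
    """One function of the application IR (fields are illustrative)."""
    name: str
    source_file: str
    callees: list = field(default_factory=list)


def build_call_graph(functions):
    """Map every function name to the list of functions it calls."""
    return {f.name: list(f.callees) for f in functions}


def reachable(graph, root):
    """Return the set of functions reachable from root, root included."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


# Hypothetical edges and file names: the slide names f1..f7 only.
functions = [
    FunctionInfo("f1", "main.c", ["f2", "f3"]),
    FunctionInfo("f2", "main.c", ["f4", "f5"]),
    FunctionInfo("f3", "kernel.c", ["f6"]),
    FunctionInfo("f4", "util.c"),
    FunctionInfo("f5", "util.c"),
    FunctionInfo("f6", "kernel.c", ["f7"]),
    FunctionInfo("f7", "kernel.c"),
]
call_graph = build_call_graph(functions)
print(sorted(reachable(call_graph, "f3")))
```

Keeping the graph as a plain dictionary makes later steps, such as walking the subtree rooted at a candidate function, a simple traversal.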
30. CAOS Frontend – applicability check
• Verifies the applicability of an architectural template w.r.t.:
– Application
– System description
[Figure: the IR call graph (f1 to f7) is checked against architectural templates 1, 2 and 3, marking the HW candidate functions]
31. CAOS Frontend – profiling
• Runs the application against multiple user-defined datasets
• For each function collects:
– Self execution time
– Total execution time
– Function calls
[Figure: the IR call graph (f1 to f7) and the datasets produce the profiled IR, with per-function annotations such as "Total = 100%, Self = 2% - 4%, 7-9 calls"]
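The ranges in the figure ("Self = 2% - 4%", "7-9 calls") suggest that measurements from several dataset runs are combined per function. A minimal sketch of such a merge follows, assuming each run is a dictionary of per-function metrics. The data layout, the `merge_profiles` helper and the choice of f1 as the annotated function are hypothetical.

```python
def merge_profiles(runs):
    """Merge per-dataset profiles into (min, max) ranges per function and metric."""
    merged = {}
    for run in runs:
        for func, metrics in run.items():
            slot = merged.setdefault(func, {})
            for key, value in metrics.items():
                low, high = slot.get(key, (value, value))
                slot[key] = (min(low, value), max(high, value))
    return merged


# Two hypothetical dataset runs; the numbers mirror the ranges in the figure.
runs = [
    {"f1": {"self_pct": 2.0, "total_pct": 100.0, "calls": 7}},
    {"f1": {"self_pct": 4.0, "total_pct": 100.0, "calls": 9}},
]
profiled_ir = merge_profiles(runs)
print(profiled_ir["f1"])
```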
32. CAOS Frontend – HW/SW Partitioning
• Identifies the subtree to accelerate for each architectural template
• If needed, translates the identified code for subsequent optimizations (e.g. C to MaxJ)
[Figure: two copies of the IR call graph (f1 to f7), each annotated with Self = 10%, Self = 2% and Self = 20%]
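One simple way to picture subtree selection is to sum the self-time share of each candidate root and everything it calls, then keep the root with the largest total. The sketch below assumes exactly that criterion, which the slides do not state. The graph edges, the mapping of the 10%, 2% and 20% values to functions, and the candidate list are hypothetical.

```python
def subtree_share(graph, self_time, root):
    """Sum the self-time share of root and everything it calls (assumes a tree)."""
    return self_time.get(root, 0.0) + sum(
        subtree_share(graph, self_time, child) for child in graph.get(root, [])
    )


def pick_subtree(graph, self_time, candidates):
    """Return the candidate root whose subtree covers the largest time share."""
    return max(candidates, key=lambda root: subtree_share(graph, self_time, root))


# Hypothetical call tree and assignment of the self-time percentages.
graph = {"f1": ["f2", "f3"], "f3": ["f6"], "f6": ["f7"]}
self_time = {"f3": 10.0, "f6": 2.0, "f7": 20.0}
candidates = ["f3", "f6"]  # roots assumed to pass the applicability check

best = pick_subtree(graph, self_time, candidates)
print(best, subtree_share(graph, self_time, best))
```

A real partitioner would also have to weigh data transfer costs and resource limits, which this sketch ignores.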
43. Hints on the problem…
*
[*] Vipin, K. and Fahmy, S. A.: Architecture-aware reconfiguration-centric floorplanning for partial reconfiguration. In ARC, pages 13-25, 2012.
45. Objective function
• The cost function can be defined starting from the variables and parameters of the MILP model
• Implemented metrics:
– Global wirelength measured using HPWL
– Regions perimeter
– Wasted resources
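To make the three metrics concrete, the sketch below computes half-perimeter wirelength (HPWL) and total region perimeter for rectangular regions and combines them with a wasted-resource count. The weighted sum, the weight values and the data layout are assumptions for illustration, not the formulation of the MILP model itself.

```python
def hpwl(nets):
    """Half-perimeter wirelength: per net, bounding-box width plus height."""
    total = 0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def total_perimeter(regions):
    """Sum of rectangle perimeters; each region is (x, y, width, height)."""
    return sum(2 * (w + h) for _, _, w, h in regions)


def cost(wirelength, perimeter, wasted, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three metrics (the weights are hypothetical)."""
    q_wl, q_p, q_r = weights
    return q_wl * wirelength + q_p * perimeter + q_r * wasted


# Hypothetical nets (lists of pin coordinates) and regions.
nets = [[(0, 0), (3, 4)], [(1, 1), (2, 5), (4, 2)]]
regions = [(0, 0, 2, 3), (4, 0, 1, 1)]
wasted = 5  # e.g. resource units covered by a region but not used

print(hpwl(nets), total_perimeter(regions), cost(hpwl(nets), total_perimeter(regions), wasted))
```

In a MILP setting the max and min terms would be linearized with auxiliary variables; the plain Python version only shows what each metric measures.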
48. Hints on the problem…
• Optimal solution in 29s
• 34% reduction in wasted frames
– No DSP and CLB wasted by the Video Decoder RR
– No BRAM wasted by the Signal Decoder RR
• Approximately the same wirelength
*
49. Evaluations
[1, 2] Streaming Stencil Time-step (SST)
[3] Pearson Correlation Coefficient, Asian Option Pricing
[5] Protein Folding
[4] Smith Waterman and Vessels Segmentation
TABLE I. EXPERIMENTAL RESULTS (improvement wrt CPU [*])
Case Study        Board      Performance   Energy Efficiency
IV-A              Virtex 7   3.68x         11.8x
IV-A              Kintex     14.15x        45x
IV-B              Virtex 7   1.61x         15.29x
IV-C              Virtex 7   3.1x          2.2x
IV-D jacobi-2d    Virtex 7   1.09x         12.9x
IV-D heat-3d      Virtex 7   0.22x         2.46x
Labels beside the table: [1] [2] [3] [5] [4] [4]
[*] Intel Xeon E5 1410, 32 GB RAM
51. Some Application Domains for FPGA Acceleration
• Image and Video Processing
• Security
• Machine Learning
• Genomics
• Financial Analytics
• Big Data Analytics
54. Open Challenges
• It is necessary to keep up with the continuous development of biological research
55. Open Challenges
• Each individual's DNA provides a huge amount of data
56. Open Challenges
• To produce a tailor-made drug, for each DNA:
57. Personalized Medicine Today
• FPGA-based acceleration
– optimal performance/power consumption ratio
– reconfigurability
• Possibility to use pre-accelerated biological pipelines
• Available on-site or on the AWS cloud
58. HUGenomics
Advanced support for genomic research that, by means of reconfigurable hardware accelerators, is capable of delivering massive performance for fast-changing algorithms, letting researchers focus on delivering best-in-class results in the least amount of time
59. Rationale Behind HUG
No hardware expertise required
Possibility to handle massive amounts of data
Possibility to integrate custom code
Reduction in research time
60. Genome Assembly
• Smith-Waterman: 30h (Software) vs 2.5 mins (HUGenomics), up to 658x performance improvement
• Haplotype Caller - PairHMM: 10h (Software) vs 17s (HUGenomics), up to 2160x performance improvement
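A speedup is the software runtime divided by the accelerated runtime. The short sketch below applies that to the runtimes shown on the slide. Those runtimes are rounded, so the resulting ratios are of the same order as the quoted "up to" figures (658x and 2160x) rather than identical to them; the slide does not say which measurements the quoted figures come from.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline runtime to accelerated runtime."""
    return baseline_seconds / accelerated_seconds


# Runtimes as shown on the slide, converted to seconds.
smith_waterman = speedup(30 * 3600, 2.5 * 60)  # 30 h vs 2.5 min
pair_hmm = speedup(10 * 3600, 17)              # 10 h vs 17 s

print(f"Smith-Waterman: {smith_waterman:.0f}x")
print(f"PairHMM: {pair_hmm:.0f}x")
```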
63. Genomics HW Pipeline
[Flowchart. Decision: "Is the algorithm available on HUG?" (YES / NO). Stages: Fast Prototyping, Custom Hardware Algorithm, Pipeline Creation or Integration, Data Upload, Processing, Data Visualization]
65. Benefits of the AWS F1 Cloud Compute Platform
• Makes FPGA acceleration available to a large community of developers, and to millions of potential AWS users
• Provides dedicated and large amounts of FPGA logic, with elasticity to scale to multiple FPGAs
• Simplifies the development process by providing cloud-based FPGA development tools
• Provides a Marketplace for FPGA applications, giving millions of AWS users more choice and secure, easy access
82. What We Have
A worldwide class to share knowledge and preview game-changing technologies
83. What We Have
CAD for efficient HW/SW solutions for high performance FPGA-based systems
84. What We Have
An advanced support to genomic research by heterogeneous HW architectures
88. What We Have
heterogeneous HW architectures
preview game-changing technologies
CAD for high performance FPGA-based systems
89. What We Have
https://www.anandtech.com/show/12509/xilinx-announces-project-everest-fpga-soc-hybrid
92. Politecnico di Milano
Marco D. Santambrogio
<marco.santambrogio@polimi.it>
Politecnico di Milano
The NECSTLab Multi-Faceted Experience with AWS F1
Teaching, Research, Framework and Application stack