1. HLR
Core allocation and relocation
management for self dynamically
reconfigurable architectures
Reconfigurable Computing Italian Meeting
19 December 2008
Room S01, Politecnico di Milano - Milan (Italy)
Massimo Morandi: massimo.morandi@dresd.org
Marco Novati: marco.novati@dresd.org
3. Aims
Provide support for self partially and dynamically
reconfigurable systems:
Relocation support:
1D and 2D solutions
SW and HW solutions
Runtime Core Placement support with:
Low overhead
High versatility
Efficient use of resources
4. Reconfigurable architecture
A basic reconfigurable architecture consists of:
a Static area: a basic Harvard architecture
a Reconfigurable area: a device area composed of
several reconfigurable regions
5. Basic Definitions
Core: a specific representation of a functionality. It is
possible, for example, to have a core described in
VHDL, in C or in an intermediate representation (e.g. a
DFG)
IP-Core: a core described using a hardware description
language (HDL), combined with its communication
infrastructure (i.e. the bus interface)
Reconfigurable Functional Unit (RFU): an IP-Core that can be
plugged and/or unplugged at runtime in an already
working architecture
Reconfigurable Region (RR): a portion of the device area
used to implement a reconfigurable core
6. Relocation: The Problem
[Figure: the functionalities A (2/1), B (1/2), C (2/2), D (1/1), E (1/1), F (2/2) and, for each, its set of available RFU implementations over the reconfigurable regions RR1, RR2, RR3. Legend: Fi = Area/Time.]
7. Relocation: Motivation
[Figure: two area-over-time diagrams for the request sequence of demanded tasks A (2/1), B (1/2), C (2/2), D (1/1), E (1/1), F (2/2), with the reconfiguration (Rec.) of each task marked. Legend: Ti = Area/Time.]
8. Relocation: Rationale
Bitstream relocation is used to:
speed up the overall system execution
reduce the amount of memory used to store partial bitstreams
achieve preemptive core execution
assign bitstream placement at runtime
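The memory argument can be illustrated with a small count. This is a minimal sketch, assuming that every core may be placed in every reconfigurable region and that all regions are mutually compatible; the function name `stored_bitstreams` and the example values (six functionalities, three regions) are illustrative and not part of the original system.

```python
def stored_bitstreams(num_cores: int, num_regions: int, relocation: bool) -> int:
    """Number of partial bitstreams that must be kept in memory.

    Assumption: without relocation, one bitstream is stored per
    (core, region) pair; with relocation, a single bitstream per core
    is stored and retargeted to the chosen region at runtime.
    """
    if relocation:
        return num_cores
    return num_cores * num_regions


# Illustrative values: functionalities A..F and regions RR1..RR3.
without_relocation = stored_bitstreams(6, 3, relocation=False)
with_relocation = stored_bitstreams(6, 3, relocation=True)
print(without_relocation, with_relocation)
```

Under these assumptions the stored bitstreams shrink by a factor equal to the number of regions.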
9. Proposed Relocation Solutions
Architectural support for relocation:
Create an integrated HW/SW system to manage online
relocation (1D and 2D) in reconfigurable architectures
Create efficient bitstream relocation solutions suitable
for the target system:
1D (BiRF) – 2D (BiRF Square)
HW (BiRF, BiRF Square) – SW (BAnMaT Lite)
13. Relocation Solutions Results (1/2)
BiRF, BiRF Square, BAnMaT Lite
Support relocation in a self partially and
dynamically reconfigurable 1D or 2D system
The occupation ratio is relatively small
The operating frequency is more than acceptable
Internal memory requirements are reduced
Throughput:
BiRF: 6 MB/s
BiRF Square: 7.3 MB/s
BAnMaT Lite: 2.6 MB/s
14. Relocation Solutions Results (2/2)
The total configuration file size is about 1 MB
Considering an architecture with:
1/3 of the area as fixed part
2/3 as reconfigurable part with 6 slots
Under these hypotheses:
The size of a partial bitstream will be about 110 KB
The relocation time will be about:
18 ms with BiRF
15 ms with BiRF Square
42 ms with BAnMaT Lite
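The figures above follow from simple arithmetic, which the sketch below reproduces. It assumes decimal units (1 MB = 1000 KB) and that relocation time is simply bitstream size divided by the reported throughput; the function and constant names are illustrative.

```python
TOTAL_CONFIG_KB = 1000.0         # "about 1 MB", taken as 1000 KB (assumption)
RECONFIGURABLE_FRACTION = 2 / 3  # 2/3 of the area is reconfigurable
NUM_SLOTS = 6

# Throughputs reported in the results, in MB/s.
THROUGHPUT_MB_S = {"BiRF": 6.0, "BiRF Square": 7.3, "BAnMaT Lite": 2.6}


def partial_bitstream_kb(total_kb: float, fraction: float, slots: int) -> float:
    """Size of one slot's partial bitstream, assuming size scales with area."""
    return total_kb * fraction / slots


def relocation_time_ms(size_kb: float, throughput_mb_s: float) -> float:
    """KB divided by MB/s gives milliseconds when 1 MB = 1000 KB."""
    return size_kb / throughput_mb_s


size_kb = partial_bitstream_kb(TOTAL_CONFIG_KB, RECONFIGURABLE_FRACTION, NUM_SLOTS)
print(f"partial bitstream: {size_kb:.0f} KB")  # close to the 110 KB quoted above

for name, throughput in THROUGHPUT_MB_S.items():
    # Use the rounded 110 KB figure from the slide for the time estimate.
    print(f"{name}: {relocation_time_ms(110.0, throughput):.0f} ms")
```

With these assumptions the computed times round to the 18 ms, 15 ms and 42 ms listed above.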
16. Runtime Core Allocation Management
Choose where to place Cores to achieve:
Low Core Rejection Rate (CRR)
Fast application completion time
Small management overhead
Other policy-driven goals
Choose how to maintain information on empty space
Keep all information (Expensive but more accurate)
Heuristically prune information (Cheaper)
17. Evaluation and Proposed Approach
Choice driven by:
Need for a low-complexity solution to reduce overhead at runtime
Desire to keep high flexibility, to best suit user needs
We propose a heuristic (KNER-like) empty space manager that:
Supports both general and focused policies (in particular FF, BF, RA)
Is suitable for dynamic schedule and blind schedule
Exploits multiple RFUs per Core to improve quality
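The idea of a heuristic empty space manager with selectable policies can be sketched as follows. This is not the authors' algorithm: it is a minimal illustration that assumes KNER refers to keeping a list of non-overlapping empty rectangles, implements only first fit (FF) and best fit (BF), omits the routing-aware (RA) policy, and never merges freed space. All names (`Rect`, `choose`, `place`) and the example requests are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h


def choose(free_list: List[Rect], w: int, h: int, policy: str) -> Optional[Rect]:
    """Pick an empty rectangle that can host a w x h shape, or None."""
    candidates = [r for r in free_list if w <= r.w and h <= r.h]
    if not candidates:
        return None
    if policy == "FF":   # first fit: first rectangle in list order
        return candidates[0]
    if policy == "BF":   # best fit: smallest rectangle that still fits
        return min(candidates, key=lambda r: r.area)
    raise ValueError(f"unknown policy: {policy}")


def place(free_list: List[Rect], shapes: List[Tuple[int, int]],
          policy: str = "FF") -> Optional[Rect]:
    """Try each alternative shape (RFU) of a core; None means rejected."""
    for w, h in shapes:
        free = choose(free_list, w, h, policy)
        if free is None:
            continue
        free_list.remove(free)
        # Place at the bottom-left corner and split the leftover space
        # into at most two non-overlapping rectangles (right and top).
        if free.w > w:
            free_list.append(Rect(free.x + w, free.y, free.w - w, h))
        if free.h > h:
            free_list.append(Rect(free.x, free.y + h, free.w, free.h - h))
        return Rect(free.x, free.y, w, h)
    return None


# Hypothetical 10 x 10 reconfigurable area and three core requests;
# the second core offers two alternative shapes.
free_list = [Rect(0, 0, 10, 10)]
requests = [[(6, 6)], [(6, 6), (4, 6)], [(8, 8)]]
results = [place(free_list, shapes, policy="FF") for shapes in requests]
crr = sum(r is None for r in results) / len(results)  # rejected / requested
print(results, crr)
```

In this example the second core is placed only because of its alternative 4 x 6 shape, which shows how offering multiple RFUs per Core can lower the rejection rate. Each request costs one scan of the free list per shape.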
18. The Online Placement Algorithm
The whole processing of a Core
is completed in linear time
19. Experiment 1: Routing Aware
Comparison against literature solutions
Dynamic schedule scenario, RA placement policy
Measuring CRR, routing costs and overhead
Benchmark of 100 randomly generated Cores:
Size (5% to 20% of FPGA), randomly interconnected
20. Experiment 2: Application Completion Time
Benchmark applications composed of cores taken from
opencores.org, such as JPEG, AES, 3DES …
Blind schedule: measure the time instants needed to complete
the applications with different amounts of resources
The infinite-resources case is shown, to compare against the lower bound
21. Experiment 3: Multiple Shapes
Similar benchmark, but Cores have deadlines (for CRR)
Shapes defined with the heuristics described previously
Runtime is on average 30% higher with 3
shapes and 40% higher with 5 shapes w.r.t. 1 shape
CRR more than halved, often reduced to one third
22. Concluding Remarks
Original goals met:
Create efficient bitstream relocation solutions suitable
for target systems:
1D - 2D
HW – SW
Create a Core allocation manager with:
Low overhead
High efficiency (CRR, application completion time,
routing costs …)
High versatility