LLVM Register
Allocation
Kai
kai@skymizer.com
Outline
• Introduction to Register Allocation Problem
• LLVM Base Register Allocation Interface
• LLVM Basic Register Allocation
• LLVM Greedy Register Allocation
Introduction to Register
Allocation
• Definition
• Register allocation is the problem of mapping
program variables to either machine registers or
memory addresses.
• Best solution
• minimise the number of loads/stores from/to memory
• NP-complete
int main()
{
    int i, j;
    int answer;
    for (i = 1; i < 10; i++)
        for (j = 1; j < 10; j++) {
            answer = i * j;
        }
    return 0;
}
_main:
@ BB#0:
sub sp, #16
movs r0, #0
str r0, [sp, #12]
movs r0, #1
str r0, [sp, #8]
b LBB0_2
LBB0_1:
adds r1, #1
str r1, [sp, #8]
LBB0_2:
ldr r1, [sp, #8]
cmp r1, #9
bgt LBB0_6
@ BB#3:
str r0, [sp, #4]
b LBB0_5
LBB0_4:
ldr r2, [sp, #4]
muls r1, r2, r1
str r1, [sp]
ldr r1, [sp, #4]
adds r1, #1
Graph Coloring
• For an arbitrary graph G; a coloring of G assigns a
color to each node in G so that no pair of adjacent
nodes have the same color.
2-colorable 3-colorable
Graph Coloring for RA
• Node: Live interval
• Edge: Two live intervals have interference
• Color: Physical register
• Find a feasible colouring for the graph
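The mapping above (node = live interval, edge = interference, color = physical register) can be sketched as a simple greedy coloring. This is only an illustration: `greedyColor` and its parameters are hypothetical names, the visiting order is naive, and a node left at -1 merely marks a spill candidate rather than showing how any real allocator handles spills.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: greedily color an interference graph.
// adj[i] lists the live intervals that interfere with live interval i.
// Returns one color (physical register index) per node, or -1 when no
// register is free for that node (a spill candidate).
std::vector<int> greedyColor(const std::vector<std::vector<int>> &adj,
                             int numRegs) {
  std::vector<int> color(adj.size(), -1);
  for (std::size_t n = 0; n < adj.size(); ++n) {
    // Collect the colors already taken by colored neighbours.
    std::vector<bool> used(numRegs, false);
    for (int m : adj[n])
      if (color[m] >= 0)
        used[color[m]] = true;
    // Pick the first free register, if any.
    for (int r = 0; r < numRegs; ++r)
      if (!used[r]) {
        color[n] = r;
        break;
      }
  }
  return color;
}
```

A triangle of mutually interfering intervals needs three registers; with only two, the last interval stays uncolored.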
…
a0 = …
b0 = …
… = b0
d0 = …
c0 = …
…
d1 = c0
… = a0
… = d1
B0
B1 B2
B3
…
LIa = …
LIb = …
… = LIb
LIc = …
…
LId = LIc
… = LIa
… = LId
B0
B1 B2
B3
LRa
LRb LRc
LRd
An Example from “Engineering A Compiler”
Why Not Graph Coloring
• Interference graph is expensive to build
• Spill code placement is more important than
colouring
• Need to model aliases and overlapping register
classes
• Flexibility is more important than the coloring
algorithm
(Adapted from “Register Allocation in LLVM 3.0”)
Excerpt from tricore_llvm.pdf
SSA Properties
* Each denition in the procedure creates a unique name.
* Each use refers to a single denition.
LLVM Register Allocation
• Basic
• Provide a minimal implementation of the basic register allocator
• Greedy
• Global live range splitting.
• Fast
• This register allocator allocates registers one basic block at a
time.
• PBQP
• Partitioned Boolean Quadratic Programming (PBQP) based
register allocator for LLVM
LLVM Base Register Allocation Interface
Calculate
LiveInterval Weight
Enqueue All
LiveInterval
selectOrSplit for One
LiveInterval
Assign the Physical
Register
Enqueue Split
LiveInterval
dequeue
physical register is available
split live interval
update LiveInterval.weight
(spill cost)
allocatePhysRegs
enqueue
seedLiveRegs
Q
customised by new RA algorithm
for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
unsigned Reg = TargetRegisterInfo::index2VirtReg(i);
if (MRI->reg_nodbg_empty(Reg))
continue;
enqueue(&LIS->getInterval(Reg));
}
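The flow in the diagram (seed the queue, dequeue one live interval, call selectOrSplit, assign or re-enqueue split products) can be sketched as a toy driver loop. Everything here is an assumption made for illustration: `ToyInterval`, `SelectFn`, and this `allocatePhysRegs` are stand-ins, not LLVM's classes, and a return value of 0 is taken to mean "no register assigned".

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for a live interval; not LLVM's LiveInterval.
struct ToyInterval {
  unsigned VirtReg;
  float Weight;
};

// Allocator-specific hook: returns a physical register, or 0 when none was
// assigned. Split products are appended to NewRegs.
using SelectFn =
    std::function<unsigned(const ToyInterval &, std::vector<ToyInterval> &)>;

// Sketch of the driver: seed the queue, then process one interval at a time.
std::vector<std::pair<unsigned, unsigned>>
allocatePhysRegs(const std::vector<ToyInterval> &Seed,
                 const SelectFn &selectOrSplit) {
  std::queue<ToyInterval> Q;
  for (const ToyInterval &LI : Seed)
    Q.push(LI);

  std::vector<std::pair<unsigned, unsigned>> Assignment;
  while (!Q.empty()) {
    ToyInterval LI = Q.front();
    Q.pop();
    std::vector<ToyInterval> NewRegs;
    unsigned PhysReg = selectOrSplit(LI, NewRegs);
    if (PhysReg)
      Assignment.emplace_back(LI.VirtReg, PhysReg);
    // Intervals created by splitting go back into the queue.
    for (const ToyInterval &Split : NewRegs)
      Q.push(Split);
  }
  return Assignment;
}
```

A new allocation algorithm would mainly customise the queue discipline and the `selectOrSplit` hook, which is the part the diagram marks as customisable.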
LLVM Basic Register Allocation
Calculate
LiveInterval Weight
Enqueue All
LiveInterval
RABasic::selectOrSplit
Assign the Physical
Register
Enqueue Split
LiveInterval
dequeue
physical register is available
split live interval
update LiveInterval.weight
(spill cost)
allocatePhysRegs
enqueue
seedLiveRegs
priority Q
(spill cost)
customised by RABasic algorithm
struct CompSpillWeight {
bool operator()(LiveInterval *A, LiveInterval *B) const {
return A->weight < B->weight;
}
};
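As a small illustration of how such a comparator orders a priority queue, the sketch below pushes a few toy intervals and records the dequeue order. `ToyInterval` and `dequeueOrder` are hypothetical names; only the comparison itself mirrors the snippet above.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Toy stand-in for a live interval; not LLVM's LiveInterval.
struct ToyInterval {
  unsigned VirtReg;
  float Weight;
};

// Same comparison as above: with std::priority_queue this yields a max-heap,
// so the interval with the largest spill weight is dequeued first.
struct CompSpillWeight {
  bool operator()(const ToyInterval *A, const ToyInterval *B) const {
    return A->Weight < B->Weight;
  }
};

// Returns the virtual registers in the order they would be dequeued.
std::vector<unsigned> dequeueOrder(const std::vector<ToyInterval> &Intervals) {
  std::priority_queue<const ToyInterval *, std::vector<const ToyInterval *>,
                      CompSpillWeight>
      Q;
  for (const ToyInterval &LI : Intervals)
    Q.push(&LI);
  std::vector<unsigned> Order;
  while (!Q.empty()) {
    Order.push_back(Q.top()->VirtReg);
    Q.pop();
  }
  return Order;
}
```

Intervals that are expensive to spill are therefore handled first, while the most registers are still free.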
// Check for an available register in this class.
AllocationOrder Order(VirtReg.reg, *VRM, RegClassInfo);
while (unsigned PhysReg = Order.next()) {
// Check for interference in PhysReg
switch (Matrix->checkInterference(VirtReg, PhysReg)) {
case LiveRegMatrix::IK_Free:
// PhysReg is available, allocate it.
return PhysReg;
case LiveRegMatrix::IK_VirtReg:
// Only virtual registers in the way, we may be able to spill them.
PhysRegSpillCands.push_back(PhysReg);
continue;
default:
// RegMask or RegUnit interference.
continue;
}
}
LiveInterval Weight
• Weight for one instruction with the register
• weight = (isDef + isUse) * (Block Frequency / Entry Frequency)
• loop induction variable: weight *= 3
• For all instructions with the register
• totalWeight += weight
• Hint: totalWeight *= 1.01
• Re-materializable: totalWeight *= 0.5
• LiveInterval.weight = totalWeight / size of LiveInterval
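The weight rules listed above can be written down as a short function. This is a sketch that follows the bullets literally; `ToyUse`, `spillWeight`, and the parameter names are assumptions, and the actual LLVM calculation has more cases than shown here.

```cpp
#include <cassert>
#include <vector>

// One instruction that reads and/or writes the register (hypothetical type).
struct ToyUse {
  bool IsDef;
  bool IsUse;
  double BlockFreq;        // frequency of the containing block
  bool IsLoopInduction;    // treated as a loop induction variable access
};

// Follows the slide: per-instruction weight, summed, then adjusted for
// hints and rematerialization, and finally normalised by interval size.
double spillWeight(const std::vector<ToyUse> &Uses, double EntryFreq,
                   bool HasHint, bool IsRemat, double Size) {
  double Total = 0.0;
  for (const ToyUse &U : Uses) {
    double W = (U.IsDef + U.IsUse) * (U.BlockFreq / EntryFreq);
    if (U.IsLoopInduction)
      W *= 3;
    Total += W;
  }
  if (HasHint)
    Total *= 1.01;
  if (IsRemat)
    Total *= 0.5;
  return Total / Size;
}
```

For example, a def in the entry block plus a use in a block that runs eight times as often gives a total of 9, which is then divided by the interval size.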
Matrix->checkInterference()
• How to represent live/dead points?
• SlotIndex
• How to represent a value?
• VNInfo
• How to represent a live interval?
• LiveInterval
• How to check interference between live intervals?
• LiveIntervalUnion & LiveRegMatrix
Liveness Slot
• There are four kinds of slots to describe a position at which a register can become live, or cease to be
live.
• Block (B)
• entering or leaving a block
• PHI-def
• Early Clobber (e)
• kill slot for early-clobber def
• A = A op B ( )
• Register (r)
• normal register use/def slot
• Dead (d)
• dead def
********** INTERVALS **********
%vreg0 [208r,320r:0)[416B,432r:0) 0@208r
%vreg1 [16r,32r:0) 0@16r
%vreg2 [48r,480B:0) 0@48r
%vreg3 [96r,112r:0) 0@96r
%vreg4 [496r,512r:0) 0@496r
%vreg6 [224r,240r:0) 0@224r
%vreg7 [432r,448r:0) 0@432r
%vreg8 [304r,320r:0) 0@304r
%vreg9 [320r,336r:0) 0@320r
%vreg10 [352r,368r:0) 0@352r
%vreg11 [368r,384r:0) 0@368r
SlotIndex
((MachineInstr *, index), slot)
Slot_Block
Slot_EarlyClobber
Slot_Register
Slot_Dead
unsigned getIndex() const {
return listEntry()->getIndex() | getSlot();
}
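The `getIndex()` snippet ORs the slot into the instruction's base index. A minimal sketch of that encoding follows, assuming the four slots occupy the two low bits and instruction indices are spaced 16 apart (which is consistent with the 16B, 32B, 48B numbering in the dumps). The free functions `encode`, `baseIndex`, `slotOf`, and `print` are hypothetical helpers, not LLVM API.

```cpp
#include <cassert>
#include <string>

// Slot kinds in increasing order within one instruction.
enum Slot : unsigned {
  Slot_Block = 0,
  Slot_EarlyClobber = 1,
  Slot_Register = 2,
  Slot_Dead = 3,
  Slot_Count = 4
};

// Assumed distance between two consecutive instruction indices.
constexpr unsigned InstrDist = 4 * Slot_Count;

// Combine an instruction's base index with a slot.
constexpr unsigned encode(unsigned InstrIndex, Slot S) {
  return InstrIndex | S;
}

// Recover the two components from an encoded index.
constexpr unsigned baseIndex(unsigned Encoded) {
  return Encoded & ~(Slot_Count - 1u);
}
constexpr Slot slotOf(unsigned Encoded) {
  return static_cast<Slot>(Encoded & (Slot_Count - 1u));
}

// Print in the style of the interval dumps, e.g. "208r" or "416B".
std::string print(unsigned Encoded) {
  return std::to_string(baseIndex(Encoded)) + "Berd"[slotOf(Encoded)];
}
```

Because the slot sits in the low bits, plain integer comparison orders positions inside one instruction as B < e < r < d.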
listEntry()
Numbering of Machine
Instruction
0B BB#0: derived from LLVM BB %entry
16B %vreg1<def> = t2MOVi 0, pred:14, pred:%noreg, opt:%noreg; rGPR:%vreg1
32B t2STRi12 %vreg1, <fi#0>, 0, pred:14, pred:%noreg; mem:ST4[%retval] rGPR:%vreg1
48B %vreg2<def> = t2MOVi 1, pred:14, pred:%noreg, opt:%noreg; rGPR:%vreg2
64B t2STRi12 %vreg2, <fi#1>, 0, pred:14, pred:%noreg; mem:ST4[%i] rGPR:%vreg2
Successors according to CFG: BB#1
for (MachineBasicBlock::iterator miItr = mbb->begin(), miEnd = mbb->end();
miItr != miEnd; ++miItr) {
MachineInstr *mi = miItr;
if (mi->isDebugValue())
continue;
// Insert a store index for the instr.
indexList.push_back(createEntry(mi, index += SlotIndex::InstrDist));
// Save this base index in the maps.
mi2iMap.insert(std::make_pair(mi, SlotIndex(&indexList.back(),
SlotIndex::Slot_Block)));
}
VNInfo
• hold information about a machine level value
• (id, def)
• def: SlotIndex of the defining instruction
Live Interval
• Segment
• start, end, valno
• LiveRange
• an ordered list of Segment
• LiveInterval
• LiveRange with register and weight (spill cost)
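The three layers above can be modelled with plain structs, together with an overlap test over two sorted segment lists. This is a sketch under stated assumptions: the types are toys (segments are half-open `[Start, End)` ranges over integer-encoded slot indexes, with the register slot assumed to add 2 to the base index), and `overlaps` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [Start, End) carrying a value number.
struct Segment {
  unsigned Start;
  unsigned End;
  unsigned ValNo;
};

// A toy live interval: register, spill weight, and an ordered segment list.
struct ToyLiveInterval {
  unsigned Reg;
  float Weight;
  std::vector<Segment> Segments; // sorted by Start, non-overlapping
};

// Two-pointer sweep: true if any segment of A overlaps any segment of B.
bool overlaps(const std::vector<Segment> &A, const std::vector<Segment> &B) {
  std::size_t I = 0, J = 0;
  while (I < A.size() && J < B.size()) {
    if (A[I].Start < B[J].End && B[J].Start < A[I].End)
      return true;
    // Advance whichever segment ends first.
    if (A[I].End <= B[J].End)
      ++I;
    else
      ++J;
  }
  return false;
}
```

With the intervals from the dump, %vreg0 `[208r,320r)` overlaps %vreg8 `[304r,320r)` but not %vreg9 `[320r,336r)`, because the last use of %vreg0 and the def of %vreg9 meet at the same slot without overlapping.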
********** INTERVALS **********
%vreg0 [208r,320r:0)[416B,432r:0) 0@208r
%vreg1 [16r,32r:0) 0@16r
%vreg2 [48r,480B:0) 0@48r
%vreg3 [96r,112r:0) 0@96r
%vreg4 [496r,512r:0) 0@496r
%vreg6 [224r,240r:0) 0@224r
%vreg7 [432r,448r:0) 0@432r
%vreg8 [304r,320r:0) 0@304r
%vreg9 [320r,336r:0) 0@320r
%vreg10 [352r,368r:0) 0@352r
%vreg11 [368r,384r:0) 0@368r
Segment
LiveRange
LiveInterval VNInfo
Example
192B BB#3: derived from LLVM BB %for.cond.1
208B %vreg0<def> = t2LDRi12 <fi#1>, 0
224B %vreg6<def> = t2LDRi12 <fi#2>, 0
240B t2CMPri %vreg6, 9
256B t2Bcc <BB#5>
272B t2B <BB#4>
416B BB#5: derived from LLVM BB %for.inc.4
432B %vreg7<def> = t2ADDri %vreg0, 1
448B t2STRi12 %vreg7, <fi#1>, 0
********** INTERVALS **********
%vreg0 [208r,320r:0)[416B,432r:0) 0@208r
%vreg1 [16r,32r:0) 0@16r
%vreg2 [48r,480B:0) 0@48r
%vreg3 [96r,112r:0) 0@96r
%vreg4 [496r,512r:0) 0@496r
%vreg6 [224r,240r:0) 0@224r
%vreg7 [432r,448r:0) 0@432r
%vreg8 [304r,320r:0) 0@304r
%vreg9 [320r,336r:0) 0@320r
%vreg10 [352r,368r:0) 0@352r
%vreg11 [368r,384r:0) 0@368r
288B BB#4: derived from LLVM BB %for.body.3
304B %vreg8<def> = t2LDRi12 <fi#2>, 0
320B %vreg9<def> = t2MUL %vreg0, %vreg8
336B t2STRi12 %vreg9, <fi#3>, 0
352B %vreg10<def> = t2LDRi12 <fi#2>, 0
368B %vreg11<def> = t2ADDri %vreg10, 1
384B t2STRi12 %vreg11, <fi#2>, 0
400B t2B <BB#3>
208r
320r
416B
432r
LiveRegMatrix
AH AL BH BL XMM31
V3
V3
V5
V0
V4
V1
V2
V6
RegUnit
LiveIntervalUnion
EAX => AH, AL
AX => AH, AL
AH => AH
AL => AL
Check Interference
unsigned LiveIntervalUnion::Query::
collectInterferingVRegs(unsigned MaxInterferingRegs) {
…
// Check for overlapping interference.
while (VirtRegI->start < LiveUnionI.stop() &&
VirtRegI->end > LiveUnionI.start()) {
// This is an overlap, record the interfering register.
LiveInterval *VReg = LiveUnionI.value();
if (VReg != RecentReg && !isSeenInterference(VReg)) {
RecentReg = VReg;
InterferingVRegs.push_back(VReg);
if (InterferingVRegs.size() >= MaxInterferingRegs)
return InterferingVRegs.size();
}
// This LiveUnion segment is no longer interesting.
if (!(++LiveUnionI).valid()) {
SeenAllInterferences = true;
return InterferingVRegs.size();
}
}
…
}
LiveIntervalUnion VirtReg
start()
stop()
start
end
start()
stop()
start
end
start()
stop()
start
end
start()
stop()
start
end
Check Interference
AH AL BH BL XMM31
V3
V3
V5
V0
V4
V1
V2
V6
V7
// Check the matrix for virtual register interference.
for (MCRegUnitIterator Units(PhysReg, TRI); Units.isValid(); ++Units)
if (query(VirtReg, *Units).checkInterference())
return IK_VirtReg;
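The loop above checks every register unit of the candidate physical register. A toy model of that idea is sketched below, using the EAX/AX/AH/AL mapping shown earlier. `ToyMatrix`, `Seg`, and the string-keyed maps are assumptions chosen for readability and do not reflect how LLVM stores register units or live interval unions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Half-open live segment [Start, End).
struct Seg {
  unsigned Start;
  unsigned End;
};

struct ToyMatrix {
  // Physical register -> the register units it covers.
  std::map<std::string, std::vector<std::string>> Units;
  // Register unit -> segments of all virtual registers assigned to it.
  std::map<std::string, std::vector<Seg>> Assigned;

  // True if VirtReg overlaps anything already assigned to a unit of PhysReg.
  bool interferes(const std::vector<Seg> &VirtReg,
                  const std::string &PhysReg) const {
    for (const std::string &U : Units.at(PhysReg)) {
      auto It = Assigned.find(U);
      if (It == Assigned.end())
        continue;
      for (const Seg &A : VirtReg)
        for (const Seg &B : It->second)
          if (A.Start < B.End && B.Start < A.End)
            return true;
    }
    return false;
  }

  // Record VirtReg's segments in every unit of PhysReg.
  void assign(const std::vector<Seg> &VirtReg, const std::string &PhysReg) {
    for (const std::string &U : Units.at(PhysReg)) {
      std::vector<Seg> &Dst = Assigned[U];
      Dst.insert(Dst.end(), VirtReg.begin(), VirtReg.end());
    }
  }
};
```

Working per register unit means that a value placed in AL automatically blocks EAX and AX for the same range while leaving AH free, without an explicit alias list.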
Greedy Register
Allocation
Use Split to Improve RA
• Live Range Splitting
• Insert copy/re-materialize to split up live ranges
• hopefully reduces need for spilling
• Also control spill code placement
• Example
Q0
D0 D1
Q1
D2 D3
V1
V2
V3 V4
V5
Q0
D0 D1
Q1
D2 D3
V1
V2
V3 V4
V5
• No physical register for V1
Q0
D0 D1
Q1
D2 D3
V1
V2
V3 V4
V5
• Evict V2
Q0
D0 D1
Q1
D2 D3
V1
V2
V3V4
V5
stack
• Split V2
Q0
D0 D1
Q1
D2 D3
V1
V2b
V3V4
V5
V2a
V2c
• Split V2
Q0
D0 D1
Q1
D2 D3
V1
V2b
V3V4
V5
V2a
V2c
stack
Greedy RA Stages
• RS_New: created
• RS_Assign: enqueue
• RS_Split: need to split
• RS_Split2
• used for split products that may not be making progress
• RS_Spill: need to spill
• RS_Done: assigned a physical register or created by spill
RS_Split2
• Live intervals created by a split are enqueued to be
processed again.
• There is a risk of creating infinite loops.
… = vreg1 …
… = vreg1 …
… = vreg1 …
vreg2 = COPY vreg1
… = vreg2 …
vreg3 = COPY vreg1
… = vreg3 …
… = vreg3 …
RS_New
RS_Split2
Greedy Register Allocation
try to assign physical register
(hint > zero cost reg > low cost reg)
try to evict to nd better register
enter RS_Split
stage
try last chance
recoloring
split
spill
pick a physical register and evict all
interference
found
register
stage >= RS_Done stage < RS_Split
selectOrSplit(d+1)
enter RS_Done
stage
selectOrSplit(d)
Last Chance Recoloring
• Try to assign a color to VirtReg by recoloring its
interferences.
• The recoloring process may recursively use the
last chance recoloring. Therefore, when a virtual
register has been assigned a color by this
mechanism, it is marked as Fixed.
vA can use {R1, R2 }
vB can use { R2, R3}
vC can use {R1 }
vA => R1
vB => R2
vC => fails
vA => R2
vB => R3
vC => R1 (fixed)
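The vA/vB/vC example can be reproduced with a plain backtracking search. This is a simplified sketch, not LLVM's last chance recoloring: it assumes all virtual registers interfere with each other, tries them in a fixed order, and has no depth limit. `recolor` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Allowed[i] lists the registers virtual register i may use.
// Assigned[i] receives the register chosen for virtual register i.
// All virtual registers are assumed to interfere pairwise.
bool recolor(const std::vector<std::vector<int>> &Allowed, std::size_t Idx,
             std::vector<int> &Assigned) {
  if (Idx == Allowed.size())
    return true; // every virtual register has a register
  for (int R : Allowed[Idx]) {
    // Skip registers already taken by an earlier virtual register.
    if (std::find(Assigned.begin(), Assigned.end(), R) != Assigned.end())
      continue;
    Assigned.push_back(R);
    if (recolor(Allowed, Idx + 1, Assigned))
      return true;
    Assigned.pop_back(); // undo and try the next register
  }
  return false;
}
```

With vA = {R1, R2}, vB = {R2, R3}, vC = {R1}, the first attempt (vA => R1, vB => R2) leaves nothing for vC, so the search backs up and ends with vA => R2, vB => R3, vC => R1.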
How to Split?
is stage
beyond
RS_Spill?
is in one BB? tryLocalSplit
tryInstructionSplit
No
Yes
tryRegionSplit
is stage less
than RS_Split2?
No
spill
Yes
success?
No
success?
spill
No
tryBlockSplit
Yes
No
success?
No
success?
spill
No
done
Yes
Yes
done
Yes
Yes
BlockInfo
(LiveIn)
(LiveOut)
FirstInstr: First instruction accessing current reg.
LastInstr: Last instruction accessing current reg.
Live-through blocks without any uses don’t get BlockInfo entries.
tryLocalSplit
• Try to split virtual register interval into smaller
intervals inside its only basic block.
• calculate gap weights
• adjust the split region
Calculate Gap Weights
NumGaps = 4
Calculate Gap Weights
LI.weight
VirtReg LI
If there is a RegUnit occupied by VirtReg:0
0
Calculate Gap Weights
LI.weight
Fixed RegUnit
If there is a xed RegUnit:0
0
huge_valf
Adjust Split Region
SplitAfter = 1
SplitBefore = 0
normalise
spill weight >
max gap
BestBefore = SplitBefore
BestAfter = SplitAfter
SplitAfter++
SplitBefore++
YesNo
normalise spill weight = spill cost / distance
= (#gap * block_freq) / distance(SplitBefore, SplitAfter)
Adjust Split Region
BestAfter
BestBefore
normalise
spill weight >
max gap
BestBefore = SplitBefore
BestAfter = SplitAfter
SplitAfter++
SplitBefore++
YesNo
normalise spill weight = spill cost / distance
= (#gap * block_freq) / distance(SplitBefore, SplitAfter)
RS_New
(or RS_Split2)
RS_New
Find the most critical range.
tryInstructionSplit
• Split a live range around individual instructions.
• Every “use” instruction has its own live interval.
tryBlockSplit
• Split a global live range around every block with
uses.
FirstInstr
LastInstr
tryRegionSplit
• For every physical register
• Prepare interference cache
• Construct Hopfield Network
• Construct block constraints
• Update Hopfield Network biases and values according to block
constraints
• Add links in Hopfield Network and iterate
• Get the best candidate (minimize split cost + spill cost)
• Do region split
Hopeld Network
• A form of recurrent artificial neural network popularised by John
Hopeld in 1982.
• Guaranteed to converge to a local minimum.
Hopeld Network
• Node: edge bundle
• Link: transparent basic blocks, i.e. blocks that the variable is
live through.
• Energy function (the cost of spilling)
• Weight: block frequency
• Bias: according to block constraints
Block Constraints
No Interference
PrefReg
Intf.rst()
MustSpill PrefSpill
FirstInstr
LastInstr
PrefReg
FirstInstr
LastInstr
FirstInstr
LastInstr
FirstInstr
LastInstr
PrefReg
MustSpill
FirstInstr
LastInstr
PrefReg
FirstInstr
LastInstr
FirstInstr
LastInstr
FirstInstr
LastInstr
PrefSpill
Last Split Point
Edge Bundle
BB #0
BB #1
BB #3
BB #2
BB #4 BB #5
BB #6
// Join the outgoing bundle with the ingoing bundles of all successors.
for (MachineBasicBlock::const_succ_iterator SI = MBB.succ_begin(),
SE = MBB.succ_end(); SI != SE; ++SI)
EC.join(OutE, 2 * (*SI)->getNumber());
EC:
(BB#0, in) Bundle #0: 0 0 0
(BB#0, out) Bundle #1: 1 1 1
(BB#1, in) Bundle #2: 2 1 1
(BB#1, out) Bundle #3: 3 3 2
(BB#2, in) Bundle #4: 4 3 2
(BB#2, out) Bundle #5: 5 5 3
(BB#3, in) Bundle #6: 6 5 3
(BB#3, out) Bundle #7: 7 7 4
(BB#4, in) Bundle #8: 8 7 4
(BB#4, out) Bundle #9: 9 5 3
(BB#5, in) Bundle #10: 10 7 4
(BB#5, out) Bundle #11: 11 1 1
(BB#6, in) Bundle #12: 12 3 2
(BB#6, out) Bundle #13: 13 13 5
void join(unsigned a, unsigned b) {
unsigned eca = EC[a];
unsigned ecb = EC[b];
while (eca != ecb)
if (eca < ecb)
EC[b] = eca, b = ecb, ecb = EC[b];
else
EC[a] = ecb, a = eca, eca = EC[a];
}
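To see the `join` loop in action, the sketch below wraps it in a small class with one equivalence class per element. `ToyEqClasses` and `findLeader` are hypothetical names introduced for the example; the loop body is the one shown above, written with explicit statements instead of comma expressions.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Toy union-find where the leader of a class is its smallest member.
struct ToyEqClasses {
  std::vector<unsigned> EC;

  // Start with every element in its own class: EC[i] == i.
  explicit ToyEqClasses(unsigned N) : EC(N) {
    std::iota(EC.begin(), EC.end(), 0u);
  }

  // Merge the classes of a and b, redirecting links toward the smaller
  // leader while walking up both chains.
  void join(unsigned a, unsigned b) {
    unsigned eca = EC[a];
    unsigned ecb = EC[b];
    while (eca != ecb) {
      if (eca < ecb) {
        EC[b] = eca;
        b = ecb;
        ecb = EC[b];
      } else {
        EC[a] = ecb;
        a = eca;
        eca = EC[a];
      }
    }
  }

  // Follow links until reaching an element that is its own leader.
  unsigned findLeader(unsigned a) const {
    while (a != EC[a])
      a = EC[a];
    return a;
  }
};
```

Joining the outgoing side of a block with the incoming side of each successor, as in the loop over `succ_begin()`/`succ_end()`, puts every CFG edge's two endpoints into one class, and each class is one edge bundle.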
Edge Bundle
BB #0
BB #1
BB #3
BB #2
BB #4 BB #5
BB #6 Blocks:
Bundle #0: BB#0
Bundle #1: BB#0, BB#1, BB#5
Bundle #2: BB#1, BB#2, BB#6
Bundle #3: BB#2, BB#3, BB#4
Bundle #4: BB#3, BB#4, BB#5
Bundle #5: BB#6
Bundle #6:
Bundle #7:
Bundle #8:
Bundle #9:
Bundle #10:
Bundle #11:
Bundle #12:
Bundle #13:
EC:
(BB#0, in) Bundle #0: 0 0 0
(BB#0, out) Bundle #1: 1 1 1
(BB#1, in) Bundle #2: 2 1 1
(BB#1, out) Bundle #3: 3 3 2
(BB#2, in) Bundle #4: 4 3 2
(BB#2, out) Bundle #5: 5 5 3
(BB#3, in) Bundle #6: 6 5 3
(BB#3, out) Bundle #7: 7 7 4
(BB#4, in) Bundle #8: 8 7 4
(BB#4, out) Bundle #9: 9 5 3
(BB#5, in) Bundle #10: 10 7 4
(BB#5, out) Bundle #11: 11 1 1
(BB#6, in) Bundle #12: 12 3 2
(BB#6, out) Bundle #13: 13 13 5
SpillPlacement::addConstraints
• update BiasN, BiasP according to BorderConstraint
BB #n (freq)
… = Y op …
PrefReg
PrefSpill
Bundle ib
BiasP += freq
Bundle ob
BiasN += freq
void addBias(BlockFrequency freq, BorderConstraint direction) {
switch (direction) {
default:
break;
case PrefReg:
BiasP += freq;
break;
case PrefSpill:
BiasN += freq;
break;
case MustSpill:
BiasN = BlockFrequency::getMaxFrequency(); // (uint64_t)-1ULL
break;
}
}
Hopeld Network Node
• Node.update(nodes, Threshold)
Bundle X
BiasN
BiasP
Value
Bundle A
Value = -1
Bundle B
Value = 1
Bundle C
Value = 1
Bundle D
Value = 1
Links
SumN = BiasN + freqA
SumP = BiasP + freqB + freqC + freqD
(freqA, A) (freqB, B) (freqC, C) (freqD, D)
if (SumN >= SumP + Threshold)
Value = -1;
else if (SumP >= SumN + Threshold)
Value = 1;
else
Value = 0;
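The node update can be made concrete with a toy node type. This sketch assumes that a link contributes its frequency to SumN when the linked bundle's value is -1 and to SumP when it is 1, as in the Bundle A to D example; `ToyNode` is a hypothetical type, and returning "value changed" is a simplification chosen here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy Hopfield network node for one edge bundle.
struct ToyNode {
  double BiasN = 0.0; // accumulated preference for spilling
  double BiasP = 0.0; // accumulated preference for a register
  int Value = 0;      // -1: spill, 0: undecided, 1: register
  std::vector<std::pair<double, unsigned>> Links; // (frequency, bundle index)

  // Recompute Value from the biases and the linked nodes' current values.
  // Returns true if Value changed.
  bool update(const std::vector<ToyNode> &Nodes, double Threshold) {
    double SumN = BiasN;
    double SumP = BiasP;
    for (const auto &L : Links) {
      if (Nodes[L.second].Value == -1)
        SumN += L.first;
      else if (Nodes[L.second].Value == 1)
        SumP += L.first;
    }
    int Before = Value;
    if (SumN >= SumP + Threshold)
      Value = -1;
    else if (SumP >= SumN + Threshold)
      Value = 1;
    else
      Value = 0;
    return Before != Value;
  }
};
```

The threshold creates a dead zone around zero, so a bundle only commits to register or spill when one side clearly outweighs the other.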
Grow Region
• Live through blocks in positive bundles.
No Interference
Intf.rst()
MustSpill PrefSpill
Used as links
between bundles
SpillPlacement::addConstraints
Intf.last()
MustSpill PrefSpill
SpillPlacement::addLinks
BB #n (freq)
Bundle ib
Bundle ob
Bundle ib
Bundle ob
(freq, ob)
(freq, ib)
SpillPlacement::iterate
for (unsigned iteration = 0; iteration != 10; ++iteration) {
bool Changed = false;
for (SmallVectorImpl<unsigned>::const_reverse_iterator I =
iteration == 0 ? Linked.rbegin() : std::next(Linked.rbegin()),
E = Linked.rend(); I != E; ++I) {
unsigned n = *I;
if (nodes[n].update(nodes, Threshold)) {
Changed = true;
if (nodes[n].preferReg())
RecentPositive.push_back(n);
}
}
if (!Changed || !RecentPositive.empty())
return;
Changed = false;
for (SmallVectorImpl<unsigned>::const_iterator I =
std::next(Linked.begin()), E = Linked.end(); I != E; ++I) {
unsigned n = *I;
if (nodes[n].update(nodes, Threshold)) {
Changed = true;
if (nodes[n].preferReg())
RecentPositive.push_back(n);
}
}
if (!Changed || !RecentPositive.empty())
return;
}
Spill Cost
No Interference
PrefReg
Intf.rst()
MustSpill PrefSpill
FirstInstr
LastInstr
PrefReg
FirstInstr
LastInstr
FirstInstr
LastInstr
FirstInstr
LastInstr
PrefReg
MustSpill
FirstInstr
LastInstr
PrefReg
FirstInstr
LastInstr
FirstInstr
LastInstr
FirstInstr
LastInstr
PrefSpill
Last Split Point
++Ins ++Ins ++Ins
++Ins ++Ins ++Ins
Cost = Block_Frequency * Ins
Split Cost
BB #n (freq)
… = Y op …
Bundle ib
Value
Bundle ob
Value
Use Block
RegIn
RegOut
BC.Entry
BC.Exit
if (BI.LiveIn)
Ins += RegIn != (BC.Entry == SpillPlacement::PrefReg);
if (BI.LiveOut)
Ins += RegOut != (BC.Exit == SpillPlacement::PrefReg);
while (Ins--)
GlobalCost += SpillPlacer->getBlockFrequency(BC.Number);
Live Through
BB #n (freq)
Bundle ib
Value
Bundle ob
Value
RegIn
RegOut
RegIn RegOut Cost
0 0 0
0 1 freq
1 0 freq
1 1 2 x freq
(interference)
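The cost table for live-through blocks can be expressed as a small function. This is a sketch that reads the last row as "register on both sides costs 2 x freq only when the block has interference"; `liveThroughCost` and its parameters are hypothetical names.

```cpp
#include <cassert>

// Split cost contributed by one live-through block.
// RegIn / RegOut: whether the value is in a register on entry / exit.
double liveThroughCost(bool RegIn, bool RegOut, bool HasInterference,
                       double Freq) {
  if (!RegIn && !RegOut)
    return 0.0; // on the stack throughout: no extra code
  if (RegIn && RegOut)
    // In a register on both sides: a spill and a reload are only needed
    // when interference inside the block forces the value out.
    return HasInterference ? 2.0 * Freq : 0.0;
  return Freq; // one transition between register and stack
}
```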
The Best Candidate
• For all physical registers, calculate region split
cost.
• Cost = block constraints cost (spill cost) + global
split cost
• The best candidate has the lowest cost.
Split
• splitLiveThroughBlock
• splitRegInBlock
• splitRegOutBlock
splitLiveThroughBlock
Bundle ib
Value == 1
Bundle ob
Value != 1
Live Through
LiveOut on Stack
first non-PHI
Start
New Int
Bundle ib
Value != 1
Bundle ob
Value == 1
Live Through
LiveIn on Stack
last split point
End
New Int
Live Through
No Interference
Bundle ib
Value == 1
Bundle ob
Value == 1
End
New Int
Start
splitLiveThroughBlock
Bundle ib
Value == 1
Bundle ob
Value == 1
LiveThrough
Non-overlapping interference
New Int
Interference.st()
Interference.last()
New Int
Bundle ib
Value == 1
Bundle ob
Value == 1
LiveThrough
Overlapping interference
New Int
Interference.st()
Interference.last()
New Int
splitRegInBlock
Bundle ib
Value == 1
No LiveOut
Interference after kill
Start
New Int
Bundle ib
Value == 1
Bundle ob
Value != 1
LiveOut on Stack
Interference after last use
LiveOut on Stack
Interference after last use
Interference.st()
LastInstr
LastInstr
last split point
New Int
Start
Bundle ib
Value == 1
Bundle ob
Value != 1
LastInstr
last split point
New Int
Start
Interference.st()
Interference.st()
splitRegInBlock
Bundle ib
Value == 1
LiveOut on Stack
Interference overlapping uses
Start
New Int
Bundle ib
Value == 1
Interference.st()
LastInstr
last split point
New Int
Start
New Int
Interference.st()
LastInstr
last split point
New Int
Bundle ob
Value != 1
Bundle ob
Value != 1
LiveOut on Stack
Interference overlapping uses
splitRegOutBlock
No LiveIn
Interference before def
End
New Int
Bundle ib
Value != 1
Bundle ob
Value == 1
Live Through
Interference before def
Live Through
Interference overlapping uses
Interference.last()
FirstInstr
Bundle ib
Value != 1
Bundle ob
Value == 1
Bundle ob
Value == 1
End
New Int
Interference.last()
FirstInstr
last split point
End
New Int
Interference.last()
FirstInstr
New Int

Mais conteĂşdo relacionado

Mais procurados

GCC RTL and Machine Description
GCC RTL and Machine DescriptionGCC RTL and Machine Description
GCC RTL and Machine DescriptionPriyatham Bollimpalli
 
ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!Mr. Vengineer
 
Part II: LLVM Intermediate Representation
Part II: LLVM Intermediate RepresentationPart II: LLVM Intermediate Representation
Part II: LLVM Intermediate RepresentationWei-Ren Chen
 
Zynq + Vivado HLS入門
Zynq + Vivado HLS入門Zynq + Vivado HLS入門
Zynq + Vivado HLS入門narusugimoto
 
A whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerA whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerNikita Popov
 
twlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsotwlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsoViller Hsiao
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerPlatonov Sergey
 
LLVM Backend の紹介
LLVM Backend の紹介LLVM Backend の紹介
LLVM Backend の紹介Akira Maruoka
 
DWARF Data Representation
DWARF Data RepresentationDWARF Data Representation
DWARF Data RepresentationWang Hsiangkai
 
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)Mr. Vengineer
 
SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介MITSUNARI Shigeo
 
Integrated Register Allocation introduction
Integrated Register Allocation introductionIntegrated Register Allocation introduction
Integrated Register Allocation introductionShiva Chen
 
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgenIntel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgenMITSUNARI Shigeo
 
いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例Fixstars Corporation
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたMITSUNARI Shigeo
 
Zynq VIPを利用したテストベンチ
Zynq VIPを利用したテストベンチZynq VIPを利用したテストベンチ
Zynq VIPを利用したテストベンチMr. Vengineer
 
x86とコンテキストスイッチ
x86とコンテキストスイッチx86とコンテキストスイッチ
x86とコンテキストスイッチMasami Ichikawa
 

Mais procurados (20)

Qemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System EmulationQemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System Emulation
 
GCC RTL and Machine Description
GCC RTL and Machine DescriptionGCC RTL and Machine Description
GCC RTL and Machine Description
 
ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!
 
Build Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVMBuild Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVM
 
Part II: LLVM Intermediate Representation
Part II: LLVM Intermediate RepresentationPart II: LLVM Intermediate Representation
Part II: LLVM Intermediate Representation
 
Zynq + Vivado HLS入門
Zynq + Vivado HLS入門Zynq + Vivado HLS入門
Zynq + Vivado HLS入門
 
A whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerA whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizer
 
twlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsotwlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdso
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory Sanitizer
 
LLVM Backend の紹介
LLVM Backend の紹介LLVM Backend の紹介
LLVM Backend の紹介
 
LLVM
LLVMLLVM
LLVM
 
DWARF Data Representation
DWARF Data RepresentationDWARF Data Representation
DWARF Data Representation
 
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
 
SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介SSE4.2の文字列処理命令の紹介
SSE4.2の文字列処理命令の紹介
 
Integrated Register Allocation introduction
Integrated Register Allocation introductionIntegrated Register Allocation introduction
Integrated Register Allocation introduction
 
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgenIntel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
 
いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみた
 
Zynq VIPを利用したテストベンチ
Zynq VIPを利用したテストベンチZynq VIPを利用したテストベンチ
Zynq VIPを利用したテストベンチ
 
x86とコンテキストスイッチ
x86とコンテキストスイッチx86とコンテキストスイッチ
x86とコンテキストスイッチ
 

Semelhante a LLVM Register Allocation

Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"Yulia Tsisyk
 
State of the .Net Performance
State of the .Net PerformanceState of the .Net Performance
State of the .Net PerformanceCUSTIS
 
Where the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsWhere the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsMatt Warren
 
Happy To Use SIMD
Happy To Use SIMDHappy To Use SIMD
Happy To Use SIMDWei-Ta Wang
 
How Triton can help to reverse virtual machine based software protections
How Triton can help to reverse virtual machine based software protectionsHow Triton can help to reverse virtual machine based software protections
How Triton can help to reverse virtual machine based software protectionsJonathan Salwan
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT CompilerNetronome
 
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsAsuka Nakajima
 
Devirtualizing FinSpy
Devirtualizing FinSpyDevirtualizing FinSpy
Devirtualizing FinSpyjduart
 
Ice mini guide
Ice mini guideIce mini guide
Ice mini guideAdy Liu
 
Clojure concurrency
Clojure concurrencyClojure concurrency
Clojure concurrencyAlex Navis
 
Forgive me for i have allocated
Forgive me for i have allocatedForgive me for i have allocated
Forgive me for i have allocatedTomasz Kowalczewski
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityBrendan Gregg
 
Day2 Verilog HDL Basic
Day2 Verilog HDL BasicDay2 Verilog HDL Basic
Day2 Verilog HDL BasicRon Liu
 
07 140430-ipp-languages used in llvm during compilation
07 140430-ipp-languages used in llvm during compilation07 140430-ipp-languages used in llvm during compilation
07 140430-ipp-languages used in llvm during compilationAdam HusĂĄr
 
Cvim half precision floating point
Cvim half precision floating pointCvim half precision floating point
Cvim half precision floating pointtomoaki0705
 
Using Smalltalk for controlling robotics systems
Using Smalltalk for controlling robotics systemsUsing Smalltalk for controlling robotics systems
Using Smalltalk for controlling robotics systemsSerge Stinckwich
 

LLVM Register Allocation

  • 2. Outline • Introduction to Register Allocation Problem • LLVM Base Register Allocation Interface • LLVM Basic Register Allocation • LLVM Greedy Register Allocation
  • 3. Introduction to Register Allocation • Definition • Register allocation is the problem of mapping program variables to either machine registers or memory addresses. • Best solution • minimise the number of loads/stores from/to memory • NP-complete
  • 4. Example: C source and the generated assembly
      int main()
      {
        int i, j;
        int answer;
        for (i = 1; i < 10; i++)
          for (j = 1; j < 10; j++) {
            answer = i * j;
          }
        return 0;
      }
      _main:
      @ BB#0:
        sub sp, #16
        movs r0, #0
        str r0, [sp, #12]
        movs r0, #1
        str r0, [sp, #8]
        b LBB0_2
      LBB0_1:
        adds r1, #1
        str r1, [sp, #8]
      LBB0_2:
        ldr r1, [sp, #8]
        cmp r1, #9
        bgt LBB0_6
      @ BB#3:
        str r0, [sp, #4]
        b LBB0_5
      LBB0_4:
        ldr r2, [sp, #4]
        muls r1, r2, r1
        str r1, [sp]
        ldr r1, [sp, #4]
        adds r1, #1
  • 5. Graph Coloring • For an arbitrary graph G; a coloring of G assigns a color to each node in G so that no pair of adjacent nodes have the same color. 2-colorable 3-colorable
  • 6. Graph Coloring for RA • Node: Live interval • Edge: Two live intervals have interference • Color: Physical register • Find a feasible colouring for the graph
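The mapping on slides 5 and 6 can be sketched as a simple first-fit colouring. This is only an illustration of the idea, not how LLVM allocates registers: `colorGraph`, the adjacency-list layout, and the use of `-1` for "no register left, would have to spill" are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: first-fit colouring of an interference graph.
// adj[n] lists the live intervals that interfere with interval n.
// K is the number of physical registers (colours).
// Returns one colour per node, or -1 where no colour was available.
std::vector<int> colorGraph(const std::vector<std::vector<int>> &adj, int K) {
  assert(K >= 0);
  std::vector<int> color(adj.size(), -1);
  for (std::size_t n = 0; n < adj.size(); ++n) {
    // Mark the colours already taken by coloured neighbours.
    std::vector<bool> used(static_cast<std::size_t>(K), false);
    for (int m : adj[n])
      if (color[m] >= 0)
        used[color[m]] = true;
    // Take the lowest free colour, if any.
    for (int c = 0; c < K; ++c)
      if (!used[c]) {
        color[n] = c;
        break;
      }
  }
  return color;
}
```

With three mutually interfering intervals, three registers are enough, while two registers leave the last interval uncoloured, which is the point where a real allocator would have to spill or split.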
  • 7. … a0 = … b0 = … … = b0 d0 = … c0 = … … d1 = c0 … = a0 … = d1 B0 B1 B2 B3 … LIa = … LIb = … … = LIb LIc = … … LId = LIc … = LIa … = LId B0 B1 B2 B3
  • 8. LRa LRb LRc LRd … LIa = … LIb = … … = LIb LIc = … … LId = LIc … = LIa … = LId B0 B1 B2 B3 An Example from “Engineering A Compiler”
  • 9. Why Not Graph Coloring • Interference graph is expensive to build • Spill code placement is more important than colouring • Need to model aliases and overlapping register classes • Flexibility is more important than the coloring algorithm (Adopted from “Register Allocation in LLVM 3.0”)
  • 10. Excerpt from tricore_llvm.pdf SSA Properties * Each definition in the procedure creates a unique name. * Each use refers to a single definition.
  • 11. LLVM Register Allocation • Basic • Provide a minimal implementation of the basic register allocator • Greedy • Global live range splitting. • Fast • This register allocator allocates registers to a basic block at a time. • PBQP • Partitioned Boolean Quadratic Programming (PBQP) based register allocator for LLVM
  • 12. LLVM Base Register Allocation Interface • Calculate LiveInterval Weight • Enqueue All LiveInterval • selectOrSplit for One LiveInterval • Assign the Physical Register • Enqueue Split LiveInterval • dequeue • physical register is available • split live interval • update LiveInterval.weight (spill cost) • allocatePhysRegs • enqueue • seedLiveRegs • Q • customised by new RA algorithm
      for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
        unsigned Reg = TargetRegisterInfo::index2VirtReg(i);
        if (MRI->reg_nodbg_empty(Reg))
          continue;
        enqueue(&LIS->getInterval(Reg));
      }
  • 13. LLVM Basic Register Allocation • Calculate LiveInterval Weight • Enqueue All LiveInterval • RABasic::selectOrSplit • Assign the Physical Register • Enqueue Split LiveInterval • dequeue • physical register is available • split live interval • update LiveInterval.weight (spill cost) • allocatePhysRegs • enqueue • seedLiveRegs • priority Q (spill cost) • customised by RABasic algorithm
      struct CompSpillWeight {
        bool operator()(LiveInterval *A, LiveInterval *B) const {
          return A->weight < B->weight;
        }
      };
      // Check for an available register in this class.
      AllocationOrder Order(VirtReg.reg, *VRM, RegClassInfo);
      while (unsigned PhysReg = Order.next()) {
        // Check for interference in PhysReg
        switch (Matrix->checkInterference(VirtReg, PhysReg)) {
        case LiveRegMatrix::IK_Free:
          // PhysReg is available, allocate it.
          return PhysReg;
        case LiveRegMatrix::IK_VirtReg:
          // Only virtual registers in the way, we may be able to spill them.
          PhysRegSpillCands.push_back(PhysReg);
          continue;
        default:
          // RegMask or RegUnit interference.
          continue;
        }
      }
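To see what the `CompSpillWeight` comparator on slide 13 does to the queue order, here is a minimal stand-alone sketch. `Interval` and `dequeueOrder` are hypothetical stand-ins for `LiveInterval` and the allocator's dequeue loop; only the comparator mirrors the slide.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for LiveInterval: a register number and a spill weight.
struct Interval {
  unsigned reg;
  float weight;
};

// Same ordering as the slide: std::priority_queue is a max-heap under this
// comparator, so the interval with the largest spill weight is on top.
struct CompSpillWeight {
  bool operator()(const Interval *A, const Interval *B) const {
    return A->weight < B->weight;
  }
};

// Returns the registers in the order they would be dequeued.
std::vector<unsigned> dequeueOrder(std::vector<Interval> &intervals) {
  std::priority_queue<Interval *, std::vector<Interval *>, CompSpillWeight> Q;
  for (Interval &I : intervals)
    Q.push(&I);
  std::vector<unsigned> order;
  while (!Q.empty()) {
    assert(Q.top() != nullptr);
    order.push_back(Q.top()->reg);
    Q.pop();
  }
  return order;
}
```

The intervals that are most expensive to spill are therefore handed to `selectOrSplit` first, while registers are still plentiful.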
  • 14. LiveInterval Weight • Weight for one instruction with the register • weight = (isDef + isUse) * (Block Frequency / Entry Frequency) • loop induction variable: weight *= 3 • For all instructions with the register • totalWeight += weight • Hint: totalWeight *= 1.01 • Re-materializable: totalWeight *= 0.5 • LiveInterval.weight = totalWeight / size of LiveInterval
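The formula on slide 14 can be written out as a short function. This is a sketch that follows the slide literally; `RegAccess`, `spillWeight`, and their parameters are names invented for this example, and LLVM's actual weight calculation has more cases than shown here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record for one instruction that touches the register.
struct RegAccess {
  bool isDef;
  bool isUse;
  double blockFreq;        // frequency of the block containing the instruction
  bool isLoopInduction;    // slide: loop induction variable, weight *= 3
};

// Spill weight as described on the slide:
//   per instruction: (isDef + isUse) * (blockFreq / entryFreq), tripled for
//   loop induction variables; the sum is scaled by 1.01 when the register has
//   a hint and by 0.5 when it is re-materializable, then divided by the size
//   of the live interval.
double spillWeight(const std::vector<RegAccess> &accesses, double entryFreq,
                   bool hasHint, bool isRematerializable, double size) {
  assert(entryFreq > 0 && size > 0);
  double total = 0;
  for (const RegAccess &a : accesses) {
    double w = (int(a.isDef) + int(a.isUse)) * (a.blockFreq / entryFreq);
    if (a.isLoopInduction)
      w *= 3;
    total += w;
  }
  if (hasHint)
    total *= 1.01;
  if (isRematerializable)
    total *= 0.5;
  return total / size;
}
```

For example, one def in the entry block plus one use in a block executed eight times as often gives a total of 9; over an interval of size 3 the weight is 3, and it halves if the value can be re-materialized.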
  • 15. Matrix->checkInterference() • How to represent live/dead points? • SlotIndex • How to represent a value? • VNInfo • How to represent a live interval? • LiveInterval • How to check interference between live intervals? • LiveIntervalUnion & LiveRegMatrix
  • 16. Liveness Slot • There are four kinds of slots to describe a position at which a register can become live, or cease to be live. • Block (B) • entering or leaving a block • PHI-def • Early Clobber (e) • kill slot for early-clobber def • A = A op B ( ) • Register (r) • normal register use/def slot • Dead (d) • dead def
      ********** INTERVALS **********
      %vreg0 [208r,320r:0)[416B,432r:0) 0@208r
      %vreg1 [16r,32r:0) 0@16r
      %vreg2 [48r,480B:0) 0@48r
      %vreg3 [96r,112r:0) 0@96r
      %vreg4 [496r,512r:0) 0@496r
      %vreg6 [224r,240r:0) 0@224r
      %vreg7 [432r,448r:0) 0@432r
      %vreg8 [304r,320r:0) 0@304r
      %vreg9 [320r,336r:0) 0@320r
      %vreg10 [352r,368r:0) 0@352r
      %vreg11 [368r,384r:0) 0@368r
  • 17. SlotIndex ((MachineInstr *, index), slot) Slot_Block Slot_EarlyClobber Slot_Register Slot_Dead unsigned getIndex() const { return listEntry()->getIndex() | getSlot(); } listEntry()
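A minimal sketch of the `index | slot` encoding on slide 17. `SimpleSlotIndex` is a hypothetical simplification: it stores the instruction index directly instead of pointing at a list entry. The instruction distance of 16 is taken from the numbering in this deck's dumps (16B, 32B, 48B, ...), and the `B`/`e`/`r`/`d` suffixes follow the slot letters on slide 16.

```cpp
#include <cassert>
#include <string>

// The four slot kinds from the slides, in increasing order within one instruction.
enum Slot : unsigned { Slot_Block, Slot_EarlyClobber, Slot_Register, Slot_Dead, Slot_Count };

// Distance between two consecutive instruction indexes (assumed: 4 * Slot_Count).
const unsigned InstrDist = 4 * Slot_Count;

// Hypothetical simplification of SlotIndex: (instruction index, slot).
struct SimpleSlotIndex {
  unsigned instrIndex; // multiple of InstrDist, so the low bits are free
  Slot slot;

  // Same combination as on the slide: the slot lives in the low bits.
  unsigned getIndex() const { return instrIndex | slot; }

  // Prints in the style of the INTERVALS dump, e.g. "208r" or "416B".
  std::string str() const {
    assert(slot < Slot_Count);
    return std::to_string(instrIndex) + "Berd"[slot];
  }
};

inline bool operator<(const SimpleSlotIndex &a, const SimpleSlotIndex &b) {
  return a.getIndex() < b.getIndex();
}
```

Because the slot occupies the low bits, comparing two encoded values orders positions first by instruction and then by slot within the instruction.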
  • 18. Numbering of Machine Instruction
      0B    BB#0: derived from LLVM BB %entry
      16B     %vreg1<def> = t2MOVi 0, pred:14, pred:%noreg, opt:%noreg; rGPR:%vreg1
      32B     t2STRi12 %vreg1, <fi#0>, 0, pred:14, pred:%noreg; mem:ST4[%retval] rGPR:%vreg1
      48B     %vreg2<def> = t2MOVi 1, pred:14, pred:%noreg, opt:%noreg; rGPR:%vreg2
      64B     t2STRi12 %vreg2, <fi#1>, 0, pred:14, pred:%noreg; mem:ST4[%i] rGPR:%vreg2
              Successors according to CFG: BB#1
      for (MachineBasicBlock::iterator miItr = mbb->begin(), miEnd = mbb->end();
           miItr != miEnd; ++miItr) {
        MachineInstr *mi = miItr;
        if (mi->isDebugValue())
          continue;
        // Insert a store index for the instr.
        indexList.push_back(createEntry(mi, index += SlotIndex::InstrDist));
        // Save this base index in the maps.
        mi2iMap.insert(std::make_pair(mi, SlotIndex(&indexList.back(), SlotIndex::Slot_Block)));
      }
  • 19. VNInfo • hold information about a machine level value • (id, def) • def: SlotIndex of the defining instruction
  • 20. Live Interval • Segment • start, end, valno • LiveRange • an ordered list of Segment • LiveInterval • LiveRange with register and weight (spill cost)
      ********** INTERVALS **********
      %vreg0 [208r,320r:0)[416B,432r:0) 0@208r
      %vreg1 [16r,32r:0) 0@16r
      %vreg2 [48r,480B:0) 0@48r
      %vreg3 [96r,112r:0) 0@96r
      %vreg4 [496r,512r:0) 0@496r
      %vreg6 [224r,240r:0) 0@224r
      %vreg7 [432r,448r:0) 0@432r
      %vreg8 [304r,320r:0) 0@304r
      %vreg9 [320r,336r:0) 0@320r
      %vreg10 [352r,368r:0) 0@352r
      %vreg11 [368r,384r:0) 0@368r
      Labels: Segment • LiveRange • LiveInterval • VNInfo
  • 21. Example
      192B  BB#3: derived from LLVM BB %for.cond.1
      208B    %vreg0<def> = t2LDRi12 <fi#1>, 0
      224B    %vreg6<def> = t2LDRi12 <fi#2>, 0
      240B    t2CMPri %vreg6, 9
      256B    t2Bcc <BB#5>
      272B    t2B <BB#4>
      416B  BB#5: derived from LLVM BB %for.inc.4
      432B    %vreg7<def> = t2ADDri %vreg0, 1
      448B    t2STRi12 %vreg7, <fi#1>, 0
      ********** INTERVALS **********
      %vreg0 [208r,320r:0)[416B,432r:0) 0@208r
      %vreg1 [16r,32r:0) 0@16r
      %vreg2 [48r,480B:0) 0@48r
      %vreg3 [96r,112r:0) 0@96r
      %vreg4 [496r,512r:0) 0@496r
      %vreg6 [224r,240r:0) 0@224r
      %vreg7 [432r,448r:0) 0@432r
      %vreg8 [304r,320r:0) 0@304r
      %vreg9 [320r,336r:0) 0@320r
      %vreg10 [352r,368r:0) 0@352r
      %vreg11 [368r,384r:0) 0@368r
      288B  BB#4: derived from LLVM BB %for.body.3
      304B    %vreg8<def> = t2LDRi12 <fi#2>, 0
      320B    %vreg9<def> = t2MUL %vreg0, %vreg8
      336B    t2STRi12 %vreg9, <fi#3>, 0
      352B    %vreg10<def> = t2LDRi12 <fi#2>, 0
      368B    %vreg11<def> = t2ADDri %vreg10, 1
      384B    t2STRi12 %vreg11, <fi#2>, 0
      400B    t2B <BB#3>
      Marked positions for %vreg0: 208r • 320r • 416B • 432r
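The Segment / LiveRange layering on slide 20 can be sketched as plain structs, together with an overlap test over two sorted segment lists. The types below are simplified stand-ins, not LLVM's classes: positions are plain integers encoded as instruction index plus slot offset (assuming `r` adds 2 and `B` adds 0, so 208r becomes 210), and segments are treated as half-open `[start, end)` as the dump notation suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for the slide's data structures.
struct Segment {
  unsigned start, end; // half-open [start, end)
  unsigned valno;      // id of the value (VNInfo) live in this segment
};
using LiveRange = std::vector<Segment>; // sorted by start, non-overlapping

struct LiveInterval {
  unsigned reg;
  float weight; // spill cost
  LiveRange range;
};

// Two-pointer sweep: advance whichever segment ends first.
bool overlaps(const LiveRange &a, const LiveRange &b) {
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    assert(a[i].start < a[i].end && b[j].start < b[j].end);
    if (a[i].start < b[j].end && b[j].start < a[i].end)
      return true;
    if (a[i].end <= b[j].end)
      ++i;
    else
      ++j;
  }
  return false;
}
```

With the intervals from the dump, `%vreg0` overlaps `%vreg8` (304r to 320r lies inside 208r to 320r), but not `%vreg9`, which only starts at 320r where the first segment of `%vreg0` ends.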
  • 22. LiveRegMatrix AH AL BH BL XMM31 V3 V3 V5 V0 V4 V1 V2 V6 RegUnit LiveIntervalUnion EAX => AH, AL AX => AH, AL AH => AH AL => AL
  • 23. Check Interference
      unsigned LiveIntervalUnion::Query::
      collectInterferingVRegs(unsigned MaxInterferingRegs) {
        …
        // Check for overlapping interference.
        while (VirtRegI->start < LiveUnionI.stop() &&
               VirtRegI->end > LiveUnionI.start()) {
          // This is an overlap, record the interfering register.
          LiveInterval *VReg = LiveUnionI.value();
          if (VReg != RecentReg && !isSeenInterference(VReg)) {
            RecentReg = VReg;
            InterferingVRegs.push_back(VReg);
            if (InterferingVRegs.size() >= MaxInterferingRegs)
              return InterferingVRegs.size();
          }
          // This LiveUnion segment is no longer interesting.
          if (!(++LiveUnionI).valid()) {
            SeenAllInterferences = true;
            return InterferingVRegs.size();
          }
        }
        …
      }
      Diagram: four relative positions of a LiveIntervalUnion segment (start(), stop()) and a VirtReg segment (start, end)
  • 24. Check Interference AH AL BH BL XMM31 V3 V3 V5 V0 V4 V1 V2 V6 V7
      // Check the matrix for virtual register interference.
      for (MCRegUnitIterator Units(PhysReg, TRI); Units.isValid(); ++Units)
        if (query(VirtReg, *Units).checkInterference())
          return IK_VirtReg;
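Slides 22 to 24 combine two ideas: every physical register maps to one or more register units, and each unit keeps the union of the live segments assigned to it. A rough sketch under those assumptions follows; `SimpleRegMatrix`, `makeSlideMatrix`, and the string-keyed maps are invented for this illustration, and the EAX/AX/AH/AL unit mapping is the simplified one drawn on slide 22.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Seg {
  unsigned start, end; // half-open [start, end)
};

// Hypothetical miniature of LiveRegMatrix: one segment list per register unit.
struct SimpleRegMatrix {
  std::map<std::string, std::vector<std::string>> unitsOf; // physreg -> units
  std::map<std::string, std::vector<Seg>> unionOf;         // unit -> assigned segments

  // A virtual register interferes with PhysReg if it overlaps any segment
  // already recorded in any of PhysReg's units.
  bool interferes(const std::vector<Seg> &vreg, const std::string &physReg) const {
    for (const std::string &unit : unitsOf.at(physReg)) {
      auto it = unionOf.find(unit);
      if (it == unionOf.end())
        continue;
      for (const Seg &a : vreg)
        for (const Seg &b : it->second)
          if (a.start < b.end && b.start < a.end)
            return true;
    }
    return false;
  }

  // Assigning records the segments in every unit of PhysReg.
  void assign(const std::vector<Seg> &vreg, const std::string &physReg) {
    assert(!interferes(vreg, physReg));
    for (const std::string &unit : unitsOf.at(physReg))
      for (const Seg &s : vreg)
        unionOf[unit].push_back(s);
  }
};

// Unit mapping as drawn on the slide (a simplification of real x86).
SimpleRegMatrix makeSlideMatrix() {
  SimpleRegMatrix M;
  M.unitsOf["EAX"] = {"AH", "AL"};
  M.unitsOf["AX"] = {"AH", "AL"};
  M.unitsOf["AH"] = {"AH"};
  M.unitsOf["AL"] = {"AL"};
  return M;
}
```

Once something is assigned to AL, a later query against EAX or AX sees the conflict through the shared AL unit, while AH stays free. This is how aliases are modelled without an explicit interference graph.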
  • 26. Use Split to Improve RA • Live Range Splitting • Insert copy/re-materialize to split up live ranges • hopefully reduces need for spilling • Also control spill code placement
  • 27. • Example Q0 D0 D1 Q1 D2 D3 V1 V2 V3 V4 V5
  • 29. • No physical register for V1 Q0 D0 D1 Q1 D2 D3 V1 V2 V3 V4 V5
  • 30. • Evict V2 Q0 D0 D1 Q1 D2 D3 V1 V2 V3V4 V5 stack
  • 31. • Split V2 Q0 D0 D1 Q1 D2 D3 V1 V2b V3V4 V5 V2a V2c
  • 32. • Split V2 Q0 D0 D1 Q1 D2 D3 V1 V2b V3V4 V5 V2a V2c stack
  • 33. Greedy RA Stages • RS_New: created • RS_Assign: enqueue • RS_Split: need to split • RS_Split2 • used for split products that may not be making progress • RS_Spill: need to spill • RS_Done: assigned a physical register or created by spill
  • 34. RS_Split2 • The live intervals created by split will enqueue to process again. • There is a risk of creating infinite loops. … = vreg1 … … = vreg1 … … = vreg1 … vreg2 = COPY vreg1 … = vreg2 … vreg3 = COPY vreg1 … = vreg3 … … = vreg3 … RS_New RS_Split2
  • 35. Greedy Register Allocation try to assign physical register (hint > zero cost reg > low cost reg) try to evict to find better register enter RS_Split stage try last chance recoloring split spill pick a physical register and evict all interference found register stage >= RS_Done stage < RS_Split selectOrSplit(d+1) enter RS_Done stage selectOrSplit(d)
  • 36. Last Chance Recoloring • Try to assign a color to VirtReg by recoloring its interferences. • The recoloring process may recursively use the last chance recoloring. Therefore, when a virtual register has been assigned a color by this mechanism, it is marked as Fixed. vA can use {R1, R2 } vB can use { R2, R3} vC can use {R1 } vA => R1 vB => R2 vC => fails vA => R2 vB => R3 vC => R1 (fixed)
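The vA/vB/vC example on slide 36 can be replayed with a small backtracking sketch. It is deliberately simplified and is not LLVM's implementation: every virtual register is assumed to interfere with every other one, and `Candidates`, `Assignment`, and `tryAssign` are names made up for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// vreg -> physical registers it may use (its register class, in effect).
using Candidates = std::map<std::string, std::vector<std::string>>;

struct Assignment {
  std::map<std::string, std::string> owner; // physreg -> vreg holding it
  std::set<std::string> fixed;              // vregs placed by recoloring
};

// Assumption: all vregs interfere with each other, so a register is usable
// only if nobody owns it. `depth` bounds the recursion.
bool tryAssign(const Candidates &allowed, Assignment &A, const std::string &v,
               unsigned depth) {
  const std::vector<std::string> &regs = allowed.at(v);
  // Normal case: take the first free candidate.
  for (const std::string &r : regs)
    if (!A.owner.count(r)) {
      A.owner[r] = v;
      return true;
    }
  if (depth == 0)
    return false;
  // Last chance: take a register from a non-fixed owner, mark v as fixed,
  // and try to recolor the evicted owner recursively.
  for (const std::string &r : regs) {
    const std::string victim = A.owner.at(r);
    if (A.fixed.count(victim))
      continue;
    Assignment saved = A;
    A.owner[r] = v;
    A.fixed.insert(v);
    if (tryAssign(allowed, A, victim, depth - 1))
      return true;
    A = saved; // roll back and try the next candidate
  }
  return false;
}
```

Assigning vA, vB, vC in that order first gives vA => R1 and vB => R2; vC then finds R1 taken, claims it as fixed, and the recursion moves vA to R2 and vB to R3, which is the final state shown on the slide.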
  • 37. How to Split? is stage beyond RS_Spill? is in one BB? tryLocalSplit tryInstructionSplit No Yes tryRegionSplit is stage less than RS_Split2? No spill Yes success? No success? spill No tryBlockSplit Yes No success? No success? spill No done Yes Yes done Yes Yes
  • 38. BlockInfo (LiveIn) (LiveOut) FirstInstr: First instruction accessing current reg. LastInstr: Last instruction accessing current reg. Live-through blocks without any uses don’t get BlockInfo entries.
  • 39. tryLocalSplit • Try to split virtual register interval into smaller intervals inside its only basic block. • calculate gap weights • adjust the split region
  • 41. Calculate Gap Weights LI.weight VirtReg LI If there is a RegUnit occupied by VirtReg:0 0
  • 42. Calculate Gap Weights LI.weight Fixed RegUnit If there is a fixed RegUnit:0 0 huge_valf
  • 43. Adjust Split Region SplitAfter = 1 SplitBefore = 0 normalise spill weight > max gap BestBefore = SplitBefore BestAfter = SplitAfter SplitAfter++ SplitBefore++ YesNo normalise spill weight = spill cost / distance = (#gap * block_freq) / distance(SplitBefore, SplitAfter)
  • 44. Adjust Split Region BestAfter BestBefore normalise spill weight > max gap BestBefore = SplitBefore BestAfter = SplitAfter SplitAfter++ SplitBefore++ YesNo normalise spill weight = spill cost / distance = (#gap * block_freq) / distance(SplitBefore, SplitAfter) RS_New (or RS_Split2) RS_New Find the most critical range.
  • 45. tryInstructionSplit • Split a live range around individual instructions. • Every “use” instruction has its own live interval.
  • 46. tryBlockSplit • Split a global live range around every block with uses. FirstInstr LastInstr
  • 47. tryRegionSplit • For every physical register • Prepare interference cache • Construct Hopfield Network • Construct block constraints • Update Hopfield Network biases and values according to block constraints • Add links in Hopfield Network and iterate • Get the best candidate (minimize split cost + spill cost) • Do region split
  • 48. Hopfield Network • A form of recurrent artificial neural network popularised by John Hopfield in 1982. • Guaranteed to converge to a local minimum.
  • 49. Hopfield Network • Node: edge bundle • Link: transparent basic blocks have the variable live through. • Energy function (the cost of spilling) • Weight: block frequency • Bias: according to block constraints
  • 50. Block Constraints No Interference PrefReg Intf.first() MustSpill PrefSpill FirstInstr LastInstr PrefReg FirstInstr LastInstr FirstInstr LastInstr FirstInstr LastInstr PrefReg MustSpill FirstInstr LastInstr PrefReg FirstInstr LastInstr FirstInstr LastInstr FirstInstr LastInstr PrefSpill Last Split Point
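To make the node / link / bias vocabulary on slide 49 concrete, here is a toy Hopfield-style iteration. It is a sketch only: `Node`, `Link`, `settle`, the three-valued state, and the convention that a positive value means "prefer a register" and a negative one "prefer to spill" are assumptions for this example, not LLVM's spill placement code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Link {
  std::size_t to; // index of the neighbouring node (edge bundle)
  double weight;  // e.g. frequency of the block connecting the two bundles
};

struct Node {
  double bias; // from block constraints: > 0 prefers register, < 0 prefers spill
  int value;   // current state: -1, 0 or +1
  std::vector<Link> links;
};

// Repeatedly update every node from its bias and its neighbours' values
// until nothing changes. Returns true if the network settled in time.
bool settle(std::vector<Node> &nodes, unsigned maxIterations) {
  for (unsigned iter = 0; iter < maxIterations; ++iter) {
    bool changed = false;
    for (Node &n : nodes) {
      double sum = n.bias;
      for (const Link &l : n.links) {
        assert(l.to < nodes.size());
        sum += l.weight * nodes[l.to].value;
      }
      int next = sum > 0 ? 1 : (sum < 0 ? -1 : 0);
      if (next != n.value) {
        n.value = next;
        changed = true;
      }
    }
    if (!changed)
      return true;
  }
  return false;
}
```

In a three-node chain where the first bundle strongly prefers a register, the middle one is neutral, and the last mildly prefers to spill, the middle node is pulled towards the register side by the heavier link, while the last node keeps its spill preference.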
  • 51. Edge Bundle BB #0 BB #1 BB #3 BB #2 BB #4 BB #5 BB #6
      // Join the outgoing bundle with the ingoing bundles of all successors.
      for (MachineBasicBlock::const_succ_iterator SI = MBB.succ_begin(),
           SE = MBB.succ_end(); SI != SE; ++SI)
        EC.join(OutE, 2 * (*SI)->getNumber());
      EC:
      (BB#0, in)  Bundle #0: 0 0 0
      (BB#0, out) Bundle #1: 1 1 1
      (BB#1, in)  Bundle #2: 2 1 1
      (BB#1, out) Bundle #3: 3 3 2
      (BB#2, in)  Bundle #4: 4 3 2
      (BB#2, out) Bundle #5: 5 5 3
      (BB#3, in)  Bundle #6: 6 5 3
      (BB#3, out) Bundle #7: 7 7 4
      (BB#4, in)  Bundle #8: 8 7 4
      (BB#4, out) Bundle #9: 9 5 3
      (BB#5, in)  Bundle #10: 10 7 4
      (BB#5, out) Bundle #11: 11 1 1
      (BB#6, in)  Bundle #12: 12 3 2
      (BB#6, out) Bundle #13: 13 13 5
      void join(unsigned a, unsigned b) {
        unsigned eca = EC[a];
        unsigned ecb = EC[b];
        while (eca != ecb)
          if (eca < ecb)
            EC[b] = eca, b = ecb, ecb = EC[b];
          else
            EC[a] = ecb, a = eca, eca = EC[a];
      }
  • 52. Edge Bundle BB #0 BB #1 BB #3 BB #2 BB #4 BB #5 BB #6
      Blocks:
      Bundle #0: BB#0
      Bundle #1: BB#0, BB#1, BB#5
      Bundle #2: BB#1, BB#2, BB#6
      Bundle #3: BB#2, BB#3, BB#4
      Bundle #4: BB#3, BB#4, BB#5
      Bundle #5: BB#6
      Bundle #6 to Bundle #13: (empty)
      EC:
      (BB#0, in)  Bundle #0: 0 0 0
      (BB#0, out) Bundle #1: 1 1 1
      (BB#1, in)  Bundle #2: 2 1 1
      (BB#1, out) Bundle #3: 3 3 2
      (BB#2, in)  Bundle #4: 4 3 2
      (BB#2, out) Bundle #5: 5 5 3
      (BB#3, in)  Bundle #6: 6 5 3
      (BB#3, out) Bundle #7: 7 7 4
      (BB#4, in)  Bundle #8: 8 7 4
      (BB#4, out) Bundle #9: 9 5 3
      (BB#5, in)  Bundle #10: 10 7 4
      (BB#5, out) Bundle #11: 11 1 1
      (BB#6, in)  Bundle #12: 12 3 2
      (BB#6, out) Bundle #13: 13 13 5
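The edge-bundle tables can be reproduced with a small stand-alone version of the `join` routine. The CFG drawing is not part of this transcript, so the edge list used in the usage note (0→1, 1→2, 1→6, 2→3, 3→4, 3→5, 4→3, 5→1) is an assumption inferred from the bundle membership table. `edgeBundles` and `compress` are helper names chosen for this sketch; the ingoing side of block N is element 2N and the outgoing side is 2N+1, as in the slide's loop.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Same algorithm as the slide's join(), operating on an explicit vector.
void join(std::vector<unsigned> &EC, unsigned a, unsigned b) {
  assert(a < EC.size() && b < EC.size());
  unsigned eca = EC[a];
  unsigned ecb = EC[b];
  while (eca != ecb) {
    if (eca < ecb) {
      EC[b] = eca; b = ecb; ecb = EC[b];
    } else {
      EC[a] = ecb; a = eca; eca = EC[a];
    }
  }
}

// Join the outgoing side of every block with the ingoing side of each successor.
std::vector<unsigned>
edgeBundles(unsigned numBlocks,
            const std::vector<std::pair<unsigned, unsigned>> &cfgEdges) {
  std::vector<unsigned> EC(2 * numBlocks);
  for (unsigned i = 0; i < EC.size(); ++i)
    EC[i] = i; // every side starts in its own class
  for (const auto &e : cfgEdges)
    join(EC, 2 * e.first + 1, 2 * e.second);
  return EC;
}

// Renumber class leaders densely: 0, 1, 2, ...
std::vector<unsigned> compress(std::vector<unsigned> EC) {
  unsigned next = 0;
  for (std::size_t i = 0; i < EC.size(); ++i)
    EC[i] = (EC[i] == i) ? next++ : EC[EC[i]];
  return EC;
}
```

Running `edgeBundles(7, edges)` on the inferred edge list yields the middle column of the EC table (0 1 1 3 3 5 5 7 7 5 7 1 3 13), and `compress` then yields the last column (0 1 1 2 2 3 3 4 4 3 4 1 2 5), that is, the six bundles listed under "Blocks".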
  • 53. Edge Bundle BB #0 BB #1 BB #3 BB #2 BB #4 BB #5 BB #6 Blocks: Bundle #0: BB#0 Bundle #1: BB#0, BB#1, BB#5 Bundle #2: BB#1, BB#2, BB#6 Bundle #3: BB#2, BB#3, BB#4 Bundle #4: BB#3, BB#4, BB#5 Bundle #5: BB#6 Bundle #6: Bundle #7: Bundle #8: Bundle #9: Bundle #10: Bundle #11: Bundle #12: Bundle #13: EC: (BB#0, in) Bundle #0: 0 0 0 (BB#0, out) Bundle #1: 1 1 1 (BB#1, in) Bundle #2: 2 1 1 (BB#1, out) Bundle #3: 3 3 2 (BB#2, in) Bundle #4: 4 3 2 (BB#2, out) Bundle #5: 5 5 3 (BB#3, in) Bundle #6: 6 5 3 (BB#3, out) Bundle #7: 7 7 4 (BB#4, in) Bundle #8: 8 7 4 (BB#4, out) Bundle #9: 9 5 3 (BB#5, in) Bundle #10: 10 7 4 (BB#5, out) Bundle #11: 11 1 1 (BB#6, in) Bundle #12: 12 3 2 (BB#6, out) Bundle #13: 13 13 5
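    The EC columns on the Edge Bundle slide can be reproduced with a small union-find. The following is a minimal sketch, not LLVM's EdgeBundles or IntEqClasses code: `computeEdgeBundles` is a hypothetical helper, and the CFG edge list used in the usage note is an assumption inferred from the bundle membership shown on the slide. Entry 2*b stands for the "in" side of block b and entry 2*b+1 for its "out" side; every CFG edge joins out(pred) with in(succ), and the resulting classes are then numbered densely.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not an LLVM API): returns, for each of the 2*numBlocks
// block borders, the dense bundle number it belongs to.
std::vector<unsigned> computeEdgeBundles(
    unsigned numBlocks,
    const std::vector<std::pair<unsigned, unsigned>> &edges) {
  // Every border starts in its own class: leader[i] == i.
  std::vector<unsigned> leader(2 * numBlocks);
  std::iota(leader.begin(), leader.end(), 0u);

  auto find = [&leader](unsigned x) {
    while (leader[x] != x)
      x = leader[x];
    return x;
  };

  // Join out(pred) with in(succ); the smaller index stays the leader.
  for (const auto &e : edges) {
    unsigned a = find(2 * e.first + 1);
    unsigned b = find(2 * e.second);
    if (a < b)
      leader[b] = a;
    else
      leader[a] = b;
  }

  // Compress: number the classes in increasing order of their leaders.
  // A leader is never larger than its members, so bundle[l] is already set.
  std::vector<unsigned> bundle(leader.size());
  unsigned next = 0;
  for (std::size_t i = 0; i < leader.size(); ++i) {
    unsigned l = find(static_cast<unsigned>(i));
    bundle[i] = (l == i) ? next++ : bundle[l];
  }
  return bundle;
}
```

    With seven blocks and the assumed edges BB#0→BB#1, BB#1→BB#2, BB#1→BB#6, BB#2→BB#3, BB#3→BB#4, BB#3→BB#5, BB#4→BB#3, and BB#5→BB#1, the returned vector matches the last EC column on the slide.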
  • 59. SpillPlacement::addConstraints
    • Update BiasN and BiasP according to the BorderConstraint.
    [Figure: BB #n (freq) containing "… = Y op …", with a PrefReg border and a PrefSpill border; Bundle ib: BiasP += freq, Bundle ob: BiasN += freq]
      void addBias(BlockFrequency freq, BorderConstraint direction) {
        switch (direction) {
        default:
          break;
        case PrefReg:
          BiasP += freq;
          break;
        case PrefSpill:
          BiasN += freq;
          break;
        case MustSpill:
          BiasN = BlockFrequency::getMaxFrequency(); // (uint64_t)-1ULL
          break;
        }
      }
  • 60. Hopeld Network Node • Node.update(nodes, Threshold) Bundle X BiasN BiasP Value Bundle A Value = -1 Bundle B Value = 1 Bundle C Value = 1 Bundle D Value = 1 Links SumN = BiasN + freqA SunP = BiasP + freqB + freqC + freqD (freqA, A) (freqB, B) (freqC, C) (freqD, D) if (SumN >= SumP + Threshold) Value = -1; else if (SumP >= SumN + Threshold) Value = 1; else Value = 0;
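  • 61. Grow Region
    • Live through blocks in positive bundles.
    [Figure: blocks with no interference are used as links between bundles; blocks with interference go to SpillPlacement::addConstraints, where Intf.first() marks one border MustSpill or PrefSpill and Intf.last() marks the other border MustSpill or PrefSpill]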
  • 62. SpillPlacement::addLinks
    [Figure: BB #n (freq) between Bundle ib and Bundle ob; Bundle ib gets the link (freq, ob) and Bundle ob gets the link (freq, ib)]
  • 63. SpillPlacement::iterate
      // Run at most 10 rounds over the linked nodes.
      for (unsigned iteration = 0; iteration != 10; ++iteration) {
        // Scan backwards; after the first round, skip the last node.
        bool Changed = false;
        for (SmallVectorImpl<unsigned>::const_reverse_iterator I =
               iteration == 0 ? Linked.rbegin() : std::next(Linked.rbegin()),
               E = Linked.rend(); I != E; ++I) {
          unsigned n = *I;
          if (nodes[n].update(nodes, Threshold)) {
            Changed = true;
            if (nodes[n].preferReg())
              RecentPositive.push_back(n);
          }
        }
        // Stop when nothing changed or when a node turned positive.
        if (!Changed || !RecentPositive.empty())
          return;

        // Scan forwards, skipping the first node.
        Changed = false;
        for (SmallVectorImpl<unsigned>::const_iterator I =
               std::next(Linked.begin()), E = Linked.end(); I != E; ++I) {
          unsigned n = *I;
          if (nodes[n].update(nodes, Threshold)) {
            Changed = true;
            if (nodes[n].preferReg())
              RecentPositive.push_back(n);
          }
        }
        if (!Changed || !RecentPositive.empty())
          return;
      }
  • 64. Spill Cost
    [Figure: border constraints for a block, chosen by where the interference falls relative to FirstInstr, LastInstr, and the Last Split Point. No interference: PrefReg. Interference starting at Intf.first(): MustSpill, PrefSpill, or PrefReg depending on position. Six of the illustrated cases are marked ++Ins.]
    Cost = Block_Frequency * Ins
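    One way to read the Spill Cost diagram is as a pair of three-way tests, one per block border, each of which may bump the insertion counter Ins. The sketch below is an assumption-laden illustration, not LLVM code: slot positions are plain integers, `BlockInfo`, `Interference`, and `spillCost` are hypothetical names, and the exact comparison thresholds are my reading of the figure labels (Intf.first(), FirstInstr, LastInstr, Last Split Point).

```cpp
#include <cassert>
#include <cstdint>

enum BorderConstraint { DontCare, PrefReg, PrefSpill, MustSpill };

// Hypothetical per-block summary; positions are plain integer slots.
struct BlockInfo {
  int Start, FirstInstr, LastInstr, LastSplitPoint;
  bool LiveIn, LiveOut;
  uint64_t Freq;
};

// Hypothetical interference range of the candidate register in the block.
struct Interference {
  bool Any;
  int First, Last;
};

struct BlockCost {
  BorderConstraint Entry, Exit;
  uint64_t Cost;
};

BlockCost spillCost(const BlockInfo &BI, const Interference &Intf) {
  // Without interference, a live border simply prefers a register.
  BlockCost BC{BI.LiveIn ? PrefReg : DontCare,
               BI.LiveOut ? PrefReg : DontCare, 0};
  unsigned Ins = 0;
  if (Intf.Any) {
    if (BI.LiveIn) {
      if (Intf.First <= BI.Start) {
        BC.Entry = MustSpill; // interference already at block entry
        ++Ins;
      } else if (Intf.First < BI.FirstInstr) {
        BC.Entry = PrefSpill; // interference before the first use
        ++Ins;
      } else if (Intf.First < BI.LastInstr) {
        ++Ins; // interference overlaps the uses
      }
    }
    if (BI.LiveOut) {
      if (Intf.Last >= BI.LastSplitPoint) {
        BC.Exit = MustSpill; // interference reaches the last split point
        ++Ins;
      } else if (Intf.Last > BI.LastInstr) {
        BC.Exit = PrefSpill; // interference after the last use
        ++Ins;
      } else if (Intf.Last > BI.FirstInstr) {
        ++Ins; // interference overlaps the uses
      }
    }
  }
  // Cost = Block_Frequency * Ins, as on the slide.
  BC.Cost = BI.Freq * Ins;
  return BC;
}
```

    The six `++Ins` statements correspond to the six ++Ins marks on the slide, under the assumptions stated above.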
  • 65. Split Cost
    Use Block
    [Figure: BB #n (freq) containing "… = Y op …"; the Value of Bundle ib gives RegIn and the Value of Bundle ob gives RegOut, which are compared against BC.Entry and BC.Exit]
      if (BI.LiveIn)
        Ins += RegIn != (BC.Entry == SpillPlacement::PrefReg);
      if (BI.LiveOut)
        Ins += RegOut != (BC.Exit == SpillPlacement::PrefReg);
      while (Ins--)
        GlobalCost += SpillPlacer->getBlockFrequency(BC.Number);
    Live Through
    [Figure: BB #n (freq) with Bundle ib Value giving RegIn and Bundle ob Value giving RegOut]
      RegIn  RegOut  Cost
      0      0       0
      0      1       freq
      1      0       freq
      1      1       2 x freq (with interference)
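    The two cost rules on the Split Cost slide can be written as small pure functions. This is a sketch under assumptions: the function names and flat boolean parameters are hypothetical, frequencies are plain `uint64_t` values, and the "1 1" row is read as costing 2 x freq only when the block has interference (and 0 otherwise).

```cpp
#include <cassert>
#include <cstdint>

// Use block: one insertion per border where the bundle's decision
// (RegIn / RegOut) disagrees with the block's preferred constraint.
uint64_t useBlockSplitCost(bool LiveIn, bool LiveOut, bool RegIn, bool RegOut,
                           bool EntryPrefReg, bool ExitPrefReg,
                           uint64_t Freq) {
  unsigned Ins = 0;
  if (LiveIn)
    Ins += RegIn != EntryPrefReg;
  if (LiveOut)
    Ins += RegOut != ExitPrefReg;
  return Freq * Ins;
}

// Live-through block: cost table from the slide.
uint64_t liveThroughSplitCost(bool RegIn, bool RegOut, bool HasInterference,
                              uint64_t Freq) {
  if (!RegIn && !RegOut)
    return 0; // stays on the stack through the block
  if (RegIn && RegOut)
    return HasInterference ? 2 * Freq : 0; // spill and reload around it
  return Freq; // exactly one transition between register and stack
}
```

    For example, a use block that is live-in and live-out with RegIn = 1 and RegOut = 0, whose borders both prefer a register, pays freq once for the mismatched exit.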
  • 66. The Best Candidate
    • For each physical register, calculate the region split cost.
    • Cost = block constraints cost (spill cost) + global split cost
    • The best candidate is the one with the lowest cost.
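    The selection rule above amounts to a minimum search over per-register costs. A minimal sketch, assuming the two cost components have already been computed for each physical register; `Candidate` and `pickBestCandidate` are hypothetical names, not LLVM's:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical record: one entry per physical register that was evaluated.
struct Candidate {
  unsigned PhysReg;
  uint64_t SpillCost;       // block constraints cost
  uint64_t GlobalSplitCost; // global split cost
};

// Returns the index of the cheapest candidate, or -1 if there is none.
// Ties go to the candidate that appears first.
int pickBestCandidate(const std::vector<Candidate> &Cands) {
  int Best = -1;
  uint64_t BestCost = std::numeric_limits<uint64_t>::max();
  for (std::size_t i = 0; i < Cands.size(); ++i) {
    uint64_t Cost = Cands[i].SpillCost + Cands[i].GlobalSplitCost;
    if (Cost < BestCost) {
      BestCost = Cost;
      Best = static_cast<int>(i);
    }
  }
  return Best;
}
```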
  • 68. splitLiveThroughBlock
    • Bundle ib Value == 1, Bundle ob Value != 1: live through, live-out on stack. [Figure labels: Start, first non-PHI, New Int]
    • Bundle ib Value != 1, Bundle ob Value == 1: live through, live-in on stack. [Figure labels: last split point, End, New Int]
    • Bundle ib Value == 1, Bundle ob Value == 1: live through, no interference. [Figure labels: Start, End, New Int]
  • 69. splitLiveThroughBlock
    • Bundle ib Value == 1, Bundle ob Value == 1: live through, non-overlapping interference. [Figure labels: New Int, Interference.first(), Interference.last(), New Int]
    • Bundle ib Value == 1, Bundle ob Value == 1: live through, overlapping interference. [Figure labels: New Int, Interference.first(), Interference.last(), New Int]
  • 70. splitRegInBlock
    • Bundle ib Value == 1, no live-out: interference after kill. [Figure labels: Start, New Int, Interference.first()]
    • Bundle ib Value == 1, Bundle ob Value != 1: live-out on stack, interference after last use (shown in two panels). [Figure labels: Start, New Int, LastInstr, last split point, Interference.first()]
  • 71. splitRegInBlock
    • Bundle ib Value == 1, Bundle ob Value != 1: live-out on stack, interference overlapping uses (shown in two panels). [Figure labels: Start, New Int (twice per panel), Interference.first(), LastInstr, last split point]
  • 72. splitRegOutBlock
    • Bundle ob Value == 1, no live-in: interference before def. [Figure labels: Interference.last(), FirstInstr, New Int, End]
    • Bundle ib Value != 1, Bundle ob Value == 1: live through, interference before def. [Figure labels: Interference.last(), FirstInstr, New Int, End]
    • Bundle ib Value != 1, Bundle ob Value == 1: live through, interference overlapping uses. [Figure labels: Interference.last(), FirstInstr, last split point, New Int, End]