Proposed and implemented a memory bandwidth control mechanism in the Linux kernel for multi-core systems.
Described a timing analysis method based on the proposed mechanism.
Memory access control in multiprocessor for real-time system with mixed criticality
1. Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality
Heechul Yun+, Gang Yao+, Rodolfo Pellizzoni*,
Marco Caccamo+, Lui Sha+
University of Illinois at Urbana-Champaign+
University of Waterloo*
2. Multi-core Systems
• Mainstream in smartphones
– Dual/quad-core smartphones
– More performance with less power
[Figure: Tegra 3 (4 cores)]
• Traditional embedded/real-time domains
– Avionics companies are investigating [Nowotsch12]
• 8-core P4080 processor from Freescale
[Nowotsch12] “Leveraging Multi-Core Computing Architectures in Avionics”, EDCC, 2012
3. Challenge
[Figure: Core1 to Core4 share a system bus and DRAM]
• Timing isolation is hard to achieve
4. Challenge
[Figure: App1 to App4 run on Core1 to Core4, which share a system bus and DRAM]
• Cores compete for shared HW resources
– System bus, DRAM controller, shared cache, …
5. Effect of Memory Contention
[Figure: a benchmark application and membomb run on separate cores that share the system bus and memory]
[Figure: bar chart of run-time increase (%), on a 0% to 70% axis, for 429.mcf, 471.omnetpp, 473.astar, 433.milc, and 470.lbm; annotation: "60% WCET increase is unacceptable"]
• Run-time increase due to contention
– Five SPEC2006 benchmarks
– Compared to solo execution
6. Goal
• Mechanism to control memory contention
– Software-based controller for COTS multi-core processors
• Response time analysis that accounts for the memory contention effect
– Based on the proposed software-based controller
7. Outline
• Motivation
• Memory Access Control System
• Response Time Analysis
• Evaluation
• Conclusion
8. System Architecture
[Figure: Core1 to Core4 sit behind per-core memory bandwidth controllers (part of the OS) with budgets of 20%, 30%, 10%, and 40%; all cores share the system bus and DRAM]
• Assign memory bandwidth to each core using a per-core memory bandwidth controller
9. Memory Bandwidth Controller
• Periodic server for the memory resource
• Periodically monitors the memory accesses of the core and enforces the user-specified bandwidth using the OS scheduler
– Monitoring can be done efficiently with the per-core hardware performance counter
– Bandwidth = # memory accesses × avg. access time
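The bandwidth formula above can be turned into a share of the regulation period. The following is a minimal sketch, assuming that "bandwidth" means the memory access time consumed per period; the function name `bandwidth_share` and the division by the period are illustrative assumptions, not part of the slides.

```python
def bandwidth_share(num_accesses, avg_access_time, period):
    """Fraction of one regulation period spent on memory accesses.

    Follows the slide's formula (# memory accesses x avg. access time)
    and normalizes by the period; the normalization is an assumption
    made here so the result reads as a percentage.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return num_accesses * avg_access_time / period


# Numbers from the regulation example in this deck:
# 2 accesses per period, 1 time unit per access, period of 10 time units.
share = bandwidth_share(num_accesses=2, avg_access_time=1, period=10)
print(f"{share:.0%}")
```

With these numbers the core is allowed to keep the memory system busy for one fifth of each period.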
10. Memory Bandwidth Controller
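• Period: 10 time units, Budget: 2 memory accesses
– Each memory access takes 1 time unit
[Figure: timeline from 0 to 20 showing the task's computation and memory fetches together with the remaining budget (2, 1, 0); tasks are dequeued when the budget reaches 0 and enqueued again when it is replenished]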
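The regulation behaviour in this example can be sketched as a small discrete-time simulation. This is an illustrative model only, not the kernel implementation: the function name `simulate_regulator` and the demand trace are hypothetical, and the model assumes, as on the slide, that each memory access takes one time unit and that the budget is replenished at every period boundary.

```python
def simulate_regulator(demand, period=10, budget=2):
    """Simulate a per-core memory budget regulator in discrete time.

    demand: list of booleans, one per time unit of work the task needs;
            True is a memory fetch, False is computation.
    Returns a timeline string: 'M' = memory fetch, 'C' = computation,
    '.' = core throttled (tasks dequeued) until the next period starts.
    """
    if budget <= 0 or period <= 0:
        raise ValueError("budget and period must be positive")
    timeline = []
    remaining = budget
    next_unit = 0          # index of the next unit of work to execute
    t = 0
    while next_unit < len(demand):
        if t % period == 0:
            remaining = budget      # replenish at each period boundary
        if remaining == 0:
            timeline.append('.')    # budget exhausted: core is throttled
        else:
            if demand[next_unit]:
                remaining -= 1      # a memory fetch consumes budget
                timeline.append('M')
            else:
                timeline.append('C')
            next_unit += 1
        t += 1
    return ''.join(timeline)


# Hypothetical trace: compute, two fetches, two computes, one fetch.
print(simulate_regulator([False, True, True, False, False, True]))
```

In this trace the second fetch exhausts the budget at time 3, so the remaining work is postponed until the budget is replenished at time 10.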
11. Outline
• Motivation
• Memory Bandwidth Control System
• Response Time Analysis
• Evaluation
• Conclusion
12. System Model
[Figure: a critical core and interfering cores (Core1 to Core4) above a memory bandwidth controller, the system bus, and DRAM]
• Cores are partitioned based on criticality
• The critical core runs periodic real-time tasks under fixed-priority scheduling
• Interfering cores run non-critical workloads and are regulated by the proposed memory access controller
13. Assumptions
• Private or partitioned last level cache (LLC)
• Round-robin bus arbitration policy
• Memory access latency is constant
• 1 LLC miss = 1 DRAM access
14. Simple Case: One Interfering Core
[Figure: a critical core and one interfering core with a memory bandwidth controller, sharing the system bus and DRAM]
• Critical core: the core under analysis
• Interfering core: generates memory interference
15. Problem Formulation
• For a given periodic real-time task set $T = \{\tau_1, \tau_2, \ldots, \tau_n\}$ on a critical core
• Problem:
– Determine whether $T$ is schedulable on the critical core, given the memory access control budget Q and period P on the interfering core
16. Task Model
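[Figure: execution timeline of a task of total length C, alternating computation and memory fetches (cache stalls), with CM memory fetches in total]
– C: WCET of a task on an isolated core (no interference)
– CM: number of last-level cache misses (DRAM accesses)
– L: stall time of a single cache miss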
17. Memory Interference Model
– P : memory access controller period
– Q: memory access time budget
– $\alpha^u(t)$: linearized upper bound on the interfering memory traffic
18. Background [Pellizzoni07]
• Accounting for memory interference
– Cache bound: maximum interference time <= maximum number of cache misses (CM) of the task under analysis * L
– Traffic bound: maximum interference time <= maximum bus time requested by the interfering core
– 𝐶: WCET that accounts for the memory stall delay
– L: stall time of a single cache miss
[Pellizzoni07] “Toward the Predictable Integration of Real-Time COTS Based Systems”, RTSS, 2007
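Since both bounds hold at the same time, the interference can be limited by the smaller of the two. A minimal sketch follows, assuming all quantities are expressed in the same time unit; the function name `interference_bound` and the numbers in the example are hypothetical.

```python
def interference_bound(cache_misses, stall_time, interfering_bus_time):
    """Upper bound on memory interference suffered by the task under analysis.

    cache_bound:   the task cannot be delayed more often than it misses
                   in the cache (CM misses, each stalling for L).
    traffic_bound: the task cannot be delayed longer than the bus time
                   the interfering core actually requests.
    """
    cache_bound = cache_misses * stall_time
    traffic_bound = interfering_bus_time
    return min(cache_bound, traffic_bound)


# Hypothetical values: 100 misses with a stall of 2 time units each,
# while the interfering core requests at most 50 time units of bus time.
print(interference_bound(100, 2, 50))
```

Here the traffic bound is the tighter one; with only 10 cache misses the cache bound would dominate instead.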
19. Classic Response Time Analysis
$$R_i^{k+1} = C_i + \sum_{j<i} \left\lceil \frac{R_i^k}{T_j} \right\rceil C_j$$
– Tasks are sorted in priority order
• low index = high priority task
– $C_i$: WCET of task i (in isolation, without memory interference)
– $R_i$: response time of task i
– $T_j$: period of task j
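The recurrence is solved by fixed-point iteration starting from the task's own WCET. The sketch below is one way to write that iteration; the function name `response_time`, the example task set, and the assumption that each deadline equals the task's period are illustrative and not taken from the slides.

```python
import math


def response_time(tasks, i, max_iter=1000):
    """Classic response time analysis for fixed-priority scheduling.

    tasks: list of (C, T) pairs sorted by priority, index 0 = highest.
    Returns the converged response time of task i, or None if the
    iteration exceeds the task's period (deadline assumed equal to
    the period) or does not converge within max_iter steps.
    """
    c_i, t_i = tasks[i]
    r = c_i
    for _ in range(max_iter):
        # Preemption from every higher-priority task j < i.
        nxt = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
        if nxt == r:
            return r
        if nxt > t_i:
            return None
        r = nxt
    return None


# Hypothetical task set: (WCET, period), highest priority first.
tasks = [(1, 4), (2, 6), (3, 12)]
print([response_time(tasks, i) for i in range(len(tasks))])
```

For the lowest-priority task the iteration visits 3, 6, 7, 9, and 10 before reaching the fixed point at 10.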
20. Extended Response Time Analysis
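$$R_i^{k+1} = C_i + \sum_{j<i} \left\lceil \frac{R_i^k}{T_j} \right\rceil C_j + \min\left\{ N(R_i^k) \cdot L,\ \alpha^u(R_i^k) \right\}$$
where
$$N(t) = \sum_{j \le i} \left\lceil \frac{t}{T_j} \right\rceil CM_j$$
$$\alpha^u(t) = \frac{Q}{P}\, t + 2\,\frac{Q(P - Q)}{P}$$
– $N(t)$: aggregated cache misses over time t
– $\alpha^u(t)$: interfering memory traffic over time t
– P: memory access control period
– Q: memory access time budget
• The proposed method achieves a tighter response time than using 𝐶, the WCET that accounts for the memory stall delay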
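The extended recurrence adds one interference term to the classic iteration: the smaller of the cache bound N(t) · L and the traffic bound α^u(t). A minimal sketch under the slide's definitions follows; the function names, the example values, and the assumption that each deadline equals the task's period are hypothetical.

```python
import math


def alpha_u(t, q, p):
    """Linearized upper bound on interfering memory traffic over time t."""
    return (q / p) * t + 2 * q * (p - q) / p


def extended_response_time(tasks, i, stall, q, p, max_iter=1000):
    """Response time of task i including regulated memory interference.

    tasks: list of (C, T, CM) sorted by priority, index 0 = highest.
    stall: L, stall time of a single cache miss.
    q, p:  budget and period of the interfering core's controller.
    Returns None if the iteration exceeds the task's period (deadline
    assumed equal to the period) or does not converge.
    """
    c_i, t_i, _ = tasks[i]
    r = c_i
    for _ in range(max_iter):
        preemption = sum(math.ceil(r / t_j) * c_j for c_j, t_j, _ in tasks[:i])
        # N(t): cache misses of task i and all higher-priority tasks.
        misses = sum(math.ceil(r / t_j) * cm_j for _, t_j, cm_j in tasks[:i + 1])
        interference = min(misses * stall, alpha_u(r, q, p))
        nxt = c_i + preemption + interference
        if math.isclose(nxt, r, rel_tol=1e-9):
            return r
        if nxt > t_i:
            return None
        r = nxt
    return None


# Hypothetical single task: C = 10, T = 100, CM = 5, with L = 1.
print(extended_response_time([(10, 100, 5)], 0, stall=1, q=2, p=10))
```

With Q = 2 the cache bound (5 time units) is the smaller term; lowering the budget to Q = 1 makes the traffic bound dominate and shortens the computed response time.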
21. Outline
• Motivation
• Memory Bandwidth Control System
• Response Time Analysis
• Evaluation
• Conclusion
22. Linux Kernel Implementation
• Extends the CPU bandwidth reservation feature of the group scheduler
– Specify core and bandwidth (memory budget, period)
• mkdir /sys/fs/cgroup/core3; cd /sys/fs/cgroup/core3
• echo 3 > cpuset.cpus  # core 3
• echo 10000 > cpu.cfs_period_us  # period
• echo 500000 > cpu.cfs_quota_event  # cache-misses budget (added feature)
– Monitor memory usage at every scheduler tick and context switch
23. Experimental Platform
[Figure: Intel Core2Quad; Core 0 to Core 3 each have private L1-I and L1-D caches; two L2 caches, each shared by a pair of cores, connect through the system bus to DRAM]
• Cores 0 and 2 were disabled to simulate a private LLC system
• Running a modified Linux 3.2 kernel
– https://github.com/heechul/linux-sched-coreidle/tree/sched-3.2-throttle-v2
24. Synthetic Task
[Figure: the application runs on Core1 and membomb on Core3; both cores share the system bus and memory]
• The core under analysis runs a synthetic task with 50% memory bandwidth
• The throttling budget of the interfering core is varied from 0 to 100%
• Two findings: (1) interference can be controlled, (2) the analysis provides an upper bound (albeit still pessimistic)
25. H.264 Movie Playback
• Cache-miss counts sampled every 100 ms
• Some inaccuracy in regulation due to an implementation limitation
– The current version improves accuracy by using a hardware overflow interrupt
26. Conclusion
• Shared hardware resources in multi-core systems pose major challenges for designing real-time systems
• We proposed and implemented a mechanism that provides memory bandwidth reservation on COTS multi-core processors
• We developed a response time analysis method based on the proposed memory access control mechanism