SlideShare a Scribd company logo
1 of 15
Download to read offline
Deadline Scheduling Accounting,
CPUset and CPUhotplug
Mathieu Poirier
ENGINEERS AND DEVICES
WORKING TOGETHER
In This Presentation
● Describe a classic feature interaction problem between:
○ The deadline (DL) scheduler
○ CPUhotplug
○ CPUset
● Short overview of the initial RFC published before Linux Plumbers
● Open discussion on how best to approach the remaining work
● We are not talking about…
○ How the Linux scheduler works
○ What is deadline scheduling
○ How deadline scheduling works
ENGINEERS AND DEVICES
WORKING TOGETHER
A Very Simple Problem
● Deadline accounting metrics are lost when...
○ A CPU is hotpluged in or out of a system
○ CPUsets are created or destroyed
● Why is this a problem?
○ Losing track of deadline utilisation leads to a collapse of the mathematical model behind DL
○ Under utilisation of the system’s potential
○ Over selling capacity → failure of the system to honor deadline contracts
● Steve Rostedt’s initial report [1].
[1]. https://lkml.org/lkml/2016/2/3/966
ENGINEERS AND DEVICES
WORKING TOGETHER
Before a Hotplug Operation...
root@linaro-developer:/home/linaro# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145 <------ Result of a 6:10 DL reservation is accounted
Dl_rq[1]: on all runqueues included in the root domain
.dl_nr_running : 0 (60% of 1048576 == 629145)
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145
dl_rq[2]:
.dl_nr_running : 1
.dl_nr_migratory : 1
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145
dl_rq[3]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145
ENGINEERS AND DEVICES
WORKING TOGETHER
After a Hotplug Operation...
root@linaro-developer:/home/linaro# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0 <------ DL accounting is lost on all the runqueues
Dl_rq[1]: in the root domain
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[2]:
.dl_nr_running : 1
.dl_nr_migratory : 1
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
● The same can be achieved with various CPUset operations
ENGINEERS AND DEVICES
WORKING TOGETHER
Why Is This Happening?
● When a CPUhotplug/CPUset operation happens:
○ Runqueues are removed from the current root domain and added to the default root domain
○ Current root domains are deleted
○ New root domains are created
○ Runqueues are removed from the default root domain and added to the new root domains
● DL accounting metrics are lost in the first step
● For CPUhotplug the above is split in a two-step (and asynchronous) process
● Tricky to fix because the code is shared between CPUhotplug and CPUset
ENGINEERS AND DEVICES
WORKING TOGETHER
Obvious Solution, Not So Easy To Implement
● Obviously recompute DL bandwidth utilisation when new root domains are
created
But…
● Common code between CPUhotplug and CPUset
○ But the processes calling the common code is different
■ CPUhotplug → Asynchronous
■ CPUSet → Synchronous
● Problems to keep in mind:
○ Tasks are removed from runqueues when they get suspended → they can go anywhere
○ Tasks can span more than one root domain → not supported by DL scheduler
ENGINEERS AND DEVICES
WORKING TOGETHER
The Current Solution
● An RFC that fix the problem using CPUsets has been posted [1]
● It is based on the premise that every task (suspended or not) in the system
belongs to a single CPUset.
● As such to recompute root domains’ DL metrics:
○ Circle all the CPUsets in the system
○ If a DL task is found, get a reference to the root domain it is associated to [2]
○ Add DL task utilisation to the root domain
● General consensus this is the right approach
[1]. https://marc.info/?l=linux-kernel&m=150291845422763&w=2
[2]. By way of the task’s cpu_allowed field and runqueues
ENGINEERS AND DEVICES
WORKING TOGETHER
Pros And Cons Of The Solution
● Advantages:
○ Approach is relatively simple
○ Works for any kind of CPUset topology one can imagine
○ Deals with running, runnable and suspended task in the same way
● Disadvantages:
○ Currently prevents tasks from belonging to more than a single root domain
○ Hard to cover all the scenarios that can lead to the above
ENGINEERS AND DEVICES
WORKING TOGETHER
Task In Multiple Root Domains
How can this happen?
○ By default most new tasks can use all the CPUs in the system
○ New tasks belong to the default CPUset [1]
○ Newly created CPUsets will have their own root domain
○ Existing tasks aren’t impacted and as such are allowed to use the CPUs in the new root
domains, something that breaks the DL scheduler acceptance test
How do we deal with tasks spanning more than a single root domain?
[1].Tasks need to be explicitly assigned to a CPUset in order to belong to that set
ENGINEERS AND DEVICES
WORKING TOGETHER
Possible Solution #1
● Continue with what the current RFC does and prevent DL tasks from spanning
more than one root domain?
○ Somewhat simple
○ Hackish and brittle (in my opinion)
ENGINEERS AND DEVICES
WORKING TOGETHER
Possible Solution #2
● Update the DL scheduler so that it can deal with tasks spanning more than
one root domain?
○ Represent a serious amount of work
○ Jeopardize the integrity of the current DL scheduler
○ Modifies the underlying DL scheduler mathematical model
○ Mathematical model rigorously proven, probably not a good idea to mess with it
ENGINEERS AND DEVICES
WORKING TOGETHER
Possible Solution #3
● Rebuild the scheduling domains before a CPUset/CPUhotplug operation
starts, rather than near the end.
○ Would provide a view of the system after the CPUset/CPUhotplug operation
○ Solid and reliable way to spot tasks ending up in more than one root domain
○ An idea that came to me at 30 thousand feet - could be a pipe dream
ENGINEERS AND DEVICES
WORKING TOGETHER
Your
Thank You
#SFO17
SFO17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

More Related Content

More from Linaro

HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
Linaro
 

More from Linaro (20)

Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Deadline Scheduling Accounting, CPUsets and CPUhotplug - SFO17-505

  • 1. Deadline Scheduling Accounting, CPUset and CPUhotplug Mathieu Poirier
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER In This Presentation ● Describe a classic feature interaction problem between: ○ The deadline (DL) scheduler ○ CPUhotplug ○ CPUset ● Short overview of the initial RFC published before Linux Plumbers ● Open discussion on how best to approach the remaining work ● We are not talking about… ○ How the Linux scheduler works ○ What is deadline scheduling ○ How deadline scheduling works
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER A Very Simple Problem ● Deadline accounting metrics are lost when... ○ A CPU is hotpluged in or out of a system ○ CPUsets are created or destroyed ● Why is this a problem? ○ Losing track of deadline utilisation leads to a collapse of the mathematical model behind DL ○ Under utilisation of the system’s potential ○ Over selling capacity → failure of the system to honor deadline contracts ● Steve Rostedt’s initial report [1]. [1]. https://lkml.org/lkml/2016/2/3/966
  • 4. ENGINEERS AND DEVICES WORKING TOGETHER Before a Hotplug Operation... root@linaro-developer:/home/linaro# grep dl /proc/sched_debug dl_rq[0]: .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145 <------ Result of a 6:10 DL reservation is accounted Dl_rq[1]: on all runqueues included in the root domain .dl_nr_running : 0 (60% of 1048576 == 629145) .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145 dl_rq[2]: .dl_nr_running : 1 .dl_nr_migratory : 1 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145 dl_rq[3]: .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER After a Hotplug Operation... root@linaro-developer:/home/linaro# grep dl /proc/sched_debug dl_rq[0]: .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 0 <------ DL accounting is lost on all the runqueues Dl_rq[1]: in the root domain .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 0 dl_rq[2]: .dl_nr_running : 1 .dl_nr_migratory : 1 .dl_bw->bw : 996147 .dl_bw->total_bw : 0 ● The same can be achieved with various CPUset operations
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER Why Is This Happening? ● When a CPUhotplug/CPUset operation happens: ○ Runqueues are removed from the current root domain and added to the default root domain ○ Current root domains are deleted ○ New root domains are created ○ Runqueues are removed from the default root domain and added to the new root domains ● DL accounting metrics are lost in the first step ● For CPUhotplug the above is split in a two-step (and asynchronous) process ● Tricky to fix because the code is shared between CPUhotplug and CPUset
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Obvious Solution, Not So Easy To Implement ● Obviously recompute DL bandwidth utilisation when new root domains are created But… ● Common code between CPUhotplug and CPUset ○ But the processes calling the common code is different ■ CPUhotplug → Asynchronous ■ CPUSet → Synchronous ● Problems to keep in mind: ○ Tasks are removed from runqueues when they get suspended → they can go anywhere ○ Tasks can span more than one root domain → not supported by DL scheduler
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER The Current Solution ● An RFC that fix the problem using CPUsets has been posted [1] ● It is based on the premise that every task (suspended or not) in the system belongs to a single CPUset. ● As such to recompute root domains’ DL metrics: ○ Circle all the CPUsets in the system ○ If a DL task is found, get a reference to the root domain it is associated to [2] ○ Add DL task utilisation to the root domain ● General consensus this is the right approach [1]. https://marc.info/?l=linux-kernel&m=150291845422763&w=2 [2]. By way of the task’s cpu_allowed field and runqueues
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Pros And Cons Of The Solution ● Advantages: ○ Approach is relatively simple ○ Works for any kind of CPUset topology one can imagine ○ Deals with running, runnable and suspended task in the same way ● Disadvantages: ○ Currently prevents tasks from belonging to more than a single root domain ○ Hard to cover all the scenarios that can lead to the above
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Task In Multiple Root Domains How can this happen? ○ By default most new tasks can use all the CPUs in the system ○ New tasks belong to the default CPUset [1] ○ Newly created CPUsets will have their own root domain ○ Existing tasks aren’t impacted and as such are allowed to use the CPUs in the new root domains, something that breaks the DL scheduler acceptance test How do we deal with tasks spanning more than a single root domain? [1].Tasks need to be explicitly assigned to a CPUset in order to belong to that set
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER Possible Solution #1 ● Continue with what the current RFC does and prevent DL tasks from spanning more than one root domain? ○ Somewhat simple ○ Hackish and brittle (in my opinion)
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER Possible Solution #2 ● Update the DL scheduler so that it can deal with tasks spanning more than one root domain? ○ Represent a serious amount of work ○ Jeopardize the integrity of the current DL scheduler ○ Modifies the underlying DL scheduler mathematical model ○ Mathematical model rigorously proven, probably not a good idea to mess with it
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER Possible Solution #3 ● Rebuild the scheduling domains before a CPUset/CPUhotplug operation starts, rather than near the end. ○ Would provide a view of the system after the CPUset/CPUhotplug operation ○ Solid and reliable way to spot tasks ending up in more than one root domain ○ An idea that came to me at 30 thousand feet - could be a pipe dream
  • 15. Thank You #SFO17 SFO17 keynotes and videos on: connect.linaro.org For further information: www.linaro.org