3. ACRN CPU sharing goals
1. Fully utilize physical CPU resources to support more virtual machines.
2. Keep the hypervisor footprint small and satisfy FuSa requirements.
3. No impact on real-time and FuSa workloads.
4. A simple, tiny scheduler algorithm that satisfies embedded-device requirements.
9. Schedule API architecture
[Figure: layered schedule API architecture — the vcpu layer (thread_obj) sits on top of the schedulers layer and the schedule framework layer; the framework exposes a vcpu API and scheduler callbacks, and rests on the hardware management layer.]
• vcpu is derived from thread_obj.
• thread_obj is the schedule entity from the schedule framework's perspective.
• The scheduler is abstracted by the schedule framework; several callbacks need to be implemented by each scheduler (see the sketch below).
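The callback abstraction can be pictured roughly as follows. This is a minimal sketch, not the actual ACRN definitions: the structure layout and callback names (init, pick_next, sleep, wake, and so on) are assumptions used only to illustrate how a scheduler plugs into the framework and how thread_obj serves as the schedule entity.

```c
#include <stdint.h>

/* Minimal sketch of the scheduler/thread_obj abstraction.
 * Layout and callback names are illustrative assumptions,
 * not the actual ACRN definitions. */
struct sched_control;   /* per-pCPU schedule context (details omitted) */
struct thread_obj;

struct acrn_scheduler {
        const char *name;
        /* attach/detach the scheduler to a pCPU's schedule context */
        int  (*init)(struct sched_control *ctl);
        void (*deinit)(struct sched_control *ctl);
        /* pick the next thread_obj to run on this pCPU */
        struct thread_obj *(*pick_next)(struct sched_control *ctl);
        /* block / unblock a schedule entity */
        void (*sleep)(struct thread_obj *obj);
        void (*wake)(struct thread_obj *obj);
};

struct thread_obj {
        int status;                     /* THREAD_STS_* state, see next slide */
        uint16_t pcpu_id;               /* pCPU this entity runs on */
        void (*thread_entry)(struct thread_obj *obj);
        void (*switch_in)(struct thread_obj *obj);   /* arch context restore */
        void (*switch_out)(struct thread_obj *obj);  /* arch context save */
        void *priv;                     /* per-scheduler private data */
};
```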
10. Schedule object state transition
[State diagram: a thread_obj is in one of THREAD_STS_BLOCKED, THREAD_STS_RUNNABLE, or THREAD_STS_RUNNING. After init, a runnable thread_obj is picked by the scheduler to enter RUNNING; it returns to RUNNABLE when it runs out of its timeslice, yields, or is preempted.]
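One possible encoding of these states and the two transitions shown above is sketched below. The enum values follow the state names on the slide; the helper names are assumptions used only for illustration, not ACRN APIs.

```c
/* Thread object scheduling states, matching the diagram above. */
enum thread_obj_state {
        THREAD_STS_BLOCKED,
        THREAD_STS_RUNNABLE,
        THREAD_STS_RUNNING,
};

/* Illustrative transition helpers (names are assumptions). */
static void sched_set_running(enum thread_obj_state *st)
{
        if (*st == THREAD_STS_RUNNABLE) {
                *st = THREAD_STS_RUNNING;      /* scheduler picked this object to run */
        }
}

static void sched_requeue(enum thread_obj_state *st)
{
        if (*st == THREAD_STS_RUNNING) {
                *st = THREAD_STS_RUNNABLE;     /* timeslice expired / yield / preempted */
        }
}
```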
12. Context Switch
The vcpu layer registers switch_in/switch_out callbacks to handle the platform architecture-level context switch, which covers the following state (a sketch follows the list):
• MSRs: IA32_STAR, IA32_LSTAR, IA32_FMASK, IA32_KERNEL_GS_BASE
• xsave/xrstors: x87 state, SSE state, and platform-specific state components
• VMCS switch
• General-purpose registers: flags, rbx, rbp, r12, r13, r14, r15, rdi, rsp
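A minimal sketch of what such switch_out/switch_in callbacks might do for the MSR portion of the context. The context struct, field layout, and the msr_read/msr_write wrappers are assumptions for illustration, not the actual ACRN implementation; the xsave and VMCS handling is indicated only in comments.

```c
#include <stdint.h>

#define MSR_IA32_STAR            0xC0000081U
#define MSR_IA32_LSTAR           0xC0000082U
#define MSR_IA32_FMASK           0xC0000084U
#define MSR_IA32_KERNEL_GS_BASE  0xC0000102U

/* Hypothetical per-vCPU extended context saved/restored across a switch. */
struct vcpu_ext_context {
        uint64_t star, lstar, fmask, kernel_gs_base;
        /* xsave area (x87, SSE, platform-specific components) would follow */
};

extern uint64_t msr_read(uint32_t msr);              /* assumed rdmsr wrapper */
extern void msr_write(uint32_t msr, uint64_t val);   /* assumed wrmsr wrapper */

/* switch_out: save the outgoing vCPU's extended state. */
static void vcpu_switch_out(struct vcpu_ext_context *ctx)
{
        ctx->star           = msr_read(MSR_IA32_STAR);
        ctx->lstar          = msr_read(MSR_IA32_LSTAR);
        ctx->fmask          = msr_read(MSR_IA32_FMASK);
        ctx->kernel_gs_base = msr_read(MSR_IA32_KERNEL_GS_BASE);
        /* xsaves would save x87/SSE/platform state into the xsave area here */
}

/* switch_in: restore the incoming vCPU's extended state. */
static void vcpu_switch_in(const struct vcpu_ext_context *ctx)
{
        msr_write(MSR_IA32_STAR, ctx->star);
        msr_write(MSR_IA32_LSTAR, ctx->lstar);
        msr_write(MSR_IA32_FMASK, ctx->fmask);
        msr_write(MSR_IA32_KERNEL_GS_BASE, ctx->kernel_gs_base);
        /* xrstors would restore x87/SSE/platform state, then the VMCS is switched */
}
```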
13. VCPU Power management
• vCPU C/P states
  • For the noop scheduler, C/P states are passed through and controlled by the guest itself:
    • The p-state and c-state data tables are hardcoded in the HV and exported to the guest through ACPI tables by the DM.
    • The HV forwards guest writes to the MSR_IA32_PERF_CTL MSR to hardware for p-state control.
    • C-state I/O port writes are passed through for c-state control.
  • For the BVT scheduler, consider disabling the C/P-state capability for the guest (a possible MSR-side handling is sketched after this list):
    • Disable the p-state and c-state tables for the guest through the virtual ACPI table.
    • Disable the guest's mwait capability in CPUID emulation.
    • MSR_IA32_PERF_CTL emulation rejects the request.
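The difference between the two cases could look roughly like the handler below. This is a hedged sketch: the function name, the pcpu_is_dedicated flag, and the msr_write wrapper are assumptions used to show the forward-vs-reject decision, not the actual ACRN emulation code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_PERF_CTL  0x199U

extern void msr_write(uint32_t msr, uint64_t val);   /* assumed wrmsr wrapper */

/* Hypothetical handler for guest writes to MSR_IA32_PERF_CTL.
 * With the noop scheduler the write is forwarded to hardware (the guest
 * owns the pCPU); with a sharing scheduler such as BVT it is rejected. */
static bool handle_perf_ctl_write(bool pcpu_is_dedicated, uint64_t guest_val)
{
        if (pcpu_is_dedicated) {
                msr_write(MSR_IA32_PERF_CTL, guest_val);
                return true;    /* p-state request applied to hardware */
        }
        return false;           /* sharing scheduler: reject the request */
}
```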
14. FuSa/Security requirements
• FuSa
  • The schedule architecture guarantees that a pCPU-sharing scheduler will not impact a pCPU bound to the noop scheduler.
  • Each scheduler affects only the pCPU resources limited by its vcpu_affinity.
  • There is no shared resource between different schedulers.
  • The schedule/scheduler architecture follows modularization rules.
• Security
  • L1TF and Spectre need no extra handling here because they are mitigated during VM exit and VM entry; and since the hypervisor runs only in ring 0, no extra operations are needed during context switching.
16. cpu_affinity
• Currently, we do not support vCPU migration; the vCPU-to-pCPU mapping is fixed at the time the VM is launched. The statically configured cpu_affinity in the VM configuration defines a superset of the pCPUs that the VM is allowed to run on. Each set bit in this bitmap indicates that the corresponding pCPU may be assigned to this VM, with the bit number being the pCPU ID. A pre-launched VM is launched on exactly the pCPUs assigned in this bitmap. The vCPU-to-pCPU mapping is implicit: vCPU0 maps to the pCPU with the lowest pCPU ID, vCPU1 to the second lowest, and so on (see the sketch after this list).
• For post-launched VMs, acrn-dm can choose to launch the VM on a subset of the pCPUs defined in cpu_affinity by specifying the assigned pCPUs (the --cpu_affinity option), but it cannot assign any pCPU that is not included in the VM's cpu_affinity.
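The implicit mapping rule can be illustrated with the small sketch below. It is not ACRN code; it only walks a cpu_affinity bitmap from the lowest set bit upward and assigns vCPU IDs in order.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: derive the implicit vCPU -> pCPU mapping from a
 * cpu_affinity bitmap (bit n set means pCPU n may be assigned to the VM).
 * vCPU0 gets the lowest set pCPU ID, vCPU1 the next lowest, and so on. */
static void map_vcpus(uint64_t cpu_affinity)
{
        uint16_t vcpu_id = 0U;

        for (uint16_t pcpu_id = 0U; pcpu_id < 64U; pcpu_id++) {
                if ((cpu_affinity & (1UL << pcpu_id)) != 0UL) {
                        printf("vCPU%u -> pCPU%u\n", vcpu_id, pcpu_id);
                        vcpu_id++;
                }
        }
}

int main(void)
{
        map_vcpus(0x3UL);   /* pCPU0|pCPU1: vCPU0->pCPU0, vCPU1->pCPU1 */
        return 0;
}
```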
17. cpu_affinity - example
Use the following settings to support this configuration in the industry
scenario:
• Define the default pCPU pool for each VM:
#define VM1_CONFIG_CPU_AFFINITY (AFFINITY_CPU(0U) | AFFINITY_CPU(1U))
#define VM2_CONFIG_CPU_AFFINITY (AFFINITY_CPU(2U) | AFFINITY_CPU(3U))
• Offline pCPU2-3 in the Service VM.
• Launch WaaG with “--cpu_affinity 0,1”.
• Launch the RT VM with “--cpu_affinity 2,3”.
• The device model option “--cpu_affinity” is not mandatory here because it defines the same pCPU pool as the default value in the hypervisor configuration.
Resulting assignment:
• pCPU0, pCPU1: Service VM + WaaG
• pCPU2, pCPU3: RT Linux
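For reference, the affinity macros above presumably expand to single-bit masks per pCPU ID; the definition below is an assumption used only to show the resulting bitmap values, not necessarily the exact ACRN macro.

```c
#include <assert.h>

/* Assumed expansion of AFFINITY_CPU(n): one bit per pCPU ID. */
#define AFFINITY_CPU(n)            (1UL << (n))

#define VM1_CONFIG_CPU_AFFINITY    (AFFINITY_CPU(0U) | AFFINITY_CPU(1U))
#define VM2_CONFIG_CPU_AFFINITY    (AFFINITY_CPU(2U) | AFFINITY_CPU(3U))

int main(void)
{
        /* WaaG may run only on pCPU0/pCPU1, the RT VM only on pCPU2/pCPU3. */
        assert(VM1_CONFIG_CPU_AFFINITY == 0x3UL);
        assert(VM2_CONFIG_CPU_AFFINITY == 0xCUL);
        return 0;
}
```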