2. 2
These slides are made available to you under a Creative Commons Share-
Alike 3.0 license. The full terms of this license are here:
https://creativecommons.org/licenses/by-sa/3.0/
Attribution requirements and misc., PLEASE READ:
● This slide must remain as-is in this specific location (slide #2), everything
else you are free to change; including the logo :-)
● Use of figures in other documents must feature the below “Originals at”
URL immediately under that figure and the below copyright notice where
appropriate.
● You are free to fill in the “Delivered and/or customized by” space on the
right as you see fit.
● You are FORBIDEN from using the default “About” slide as-is or any of its
contents.
● You are FORBIDEN from using any content provided by 3rd parties without
the EXPLICIT consent from those parties.
(C) Copyright 2016, Opersys inc.
These slides created by: Karim Yaghmour
Originals at: www.opersys.com/community/docs
Delivered and/or customized by
3. 3
About
● Author of:
● Introduced Linux Trace Toolkit in 1999
● Originated Adeos and relayfs (kernel/relay.c)
● Training, Custom Dev, Consulting, ...
4. 4
Agenda
1. Architecture
2. Linux scheduler history
3. Completely Fair Scheduler (CFS)
4. Sched Classes
5. CPU power management
6. Load tracking
7. Android problems w/ Linux scheduler
8. User-space vs. Linux scheduler
9. Framework
10. Summing up
5. 5
1. Architecture
● Hardware on which Android is based
● Android stack
● Startup
● System services
● Binder driver
12. 12
2. Linux scheduler history
● 1995 - Linux 1.2
● Circular queue of tasks w/ round-robin
● 1999 - Linux 2.2
● Scheduling classes: real-time, non-preemptable and non-real-time
● SMP support
● 2001 - Linux 2.4 O(N) scheduler
● Each task gets a slice (epoch)
● 2003 - Linux 2.6 O(1) scheduler
● Two queues: active vs. Expired
● Heuristics-based / no “formal” algorithm
● 2007 - ~ Linux 2.6.21
● Con Kolivas' Rotating Staircase Deadline Scheduler (RSDL)
● 2007 - Linux 2.6.23
● Ingo Molnar's Completely Fair Scheduler (CFS)
13. 13
3. Completely Fair Scheduler (CFS)
●
Tasks put in red-black tree
●
Self-balancing – i.e. no path more than twice other path
● O (log n) operations
●
task_struct->sched_entity->rb_node
●
kernel/core/sched.c:schedule()
●
put_prev_task()
●
pick_next_task()
●
Priorities provide decay factors
●
Higher priority = lower decay factor
● i.e. lower priority uses up their time more quickly
●
Group scheduling – control group mechanism (cgroups)
●
Grouped processes share a single process' virtual time
●
Use of cgroupfs – /acct in Android
15. 15
5. CPU power management
●
Need to integrate CPU power management and scheduler more closely
●
Existing Linux support for power management (governors)
●
cpuidle
– What happens when the CPU is IDLe
● cpufreq drivers
– HW counters- / FW-based decision (intel_pstate / longrun)
– Hard-coded values based on ranges within driver DVFS – “Dynamic Voltage and Frequency Scaling”
●
Trivial
– Performance - highest frequency
– Powersave - lowest frequency
– Userspace - User-space app makes decision
●
Non-trivial
– Based on system load
●
Stats for non-trivial governors come from scheduler stats (/proc/stat)
●
On-demand - Immediate jump on load increase
● Conservative - Gradual scale-up on increase
16. 16
5.1. Making frequency scaling choices
● SCHED_DEADLINE
● We have precise info, because of how this works
● SCHED_FIFO / SCHED_RR
● Put CPU at max right away
● SCHED_NORMAL
● Use stats
17. 17
6. Load tracking
● How to track how much “load” a process is putting on the
system?
● Not just CPU time consumed
● Process waiting for CPU is contributing to load
● Since 3.8
● PELT - Per-Entity Load Tracking
– Each load instance is .y times the previous one:
● L0 + L1*y + L2*y2 + L3*y3 + ...
● New proposal (used in Pixel)
● WALT - Window-Assisted Load Tracking
18. 18
7. Android problems w/ Linux scheduler
● Even “on-demand” isn't good enough
● It takes too many milliseconds for timer to tick
and stats to be updated
● Since decision is based on stats, delay is user-
noticeable
19. 19
7.1. Android Initial Solution
● Android devs wrote their own governor:
● “interactive”
● Detects transition out of idle
● Shortens timeout to scale up to much shorter
time (“1-2 ticks”)
● Out of tree
20. 20
7.2. Intermediate solutions discussed
● Trigger cpufreq when scheduler updates stats
● Instead of trigger using a timeout
● Introduce new governor: schedutil (merged in
4.7)
● Use the info straight from the scheduler stats
update
21. 21
7.3. Recent Work
●
Problems:
● bigLITTLE
● EAS – Energy Aware Schedulign
●
SchedTune Used Pixel phone
● Provide user-space knobs to control schedutil governor
● Android doesn't need to make suppositions, it knows what's happening
●
Implemented as control-group controller
● Each cgroup has tunable “schedtune.boost”
●
sched-freq
● Use CFS runqueue info to scale CPU freq
●
WALT – Window-Assisted Load Tracking
22. 22
8. User-space vs. Linux scheduler
● Control knobs
● /dev/cpuset/ - which tasks run on which CPUs
● /dev/cpuctl/ - restrict CPU time for bg tasks
● /dev/stune/ - EAS stune boosting
● init.rc
● Power HAL:
● Power “hints”
● Set interactive
23. 23
9. Framework
● System Services:
● Activity Manager
– Causes apps to be started through Zygote
– Feed lifecycle events
– Manages foreground vs. background, etc.
●
Scheduling Policy:
– Modifies process scheduler based on request
● Defined thread groups:
● Default
● BG non-interactive
●
Foreground
●
System
●
Audio App -- SCHED_FIFO
● Audio Sys -- SCHED_FIFO
● Top App
● See
●
frameworks/base/core/java/android/os/Process.java
● frameworks/base/core/jni/android_util_Process.cpp
24. 24
10. Summing Up
● Quite a few moving parts
● Linux does bulk of work
● Android gives hints to Linux
● Still ongoing work to get best performance with
least battery usage