SlideShare a Scribd company logo
1 of 39
Download to read offline
SFO17-TR05: EAS Profiling On Hikey960
Leo Yan & Daniel Thompson
Linaro Solutions and Support Engineering
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Overview
● Current status of EAS r1.3 on Hikey960
○ EAS in Android common kernel
○ Hikey960 power management introduction
○ Power modeling on Hikey960
● Testing methodology
● Optimization with CGroup
● Observed issues and proposed patches
● Conclusion
ENGINEERS AND DEVICES
WORKING TOGETHER
EAS in Android common kernel
● EAS r1.3 is merged into Android common kernel and Hikey kernel
○ Many optimizations have been merged in this version!
○ EAS r1.3 “has done a large refactoring of find_best_target” to rework wakeup, simplify the
decision making and make heuristics clearer
○ EAS r1.3 makes ‘schedutil’ the recommended CPUFreq governor
○ EAS r1.3 will optimize small tasks by placing them on LITTLE CPUs most of the time
https://android.googlesource.com/kernel/common/:
Android common kernel
https://android.googlesource.com/kernel/hikey-linaro:
Hikey/Hikey960 Android kernel
https://android-review.googlesource.com:
patches committing and merging
EAS r1.3
EAS r1.3
EAS r1.x
EAS r1.x
ENGINEERS AND DEVICES
WORKING TOGETHER
Hikey960 Power Management Introduction
CA53 OPPs CA73 OPPs
1844MHz
1709MHz
1402MHz
999MHz
533MHz
2362MHz
2112MHz
1805MHz
1421MHz
903MHz
DDR OPPs
1860MHz
1244MHz
1067MHz
685MHz
400MHz
GPU OPPs
960MHz
807MHz
533MHz
400MHz
178MHz
1037MHz
MCU
CPUFreq &
Devfreq
Thermal
Framework
CPU cooling device Dev cooling device
CPUIdle
Low power states
MCU detects 85’C
as alarming point
and limits the
CPU/GPU/DDR
OPPs as last resort
GPU mali driver and VPU driver
have been enabled (VPU driver
is WIP for AOSP repo)
ENGINEERS AND DEVICES
WORKING TOGETHER
Power modeling on Hikey960
Two highlights on modeling:
- CA73 high frequency has much worse
power efficiency;
- CA73 low frequency and CA53 high
frequency have overlap for power
efficiency.
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Overview
● Current status of EAS r1.3 on Hikey960
● Testing methodology
○ Automatic testing framework
○ Testing suites
○ Temperature is critical for fair game!
○ Measurement metrics
● Optimization with CGroup
● Observed issues and proposed patches
● Conclusion
ENGINEERS AND DEVICES
WORKING TOGETHER
Automatic testing framework
Workload Automation
USB-Relay [1]
Fan
Hikey960
Charts
1. Burn images (mainly for kernel)
2. Cool down device with fan
3. Run test cases
4. Output results
[1] https://github.com/daniel-thompson/usb-relay
ARM Energy Probe
ENGINEERS AND DEVICES
WORKING TOGETHER
Workload automation
● “Workload Automation (WA) is a framework for executing workloads and
collecting measurements on Android and Linux devices.” [1]
○ WA captures energy instrument data
○ WA captures performance metrics
○ WA captures ftrace log
○ WA captures kernel statistics (interrupt numbers during test case running)
● WA can run test cases for power scenarios/interactive performance/intensive
performance testing for multiple iterations in single one testing configuration
● WA absents some features but we can easily extend it [2]
○ Support multiple kernel burning
○ Support kernel scheduler statistics
○ Support result comparison with different configurations
● Co-works with LISA for detailed scheduler analysis
[1] https://github.com/ARM-software/workload-automation
[2] https://github.com/Leo-Yan/workload-automation
ENGINEERS AND DEVICES
WORKING TOGETHER
Android
Home screen: on
WIFI: off
Many peripherals and devices
are powered on, so baseline
power is high (~4w)
Configuration for baseline testing
Workload agenda
- Fix DDR@685MHz and GPU@960MHz;
- Disable thermal capping by set high
sustainable value (but MCU capping still
works);
- use “sched-freq” governor and up
threshold is 500us and down threshold is
50ms;
- use PELT signals due it’s more reliable
for energy modeling
ENGINEERS AND DEVICES
WORKING TOGETHER
Temperature is critical for a fair game!
Ambient temperature:
26.5’C ~ 27.5’C
SoC start temperature: 43.5’C;
takes 2~5 minutes to reach start temperature
The fan is used to accelerate cooling down time but it is stopped after cool
down to specific temperature (e.g. 43.5'C). So we can avoid the fan to
interfere the testing and give more penalty for power consumption when SoC
temperature increases and has more close behavior with the real production.
ENGINEERS AND DEVICES
WORKING TOGETHER
Test suites
● Power saving cases
○ “idle” - sleep for 120s
○ “audio” - playback mp3 for 120s
○ “video” - playback video (1080p) for 120s with hardware codec
● Interactive cases
○ UiBench
■ “InflatingListActivity” - Scroll the screen for list widget
■ “InvalidateActivity” - Reverse color for small grids, invalidate and redraw view
■ “ActivityTransition” - Click screen to zoom in and zoom out picture
■ “TrivialAnimationActivity” - Animation for rendering whole screen color
○ “Galleryfling” - Launch gallery with hundreds pictures and fling pictures
○ “Recentfling” - Open recents screen and fling recent tasks list
○ “Emailfling” - Open email app and fling emails list
○ “Browserfling” - Open browser icecat and fling the web page
ENGINEERS AND DEVICES
WORKING TOGETHER
Measurement metrics
# Test cases Scheduler Statistics Power Metrics Performance Metrics
1 idle Wake up path statistics mW = Energy (J) / Time (S) n/a
2 audio Wake up path statistics mW n/a
3 video Wake up path statistics mW n/a
4 galleryfling Wake up path statistics mW janks% = janks / total_frame
5 recentfling Wake up path statistics mW janks%
6 emailfling Wake up path statistics mW janks%
7 UiBench InflatingListActivity Wake up path statistics mW janks%
8 InvalidateActivity Wake up path statistics mW janks%
9 ActivityTransition Wake up path statistics mW janks%
10 TrivialAnimationActivity Wake up path statistics mW janks%
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Overview
● Current status of EAS r1.3 on Hikey960
● Testing methodology
● Optimization with CGroup
○ Using CPUSET for background tasks
○ How to set schedtune’s ‘prefer_idle’?
○ Select boost margin for interactive cases
● Observed issues and proposed patches
● Conclusion
ENGINEERS AND DEVICES
WORKING TOGETHER
Control groups (CGroups) in Android system
● “Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future
children, into hierarchical groups with specialized behaviour… mainly for resource-tracking
purposes”: https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
● A cgroup associates a set of tasks with a set of parameters
○ CPUCtl: restrict CPU time and keeps background tasks to only small portion bandwidth
■ cpu.shares
■ cpu.rt_runtime_us
■ cpu.rt_period_us
○ CPUSet: pin tasks to specified CPUs
■ cpus: CPU affinity for tasks
○ Schedtune: speedup the time-to-completion for a task activation
■ boost: inflate task and CPU utilization and impact OPP selection
■ prefer_idle: select idle CPUs for task to avoid scheduling latency
● Init scripts can statically set cgroups parameters
● Android PowerHAL can dynamically set cgroups parameters for different scenarios
● PowerHAL is disabled in this slides and use WA script to statically set parameters
ENGINEERS AND DEVICES
WORKING TOGETHER
Using CPUSET for background tasks affinity
runtime_params:
sysfile_values:
/dev/cpuset/foreground/cpus: 0-7
/dev/cpuset/top-app/cpus: 0-7
/dev/cpuset/background/cpus: 0-3
/dev/cpuset/system-background/cpus: 0-3
/dev/stune/schedtune.boost: 0
/dev/stune/schedtune.prefer_idle: 0
/dev/stune/background/schedtune.boost: 0
/dev/stune/background/schedtune.prefer_idle: 0
/dev/stune/foreground/schedtune.boost: 0
/dev/stune/foreground/schedtune.prefer_idle: 0
/dev/stune/top-app/schedtune.boost: 0
/dev/stune/top-app/schedtune.prefer_idle: 0
CPUSET knobs
Schedtune knobs
Using CPUSET to set cpus to ‘0-3’ so can
pin background tasks always run on
LITTLE CPUs.
ENGINEERS AND DEVICES
WORKING TOGETHER
Task placement optimization with setting CPUSET 0~3 for background tasks
Compared baseline kernel with CPUSET 0-7, we can
optimization task migration onto LITTLE CPUs by
pinning background tasks with CPUSET 0-3.
ENGINEERS AND DEVICES
WORKING TOGETHER
Understanding ‘prefer_idle’
● ‘prefer_idle’ is indicator to ask scheduler to select idle CPU for ‘prefer_idle’
tasks as possible; it’s mainly used for multi-threading concurrency
performance boosting
● Select idle CPUs with specific orders and impacts power heavily
○ When boost > 0, the big cluster is firstly iterated so ‘prefer_idle’ tasks have much more chance
to be migrated on big CPUs
○ When boost = 0, tries to find IDLE CPU from LITTLE cluster firstly; ‘prefer_idle’ tasks can be
well spreaded within LITTLE cluster, this results in power saving for middle workload scenario
(e.g. video playback)
○ Iteration starts from low ID for CPU, so low ID CPU has higher priority
● Using CPUSET + ‘prefer_idle’ can guarantee latency for important tasks
(top-app)
ENGINEERS AND DEVICES
WORKING TOGETHER
Caveats for ‘prefer_idle’
Assumption: ‘prefer_idle’ is considered to spread tasks
and it doesn’t take account the energy when spread tasks
to big CPUs; ‘prefer_idle’ enabling may introduce higher
power by migration tasks to big CPUs. So we usually avoid
to enable ‘prefer_idle’ for power sensitive scenarios.
In fact ‘prefer_idle’ co-works with find_best_target()
underlying. find_best_target() tries to pack small
tasks onto single CPU, and ‘prefer_idle’ can ease the
packing when ‘prefer_idle’ can spread within LITTLE
cluster and reduce CPU frequency; we can save power by
‘prefer_idle’ for some middle workload case.
/dev/stune/foreground/schedtune.prefer_idle: 0
/dev/stune/top-app/schedtune.prefer_idle: 1
/dev/stune/foreground/schedtune.prefer_idle: 1
/dev/stune/top-app/schedtune.prefer_idle: 1
average_freq = 699MHz
average_freq = 655MHz
ENGINEERS AND DEVICES
WORKING TOGETHER
Caveats for ‘prefer_idle’ - cont.
Caveat #2:
Don’t use prefer_idle too much; otherwise it
has more chance to contend IDLE CPUs
with ‘prefer_idle’ tasks, and prefer_idle
tasks also contend with non prefer-idle
small tasks or even RT threads; as result
there have more chance to migrate
‘prefer_idle’ tasks onto big CPUs, so at the
end cannot save power.
ENGINEERS AND DEVICES
WORKING TOGETHER
Boost margin comparison for interactive cases
“janks%” is performance metric and is lower is better; with boost
margin = 15% or 20%, the janks% can be optimized to less than
10% for most cases; after setting boost = 25% or 30% the janks%
is regressed by thermal throttling. Choosing 15% boost margin in
this slides for interactive cases.
ENGINEERS AND DEVICES
WORKING TOGETHER
CPUSET and schedtune settings
Used for power saving scenarios
runtime_params:
sysfile_values:
/dev/cpuset/foreground/cpus: 0-7
/dev/cpuset/top-app/cpus: 0-7
/dev/cpuset/background/cpus: 0-3
/dev/cpuset/system-background/cpus: 0-3
/dev/stune/schedtune.boost: 0
/dev/stune/schedtune.prefer_idle: 0
/dev/stune/background/schedtune.boost: 0
/dev/stune/background/schedtune.prefer_idle: 0
/dev/stune/foreground/schedtune.boost: 0
/dev/stune/foreground/schedtune.prefer_idle: 1
/dev/stune/top-app/schedtune.boost: 0
/dev/stune/top-app/schedtune.prefer_idle: 1
Used for interactive scenarios
runtime_params:
sysfile_values:
/dev/cpuset/foreground/cpus: 0-7
/dev/cpuset/top-app/cpus: 0-7
/dev/cpuset/background/cpus: 0-3
/dev/cpuset/system-background/cpus: 0-3
/dev/stune/schedtune.boost: 0
/dev/stune/schedtune.prefer_idle: 0
/dev/stune/background/schedtune.boost: 0
/dev/stune/background/schedtune.prefer_idle: 0
/dev/stune/foreground/schedtune.boost: 15
/dev/stune/foreground/schedtune.prefer_idle: 1
/dev/stune/top-app/schedtune.boost: 15
/dev/stune/top-app/schedtune.prefer_idle: 1
Power hints can dynamically switch the settings between different scenarios.
ENGINEERS AND DEVICES
WORKING TOGETHER
Power comparison (boost%=0)
Kernel1_baseline:
/dev/cpuset/background/cpus: 0-7
/dev/cpuset/system-background/cpus: 0-7
/dev/stune/foreground/schedtune.boost: 0
/dev/stune/foreground/schedtune.prefer_idle: 0
/dev/stune/top-app/schedtune.boost: 0
/dev/stune/top-app/schedtune.prefer_idle: 0
Kernel1_baseline_bst00:
/dev/cpuset/background/cpus: 0-3
/dev/cpuset/system-background/cpus: 0-3
/dev/stune/foreground/schedtune.boost: 0
/dev/stune/foreground/schedtune.prefer_idle: 1
/dev/stune/top-app/schedtune.boost: 0
/dev/stune/top-app/schedtune.prefer_idle: 1
ENGINEERS AND DEVICES
WORKING TOGETHER
Power and performance comparison (boost%=15)
Kernel1_baseline:
/dev/cpuset/background/cpus: 0-7
/dev/cpuset/system-background/cpus: 0-7
/dev/stune/foreground/schedtune.boost: 0
/dev/stune/foreground/schedtune.prefer_idle: 0
/dev/stune/top-app/schedtune.boost: 0
/dev/stune/top-app/schedtune.prefer_idle: 0
Kernel1_baseline_bst15:
/dev/cpuset/background/cpus: 0-3
/dev/cpuset/system-background/cpus: 0-3
/dev/stune/foreground/schedtune.boost: 15
/dev/stune/foreground/schedtune.prefer_idle: 1
/dev/stune/top-app/schedtune.boost: 15
/dev/stune/top-app/schedtune.prefer_idle: 1
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Overview
● Current status of EAS r1.3 on Hikey960
● Testing methodology
● Optimization with CGroup
● Observed issues and proposed patches
○ Optimization methodology for EAS core algorithms
○ Optimization with active CPU and IDLE CPU frequency
○ Comparison energy with cpumask (in hacking session)
● Conclusion
ENGINEERS AND DEVICES
WORKING TOGETHER
Methodology for optimization in this chapter
The first step:
We can start to analyze the CPU
frequency selection for some power
sensitive cases (e.g. video/audio
playback etc); if there have some
scenarios with CPU frequency is
overrated, it’s good start point for
analysis.
The second step:
Based on the first step optimization, we
can explore optimization for task
placement crossing different clusters;
this mean we don’t merely optimize
small tasks placed on LITTLE CPUs,
but also need find better occasions to
place big workload tasks to big CPUs.
ENGINEERS AND DEVICES
WORKING TOGETHER
Video playback (1080p) frequency analysis
The big cluster can stay at lowest OPP due video playback has
no big workload tasks, and the LITTLE cluster frequency often
oscillates between 533MHz and 990MHz; the CPU utilization is
about ~100 so in theory LITTLE CPU should stay at 533MHz.
ENGINEERS AND DEVICES
WORKING TOGETHER
RT threads util is high in video playback case
find_best_target() uses CFS class ‘util’ (red signal) to select
candidate CPU, the CPUFreq governor uses all scheduling classes
‘util’ (blue signal); due the RT class ‘util’ (green signal) is big value
but it’s missed by find_best_target() results in find_best_target()
cannot select CPU with lowest OPP.
ENGINEERS AND DEVICES
WORKING TOGETHER
Optimization for CPU selection with lower frequency
prev_cpu target_cpu
backup_cpu
Find selection
winner
vs
winner
prev_cpu vs
winner
EAS r1.3 Proposed optimization
Best active CPU
Best IDLE CPU
Consistent CPU utilization estimation
crosses find_best_target(), energy
calculation and CPUFreq governor.
Change “target CPU” as CPU least util
CPU (and includes IDLE CPU).
ENGINEERS AND DEVICES
WORKING TOGETHER
Function calc_total_util() introduced for total utilization estimation
static unsigned long calc_total_util(int cpu, struct task_struct *p, unsigned long cpu_util,
unsigned long tsk_util)
{
unsigned long cfs_util, rt_util, dl_util, total;
int boost;
int tsk_boost = p ? schedtune_task_boost(p) : 0;
int cpu_boost = schedtune_cpu_boost(cpu);
/*
* Estimate cfs utilization with boost margin
*/
boost = max(tsk_boost, cpu_boost);
cfs_util = cpu_util + tsk_util;
cfs_util += schedtune_margin(cfs_util, boost);
/* Convert rt class utilization with CPU capacity */
rt_util = get_rt_cpu_capacity(cpu) * capacity_orig_of(cpu);
rt_util = rt_util / SCHED_CAPACITY_SCALE;
/* Convert dl class utilization with CPU capacity */
dl_util = get_dl_cpu_capacity(cpu) * capacity_orig_of(cpu);
dl_util = dl_util / SCHED_CAPACITY_SCALE;
total = cfs_util + rt_util + dl_util;
return total;
}
Total util = CFS util
+= CFS boost margin
+= RT util
+= DL util
ENGINEERS AND DEVICES
WORKING TOGETHER
Optimization target CPU with lower OPP
if (idle_cpu(i)) {
[...]
break;
}
/* Favor CPUs with maximum spare capacity */
if ((capacity_orig - new_util) <
target_max_spare_cap)
continue;
target_max_spare_cap = capacity_orig - new_util;
target_cpu = i;
Find one target CPU with lowest util (includes IDLE
CPUs)
prev_cpu target_cpu
backup_cpu
Find selection
winner
vs
winner
prev_cpu vs
winner
EAS r1.3
Best active CPU
Best IDLE CPU
Based on Joonwoo Park’s patch: “sched/fair:
take CPU energy cost into consideration for
placement”, this patch is in reviewing and not
merged into AOSP.
ENGINEERS AND DEVICES
WORKING TOGETHER
Optimized frequency for video playback (1080p)
EAS r1.3
With optimization
average_freq = 655MHz
average_freq = 609MHz
ENGINEERS AND DEVICES
WORKING TOGETHER
Power comparison (boost%=0)
/dev/cpuset/background/cpus: 0-3
/dev/cpuset/system-background/cpus: 0-3
/dev/stune/foreground/schedtune.boost: 0
/dev/stune/foreground/schedtune.prefer_idle: 1
/dev/stune/top-app/schedtune.boost: 0
/dev/stune/top-app/schedtune.prefer_idle: 1
ENGINEERS AND DEVICES
WORKING TOGETHER
Power and performance comparison (boost%=15)
/dev/cpuset/background/cpus: 0-3
/dev/cpuset/system-background/cpus: 0-3
/dev/stune/foreground/schedtune.boost: 15
/dev/stune/foreground/schedtune.prefer_idle: 1
/dev/stune/top-app/schedtune.boost: 15
/dev/stune/top-app/schedtune.prefer_idle: 1
ENGINEERS AND DEVICES
WORKING TOGETHER
Problems for heavy workload (will discuss in hacking session)
The little cluster stays at 1709MHz and
1844MHz for long time, but these two
OPPs have worse power efficiency than
big cluster low OPP 903MHz. And if the
big task is placed on LITTLE CPU and
trigger “overutilized” flag, this finally harms
performance and power.
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Overview
● Current status of EAS r1.3 on Hikey960
● Testing methodology
● Optimization with CGroup
● Observed issues and proposed patches
● Conclusion
ENGINEERS AND DEVICES
WORKING TOGETHER
Conclusion
● EAS on Hikey960 is working well for EAS development; suggest to use
UEFI+ARM-TF firmwares and latest MCU firmware
● CPUSET/SchedTune is the recommended interface for simple EAS tuning by
device manufacturers for quick optimization
● Investigation of EAS on Hikey960 has shown some good ideas for further
improvements and some of these are being proposed on Google code review
for a future AOSP common kernel
ENGINEERS AND DEVICES
WORKING TOGETHER
Conclusion - cont.
● Proposed optimization
○ Firstly we need consistent method for CPU utilization estimation which includes utilization
contributed by RT and DL tasks
○ The power optimization can be achieved by optimization the active CPU frequency; due it has
side effects for spreading tasks to more spare capacity CPU so also can benefit some
interactive cases (Optimize frequency for the IDLE CPU?)
○ Using cpumask to give more chance for energy calculation, it can significantly save power for
interactive cases and sustainable cases (Will discuss in hacking session)
ENGINEERS AND DEVICES
WORKING TOGETHER
Related materials
● Workload-automation scripts
○ Add support for Hikey960 board
○ Add support for flashing kernel/dtb images
○ Add support for fan as cooling device
○ Add workload Uibench test case
○ Add workload browserfling galleryfling emailfling
○ Result comparison and plot results for different testing configurations
● Kernel patches on gerrit but need to rebase on EAS r1.4
○ https://android-review.googlesource.com/#/c/kernel/common/+/453616/
Thank You
#SFO17
SFO17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

More Related Content

More from Linaro

OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramLinaro
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNLinaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...Linaro
 

More from Linaro (20)

OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 

Recently uploaded

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 

Recently uploaded (20)

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 

EASv1.3 Profiling On Hikey960 - SFO17-TR05

  • 1. SFO17-TR05: EAS Profiling On Hikey960 Leo Yan & Daniel Thompson Linaro Solutions and Support Engineering
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER Overview ● Current status of EAS r1.3 on Hikey960 ○ EAS in Android common kernel ○ Hikey960 power management introduction ○ Power modeling on Hikey960 ● Testing methodology ● Optimization with CGroup ● Observed issues and proposed patches ● Conclusion
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER EAS in Android common kernel ● EAS r1.3 is merged into Android common kernel and Hikey kernel ○ Many optimizations have been merged in this version! ○ EAS r1.3 “has done a large refactoring of find_best_target” to rework wakeup, simplify the decision making and make heuristics clearer ○ EAS r1.3 makes ‘schedutil’ the recommended CPUFreq governor ○ EAS r1.3 will optimize small tasks by placing them on LITTLE CPUs most of the time https://android.googlesource.com/kernel/common/: Android common kernel https://android.googlesource.com/kernel/hikey-linaro: Hikey/Hikey960 Android kernel https://android-review.googlesource.com: patches committing and merging EAS r1.3 EAS r1.3 EAS r1.x EAS r1.x
  • 4. ENGINEERS AND DEVICES WORKING TOGETHER Hikey960 Power Management Introduction CA53 OPPs CA73 OPPs 1844MHz 1709MHz 1402MHz 999MHz 533MHz 2362MHz 2112MHz 1805MHz 1421MHz 903MHz DDR OPPs 1860MHz 1244MHz 1067MHz 685MHz 400MHz GPU OPPs 960MHz 807MHz 533MHz 400MHz 178MHz 1037MHz MCU CPUFreq & Devfreq Thermal Framework CPU cooling device Dev cooling device CPUIdle Low power states MCU detects 85’C as alarming point and limits the CPU/GPU/DDR OPPs as last resort GPU mali driver and VPU driver have been enabled (VPU driver is WIP for AOSP repo)
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER Power modeling on Hikey960 Two highlights on modeling: - CA73 high frequency has much worse power efficiency; - CA73 low frequency and CA53 high frequency have overlap for power efficiency.
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER Overview ● Current status of EAS r1.3 on Hikey960 ● Testing methodology ○ Automatic testing framework ○ Testing suites ○ Temperature is critical for fair game! ○ Measurement metrics ● Optimization with CGroup ● Observed issues and proposed patches ● Conclusion
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Automatic testing framework Workload Automation USB-Relay [1] Fan Hikey960 Charts 1. Burn images (mainly for kernel) 2. Cool down device with fan 3. Run test cases 4. Output results [1] https://github.com/daniel-thompson/usb-relay ARM Energy Probe
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER Workload automation ● “Workload Automation (WA) is a framework for executing workloads and collecting measurements on Android and Linux devices.” [1] ○ WA captures energy instrument data ○ WA captures performance metrics ○ WA captures ftrace log ○ WA captures kernel statistics (interrupt numbers during test case running) ● WA can run test cases for power scenarios/interactive performance/intensive performance testing for multiple iterations in single one testing configuration ● WA absents some features but we can easily extend it [2] ○ Support multiple kernel burning ○ Support kernel scheduler statistics ○ Support result comparison with different configurations ● Co-works with LISA for detailed scheduler analysis [1] https://github.com/ARM-software/workload-automation [2] https://github.com/Leo-Yan/workload-automation
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Android Home screen: on WIFI: off Many peripherals and devices are powered on, so baseline power is high (~4w) Configuration for baseline testing Workload agenda - Fix DDR@685MHz and GPU@960MHz; - Disable thermal capping by set high sustainable value (but MCU capping still works); - use “sched-freq” governor and up threshold is 500us and down threshold is 50ms; - use PELT signals due it’s more reliable for energy modeling
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Temperature is critical for a fair game! Ambient temperature: 26.5’C ~ 27.5’C SoC start temperature: 43.5’C; takes 2~5 minutes to reach start temperature The fan is used to accelerate cooling down time but it is stopped after cool down to specific temperature (e.g. 43.5'C). So we can avoid the fan to interfere the testing and give more penalty for power consumption when SoC temperature increases and has more close behavior with the real production.
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER Test suites ● Power saving cases ○ “idle” - sleep for 120s ○ “audio” - playback mp3 for 120s ○ “video” - playback video (1080p) for 120s with hardware codec ● Interactive cases ○ UiBench ■ “InflatingListActivity” - Scroll the screen for list widget ■ “InvalidateActivity” - Reverse color for small grids, invalidate and redraw view ■ “ActivityTransition” - Click screen to zoom in and zoom out picture ■ “TrivialAnimationActivity” - Animation for rendering whole screen color ○ “Galleryfling” - Launch gallery with hundreds pictures and fling pictures ○ “Recentfling” - Open recents screen and fling recent tasks list ○ “Emailfling” - Open email app and fling emails list ○ “Browserfling” - Open browser icecat and fling the web page
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER Measurement metrics # Test cases Scheduler Statistics Power Metrics Performance Metrics 1 idle Wake up path statistics mW = Energy (J) / Time (S) n/a 2 audio Wake up path statistics mW n/a 3 video Wake up path statistics mW n/a 4 galleryfling Wake up path statistics mW janks% = janks / total_frame 5 recentfling Wake up path statistics mW janks% 6 emailfling Wake up path statistics mW janks% 7 UiBench InflatingListActivity Wake up path statistics mW janks% 8 InvalidateActivity Wake up path statistics mW janks% 9 ActivityTransition Wake up path statistics mW janks% 10 TrivialAnimationActivity Wake up path statistics mW janks%
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER Overview ● Current status of EAS r1.3 on Hikey960 ● Testing methodology ● Optimization with CGroup ○ Using CPUSET for background tasks ○ How to set schedtune’s ‘prefer_idle’? ○ Select boost margin for interactive cases ● Observed issues and proposed patches ● Conclusion
  • 14. ENGINEERS AND DEVICES WORKING TOGETHER Control groups (CGroups) in Android system ● “Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour… mainly for resource-tracking purposes”: https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt ● A cgroup associates a set of tasks with a set of parameters ○ CPUCtl: restrict CPU time and keeps background tasks to only small portion bandwidth ■ cpu.shares ■ cpu.rt_runtime_us ■ cpu.rt_period_us ○ CPUSet: pin tasks to specified CPUs ■ cpus: CPU affinity for tasks ○ Schedtune: speedup the time-to-completion for a task activation ■ boost: inflate task and CPU utilization and impact OPP selection ■ prefer_idle: select idle CPUs for task to avoid scheduling latency ● Init scripts can statically set cgroups parameters ● Android PowerHAL can dynamically set cgroups parameters for different scenarios ● PowerHAL is disabled in this slides and use WA script to statically set parameters
  • 15. ENGINEERS AND DEVICES WORKING TOGETHER Using CPUSET for background tasks affinity runtime_params: sysfile_values: /dev/cpuset/foreground/cpus: 0-7 /dev/cpuset/top-app/cpus: 0-7 /dev/cpuset/background/cpus: 0-3 /dev/cpuset/system-background/cpus: 0-3 /dev/stune/schedtune.boost: 0 /dev/stune/schedtune.prefer_idle: 0 /dev/stune/background/schedtune.boost: 0 /dev/stune/background/schedtune.prefer_idle: 0 /dev/stune/foreground/schedtune.boost: 0 /dev/stune/foreground/schedtune.prefer_idle: 0 /dev/stune/top-app/schedtune.boost: 0 /dev/stune/top-app/schedtune.prefer_idle: 0 CPUSET knobs Schedtune knobs Using CPUSET to set cpus to ‘0-3’ so can pin background tasks always run on LITTLE CPUs.
  • 16. ENGINEERS AND DEVICES WORKING TOGETHER Task placement optimization with setting CPUSET 0~3 for background tasks Compared baseline kernel with CPUSET 0-7, we can optimization task migration onto LITTLE CPUs by pinning background tasks with CPUSET 0-3.
  • 17. ENGINEERS AND DEVICES WORKING TOGETHER Understanding ‘prefer_idle’ ● ‘prefer_idle’ is indicator to ask scheduler to select idle CPU for ‘prefer_idle’ tasks as possible; it’s mainly used for multi-threading concurrency performance boosting ● Select idle CPUs with specific orders and impacts power heavily ○ When boost > 0, the big cluster is firstly iterated so ‘prefer_idle’ tasks have much more chance to be migrated on big CPUs ○ When boost = 0, tries to find IDLE CPU from LITTLE cluster firstly; ‘prefer_idle’ tasks can be well spreaded within LITTLE cluster, this results in power saving for middle workload scenario (e.g. video playback) ○ Iteration starts from low ID for CPU, so low ID CPU has higher priority ● Using CPUSET + ‘prefer_idle’ can guarantee latency for important tasks (top-app)
  • 18. ENGINEERS AND DEVICES WORKING TOGETHER Caveats for ‘prefer_idle’ Assumption: ‘prefer_idle’ is considered to spread tasks and it doesn’t take account the energy when spread tasks to big CPUs; ‘prefer_idle’ enabling may introduce higher power by migration tasks to big CPUs. So we usually avoid to enable ‘prefer_idle’ for power sensitive scenarios. In fact ‘prefer_idle’ co-works with find_best_target() underlying. find_best_target() tries to pack small tasks onto single CPU, and ‘prefer_idle’ can ease the packing when ‘prefer_idle’ can spread within LITTLE cluster and reduce CPU frequency; we can save power by ‘prefer_idle’ for some middle workload case. /dev/stune/foreground/schedtune.prefer_idle: 0 /dev/stune/top-app/schedtune.prefer_idle: 1 /dev/stune/foreground/schedtune.prefer_idle: 1 /dev/stune/top-app/schedtune.prefer_idle: 1 average_freq = 699MHz average_freq = 655MHz
  • 19. ENGINEERS AND DEVICES WORKING TOGETHER Caveats for ‘prefer_idle’ - cont. Caveat #2: Don’t use prefer_idle too much; otherwise it has more chance to contend IDLE CPUs with ‘prefer_idle’ tasks, and prefer_idle tasks also contend with non prefer-idle small tasks or even RT threads; as result there have more chance to migrate ‘prefer_idle’ tasks onto big CPUs, so at the end cannot save power.
  • 20. ENGINEERS AND DEVICES WORKING TOGETHER Boost margin comparison for interactive cases “janks%” is performance metric and is lower is better; with boost margin = 15% or 20%, the janks% can be optimized to less than 10% for most cases; after setting boost = 25% or 30% the janks% is regressed by thermal throttling. Choosing 15% boost margin in this slides for interactive cases.
  • 21. ENGINEERS AND DEVICES WORKING TOGETHER CPUSET and schedtune settings Used for power saving scenarios runtime_params: sysfile_values: /dev/cpuset/foreground/cpus: 0-7 /dev/cpuset/top-app/cpus: 0-7 /dev/cpuset/background/cpus: 0-3 /dev/cpuset/system-background/cpus: 0-3 /dev/stune/schedtune.boost: 0 /dev/stune/schedtune.prefer_idle: 0 /dev/stune/background/schedtune.boost: 0 /dev/stune/background/schedtune.prefer_idle: 0 /dev/stune/foreground/schedtune.boost: 0 /dev/stune/foreground/schedtune.prefer_idle: 1 /dev/stune/top-app/schedtune.boost: 0 /dev/stune/top-app/schedtune.prefer_idle: 1 Used for interactive scenarios runtime_params: sysfile_values: /dev/cpuset/foreground/cpus: 0-7 /dev/cpuset/top-app/cpus: 0-7 /dev/cpuset/background/cpus: 0-3 /dev/cpuset/system-background/cpus: 0-3 /dev/stune/schedtune.boost: 0 /dev/stune/schedtune.prefer_idle: 0 /dev/stune/background/schedtune.boost: 0 /dev/stune/background/schedtune.prefer_idle: 0 /dev/stune/foreground/schedtune.boost: 15 /dev/stune/foreground/schedtune.prefer_idle: 1 /dev/stune/top-app/schedtune.boost: 15 /dev/stune/top-app/schedtune.prefer_idle: 1 Power hints can dynamically switch the settings between different scenarios.
  • 22. ENGINEERS AND DEVICES WORKING TOGETHER Power comparison (boost%=0) Kernel1_baseline: /dev/cpuset/background/cpus: 0-7 /dev/cpuset/system-background/cpus: 0-7 /dev/stune/foreground/schedtune.boost: 0 /dev/stune/foreground/schedtune.prefer_idle: 0 /dev/stune/top-app/schedtune.boost: 0 /dev/stune/top-app/schedtune.prefer_idle: 0 Kernel1_baseline_bst00: /dev/cpuset/background/cpus: 0-3 /dev/cpuset/system-background/cpus: 0-3 /dev/stune/foreground/schedtune.boost: 0 /dev/stune/foreground/schedtune.prefer_idle: 1 /dev/stune/top-app/schedtune.boost: 0 /dev/stune/top-app/schedtune.prefer_idle: 1
  • 23. ENGINEERS AND DEVICES WORKING TOGETHER Power and performance comparison (boost%=15) Kernel1_baseline: /dev/cpuset/background/cpus: 0-7 /dev/cpuset/system-background/cpus: 0-7 /dev/stune/foreground/schedtune.boost: 0 /dev/stune/foreground/schedtune.prefer_idle: 0 /dev/stune/top-app/schedtune.boost: 0 /dev/stune/top-app/schedtune.prefer_idle: 0 Kernel1_baseline_bst15: /dev/cpuset/background/cpus: 0-3 /dev/cpuset/system-background/cpus: 0-3 /dev/stune/foreground/schedtune.boost: 15 /dev/stune/foreground/schedtune.prefer_idle: 1 /dev/stune/top-app/schedtune.boost: 15 /dev/stune/top-app/schedtune.prefer_idle: 1
  • 24. ENGINEERS AND DEVICES WORKING TOGETHER Overview ● Current status of EAS r1.3 on Hikey960 ● Testing methodology ● Optimization with CGroup ● Observed issues and proposed patches ○ Optimization methodology for EAS core algorithms ○ Optimization with active CPU and IDLE CPU frequency ○ Comparison energy with cpumask (in hacking session) ● Conclusion
  • 25. ENGINEERS AND DEVICES WORKING TOGETHER Methodology for optimization in this chapter The first step: We can start to analyze the CPU frequency selection for some power sensitive cases (e.g. video/audio playback etc); if there have some scenarios with CPU frequency is overrated, it’s good start point for analysis. The second step: Based on the first step optimization, we can explore optimization for task placement crossing different clusters; this mean we don’t merely optimize small tasks placed on LITTLE CPUs, but also need find better occasions to place big workload tasks to big CPUs.
  • 26. ENGINEERS AND DEVICES WORKING TOGETHER Video playback (1080p) frequency analysis The big cluster can stay at lowest OPP due video playback has no big workload tasks, and the LITTLE cluster frequency often oscillates between 533MHz and 990MHz; the CPU utilization is about ~100 so in theory LITTLE CPU should stay at 533MHz.
  • 27. ENGINEERS AND DEVICES WORKING TOGETHER RT threads util is high in video playback case find_best_target() uses CFS class ‘util’ (red signal) to select candidate CPU, the CPUFreq governor uses all scheduling classes ‘util’ (blue signal); due the RT class ‘util’ (green signal) is big value but it’s missed by find_best_target() results in find_best_target() cannot select CPU with lowest OPP.
  • 28. ENGINEERS AND DEVICES WORKING TOGETHER Optimization for CPU selection with lower frequency prev_cpu target_cpu backup_cpu Find selection winner vs winner prev_cpu vs winner EAS r1.3 Proposed optimization Best active CPU Best IDLE CPU Consistent CPU utilization estimation crosses find_best_target(), energy calculation and CPUFreq governor. Change “target CPU” as CPU least util CPU (and includes IDLE CPU).
  • 29. ENGINEERS AND DEVICES WORKING TOGETHER Function calc_total_util() introduced for total utilization estimation static unsigned long calc_total_util(int cpu, struct task_struct *p, unsigned long cpu_util, unsigned long tsk_util) { unsigned long cfs_util, rt_util, dl_util, total; int boost; int tsk_boost = p ? schedtune_task_boost(p) : 0; int cpu_boost = schedtune_cpu_boost(cpu); /* * Estimate cfs utilization with boost margin */ boost = max(tsk_boost, cpu_boost); cfs_util = cpu_util + tsk_util; cfs_util += schedtune_margin(cfs_util, boost); /* Convert rt class utilization with CPU capacity */ rt_util = get_rt_cpu_capacity(cpu) * capacity_orig_of(cpu); rt_util = rt_util / SCHED_CAPACITY_SCALE; /* Convert dl class utilization with CPU capacity */ dl_util = get_dl_cpu_capacity(cpu) * capacity_orig_of(cpu); dl_util = dl_util / SCHED_CAPACITY_SCALE; total = cfs_util + rt_util + dl_util; return total; } Total util = CFS util += CFS boost margin += RT util += DL util
  • 30. ENGINEERS AND DEVICES WORKING TOGETHER Optimization target CPU with lower OPP if (idle_cpu(i)) { [...] break; } /* Favor CPUs with maximum spare capacity */ if ((capacity_orig - new_util) < target_max_spare_cap) continue; target_max_spare_cap = capacity_orig - new_util; target_cpu = i; Find one target CPU with lowest util (includes IDLE CPUs) prev_cpu target_cpu backup_cpu Find selection winner vs winner prev_cpu vs winner EAS r1.3 Best active CPU Best IDLE CPU Based on Joonwoo Park’s patch: “sched/fair: take CPU energy cost into consideration for placement”, this patch is in reviewing and not merged into AOSP.
  • 31. ENGINEERS AND DEVICES WORKING TOGETHER Optimized frequency for video playback (1080p) EAS r1.3 With optimization average_freq = 655MHz average_freq = 609MHz
  • 32. ENGINEERS AND DEVICES WORKING TOGETHER Power comparison (boost%=0) /dev/cpuset/background/cpus: 0-3 /dev/cpuset/system-background/cpus: 0-3 /dev/stune/foreground/schedtune.boost: 0 /dev/stune/foreground/schedtune.prefer_idle: 1 /dev/stune/top-app/schedtune.boost: 0 /dev/stune/top-app/schedtune.prefer_idle: 1
  • 33. ENGINEERS AND DEVICES WORKING TOGETHER Power and performance comparison (boost%=15) /dev/cpuset/background/cpus: 0-3 /dev/cpuset/system-background/cpus: 0-3 /dev/stune/foreground/schedtune.boost: 15 /dev/stune/foreground/schedtune.prefer_idle: 1 /dev/stune/top-app/schedtune.boost: 15 /dev/stune/top-app/schedtune.prefer_idle: 1
  • 34. ENGINEERS AND DEVICES WORKING TOGETHER Problems for heavy workload (will discuss in hacking session) The little cluster stays at 1709MHz and 1844MHz for long time, but these two OPPs have worse power efficiency than big cluster low OPP 903MHz. And if the big task is placed on LITTLE CPU and trigger “overutilized” flag, this finally harms performance and power.
  • 35. ENGINEERS AND DEVICES WORKING TOGETHER Overview ● Current status of EAS r1.3 on Hikey960 ● Testing methodology ● Optimization with CGroup ● Observed issues and proposed patches ● Conclusion
  • 36. ENGINEERS AND DEVICES WORKING TOGETHER Conclusion ● EAS on Hikey960 is working well for EAS development; suggest to use UEFI+ARM-TF firmwares and latest MCU firmware ● CPUSET/SchedTune is the recommended interface for simple EAS tuning by device manufacturers for quick optimization ● Investigation of EAS on Hikey960 has shown some good ideas for further improvements and some of these are being proposed on Google code review for a future AOSP common kernel
  • 37. ENGINEERS AND DEVICES WORKING TOGETHER Conclusion - cont. ● Proposed optimization ○ Firstly we need consistent method for CPU utilization estimation which includes utilization contributed by RT and DL tasks ○ The power optimization can be achieved by optimization the active CPU frequency; due it has side effects for spreading tasks to more spare capacity CPU so also can benefit some interactive cases (Optimize frequency for the IDLE CPU?) ○ Using cpumask to give more chance for energy calculation, it can significantly save power for interactive cases and sustainable cases (Will discuss in hacking session)
  • 38. ENGINEERS AND DEVICES WORKING TOGETHER Related materials ● Workload-automation scripts ○ Add support for Hikey960 board ○ Add support for flashing kernel/dtb images ○ Add support for fan as cooling device ○ Add workload Uibench test case ○ Add workload browserfling galleryfling emailfling ○ Result comparison and plot results for different testing configurations ● Kernel patches on gerrit but need to rebase on EAS r1.4 ○ https://android-review.googlesource.com/#/c/kernel/common/+/453616/
  • 39. Thank You #SFO17 SFO17 keynotes and videos on: connect.linaro.org For further information: www.linaro.org