LCA13: Who Disturbs My Slumber
More about Linaro Connect: connect.linaro.org
More about Linaro: www.linaro.org/about/
More about Linaro engineering: www.linaro.org/engineering/
Hong-Kong (LCE13)
EUROPE 2012 (LCE12)
www.linaro.org
Introduction
Power management is a broad area:
power-aware scheduler, P-states, C-states, optimizations for idling the
CPU, regulators, ...
We will focus on the sources of wake-ups on the ARM architecture
Introduction
“Idle” is the opposite of “running”; an event marks the switch between the two
We want a CPU to be idle as much as possible without
impacting the performance of the system
The cpuidle framework decides which idle state to enter based
on the predictable events on the system
The depth of the sleep depends on the next event on the system
What is this event? An interrupt.
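The link between sleep depth and the next event can be illustrated with a toy model. This is only a sketch: the state names, residencies, and exit latencies are hypothetical and do not come from any real SoC, and the selection rule is a simplification of what a cpuidle governor does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    target_residency_us: int  # minimum idle time for the state to pay off
    exit_latency_us: int      # time needed to become runnable again


# Hypothetical idle states, ordered from shallowest to deepest.
STATES = [
    IdleState("WFI", 1, 1),
    IdleState("retention", 500, 100),
    IdleState("power-down", 5000, 1500),
]


def select_state(next_event_us, latency_limit_us):
    """Pick the deepest state that fits before the next expected event
    and still respects the tolerated wake-up latency."""
    chosen = STATES[0]
    for state in STATES:
        if (state.target_residency_us <= next_event_us
                and state.exit_latency_us <= latency_limit_us):
            chosen = state
    return chosen
```

With these made-up numbers, an event expected in 200 µs only allows the shallowest state, while an event 10 ms away allows the deepest one, unless the latency limit rules it out.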
Idle vs Running
When there are no more tasks to run, the scheduler chooses
the special idle task
This task is an infinite loop that enters and exits the
arch-specific idle function
When an interrupt occurs, the function exits and the idle
task either yields to a runnable task or re-enters the idle function
Package Idle
The deep idle state can be reached only when all CPUs are
idle
One CPU exiting its idle state forces the package to exit
its idle state as well
Managing idle states in software is challenging on
multicore SoCs
From where?
Hardware interrupts:
network, keyboard, mouse, timer, MMC, USB, serial, ...
Inter-Processor Interrupts (IPIs): widely used today, they are tricky
to optimize
The architecture differs across SoC vendors, but IPI handling
is largely similar
Hardware interrupt
Hardware interrupts can happen at arbitrary times
The system can be tuned depending on the hardware
One interrupt is interesting because it is deterministic: the timer
Hardware interrupt
Interrupt   Deterministic   Comment
Network     No              The network stack switches to polling under high traffic
Keyboard    No
Mouse       No
MMC         No
Timer       Yes             Two kinds of timer
USB         No
Serial      No
Timers
A complex infrastructure that serves most of the kernel and
provides time services to userspace programs
Two kinds of timer:
Per-CPU timer
Global timer
Timers
Timer Watchdog (twd):
The timer is local to the CPU
Also known as the “local timer”
It stops when the processor logic is shut down
Timer device (architecture dependent):
Less accurate (e.g. 32 kHz)
Usually outside the CPU’s power domain
Used as a backup when the CPUs are sleeping
Timers
Used by the kernel for:
Network stack, especially the TCP/IP protocol
I/O timeouts
Delayed work queues
Scheduling tasks
...
Used by userspace for:
Asynchronous I/O and timeouts
Timed locks in threads
Main loops (poll, epoll, select)
POSIX timers
...
In other words: timers are used widely throughout the system
IPI : Inter Processor Interrupt
A software-generated interrupt (SGI)
Limited to 16 on ARM with the GIC
5 used nowadays
IPI0 : defined but unused so far
IPI1 : timer broadcast
IPI2 : Rescheduling interrupt
IPI3 : Function Call interrupt
IPI4 : Single Function Call interrupt
IPI5 : Cpu Stop interrupt
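On a running Linux system, per-CPU interrupt counters, including the IPI lines, are exposed in /proc/interrupts. The sketch below compares two snapshots of that file to show which sources fired in between. The two sample snapshots are made up for illustration, and the parser assumes the usual layout (one header line with a column per CPU, then one `label: counts... description` line per source).

```python
# Illustrative snapshots only; the numbers are invented.
SAMPLE_BEFORE = """\
           CPU0       CPU1
 29:       1200        900       GIC  twd
IPI1:         10         40  Timer broadcast interrupts
IPI2:        300        500  Rescheduling interrupts
"""

SAMPLE_AFTER = """\
           CPU0       CPU1
 29:       1500       1000       GIC  twd
IPI1:         10         65  Timer broadcast interrupts
IPI2:        320        700  Rescheduling interrupts
"""


def parse_interrupts(text):
    """Return {label: total count over all CPUs}."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header has one column per online CPU
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, _, rest = line.partition(":")
        total = 0
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the textual description
            total += int(field)
        totals[label.strip()] = total
    return totals


def wakeup_delta(before, after):
    """Return the sources whose counters changed between two snapshots."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return {k: new[k] - old.get(k, 0) for k in new if new[k] != old.get(k, 0)}
```

On a live system the two snapshots would come from reading /proc/interrupts twice, a few seconds apart, while the workload of interest is running.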
IPI1 : Timer broadcast (1/3)
Occurs only on SMP systems where cpuidle has at least a retention-mode
state
The broadcast interrupt can occur on any CPU:
idle or not
concerned by the timer expiration or not
Could be optimized with dynamic timer IRQ affinity:
https://lkml.org/lkml/2013/2/19/555
A summary of how timer broadcast works:
https://lkml.org/lkml/2013/2/20/216
IPI1 : Timer broadcast (2/3)
[1] : the cpuidle driver tells the time framework that the local timer is no longer a valid source
[2] : the time framework changes the source and programs the timer device
[3] / [3'] : the cpuidle driver powers down the CPUs
IPI1 : Timer broadcast (3/3)
[1] : the timer expires, raising an interrupt
[2] : the CPU is woken up
[3] : the CPU handles the timer callback and, through the time framework, …
[4] … sends an IPI to the CPUs that are concerned by the expiration of this timer
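Steps [3] and [4] can be sketched as a small function. This is a simplified model, not kernel code: `next_events` is a hypothetical map from each sleeping CPU to the expiry of its next timer, and the function returns which CPUs should receive the broadcast IPI and when the timer device should be programmed next.

```python
def broadcast_expiry(next_events, now):
    """Model of the broadcast handler run by the woken CPU.

    next_events: {cpu: expiry time of that CPU's next timer}
    Returns (CPUs to send an IPI to, next expiry to program or None).
    """
    expired = sorted(cpu for cpu, t in next_events.items() if t <= now)
    pending = [t for t in next_events.values() if t > now]
    return expired, (min(pending) if pending else None)
```

For example, with CPUs 1 and 3 both waiting for time 100 and CPU 2 waiting for time 250, a broadcast at time 100 targets CPUs 1 and 3 and reprograms the device for 250; CPU 2 is left asleep.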
IPI2 : Rescheduling interrupt (1/4)
Used by the scheduler to wake up a processor in order to
run a task
Could happen for different reasons:
Load balancing / process migration
An event occurs for a specific task
I/O (e.g. an incoming network packet)
A lock has been released for a blocked task
Signal delivery
IPI2 : Rescheduling interrupt (3/4)
The application on CPU0 releases
the lock
The scheduler decides to
wake up CPU1 ...
… and sends an IPI to it
IPI3 : Function Call interrupt
Used to invoke a function remotely on all the CPUs
when the code must run in each processor's context:
Setting the timer frequency
Setting up the clock event notify framework
It rarely happens on ARM systems, but often on x86
systems in the form of “TLB shootdowns”, which happen
at fork time
How to reduce the number of interrupts ?
Understand the framework and identify the unnecessary wake-ups
Deferrable timers
http://lwn.net/Articles/228143/
Round jiffies
http://lkml.org/lkml/2006/10/10/189
RCU no callback
http://lwn.net/Articles/522262
Timer and workqueue migration to non-idle CPUs
https://lkml.org/lkml/2012/9/27/188
Deferrable timers
Used for non-critical timers: their expiration is postponed until
the CPU wakes up
How to use them?
Identify non-critical timers
Flag them as deferrable
They will be handled when the CPU wakes up for another reason
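The effect of the deferrable flag on an idle CPU can be shown with a small simulation. This is a userspace model, not the kernel implementation: timers are hypothetical `(expiry, deferrable)` pairs, and the CPU is assumed to stay idle except when a non-deferrable timer wakes it.

```python
def idle_wakeups(timers):
    """Simulate an otherwise idle CPU.

    timers: list of (expiry, deferrable) pairs.
    Returns (schedule, pending) where schedule is a list of
    (wake_time, [deferrable expiries handled at that wake-up]) and
    pending holds deferrable timers still waiting for a wake-up.
    """
    hard = sorted({t for t, deferrable in timers if not deferrable})
    soft = sorted(t for t, deferrable in timers if deferrable)
    schedule = []
    for wake in hard:
        due = [t for t in soft if t <= wake]    # already expired, run them now
        soft = [t for t in soft if t > wake]
        schedule.append((wake, due))
    return schedule, soft
```

With five timers of which three are deferrable, the model produces two wake-ups instead of five; the deferrable timers simply run late, at the next wake-up caused by something else.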
Round jiffies
Used to group timers so that they expire in the same time slot
How to use it?
Identify timers that do not need a precise timeout
Group them into the same time slot by using round_jiffies
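The grouping can be illustrated with a simplified Python model of the rounding rule: expirations are moved to a whole-second boundary, with a small per-CPU skew so that not all CPUs fire at the same instant, and a rounded value that would land in the past is discarded. `HZ = 100` is an assumption for the example, and the function name is hypothetical; in the kernel the equivalent call is made on the jiffies value passed to the timer.

```python
HZ = 100  # assumed tick rate: 100 jiffies per second


def round_jiffies_model(j, now, cpu=0):
    """Round expiry j (in jiffies) to a whole-second slot."""
    original = j
    j += cpu * 3             # per-CPU skew
    rem = j % HZ
    if rem < HZ // 4:
        j -= rem             # just past a second boundary: round down
    else:
        j += HZ - rem        # otherwise round up to the next second
    j -= cpu * 3             # remove the skew again
    return j if j > now else original  # never return a time in the past
```

Three timers due at jiffies 1040, 1090, and 1120 all end up at 1100, so the CPU wakes once instead of three times.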
Misc improvements
The kernel cannot do everything; userspace applications
have to be improved for power management
Applications could group their timers to expire at the same
time when high precision is not needed
A round_jiffies-like framework?
A specific flag for timer syscalls, in order to make them deferrable?
prctl: PR_SET_TIMERSLACK
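Timer slack is already available to applications today. The sketch below sets it from Python by calling prctl through ctypes; it is Linux-only, the helper names are hypothetical, and the constants 29 and 30 are the values of PR_SET_TIMERSLACK and PR_GET_TIMERSLACK in <linux/prctl.h>. The slack applies to the calling thread and tells the kernel how far it may delay that thread's timer expirations in order to group them.

```python
import ctypes
import os

PR_SET_TIMERSLACK = 29  # values from <linux/prctl.h>
PR_GET_TIMERSLACK = 30

_libc = ctypes.CDLL(None, use_errno=True)


def set_timer_slack_ns(slack_ns):
    """Allow the kernel to delay this thread's timers by up to slack_ns."""
    zero = ctypes.c_ulong(0)
    rc = _libc.prctl(PR_SET_TIMERSLACK, ctypes.c_ulong(slack_ns),
                     zero, zero, zero)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_timer_slack_ns():
    """Return the current timer slack of the calling thread."""
    zero = ctypes.c_ulong(0)
    return _libc.prctl(PR_GET_TIMERSLACK, zero, zero, zero, zero)
```

A background task that only needs millisecond accuracy could call `set_timer_slack_ns(1_000_000)` once at start-up.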
Misc improvements
Kill all busy loops and replace them with an event-based
main loop
timerfd, signalfd, eventfd, etc.
Increase the MTU size on private networks when possible
Use CPU affinity when multiple tasks have to run
sequentially
Put pressure on developers to fix their applications and make
them more power aware; the kernel cannot compensate for bad
applications
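As an illustration of the first point, here is a minimal sketch of an event-based wait that replaces a polling busy loop. It uses Python's selectors module, and a plain pipe stands in for the descriptor; a timerfd, signalfd, or eventfd descriptor can be waited on in the same way. The function name is hypothetical.

```python
import os
import selectors


def wait_for_event(read_fd, timeout_s):
    """Sleep until read_fd is readable or the timeout expires.

    The process is blocked in the kernel while waiting, so the CPU
    can go idle instead of spinning on a flag.
    """
    sel = selectors.DefaultSelector()
    sel.register(read_fd, selectors.EVENT_READ)
    try:
        ready = sel.select(timeout_s)
    finally:
        sel.close()
    if ready:
        return os.read(read_fd, 4096)
    return None  # timed out, nothing happened
```

A producer writes to the other end of the pipe to wake the waiter; with no writer, the call returns `None` after the timeout without having consumed CPU time in between.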