Building Linux-based low-latency audio processing software for today’s multi-core devices can be cumbersome. I’ll present some of our ongoing research on the topic at the Real-Time Systems Lab of Scuola Superiore Sant’Anna, focusing on sound synthesis on Android, where power efficiency is a must.
The talk provides background on how the Linux audio sub-system works, in terms of the interactions between the Linux kernel and the ALSA sound architecture, including how user-space applications normally cope with low-latency requirements, and briefly touches on the design concepts behind the JACK low-latency framework. It then covers the peculiarities of the Android audio processing pipeline and the complications that arise on mobile, power-constrained devices. Throughout the talk, I’ll touch upon the concepts behind our research on the topic, describing how properly designed real-time CPU scheduling strategies can make a difference in what is achievable in this area.
2. About me
LinuxLab 2018 T. Cucinotta – Real-Time Systems Laboratory (RETIS) – 2 / 22
■ 2016-present: Associate Professor at the Real-Time Systems Laboratory (RETIS) of Scuola Superiore Sant’Anna: teaching Component-Based Software Design, Cloud Computing, Big-Data, ...
■ 2014-2016: Software Development Engineer at AWS, improving the real-time performance and scalability of DynamoDB
■ 2012-2014: Researcher at Alcatel-Lucent Bell Labs, investigating security and real-time performance of cloud infrastructures with a focus on IMS and NFV
■ 2005-2012: Researcher at the RETIS, investigating adaptive real-time scheduling for multimedia applications on Linux
■ 2001-2004: PhD in Computer Security & Smart-Card Based Authentication, RETIS
3. About the RETIS
■ Belongs to the Institute of Communications, Information and Perception
Technologies of Scuola Superiore Sant’Anna in Pisa
■ Research specialties
◆ predictable execution of software through
■ mechanisms at operating system and kernel level
■ design methodologies and tools
■ performance and timing analysis
◆ provide real-time support for emerging computing platforms
■ multi-core and heterogeneous platforms (big.LITTLE, GPGPU, FPGA)
■ distributed infrastructures for cloud & big-data computing and NFV
◆ make real-time systems resource- and energy-efficient
◆ hard real-time use-cases: automotive, industrial automation, railroads
◆ soft real-time use-cases: multimedia, health-care, telecommunications
4. Introduction
Common multimedia processing case: audio playback and video streaming
■ Works without particular precautions
■ No interactivity or low-latency requirements
■ 100s of ms, or even seconds, of data can be pre-buffered and pre-processed
■ the run-time platform (user-space + kernel) only needs to deliver pre-processed A/V frames to the underlying hardware on time
What about interactivity?
6. Problem
Interactive multimedia processing
■ low latency is required between a user interaction and the moment it is reflected in the output A/V stream
Examples
■ video editing: change filter(s) and/or parameters in a real-time video processing pipeline
■ on-line interactive services: e.g., office automation
■ gaming, VR, AR: user interacts with the environment and/or other users (e.g., multi-player shooting)
■ software-based sound synthesis: user presses one or more instrument keys / controllers
8. Problem
Interactive multimedia processing: how can we achieve low latency?
■ Digital Audio Workstation (DAW)
◆ DSPs do the real-time work
◆ the general-purpose OS and software just take care of configuring the pipeline and its parameters
■ EXPENSIVE! → Software-based solutions
■ “1-system 1-function” paradigm
◆ device dedicated to a single application
◆ nothing else runs with real-time requirements
◆ we can use priorities to minimize interference
11. Real-time audio processing
Commonly found guidelines for low-latency, skip-free interactive audio processing
e.g., from http://jackaudio.org/faq/linux_rt_config.html
■ create a group of users who can gain RT priority
groupadd audio
cat /etc/security/limits.d/99-realtime.conf
@audio - rtprio 99
@audio - memlock unlimited
■ add the unprivileged user to the new group
usermod -a -G audio yourUserID
■ install a “real-time / low-latency” kernel
So, problem solved?
13. What about energy?
Plenty of energy-saving features in the hardware
■ Dynamic Voltage and Frequency Scaling (DVFS)
■ Performance states (P-states), Operating Performance Points (OPP)
■ Core idle states (C-states)
■ Turbo Boosting (hmmm....): spikes the CPU frequency up when/if possible
Useful in a number of cases (both battery-operated and not)
■ laptops, tablets, smartphones
■ desktop PCs, servers
All bad for performance stability!
16. Platform stability
Energy-saving features in the hardware adversely impact performance stability and software predictability
■ DVFS → CPUs run at different frequencies over time
◆ frequency islands: groups of CPUs are constrained to the same frequency
■ P-states → even less control over what frequency the CPU(s) are running at
◆ frequency control in hardware, high-level tunables exposed to software (minPct, maxPct)
■ C-states → time to enter and exit an idle state is variable
◆ entering a deep-idle state is worthwhile only if the CPU stays there for at least a minimum residency time
18. Making the platform stable
How users typically make the computing platform (more) stable/predictable
■ turn off Turbo Boosting
■ disable DVFS (or rather, use it to fix the frequency), e.g.:
◆ performance governor or
◆ userspace governor if/when available
■ fix performance % with P-state driver (minPct=maxPct)
■ inhibit deep-idle states
◆ echo 1 > /sys/devices/system/cpu/cpu<n>/cpuidle/state<s>/disable
◆ echo 1 > /sys/devices/system/cpu/cpu0/power/pm_qos_resume_latency_us
■ or, just run:
◆ yes > /dev/null & [once per CPU, so that no CPU ever goes idle]
■ Bad for energy consumption!
21. Why audio skips
∎ audio burst in playback (top)
∎ fill-level of audio ring buffer (middle)
∎ RT app thread (bottom)
∎ big ring buffer → high latency!
∎ empty ring buffer → audible glitch!
∎ small ring buffer periodically refilled
→ low latency, glitch-free playback!
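The trade-off between buffer size and latency can be put in numbers. The following is a minimal sketch, not part of the original slides: it assumes a 48 kHz sample rate (the slides do not state one), and buffer_latency_us() is a hypothetical helper, not an ALSA or Android API. The latency contributed by a full ring buffer is its size in frames divided by the sample rate.

```c
#include <assert.h>

/* Latency, in microseconds, contributed by a ring buffer that holds
 * `frames` audio frames played back at `sample_rate_hz` frames per second.
 * Hypothetical helper for illustration only. */
static long long buffer_latency_us(long long frames, long long sample_rate_hz)
{
    assert(frames >= 0 && sample_rate_hz > 0);
    /* Integer division truncates towards zero. */
    return frames * 1000000LL / sample_rate_hz;
}
```

Under the 48 kHz assumption, a 128-frame buffer adds about 2.67 ms, while a 1280-frame buffer adds about 26.67 ms. The smaller the buffer, the more often and the more punctually the RT thread has to refill it.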
22. Android audio architecture
Android audio applications
∎ android.media APIs
◇ playing/recording audio files, Internet streaming
◇ use of large buffers (no low-latency use-cases)
◇ regular mixer thread
Low-latency audio applications
∎ native APIs (OpenSL ES, AAudio)
◇ low-latency audio processing
◇ rely on FastMixer and ALSA
∎ critically low-latency
◇ exclusive mode in AAudio / ALSA
24. Android audio architecture
Power management in Android
■ schedutil selects the minimum operating performance point (OPP) able to satisfy demand
■ based on CPU utilization statistics
◆ Per-Entity Load-Tracking (PELT)
■ exponentially weighted task utilization
■ slow to detect workload changes (ramp-up, cool-down)
e.g., it may take 50–100 ms to detect a 90% increase in CPU demand
◆ Window-Assisted Load-Tracking (WALT)
■ max{last window util., avg util. over past N windows}
e.g., over the past three 10 ms windows, we have a 10 ms spike detection latency and a 30 ms cool-down latency
■ it quickly forgets a task’s demand when the task is off the runqueue (rq)
■ WALT is more reactive than PELT, but ... not enough for very dynamic workloads
■ can we improve on that?
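The difference in reactivity between the two trackers can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the kernel implementation: utilization is a number in [0, 1], PELT is modelled as a geometric series updated every 1 ms with a 32 ms half-life, and WALT as the max{last window, average of the past N windows} rule from the slide. All function names are hypothetical.

```c
#include <assert.h>

/* Per-millisecond decay factor, assuming a 32 ms half-life
 * (PELT_Y raised to the 32nd power is 0.5). Assumption of this sketch. */
#define PELT_Y 0.97857206

/* One 1 ms update of an exponentially weighted utilization in [0, 1]. */
static double pelt_step(double util, int running)
{
    return util * PELT_Y + (running ? 1.0 - PELT_Y : 0.0);
}

/* Milliseconds an always-running task needs before its tracked
 * utilization climbs from `start` to at least `target`. */
static int pelt_ramp_ms(double start, double target)
{
    assert(start >= 0.0 && target < 1.0);
    int ms = 0;
    for (double u = start; u < target; u = pelt_step(u, 1))
        ms++;
    return ms;
}

/* WALT-style estimate: the larger of the most recent window and the
 * average of the past n windows; win[n - 1] is the most recent one. */
static double walt_estimate(const double *win, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += win[i];
    double avg = sum / n;
    return win[n - 1] > avg ? win[n - 1] : avg;
}
```

In this model a task that suddenly becomes CPU-bound needs 75 ms before the PELT-style signal reaches 0.8 and 107 ms before it reaches 0.9, whereas the WALT-style estimate jumps to the spike value as soon as one window containing it has elapsed.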
25. SCHED_DEADLINE
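SCHED_DEADLINE from RETIS+Evidence
(ACTORS EU project)
■ mainline since v3.14 (2014)
■ reservation-based scheduling
■ a task is reserved a given runtime within a deadline every period
// struct sched_attr is declared in <linux/sched/types.h> on recent kernel headers;
// where the C library has no sched_setattr() wrapper, use syscall(SYS_sched_setattr, ...)
struct sched_attr attr = {
    .size = sizeof(struct sched_attr),
    .sched_policy = SCHED_DEADLINE,
    .sched_flags = 0, // or SCHED_FLAG_RECLAIM | SCHED_FLAG_RESET_ON_FORK
    .sched_runtime = runtime_us * 1000, // the three time values are in nanoseconds
    .sched_deadline = deadline_us * 1000,
    .sched_period = period_us * 1000
};
if (sched_setattr(0, &attr, 0) < 0) { // pid 0 means the calling thread
    perror("sched_setattr() failed");
    exit(EXIT_FAILURE);
}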
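Before issuing the system call, it can help to sanity-check the three reservation parameters, since the kernel rejects requests that do not satisfy runtime <= deadline <= period. The following is a minimal sketch: struct dl_params and both helpers are hypothetical names introduced here for illustration, and the check deliberately ignores other kernel-side conditions such as admission control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reservation parameters in nanoseconds, mirroring the sched_runtime,
 * sched_deadline and sched_period fields. Hypothetical helper type. */
struct dl_params {
    uint64_t runtime_ns;
    uint64_t deadline_ns;
    uint64_t period_ns;
};

/* Convert microsecond values into a nanosecond parameter set. */
static struct dl_params dl_params_from_us(uint64_t runtime_us,
                                          uint64_t deadline_us,
                                          uint64_t period_us)
{
    struct dl_params p = {
        runtime_us * 1000, deadline_us * 1000, period_us * 1000
    };
    return p;
}

/* Return 1 if 0 < runtime <= deadline <= period, else 0.
 * A period of 0 is treated as "same as the deadline". */
static int dl_params_valid(const struct dl_params *p)
{
    assert(p != NULL);
    uint64_t period = p->period_ns ? p->period_ns : p->deadline_ns;
    return p->runtime_ns > 0 &&
           p->runtime_ns <= p->deadline_ns &&
           p->deadline_ns <= period;
}
```

For instance, a 1000 us runtime with a 2667 us deadline and period passes the check, while a 3000 us runtime with the same deadline does not.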
26. SCHED_DEADLINE
How does SCHED_DEADLINE relate to POSIX RT?
■ any SCHED_DEADLINE task runs before any POSIX RT or CFS task
◆ based on resource reservations (next slide)
◆ throttling safeguard to avoid locking the CPU (can be disabled if needed)
■ any POSIX RT (FIFO/RR) task runs before any CFS task
◆ based on priorities
◆ throttling safeguard to avoid locking the CPU (can be disabled if needed)
■ Completely Fair Scheduler (CFS) tasks run only when no SCHED_DEADLINE or RT task is runnable
◆ based on weights (weighted fair scheduler)
27. SCHED_DEADLINE
Main SCHED_DEADLINE properties
■ based on EDF (optimal on uni-processors) and the (Hard) Constant Bandwidth Server (CBS)
■ temporal isolation: a task’s failure to stay within its runtime doesn’t affect others
■ on multi-processors: anything from G-EDF (tardiness bound) to P-EDF
When a task tries to exceed its runtime
■ it gets throttled (original behavior)
■ it opportunistically gets extra runtime (GRUB), if the RECLAIM flag is used
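The effect of throttling on a task that overruns its runtime can be worked out with a small model. This sketch rests on simplifying assumptions that are not in the slides: the task is the only one on its CPU, its deadline equals its period, the job arrives exactly at the start of a period, the budget is replenished at every period boundary, and no reclaiming (GRUB) takes place. The function name is hypothetical.

```c
#include <assert.h>

/* Completion time (us, counted from the job's arrival) of a job that needs
 * `demand_us` of CPU time under a hard reservation of `runtime_us` every
 * `period_us`. Simplified model, see the assumptions in the text above. */
static long long job_finish_time_us(long long demand_us,
                                    long long runtime_us,
                                    long long period_us)
{
    assert(demand_us > 0 && runtime_us > 0 && runtime_us <= period_us);
    /* Number of periods in which the budget is exhausted before the job
     * completes, so the task is throttled until the next replenishment. */
    long long throttled_periods = (demand_us - 1) / runtime_us;
    /* CPU time still needed in the final period. */
    long long leftover_us = demand_us - throttled_periods * runtime_us;
    return throttled_periods * period_us + leftover_us;
}
```

With a reservation of 2000 us every 10000 us, a job needing 2000 us completes after 2000 us, whereas a job needing 5000 us is throttled twice and completes after 21000 us. Other tasks keep their own reservations in the meantime, which is the temporal isolation property.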
28. SCHED_DEADLINE and schedutil
The OPP chosen by schedutil depends on overall system utilization, which includes:
■ SCHED_DEADLINE tasks’ utilization: runtime / period
Dynamic workload demand changes via sched_setattr():
■ readily accounted for by schedutil
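The link between reservations and frequency selection can be sketched as follows. This is an illustrative model only, not the schedutil code: it sums runtime / period over the SCHED_DEADLINE tasks and picks the lowest frequency whose speed relative to the maximum covers that utilization. The task type, the function names and any frequency table passed in are hypothetical, and the real governor also accounts for other scheduling classes and headroom, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* One reservation: runtime and period in microseconds. Hypothetical type. */
struct dl_task {
    long runtime_us;
    long period_us;
};

/* Total utilization of a task set: sum of runtime / period. */
static double dl_total_utilization(const struct dl_task *tasks, size_t n)
{
    double util = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].period_us > 0);
        util += (double)tasks[i].runtime_us / (double)tasks[i].period_us;
    }
    return util;
}

/* Lowest frequency in an ascending table whose speed relative to the top
 * frequency covers `util`; falls back to the top frequency otherwise. */
static long pick_opp_khz(const long *freqs_khz, size_t n, double util)
{
    assert(n > 0);
    long max_khz = freqs_khz[n - 1];
    for (size_t i = 0; i < n; i++)
        if ((double)freqs_khz[i] >= util * (double)max_khz)
            return freqs_khz[i];
    return max_khz;
}
```

Because the utilization is declared up front through the reservation parameters, a change of demand is visible to the frequency selection as soon as the parameters change, rather than after a tracking signal has ramped up.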
Does it work? Results on a HiKey 960 board:
■ energy-efficient set-up: glitch-free playback at 2.67 ms latency, vs 26.67 ms for mainline Android using SCHED_FIFO and WALT, at the cost of +6.25% power consumption
■ low-latency set-up: at 2.67 ms latency, saved 40% energy w.r.t. mainline Android using SCHED_FIFO and WALT
31. Heterogeneous Architectures
ARM big.LITTLE (and DynamIQ) architectures
■ tasks can migrate among big and LITTLE cores (same
ISA)
■ big cores: high-performance workloads
■ LITTLE cores: energy-efficient workloads
ARM Energy Aware Scheduling (EAS)
■ give kernel awareness of the CPU capacity associated
with big and LITTLE cores
■ give kernel clues as to how capacity of big and LITTLE
cores scales with CPU frequency
■ provide CFS with more informed task placement and
migration decisions
32. Capacity enhancement patches
SCHED_DEADLINE improvements to account for CPU capacity
■ runtime is specified in terms of the fastest CPU at the
fastest frequency
◆ it gets automatically rescaled using the CPU capacity
figures
■ if there’s a choice, prefer LITTLE cores before going to
big ones
■ proper consideration of CPU capacity in schedutil
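The rescaling of the runtime can be illustrated with a small calculation. This is a sketch of the idea, not the patch code: it assumes CPU capacity is normalized so that the fastest CPU at its top frequency has capacity 1024, and that execution speed scales linearly with both capacity and frequency. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Capacity of the fastest CPU at its top frequency (assumed normalization). */
#define CAPACITY_SCALE 1024LL

/* Wall-clock time (us) needed to consume a runtime that was specified for
 * the fastest CPU at the fastest frequency, when actually running on a CPU
 * of capacity `cpu_capacity` clocked at `cur_khz` out of `max_khz`. */
static long long scaled_runtime_us(long long runtime_us, long long cpu_capacity,
                                   long long cur_khz, long long max_khz)
{
    assert(runtime_us >= 0);
    assert(cpu_capacity > 0 && cpu_capacity <= CAPACITY_SCALE);
    assert(cur_khz > 0 && cur_khz <= max_khz);
    long long by_capacity = runtime_us * CAPACITY_SCALE / cpu_capacity;
    return by_capacity * max_khz / cur_khz;
}
```

For example, a 1000 us runtime corresponds to 2000 us on a CPU of capacity 512 at its top frequency, and to 4000 us if that CPU runs at half its top frequency.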
33. Related publications
∎ A. Balsini, Towards Hard and Soft Real-time Operating Systems for Multicore Heterogeneous Architectures, PhD dissertation, 2018
∎ A. Balsini et al., Modeling and simulation of power consumption and execution times for real-time tasks on embedded heterogeneous architectures, EWILI 2018
∎ T. Cucinotta et al., Improving Responsiveness of Time-Sensitive Applications by Exploiting Dynamic Task Dependencies, Wiley SPE 2018
∎ C. Scordino et al., Energy-aware real-time scheduling in the linux kernel, ACM SAC 2018
∎ D. B. de Oliveira et al., Nested Locks in the Lock Implementation: The Real-Time Read-Write Semaphores on Linux, RTSOPS 2018
∎ M. Marinoni et al., Allocation and control of computing resources for real-time Virtual Network Functions, SOFTNETWORKING 2018
∎ T. Cucinotta et al., Adaptive Real-Time Scheduling for Legacy Multimedia Applications, ACM TECS 2012
∎ J. Lelli et al., An Experimental Comparison of Different Real-Time Schedulers on Multicore Systems, Elsevier JSS 2012
∎ T. Cucinotta et al., Virtualised e-Learning on the IRMOS Real-time Cloud, Springer SOCA’12
∎ T. Cucinotta et al., A robust mechanism for adaptive scheduling of multimedia applications, ACM TECS 2011
∎ T. Cucinotta et al., Low-Latency Audio on Linux by Means of Real-Time Scheduling, LAC’11
∎ T. Cucinotta et al., Virtualised e-Learning with Real-Time Guarantees on the IRMOS Platform, IEEE SOCA 2010
∎ T. Cucinotta and L. Palopoli, QoS Control for Pipelines of Tasks Using Multiple Resources, IEEE TOC 2010
∎ L. Palopoli et al., AQuoSA - Adaptive Quality of Service Architecture, Wiley SPE 2008
∎ L. Abeni et al., QoS Management through adaptive reservations, Springer RTSJ 2005
34. Q&A
Thanks for listening!
Questions?
http://retis.santannapisa.it/~tommaso
tommaso.cucinotta@santannapisa.it