With the growing interest in virtualization from big players around the world, more and more companies are choosing ARM SoCs as their target platform for running server environments. Most such SoCs come with a broad range of coprocessors on the die, e.g. GPU, DSP, security engines. At the moment, however, the only ways to speed up guests with these are either a para-virtualized approach or dedicating the HW to a specific guest.
The shared coprocessor framework for Xen aims to let all guest OSes benefit from this companion HW with ease while running unmodified software and/or firmware on the guest side. You don’t need to worry about setting up IO ranges, interrupts, scheduling etc.: it is all covered, which makes supporting new shared HW much faster.
A virtualized GPU will be shown as an example of using the shared coprocessor framework.
3. Team of developers at EPAM Systems Inc., based in Kyiv, Ukraine
We are focused on:
• Xen on ARM
• Automotive use-cases
• Para-virtualized front drivers, backends and managers: sound, display, input
• SoC’s HW virtualization
• TEE integration
• Power management
• FuSa IEC 61508 / ISO 26262 certification
• Yocto based build system for multi-domain distributions
We are upstreaming to Xen Project: see us at https://github.com/xen-troops
Introduction
4. In this talk
1. What is it about?
2. Why would one want to share a coprocessor?
3. Scheduling a virtual coprocessor
4. Configuration approaches
5. IOMMU support
6. Proprietary code and native applications
7. Virtual GPU
5. Rationale
● Not only CPU anymore, but SoC
○ GPUs, multimedia encoders, DSPs, FPGAs… you name it
○ Used to offload processing from CPUs to dedicated HW
● Good for one-OS-does-everything
● We have to isolate parts of the system
○ Split HW blocks between users (if HW allows that, e.g. display)
○ Choose which part uses real HW and which does SW emulation
○ Use para-virtual devices
7. Shared coprocessors
● Why would one bother with sharing coprocessors?
○ Performance and complexity issues with para-virtual devices
■ Memory copying
■ Complex ABI (just imagine para-virtual OpenGL)
○ HW cannot be split
○ Different guests may need to run different FW/driver
○ Multiple domains may benefit from platform’s HW capabilities
● It is always a question what needs to be shared or para-virtualized
8. With shared coprocessors
[Diagram: Xen running Dom0, DomD, a safety domain, in-vehicle infotainment, instrument cluster and driver assistance (ADAS) domains, with display and audio backends. Xen provides vGPU, vMEncoder, vDSP and vFPGA on top of the GPU, media encoders, DSP and FPGA.]
Pictures from: http://www.aa1car.com/library/instrument_cluster.htm
https://www.xda-developers.com/panasonic-automotive-to-build-android-automotive-in-vehicle-infotainment-system-into-fiat-chrysler-vehicles/
10. Shared Coprocessor Framework
• SCF will simplify sharing a coprocessor
• Leave all the burden to the framework, focus on your coproc
• Make coproc support unified
• Benefit from framework bug-fixes and work others do, contribute
12. Scheduling, or why not Xen’s vCPU scheduler?
• Cannot use vCPU scheduler
• Not all HW allows context switch or it can be complex
• Guest may be inactive, but its tasks may still be processed by coproc
• Active guest may not use coproc at all, so let others utilize it
• IOMMU context switch may be needed for vcoproc
• Requirements for coproc scheduler
• Priority of a guest - mission critical tasks
• Coproc load/usage - time is not the best measure
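The two requirements above can be illustrated with a minimal sketch. It is not the framework's scheduler: the `VCoprocState` type, the "higher number means more critical" convention and the abstract "units" of coprocessor load are all assumptions made for the example. It picks the most critical guest that has work queued and, among equals, the one that has consumed the least coprocessor load so far.

```python
from dataclasses import dataclass


@dataclass
class VCoprocState:
    """Hypothetical per-guest scheduling state for one shared coprocessor."""
    domain: str
    priority: int        # assumed convention: higher value = more critical
    pending: bool        # True if the guest has work queued for the coproc
    used_units: int = 0  # accumulated coproc load, not wall-clock time


def pick(vcoprocs):
    """Return the vcoproc to run next, or None if nobody has work queued."""
    runnable = [v for v in vcoprocs if v.pending]
    if not runnable:
        return None
    # Highest priority first; among equals, the one that has used the least.
    return min(runnable, key=lambda v: (-v.priority, v.used_units))


def account(vcoproc, units):
    """Charge the vcoproc for the load it has just generated."""
    vcoproc.used_units += units
```

Charging by load rather than by time means a guest that submits heavy jobs is ranked behind an equally critical guest that submits light ones, even if both held the coprocessor for the same number of slices.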
13. Scheduling a vCoproc
● Round-robin at the first stage
● Can existing schedulers be used?
○ Null scheduler could be a match
○ Credit/Credit2 seem to need much work
○ Real-time schedulers
■ ARINC 653
■ Real-Time-Deferrable-Server (RTDS)
● Or we need to (re)implement the same for coproc?
● Do we need to be real-time? (mission critical, Audio/Video use-cases)
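As a rough illustration of the first-stage round-robin approach, the sketch below cycles through virtual coprocessors and skips those with nothing to do, so an idle guest does not hold the hardware. The `VCoproc` model, the job counter and the "one job per slice" rule are assumptions for the example, not the framework's actual data structures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class VCoproc:
    """One guest's view of the shared coprocessor (hypothetical model)."""
    domain: str
    pending_jobs: int = 0  # work queued by the guest, in arbitrary units


class RoundRobinScheduler:
    def __init__(self, vcoprocs):
        self.queue = deque(vcoprocs)

    def pick_next(self):
        """Return the next vcoproc with pending work, or None if all are idle."""
        for _ in range(len(self.queue)):
            candidate = self.queue[0]
            self.queue.rotate(-1)  # move the candidate to the back of the queue
            if candidate.pending_jobs > 0:
                return candidate
        return None


def run(scheduler, slices):
    """Run up to `slices` slices; each slice completes one job (assumption)."""
    order = []
    for _ in range(slices):
        vcoproc = scheduler.pick_next()
        if vcoproc is None:
            break  # every vcoproc is idle
        vcoproc.pending_jobs -= 1
        order.append(vcoproc.domain)
    return order
```

For example, with an IVI guest holding two jobs, an idle Dom0 and an instrument cluster guest holding one job, `run` alternates between IVI and IC and never schedules Dom0.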
14. Configuration
• Configure: MMIO ranges, interrupts, IOMMU etc.
• Need to configure both privileged and guest domains
• Privileged domains may not have a configuration file (e.g. Dom0), though DomD has one
• Guest is configured with a configuration file
• Must be able to configure multiple vcoprocs per domain
• To allow coprocessor sharing within the same guest, running different FW/Drivers, e.g. OpenGL concurrently with OpenCL for vGPU
15. Configuration
• Current implementation
• device tree bootargs to configure Dom0
• partial dtb + DomU configuration file (similar to ARM passthrough)
• partial dtb for DomU (with pdtb passed to Xen) was rejected after community discussion
• How to pass variable structure data to Xen
• Device-tree, but no x86 support
• ACPI, but is it ARM ready yet?
• Introduce new ABI:
• Pass memory ranges, interrupts etc in a flexible way
• Have convertors for ACPI, DTB etc?
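One way to picture "passing memory ranges, interrupts etc. in a flexible way" is a small self-describing binary blob: a header with counts followed by variable-length arrays. The layout below (little-endian, 16-bit counts, 64-bit base/size pairs, 32-bit interrupt numbers) is purely hypothetical and is not a proposed or existing Xen ABI; it only shows how a format-neutral description could be produced by a converter and parsed on the other side.

```python
import struct

# Hypothetical layout, little-endian, no padding.
HEADER = struct.Struct("<HH")  # number of MMIO ranges, number of interrupts
MMIO = struct.Struct("<QQ")    # base address, size in bytes
IRQ = struct.Struct("<I")      # interrupt number


def pack_vcoproc(mmio_ranges, irqs):
    """Serialize a coprocessor description into one variable-length blob."""
    blob = HEADER.pack(len(mmio_ranges), len(irqs))
    for base, size in mmio_ranges:
        blob += MMIO.pack(base, size)
    for irq in irqs:
        blob += IRQ.pack(irq)
    return blob


def unpack_vcoproc(blob):
    """Parse the blob back into (mmio_ranges, irqs)."""
    n_mmio, n_irq = HEADER.unpack_from(blob, 0)
    offset = HEADER.size
    mmio_ranges = []
    for _ in range(n_mmio):
        mmio_ranges.append(MMIO.unpack_from(blob, offset))
        offset += MMIO.size
    irqs = []
    for _ in range(n_irq):
        irqs.append(IRQ.unpack_from(blob, offset)[0])
        offset += IRQ.size
    return mmio_ranges, irqs
```

A device-tree or ACPI converter would then only need to fill in the two lists; the consumer never sees the original description format.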
16. IOMMU support
• HW expects to see physically contiguous memory, e.g. for DMA operations
• Guest cannot guarantee that; the “bad” options are:
• 1:1 mapped guest
• If coproc has its own MMU - trap memory access and update MMU manually in SW
• Utilize IOMMU to overcome these problems with better performance:
• 1:1 is not required
• Better memory isolation - control coproc’s memory access
• Overcome 4GB limit for 32-bit DMA capable devices
• Switch handled by the framework
• No changes to existing FW/driver
• No changes to coproc Xen driver
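The points about dropping the 1:1 requirement and the 4GB limit can be shown with a toy page-table model. The `ToyIommu` class is an illustration only, with a 4 KiB page size assumed; it does not model any real IOMMU or Xen interface. Scattered physical pages, including one above 4GB, are presented to the device as a single contiguous range of IO virtual addresses, and anything unmapped is rejected.

```python
PAGE_SIZE = 4096  # assumed page size


class ToyIommu:
    """Toy single-level translation table: IOVA page -> physical page."""

    def __init__(self):
        self.page_table = {}

    def map_contiguous(self, iova_base, phys_pages):
        """Map scattered physical pages at consecutive IO virtual addresses."""
        if iova_base % PAGE_SIZE:
            raise ValueError("iova_base must be page aligned")
        first = iova_base // PAGE_SIZE
        for i, phys_page in enumerate(phys_pages):
            self.page_table[first + i] = phys_page

    def translate(self, iova):
        """Return the physical address for a device access, or refuse it."""
        phys_page = self.page_table.get(iova // PAGE_SIZE)
        if phys_page is None:
            raise PermissionError("device access to unmapped IOVA")
        return phys_page * PAGE_SIZE + iova % PAGE_SIZE
```

In this model a 32-bit device address such as `0x10000000` can resolve to a physical address above 4GB, and a stray access outside the mapped window fails instead of reaching another guest's memory.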
17. Proprietary code
● There is always room for someone’s IP...
● Cannot disclose source/interface: NDA, incompatible license
● Need to move part of coproc’s code into a black box
● Options are being discussed (Volodymyr Babchuk will cover in detail during the Summit):
○ Stubdom
○ EL0 applications
● Once decision is made it will be adopted by the framework
18. What is expected from a “native application”
• Latency is critical
• MMIO access
• IRQ handling
• System stability
• Recovery from misbehaving proprietary code
• Power and clock management
• Solution to legal problems
19. Next generation car
Picture from http://www.designhmi.com/2015/02/23/in-car-connectivity-and-iot-internet-of-things/
20. Virtual GPU
● One of the key components for automotive use-cases
○ Instrument cluster (IC)
○ Head-up display (HUD)
○ In-vehicle infotainment (IVI)
● Performance and stability are both critical:
○ Not only OpenGL/Vulkan, but OpenCL and more: different firmware at the same time, even within the same guest
○ IVI crash must not affect IC
21. vGPU status
● Proof-of-concept is limited, but working
○ Context switch via power off/on sequence of the GPU
○ IOMMU switch is done via iommu_deassign_dt_device/iommu_assign_dt_device
○ Future work:
■ Avoid complete off/on sequences
■ Faster switch via context save
● Need proper integration with IOMMU
● Need decision on proprietary code placement
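The proof-of-concept switch described above can be modelled as a short sequence of steps. The sketch below only records those steps in a log; `ToyGpu` and its methods are stand-ins invented for the example, and the exact ordering (power off, IOMMU deassign, IOMMU assign, power on) is an assumption rather than something the slides spell out.

```python
class ToyGpu:
    """Records the steps of a context switch instead of touching hardware."""

    def __init__(self):
        self.powered = False
        self.owner = None  # domain whose IOMMU context is currently active
        self.log = []

    def power_off(self):
        self.powered = False
        self.log.append("power off")

    def power_on(self):
        self.powered = True
        self.log.append("power on")

    def iommu_deassign(self):
        # Stands in for the iommu_deassign_dt_device step named on the slide.
        self.log.append(f"iommu deassign {self.owner}")
        self.owner = None

    def iommu_assign(self, domain):
        # Stands in for the iommu_assign_dt_device step named on the slide.
        self.owner = domain
        self.log.append(f"iommu assign {domain}")


def switch_context(gpu, next_domain):
    """Hand the GPU to next_domain via a full power cycle (assumed order)."""
    if gpu.owner == next_domain and gpu.powered:
        return  # already running for this domain
    if gpu.powered:
        gpu.power_off()
    if gpu.owner is not None:
        gpu.iommu_deassign()
    gpu.iommu_assign(next_domain)
    gpu.power_on()
```

The full power cycle is what makes this switch slow, which is why the future-work items aim to replace it with a context save and restore.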
22. SCF status and open questions
• In progress
• Initial shared coprocessor framework design document is available (needs update)
• Native application approaches are being discussed
• SCF configuration discussion started
• POC is available
• Not started
• Power - reset - clock management
• Need to control clocks and power
• What if external PMIC is used (HW interface, driver, which domain?)
23. What we are working on
Xen
Native EL0 apps / stub domains
Real time scheduling
Heterogeneous big.LITTLE support
PMF (cpufreq, cpuidle, thermal, vcoprocpm)
SCF
IOMMUF & IPMMU support
SMC/HVC bridge
PV frontends
Xen apps
PM governor + SoC drivers
TEE manager + OP-TEE driver
GPU mediator + SGX driver
OP-TEE Multi-domain support
Integration
Android HALs
Sound/Display managers
PV backends
Certification IEC 61508 route 3S
CI Build/release system
24. Resources
● https://github.com/xen-troops
○ Shared coprocessor framework
○ Para-virtual drivers and backends (generic backend library, display, sound, multi-touch etc)
○ Multidomain Yocto-based build system (xt-distro)
● With your help we will upstream it all