SlideShare uma empresa Scribd logo
1 de 16
Baixar para ler offline
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD.
Make Accelerator Pluggable for
Container Engine
Author/ Email: Jiuyue Ma / majiuyue@huawei.com
Version: V1.0 (20150504)
HUAWEI TECHNOLOGIES CO., LTD. 2
Today’s Topic
Container Engine
Hardware Accelerators
e.g. GPU、QAT、FPGA
OLTP/OLAP BigData Processing
Neural Network
CloudRAN
Pluggable
Accelerator Support
for Container Engine
HUAWEI TECHNOLOGIES CO., LTD. 3
Agenda
 Use accelerator in containers and How?
 Key problems to deal with
 Identify accelerator requirements
 Prepare accelerator runtimes
 Manage accelerator resources
 Further development
HUAWEI TECHNOLOGIES CO., LTD. 4
Why Container?
HUAWEI TECHNOLOGIES CO., LTD. 5
Why Container? Why NOT Container?
* Raphael Da Silva. OPEN SOURCE TECHNOLOGY: Docker Containers on IBM Bluemix. 2015.
HUAWEI TECHNOLOGIES CO., LTD. 6
 CPU is not fast/effective/… enough
 Heterogeneous is the future
 Nvidia GPU-compute, Intel Skylake Xeon+FPGA, AWS P2/F1 Instance
 use the right processor, in the right place, at the right time
Why Accelerator?
Container
a.k.a. Docker
[baidu, HotChips 2016]
HUAWEI TECHNOLOGIES CO., LTD. 7
Google it ! “Use XXX in docker”
NVIDIA’s GPU-Docker Solution
DPDK
FPGA
GPU
HUAWEI TECHNOLOGIES CO., LTD. 8
Use GPU in Docker (from nvidia-docker)
Wrapped Command
docker run
--volume-driver=nvidia-docker
--volume=nvidia_driver_361.48:/usr/local/nvidia:ro
③ nvidia-docker cli-wrapper
$ ldconfig -p | grep -E 'nvidia|cuda'
libnvidia-ml.so (libc6,x86-64) => /usr/lib/nvidia-361/libnvidia-ml.so
libnvidia-glcore.so.361.48 (libc6,x86-64) => /usr/lib/nvidia-361/lib…
libnvidia-compiler.so.361.48 (libc6,x86-64) => /usr/lib/nvidia-361/lib…
libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so
...
$ lsmod | grep nvidia
nvidia_uvm 711531 2
nvidia_modeset 742329 0
nvidia 10058469 80 nvidia_modeset,nvidia_uvm
Key Problem:VERSION MISMATCH
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
② VOLUME: nvidia-docker/nvidia_driver_361.48
① Labeled Image
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="7.5"
• detect/check cuda version
• hook docker cli args
--volume-driver / --volume nvidia
HUAWEI TECHNOLOGIES CO., LTD. 9
Do we really need “nvidia-docker”?
If so, we also need:
 dpdk-docker, qat-docker, fpga-docker, …
But,
if we need both
GPU and DPDK?
nvidia DPDK QAT FPGA
DPDK
nvidia
Pluggable
Accelerator
HUAWEI TECHNOLOGIES CO., LTD. 10
Device Files
(-d device)
/dev/nvidia0
/dev/nvidia-smi
/dev/nvidia-uvm
/dev/uio0 /dev/fpga0
File Bindings
(-v src:dest)
/usr/lib/nvidia-361/libnvidia-ml.so
/usr/lib/nvidia-361/libnvidia-glcore.so.361.48
/usr/lib/nvidia-361/libnvidia-
compiler.so.361.48
/usr/lib/x86_64-linux-gnu/libcuda.so
...
/sys/bus/pci/devices/xxxx:xx:xx.x
/dev/hugepages
/usr/lib/libfpga.so
/sys/bus/pci/devices/xxxx:xx:xx.x
/dev/hugepages
Environments
(-e env)
PATH=$PATH:/usr/lib/nvidia-361/bin
LD_LIBRARY_PATH=…
Pluggable Accelerator, How?
 What’s the difference?
nvidia DPDK FPGA
Accelerator
Plugin Interface
in-house
FPGA Accel
HUAWEI TECHNOLOGIES CO., LTD. 11
Implementation (1/3): IDENTIFY
 Accelerator Requirements
 We are using the function provided by device, not the device itself
 Identified by runtime instead of device
 e.g. “A SHA256 accelerator” instead of “A FPGA with XXXX-bitstream”
 Identify user requirements
 Integrate into image
 Specify when start container
 Identify device capabilities
 Identified by “accelerator plugin”
 Report to engine through plugin api
LABEL runtime
gpu0=cuda:7.5
$ docker run --accel fpga:crypto 
huawei/crypto-image
image cli / rest-api
 dockerd
Device Accel Runtime
/dev/nvidia0 cuda:7.5, cuda:8.0, OpenCL 2.0, …
/dev/qat0 md5, sha256, aes, …
/dev/fpga0 crypto, compress, decode, …
Accelerator
Plugin
Plugin API
HUAWEI TECHNOLOGIES CO., LTD. 12
Implementation (2/3): PREPARE
 Select plugin from accelerator requirements
 Device initialization by plugin
 Dynamic inject required libraries & devices
DPDK-plugin
- eal.so/…
- igb_uio.ko
- /dev/uioX
FPGA-plugin
- fpga.so
- fpga.ko
- /dev/fpgaX
 container
 Host
Application
fpga.ko
v1
 dockerd
/dev/fpga0
generate
bind
libacc
accel requirements
PaaS
fpga.so/dev/fpga0
fpga.so
v1
(2) mount userspace lib
(3) mount device
GPU-plugin
- gpu.so / cuda.so
- gpu.ko
- /dev/gpuX
QAT-plugin
- qat.so
- qat.ko
- /dev/qatX
HUAWEI TECHNOLOGIES CO., LTD. 13
Implementation (3/3): MANAGEMENT
Running
CreatedStopped
CreatingDeleting
INIT
Container
Life-Cycles
allocate
persistent
allocate
non-persistent
release
non-persistent
release
persistent
prepare
parse accel
requirements
Accelerator
Plugin
 Accelerator Life-Cycle
 Allocate  Prepare  Release
 Plugin hooks for each stage
 Integrate with Container Life-
Cycle
 Persistent:CreateDelete
 Non-persistent: StartStop
HUAWEI TECHNOLOGIES CO., LTD. 14
Example Flow
HUAWEI TECHNOLOGIES CO., LTD. 15
Further Developments
 Standardize “Accelerator Runtime”
 OCI Runtime/Image Spec should be good place
 Integrate with Docker Swarm / K8S / etc
 Cluster level accelerator schedule&management
 More Features
 Accelerator Sharing
 Accelerator Exception Interface
 Accelerator Hot-Plug
 …
Copyright© 2011 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation, statements regarding the future
financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual
results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information
is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any
time without notice.
Thank you
www.huawei.com

Mais conteúdo relacionado

Mais procurados

OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph GaluschkaOpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
NETWAYS
 

Mais procurados (20)

[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin
 
OCI Support in Mesos
OCI Support in MesosOCI Support in Mesos
OCI Support in Mesos
 
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
 
Scale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 servicesScale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 services
 
Introduction to CNI (Container Network Interface)
Introduction to CNI (Container Network Interface)Introduction to CNI (Container Network Interface)
Introduction to CNI (Container Network Interface)
 
Running Legacy Applications with Containers
Running Legacy Applications with ContainersRunning Legacy Applications with Containers
Running Legacy Applications with Containers
 
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
 
Unikernelized Linux
Unikernelized LinuxUnikernelized Linux
Unikernelized Linux
 
Linuxcon secureefficientcontainerimagemanagementharbor
Linuxcon secureefficientcontainerimagemanagementharborLinuxcon secureefficientcontainerimagemanagementharbor
Linuxcon secureefficientcontainerimagemanagementharbor
 
Kubernetes 1001
Kubernetes 1001Kubernetes 1001
Kubernetes 1001
 
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
 
Containerize ovs ovn components
Containerize ovs ovn componentsContainerize ovs ovn components
Containerize ovs ovn components
 
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph GaluschkaOpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
 
Application-Based Routing
Application-Based RoutingApplication-Based Routing
Application-Based Routing
 
DCSF 19 Online Feature Extraction and Event Generation for Computer-Animal In...
DCSF 19 Online Feature Extraction and Event Generation for Computer-Animal In...DCSF 19 Online Feature Extraction and Event Generation for Computer-Animal In...
DCSF 19 Online Feature Extraction and Event Generation for Computer-Animal In...
 
Leveraging the Power of containerd Events - Evan Hazlett
Leveraging the Power of containerd Events - Evan HazlettLeveraging the Power of containerd Events - Evan Hazlett
Leveraging the Power of containerd Events - Evan Hazlett
 
Container Orchestration from Theory to Practice
Container Orchestration from Theory to PracticeContainer Orchestration from Theory to Practice
Container Orchestration from Theory to Practice
 
OpenShift v3 Internal networking details
OpenShift v3 Internal networking detailsOpenShift v3 Internal networking details
OpenShift v3 Internal networking details
 
Virtualization inside kubernetes
Virtualization inside kubernetesVirtualization inside kubernetes
Virtualization inside kubernetes
 
Linux Kernel Development
Linux Kernel DevelopmentLinux Kernel Development
Linux Kernel Development
 

Destaque

Destaque (19)

From Resilient to Antifragile Chaos Engineering Primer
From Resilient to Antifragile Chaos Engineering PrimerFrom Resilient to Antifragile Chaos Engineering Primer
From Resilient to Antifragile Chaos Engineering Primer
 
Secure Containers with EPT Isolation
Secure Containers with EPT IsolationSecure Containers with EPT Isolation
Secure Containers with EPT Isolation
 
Open Source Software Business Models Redux
Open Source Software Business Models ReduxOpen Source Software Business Models Redux
Open Source Software Business Models Redux
 
Rebuild - Simplifying Embedded and IoT Development Using Linux Containers
Rebuild - Simplifying Embedded and IoT Development Using Linux ContainersRebuild - Simplifying Embedded and IoT Development Using Linux Containers
Rebuild - Simplifying Embedded and IoT Development Using Linux Containers
 
GPU Acceleration for Containers on Intel Processor Graphics
GPU Acceleration for Containers on Intel Processor GraphicsGPU Acceleration for Containers on Intel Processor Graphics
GPU Acceleration for Containers on Intel Processor Graphics
 
Status of Embedded Linux
Status of Embedded LinuxStatus of Embedded Linux
Status of Embedded Linux
 
Libvirt API Certification
Libvirt API CertificationLibvirt API Certification
Libvirt API Certification
 
LiteOS
LiteOS LiteOS
LiteOS
 
High Performance Linux Virtual Machine on Microsoft Azure: SR-IOV Networking ...
High Performance Linux Virtual Machine on Microsoft Azure: SR-IOV Networking ...High Performance Linux Virtual Machine on Microsoft Azure: SR-IOV Networking ...
High Performance Linux Virtual Machine on Microsoft Azure: SR-IOV Networking ...
 
kdump: usage and_internals
kdump: usage and_internalskdump: usage and_internals
kdump: usage and_internals
 
Practical CNI
Practical CNIPractical CNI
Practical CNI
 
点融网区块链即服务实践 - The Practice of Blockchain as a Service in Dianrong
点融网区块链即服务实践 - The Practice of Blockchain as a Service in Dianrong点融网区块链即服务实践 - The Practice of Blockchain as a Service in Dianrong
点融网区块链即服务实践 - The Practice of Blockchain as a Service in Dianrong
 
See what happened with real time kvm when building real time cloud pezhang@re...
See what happened with real time kvm when building real time cloud pezhang@re...See what happened with real time kvm when building real time cloud pezhang@re...
See what happened with real time kvm when building real time cloud pezhang@re...
 
Zephyr: Creating a Best-of-Breed, Secure RTOS for IoT
Zephyr: Creating a Best-of-Breed, Secure RTOS for IoTZephyr: Creating a Best-of-Breed, Secure RTOS for IoT
Zephyr: Creating a Best-of-Breed, Secure RTOS for IoT
 
Policy-based Resource Placement
Policy-based Resource PlacementPolicy-based Resource Placement
Policy-based Resource Placement
 
OpenDaylight OpenStack Integration
OpenDaylight OpenStack IntegrationOpenDaylight OpenStack Integration
OpenDaylight OpenStack Integration
 
Hyperledger Technical Community in China.
Hyperledger Technical Community in China. Hyperledger Technical Community in China.
Hyperledger Technical Community in China.
 
Obstacles & Solutions for Livepatch Support on ARM64 Architecture
Obstacles & Solutions for Livepatch Support on ARM64 ArchitectureObstacles & Solutions for Livepatch Support on ARM64 Architecture
Obstacles & Solutions for Livepatch Support on ARM64 Architecture
 
Building a Better Thermostat
Building a Better ThermostatBuilding a Better Thermostat
Building a Better Thermostat
 

Semelhante a Make Accelerator Pluggable for Container Engine

Node.js, Vagrant, Chef, and Mathoid @ Benetech
Node.js, Vagrant, Chef, and Mathoid @ BenetechNode.js, Vagrant, Chef, and Mathoid @ Benetech
Node.js, Vagrant, Chef, and Mathoid @ Benetech
Christopher Bumgardner
 
Automate drupal deployments with linux containers, docker and vagrant
Automate drupal deployments with linux containers, docker and vagrant Automate drupal deployments with linux containers, docker and vagrant
Automate drupal deployments with linux containers, docker and vagrant
Ricardo Amaro
 

Semelhante a Make Accelerator Pluggable for Container Engine (20)

State of Containers and the Convergence of HPC and BigData
State of Containers and the Convergence of HPC and BigDataState of Containers and the Convergence of HPC and BigData
State of Containers and the Convergence of HPC and BigData
 
Instant scaling and deployment of Vitis Libraries on Alveo clusters using InA...
Instant scaling and deployment of Vitis Libraries on Alveo clusters using InA...Instant scaling and deployment of Vitis Libraries on Alveo clusters using InA...
Instant scaling and deployment of Vitis Libraries on Alveo clusters using InA...
 
CA Performance Manager Agility by using Docker Containers for Network Manag...
CA Performance Manager Agility by using Docker Containers for Network Manag...CA Performance Manager Agility by using Docker Containers for Network Manag...
CA Performance Manager Agility by using Docker Containers for Network Manag...
 
DCSF 19 Accelerating Docker Containers with NVIDIA GPUs
DCSF 19 Accelerating Docker Containers with NVIDIA GPUsDCSF 19 Accelerating Docker Containers with NVIDIA GPUs
DCSF 19 Accelerating Docker Containers with NVIDIA GPUs
 
Delivering Docker & K3s worloads to IoT Edge devices
Delivering Docker & K3s worloads to IoT Edge devicesDelivering Docker & K3s worloads to IoT Edge devices
Delivering Docker & K3s worloads to IoT Edge devices
 
Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!
 
No more Dockerfiles? Buildpacks to help you ship your image!
No more Dockerfiles? Buildpacks to help you ship your image!No more Dockerfiles? Buildpacks to help you ship your image!
No more Dockerfiles? Buildpacks to help you ship your image!
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
 
Dayta AI Seminar - Kubernetes, Docker and AI on Cloud
Dayta AI Seminar - Kubernetes, Docker and AI on CloudDayta AI Seminar - Kubernetes, Docker and AI on Cloud
Dayta AI Seminar - Kubernetes, Docker and AI on Cloud
 
Develop QNAP NAS App by Docker
Develop QNAP NAS App by DockerDevelop QNAP NAS App by Docker
Develop QNAP NAS App by Docker
 
DevFest 2022 - Cloud Workstation Introduction TaiChung
DevFest 2022 - Cloud Workstation Introduction TaiChungDevFest 2022 - Cloud Workstation Introduction TaiChung
DevFest 2022 - Cloud Workstation Introduction TaiChung
 
Simplifying and accelerating converged media with Open Visual Cloud
Simplifying and accelerating converged media with Open Visual CloudSimplifying and accelerating converged media with Open Visual Cloud
Simplifying and accelerating converged media with Open Visual Cloud
 
Node.js, Vagrant, Chef, and Mathoid @ Benetech
Node.js, Vagrant, Chef, and Mathoid @ BenetechNode.js, Vagrant, Chef, and Mathoid @ Benetech
Node.js, Vagrant, Chef, and Mathoid @ Benetech
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
Docker engine - Indroduc
Docker engine - IndroducDocker engine - Indroduc
Docker engine - Indroduc
 
Automate drupal deployments with linux containers, docker and vagrant
Automate drupal deployments with linux containers, docker and vagrant Automate drupal deployments with linux containers, docker and vagrant
Automate drupal deployments with linux containers, docker and vagrant
 
Building a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and DockerBuilding a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and Docker
 
Containerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the CloudContainerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the Cloud
 
Scaleable PHP Applications in Kubernetes
Scaleable PHP Applications in KubernetesScaleable PHP Applications in Kubernetes
Scaleable PHP Applications in Kubernetes
 
Hands on Docker - Launch your own LEMP or LAMP stack - SunshinePHP
Hands on Docker - Launch your own LEMP or LAMP stack - SunshinePHPHands on Docker - Launch your own LEMP or LAMP stack - SunshinePHP
Hands on Docker - Launch your own LEMP or LAMP stack - SunshinePHP
 

Mais de LinuxCon ContainerCon CloudOpen China

Mais de LinuxCon ContainerCon CloudOpen China (8)

SecurityPI - Hardening your IoT endpoints in Home.
SecurityPI - Hardening your IoT endpoints in Home. SecurityPI - Hardening your IoT endpoints in Home.
SecurityPI - Hardening your IoT endpoints in Home.
 
Flowchain: A case study on building a Blockchain for the IoT
Flowchain: A case study on building a Blockchain for the IoTFlowchain: A case study on building a Blockchain for the IoT
Flowchain: A case study on building a Blockchain for the IoT
 
Introduction to OCI Image Technologies Serving Container
Introduction to OCI Image Technologies Serving ContainerIntroduction to OCI Image Technologies Serving Container
Introduction to OCI Image Technologies Serving Container
 
UEFI HTTP/HTTPS Boot
UEFI HTTP/HTTPS BootUEFI HTTP/HTTPS Boot
UEFI HTTP/HTTPS Boot
 
How Open Source Communities do Standardization
How Open Source Communities do StandardizationHow Open Source Communities do Standardization
How Open Source Communities do Standardization
 
Fully automated kubernetes deployment and management
Fully automated kubernetes deployment and managementFully automated kubernetes deployment and management
Fully automated kubernetes deployment and management
 
Is there still room for innovation in container orchestration and scheduling
Is there still room for innovation in container orchestration and scheduling Is there still room for innovation in container orchestration and scheduling
Is there still room for innovation in container orchestration and scheduling
 
Container Security
Container SecurityContainer Security
Container Security
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Make Accelerator Pluggable for Container Engine

  • 1. www.huawei.com HUAWEI TECHNOLOGIES CO., LTD. Make Accelerator Pluggable for Container Engine Author/ Email: Jiuyue Ma / majiuyue@huawei.com Version: V1.0 (20150504)
  • 2. HUAWEI TECHNOLOGIES CO., LTD. 2 Today’s Topic Container Engine Hardware Accelerators e.g. GPU、QAT、FPGA OLTP/OLAP BigData Processing Neural Network CloudRAN Pluggable Accelerator Support for Container Engine
  • 3. HUAWEI TECHNOLOGIES CO., LTD. 3 Agenda  Use accelerator in containers and How?  Key problems to deal with  Identify accelerator requirements  Prepare accelerator runtimes  Manage accelerator resources  Further development
  • 4. HUAWEI TECHNOLOGIES CO., LTD. 4 Why Container?
  • 5. HUAWEI TECHNOLOGIES CO., LTD. 5 Why Container? Why NOT Container? * Raphael Da Silva. OPEN SOURCE TECHNOLOGY: Docker Containers on IBM Bluemix. 2015.
  • 6. HUAWEI TECHNOLOGIES CO., LTD. 6  CPU is not fast/effective/… enough  Heterogeneous is the future  Nvidia GPU-compute, Intel Skylake Xeon+FPGA, AWS P2/F1 Instance  use the right processor, in the right place, at the right time Why Accelerator? Container a.k.a. Docker [baidu, HotChips 2016]
  • 7. HUAWEI TECHNOLOGIES CO., LTD. 7 Google it ! “Use XXX in docker” NVIDIA’s GPU-Docker Solution DPDK FPGA GPU
  • 8. HUAWEI TECHNOLOGIES CO., LTD. 8 Use GPU in Docker (from nvidia-docker) Wrapped Command docker run --volume-driver=nvidia-docker --volume=nvidia_driver_361.48:/usr/local/nvidia:ro ③ nvidia-docker cli-wrapper $ ldconfig -p | grep -E 'nvidia|cuda' libnvidia-ml.so (libc6,x86-64) => /usr/lib/nvidia-361/libnvidia-ml.so libnvidia-glcore.so.361.48 (libc6,x86-64) => /usr/lib/nvidia-361/lib… libnvidia-compiler.so.361.48 (libc6,x86-64) => /usr/lib/nvidia-361/lib… libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so ... $ lsmod | grep nvidia nvidia_uvm 711531 2 nvidia_modeset 742329 0 nvidia 10058469 80 nvidia_modeset,nvidia_uvm Key Problem:VERSION MISMATCH $ nvidia-smi Failed to initialize NVML: Driver/library version mismatch ② VOLUME: nvidia-docker/nvidia_driver_361.48 ① Labeled Image LABEL com.nvidia.volumes.needed="nvidia_driver" LABEL com.nvidia.cuda.version="7.5" • detect/check cuda version • hook docker cli args --volume-driver / --volume nvidia
  • 9. HUAWEI TECHNOLOGIES CO., LTD. 9 Do we really need “nvidia-docker”? If so, we also need:  dpdk-docker, qat-docker, fpga-docker, … But, if we need both GPU and DPDK? nvidia DPDK QAT FPGA DPDK nvidia Pluggable Accelerator
  • 10. HUAWEI TECHNOLOGIES CO., LTD. 10 Device Files (-d device) /dev/nvidia0 /dev/nvidia-smi /dev/nvidia-uvm /dev/uio0 /dev/fpga0 File Bindings (-v src:dest) /usr/lib/nvidia-361/libnvidia-ml.so /usr/lib/nvidia-361/libnvidia-glcore.so.361.48 /usr/lib/nvidia-361/libnvidia- compiler.so.361.48 /usr/lib/x86_64-linux-gnu/libcuda.so ... /sys/bus/pci/devices/xxxx:xx:xx.x /dev/hugepages /usr/lib/libfpga.so /sys/bus/pci/devices/xxxx:xx:xx.x /dev/hugepages Environments (-e env) PATH=$PATH:/usr/lib/nvidia-361/bin LD_LIBRARY_PATH=… Pluggable Accelerator, How?  What’s the difference? nvidia DPDK FPGA Accelerator Plugin Interface in-house FPGA Accel
  • 11. HUAWEI TECHNOLOGIES CO., LTD. 11 Implementation (1/3): IDENTIFY  Accelerator Requirements  We are using the function provided by device, not the device itself  Identified by runtime instead of device  e.g. “A SHA256 accelerator” instead of “A FPGA with XXXX-bitstream”  Identify user requirements  Integrate into image  Specify when start container  Identify device capabilities  Identified by “accelerator plugin”  Report to engine through plugin api LABEL runtime gpu0=cuda:7.5 $ docker run --accel fpga:crypto huawei/crypto-image image cli / rest-api  dockerd Device Accel Runtime /dev/nvidia0 cuda:7.5, cuda:8.0, OpenCL 2.0, … /dev/qat0 md5, sha256, aes, … /dev/fpga0 crypto, compress, decode, … Accelerator Plugin Plugin API
  • 12. HUAWEI TECHNOLOGIES CO., LTD. 12 Implementation (2/3): PREPARE  Select plugin from accelerator requirements  Device initialization by plugin  Dynamic inject required libraries & devices DPDK-plugin - eal.so/… - igb_uio.ko - /dev/uioX FPGA-plugin - fpga.so - fpga.ko - /dev/fpgaX  container  Host Application fpga.ko v1  dockerd /dev/fpga0 generate bind libacc accel requirements PaaS fpga.so/dev/fpga0 fpga.so v1 (2) mount userspace lib (3) mount device GPU-plugin - gpu.so / cuda.so - gpu.ko - /dev/gpuX QAT-plugin - qat.so - qat.ko - /dev/qatX
  • 13. HUAWEI TECHNOLOGIES CO., LTD. 13 Implementation (3/3): MANAGEMENT Running CreatedStopped CreatingDeleting INIT Container Life-Cycles allocate persistent allocate non-persistent release non-persistent release persistent prepare parse accel requirements Accelerator Plugin  Accelerator Life-Cycle  Allocate  Prepare  Release  Plugin hooks for each stage  Integrate with Container Life- Cycle  Persistent:CreateDelete  Non-persistent: StartStop
  • 14. HUAWEI TECHNOLOGIES CO., LTD. 14 Example Flow
  • 15. HUAWEI TECHNOLOGIES CO., LTD. 15 Further Developments  Standardize “Accelerator Runtime”  OCI Runtime/Image Spec should be good place  Integrate with Docker Swarm / K8S / etc  Cluster level accelerator schedule&management  More Features  Accelerator Sharing  Accelerator Exception Interface  Accelerator Hot-Plug  …
  • 16. Copyright© 2011 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice. Thank you www.huawei.com