Gleb Ivashkevich - HPC software developer / Gero / Kharkiv, Ukraine
Graphics processors are becoming part of the standard toolkit in high-performance computing. At the same time, new software tools are appearing and existing ones are maturing. We will talk about the architecture of Nvidia GPUs and how to work with them from Python.
http://www.it-sobytie.ru/events/2040
2. Parallel revolution
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Herb Sutter, March 2005
When serial code hits the wall.
Power wall.
Now, Intel is embarked on a course already adopted by some of its major
rivals: obtaining more computing power by stamping multiple processors
on a single chip rather than straining to increase the speed of a single
processor.
Paul S. Otellini, Intel's CEO
May 2004
3. July 2006: Intel launches Core 2 Duo (Conroe)
Feb 2007: Nvidia releases CUDA SDK
Nov 2008: Tsubame, first GPU-accelerated supercomputer
Dec 2008: OpenCL 1.0 specification released
Today: >50 GPU-powered supercomputers in Top500, 9 in Top50
4. It's very clear that we are close to the tipping point. If we're not at a
tipping point, we're racing at it.
Jen-Hsun Huang, NVIDIA Co-founder and CEO
March 2013
Heterogeneous computing is becoming a standard in HPC,
and programming has changed
6. CPU vs GPU
CPU: general purpose, sophisticated design and scheduling, perfect for task parallelism
GPU: highly parallel, huge memory bandwidth, lightweight scheduling, perfect for data parallelism
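The task/data distinction above can be sketched in plain Python (a hypothetical illustration using the stdlib thread pool; on a GPU, each element of the data-parallel case would map to its own thread):

```python
from concurrent.futures import ThreadPoolExecutor

def parse(s):
    # a standalone task: convert a string to an integer
    return int(s)

def square(x):
    # the elementwise operation for the data-parallel case
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # task parallelism: different, unrelated tasks run side by side
    f1 = pool.submit(parse, "42")
    f2 = pool.submit(square, 7)
    print(f1.result(), f2.result())

    # data parallelism: one operation applied independently to every
    # element of a dataset; this is the pattern GPUs excel at
    squares = list(pool.map(square, range(5)))
    print(squares)
```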
11. Python
fast development
huge # of packages: for data analysis, linear algebra, special functions, etc.
metaprogramming
Convenient, but not that fast in number crunching
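A quick illustration of the last point: the same arithmetic as a pure-Python loop and as a NumPy vectorized expression (exact timings will vary by machine; the gap is typically one to two orders of magnitude):

```python
import time
import numpy as np

n = 1000000
xs = [0.5] * n
arr = np.full(n, 0.5, dtype=np.float64)

# pure-Python loop: every element goes through the interpreter
t0 = time.perf_counter()
py_result = [x * x + 1.0 for x in xs]
py_time = time.perf_counter() - t0

# NumPy: the same arithmetic runs in compiled, vectorized code
t0 = time.perf_counter()
np_result = arr * arr + 1.0
np_time = time.perf_counter() - t0

print("Python loop: %.4f s, NumPy: %.4f s" % (py_time, np_time))
```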
12. PyCUDA
Wrapper package around the CUDA API
Convenient abstractions: GPUArray, random number generation, reductions & scans, etc.
Automatic cleanup, initialization and error checking, kernel caching
Completeness
14. SourceModule
Abstraction to create, compile and run GPU code
GPU code to compile is passed as a string
Control over nvcc compiler options
Convenient interface to get kernels
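A minimal sketch of SourceModule in use (requires a CUDA-capable GPU with the driver, toolkit and pycuda installed; the kernel and sizes here are purely illustrative):

```python
import numpy as np
import pycuda.autoinit            # creates a CUDA context on import
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# the GPU source is an ordinary Python string, compiled by nvcc
mod = SourceModule("""
__global__ void double_it(float *a)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    a[i] *= 2.0f;
}
""")

double_it = mod.get_function("double_it")   # fetch the compiled kernel

a = np.ones(256, dtype=np.float32)
# drv.InOut copies the array to the GPU and back after the launch
double_it(drv.InOut(a), block=(256, 1, 1), grid=(1, 1))
print(a[:4])
```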
15. Metaprogramming
GPU code can be created at runtime
PyCUDA uses the mako template engine internally
Any template engine works for generating GPU source code.
Remember about codepy
Create more flexible and optimized code
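As a sketch of the idea: kernel source can be rendered from a template at runtime, so one template yields many specialized kernels. Here the stdlib `string.Template` stands in for mako, and the kernel name, C type and operator are hypothetical placeholders:

```python
from string import Template

# an elementwise-kernel template; name, element type and operator
# are filled in at runtime
kernel_tpl = Template("""
__global__ void ${name}(${ctype} *out, const ${ctype} *a, const ${ctype} *b)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    out[i] = a[i] ${op} b[i];
}
""")

# one template, many specialized kernels
add_src = kernel_tpl.substitute(name="add_f", ctype="float", op="+")
mul_src = kernel_tpl.substitute(name="mul_d", ctype="double", op="*")

# the rendered strings can then be handed to pycuda.compiler.SourceModule
print(add_src)
```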
16. Installation
numpy, mako, and the CUDA driver & toolkit are required
Boost.Python is optional
Dev packages: needed if you build from source
Also:
PyOpenCl, pyfft
17. NumbaPro
Accelerator package for Python
Generates machine code from Python scalar functions (creates ufuncs)
from numbapro import vectorize
import numpy as np

@vectorize(['float32(float32, float32)'], target='cpu')
def add2(a, b):
    return a + b

X = np.ones((1024), dtype='float32')
Y = 2 * np.ones((1024), dtype='float32')
print(add2(X, Y))
# [3., 3., … 3.]
18. GPU computing resources
Documentation
Intro to Parallel Programming
by David Luebke (Nvidia) and John Owens (UC Davis)
Heterogeneous Parallel Programming
by Wen-mei W. Hwu (UIUC)
Tesla K20/K40 test drive
http://www.nvidia.ru/object/k40-gpu-test-drive-ru.html