Gleb Ivashkevich - HPC software developer / Gero / Kharkiv, Ukraine
Graphics processors are becoming part of the standard toolkit in high-performance computing. At the same time, new software tools are appearing and existing ones are maturing. We will talk about the architecture of Nvidia GPUs and how to work with them from Python.
http://www.it-sobytie.ru/events/2040
2. Parallel revolution
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Herb Sutter, March 2005
When serial code hits the wall.
Power wall.
Now, Intel is embarked on a course already adopted by some of its major
rivals: obtaining more computing power by stamping multiple processors
on a single chip rather than straining to increase the speed of a single
processor.
Paul S. Otellini, Intel's CEO
May 2004
3. July 2006: Intel launches Core 2 Duo (Conroe)
Feb 2007: Nvidia releases CUDA SDK
Nov 2008: Tsubame, first GPU-accelerated supercomputer
Dec 2008: OpenCL 1.0 specification released
Today: >50 GPU-powered supercomputers in Top500, 9 in Top50
4. It's very clear that we are close to the tipping point. If we're not at a
tipping point, we're racing at it.
Jen-Hsun Huang, NVIDIA Co-founder and CEO
March 2013
Heterogeneous computing is becoming a standard in HPC,
and programming has changed
6. CPU vs GPU
CPU: general purpose, sophisticated design and scheduling, perfect for task parallelism
GPU: highly parallel, huge memory bandwidth, lightweight scheduling, perfect for data parallelism
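The task/data distinction above can be sketched in plain Python (a hypothetical illustration using the stdlib thread pool; on a GPU, each element of the data-parallel case would map to its own thread):

```python
from concurrent.futures import ThreadPoolExecutor

def parse(s):
    # a standalone task: convert a string to an integer
    return int(s)

def square(x):
    # the elementwise operation for the data-parallel case
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # task parallelism: different, unrelated tasks run side by side
    f1 = pool.submit(parse, "42")
    f2 = pool.submit(square, 7)
    print(f1.result(), f2.result())

    # data parallelism: one operation applied independently to every
    # element of a dataset; this is the pattern GPUs excel at
    squares = list(pool.map(square, range(5)))
    print(squares)
```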
11. Python
fast development
huge # of packages: for data analysis, linear algebra, special functions, etc.
metaprogramming
Convenient, but not that fast in number crunching
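A quick illustration of the last point: the same arithmetic as a pure-Python loop and as a NumPy vectorized expression (exact timings will vary by machine; the gap is typically one to two orders of magnitude):

```python
import time
import numpy as np

n = 1000000
xs = [0.5] * n
arr = np.full(n, 0.5, dtype=np.float64)

# pure-Python loop: every element goes through the interpreter
t0 = time.perf_counter()
py_result = [x * x + 1.0 for x in xs]
py_time = time.perf_counter() - t0

# NumPy: the same arithmetic runs in compiled, vectorized code
t0 = time.perf_counter()
np_result = arr * arr + 1.0
np_time = time.perf_counter() - t0

print("Python loop: %.4f s, NumPy: %.4f s" % (py_time, np_time))
```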
12. PyCUDA
Wrapper package around the CUDA API
Convenient abstractions: GPUArray, random number generation, reductions & scans, etc.
Automatic cleanup, initialization and error checking, kernel caching
Completeness
14. SourceModule
Abstraction to create, compile and run GPU code
GPU code to compile is passed as a string
Control over nvcc compiler options
Convenient interface to get kernels
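A minimal sketch of SourceModule in use (requires a CUDA-capable GPU with the driver, toolkit and pycuda installed; the kernel and sizes here are purely illustrative):

```python
import numpy as np
import pycuda.autoinit            # creates a CUDA context on import
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# the GPU source is an ordinary Python string, compiled by nvcc
mod = SourceModule("""
__global__ void double_it(float *a)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    a[i] *= 2.0f;
}
""")

double_it = mod.get_function("double_it")   # fetch the compiled kernel

a = np.ones(256, dtype=np.float32)
# drv.InOut copies the array to the GPU and back after the launch
double_it(drv.InOut(a), block=(256, 1, 1), grid=(1, 1))
print(a[:4])
```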
15. Metaprogramming
GPU code can be created at runtime
PyCUDA uses the mako template engine internally
Any template engine works for generating GPU source code.
Remember about codepy
Create more flexible and optimized code
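As a sketch of the idea: kernel source can be rendered from a template at runtime, so one template yields many specialized kernels. Here the stdlib `string.Template` stands in for mako, and the kernel name, C type and operator are hypothetical placeholders:

```python
from string import Template

# an elementwise-kernel template; name, element type and operator
# are filled in at runtime
kernel_tpl = Template("""
__global__ void ${name}(${ctype} *out, const ${ctype} *a, const ${ctype} *b)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    out[i] = a[i] ${op} b[i];
}
""")

# one template, many specialized kernels
add_src = kernel_tpl.substitute(name="add_f", ctype="float", op="+")
mul_src = kernel_tpl.substitute(name="mul_d", ctype="double", op="*")

# the rendered strings can then be handed to pycuda.compiler.SourceModule
print(add_src)
```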
16. Installation
numpy, mako, and the CUDA driver & toolkit are required
Boost.Python is optional
Dev packages: needed if you build from source
Also:
PyOpenCl, pyfft
17. NumbaPro
Accelerator package for Python
Generates machine code from Python scalar functions (creates ufuncs)
from numbapro import vectorize
import numpy as np

@vectorize(['float32(float32, float32)'], target='cpu')
def add2(a, b):
    return a + b

X = np.ones((1024), dtype='float32')
Y = 2 * np.ones((1024), dtype='float32')
print(add2(X, Y))
# [3., 3., … 3.]
18. GPU computing resources
Documentation
Intro to Parallel Programming
by David Luebke (Nvidia) and John Owens (UC Davis)
Heterogeneous Parallel Programming
by Wen-mei W. Hwu (UIUC)
Tesla K20/K40 test drive
http://www.nvidia.ru/object/k40-gpu-test-drive-ru.html