3. INTRODUCTION:
• GPUs have long powered the display output of computers
• Designed for real-time, high-resolution 3D graphics
• Commercial GPU-based systems are becoming common
• NVIDIA and AMD keep expanding processor sophistication and
software development tools
• Higher floating-point precision enables higher accuracy
• GPUs are now on a development cycle much closer to that of CPUs
• GPUs are not constrained by sockets
• Only minimal backwards compatibility is needed in firmware;
the rest is delivered through the driver implementation
4. Requirements of GPU-based S/W
• Computational requirements are large
• Parallelism is substantial
• Throughput is more important than latency
Application requirements for targeting GPGPU
programming:
• Large data sets
• High parallelism
• Minimal dependencies between data elements
• High arithmetic intensity
• Lots of work to do without CPU intervention
5. Task vs. Data Parallelism
Task parallelism:
• Independent processes with little
communication
Data parallelism:
• Lots of data on which the same computation is
being executed
• No dependencies between data elements in
each step in the computation
• Can saturate many ALUs
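The two styles of parallelism can be sketched as follows (a toy Python illustration: threads stand in for independent tasks, and a list comprehension stands in for many ALUs applying one computation; all names here are invented for the example):

```python
import threading

# Task parallelism: independent functions with little communication.
results = {}
def task_a(): results["a"] = sum(range(10))   # one independent task
def task_b(): results["b"] = max(range(10))   # another, unrelated task
ta, tb = threading.Thread(target=task_a), threading.Thread(target=task_b)
ta.start(); tb.start(); ta.join(); tb.join()

# Data parallelism: the same computation over lots of data, with no
# dependencies between elements -- exactly what can saturate many ALUs.
data = list(range(8))
squared = [x * x for x in data]  # same operation applied to every element

print(results, squared)
```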
6. GPU vs. CPU
• A CPU is designed to finish a single task as fast as
possible (low latency), while a GPU is designed to
process as many tasks as possible over large volumes
of data (high throughput)
• The CPU divides work in time, while the GPU divides
work in space
7. Graphics Pipeline:
• Input to the GPU is a list of geometric primitives
• Vertex Operations: each primitive's vertices are transformed
into screen space and shaded, in part by computing their
interaction with the lights in the scene
• Primitive Assembly: vertices are assembled into triangles
• Rasterization: determines which screen-space pixels are
covered by each triangle
• Fragment Operations: using color information, each
fragment is shaded to determine its final color
• Each pixel’s color value may be computed from several
fragments
• Composition: Fragments are assembled into a final image
with one color per pixel
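The stages above can be strung together as a toy software pipeline; every function name and data layout here is invented for illustration and heavily simplified (e.g. rasterization is reduced to a crude bounding-box coverage test):

```python
# Toy model of the graphics pipeline stages; not real GPU code.

def vertex_ops(verts, scale):
    # transform each vertex into "screen space"
    return [(x * scale, y * scale) for (x, y) in verts]

def primitive_assembly(verts):
    # assemble every three vertices into one triangle
    return [tuple(verts[i:i + 3]) for i in range(0, len(verts), 3)]

def rasterize(tri, w, h):
    # determine which screen pixels are covered by the triangle
    # (bounding-box test only, for brevity)
    xs = [v[0] for v in tri]; ys = [v[1] for v in tri]
    return [(x, y) for x in range(w) for y in range(h)
            if min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)]

def fragment_ops(frags, color):
    # shade each covered fragment to its final color
    return {p: color for p in frags}

verts = [(0, 0), (1, 0), (0, 1)]          # input: geometric primitives
screen = vertex_ops(verts, 2)             # vertex operations
tris = primitive_assembly(screen)         # primitive assembly
image = {}                                # composition: one color per pixel
for tri in tris:
    image.update(fragment_ops(rasterize(tri, 3, 3), (255, 0, 0)))
print(len(image))
```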
9. Evolution of GPU Architecture:
• The fixed-function pipeline lacked the generality to
express complex effects
• Fixed-function per-vertex and per-fragment operations
were replaced by vertex and fragment programs
• Vertex and fragment programs grew more complex as
the Shader Model evolved
• Support for unified shader models
Shader Models:
• A shader provides a user-defined, programmable
alternative (written in a language such as GLSL) to the
hard-coded fixed-function approach
• A vertex shader describes the traits (position, color,
depth value, etc.) of a vertex
• A geometry shader adds volumetric detail; its output is
then sent to the rasterizer
• A pixel/fragment shader describes the traits (color, z-depth,
and alpha value) of a pixel
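As a rough analogy of these roles (real shaders are written in a shading language such as GLSL and run once per element on the GPU), the attribute names below are invented for illustration:

```python
# Python analogy of programmable shader stages; a simplification,
# not actual shading-language code.

def vertex_shader(v):
    # decide the traits of a vertex: position, depth, color
    x, y, z = v
    return {"pos": (x + 1.0, y + 1.0), "depth": z, "color": (z, z, z)}

def fragment_shader(frag):
    # decide the traits of a pixel: color, z-depth, alpha
    r, g, b = frag["color"]
    return (r * 0.5, g * 0.5, b * 0.5, 1.0)  # dim the color, fully opaque

out = fragment_shader(vertex_shader((0.0, 0.0, 1.0)))
print(out)
```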
11. GPU Programming Model
• Follows an SPMD (single program, multiple data)
programming model
• In the base programming model, each element is
independent of the other elements
• Many parallel elements are processed by a single
program
• Each element can operate on integer or floating-point
data with a reasonably complete instruction set
• Data is read from shared memory via gather and
scatter operations
• Code is written in a SIMD manner
• A different execution path is allowed for each
element
• If elements branch in different directions,
both branches are computed
• Computation proceeds in blocks on the order of 16
elements
• In short, branches are permitted for programmers,
but they are not free
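A minimal sketch of why divergent branches are permitted but not free: conceptually, both paths are evaluated for the whole block and a mask selects each element's result (a simplification of what real SIMD hardware does):

```python
# One block of elements, all running the same program (SPMD).
block = [-2, -1, 3, 4]
mask = [x >= 0 for x in block]   # which elements take the "then" path

# When elements diverge, both branch bodies are computed for every lane...
then_path = [x * 2 for x in block]
else_path = [-x for x in block]

# ...and a per-element mask keeps the result from the branch actually taken.
result = [t if m else e for m, t, e in zip(mask, then_path, else_path)]
print(result)
```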
13. Memory Architecture
• Capable of reading and writing anywhere, in local (GPU)
memory or elsewhere
• These non-cached memories have large read/write
latencies, which can be masked by the extremely long
pipeline as long as computation does not stall waiting on a read
14. GPGPU Programming
Stream processing is a paradigm for maximizing the
efficiency of parallel computing. It can be decomposed into two
parts:
• Stream: a collection of objects that can be operated on
in parallel and that require the same computation.
• Kernel: a function applied to the entire stream, much
like a "for each" loop.
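The stream/kernel decomposition can be written out directly; in this minimal sketch a plain Python list stands in for the stream:

```python
# Stream: records that can be operated on in parallel,
# all requiring the same computation.
stream = [1.0, 2.0, 3.0, 4.0]

# Kernel: one function applied to every element of the stream.
def kernel(x):
    return 2.0 * x + 1.0

# Conceptually a "for each" loop: for each element, run the kernel.
out = [kernel(x) for x in stream]
print(out)
```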
15. Terminology:
Streams
– Collections of records requiring similar computation,
e.g. vertex positions, voxels, etc.
– Provide data parallelism
Kernels
– Functions applied to each element in the stream,
transforming it
– No dependencies between stream elements, which
encourages high arithmetic intensity
Gather
– Indirect read from memory ( x = a[i] )
– Naturally maps to a texture fetch
– Used to access data structures and data streams
Scatter
– Indirect write to memory ( a[i] = x )
– Needed for building many data structures
– Usually done on the CPU
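The gather and scatter definitions above, written out directly (plain Python lists stand in for GPU memory in this sketch):

```python
a = [10, 20, 30, 40]
idx = [3, 0, 2]

# Gather: indirect read, x = a[i] -- maps naturally to a texture fetch.
gathered = [a[i] for i in idx]

# Scatter: indirect write, a[i] = x -- historically done on the CPU.
for i, x in zip(idx, [99, 98, 97]):
    a[i] = x

print(gathered, a)
```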
16. What can you do on GPUs other than
graphics?
• Large matrix/vector operations (BLAS)
• Protein Folding (Molecular Dynamics)
• FFT (SETI, signal processing)
• Ray Tracing
• Physics Simulation [cloth, fluid, collision]
• Sequence Matching (Hidden Markov Models)
• Speech Recognition (Hidden Markov
Models, Neural nets)
• Databases
• Sort/Search
• Medical Imaging (image
segmentation, processing)
And many, many more…
17. Future of GPU Computing:
• Higher-bandwidth PCI-E bus path between the
CPU and GPU
• AMD's Fusion and Intel's Ivy Bridge place
CPU and GPU elements on a single chip
• Addition of AVX instructions to CPU
architectures
• Fully programmable pipelines, beyond the current few
programmable shading stages in the fixed
graphics pipeline
• Flexibility to combine a variety of rendering techniques
with general-purpose processing
19. Problems in GPGPU Computing
• A killer app...?
• Programming models and tools... their proprietary nature?
• The GPU in tomorrow's computer... will it get
dissolved or absorbed?
• Relationship to other parallel H/W and S/W
• Managing rapid change
• Performance evaluation and performance cliffs
• A broader toolbox for computation and data
structures... a "vertical" model for app development
• Faults and lack of precision
20. Drawbacks:
• Power consumption
• Increasing die size
• Multi-die solutions requiring inter-die connections
increase packaging and wafer cost
• An increasing amount of die space goes to control logic,
registers, and cache as the GPU becomes more flexible and
programmable
• Comparing CPUs to GPUs is more like comparing
apples to oranges
• Still a lot of fixed-function hardware
• Integration of multimedia fixed functions within
CPUs
21. References:
• GPU Computing Gems, Emerald Edition, Wen-mei W. Hwu (ed.)
• CUDA by Example: An Introduction to General-Purpose GPU
Computing, J. Sanders and E. Kandrot (July 2010)
• http://www.oxford-man.ox.ac.uk/gpuss/simd.html
• http://idlastro.gsfc.nasa.gov/idl_html_help/About_Shader_Programs.html
• GPU Computing, Proceedings of the IEEE, May 2008
• Evolution of GPU, Chris Sietz