This document provides an introduction to CUDA (Compute Unified Device Architecture). It explains why GPUs outperform CPUs on parallel workloads, thanks to their throughput-oriented architecture and large core counts, and how CUDA works by offloading parts of a program to run on GPU memory and cores. A block cipher encryption example contrasts CPU and GPU implementations of the same task. Additional topics covered include debugging tools, adoption, and libraries.
3. WE NEED TO THINK BEYOND MULTI-CORE
CPUS … WE NEED TO THINK MANY-CORE
GPUS
…
4. NVIDIA GPUS: FLOPS
FLOPS – floating-point operations per second, aka flops. A measure of how
many floating-point operations a GPU can perform each second. More is better.
GPUs beat CPUs
5. NVIDIA GPUS MEMORY BANDWIDTH
The massively parallel processors in Nvidia's GPUs are backed by high memory
bandwidth, which plays a big role in high-performance computing.
GPUs beat CPUs
6. GPU VS CPU
CPU:
• Optimised for low-latency access to cached data sets
• Control logic for out-of-order and speculative execution
GPU:
• Optimised for data-parallel, throughput computation
• Architecture tolerant of memory latency
• More transistors dedicated to computation
7. I DON’T KNOW C/C++, SHOULD I LEAVE?
Your Brain Asks: Wait a minute, why should I learn the C/C++ SDK?
CUDA Answers: Relax, no worries, not to fret. Efficiency!!!
8. WHAT DO I NEED TO BEGIN WITH CUDA?
An Nvidia CUDA-enabled graphics card, e.g. a Fermi-based GPU
9. HOW DOES CUDA WORK?
(Data moves between CPU and GPU over the PCI bus.)
1. Copy input data from CPU memory to GPU memory
2. Load GPU program and execute, caching data on chip for performance
3. Copy results from GPU memory to CPU memory
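The three steps above map directly onto CUDA runtime API calls. A minimal sketch (error checking omitted; the `square` kernel and size `N` are illustrative assumptions, not from the slides):

```cuda
#include <cuda_runtime.h>

__global__ void square(float *data) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    data[i] = data[i] * data[i];
}

int main() {
    const int N = 256;
    float h_data[N];                 /* host (CPU) buffer */
    for (int i = 0; i < N; i++) h_data[i] = (float)i;

    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));
    /* 1. Copy input data from CPU memory to GPU memory */
    cudaMemcpy(d_data, h_data, N * sizeof(float), cudaMemcpyHostToDevice);
    /* 2. Load GPU program and execute */
    square<<<N / 64, 64>>>(d_data);
    /* 3. Copy results from GPU memory back to CPU memory */
    cudaMemcpy(h_data, d_data, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return 0;
}
```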
10. EXAMPLE: BLOCK CYPHER

CPU Program:

void host_shift_cypher(unsigned int *input_array, unsigned int *output_array,
                       unsigned int shift_amount, unsigned int alphabet_max,
                       unsigned int array_length)
{
    for (unsigned int i = 0; i < array_length; i++)
    {
        int element = input_array[i];
        int shifted = element + shift_amount;
        if (shifted > alphabet_max)
        {
            shifted = shifted % (alphabet_max + 1);
        }
        output_array[i] = shifted;
    }
}

int main() {
    host_shift_cypher(input_array, output_array, shift_amount,
                      alphabet_max, array_length);
}

GPU Program:

__global__ void shift_cypher(unsigned int *input_array, unsigned int *output_array,
                             unsigned int shift_amount, unsigned int alphabet_max,
                             unsigned int array_length)
{
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int shifted = input_array[tid] + shift_amount;
    if (shifted > alphabet_max)
    {
        shifted = shifted % (alphabet_max + 1);
    }
    output_array[tid] = shifted;
}

int main() {
    dim3 dimGrid(ceil(array_length / (float)block_size));
    dim3 dimBlock(block_size);
    shift_cypher<<<dimGrid, dimBlock>>>(input_array, output_array, shift_amount,
                                        alphabet_max, array_length);
}
11. EXAMPLE: VECTOR ADDITION
// CUDA CODE
__global__ void VecAdd(const float* A, const float* B, float* C,
unsigned int N)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < N)
C[i] = A[i] + B[i];
}
// C CODE
void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
for( int i = 0; i < N; ++i)
C[i] = A[i] + B[i];
}
12. DEBUGGER
CUDA-GDB
• Based on GDB
• Linux
• Mac OS X
Parallel Nsight
• Plugin inside Visual Studio
13. VISUAL PROFILER & MEMCHECK
Visual Profiler
• Microsoft Windows
• Linux
• Mac OS X
• Analyze performance
CUDA-MEMCHECK
• Microsoft Windows
• Linux
• Mac OS X
• Detect memory access errors
14. WHERE’S CUDA AT IN 2011?
60,000 researchers use it to aid drug discovery
470 universities teach CUDA
15. WHERE’S CUDA AT IN 2011? (PART 2)
NVIDIA Show Case (1000+ applications)
16. ADDITIONAL RESOURCES
CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
GPGPU (http://gpgpu.org)
CUDA By Example (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0) – Jason Sanders & Edward Kandrot
GPU Computing Gems Emerald Edition (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/) – Editor in Chief: Prof Hwu Wen-Mei
17. CUDA LIBRARIES
Visit http://developer.nvidia.com/cuda-tools-ecosystem#Libraries
Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV,
GPU AI-Tree Search, GPU AI-Path Finding
A lot of the libraries are hosted on Google Code.
Many more gems in there too!
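As a taste of the libraries listed above, Thrust gives an STL-like interface over CUDA so common patterns need no hand-written kernels. A hedged sketch (vector sizes and contents are assumptions; requires a CUDA-capable machine to build and run):

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

int main() {
    thrust::device_vector<float> a(1024, 1.0f);  /* 1024 ones, stored on the GPU */
    thrust::device_vector<float> b(1024, 2.0f);
    thrust::device_vector<float> c(1024);
    /* c[i] = a[i] + b[i], launched as a CUDA kernel under the hood */
    thrust::transform(a.begin(), a.end(), b.begin(), c.begin(),
                      thrust::plus<float>());
    return 0;
}
```

Compare this with the hand-written vector-addition kernel on slide 11: same computation, but Thrust handles the memory transfers and launch configuration for you.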