Mais conteúdo relacionado Semelhante a GPU programming (20) GPU programming1. GPU Programming
Roberto Bonvallet
Departamento de Inform´ tica
a
Universidad T´ cnica Federico Santa Mar´a
e ı
Junio de 2010
3. CPU and GPU architectures
Control ALU ALU
ALU ALU
Cache
DRAM DRAM
8. Task and data parallelism
Task parallelism:
distributed
processing
distributed memory
message passing
9. Task and data parallelism
Task parallelism:
distributed
processing
distributed memory
message passing
Data parallelism:
same instruction on
different data
shared memory
13. Thread and memory hierarchies
Thread hierarchy:
grid of blocks
blocks of threads
Memory hierarchy:
global memory (large, slow)
shared memory (per-block, small, fast)
registers (per-thread, small, fast)
17. Matrix-matrix multiplication
cij = aik bkj
k
Cij = Aik Bkj
k
Multiplication kernel:
initialize element of
Cij = 0
for each k:
fetch element of
Aik , Bkj into shared
memory
synchronize
compute element
of Cij = Cij + Aik Bkj
synchronize
18. Nvidia C1060
Core clock 602 Mhz
Multiprocessors 30
Thread processors 240 = 30 × 8
Memory size 4 GB
Memory bandwidth 102.4 GB/s
Single precision pp 933.12 Gflop
Double precision pp 77.76 Gflop
19. CUDA programming
Array allocation and copying
cudaMalloc((void **) &p, mem_size);
cudaMemcpy(host_p, dev_p, mem_size,
cudaMemcpyHostToDevice);
[...]
cudaMemcpy(dev_p, host_p, mem_size,
cudaMemcpyDeviceToHost);
cudaFree(p);
20. CUDA programming
Kernel definition
__global__ void
vector_sum(float *a, float *b, float *c) {
int i = blockIdx.x * blockDim.x +
threadIdx.x;
c[i] = a[i] + b[i];
}
21. CUDA programming
Kernel definition
__global__ void
vector_sum(float *a, float *b, float *c) {
int i = blockIdx.x * blockDim.x +
threadIdx.x;
c[i] = a[i] + b[i];
}
Kernel launch
f<<<grid_size, block_size,
sh_mem_size>>>(a, b, c);
23. Vortex Methods
Fluid discretized as vortices
(x, y, α)
Vortex interaction:
1
K(x, y) = − (−y, x)
2π x
24. Vortex Methods
Fluid discretized as vortices
(x, y, α)
Vortex interaction:
1
K(x, y) = − (−y, x)
2π x
Biot-Savart law:
u(x) = αp K(x − xp )
p