3. Classical science
• Nature is studied through observation, theory, and physical experiments
• Modern science adds numerical simulations as a new pillar
[Image: SX-9 supercomputer (Tohoku University)]
4. • Numerical simulation is used across many fields: quantum chemistry, cosmology, CFD, medicine, material design
5. • Supercomputer
– The most powerful computers that can be built[2]
– First computer “ENIAC” ⇒ 350 mult/sec (1946)
– Today’s supercomputers > 1,000,000,000 × ENIAC
– Today’s processor speed only ~ 1,000,000 × ENIAC (?)
– The remaining factor of ~1,000 comes from using many processors together:
“Parallel computing”
6. • CPU: the brain of the computer; all data is processed here
• Memory: the computer’s scratch pad; programs are loaded and run here
• GPU: for graphics processing; used as an accelerator in HPC
• Storage: holds data and program files
7. • The free lunch is over!!
– Heat
– Power restrictions
– Transistor size
CPUs aren’t getting any faster
9. • Trends in HPC system design
– More nodes/processors/cores
– Deep memory hierarchies
– Non-uniform interconnect network
– Accelerators (today’s topic)
[Figure: schematic system diagrams, with N = node, P = processor, C = core, M = memory, A = accelerator]
Good old days! One proc. / node, one core / proc., uniform network…
Today: too complicated … How can we fully exploit the potential?
10. • Programmers need to learn both Hardware and
Software
Figure: Markus Pueschel
11. • We need a powerful computer
• CPU speed cannot be increased anymore
• Go parallel:
– Multicomputer
– Multicore
• The system’s complexity requires programmers to learn both HW and SW
14. • Power is the problem
– System size is limited by power budget
• Heterogeneous systems are promising
– CPU + Accelerator (=GPU)
– CPU and GPU have their own strengths and
weaknesses
– CPU: few cores, high frequency (~GHz)
– GPU: 1000 cores, low frequency (~MHz)
15. • Graphics Processing Unit (GPU)
– Originally developed for quickly generating 2D and
3D graphics, images, and video
– Highly parallel processor
– GPU is more power-efficient than CPU[3]
*Image from nvidia.com
16. • CPU and GPU are very different processors
– CPU: latency-oriented design (= speculative)
– GPU: throughput-oriented design (= parallel)
18. [Figure: execution timeline — the CPU runs tasks 1–4 one after another, while the GPU works on all four tasks at the same time]
19. • Speculative execution with branch prediction is effective at shortening the execution time, but it makes the hardware complicated
A = 2;
B = 3;
C = A + B;
D = A * B;
E = A - B;
if ( C > 4 )
{
    A = 0;
}
B = 0;
20. • CPU has a large cache memory and
control unit
• GPUs devote more hardware resources
to ALUs
21. • Many simple cores
– No speculation features
• Their simplicity makes it possible to put many cores on a chip
• Fast context switches, thanks to the simple core design
[Figure: execution timeline on a GPU core — when one thread stalls on a memory access, a cheap context switch lets another thread compute, hiding the memory latency]
22. • CPU and GPU are very different
processors
– They have their own strengths and weaknesses
• CPU has few big cores to shorten the execution
time
• GPU has many simple cores to increase
throughput
– CPU for serial execution and GPU for parallel
execution
23. [1] Levin, E. “Grand challenges to computational science.” Communications of the ACM 32(12):1456-1457, December 1989.
[2] Kaufmann, William J. III, and Larry L. Smarr. Supercomputing and the Transformation of Science. Scientific American Library, 1993.
[3] Nvidia. “Doing more with less of a scarce resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html