Our unique 1U GPU servers allow you to use the latest GPUs (Tesla, GTX285, Quadro FX5800) for visualization or offloading processing in a small form factor. These are built on Intel's latest Nehalem processors.
GPU Systems
1. GPU Systems
Advanced Clustering’s offerings for GPGPU computing
advanced clustering technologies
www.advancedclustering.com • 866.802.8222
2. what is GPU computing
• The use of a GPU (graphics processing unit) to do general-purpose scientific and engineering computing
• The model is to use a CPU and GPU together in a heterogeneous computing model
• The CPU is used to run the sequential portions of the application
• Parallel computation is offloaded onto the GPU
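The split above can be made concrete with a minimal CUDA C sketch (CUDA is the environment discussed later in this deck). The vector-add kernel, array size, and launch geometry here are illustrative examples, not taken from the slides; the program needs nvcc and a CUDA-capable GPU to run.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// GPU kernel: each thread handles one element (the parallel portion).
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential setup runs on the CPU (the host).
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    // Copy inputs into the GPU's dedicated RAM, launch, copy results back.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);  // offload to GPU

    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[100] = %f\n", c[100]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}
```

The CPU stays responsible for allocation, I/O, and control flow; only the data-parallel loop body moves to the GPU.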
3. history of GPUs
• GPUs were designed with fixed-function pipelines for real-time 3D graphics
• As the complexity of GPUs increased, they were made more programmable so that new features could be implemented easily
• Scientists and engineers discovered that these originally purpose-built GPUs could also be re-programmed for General-Purpose computing on a GPU (GPGPU)
4. history of GPUs - continued
• The nature of 3D graphics means GPUs have very fast floating-point units, which are also great for scientific codes
• GPUs were originally very difficult to program, but GPU vendors have recognized another market for their products and developed specially designed GPUs and programming environments for scientific computing
• The most prominent is NVIDIA's Tesla GPU and its CUDA programming environment
5. GPUs vs. CPUs
• Traditional x86 CPUs are available today with 4 cores; 6-, 8-, and 12-core parts are coming in the future
• NVIDIA's Tesla GPU is shipping with 240 cores
7. why use GPUs?
• Massively parallel design: 240 cores per GPU
• Nearly 1 teraflop of single-precision floating-point performance
• Designed as an accelerator card to add into your existing system; it does not replace your current CPU
• Maximum of 4GB of fast dedicated RAM per GPU
• If your code is highly parallel, it's worth investigating
8. why not use GPUs?
• Fixed RAM sizes on the GPU; not upgradable or configurable
• Large power requirements of 188W
• Still requires a host server and CPU to operate
• Specialized development tools required; does not run standard x86 code
• Current development tools are specific to NVIDIA cards; no support for other manufacturers' GPUs
• Your code may be difficult to parallelize
9. developing for GPUs
• Current development model: the CUDA parallel environment
• The CUDA parallel programming model guides programmers to partition the problem into coarse sub-problems that can be solved independently in parallel
• Fine-grained parallelism within the sub-problems is then expressed such that each sub-problem can be solved cooperatively in parallel
• Currently an extension to the C programming language; other languages are in development
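The two levels described above map directly onto CUDA's grid/block hierarchy. As a hedged illustration (the kernel name and block size of 256 are assumptions, not from the slides): each block is one coarse, independent sub-problem, and the threads inside a block cooperate through shared memory.

```cuda
// Coarse level: each block independently reduces one 256-element chunk
// of the input (sub-problems solved independently in parallel).
// Fine level: the 256 threads of a block cooperate through shared
// memory to sum their chunk (fine-grained cooperative parallelism).
__global__ void partial_sums(const float *in, float *block_sums, int n)
{
    __shared__ float buf[256];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // all threads in the block finish loading first

    // Tree reduction within the block: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        block_sums[blockIdx.x] = buf[0];  // one result per sub-problem
}
```

Note that `__syncthreads()` only synchronizes threads within one block; blocks never need to coordinate, which is exactly what makes the coarse decomposition scale.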
10. NVIDIA GPUs
• All of NVIDIA’s recent GPUs support CUDA development
• Tesla cards are designed exclusively for CUDA and GPGPU code (no graphics support)
• GeForce cards designed for graphics can be used for CUDA code as well
• They are usually slower, with fewer cores or less RAM, but a great way to get started at low price points
• Development and testing can be done on almost any standard GeForce GPU, then run on a Tesla system
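Because the same CUDA binary runs on GeForce and Tesla parts, a small device query is a common first check when moving code between them. This sketch uses standard CUDA runtime API calls (`cudaGetDeviceCount`, `cudaGetDeviceProperties`); the output format is illustrative.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// List every CUDA device so you can see whether the code is running
// on a development GeForce card or a production Tesla part.
int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; d++) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s, %d multiprocessors, %zu MB RAM\n",
               d, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```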
12. GPU future
• More products are coming: AMD's Stream processor line of products, similar to NVIDIA’s Tesla
• Standard, portable programming via OpenCL
• OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming. It enables portable code for a diverse mix of multi-core CPUs, GPUs, Cell-type architectures, and other parallel processors such as DSPs
• More info: http://www.khronos.org/opencl/
13. building GPU systems
• Building systems to house GPUs can be difficult:
• It requires a lot of engineering and design work to power and cool them correctly
• GPUs were originally designed for visualization and gaming; size and form factor were not as important
• When used for computation, data-center space is limited and expensive; a way must be found to implement GPUs in existing infrastructure
14. traditional GPU servers
• Large tower-style cases
• Rackmount servers 4U or larger
• Either choice is not an efficient use of limited data center space
15. GPUs are large
• GPU cards are big: 10.5” long, 4.6” tall, 1.5” deep
• The size of the GPU has limited its application
16. GPUs are power hungry
• GPU cards can use a lot of power, as much as 270W
• Lots of power equals lots of heat
• Difficult to put into a small space and cool effectively
17. GPU system options
Advanced Clustering has two solutions to the power, heat, and density problems: NVIDIA’s Tesla S1070 and Advanced Clustering’s 15XGPU nodes.
18. NVIDIA’s tesla S1070
• The S1070 is an external 1U box that contains 4x Tesla C1060 GPUs
• The S1070 must be connected to one or two host servers to operate
• The S1070 has one power supply and dedicated cooling for the 4x GPUs
• Only available with the C1060 GPU cards pre-installed
22. host interface cards (HIC)
• The Host Interface Card (HIC) connects the Tesla S1070 to a server
• Every S1070 requires 2 HICs
• Each HIC bridges the server to two of the four GPUs inside the S1070
• The HICs can be installed in 2 separate servers, or in 1 server
• HICs are available in PCI-e 8x and 16x widths
23. tesla S1070 block diagram
[Block diagram: Tesla S1070 with cables to HICs in the host system(s)]
24. connecting S1070 to 2 servers
Most servers do not have enough PCI-e bandwidth, so the S1070 is designed to allow connecting to 2 separate machines.
[Diagram: one Tesla S1070 connected to Server #1 and Server #2]
25. connecting S1070 to 1 server
If the server has enough PCI-e lanes and expansion slots, one Tesla S1070 can be connected to one server.
[Diagram: one Tesla S1070 connected to a single server]
26. example cluster of S1070s
• 10x 1U compute nodes with 2x CPUs each
• 5x Tesla S1070s with 4x GPUs each
• A balanced system of 20 CPUs and 20 GPUs
• All in 15U of rack space
27. S1070s pros and cons
Pros:
• An external enclosure holding the GPUs doesn’t require a special server design
• Easy to add GPUs to any existing system
• 4 GPUs in only 1U of space
• Multiple HIC card configurations, including PCI-e 8x or 16x
• Thermally tested and validated by NVIDIA

Cons:
• Two GPUs share one PCI-e slot in the host server, limiting bandwidth to the GPU card
• Most 1U servers have only 1x PCI-e expansion slot, which is occupied by the HIC; this limits the ability to use interconnects like InfiniBand or 10 Gigabit Ethernet
• Limited configuration options: only Tesla cards, no GeForce or Quadro options
29. advanced clustering GPU nodes
• The 15XGPU line of systems is a complete two-processor server and GPU in 1U
• The server is fully configured with the latest quad-core Intel Xeon processors, RAM, hard drives, optical drive, networking, InfiniBand, and a GPU card
• Flexible to support various GPUs, including:
• Tesla C1060 card
• GeForce series
• Quadro series
33. GPU node - block diagram
[Block diagram: Advanced Clustering 15XGPU node]
Simplified design: the host server is completely integrated with the GPU, with no external components to connect.
34. example cluster of GPU nodes
• 15x 1U compute nodes
• 2x CPUs each
• 1x GPU integrated in each node
• The entire system contains 30x CPUs and 15x GPUs
• All in 15U of rack space
35. GPU nodes - thermals
• The system is carefully engineered to ensure all components will fit in the small form factor
• Detailed modeling and testing make sure the system components (CPU and memory) and the GPU are adequately cooled
36. GPU nodes pros and cons
Pros:
• Entire server and GPU enclosed in a 1U package
• Flexibility in GPU choice: Tesla, GeForce, and Quadro supported
• Full PCI-e bandwidth to the GPU
• Full-featured server with the latest quad-core Intel Xeon CPUs
• Can be used for more than computation; use the GPU for video output as well

Cons:
• Only 1x GPU per server
• Requires purchase of new servers; not an upgrade or add-on
• Not as dense a solution as the S1070's 4x GPUs
37. GPU nodes
• The GPU node concept is unique to Advanced Clustering
• The only vendor shipping a 1U with an integrated Tesla or high-end GeForce / Quadro card
• Available for order as the 15XGPU2
• Dual quad-core Intel Xeon 5500 series processors
• Choice of GPU
38. 15XGPU2 - specifications
• Processor
• Two Intel Xeon 5500 Series processors
• Next-generation "Nehalem" microarchitecture
• Integrated memory controller and 2x QPI chipset interconnects per processor
• 45nm process technology
• Chipset
• Intel 5500 I/O controller hub
• Memory
• 800MHz, 1066MHz, or 1333MHz DDR3 memory
• Twelve DIMM sockets supporting up to 144GB of memory
• GPU
• PCI-e 2.0 16x double-height expansion slot for GPU
• Multiple options: Tesla, GeForce, or Quadro cards
• Storage
• Two 3.5" SATA2 drive bays
• Supports RAID levels 0-1 with Linux software RAID (with 2.5" drives)
• DVD+RW slim-line optical drive
• Management
• Integrated IPMI 2.0 module
• Integrated management controller providing iKVM and remote disk emulation
• Dedicated RJ45 LAN for the management network
• I/O connections
• Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces
• Two USB 2.0 ports
• One DB-9 serial port (RS-232)
• One VGA port
• Optional ConnectX DDR or QDR InfiniBand connector
• Electrical requirements
• High-efficiency power supply (greater than 80%)
• Output power: 560W
• Universal input voltage 100V to 240V
• Frequency: 50Hz to 60Hz, single phase
39. availability
• Both the Tesla S1070 and the 15XGPU GPU nodes are available and shipping now
• For pricing and custom configuration, contact your Account Representative
• (866) 802-8222
• sales@advancedclustering.com
• http://www.advancedclustering.com/go/gpu