High-Performance Computing with C++

• Quant
• Programmer (C++, .NET, MATLAB)
• Microsoft MVP Visual C# (since 2009)
• Pluralsight course author
(MATLAB, CUDA, D, Boost,…)
• Technical Evangelist @ JetBrains

• An overview of available technologies for
computation
• A look at managed vs. unmanaged code
• How to leverage capabilities of x86 architecture
• What COTS (commercial off-the-shelf) and specialized
acceleration hardware exists, and how to use it

• Native code
• Managed code

• More portable. But C++ is also portable, provided you do
not use platform-specific features.
• In theory, it gets optimized for each target platform. In
practice, the results are not great.
• Does not permit low-level interaction with the processor.
• Additional safety ("managed"): array bounds checks,
type conversion checks, etc.
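To make the safety trade-off concrete, here is a minimal C++ sketch contrasting unchecked, debug-only, and always-checked element access on a `std::vector`. A managed runtime such as .NET applies a check like the `at()` variant to array accesses (its JIT can elide some of them); in native C++ the choice is left to the programmer. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Unchecked access: operator[] performs no bounds check, so an
// out-of-range index is undefined behavior (fast, but unsafe).
double sum_unchecked(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Debug-only check: assert() fires in debug builds and compiles away
// when NDEBUG is defined, so release builds pay nothing for it.
double get_debug_checked(const std::vector<double>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}

// Always-checked access: at() validates the index and throws
// std::out_of_range, which is roughly what "managed" code does.
bool read_checked(const std::vector<double>& v, std::size_t i, double& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```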

• Not always portable (e.g. .NET is only partially
portable, excluding UI, WCF, …)
• Typically supports garbage collection.
• Has ways of interacting with native code (JNI,
P/Invoke, C++/CLI).
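As a sketch of the native side of such interop, the C++ function below is exported with C linkage so that a managed caller can bind to it by its unmangled name. The name `dot_product` is hypothetical. On Windows the export would additionally need `__declspec(dllexport)` (or a .def file), and the C# side would declare it along the lines of `[DllImport("mylib")] static extern double dot_product(double[] a, double[] b, int n);`, where `mylib` is likewise a placeholder library name.

```cpp
#include <cassert>

// C linkage: no C++ name mangling, so P/Invoke (or a JNI wrapper)
// can locate the symbol in the compiled shared library by name.
extern "C" double dot_product(const double* a, const double* b, int n) {
    assert(a != nullptr && b != nullptr && n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```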

• Developer vs. software productivity?
• Managed languages are simpler to use

• This talk focuses on CPU-bound problems
• Some problems bottleneck on I/O
• SSDs have made things a lot better
• Optimization mechanisms

• Don’t expect CPU clock speed to pick up
• PC/server architecture does not scale
• The only way to accelerate computation is to provide
more computing units to run on.

• Instruction-level
• Thread-level
• Machine-level

• Via inline assembly
• Via ‘intrinsics’
• Compiler vectorization
• Use "magical" compilers (e.g., the Intel SPMD Program Compiler, ispc)
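A minimal sketch of the intrinsics and compiler-vectorization options, assuming an x86 target with SSE (which is baseline on x86-64): the same element-wise float addition written as a plain loop that an optimizing compiler may auto-vectorize, and written by hand with SSE intrinsics from `<xmmintrin.h>`. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Plain loop: with optimization enabled, the compiler may turn this
// into SIMD code on its own (compiler vectorization).
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Hand-written SSE: each _mm_add_ps adds four floats held in one
// 128-bit register.
void add_sse(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)  // scalar tail for the remaining 0-3 elements
        out[i] = a[i] + b[i];
}
```

Whether the plain loop is actually vectorized depends on the compiler and flags; with GCC, `-O3` enables auto-vectorization and `-fopt-info-vec` reports which loops were vectorized.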

• Processing data in an array
• OpenMP
• Intel Threading Building Blocks/
Parallel Patterns Library (MS)
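For the thread-level case, a minimal OpenMP sketch of processing data in an array: the `parallel for` directive splits the loop iterations across threads, and the `reduction` clause gives each thread a private partial sum that is combined at the end. TBB (`tbb::parallel_reduce`) and PPL offer equivalent library-based constructs. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build with -fopenmp (GCC/Clang) or /openmp (MSVC). Without that
// flag the pragma is ignored and the loop simply runs serially.
double sum_of_squares(const std::vector<double>& v) {
    double sum = 0.0;
    // Signed index: older OpenMP versions require it for parallel for.
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += v[i] * v[i];
    return sum;
}
```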

• GPGPU
• Expansion boards
• Custom chips

• Hardware Platforms – NVIDIA, ATI
• Software platforms for computation – CUDA,
OpenCL, C++ AMP

• Typically 2 GPUs per machine; effectiveness drops off after that
• PCI bus congestion, depending on usage patterns

• CUDA is the principal commercially successful GPGPU platform
• CUDA is supported by many software vendors
(Photoshop, MATLAB, etc.)
• In many domains (e.g., video transcoding), the state of GPU
utilization is dire
• In terms of performance, NVIDIA (CUDA) hardware is thought to be
better at floating-point math, AMD at integer math

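• CUDA is, in a sense, a managed technology: device code can be
compiled to PTX, an intermediate form that the driver JIT-compiles
for the installed GPU
• CUDA is not device-independent (it targets NVIDIA GPUs only)
• CUDA C is the primary development language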

• A GPU has several streaming multiprocessors (SMs)
• Each SM has many streaming processors (SPs)
• We can launch a large number of threads in parallel
• The very large number of SPs means that, even at lower
clock speeds, the GPU outperforms the CPU on sufficiently parallel workloads
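In CUDA, each thread usually derives the array element it owns from its block and thread indices (`blockIdx.x * blockDim.x + threadIdx.x`). The plain C++ sketch below emulates such a launch serially to show how a grid of blocks covers an array; on a GPU the two outer loops disappear and the hardware runs the threads in parallel across the SMs. The "saxpy" kernel and both function names are hypothetical examples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of a hypothetical saxpy kernel (y = a*x + y). On a GPU this
// runs once per thread, with the indices supplied by the hardware.
void saxpy_thread(unsigned block_idx, unsigned block_dim, unsigned thread_idx,
                  float a, const std::vector<float>& x, std::vector<float>& y) {
    std::size_t i = std::size_t(block_idx) * block_dim + thread_idx;  // global index
    if (i < y.size())  // guard: the last block may be only partly used
        y[i] = a * x[i] + y[i];
}

// Serial stand-in for a kernel launch with grid_dim blocks of
// block_dim threads each.
void emulated_launch(unsigned block_dim, float a,
                     const std::vector<float>& x, std::vector<float>& y) {
    assert(block_dim > 0 && x.size() == y.size());
    // Round up so that every element is covered by some thread.
    unsigned grid_dim = unsigned((y.size() + block_dim - 1) / block_dim);
    for (unsigned b = 0; b < grid_dim; ++b)       // blocks: distributed over SMs
        for (unsigned t = 0; t < block_dim; ++t)  // threads within one block
            saxpy_thread(b, block_dim, t, a, x, y);
}
```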

• A look at CUDA development

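• GPUs do not run ordinary x86 code.
• Running several tasks on a GPU at once is difficult
• Branch divergence: when threads in the same group (warp) take
different paths through branching code (even a simple if), those
paths execute one after another, turning parallel computation into sequential.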
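A common mitigation, on GPUs and for CPU vectorization alike, is to replace a data-dependent `if` with a select so that every lane executes the same instructions. A minimal C++ sketch, using clamp-to-zero as a hypothetical example operation; whether the compiler actually emits a branch-free instruction is an assumption that depends on the target and compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Branching version: in a GPU warp, lanes with v[i] < 0 and lanes
// with v[i] >= 0 would follow different paths.
void relu_branchy(std::vector<float>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] < 0.0f)
            v[i] = 0.0f;
    }
}

// Select-based version: std::max typically compiles to a single
// max instruction (e.g., maxss on x86), so all elements follow the
// same instruction stream.
void relu_branchless(std::vector<float>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = std::max(v[i], 0.0f);
}
```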

• How do you plug a few CPUs into a
motherboard? You cannot. The architecture doesn't
scale. (And never will.)
• An alternative is to put a coprocessor on the PCI bus

• Commercial coprocessor
implementation from Intel
• PCI board with ~60 cores
• Supports x86!
• Supports different technologies
• Runs its own micro Linux OS (it is not just a driver)
• Can be used in either native (standalone) or offload mode
• Requires special development tools (Intel C++ compiler)

• Intel makes a lot of tools for C++ developers
• To work with Xeon Phi, you need

• Offload mode
• Native execution mode
• Symmetric execution
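The talk does not show offload code, so the following is only a sketch of offload mode using the standard OpenMP 4.0 `target` construct (Intel's compilers of that era also offered their own `#pragma offload` extension, not shown here). The `map` clauses copy the input to the device and the result back. With an offload-capable compiler and a coprocessor present the loop runs on the device; otherwise it falls back to the host, and without `-fopenmp` the pragmas are ignored entirely. The function name is illustrative.

```cpp
#include <cassert>
#include <vector>

// Multiply every element by k, offloading the loop if a target
// device is available.
void scale_offload(const std::vector<float>& in, std::vector<float>& out, float k) {
    assert(in.size() == out.size());
    const float* src = in.data();
    float* dst = out.data();
    const long n = static_cast<long>(in.size());
    // to: copy src to the device; from: copy dst back afterwards.
    #pragma omp target map(to: src[0:n]) map(from: dst[0:n])
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        dst[i] = k * src[i];
}
```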

• 60 cores
• 4 hardware threads per core
• 8 GB memory
• 512-bit SIMD
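Taken together, these figures imply the following degree of parallelism (simple arithmetic from the listed specs, not measured results):

```latex
\begin{aligned}
60~\text{cores} \times 4~\text{threads per core} &= 240~\text{hardware threads}\\
512~\text{bits} \div 32~\text{bits per float} &= 16~\text{floats per vector}\\
512~\text{bits} \div 64~\text{bits per double} &= 8~\text{doubles per vector}
\end{aligned}
```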

• Same as in ordinary PCs, i.e.,
• OpenMP, MPI
• pthreads
• Other models coming soon

• FPGA – Field Programmable
Gate Array
• Design your own
processing logic
• Middle ground between
hard-wired ASIC and very
flexible general-purpose CPU
• Programmed in hardware description
languages (HDLs) such as VHDL and Verilog. Other options exist (SystemC,
OpenCL), as do higher-level solutions (e.g., MATLAB, Embeddr).

• Intrinsically parallel
• Low-power
• Better scalability
• Not a COTS solution

• An FPGA lets us offload some tasks from the CPU
• An FPGA is a lot less flexible, and not so good for math.
• An FPGA is a low-level construct.
• FPGAs are relatively expensive to operate.

• FPGAs do not directly compete with ordinary CPUs
• They gain their advantage from their highly asynchronous
nature
• The goal is to pre-program an FPGA to solve a
single problem very quickly
• E.g., protocol parsing in hardware (a so-called "feed
handler")
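To make "protocol parsing" concrete, here is the kind of work a feed handler does, written as ordinary C++: decoding fields from a fixed-layout binary message. The 16-byte message layout is entirely hypothetical (real exchange protocols differ); an FPGA implementation would extract these fields in parallel, in hardware, as the bytes arrive off the wire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte message: 8-byte instrument id, 4-byte price
// in ticks, 4-byte quantity, all little-endian.
struct Quote {
    std::uint64_t instrument;
    std::uint32_t price_ticks;
    std::uint32_t quantity;
};

// Assemble a little-endian unsigned integer from `bytes` bytes.
static std::uint64_t read_le(const unsigned char* p, std::size_t bytes) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= std::uint64_t(p[i]) << (8 * i);
    return v;
}

// Decode one message; returns false if the buffer is too short.
bool parse_quote(const unsigned char* buf, std::size_t len, Quote& q) {
    if (len < 16) return false;
    q.instrument  = read_le(buf, 8);
    q.price_ticks = static_cast<std::uint32_t>(read_le(buf + 8, 4));
    q.quantity    = static_cast<std::uint32_t>(read_le(buf + 12, 4));
    return true;
}
```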

• JetBrains is working on a C++ IDE
• And C++ support in ReSharper
• Questions?
