Stay up-to-date on the latest news, events and resources for the OpenACC community. This month’s highlights cover pseudo random number generation, the first-ever MONAI Bootcamp, upcoming GPU Hackathons and Bootcamps, and new resources!
WHAT IS OPENACC?
int main(void)
{
    /* serial code */
    #pragma acc kernels
    {
        /* parallel code */
    }
    return 0;
}
Add Simple Compiler Directive
POWERFUL & PORTABLE
Directives-based programming model for parallel computing
Designed for performance and portability on CPUs and GPUs
SIMPLE
Open Specification Developed by OpenACC.org Consortium
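As a concrete instance of the "add a simple compiler directive" pattern on this slide, here is a minimal sketch, assuming a SAXPY-style loop. The function name `saxpy` and its parameters are illustrative and do not come from the slides. When the file is built without an OpenACC-enabled compiler, the pragma is ignored and the loop simply runs serially.

```c
/* Hypothetical example: y = a*x + y over n elements. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* The directive asks the compiler to parallelize the loop that follows.
       The data clauses name the array sections to copy to the device (x)
       and to copy both to and from the device (y). */
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The serial code path is unchanged, which is the main appeal of the directive approach: the same source builds with or without accelerator support.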
silica IFPEN, RMM-DIIS on P100
OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
ANSYS Fluent
Gaussian
VASP
LSDalton
MPAS
GAMERA
GTC
XGC
ACME
FLASH
COSMO
Numeca
200 APPS* USING OpenACC
“For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar to CUDA, and OpenACC dramatically decreases GPU development and maintenance efforts. We’re excited to collaborate with NVIDIA and PGI as an early adopter of Unified Memory.”
Prof. Georg Kresse, Computational Materials Physics, University of Vienna
VASP: Top Quantum Chemistry and Material Science Code
* Applications in production and development
PSEUDO RANDOM NUMBER GENERATION BY LIGHTWEIGHT THREADS
Read the blog by Johan Carlsson, PhD, in which he details how his team brought the Data Encryption Standard (DES) block cipher out of retirement for a second career as a Pseudo Random Number Generator (PRNG). Intended for simulations that benefit from PRN generation at the granularity of lightweight (GPU) threads, the DES PRNG has been thoroughly tested and found to produce higher-quality random numbers than all commonly used PRNGs.
READ BLOG NOW
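One common way to give every lightweight thread its own random stream is a counter-based generator: each value is a scrambled function of (seed, thread id, draw index), so no generator state is shared or stored. The sketch below assumes that design; the slide does not say how the DES PRNG is structured. It also swaps in the SplitMix64 finalizer as the scrambling function instead of DES, purely to keep the example short, and the names `mix64`, `thread_uniform` and `fill_means` are hypothetical.

```c
#include <stdint.h>

/* Stand-in 64-bit mixer (the SplitMix64 finalizer). This is NOT DES; it only
   fills the role of "scramble a counter" for the sake of illustration. */
#pragma acc routine seq
static inline uint64_t mix64(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Hypothetical helper: the n-th uniform double in [0, 1) for one thread.
   The result depends only on (seed, thread_id, n); nothing is stored. */
#pragma acc routine seq
static inline double thread_uniform(uint64_t seed, uint64_t thread_id, uint64_t n)
{
    uint64_t bits = mix64(mix64(seed ^ thread_id) + n);
    return (double)(bits >> 11) * (1.0 / 9007199254740992.0); /* scale by 2^-53 */
}

/* Each loop iteration stands in for one lightweight thread and averages
   `draws` numbers (draws must be > 0) without sharing generator state. */
void fill_means(uint64_t seed, int nthreads, int draws, double *restrict out)
{
    #pragma acc parallel loop copyout(out[0:nthreads])
    for (int t = 0; t < nthreads; ++t) {
        double sum = 0.0;
        for (int n = 0; n < draws; ++n) {
            sum += thread_uniform(seed, (uint64_t)t, (uint64_t)n);
        }
        out[t] = sum / draws;
    }
}
```

Because each value is a pure function of its inputs, reruns with the same seed reproduce the same numbers regardless of how iterations are scheduled. The statistical quality of this stand-in mixer is not claimed to match the generator described in the blog.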
DON’T MISS THESE UPCOMING EVENTS
COMPLETE LIST OF EVENTS
Event | Call Closes | Event Date
Bootcamp OpenACC Pour La Communaute Scientifique Française (Digital) | November 9, 2020 | November 19-20, 2020
SFU HPC OpenACC GPU Bootcamp (Digital) | November 24, 2020 | December 1-2, 2020
CHPC OpenACC GPU Bootcamp (Digital) | November 29, 2020 | December 10-11, 2020
Digital in 2020: Many of our events are happening digitally! Get the same high-touch training and
mentorship without the hassle of travel!
FIRST-EVER MONAI BOOTCAMP
In collaboration with the MICCAI Educational Initiative, MONAI hosted its first Bootcamp from September 30 to October 2, 2020. This three-day virtual event included presentations, hands-on labs, direct contact with the MONAI core group, and an open challenge on the last day.
READ ARTICLE
RESOURCES
Paper: GPU acceleration of MPAS microphysics WSM6 using OpenACC directives: Performance and verification
Jae Youp Kim, Ji-Sun Kang, and Minsu Joh
We have attempted to accelerate a microphysics scheme embedded
within a next generation climate/weather numerical model, the Model for
Prediction Across Scales (MPAS), using OpenACC directives. As one of
the most time-consuming physics parameterization schemes, we have
focused on parallelizing the Weather Research and Forecasting (WRF)
single-moment 6-class microphysics scheme (WSM6) onto a Graphics
Processing Unit (GPU). We applied several essential methodologies to
optimize the performance of WSM6 computation on GPU, so as to
minimize data transfer between the Central Processing Unit (CPU) and
GPU, and to reduce the waste of GPU threads during computation. As a
result, we achieved GPU runs using one Tesla V100 that were on average
4.29 times faster than 20 CPU core Message Passing Interface (MPI)
runs, including I/O communication between the CPU and GPU. When
porting the whole model onto the GPU, we achieved a 10.44x speedup
of WSM6 computation, allowing us to measure the acceleration of WSM6
without I/O communication. This represents the first successful application
of GPU acceleration to the realistic full-model integration of MPAS.
READ PAPER
Fig. 2. Original structure of WSM6 subroutines (a) and modified
call graph after subroutine inlining (b).
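The abstract's point about minimizing CPU-GPU data transfer can be illustrated with an OpenACC data region. This is a minimal sketch, assuming a trivial stand-in update rather than the WSM6 scheme; the function `step_many` and its arguments are hypothetical. The arrays are moved once around the whole time loop instead of once per step.

```c
/* Hypothetical stand-in for a physics update; this is not WSM6. */
void step_many(int n, int nsteps, float dt,
               float *restrict q, const float *restrict tend)
{
    /* Keep q and tend resident on the device for all steps:
       one copy in before the loop and one copy out after it. */
    #pragma acc data copy(q[0:n]) copyin(tend[0:n])
    {
        for (int s = 0; s < nsteps; ++s) {
            /* present() states that the data is already on the device,
               so no transfer is triggered by this compute region. */
            #pragma acc parallel loop present(q[0:n], tend[0:n])
            for (int i = 0; i < n; ++i) {
                q[i] += dt * tend[i];
            }
        }
    }
}
```

Without the enclosing data region, each compute region would typically copy its arrays to and from the device on every step, which is the kind of overhead the authors describe reducing.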
RESOURCES
Paper: Accelerating High-Order Stencils on GPUs
Ryuichi Sai, John Mellor-Crummey, Xiaozhu Meng, Mauricio
Araya-Polo, and Jie Meng
Stencil computations are widely used in HPC applications. Today, many HPC
platforms use GPUs as accelerators. As a result, understanding how to perform
stencil computations fast on GPUs is important. While implementation strategies for
low-order stencils on GPUs have been well-studied in the literature, not all of the
techniques work well for high-order stencils, such as those used for seismic
imaging. Furthermore, coping with boundary conditions often requires different
computational logic, which complicates efficient exploitation of the thread-level
parallelism on GPUs. In this paper, we study practical seismic imaging
computations on GPUs using high-order stencils on large domains with meaningful
boundary conditions. We manually crafted a collection of implementations of a
25-point seismic modeling stencil in CUDA along with code to apply the boundary
conditions. We evaluated our stencil code shapes, memory hierarchy usage,
data-fetching patterns, and other performance attributes. We conducted an empirical
evaluation of these stencils using several mature and emerging tools and discuss
our quantitative findings. Among our implementations, we achieve twice the
performance of a proprietary code developed in C and mapped to GPUs using
OpenACC. Additionally, several of our implementations have excellent performance
portability.
READ PAPER
Fig. 1. Data Domain Decomposition
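For readers unfamiliar with the term, a 25-point stencil in 3D is commonly the center point plus four neighbors on each side along each of the three axes (1 + 3 × 2 × 4 = 25). The sketch below assumes that shape and uses OpenACC in C rather than the CUDA implementations studied in the paper. The function `stencil25`, the coefficient array `c`, and the x-fastest memory layout are illustrative assumptions, and boundary cells are simply left untouched.

```c
#include <stddef.h>

#define R 4  /* stencil radius: 1 center + 3 axes * 2 sides * R = 25 points */

/* Linear index into an nx*ny*nz grid stored with x varying fastest. */
#define IDX(i, j, k) ((size_t)(k) * ny * nx + (size_t)(j) * nx + (size_t)(i))

/* Hypothetical interior update; c[0] weights the center, c[r] the
   six neighbors at distance r along the axes. */
void stencil25(int nx, int ny, int nz, const float *restrict c,
               const float *restrict in, float *restrict out)
{
    size_t total = (size_t)nx * ny * nz;

    #pragma acc parallel loop collapse(3) \
        copyin(in[0:total], c[0:R + 1]) copy(out[0:total])
    for (int k = R; k < nz - R; ++k) {
        for (int j = R; j < ny - R; ++j) {
            for (int i = R; i < nx - R; ++i) {
                float sum = c[0] * in[IDX(i, j, k)];
                #pragma acc loop seq
                for (int r = 1; r <= R; ++r) {
                    sum += c[r] * (in[IDX(i - r, j, k)] + in[IDX(i + r, j, k)] +
                                   in[IDX(i, j - r, k)] + in[IDX(i, j + r, k)] +
                                   in[IDX(i, j, k - r)] + in[IDX(i, j, k + r)]);
                }
                out[IDX(i, j, k)] = sum;
            }
        }
    }
}
```

The wide reach of the stencil (four cells in every direction) is what makes memory-hierarchy usage and boundary handling the central tuning questions the abstract mentions.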
RESOURCES
Paper: A GPU-based algorithm for efficient LES of high Reynolds number flows in heterogeneous CPU/GPU supercomputers
Guillermo Oyarzun, Iason A. Chalmoukis, Georgios A. Leftheriotis,
and Athanassios A. Dimas
An optimized MPI+OpenACC implementation model that performs efficiently in CPU/GPU
systems using large-eddy simulation is presented. The code was validated for the
simulation of wave boundary-layer flows against numerical and experimental data in the
literature. A direct Fast-Fourier-Transform-based solver was developed for the solution of
the Poisson equation for pressure taking advantage of the periodic boundary conditions.
This solver was optimized for parallel execution in CPUs and outperforms by 10 times in
computational time a typical iterative preconditioned conjugate gradient solver in GPUs. In
terms of parallel performance, an overlapping strategy was developed to reduce the
overhead of performing MPI communications using GPUs. As a result, the weak scaling of
the algorithm was improved up to 30%. Finally, a large-scale simulation (Re = 2 × 10^5)
using a grid of 4 × 10^8 cells was executed, and the performance of the code was analyzed.
The simulation was launched using up to 512 nodes (512 GPUs + 6144 CPU-cores) on
one of the current top 10 supercomputers of the world (Piz Daint). A comparison of the
overall computational time showed that the GPU version was 4.2 times faster than the
CPU one. The parallel efficiency of this strategy (47%) is competitive compared with the
state-of-the-art CPU implementations, and it has the potential to take advantage of modern
supercomputing capabilities.
READ PAPER
RESOURCES
Books, eBooks and online courses: InformIT
VISIT SITE
InformIT, a part of Pearson, is your one-stop resource for
Addison-Wesley DRM-free eBooks and video courses for
learning tech skills including game development,
programming, and data engineering.
Through the end of 2020, InformIT is offering the community
35% off books or eBooks and 50% off video courses with
coupon code: NVIDIA.
RESOURCES
Website: GPUHackathons.org
Technical Resources
VISIT SITE
Explore a wealth of resources for GPU-accelerated
computing across HPC, AI and Big Data.
Review a collection of videos, presentations, GitHub repos,
tutorials, libraries and more to help you advance your skills
and expand your knowledge.
STAY IN THE KNOW:
JOIN THE OPENACC COMMUNITY
JOIN TODAY
The OpenACC specification is designed for, and
by, users, meaning that the OpenACC organization
relies on our users’ active participation to shape
the specification and to educate the scientific
community on its use.
Take an active role in influencing the future of both
the OpenACC specification and the organization
itself by becoming a member of the community.