SlideShare uma empresa Scribd logo
1 de 32
Baixar para ler offline
PERFORMANCE-STUDIES OF A MOLECULAR DYNAMICS CODE
                                                           Evaluating Serial, Thread and Process-Parallel Performance
         Molecular Dynamics                                                                                                                                                                                                                                                          Collaboration between
A molecular dynamics code simulating the                                                                                                                                                                                                                                                  PTI and ZIH
diffusion in dense nuclear matter in white dwarf
stars is analyzed in this collaboration. The code                                                                                                                           Planning                                                                                            In 2008 a collaboration between the Technische
is highly configurable allowing MPI, OpenMP, or                                                                                                                                                                                                                                  Universität Dresden, Center for Information
hybrid runs and additional fine tuning with a                                                                                                                                                                                                                                    Services and High Performance Computing
range of parameters.                                                                                                                                                                                                                                                            (ZIH), in Germany and the Indiana University,




                                                                                                                                                                                                                 Im h VampirTrace
                                                                                                                                                                                                                                                                                Pervasive Technology Institute (PTI), was




                                                                                                                                                                                                                   wit
                                                                                                                                                                                                                           plem
                                                                                                                                                                                                                                                                                founded to mutually benefit from common




                                                                                                                                                   valuation
                                                                                                                                                         r
                                                                                                                                                      Vampi
                                                                                                                                                                                                                                                                                research areas.




                                                                                                                                                                                                                               entation
                    Serial Analysis




                                                                                                                                                         with
                                                                                                                                                                                                                                                                                These research topics include:
The first step in the code analysis is to identify                                                                                                                                                                                                                                 ‣ Data-centric computing




                                                                                                                                                  E
the best performing parameter set. This                                                                                                                                                                                                                                           ‣ Computing for biological and life sciences
configuration represents the most promising                                                                                                                                                                                                                                        ‣ Wide area distributed file systems
candidate for further parallel analysis.                                                                                                                                                                                                                                          ‣ Parallel computing performance
                                                                                                                                                                                 Testing
                                                                                                                                                                                                                                                                                This collaboration involves ZIH maintaining a
                Parallel Analysis                                                                                                                                                                                                                                               stack of performance evaluation tools inside the
                                                                                                                                                                                                                                                                                NSF FutureGrid project. For the analysis an
Aim of the parallel analysis is to measure the                                                                                                                                                                                                                                  HPC system called Xray, a Cray XT5m which is
scalability limits of the different parallel code                                                                                                                        Authors                                                                                                part of the FutureGrid hardware, was used.
implementations and to detect bottlenecks                                                                                                                                                                                                                                       Xray consists of Quad-core 64-bit AMD
possibly preventing further parallel efficiency.                                                                        T. William, M. Weber, D. Röhrig - ZIH, TU Dresden                                                                                                        Opteron series 2000 processors. PGI compilers
This work has been done with the parallel                                                                              D. K. Berry, R. Henschel - UITS, IU Bloomington                                                                                                          version 9.0.4 and the optimized xtpe-barcelona
analysis framework Vampir.                                                                                             J. Hughto, A. S. Schneider, C. J. Horowitz - Physics, IU Bloomington                                                                                     module provided by Cray have been used.

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0910812 to Indiana University for "FutureGrid: An Experimental, High-Performance Grid Test-bed." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

                                                                                                                Technische Universität Dresden                                                                                                                   IU Bloomington
                                                                                                                Zentrum für Informationsdienste                                                                                                                  Pervasive Technology Institute
                                                                                                                und Hochleistungsrechnen (ZIH)
                                                                                                                                                                                                                                                                 2719 East 10th Street
                                                                                                                01062 Dresden, Germany                                                                                                                           Bloomington, Indiana 47408

                                                                                                                E-mail: contact@vampir.eu                                                                                                                        E-mail: pti@indiana.edu
                                                                                                                Web: http://www.vampir.eu/                                                                                                                       Web: http://pti.iu.edu/
MOLECULAR DYNAMICS CODE
                                                                                         Overview




The analyzed molecular dynamics code (MD)                                                                                        The movie shows a 5125 (5k) particle
has been developed at Indiana University for                                                                                     simulation of oxygen ions which are forming
simulating dense nuclear matter in white dwarf                                                                                   microcrystals:
and neutron stars.
                                                                                                                                      ‣ 1280 Carbon ions
Matter in these compact stellar objects consists
of either completely ionized atoms, or else free                                                                                      ‣ 3740 Oxygen ions
neutrons and protons.
                                                                                                                                      ‣ 100 Neon ions
The MD code models these systems as classical
particles interacting via a screened Coulomb                                                                                     By studying the motion of ions over a long
interaction. The electrons are not modeled                                                                                       simulation time, one can calculate the self-
explicitly, but are treated as a uniform                                                                                         diffusion coefficient of ions, an important
background charge that serves to screen the                                                                                      quantity for understanding the behavior of
Coulomb interaction.                                                                                                             white dwarfs.



                                                                                     MD-Code Ion Mix Movements




                                                   Technische Universität Dresden                                IU Bloomington
                                                   Zentrum für Informationsdienste                               Pervasive Technology Institute
                                                   und Hochleistungsrechnen (ZIH)
                                                                                                                 2719 East 10th Street
                                                   01062 Dresden, Germany                                        Bloomington, Indiana 47408

                                                   E-mail: contact@vampir.eu                                     E-mail: pti@indiana.edu
                                                   Web: http://www.vampir.eu/                                    Web: http://pti.iu.edu/
MOLECULAR DYNAMICS CODE
                                                                      Implementation Details

                                                                                                                                         During compilation the input parameter PP0x
MD is a hybrid MPI/OpenMP parallel code                                                                                                  decides what file is used for the simulation.
written in Fortran. Its main features are:                                                                                               Inside the PP0x files the parameters Ax and Bx
                                                                                                                                         denote different code-blocks that implement
   ‣ Simple particle-particle algorithm                                                                                                  equal semantics in different ways. Additionally,
   ‣ Long range interaction with large cut-off                                                                                           the file PP03 provides the define NBS to
     sphere                                                                                                                              influence the vector access length. Those
   ‣ Cutoff radius is half the box edge length                                                                                           parameters have been used throughout the
      ‣ Each ion interacts with about half the                                                                                           years to re-implement code utilizing newer
        others                                                                                                                           constructs of Fortran95 that had not been
   ‣ Interactions are calculated in a pair of                                                                                            available in Fortran77.
     nested do loops
      ‣ Outer loop goes over "target" particles                                                                                          PP01:
      ‣ Inner loop goes over "source" particles                                                                                            ‣ Original implementation
   ‣ Targets are assigned in a round-robin                                                                                                 ‣ No division into Ax, Bx, or NBS
     fashion to MPI processes                                                                                                              ‣ Baseline for other measurements
      ‣ Within each MPI process, the work is
        shared among OpenMP threads                                                                                                      PP02:
                                                                                                                                           ‣ A0, A1, A2
The MD code is highly modular, consisting of                                                                                               ‣ B0, B1, B2
several different implementations of the same                                                                                              ‣ No NBS
simulation logic . The par ticle-par ticle                  Code block in the PP03 file showing different implementations
interaction semantics have been implemented                   of the same loop selectable by Ax preprocessor macros
                                                                                                                                         PP03:
multiple times in different ways. Each                                                                                                     ‣ A0, A1, A2
implementation is stored in an individual source                                                                                           ‣ B0, B1, B2, B3, B4, B5, B6, B7
file, called from PP01 to PP03.                                                                                                             ‣ NBS


                                                   Technische Universität Dresden                                          IU Bloomington
                                                   Zentrum für Informationsdienste                                         Pervasive Technology Institute
                                                   und Hochleistungsrechnen (ZIH)
                                                                                                                           2719 East 10th Street
                                                   01062 Dresden, Germany                                                  Bloomington, Indiana 47408

                                                   E-mail: contact@vampir.eu                                               E-mail: pti@indiana.edu
                                                   Web: http://www.vampir.eu/                                              Web: http://pti.iu.edu/
SCOPE OF THE CODE ANALYSIS
                                             Hardware Details & Parameter Sweep
All measurements done on FutureGrid                                             Time estimate per measurements and                        Serial analysis:
machine Xray:                                                                   particles:
                                                                                                                                               ‣ 55k particles
 ‣   Cray XT5m                                                                        5k !    ~ 300 seconds!   ~5 minutes                      ‣ Comparing optimization flags:
 ‣   672 cores                                                                        27k !   ~ 4000 seconds! ~1 h                        !        O2
 ‣   84 nodes                                                                         55k !   ~ 35000 seconds! ~10 h                      !        O3
 ‣   2 CPU per node                                                                                                                       !        fastsse
 ‣   Quad-Core AMD Opteron(tm) Processor 23 (C2)                                                                                               ‣ fastsse == -fast
 ‣   2.4 GHz                                                                                                                                   ‣ fast - Common optimizations;
 ‣   ...                                                                                                                                  !        -O2 -Munroll=c:1 -Mnoframe -Mlre
                                                                                                                                          !        -Mautoinline -Mvect=sse -Mscalarsse
                                                                                                                                          !        -Mcache_align -Mflushz


                              Particle-type                                          Loop combination                       Paralellism
                                nucleon-nucleon                                        Block A: ! !     A0-A2                 serial: ! md
                                ion pure                                               Block B: ! !     B1-B7                 OpenMP:! md_omp
                                ion mix                                                Code-blocking: ! NBS                   MPI: !    md_mpi
                                                                                                                              Hybrid: ! md_mpi_omp

                              Input parameter                                        Compiler Flags                         Code version
                                simulation type                                        O2                                     Original: ! !              PP01
                                number of particles                                    O3                                     Production: !              PP02
                                number of time steps                                   SSE                                    Research:! !               PP03
                                                                                       ....

                                                                >8100 possible combinations
                                                   Technische Universität Dresden                                                  IU Bloomington
                                                   Zentrum für Informationsdienste                                                 Pervasive Technology Institute
                                                   und Hochleistungsrechnen (ZIH)
                                                                                                                                   2719 East 10th Street
                                                   01062 Dresden, Germany                                                          Bloomington, Indiana 47408

                                                   E-mail: contact@vampir.eu                                                       E-mail: pti@indiana.edu
                                                   Web: http://www.vampir.eu/                                                      Web: http://pti.iu.edu/
SERIAL ANALYSIS
Identifying The Best Configuration For Parallel Runs




                                           Runtime for all code cominations
         Technische Universität Dresden                                       IU Bloomington
         Zentrum für Informationsdienste                                      Pervasive Technology Institute
         und Hochleistungsrechnen (ZIH)
                                                                              2719 East 10th Street
         01062 Dresden, Germany                                               Bloomington, Indiana 47408

         E-mail: contact@vampir.eu                                            E-mail: pti@indiana.edu
         Web: http://www.vampir.eu/                                           Web: http://pti.iu.edu/
SERIAL ANALYSIS
Identifying The Best Configuration For Parallel Runs




                                           Runtime in seconds

         Technische Universität Dresden                         IU Bloomington
         Zentrum für Informationsdienste                        Pervasive Technology Institute
         und Hochleistungsrechnen (ZIH)
                                                                2719 East 10th Street
         01062 Dresden, Germany                                 Bloomington, Indiana 47408

         E-mail: contact@vampir.eu                              E-mail: pti@indiana.edu
         Web: http://www.vampir.eu/                             Web: http://pti.iu.edu/
SERIAL ANALYSIS
PAPI Aided Analysis Of PP02 Code Versions




                                       PAPI_FP_INS



     Technische Universität Dresden                  IU Bloomington
     Zentrum für Informationsdienste                 Pervasive Technology Institute
     und Hochleistungsrechnen (ZIH)
                                                     2719 East 10th Street
     01062 Dresden, Germany                          Bloomington, Indiana 47408

     E-mail: contact@vampir.eu                       E-mail: pti@indiana.edu
     Web: http://www.vampir.eu/                      Web: http://pti.iu.edu/
SERIAL ANALYSIS
PAPI Aided Analysis Of PP02 Code Versions




                                       FPU Idle



     Technische Universität Dresden               IU Bloomington
     Zentrum für Informationsdienste              Pervasive Technology Institute
     und Hochleistungsrechnen (ZIH)
                                                  2719 East 10th Street
     01062 Dresden, Germany                       Bloomington, Indiana 47408

     E-mail: contact@vampir.eu                    E-mail: pti@indiana.edu
     Web: http://www.vampir.eu/                   Web: http://pti.iu.edu/
SERIAL ANALYSIS
PAPI Aided Analysis Of PP02 Code Versions




                                       Branches mispredicted



     Technische Universität Dresden                            IU Bloomington
     Zentrum für Informationsdienste                           Pervasive Technology Institute
     und Hochleistungsrechnen (ZIH)
                                                               2719 East 10th Street
     01062 Dresden, Germany                                    Bloomington, Indiana 47408

     E-mail: contact@vampir.eu                                 E-mail: pti@indiana.edu
     Web: http://www.vampir.eu/                                Web: http://pti.iu.edu/
SERIAL ANALYSIS
PAPI Aided Analysis Of PP02 Code Versions




                                       PAPI_VEC_INS



     Technische Universität Dresden                   IU Bloomington
     Zentrum für Informationsdienste                  Pervasive Technology Institute
     und Hochleistungsrechnen (ZIH)
                                                      2719 East 10th Street
     01062 Dresden, Germany                           Bloomington, Indiana 47408

     E-mail: contact@vampir.eu                        E-mail: pti@indiana.edu
     Web: http://www.vampir.eu/                       Web: http://pti.iu.edu/
SERIAL ANALYSIS
PAPI Aided Analysis Of PP02 Code Versions




                                       L1 hit ratio



     Technische Universität Dresden                   IU Bloomington
     Zentrum für Informationsdienste                  Pervasive Technology Institute
     und Hochleistungsrechnen (ZIH)
                                                      2719 East 10th Street
     01062 Dresden, Germany                           Bloomington, Indiana 47408

     E-mail: contact@vampir.eu                        E-mail: pti@indiana.edu
     Web: http://www.vampir.eu/                       Web: http://pti.iu.edu/
SERIAL ANALYSIS
PAPI Aided Analysis Of PP02 Code Versions




                                       PAPI_TO_CYC



     Technische Universität Dresden                  IU Bloomington
     Zentrum für Informationsdienste                 Pervasive Technology Institute
     und Hochleistungsrechnen (ZIH)
                                                     2719 East 10th Street
     01062 Dresden, Germany                          Bloomington, Indiana 47408

     E-mail: contact@vampir.eu                       E-mail: pti@indiana.edu
     Web: http://www.vampir.eu/                      Web: http://pti.iu.edu/
SOURCE CODE ANALYSIS
                       Codeblocks Ax & Bx For Ion-Mix (PP02)



A0:
!   !   Branching

A1:
!   !   Arithmetic

A2:
!   !   Array syntax




                                                             Block A




                         Technische Universität Dresden    IU Bloomington
                         Zentrum für Informationsdienste   Pervasive Technology Institute
                         und Hochleistungsrechnen (ZIH)
                                                           2719 East 10th Street
                         01062 Dresden, Germany            Bloomington, Indiana 47408

                         E-mail: contact@vampir.eu         E-mail: pti@indiana.edu
                         Web: http://www.vampir.eu/        Web: http://pti.iu.edu/
SOURCE CODE ANALYSIS
   Codeblocks Ax & Bx For Ion-Mix (PP02)



                                            B0:
                                            !   !              Loop

                                            B1:
                                            !   !              No Cut-off Sphere

                                            B2:
                                            !   !              Cut-off Sphere




Block B




          Technische Universität Dresden       IU Bloomington
          Zentrum für Informationsdienste      Pervasive Technology Institute
          und Hochleistungsrechnen (ZIH)
                                               2719 East 10th Street
          01062 Dresden, Germany               Bloomington, Indiana 47408

          E-mail: contact@vampir.eu            E-mail: pti@indiana.edu
          Web: http://www.vampir.eu/           Web: http://pti.iu.edu/
SOURCE CODE ANALYSIS
   Codeblocks Ax & Bx For Ion-Mix (PP02)



                                                    B0:
                                                    !   !              Loop

                                                    B1:
                                                    !   !              No Cut-off Sphere

                                                    B2:
                                                    !   !              Cut-off Sphere




                                            Result: "Calculation does not beat branching"

Block B




          Technische Universität Dresden               IU Bloomington
          Zentrum für Informationsdienste              Pervasive Technology Institute
          und Hochleistungsrechnen (ZIH)
                                                       2719 East 10th Street
          01062 Dresden, Germany                       Bloomington, Indiana 47408

          E-mail: contact@vampir.eu                    E-mail: pti@indiana.edu
          Web: http://www.vampir.eu/                   Web: http://pti.iu.edu/
PARALLEL ANALYSIS
                                                                 First Scalability Evaluation



                                                                                                   The MD code is used in production with particle counts between 27k and
                                                                                                   55k. Production runs currently use the PP02 implementation of particle-
                                                                                                   particle interaction semantics. The code blocks selected are A0 and B2, and
                                                                                                   the vector length (NBS) is set to 32. Measurements show that this
                                                                                                   configuration is very efficient and yields good performance.

                                                                                                   So far, runtimes of larger runs have only been analyzed using the hybrid
                                                                                                   version. The chart shows speedups on a large Cray XT5 (Kraken) and the
                                                                                                   resulting efficiency for 27k and 55k particle runs. Up to 1152 cores, the
                                                                                                   efficiency is higher than 80%. At 4608 cores the efficiency drops below 50%
                                                                                                   for 27k particles. Using more ions shows the effect of weak scaling that can
                                                                                                   be used to optimize the efficiency so that even on 4608 cores an efficiency
                                                                                                   of 58% can be achieved.

                                                                                                   Although these results are very promising, Kraken has 112,896 compute
                                                                                                   cores that could be used for computation. The mid term goal is therefore
                                                                                                   to optimize the MD code to achieve 50% parallel efficiency for 16000
                                                                                                   cores. To achieve this we will take a more detailed look at the 4 different
                                                                                                   versions of the code (serial, OpenMP, MPI and hybrid).

The code is very efficient for multi-core processors such as the 6-core processors on Kraken, the
 Cray XT5 at The National Institute for Computational Sciences. Each processor is assigned one
       MPI process. The work of each process is then shared among 6 OpenMP threads.


                                                    Technische Universität Dresden                                      IU Bloomington
                                                    Zentrum für Informationsdienste                                     Pervasive Technology Institute
                                                    und Hochleistungsrechnen (ZIH)
                                                                                                                        2719 East 10th Street
                                                    01062 Dresden, Germany                                              Bloomington, Indiana 47408

                                                    E-mail: contact@vampir.eu                                           E-mail: pti@indiana.edu
                                                    Web: http://www.vampir.eu/                                          Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR


                    ZIH has more then a decade of expertise in analyzing
                    sequential and parallel codes. As part of the
                    collaboration with IU, ZIH applied its analysis software
                    Vampir to the different versions of MD.


                    Vampir consists of a tracing framework called
                    VampirTrace that is able to monitor all levels of
                    parallelism of the MD code. Additionally PAPI
                    counters are recorded in order to analyze the cache
                    behavior of the main calculation loop.


                    In a first step traces of the four versions were
                    generated using only 5k particles to get a broad
                    overview. The next step will be to refine the tracing
                    mechanism using manually instrumented code to avoid
                    the rather huge overhead introduced by OpenMP
                    tracing. This will enable monitoring of at least 4096
                    cores to get an understanding why the efficiency
                    drops below 50% at this core count.




    Technische Universität Dresden                                             IU Bloomington
    Zentrum für Informationsdienste                                            Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                                               2719 East 10th Street
    01062 Dresden, Germany                                                     Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                                  E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                                 Web: http://pti.iu.edu/
OPENMP CODE OPTIMIZATION




           Comparison of OpenMP Versions using 1 Node and 1-8 Cores
  Technische Universität Dresden                                      IU Bloomington
  Zentrum für Informationsdienste                                     Pervasive Technology Institute
  und Hochleistungsrechnen (ZIH)
                                                                      2719 East 10th Street
  01062 Dresden, Germany                                              Bloomington, Indiana 47408

  E-mail: contact@vampir.eu                                           E-mail: pti@indiana.edu
  Web: http://www.vampir.eu/                                          Web: http://pti.iu.edu/
OPENMP CODE OPTIMIZATION




           Comparison of OpenMP Versions using 1 Node and 1-8 Cores
  Technische Universität Dresden                                      IU Bloomington
  Zentrum für Informationsdienste                                     Pervasive Technology Institute
  und Hochleistungsrechnen (ZIH)
                                                                      2719 East 10th Street
  01062 Dresden, Germany                                              Bloomington, Indiana 47408

  E-mail: contact@vampir.eu                                           E-mail: pti@indiana.edu
  Web: http://www.vampir.eu/                                          Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




                       55k particles, MPI only, 8 nodes, 1 process per node
    Technische Universität Dresden                                            IU Bloomington
    Zentrum für Informationsdienste                                           Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                                              2719 East 10th Street
    01062 Dresden, Germany                                                    Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                                 E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                                Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




                                      20+ loops of the A0_B2 block
    Technische Universität Dresden                                   IU Bloomington
    Zentrum für Informationsdienste                                  Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                                     2719 East 10th Street
    01062 Dresden, Germany                                           Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                        E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                       Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




                     55k particles, MPI only, 1 nodes, 8 processes per node
    Technische Universität Dresden                                            IU Bloomington
    Zentrum für Informationsdienste                                           Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                                              2719 East 10th Street
    01062 Dresden, Germany                                                    Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                                 E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                                Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




                                      20+ loops of the A0_B2 block
    Technische Universität Dresden                                   IU Bloomington
    Zentrum für Informationsdienste                                  Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                                     2719 East 10th Street
    01062 Dresden, Germany                                           Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                        E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                       Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




   55k particles, MPI only, 84 nodes, 8 process per node (672 core == full machine run)
        Technische Universität Dresden                                             IU Bloomington
        Zentrum für Informationsdienste                                            Pervasive Technology Institute
        und Hochleistungsrechnen (ZIH)
                                                                                   2719 East 10th Street
        01062 Dresden, Germany                                                     Bloomington, Indiana 47408

        E-mail: contact@vampir.eu                                                  E-mail: pti@indiana.edu
        Web: http://www.vampir.eu/                                                 Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




                              switching of groups and flushing of VT buffers
    Technische Universität Dresden                                            IU Bloomington
    Zentrum für Informationsdienste                                           Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                                              2719 East 10th Street
    01062 Dresden, Germany                                                    Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                                 E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                                Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




                          55k particles, OpenMP only, 1 Node, 8 processes
    Technische Universität Dresden                                          IU Bloomington
    Zentrum für Informationsdienste                                         Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                                            2719 East 10th Street
    01062 Dresden, Germany                                                  Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                               E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                              Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




55k particles, Hybrid, 84 MPI processes (1 per Node), 8 Threads per process (672 core == full machine run)
                 Technische Universität Dresden                                           IU Bloomington
                 Zentrum für Informationsdienste                                          Pervasive Technology Institute
                 und Hochleistungsrechnen (ZIH)
                                                                                          2719 East 10th Street
                 01062 Dresden, Germany                                                   Bloomington, Indiana 47408

                 E-mail: contact@vampir.eu                                                E-mail: pti@indiana.edu
                 Web: http://www.vampir.eu/                                               Web: http://pti.iu.edu/
PARALLEL ANALYSIS WITH VAMPIR




                                      Second group, 100 runs
    Technische Universität Dresden                             IU Bloomington
    Zentrum für Informationsdienste                            Pervasive Technology Institute
    und Hochleistungsrechnen (ZIH)
                                                               2719 East 10th Street
    01062 Dresden, Germany                                     Bloomington, Indiana 47408

    E-mail: contact@vampir.eu                                  E-mail: pti@indiana.edu
    Web: http://www.vampir.eu/                                 Web: http://pti.iu.edu/
SCALABILITY STUDIES ON XRAY




1 Node   8 Nodes

                                                     Parallel Efficiency (MPI-only runs)
                   Technische Universität Dresden                                         IU Bloomington
                   Zentrum für Informationsdienste                                        Pervasive Technology Institute
                   und Hochleistungsrechnen (ZIH)
                                                                                          2719 East 10th Street
                   01062 Dresden, Germany                                                 Bloomington, Indiana 47408

                   E-mail: contact@vampir.eu                                              E-mail: pti@indiana.edu
                   Web: http://www.vampir.eu/                                             Web: http://pti.iu.edu/
SCALABILITY STUDIES ON XRAY




                                     Parallel Efficiency (Hybrid runs)
   Technische Universität Dresden                                       IU Bloomington
   Zentrum für Informationsdienste                                      Pervasive Technology Institute
   und Hochleistungsrechnen (ZIH)
                                                                        2719 East 10th Street
   01062 Dresden, Germany                                               Bloomington, Indiana 47408

   E-mail: contact@vampir.eu                                            E-mail: pti@indiana.edu
   Web: http://www.vampir.eu/                                           Web: http://pti.iu.edu/
 CONCLUSION AND FUTURE WORK

To find the optimal serial version of MD, different combinations of the code
blocks and compiler flags were tested. The results from the serial analysis built
the starting point for the parallel analysis.

Vampir was then successfully applied to the code making it possible to visualize                       Planning
MPI and OpenMP parallelism.

An important step for future parallel analysis is to embed PAPI counter directly




                                                                                                                                        Im h VampirTrace
in the MD code to avoid the overhead caused by VampirTrace. This enables the




                                                                                                                                          wit
                                                                                                                                                  plem
comparison of the actually achieved performance with the theoretical peak




                                                                                           valuation
performance using larger core counts.




                                                                                                r
                                                                                              Vampi




                                                                                                                                                      entation
Additional tests with larger particle counts are planned to exploit the weak




                                                                                                with
scaling capabilities of the code. The hybrid implementation is to be run in
different process/thread configurations to research wether fitting MPI/OpenMP




                                                                                          E
usage to the hardware characteristics can further increase performance.

Analysis tests will be re-run on Kraken with up to 16386 cores. The goal is to
detect scalability issues and to significantly improve parallel efficiency.                                  Te s ti n g
The MD code is currently being ported to GPGPUs. Vampir already supports
the analysis of CUDA parallelized code. This analysis support will enable efficient
ported code allowing for the utilization of the maximum possible potential
provided by GPGPUs.



                                                        Technische Universität Dresden                 IU Bloomington
                                                        Zentrum für Informationsdienste                Pervasive Technology Institute
                                                        und Hochleistungsrechnen (ZIH)
                                                                                                       2719 East 10th Street
                                                        01062 Dresden, Germany                         Bloomington, Indiana 47408

                                                        E-mail: contact@vampir.eu                      E-mail: pti@indiana.edu
                                                        Web: http://www.vampir.eu/                     Web: http://pti.iu.edu/
PERFORMANCE-STUDIES-OF-MOLECULAR-DYNAMICS-CODE

Mais conteúdo relacionado

Destaque (7)

Berkheimer portfolio
Berkheimer portfolioBerkheimer portfolio
Berkheimer portfolio
 
ICTD Aviation
ICTD  AviationICTD  Aviation
ICTD Aviation
 
Mm platform overview enterprise 2012 (2)
Mm platform overview enterprise 2012 (2)Mm platform overview enterprise 2012 (2)
Mm platform overview enterprise 2012 (2)
 
Antioch college's history of innovation
Antioch college's history of innovationAntioch college's history of innovation
Antioch college's history of innovation
 
Uh Concepts 5 4 2012 Compressed
Uh Concepts 5 4 2012 CompressedUh Concepts 5 4 2012 Compressed
Uh Concepts 5 4 2012 Compressed
 
The Pennsylvania State University (Penn State)
The Pennsylvania State University (Penn State)The Pennsylvania State University (Penn State)
The Pennsylvania State University (Penn State)
 
Adv app
Adv appAdv app
Adv app
 

Semelhante a PERFORMANCE-STUDIES-OF-MOLECULAR-DYNAMICS-CODE

Hybrid Publishing Consortium
Hybrid Publishing ConsortiumHybrid Publishing Consortium
Hybrid Publishing ConsortiumSimon Worthington
 
Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology
Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology
Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology Instituto Canario de Estadística (ISTAC)
 
Automotive Cockpit HMI
Automotive Cockpit HMIAutomotive Cockpit HMI
Automotive Cockpit HMITorben Haagh
 
2011 Game Changer Presentation Agenda
2011 Game Changer Presentation Agenda2011 Game Changer Presentation Agenda
2011 Game Changer Presentation AgendaDr. Jimmy Schwarzkopf
 
Parametric studies for the AEC domain using InteliGrid platform
Parametric studies for the AEC domain using InteliGrid platformParametric studies for the AEC domain using InteliGrid platform
Parametric studies for the AEC domain using InteliGrid platformMatevz Dolenc
 
Java on the real-time hardware implementation feasibility of joint radio res...
Java  on the real-time hardware implementation feasibility of joint radio res...Java  on the real-time hardware implementation feasibility of joint radio res...
Java on the real-time hardware implementation feasibility of joint radio res...ecwayerode
 
What deep learning can bring to...
What deep learning can bring to...What deep learning can bring to...
What deep learning can bring to...Gautier Marti
 
Introduction to the solar med atlas
Introduction to the solar med atlasIntroduction to the solar med atlas
Introduction to the solar med atlasRCREEE
 
Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...IJECEIAES
 
Patchwork3 D Modules January 2009
Patchwork3 D Modules January 2009Patchwork3 D Modules January 2009
Patchwork3 D Modules January 2009CAM3DRT
 
Indoor multi operator solutions - Network sharing and Outsourcing
Indoor multi operator solutions - Network sharing and OutsourcingIndoor multi operator solutions - Network sharing and Outsourcing
Indoor multi operator solutions - Network sharing and OutsourcingAmirhossein Ghanbari
 

Semelhante a PERFORMANCE-STUDIES-OF-MOLECULAR-DYNAMICS-CODE (19)

Hybrid Publishing Consortium
Hybrid Publishing ConsortiumHybrid Publishing Consortium
Hybrid Publishing Consortium
 
Srg Industry Analysis Primer Web Version
Srg Industry Analysis Primer Web VersionSrg Industry Analysis Primer Web Version
Srg Industry Analysis Primer Web Version
 
Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology
Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology
Adaptation to the survey ICT-H in Canary Islands of the Dual Frame Methodology
 
Ivan Combined
Ivan CombinedIvan Combined
Ivan Combined
 
Automotive Cockpit HMI
Automotive Cockpit HMIAutomotive Cockpit HMI
Automotive Cockpit HMI
 
2011 Game Changer Presentation Agenda
2011 Game Changer Presentation Agenda2011 Game Changer Presentation Agenda
2011 Game Changer Presentation Agenda
 
Parametric studies for the AEC domain using InteliGrid platform
Parametric studies for the AEC domain using InteliGrid platformParametric studies for the AEC domain using InteliGrid platform
Parametric studies for the AEC domain using InteliGrid platform
 
Cmap3
Cmap3Cmap3
Cmap3
 
Java on the real-time hardware implementation feasibility of joint radio res...
Java  on the real-time hardware implementation feasibility of joint radio res...Java  on the real-time hardware implementation feasibility of joint radio res...
Java on the real-time hardware implementation feasibility of joint radio res...
 
Data-Intensive Research
Data-Intensive ResearchData-Intensive Research
Data-Intensive Research
 
What deep learning can bring to...
What deep learning can bring to...What deep learning can bring to...
What deep learning can bring to...
 
Introduction to the solar med atlas
Introduction to the solar med atlasIntroduction to the solar med atlas
Introduction to the solar med atlas
 
Patel Work Skills
Patel Work SkillsPatel Work Skills
Patel Work Skills
 
DAC 2012
DAC 2012DAC 2012
DAC 2012
 
Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...
 
TInA tool box
TInA tool boxTInA tool box
TInA tool box
 
Patchwork3 D Modules January 2009
Patchwork3 D Modules January 2009Patchwork3 D Modules January 2009
Patchwork3 D Modules January 2009
 
241 250
241 250241 250
241 250
 
Indoor multi operator solutions - Network sharing and Outsourcing
Indoor multi operator solutions - Network sharing and OutsourcingIndoor multi operator solutions - Network sharing and Outsourcing
Indoor multi operator solutions - Network sharing and Outsourcing
 

Último

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

PERFORMANCE-STUDIES-OF-MOLECULAR-DYNAMICS-CODE

  • 1. PERFORMANCE-STUDIES OF A MOLECULAR DYNAMICS CODE Evaluating Serial, Thread and Process-Parallel Performance Molecular Dynamics Collaboration between A molecular dynamics code simulating the PTI and ZIH diffusion in dense nuclear matter in white dwarf stars is analyzed in this collaboration. The code Planning In 2008 a collaboration between the Technische is highly configurable allowing MPI, OpenMP, or Universität Dresden, Center for Information hybrid runs and additional fine tuning with a Services and High Performance Computing range of parameters. (ZIH), in Germany and the Indiana University, Im h VampirTrace Pervasive Technology Institute (PTI), was wit plem founded to mutually benefit from common valuation r Vampi research areas. entation Serial Analysis with These research topics include: The first step in the code analysis is to identify ‣ Data-centric computing E the best performing parameter set. This ‣ Computing for biological and life sciences configuration represents the most promising ‣ Wide area distributed file systems candidate for further parallel analysis. ‣ Parallel computing performance Testing This collaboration involves ZIH maintaining a Parallel Analysis stack of performance evaluation tools inside the NSF FutureGrid project. For the analysis an Aim of the parallel analysis is to measure the HPC system called Xray, a Cray XT5m which is scalability limits of the different parallel code Authors part of the FutureGrid hardware, was used. implementations and to detect bottlenecks Xray consists of Quad-core 64-bit AMD possibly preventing further parallel efficiency. T. William, M. Weber, D. Röhrig - ZIH, TU Dresden Opteron series 2000 processors. PGI compilers This work has been done with the parallel D. K. Berry, R. Henschel - UITS, IU Bloomington version 9.0.4 and the optimized xtpe-barcelona analysis framework Vampir. J. Hughto, A. S. Schneider, C. J. Horowitz - Physics, IU Bloomington module provided by Cray have been used. This document was developed with support from the National Science Foundation (NSF) under Grant No. 0910812 to Indiana University for "FutureGrid: An Experimental, High-Performance Grid Test-bed." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF. Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 2. MOLECULAR DYNAMICS CODE Overview The analyzed molecular dynamics code (MD) The movie shows a 5125 (5k) particle has been developed at Indiana University for simulation of oxygen ions which are forming simulating dense nuclear matter in white dwarf microcrystals: and neutron stars. ‣ 1280 Carbon ions Matter in these compact stellar objects consists of either completely ionized atoms, or else free ‣ 3740 Oxygen ions neutrons and protons. ‣ 100 Neon ions The MD code models these systems as classical particles interacting via a screened Coulomb By studying the motion of ions over a long interaction. The electrons are not modeled simulation time, one can calculate the self- explicitly, but are treated as a uniform diffusion coefficient of ions, an important background charge that serves to screen the quantity for understanding the behavior of Coulomb interaction. white dwarfs. MD-Code Ion Mix Movements Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 3. MOLECULAR DYNAMICS CODE Implementation Details During compilation the input parameter PP0x MD is a hybrid MPI/OpenMP parallel code decides what file is used for the simulation. written in Fortran. Its main features are: Inside the PP0x files the parameters Ax and Bx denote different code-blocks that implement ‣ Simple particle-particle algorithm equal semantics in different ways. Additionally, ‣ Long range interaction with large cut-off the file PP03 provides the define NBS to sphere influence the vector access length. Those ‣ Cutoff radius is half the box edge length parameters have been used throughout the ‣ Each ion interacts with about half the years to re-implement code utilizing newer others constructs of Fortran95 that had not been ‣ Interactions are calculated in a pair of available in Fortran77. nested do loops ‣ Outer loop goes over "target" particles PP01: ‣ Inner loop goes over "source" particles ‣ Original implementation ‣ Targets are assigned in a round-robin ‣ No division into Ax, Bx, or NBS fashion to MPI processes ‣ Baseline for other measurements ‣ Within each MPI process, the work is shared among OpenMP threads PP02: ‣ A0, A1, A2 The MD code is highly modular, consisting of ‣ B0, B1, B2 several different implementations of the same ‣ No NBS simulation logic . The par ticle-par ticle Code block in the PP03 file showing different implementations interaction semantics have been implemented of the same loop selectable by Ax preprocessor macros PP03: multiple times in different ways. Each ‣ A0, A1, A2 implementation is stored in an individual source ‣ B0, B1, B2, B3, B4, B5, B6, B7 file, called from PP01 to PP03. ‣ NBS Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 4. SCOPE OF THE CODE ANALYSIS Hardware Details & Parameter Sweep All measurements done on FutureGrid Time estimate per measurements and Serial analysis: machine Xray: particles: ‣ 55k particles ‣ Cray XT5m 5k ! ~ 300 seconds! ~5 minutes ‣ Comparing optimization flags: ‣ 672 cores 27k ! ~ 4000 seconds! ~1 h ! O2 ‣ 84 nodes 55k ! ~ 35000 seconds! ~10 h ! O3 ‣ 2 CPU per node ! fastsse ‣ Quad-Core AMD Opteron(tm) Processor 23 (C2) ‣ fastsse == -fast ‣ 2.4 GHz ‣ fast - Common optimizations; ‣ ... ! -O2 -Munroll=c:1 -Mnoframe -Mlre ! -Mautoinline -Mvect=sse -Mscalarsse ! -Mcache_align -Mflushz Particle-type Loop combination Paralellism nucleon-nucleon Block A: ! ! A0-A2 serial: ! md ion pure Block B: ! ! B1-B7 OpenMP:! md_omp ion mix Code-blocking: ! NBS MPI: ! md_mpi Hybrid: ! md_mpi_omp Input parameter Compiler Flags Code version simulation type O2 Original: ! ! PP01 number of particles O3 Production: ! PP02 number of time steps SSE Research:! ! PP03 .... >8100 possible combinations Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 5. SERIAL ANALYSIS Identifying The Best Configuration For Parallel Runs Runtime for all code cominations Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 6. SERIAL ANALYSIS Identifying The Best Configuration For Parallel Runs Runtime in seconds Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 7. SERIAL ANALYSIS PAPI Aided Analysis Of PP02 Code Versions PAPI_FP_INS Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 8. SERIAL ANALYSIS PAPI Aided Analysis Of PP02 Code Versions FPU Idle Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 9. SERIAL ANALYSIS PAPI Aided Analysis Of PP02 Code Versions Branches mispredicted Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 10. SERIAL ANALYSIS PAPI Aided Analysis Of PP02 Code Versions PAPI_VEC_INS Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 11. SERIAL ANALYSIS PAPI Aided Analysis Of PP02 Code Versions L1 hit ratio Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 12. SERIAL ANALYSIS PAPI Aided Analysis Of PP02 Code Versions PAPI_TO_CYC Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 13. SOURCE CODE ANALYSIS Codeblocks Ax & Bx For Ion-Mix (PP02) A0: ! ! Branching A1: ! ! Arithmetic A2: ! ! Array syntax Block A Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 14. SOURCE CODE ANALYSIS Codeblocks Ax & Bx For Ion-Mix (PP02) B0: ! ! Loop B1: ! ! No Cut-off Sphere B2: ! ! Cut-off Sphere Block B Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 15. SOURCE CODE ANALYSIS Codeblocks Ax & Bx For Ion-Mix (PP02) B0: ! ! Loop B1: ! ! No Cut-off Sphere B2: ! ! Cut-off Sphere Result: "Calculation does not beat branching" Block B Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 16. PARALLEL ANALYSIS First Scalability Evaluation The MD code is used in production with particle counts between 27k and 55k. Production runs currently use the PP02 implementation of particle- particle interaction semantics. The code blocks selected are A0 and B2, and the vector length (NBS) is set to 32. Measurements show that this configuration is very efficient and yields good performance. So far, runtimes of larger runs have only been analyzed using the hybrid version. The chart shows speedups on a large Cray XT5 (Kraken) and the resulting efficiency for 27k and 55k particle runs. Up to 1152 cores, the efficiency is higher than 80%. At 4608 cores the efficiency drops below 50% for 27k particles. Using more ions shows the effect of weak scaling that can be used to optimize the efficiency so that even on 4608 cores an efficiency of 58% can be achieved. Although these results are very promising, Kraken has 112,896 compute cores that could be used for computation. The mid term goal is therefore to optimize the MD code to achieve 50% parallel efficiency for 16000 cores. To achieve this we will take a more detailed look at the 4 different versions of the code (serial, OpenMP, MPI and hybrid). The code is very efficient for multi-core processors such as the 6-core processors on Kraken, the Cray XT5 at The National Institute for Computational Sciences. Each processor is assigned one MPI process. The work of each process is then shared among 6 OpenMP threads. Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 17. PARALLEL ANALYSIS WITH VAMPIR ZIH has more then a decade of expertise in analyzing sequential and parallel codes. As part of the collaboration with IU, ZIH applied its analysis software Vampir to the different versions of MD. Vampir consists of a tracing framework called VampirTrace that is able to monitor all levels of parallelism of the MD code. Additionally PAPI counters are recorded in order to analyze the cache behavior of the main calculation loop. In a first step traces of the four versions were generated using only 5k particles to get a broad overview. The next step will be to refine the tracing mechanism using manually instrumented code to avoid the rather huge overhead introduced by OpenMP tracing. This will enable monitoring of at least 4096 cores to get an understanding why the efficiency drops below 50% at this core count. Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 18. OPENMP CODE OPTIMIZATION Comparison of OpenMP Versions using 1 Node and 1-8 Cores Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 19. OPENMP CODE OPTIMIZATION Comparison of OpenMP Versions using 1 Node and 1-8 Cores Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 20. PARALLEL ANALYSIS WITH VAMPIR 55k particles, MPI only, 8 nodes, 1 process per node Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 21. PARALLEL ANALYSIS WITH VAMPIR 20+ loops of the A0_B2 block Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 22. PARALLEL ANALYSIS WITH VAMPIR 55k particles, MPI only, 1 nodes, 8 processes per node Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 23. PARALLEL ANALYSIS WITH VAMPIR 20+ loops of the A0_B2 block Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 24. PARALLEL ANALYSIS WITH VAMPIR 55k particles, MPI only, 84 nodes, 8 process per node (672 core == full machine run) Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 25. PARALLEL ANALYSIS WITH VAMPIR switching of groups and flushing of VT buffers Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 26. PARALLEL ANALYSIS WITH VAMPIR 55k particles, OpenMP only, 1 Node, 8 processes Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 27. PARALLEL ANALYSIS WITH VAMPIR 55k particles, Hybrid, 84 MPI processes (1 per Node), 8 Threads per process (672 core == full machine run) Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 28. PARALLEL ANALYSIS WITH VAMPIR Second group, 100 runs Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 29. SCALABILITY STUDIES ON XRAY 1 Node 8 Nodes Parallel Efficiency (MPI-only runs) Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 30. SCALABILITY STUDIES ON XRAY Parallel Efficiency (Hybrid runs) Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/
  • 31.  CONCLUSION AND FUTURE WORK To find the optimal serial version of MD, different combinations of the code blocks and compiler flags were tested. The results from the serial analysis built the starting point for the parallel analysis. Vampir was then successfully applied to the code making it possible to visualize Planning MPI and OpenMP parallelism. An important step for future parallel analysis is to embed PAPI counter directly Im h VampirTrace in the MD code to avoid the overhead caused by VampirTrace. This enables the wit plem comparison of the actually achieved performance with the theoretical peak valuation performance using larger core counts. r Vampi entation Additional tests with larger particle counts are planned to exploit the weak with scaling capabilities of the code. The hybrid implementation is to be run in different process/thread configurations to research wether fitting MPI/OpenMP E usage to the hardware characteristics can further increase performance. Analysis tests will be re-run on Kraken with up to 16386 cores. The goal is to detect scalability issues and to significantly improve parallel efficiency. Te s ti n g The MD code is currently being ported to GPGPUs. Vampir already supports the analysis of CUDA parallelized code. This analysis support will enable efficient ported code allowing for the utilization of the maximum possible potential provided by GPGPUs. Technische Universität Dresden IU Bloomington Zentrum für Informationsdienste Pervasive Technology Institute und Hochleistungsrechnen (ZIH) 2719 East 10th Street 01062 Dresden, Germany Bloomington, Indiana 47408 E-mail: contact@vampir.eu E-mail: pti@indiana.edu Web: http://www.vampir.eu/ Web: http://pti.iu.edu/