Porting applications to Intel Xeon Phi: tips and experiences

Porting applications to Intel Xeon Phi: some experiences

RIKEN Advanced Center for Computing and Communication
2012/11 Super Computing 2012 @ Intel Booth, Salt Lake City, US

maho@riken.jp

My other faces
maho@FreeBSD.org (FreeBSD committer)
maho@apache.org (Apache OpenOffice committer)

Thursday, November 15, 2012

Aims of my talk

•Proof of concept:
- Intel says, “One source base, tuned to many targets”
- Is it true or not?
- My answer is: TRUE.
•Only the native model is considered
- Just compile with Intel Composer XE 2013 :-)
- The offload model is extremely demanding for modern, complicated programs
- CUDA experts say: to get performance, do everything on the GPU and do not
transfer data between CPU and GPU.
- Modern applications use a lot of external open source / free software
packages. Very complex structure!
- Not realistic!
•Providing porting tips
- Gaussian09, povray, sdpa...


What is Intel Xeon Phi?
• Intel Xeon Phi is a coprocessor connected via a PCI Express slot.
• Peak performance is 1 TFlops in double precision
- many cores: 64 cores, 4 threads each, 512-bit SIMD, 8 GB of GDDR5 RAM...
• It looks like another cluster of computers inside a Linux box.
- A Linux micro OS is provided
• Better programmability
- x86 based (64-bit)
- Development tool: Intel Composer XE 2013
- C, C++, Fortran
- compile and run the same code as on the CPU
- familiar parallelism: OpenMP, MPI, OpenCL
- Various programming models
- MIC centric
- CPU centric
- CAUTION: BINARIES ARE INCOMPATIBLE!
- Recompilation is needed for Xeon Phi!



How to build your program on Xeon Phi
•Very easy.
•Just pass the -mmic flag to the compilers
-icc -mmic
-icpc -mmic
-ifort -mmic
•How to link against optimized BLAS and LAPACK?
-just add -mkl
-same as in the CPU case.
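The flags above can be put side by side in a small sketch. The helper name compile_cmd and the file hello.c are hypothetical; the function only prints the command line it would use, so it can be tried on a machine without Intel Composer installed.

```shell
#!/bin/sh
# Print (do not run) the compile command for a target: "host" or "mic".
# compile_cmd and hello.c are illustrative names, not part of any toolchain.
compile_cmd() {
    target=$1; shift
    case $target in
        host) echo "icc -mkl $*" ;;         # ordinary host build
        mic)  echo "icc -mmic -mkl $*" ;;   # native Xeon Phi build: only -mmic is added
        *)    echo "unknown target: $target" >&2; return 1 ;;
    esac
}

compile_cmd host -o hello.host hello.c
compile_cmd mic  -o hello.mic  hello.c
```

The two printed lines differ only by -mmic, which is the whole point of the native model: same source, same MKL flag, different target.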



DGEMM benchmark: sorry, no free lunch; tuning is needed.
• DGEMM is a matrix-matrix multiplication routine. It uses almost 100% of CPU
performance (if tuned), so it is used for benchmarking.
- it is not limited by memory bandwidth
• Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
• Do we need some tuning for Intel Xeon Phi?
- YES. Otherwise only 40% of peak is attained: ~400 GFlops
- If tuned, we attain ~816 GFlops.
- tuning points: memory allocation, thread affinity
• How is the input data prepared?
- just malloc and fill with random values
- no alignment is specified
- on the CPU this is sufficient, but
- it is not sufficient for Xeon Phi.
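A minimal sketch of the thread-affinity half of the tuning, plus the percent-of-peak arithmetic behind the numbers above. The thread count simply multiplies the core and thread figures from the hardware slide; the KMP_AFFINITY value is an assumption to experiment with, not the setting used for the 816 GFlops result. Memory alignment has to be handled inside the benchmark source and is not shown here.

```shell
#!/bin/sh
# Thread count from the hardware slide: 64 cores x 4 threads each.
CORES=64
THREADS_PER_CORE=4
OMP_NUM_THREADS=$((CORES * THREADS_PER_CORE))
export OMP_NUM_THREADS

# Assumed pinning policy for the Intel OpenMP runtime; try other types on real hardware.
KMP_AFFINITY="granularity=fine,compact"
export KMP_AFFINITY

# Percentage of the 1 TFlops (1000 GFlops) peak reached by a measured rate in GFlops.
percent_of_peak() {
    awk -v g="$1" -v p=1000 'BEGIN { printf "%.1f\n", 100 * g / p }'
}

percent_of_peak 400   # untuned DGEMM
percent_of_peak 816   # tuned DGEMM
```

With the slide's figures, the untuned run corresponds to 40.0% of peak and the tuned run to 81.6%.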



SDPA: How to cheat “configure”, part I
• SDPA is a highly efficient semidefinite programming solver.
- distributed at http://sdpa.sourceforge.net/ under the GPL.
• On the CPU: ./configure ; make
• But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how to do this?
- the environment is almost the same as the host...
- Two-pass strategy: in the first pass, give a dummy “-DMMIC” flag to configure; then
replace it with “-mmic” in the generated Makefiles and compile.
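#!/bin/sh
# Pass 1: configure with the Intel compilers and a harmless placeholder flag,
# so that the test programs built by configure still run on the host.
CC="icc"; export CC
CXX="icpc"; export CXX
FC="ifort"; export FC

CFLAGS="-DMMIC" ; export CFLAGS
CXXFLAGS="-DMMIC" ; export CXXFLAGS
FFLAGS="-DMMIC" ; export FFLAGS

./configure --with-blas="-mkl" --with-lapack="-mkl"

# Pass 2: turn the placeholder into the real cross-compile flag in every
# generated Makefile; a subsequent "make" then builds Xeon Phi binaries.
find . -name Makefile -exec perl -p -i -e 's/-DMMIC/-mmic/g' {} +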
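To see what the second pass does without a real SDPA tree, here is a small sketch that runs the same substitution on a throwaway Makefile. It uses sed with a temporary file instead of perl -i, purely for portability; the function name swap_placeholder and the Makefile content are made up for the demonstration.

```shell
#!/bin/sh
# Rewrite the placeholder flag in every Makefile under a directory.
# sed plus a temporary file stands in for "perl -p -i -e" from the script above.
swap_placeholder() {
    find "$1" -name Makefile | while IFS= read -r mk; do
        sed 's/-DMMIC/-mmic/g' "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
    done
}

# Demonstration on a throwaway tree (hypothetical Makefile content).
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'CC = icc\nCFLAGS = -DMMIC -O2\n' > "$demo/src/Makefile"

swap_placeholder "$demo"
cat "$demo/src/Makefile"
```

After the call, the CFLAGS line carries -mmic in place of -DMMIC, so a plain make cross-compiles for Xeon Phi.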


Povray: how to cheat configure part II
• The Persistence of Vision Raytracer is a high-quality, totally free tool for
creating stunning three-dimensional graphics; a famous ray tracing program.
• This part covers how to build Povray 3.7 RC
- This is the first pthread-parallelized version of Povray.
• It requires some external libraries beyond those provided for Intel Xeon Phi.



Povray: how to cheat configure : part II
• Prerequisites
- boost, zlib, jpeg, tiff and libpng.
- all libraries must be built for Phi :-( :-( :-(
• How to build boost and zlib: we took the same strategy as for povray.
- First build and install the host version of boost to /home/maho/HOST, then the Phi
version to /home/maho/MIC
- Next, build and install the host version of zlib to /home/maho/HOST
- then build the Phi version as follows:
- back up /home/maho/MIC to /home/maho/MIC.org
- copy /home/maho/HOST to /home/maho/MIC
- run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
- be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
- remove /home/maho/MIC
- rename /home/maho/MIC.org to /home/maho/MIC
- replace -DMMIC with -mmic
- run make to build the Xeon Phi binary.
- Done.
• Building tiff and png for Phi is similar to the above procedure.

Povray: how to cheat configure : part II
• Prerequisites
- boost, zlib, jpeg, tiff and libpng.
- all libraries must be built for Phi :-( :-( :-(
• Strategy: build twice: a host build, then a Xeon Phi build
- build and install the host version of the libraries to /home/maho/HOST
- build and install the Phi version of the libraries to /home/maho/MIC
- actually,
• The final configure for Povray should be done as follows:
- back up /home/maho/MIC to /home/maho/MIC.org
- copy /home/maho/HOST to /home/maho/MIC
- run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
- be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
- remove /home/maho/MIC
- rename /home/maho/MIC.org to /home/maho/MIC
- replace -DMMIC with -mmic
- run make to build the Xeon Phi binary.
- Done.
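The directory swap in the steps above can be sketched as two small shell functions. This is only an illustration on a throwaway tree with marker files: the function names swap_in_host and restore_phi are hypothetical, and the configure and make steps are left as a comment.

```shell
#!/bin/sh
# BASE holds two install prefixes: BASE/HOST (host libraries) and BASE/MIC (Phi libraries).

# Before configure: make the host libraries visible under the MIC path.
swap_in_host() {
    base=$1
    mv "$base/MIC" "$base/MIC.org" &&
    cp -R "$base/HOST" "$base/MIC"
}

# After configure: put the real Phi libraries back.
restore_phi() {
    base=$1
    rm -rf "$base/MIC" &&
    mv "$base/MIC.org" "$base/MIC"
}

# Demonstration with marker files instead of real libraries.
base=$(mktemp -d)
mkdir -p "$base/HOST/lib" "$base/MIC/lib"
echo host > "$base/HOST/lib/marker"
echo phi  > "$base/MIC/lib/marker"

swap_in_host "$base"
during=$(cat "$base/MIC/lib/marker")   # what configure would find under MIC
# ... ./configure with -DMMIC in CFLAGS and CXXFLAGS would run here ...
restore_phi "$base"
after=$(cat "$base/MIC/lib/marker")    # what make would link against

echo "configure sees: $during, make links: $after"
```

The idea is that configure's link tests succeed against host libraries on the host, while the paths recorded in the Makefiles still point at the MIC prefix, which holds the Phi libraries again by the time make runs.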


Gaussian09 Partially Runs on Intel Xeon Phi!
• Gaussian09 is a famous quantum chemistry program package that provides
state-of-the-art capabilities for electronic structure modeling.
• Very large source code: 1.7 million lines
- $ cat *F | wc -l
- 1714217
• Intel Composer XE is not an officially supported compiler
- Gaussian Inc. only supports the PGI compiler.
- Patches were made by M.N. (sorry, we cannot provide the patches to the public)
- A small set of patches enables us to build
- -rw-r--r--. 1 maho users 463 1 30 10:53 2012 patch-bsd+buldg09
- -rw-r--r--. 1 maho users 692 1 30 10:53 2012 patch-bsd+fsplit.c
- -rw-r--r-- 1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make
- -rw-r--r--. 1 maho users 643 1 30 10:53 2012 patch-bsd+mdutil.F
- -rw-r--r--. 1 maho users 240 1 30 10:53 2012 patch-bsd+mygau
- -rw-r--r--. 1 maho users 486 1 30 10:53 2012 patch-bsd+set-mflags

- the patches are almost the same as the host ones.
- mostly just adding -mmic
- somehow shared libs don’t work??
- utils.a should be a static library.
- Intel MKL should also be linked statically.
- should the shared libs of MKL be located at /lib64? Is LD_LIBRARY_PATH not parsed?
- The resulting binaries occupy approximately 2 GB


Gaussian09 Partially Runs on Intel Xeon Phi!
• Just run
• Still very unstable with -O3
- l303.exe (just wish yourself luck)
- l401.exe (should be built with -O0)
- Passed (only test000.com to test200.com were tried):
test001,023,024,025,026,027,028,029,030,031,032,033,034,035,036,037,038,
039,040,042,056,076,077,078,079,081,091,092,093,099,101,102,104,108,115,
116,119,120,130,131,140,142,144,145,149,150,151,153,162,163,165,168,169,
170,172,177,184,188,195
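One way to express "this link needs a lower optimization level" is a small lookup, sketched below. The helper opt_flag_for is hypothetical and is not how the Gaussian build system is organized; only the l401 to -O0 mapping comes from the slide, and -O3 as the default is the level reported above as still unstable.

```shell
#!/bin/sh
# Choose an optimization flag per Gaussian link name (hypothetical helper).
opt_flag_for() {
    case $1 in
        l401) echo "-O0" ;;   # from the slide: l401.exe should be built with -O0
        *)    echo "-O3" ;;   # default level; still unstable for some links such as l303
    esac
}

# Print the flags that would be handed to the compiler for two links.
for link in l303 l401; do
    echo "$link: ifort -mmic $(opt_flag_for "$link")"
done
```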



A packaging system (pkgsrc) porting effort on Intel Xeon Phi!!!

• What is pkgsrc?
- pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000
packages. It is used to enable freely available software to be configured and built easily on supported platforms;
http://www.pkgsrc.org/

• NAKATA, Maho has over ten years of experience as a FreeBSD ports committer.
• Why pkgsrc?
- We need MORE software packages on Intel Phi!
- Currently, HPC program packages depend on other free software packages.
- RPM and deb are too complex (for me).
- A native tool chain for Intel Phi is really important
- ./configure (autotools) is a good one, but cross building is rarely supported.
- ./configure inspects some parameters of the host machine.
- Intel Composer can be used as if it were a native toolchain with a small trick.
- highly portable packaging system: works on *BSD (Net, DragonFly, Free),
various Linux variants, AIX, Mac OS X
• Status:
- ./bootstrap : done
• How to get it?
- I’ll provide it ASAP on sourceforge.net or somewhere...

Summary and outlook
• We tested Intel Xeon Phi, especially how to build Phi-native binaries.
-“One source base, tuned to many targets” is TRUE!
• We regard Intel Xeon Phi as a small Linux cluster.
- but there is no binary compatibility between host and Phi.
• We provided porting tips: how to build gaussian, povray and sdpa.
• For packages using autotools (./configure) or similar, our approach
requires a two-pass configure to cheat
- if configure looks for Phi-specific features such as the availability of FMA, then this
strategy doesn’t work.
- Yoshikazu Kamoshida’s strategy solves this for configure or build systems that
need to run small programs on the target machine (SWoPP 2012; Development of
middleware which facilitate tuning while installation under cross compile
environment).
• More packages are needed!
- Porting NetBSD’s pkgsrc might be a good idea for a cross-compiling environment
like Intel Xeon Phi.
- pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over
12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms;
http://www.pkgsrc.org/
