Porting application to Intel Xeon Phi: some experiences

    RIKEN Advanced Center for Computing and Communication
    2012/11 Super Computing 2012 @ Intel Booth, Salt Lake City, US

    maho@riken.jp

    My other roles
    maho@FreeBSD.org (FreeBSD committer)
    maho@apache.org (Apache OpenOffice committer)




Thursday, November 15, 2012
Aims of my talk

    •Proof of concept:
       - Intel says, “One source base, tuned to many targets”
      - Is it true or not?
         - my answer is TRUE.
    •Native model is considered
      - Just compile with Intel Composer XE 2013 :-)
      - Offload model is extremely demanding for modern complicated programs
         - CUDA experts say: to get performance, do everything on the GPU and do not
           transfer data between CPU and GPU.
         - Modern applications use a lot of external open source / free software
           packages. Very complex structure!
         - Not realistic!
    •Providing porting tips
     - Gaussian09, povray, sdpa...

What is Intel Xeon Phi ??
    • Intel Xeon Phi is a co-processor, connected via a PCI Express slot.
    • Peak performance is 1 TFlops in double precision
       - many cores: 64 cores, 4 threads each, 512-bit AVX, 8 GB of GDDR5 RAM...
    • It can be viewed as another cluster of computers inside a Linux box.
       - a Linux micro OS is provided
    • Better programmability
       - x86 based (64-bit)
       - Development tool: Intel Composer XE 2013
          - C, C++, Fortran
          - compile and run the same code as on the CPU
          - familiar parallelism: OpenMP, MPI, OpenCL
       - Various programming models
          - MIC centric
          - CPU centric
       - CAUTION: BINARIES ARE INCOMPATIBLE!
       - Recompilation is needed for Xeon Phi!

How to build your program on Xeon Phi
    •Very easy.
    •Just pass the -mmic flag to the compilers
      - icc -mmic
      - icpc -mmic
      - ifort -mmic
    •How to link against optimized BLAS and LAPACK?
      - just add -mkl
      - same as in the CPU case.
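
As a minimal sketch of the point above, the same source can be built once for the host and once natively for Xeon Phi, differing only in -mmic. The file name hello.c, the output names and the -O2 level are hypothetical placeholders, not from the talk. The script only prints the commands unless icc and the source file are actually present.

```shell
#!/bin/sh
# Sketch: one source, two targets. hello.c, the output names and -O2
# are assumptions made for illustration.

# Print the compiler command line for a target ("host" or "mic").
build_cmd() {
    target=$1
    src=$2
    case "$target" in
        host) echo "icc -O2 -mkl -o ${src%.c}.host $src" ;;
        mic)  echo "icc -mmic -O2 -mkl -o ${src%.c}.mic $src" ;;
        *)    echo "unknown target: $target" >&2; return 1 ;;
    esac
}

for t in host mic; do
    cmd=$(build_cmd "$t" hello.c)
    if command -v icc >/dev/null 2>&1 && [ -f hello.c ]; then
        $cmd                      # run the real compiler when available
    else
        echo "would run: $cmd"    # dry run otherwise
    fi
done
```

The resulting .mic binary will not run on the host and vice versa, which is why both builds are kept under separate names.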




DGEMM benchmark: sorry, no free lunch, tuning needed.
    • DGEMM is a matrix-matrix multiplication routine. It uses almost 100% of CPU
      performance (if tuned), so it is used for benchmarking.
       - it does not measure memory bandwidth
    • Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
    • Do we need some tuning for Intel Xeon Phi?
       - YES. Otherwise only 40% of peak is attained: ~400 GFlops
       - If tuned, we attain ~816 GFlops.
       - tuning points: memory allocation, thread affinity
    • How was the benchmark data prepared?
       - just malloc and fill with random values
       - no alignment is specified
       - on the CPU this is sufficient, but
       - it is not sufficient for Xeon Phi.
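
The percentages above follow from simple arithmetic, sketched here under the standard assumption that an n x n x n DGEMM performs 2*n^3 floating-point operations. The function names and the sample matrix size and timing are hypothetical; only the 400, 816 and 1000 GFlops figures come from the slide.

```shell
#!/bin/sh
# Sketch: turning a DGEMM timing into GFlops and percent of peak.

# GFlops for an n x n x n DGEMM that took t seconds (2*n^3 flops).
dgemm_gflops() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 2 * n * n * n / t / 1e9 }'
}

# Percentage of theoretical peak; both arguments in GFlops.
percent_of_peak() {
    awk -v g="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * g / p }'
}

percent_of_peak 400 1000    # untuned figure from the slide
percent_of_peak 816 1000    # tuned figure from the slide
dgemm_gflops 10000 2.45     # hypothetical size and timing
```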




SDPA : How to cheat “configure” part I
    • SDPA is a highly efficient semidefinite programming solver.
       - distributed at http://sdpa.sourceforge.net/, under GPL.
    • ./configure ; make (on CPU)
    • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how do we do this?
       - the host and the Phi are almost the same environment...
       - Two-pass strategy: in the first pass, give the dummy flag “-DMMIC” to configure;
         then replace it with “-mmic” in the generated Makefiles and compile.
        #!/bin/sh
        # Pass 1: configure on the host with a harmless dummy flag.
        CC="icc"; export CC
        CXX="icpc"; export CXX
        FC="ifort"; export FC

        CFLAGS="-DMMIC"; export CFLAGS
        CXXFLAGS="-DMMIC"; export CXXFLAGS
        FFLAGS="-DMMIC"; export FFLAGS

        ./configure --with-blas="-mkl" --with-lapack="-mkl"

        # Pass 2: turn the dummy flag into the real cross-compile flag.
        files=$(find . -name Makefile)
        perl -p -i -e 's/-DMMIC/-mmic/g' $files
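
The second pass can be tried without any compiler installed. The following sketch runs the same substitution on a throwaway directory tree; the helper name swap_flag and the Makefile contents are hypothetical, and sed with a temporary file stands in for the perl one-liner.

```shell
#!/bin/sh
# Sketch: replace a placeholder flag in every Makefile below a directory.
# Assumes the flags contain no "/" characters, since they are spliced
# directly into the sed expression.
swap_flag() {
    dir=$1
    from=$2
    to=$3
    find "$dir" -name Makefile | while IFS= read -r mf; do
        sed "s/$from/$to/g" "$mf" > "$mf.tmp" && mv "$mf.tmp" "$mf"
    done
}

# Demo on a temporary tree that mimics configure output.
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'CFLAGS = -O2 -DMMIC\n' > "$demo/Makefile"
printf 'CXXFLAGS = -O2 -DMMIC\n' > "$demo/src/Makefile"

swap_flag "$demo" "-DMMIC" "-mmic"
cat "$demo/Makefile" "$demo/src/Makefile"
```

The dummy flag is a plain preprocessor define, so host-side configure tests still compile and run; only after the swap do the Makefiles produce Phi binaries.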
Povray: how to cheat configure part II
    • The Persistence of Vision Raytracer is a high-quality, totally free tool for
      creating stunning three-dimensional graphics; a famous ray tracing program.
    • This part covers how to build Povray 3.7 RC
       - This version is the first pthread-parallelized Povray.
    • It requires some external libraries beyond those provided for Intel Xeon Phi.




Povray: how to cheat configure : part II
    • Prerequisites
       - boost, zlib, jpeg, tiff and libpng.
       - all libraries must be built for Phi :-( :-( :-(
    • How to build boost and zlib: we took the same strategy as for povray.
       - First build and install the host version of boost to /home/maho/HOST, then the Phi
         version to /home/maho/MIC
       - Next, build and install the host version of zlib to /home/maho/HOST
       - Then build the Phi version as follows:
          - back up /home/maho/MIC to /home/maho/MIC.org
          - copy /home/maho/HOST to /home/maho/MIC
          - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
              - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
          - remove /home/maho/MIC
          - rename /home/maho/MIC.org to /home/maho/MIC
          - replace -DMMIC with -mmic
          - run make to build the Xeon Phi binary.
          - Done.
    • Building tiff and png for Phi is similar to the above procedure.

Povray: how to cheat configure : part II
    • Prerequisites
       - boost, zlib, jpeg, tiff and libpng.
       - all libraries must be built for Phi :-( :-( :-(
    • Strategy: build twice, first for the host and then for Xeon Phi
       - build and install the host versions of the libraries to /home/maho/HOST
       - build and install the Phi versions of the libraries to /home/maho/MIC
          - actually,
    • The final configure for Povray should be done as follows:
       - back up /home/maho/MIC to /home/maho/MIC.org
       - copy /home/maho/HOST to /home/maho/MIC
       - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
          - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
       - remove /home/maho/MIC
       - rename /home/maho/MIC.org to /home/maho/MIC
       - replace -DMMIC with -mmic
       - run make to build the Xeon Phi binary.
       - Done.
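
The directory swap in the steps above can be sketched as a small helper that lets the host libraries stand in for the Phi ones while a command runs, then restores the originals. The function name with_host_as_mic and the demo files are hypothetical; in real use the command would be the ./configure call, and the prefixes would be /home/maho/HOST and /home/maho/MIC.

```shell
#!/bin/sh
# Sketch: run a command while a copy of the host prefix temporarily
# replaces the Phi prefix, then put the Phi prefix back.
# Usage: with_host_as_mic HOST_PREFIX MIC_PREFIX command [args...]
with_host_as_mic() {
    host=$1
    mic=$2
    shift 2
    mv "$mic" "$mic.org" || return 1                 # back up Phi libraries
    cp -R "$host" "$mic" || { mv "$mic.org" "$mic"; return 1; }
    "$@"                                              # e.g. ./configure
    status=$?
    rm -rf "$mic"                                     # drop the host copy
    mv "$mic.org" "$mic"                              # restore Phi libraries
    return $status
}

# Demo with temporary directories instead of real library trees.
root=$(mktemp -d)
mkdir "$root/HOST" "$root/MIC"
echo host > "$root/HOST/arch"
echo mic > "$root/MIC/arch"

with_host_as_mic "$root/HOST" "$root/MIC" cat "$root/MIC/arch"
cat "$root/MIC/arch"
```

During the wrapped command, configure finds runnable host libraries under the Phi path, so the paths it records in the Makefiles already point at the Phi prefix once it is restored.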
Gaussian09 Partially Runs on Intel Xeon Phi!
    • Gaussian09 is a famous quantum chemistry program package; it provides
      state-of-the-art capabilities for electronic structure modeling.
    • Very large source code: 1.7 million lines
       - $ cat *F | wc -l
       - 1714217
    • Intel Composer XE is not an officially supported compiler
       - Gaussian Inc. only supports the PGI compiler.
       - Patches were made by M.N. (sorry, we cannot make the patches public)
       - A small set of patches enables us to build
         -   -rw-r--r--. 1 maho users   463 1 30 10:53 2012 patch-bsd+buldg09
         -   -rw-r--r--. 1 maho users   692 1 30 10:53 2012 patch-bsd+fsplit.c
         -   -rw-r--r-- 1 maho users    5674 10 18 16:41 2012 patch-bsd+i386.make
         -   -rw-r--r--. 1 maho users   643 1 30 10:53 2012 patch-bsd+mdutil.F
         -   -rw-r--r--. 1 maho users   240 1 30 10:53 2012 patch-bsd+mygau
         -   -rw-r--r--. 1 maho users   486 1 30 10:53 2012 patch-bsd+set-mflags

       - the patches are almost the same as the ones for the host.
          - mostly just adding -mmic
       - somehow shared libs don’t work??
          - utils.a should be a static library.
          - Intel MKL should also be linked statically.
          - should the MKL shared libs be located in /lib64? Is LD_LIBRARY_PATH not parsed?
          - The resulting binaries occupy approximately 2 GB

Gaussian09 Partially Runs on Intel Xeon Phi!
    • Just run
    • Still very unstable with -O3
       - l303.exe (just wish yourself luck)
       - l401.exe (should be built with -O0)
       - Passed (only test000.com to test200.com were tried):
         test001, 023, 024, 025, 026, 027, 028, 029, 030, 031, 032, 033, 034, 035, 036,
         037, 038, 039, 040, 042, 056, 076, 077, 078, 079, 081, 091, 092, 093, 099, 101,
         102, 104, 108, 115, 116, 119, 120, 130, 131, 140, 142, 144, 145, 149, 150, 151,
         153, 162, 163, 165, 168, 169, 170, 172, 177, 184, 188, 195




A packaging system (pkgsrc) porting effort on Intel Phi!!!

    • What is pkgsrc?
         - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000
           packages. It is used to enable freely available software to be configured and built easily on supported platforms;
           http://www.pkgsrc.org/

    • NAKATA, Maho has over ten years of FreeBSD ports committer experience.
    • Why pkgsrc?
      - We need MORE software packages on Intel Phi!
         - Current HPC program packages depend on other free software packages.
      - RPM and deb are too complex (for me).
      - A native tool chain for Intel Phi is really important
         - ./configure (autotools) is a good one, but cross building is rarely supported.
         - ./configure inspects some parameters of the host machine.
         - Intel Composer can be used as if it were a native toolchain with a small trick.
      - highly portable packaging system: works on *BSD (Net, DragonFly, Free),
        various Linux variants, AIX, MacOSX
    • Status:
      - ./bootstrap : done
    • How to get it?
      - I’ll provide it ASAP on sourceforge.net or somewhere...

Summary and outlook
    • We tested Intel Xeon Phi, especially how to build Phi native binaries.
       - “One source base, tuned to many targets” is TRUE!
    • We regard Intel Xeon Phi as a small Linux cluster.
       - but there is no binary compatibility between host and Phi.
    • We provided porting tips: how to build gaussian, povray and sdpa.
    • For packages using autotools (./configure) or similar, our approach
      requires a two-pass configure to cheat
       - if configure probes Phi-specific features such as the availability of FMA, this
         strategy doesn’t work.
       - Yoshikazu Kamoshida’s strategy handles configure scripts or build systems that
         need to run small programs on the target machine (SWoPP 2012; Development of
         middleware which facilitate tuning while installation under cross compile
         environment).
    • More packages are needed!
       - Porting NetBSD’s pkgsrc might be a good idea for a cross-compiling environment
         like Intel Xeon Phi.
               - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over
                 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms;
                 http://www.pkgsrc.org/
第11回分子科学 2017/9/17 Pubchemqcプロジェクト第11回分子科学 2017/9/17 Pubchemqcプロジェクト
第11回分子科学 2017/9/17 Pubchemqcプロジェクト
 
Kobeworkshop pubchemqc project
Kobeworkshop pubchemqc projectKobeworkshop pubchemqc project
Kobeworkshop pubchemqc project
 
計算化学実習講座:第二回
 計算化学実習講座:第二回 計算化学実習講座:第二回
計算化学実習講座:第二回
 
計算化学実習講座:第一回
計算化学実習講座:第一回計算化学実習講座:第一回
計算化学実習講座:第一回
 
HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分
 
為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理
 
為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードする為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードする
 
HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用
 
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
 
The PubChemQC Project
The PubChemQC ProjectThe PubChemQC Project
The PubChemQC Project
 
3Dプリンタ導入記 タンパク質の模型をプリントする
3Dプリンタ導入記 タンパク質の模型をプリントする3Dプリンタ導入記 タンパク質の模型をプリントする
3Dプリンタ導入記 タンパク質の模型をプリントする
 

Último

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Último (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Porting applications to Intel Xeon Phi: tips and experiences

  • 1. Porting application to Intel Xeon Phi: some experiences
      RIKEN Advanced Center for Computing and Communication
      2012/11 Super Computing 2012 @ Intel Booth, Salt Lake City, US
      maho@riken.jp
      Other side of my face:
      maho@FreeBSD.org (FreeBSD committer)
      maho@apache.org (Apache OpenOffice committer)
      Thursday, November 15, 2012
  • 2. Aims of my talk
      • Proof of concept:
        - Intel says, “One source base, tuned to many targets”
        - Is it true or not?
          - My answer is TRUE.
      • Native model is considered
        - Just compile with Intel Composer XE 2013 :-)
        - Offload model is extremely demanding for modern complicated programs
          - CUDA experts say: to get performance, do everything on the GPU and do not transfer data between CPU and GPU.
          - Modern applications use a lot of external open source / free software packages. Very complex structure!
          - Not realistic!
      • Providing porting tips
        - Gaussian09, povray, sdpa...
  • 3. What is Intel Xeon Phi?
      • Intel Xeon Phi is a co-processor, connected via a PCI Express slot.
      • Peak performance is 1 TFlops in double precision
        - many cores: 64 cores, 4 threads each, 512-bit AVX, 8 GB of GDDR5 RAM...
      • It looks as if there is another cluster of computers inside a Linux box.
        - A Linux micro OS is provided
      • Better programmability
        - x86 based (64-bit)
        - Development tool: Intel Composer XE 2013
          - C, C++, Fortran
          - compile and run the same code as on the CPU
          - familiar parallelism: OpenMP, MPI, OpenCL
        - Various programming models
          - MIC centric
          - CPU centric
        - CAUTION: BINARY IS INCOMPATIBLE!
        - A recompile is needed for Xeon Phi!
  • 4. How to build your program on Xeon Phi
      • Very easy.
      • Just pass the -mmic flag to the compilers
        - icc -mmic
        - icpc -mmic
        - ifort -mmic
      • How to link against optimized BLAS and LAPACK?
        - Just add -mkl
        - Same as in the CPU case.
  • 5. DGEMM benchmark: sorry, no free lunch, tuning needed.
      • DGEMM is a matrix-matrix multiplication routine. When tuned, it uses almost 100% of CPU performance, so it is used for benchmarking.
        - It does not reflect the memory bandwidth
      • Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
      • Do we need some tuning for Intel Xeon Phi?
        - YES. Otherwise only 40% of peak is attained: ~400 GFlops
        - If tuned, we attain ~816 GFlops.
        - Tuning points: memory allocation, thread affinity
      • How to obtain the data?
        - just malloc and fill with random values
        - no alignment is specified
        - For the CPU this is sufficient, but
        - it is not sufficient for Xeon Phi.
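Slide 5 names thread affinity as one of the tuning points but does not give the settings used. The following is a minimal sketch of how such settings might be prepared, not the configuration behind the ~816 GFlops figure. `phi_thread_env` is a hypothetical helper; `OMP_NUM_THREADS` is the standard OpenMP variable and `KMP_AFFINITY` is read by the Intel OpenMP runtime. The thread count (cores times threads per core) and the default `compact` placement are assumptions for illustration.

```shell
#!/bin/sh
# Print environment settings for a pinned OpenMP run.
# phi_thread_env is a hypothetical helper, not part of any Intel tool.
# Assumption: one OpenMP thread per hardware thread, and "compact"
# placement unless another KMP_AFFINITY type is given as third argument.
phi_thread_env() {
    cores=$1
    threads_per_core=$2
    affinity=${3:-compact}
    echo "export OMP_NUM_THREADS=$((cores * threads_per_core))"
    echo "export KMP_AFFINITY=$affinity"
}

# Core and thread counts as quoted on slide 3 (64 cores, 4 threads each).
phi_thread_env 64 4
```

The printed lines can be applied in the current shell with `eval "$(phi_thread_env 64 4)"` before starting the benchmark. The memory allocation point is not covered here; a 512-bit vector is 64 bytes wide, so 64-byte-aligned buffers would be a natural thing to try instead of plain malloc.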
  • 6. SDPA: How to cheat “configure”, part I
      • SDPA is a highly efficient semidefinite programming solver.
        - distributed at http://sdpa.sourceforge.net/, under the GPL.
      • ./configure ; make (on CPU)
      • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how to do this?
        - almost the same environment...
        - Two-pass strategy. In the first pass, give the dummy flag “-DMMIC” to configure; then replace it with “-mmic”; then compile.

      #!/bin/sh
      CC="icc"; export CC
      CXX="icpc"; export CXX
      FC="ifort"; export FC
      CFLAGS="-DMMIC" ; export CFLAGS
      CXXFLAGS="-DMMIC" ; export CXXFLAGS
      FFLAGS="-DMMIC" ; export FFLAGS
      ./configure --with-blas="-mkl" --with-lapack="-mkl"
      files=$(find ./* -name Makefile)
      perl -p -i -e 's/-DMMIC/-mmic/g' $files
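The second pass of the slide 6 strategy can be tried without a compiler or a real package. The sketch below runs the flag replacement on a throwaway directory tree. `swap_placeholder` and the demo files are hypothetical; only the `-DMMIC` and `-mmic` flags come from the slide. It uses sed through a temporary file where the slide uses `perl -p -i`, which is a substitution for portability, not the author's method.

```shell
#!/bin/sh
# Pass 2 of the two-pass trick: turn the dummy flag into the real
# cross-compile flag in every generated Makefile under a build tree.
# swap_placeholder is a hypothetical helper name.
swap_placeholder() {
    tree=$1     # root of the configured source tree
    dummy=$2    # harmless flag given to ./configure, e.g. -DMMIC
    real=$3     # flag the Phi build actually needs, e.g. -mmic
    find "$tree" -name Makefile -type f | while IFS= read -r mk; do
        # Edit through a temporary file so any POSIX sed works.
        sed "s/$dummy/$real/g" "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
    done
}

# Demonstration on a throwaway tree instead of a real package.
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'CFLAGS = -O2 -DMMIC\n' > "$demo/Makefile"
printf 'CXXFLAGS = -O2 -DMMIC\n' > "$demo/src/Makefile"
swap_placeholder "$demo" -DMMIC -mmic
cat "$demo/Makefile" "$demo/src/Makefile"
rm -rf "$demo"
```

The dummy flag works because `-DMMIC` only defines an unused preprocessor macro, so the host compiler accepts it and every test program that configure builds still runs on the host. Only after the replacement do the Makefiles produce Phi binaries.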
  • 7. Povray: how to cheat configure, part II
      • The Persistence of Vision Raytracer is a high-quality, totally free tool for creating stunning three-dimensional graphics; a famous ray tracing program.
      • This covers how to build Povray 3.7 RC
        - This version is the first pthread-parallelized Povray.
      • Requires some external libraries beyond those provided for Intel Xeon Phi.
  • 8. Povray: how to cheat configure, part II
      • Prerequisites
        - boost, zlib, jpeg, tiff and libpng.
        - all libraries should be built for Phi :-( :-( :-(
      • How to build boost and zlib: we took the same strategy as for povray.
        - First build and install the host version of boost to /home/maho/HOST, then the Phi version to /home/maho/MIC
        - Next, build and install the host version of zlib to /home/maho/HOST
        - then, build the Phi version as follows:
          - back up /home/maho/MIC to /home/maho/MIC.org
          - copy /home/maho/HOST to /home/maho/MIC
          - run configure for the host and pass the -DMMIC flag in CFLAGS and CXXFLAGS.
          - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
          - remove /home/maho/MIC
          - rename /home/maho/MIC.org to /home/maho/MIC
          - replace -DMMIC with -mmic
          - make for the Xeon Phi binary.
          - Done.
      • Building tiff and png for Phi is similar to the above procedure.
  • 9. Povray: how to cheat configure, part II
      • Prerequisites
        - boost, zlib, jpeg, tiff and libpng.
        - all libraries should be built for Phi :-( :-( :-(
      • Strategy: build twice: host build, then Xeon Phi build
        - build and install the host versions of the libraries to /home/maho/HOST
        - build and install the Phi versions of the libraries to /home/maho/MIC
        - actually,
      • The final configure for Povray should be done as follows:
        - back up /home/maho/MIC to /home/maho/MIC.org
        - copy /home/maho/HOST to /home/maho/MIC
        - run configure for the host and pass the -DMMIC flag in CFLAGS and CXXFLAGS.
        - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
        - remove /home/maho/MIC
        - rename /home/maho/MIC.org to /home/maho/MIC
        - replace -DMMIC with -mmic
        - make for the Xeon Phi binary.
        - Done.
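The directory swap on slides 8 and 9 can be sketched as a small wrapper: while configure runs, the Phi prefix temporarily holds host libraries, so configure's test programs link and run on the host while the generated Makefiles record the Phi path. `configure_with_host_libs` is a hypothetical helper, and the demo replaces `./configure` with a `cat` command so it runs anywhere. For a real build the prefixes would be the /home/maho/HOST and /home/maho/MIC directories from the slides.

```shell
#!/bin/sh
# Run a configure command while the Phi prefix temporarily contains the
# host libraries, then restore the Phi libraries.
# configure_with_host_libs is a hypothetical helper name.
configure_with_host_libs() {
    host=$1
    mic=$2
    shift 2                                  # the rest is the configure command
    [ -e "$mic.org" ] && return 1            # refuse to clobber an old backup
    mv "$mic" "$mic.org" || return 1         # back up the Phi libraries
    cp -R "$host" "$mic" || { mv "$mic.org" "$mic"; return 1; }
    "$@"                                     # configure sees host libraries here
    status=$?
    rm -rf "$mic"                            # drop the host copy
    mv "$mic.org" "$mic"                     # put the Phi libraries back
    return $status
}

# Demonstration with stand-in directories and a stand-in for ./configure.
demo=$(mktemp -d)
mkdir "$demo/HOST" "$demo/MIC"
echo host > "$demo/HOST/libz.txt"
echo phi > "$demo/MIC/libz.txt"
configure_with_host_libs "$demo/HOST" "$demo/MIC" cat "$demo/MIC/libz.txt"
cat "$demo/MIC/libz.txt"
rm -rf "$demo"
```

In the demo the first `cat` runs during the swap and prints `host`; the second runs after the restore and prints `phi`. The remaining steps from the slide, replacing -DMMIC with -mmic and running make, follow once the wrapper returns.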
  • 10. Gaussian09 Partially Runs on Intel Xeon Phi!
      • Gaussian09 is a famous quantum chemical program package, and it provides state-of-the-art capabilities for electronic structure modeling.
      • Very large source code: 1.7 million lines
        - $ cat *F | wc -l
        - 1714217
      • Intel Composer XE is not an officially supported compiler
        - Gaussian Inc. only supports the PGI compiler.
        - Patches are made by M.N. (sorry, we cannot provide the patches to the public)
        - A small set of patches enables us to build
          - -rw-r--r--. 1 maho users 463 1 30 10:53 2012 patch-bsd+buldg09
          - -rw-r--r--. 1 maho users 692 1 30 10:53 2012 patch-bsd+fsplit.c
          - -rw-r--r-- 1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make
          - -rw-r--r--. 1 maho users 643 1 30 10:53 2012 patch-bsd+mdutil.F
          - -rw-r--r--. 1 maho users 240 1 30 10:53 2012 patch-bsd+mygau
          - -rw-r--r--. 1 maho users 486 1 30 10:53 2012 patch-bsd+set-mflags
        - The patches are almost the same as the host's.
          - almost merely adding -mmic
        - Somehow shared libs don’t work??
          - utils.a should be a static library.
          - Intel MKL should also be linked statically.
          - Should the shared libs of MKL be located at /lib64? Is LD_LIBRARY_PATH not parsed?
        - The resulting binaries occupy approximately 2 GB
  • 11. Gaussian09 Partially Runs on Intel Xeon Phi!
      • Just run
      • Still very unstable with -O3
        - l303.exe (just wish yourself luck)
        - l401.exe (should be built with -O0)
        - Passed (among test000.com-test200.com only):
          test001,023,024,025,026,027,028,029,030,031,032,033,034,035,036,037,038,039,040,042,056,076,077,078,079,081,091,092,093,099,101,102,104,108,115,116,119,120,130,131,140,142,144,145,149,150,151,153,162,163,165,168,169,170,172,177,184,188,195
  • 12. A packaging system (pkgsrc) porting effort on Intel Phi!!!
      • What is pkgsrc?
        - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/
      • NAKATA, Maho has over ten years of FreeBSD ports committer experience.
      • Why pkgsrc?
        - We need MORE software packages on Intel Phi!
        - Currently HPC program packages depend on other free software packages.
        - RPM and deb are too complex (to me).
        - A native tool chain for Intel Phi is really important
          - ./configure (autotools) is a good one, but cross building is rarely supported.
          - ./configure looks at some parameters of the host machine.
          - Intel Composer can be used as if it were a native toolkit with a small trick.
        - highly portable packaging system: works on *BSD (Net, DragonFly, Free), various Linux variants, AIX, MacOSX
      • Status:
        - ./bootstrap : done
      • How to get it?
        - I’ll provide it ASAP on sourceforge.net or somewhere...
  • 13. Summary and outlook
      • We tested Intel Xeon Phi, especially how to build Phi native binaries.
        - “One source base, tuned to many targets” is TRUE!
      • We regard Intel Xeon Phi as a small Linux cluster.
        - but there is no binary compatibility between host and Phi.
      • We provided porting tips: how to build gaussian, povray and sdpa.
      • For packages using autotools (./configure) or similar, our approach requires a two-pass configure to cheat
        - if configure probes Phi-specific features such as the availability of FMA, this strategy doesn’t work.
        - Yoshikazu Kamoshida’s strategy handles configure scripts or build systems that need to run small programs on the target machine (SWoPP 2012; Development of middleware which facilitate tuning while installation under cross compile environment).
      • More packages are needed!
        - Porting NetBSD’s pkgsrc might be a good idea for a cross-compiling environment like Intel Xeon Phi.
        - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/