Programming for High Performance Accelerated Systems
Dairsie Latimer and Michal Harasimiuk, Petapath
Joint Petapath/HP PRACE WP8 prototype system at SARA/NCF: 6U, 10 TFLOPS, 7 kW
Petapath/HP PRACE prototype system at SARA/NCF
ClearSpeed software development environment at SARA/NCF
ClearSpeed graphical debug interface for heterogeneous systems (images used with permission of ClearSpeed Technology Plc)
ClearSpeed profiler for heterogeneous and multi-processor systems
[Diagram: host CPU(s)/cores connected over PCIe to Advance™ accelerator boards carrying CSX600 pipelines]
• Host code profiling: visually inspect host code as it executes; supports multiple threads and processes; time specific code sections; see the overlap of executing host threads; platform- and processor-agnostic trace collection.
• Host/board interaction: view host/board interactions; performance information for data transfer operations; trace cluster node/board interaction; see the overlap of host compute and board compute.
• CSX system: view a system-level trace; visually inspect the overlap of compute and I/O; visualize cache utilization; view a branch trace of executing code; find and analyse performance bottlenecks; get accurate event timing.
• CSX pipeline: view detailed instruction-issue information; visualize the overlap of executing instructions; optimize code at the instruction level; view instruction-level performance bottlenecks; get accurate instruction timing.
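The profiler views emphasise the overlap of host compute, board compute and I/O. A common way to obtain that overlap is double buffering: while the board computes on one chunk, the next chunk is copied over PCIe. The sketch below is a minimal timing model of that idea, assuming two buffers, a transfer that runs concurrently with board compute, and negligible time to copy results back. The function names and figures are hypothetical and are not part of the ClearSpeed tools.

```python
def serial_time(chunks: int, t_transfer: float, t_compute: float) -> float:
    """Each chunk is copied to the board and then processed; nothing overlaps."""
    return chunks * (t_transfer + t_compute)


def overlapped_time(chunks: int, t_transfer: float, t_compute: float) -> float:
    """Double buffering: while the board computes on chunk i, chunk i+1 is
    transferred. After the first transfer the slower stage sets the pace,
    and the last chunk still has to be computed."""
    if chunks < 1:
        raise ValueError("need at least one chunk")
    return t_transfer + (chunks - 1) * max(t_transfer, t_compute) + t_compute


if __name__ == "__main__":
    # Hypothetical per-chunk costs (e.g. milliseconds), for illustration only.
    for t_x, t_c in ((1.0, 3.0), (3.0, 1.0), (2.0, 2.0)):
        print(t_x, t_c, serial_time(8, t_x, t_c), overlapped_time(8, t_x, t_c))
```

With four chunks, 1 unit of transfer and 3 units of compute each, the serial schedule takes 16 units and the overlapped one 13: the transfers are hidden behind compute except for the first.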
Programming for High Performance Accelerated Systems
Introduction
What comes next?
OpenCL
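As the speaker notes describe, OpenCL presents a familiar data-parallel model: a kernel is written for a single work-item, and the runtime launches it over an index space (the NDRange) that is divided into work-groups. The Python sketch below only emulates such a launch sequentially to show the indexing. `run_ndrange` and `saxpy` are hypothetical names, not OpenCL API calls, and a real device would run the work-items in parallel.

```python
from typing import Callable, List

# A kernel receives (global_id, group_id, local_id), mirroring the OpenCL C
# built-ins get_global_id(0), get_group_id(0) and get_local_id(0).
Kernel = Callable[[int, int, int], None]


def run_ndrange(kernel: Kernel, global_size: int, local_size: int) -> None:
    """Hypothetical sequential emulation of a 1-D NDRange launch."""
    # OpenCL 1.x rejects a launch whose global size is not a multiple of the
    # work-group size; mirror that rule here.
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            kernel(group_id * local_size + local_id, group_id, local_id)


def saxpy(a: float, x: List[float], y: List[float], local_size: int = 4) -> List[float]:
    """Compute a*x + y element-wise, one work-item per element."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out = [0.0] * len(x)

    def kernel(gid: int, _group: int, _local: int) -> None:
        out[gid] = a * x[gid] + y[gid]

    run_ndrange(kernel, global_size=len(x), local_size=local_size)
    return out
```

For example, `saxpy(2.0, [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])` returns `[3.0, 5.0, 7.0, 9.0]`.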
Observations
Software development flows on multi-core and heterogeneous systems
Host Software Development Practice (Single Core)
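The question "Does it run fast enough yet?" can be bounded before any porting work using Amdahl's law, a standard result that the talk itself does not cite: if a fraction p of the run time is accelerated by a factor s, the overall speed-up is 1 / ((1 - p) + p / s). A minimal Python sketch, with a hypothetical function name:

```python
def amdahl_speedup(accelerated_fraction: float, accel_speedup: float) -> float:
    """Overall speed-up when a fraction p of the original run time is made
    s times faster and the remaining (1 - p) is unchanged (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if accel_speedup <= 0.0:
        raise ValueError("accel_speedup must be positive")
    p, s = accelerated_fraction, accel_speedup
    return 1.0 / ((1.0 - p) + p / s)
```

For example, accelerating 90% of the run time by 10x gives only about 5.3x overall, and no accelerator, however fast, can push that code past 10x.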
Host Software Development Practice (Multi-core)
Host Software Development Practice (Pitfalls)
Heterogeneous Systems Software Development Practice
General comments on using accelerators
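The speaker notes point out that the library approach is limited by how much data must be moved relative to the compute required, with DGEMM doing n^3 compute on n^2 data. The Python sketch below turns that into a rough offload test. It assumes 2n^3 floating-point operations, that A, B and C are copied to the board once and C is copied back (4n^2 doubles), and no overlap of transfer and compute. The function names are hypothetical, and the throughput and bandwidth figures in the example are made up, not measurements of any ClearSpeed product.

```python
def dgemm_flops(n: int) -> float:
    """Flops for C = alpha*A*B + beta*C with n x n matrices: one multiply and
    one add per inner-product term (lower-order terms ignored)."""
    return 2.0 * n**3


def dgemm_bytes_moved(n: int, bytes_per_elem: int = 8) -> float:
    """Bytes over the host/board link if A, B and C are sent once and C is
    returned once: 4 n^2 double-precision elements."""
    return 4.0 * n**2 * bytes_per_elem


def offload_pays_off(n: int, host_gflops: float, accel_gflops: float,
                     link_gbytes_per_s: float) -> bool:
    """True if transfer + accelerator compute beats host-only compute,
    assuming transfers and compute do not overlap (a pessimistic model)."""
    flops = dgemm_flops(n)
    host_time = flops / (host_gflops * 1e9)
    accel_time = (flops / (accel_gflops * 1e9)
                  + dgemm_bytes_moved(n) / (link_gbytes_per_s * 1e9))
    return accel_time < host_time


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not measured values.
    for n in (64, 256, 1024, 4096):
        intensity = dgemm_flops(n) / dgemm_bytes_moved(n)  # flops per byte = n/16
        print(n, intensity, offload_pays_off(n, 10.0, 50.0, 1.0))
```

With these made-up figures (10 GFLOPS host, 50 GFLOPS accelerator, 1 GB/s link), offloading only pays once n exceeds about 200: the transfer cost grows as n^2 while the compute saved grows as n^3.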
Accelerator Software Development Pitfalls
The future: Developing with OpenCL
OpenCL in use
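The speaker notes mention that OpenCL can also target multi-core hosts. One plausible way for a CPU implementation to map an NDRange onto cores is to hand each work-group to a worker thread and run its work-items as a loop that a compiler could vectorise with SSE. The Python sketch below illustrates only that mapping; `run_ndrange_on_cores` is a hypothetical name, not how any particular OpenCL implementation works, and because of CPython's global interpreter lock it will not show a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_ndrange_on_cores(kernel: Callable[[int], None], global_size: int,
                         local_size: int, num_cores: int) -> None:
    """Hypothetical CPU-backend scheduler: one task per work-group."""
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")

    def run_group(group_id: int) -> None:
        # The work-items of one group run as a plain loop on one core; this is
        # the loop a compiler could vectorise with SSE on a real CPU backend.
        base = group_id * local_size
        for local_id in range(local_size):
            kernel(base + local_id)

    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        # list() waits for every group and re-raises any kernel exception.
        list(pool.map(run_group, range(global_size // local_size)))


squares: List[int] = [0] * 16


def square_kernel(gid: int) -> None:
    squares[gid] = gid * gid  # each work-item writes only its own element


run_ndrange_on_cores(square_kernel, global_size=16, local_size=4, num_cores=4)
```

Giving each work-group to one core keeps work-items that may share local memory on the same core, which is one reason a work-group is a natural unit of scheduling on a CPU.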
Will development methods and tools converge?
What will OpenCL have initially?
Architectures targeted by OpenCL are similar, but different …
• ClearSpeed CSX700
• NVIDIA GT200
• AMD RV770
• Intel Larrabee
(All image rights reserved by the original copyright holders.)
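The speaker notes anticipate systems with several OpenCL compute devices, such as the host multi-core backend plus an accelerator, whose architectures are similar but different. One simple, static way to share a data-parallel job between them is to split the index space in proportion to each device's expected throughput. This is an assumed technique, not one the talk prescribes; the device names and ratios below are hypothetical, in practice the ratios would have to be measured, and dynamic scheduling may balance the load better.

```python
from typing import Dict, List, Tuple


def split_range(global_size: int, throughput: Dict[str, float]) -> List[Tuple[str, int, int]]:
    """Statically split [0, global_size) into one contiguous chunk per device,
    sized in proportion to each device's assumed relative throughput.
    Returns (device, start, end) triples that cover the range exactly once."""
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("need at least one device with positive throughput")
    names = list(throughput)
    chunks: List[Tuple[str, int, int]] = []
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = global_size  # the last device absorbs rounding leftovers
        else:
            end = start + int(global_size * throughput[name] / total)
        chunks.append((name, start, end))
        start = end
    return chunks
```

For example, `split_range(1000, {"host_cpu": 1.0, "accelerator": 4.0})` gives `[("host_cpu", 0, 200), ("accelerator", 200, 1000)]`.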
Can we look forward to …
Questions?

Petapath HP Cast 12 - Programming for High Performance Accelerated Systems

Speaker notes

  1. So, with the scene set for our presentation, I’m going to talk a bit about the current state of the art in programming heterogeneous systems (with a summary of what will be used at SARA), as well as examine what the development flow for a heterogeneous system really looks like.
  3. At SARA, the system is based on ClearSpeed Technology hardware and has the full range of development tools and libraries available.
  4. The level of support offered by the ClearSpeed SDK for debugging, and especially profiling, is still well ahead of the best of the rest (for the moment). The host profiling API allows you to instrument even code that is not ClearSpeed-specific and have it displayed in the profiler.
  5. So let’s take a look at what makes heterogeneous systems interesting to the user and also some of the issues involved in programming them.
  6. If the system is single-use, it’s much easier to justify the investment in time and money needed to get the benefits of acceleration. If it’s multi-use, the cost-benefit analysis is more complicated, but it can still be swayed by an obvious imbalance in resource consumption. Are the codes yours, open source, or closed-source ISV applications? If you have source-level access, do you have the development expertise and resources?
  7. So let’s put closed-source applications to one side for a moment. If you have answered yes to “Do you have source access?” and “Do you have the development capabilities?”, then today you will have to choose one of a number of proprietary development environments.
  8. I include OpenCL here because of its similarity to existing languages and its imminent availability.
  9. As with MKL, ACML, etc., IHVs will usually (but not always) get the best out of their own hardware. The library approach is by far the easiest for the user because it has the potential to accelerate ISV applications, but there are a number of caveats, such as the requirement for the applications to use standard libraries (BLAS, LAPACK, FFTW, etc.) and to link them dynamically (many do not, because static linking reduces the support burden). ClearSpeed has long provided a selection of Level 3 BLAS routines and drop-in replacements for many of the most popular LAPACK routines. As you will see, the applicability and effectiveness of this approach is limited by the amount of data that has to be moved relative to the compute required (for DGEMM that is n^3 compute to n^2 data).
  10. OK, so we’ve established that proprietary solutions are not ideal for a number of reasons, but they have stimulated the interest of the research community, and in some cases they still provide compelling financial advantages to the user. Why do I say ‘inevitably’? Because the pull from both developers and customers is there. Developers want to innovate, but not all are willing to be locked into single-vendor deals, for obvious reasons. OpenCL has gained enviable support in a very short period of time; Petapath is a member of the Khronos Group and is actively participating in the OpenCL working group.
  11. So what, for those of you who are not familiar with it, is OpenCL? It addresses a wide range of systems in a familiar way, very similar to the existing language and library support from a number of IHVs.
  12. A very interesting point to note here is that OpenCL can also target multi-core systems. It does this by supporting the SIMD extensions of current x86 cores and exposing this parallelism to the developer through a single open API. It doesn’t provide anything that OpenMP doesn’t, apart from a single API and programming interface, but that is the huge benefit for developers.
  13. Note that there can be multiple OpenCL compute devices in a single system. Initially this is likely to be the host multi-core backend and a single vendor’s accelerator, but the potential is there to support multiple accelerators and to accelerate your systems incrementally.
  14. So this all sounds great, but when will I be able to use OpenCL? And since it’s a 1.0 spec, shouldn’t I watch to see what happens for a little while?
  15. I said earlier that there could be multiple OpenCL-supported devices in a system. Interoperability between different vendors’ implementations will be the key to this.
  16. So, having mapped out what people use today and what standards we may have in the near future, what does development on a heterogeneous system look like today?
  17. Well, if you’re here, there is probably a financial or scientific imperative to make the application run faster. IHVs also provide optimised libraries (BLAS, LAPACK, etc.), so use them where you can. Many compilers support SSE2+ code generation and auto-parallelisation. Does it run fast enough yet? (Where can I go next if it doesn’t?)
  18. The list of general and vendor-specific tips is too long to go into here.
  19. Wake up, Tim, I’m expecting a heckle here!
  21. So will all vendors’ hardware behave the same? How will performance vary across different platforms?
  22. (ClearSpeed has had gdb support for about four years.)
  24. There are many tools that developers rely on for host development, and I think that means there will be space for a thriving ecosystem of third-party tools for OpenCL.
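Note 4 mentions a host profiling API that lets you instrument code that is not ClearSpeed-specific. That API is not shown in this deck, so the following is only a generic Python sketch of the same idea: named sections of host code are timed and the results collected for later display. `timed_section` and `report` are hypothetical helpers, not the ClearSpeed interface.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List

# Hypothetical in-process store, not the ClearSpeed host profiling API.
timings: DefaultDict[str, List[float]] = defaultdict(list)


@contextmanager
def timed_section(name: str) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)


def report() -> None:
    """Print the call count and total time for each named section."""
    for name, samples in sorted(timings.items()):
        print(f"{name}: {len(samples)} calls, {sum(samples) * 1e3:.3f} ms total")


if __name__ == "__main__":
    for _ in range(3):
        with timed_section("setup"):
            data = [i * i for i in range(10_000)]
        with timed_section("reduce"):
            total = sum(data)
    report()
```

A real profiler would also record timestamps and thread IDs so that sections can be laid out on a timeline, which is what makes overlap between host threads and board activity visible.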