Java GPU Computing 
Maarten Steur & Arjan Lamers
● OpenCL overview
● Simple example
● Case study
● Tips & tricks
● Questions
Why GPU computing
Abbreviations
● CPU, GPU, APU 
● Khronos: OpenCL, OpenGL 
● Nvidia: CUDA 
● JogAmp JOCL, JavaCL, JOCL
GPU compared with CPU
● Many simple cores
● Lots of high-bandwidth memory
● Example:
                    Cores    Gflops
  Intel Core i7         8       180
  GeForce GT 650M     384       650
Programming model
● Define a stream (flow)
● Run in parallel
Usage
● Algorithm:
– High concurrency
– Partitionable
● But:
– Extra latency from loading data onto and off the GPU
– Extra complexity
Components
Example (MacBook Pro)
Platform name: Apple 
Platform profile: FULL_PROFILE 
Platform spec version: OpenCL 1.2 
Platform vendor: Apple 
Device 16925696 HD Graphics 4000 
Driver:1.2(Aug 17 2014 20:29:07) 
Max work group size:512 
Global mem size: 1073741824 
Local mem size: 65536 
Max clock freq: 1200 
Max compute units: 16 
Device 16918272 GeForce GT 650M 
Driver:8.26.28 310.40.55b01 
Max work group size:1024 
Global mem size: 1073741824 
Local mem size: 49152 
Max clock freq: 900 
Max compute units: 2 
Device 4294967295 Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Driver:1.1 
Max work group size:1024 
Global mem size: 17179869184 
Local mem size: 32768 
Max clock freq: 2600 
Max compute units: 8
Work & Memory
Application / Kernel 
● Write .cl files in a C variant
● Kernels are the 'public' functions
● Java bytecode
– Aparapi (OpenCL) 
– RootBeer (CUDA)
Disclaimer
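Parallel sort
kernel void sort(global const float* in, global float* out, int size) {
    int i = get_global_id(0); // index of the current work-item (thread)
    float id = in[i];         // the value this work-item has to place
    int pos = 0;              // rank of id: number of elements that sort before it
    for (int j = 0; j < size; j++) {
        float jd = in[j];
        // in[j] < in[i]? Ties are broken by index so equal values get distinct ranks.
        bool smaller = (jd < id) || (jd == id && j < i);
        pos += (smaller) ? 1 : 0;
    }
    out[pos] = id;
}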
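The kernel above is a rank sort: each work-item counts how many elements sort before its own and writes its value to that position. As a rough sketch of the same idea on the CPU, the Java version below uses a parallel stream in place of GPU work-items. The class and method names are made up for illustration, and NaN inputs are not handled.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class RankSortReference {
    /** CPU version of the kernel: each index computes its own output position. */
    static float[] rankSort(float[] in) {
        float[] out = new float[in.length];
        IntStream.range(0, in.length).parallel().forEach(i -> {
            float id = in[i];
            int pos = 0;
            for (int j = 0; j < in.length; j++) {
                float jd = in[j];
                // Same comparison as the kernel: smaller value, or equal value with lower index
                if (jd < id || (jd == id && j < i)) {
                    pos++;
                }
            }
            out[pos] = id; // every i gets a unique pos, so writes never collide
        });
        return out;
    }

    public static void main(String[] args) {
        float[] sorted = rankSort(new float[] {3f, 1f, 2f, 1f});
        System.out.println(Arrays.toString(sorted)); // [1.0, 1.0, 2.0, 3.0]
    }
}
```

A reference like this is also useful as the expected output when checking the GPU result.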
Java GPU Computing 
// JogAmp JOCL; the CL* classes live in com.jogamp.opencl
CLContext globalContext = CLContext.create();
CLDevice device = globalContext.getMaxFlopsDevice(Type.GPU); // GPU with the highest estimated flops
CLContext context = CLContext.create(device);                // context for just that device
CLCommandQueue queue = device.createCommandQueue();
CLProgram program =
    context.createProgram(
        First8GpuComputing.class.getResourceAsStream("MyTask.cl")
    ).build();
You can also build for specific devices: build(device)
Java GPU Computing 
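// Device buffers, one float per input element (READ_ONLY / WRITE_ONLY come from CLMemory.Mem)
CLBuffer<FloatBuffer> inBuffer = context.createFloatBuffer(
    workLoad.length, READ_ONLY);
CLBuffer<FloatBuffer> outBuffer = context.createFloatBuffer(
    workLoad.length, WRITE_ONLY);
mapToBuffer(inBuffer.getBuffer(), workLoad); // the speakers' own helper; presumably fills the host-side buffer
// The name must match a kernel function in the .cl file
CLKernel kernel = program.createCLKernel("MyTask");
kernel.putArgs(inBuffer, outBuffer).putArg(workLoad.length);
queue.putWriteBuffer(inBuffer, false)   // non-blocking upload
    .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
    .putReadBuffer(outBuffer, true);    // blocking download: waits for the result
FloatBuffer output = outBuffer.getBuffer();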
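The slides do not show how globalWorkSize and localWorkSize are chosen. In OpenCL 1.x the global work size has to be a multiple of the local work size, so a common approach is to round the element count up. The sketch below assumes that approach; the class name, the helper and the sizes are illustrative, not taken from the talk.

```java
public class WorkSizes {
    /** Smallest multiple of groupSize that is greater than or equal to globalSize. */
    static int roundUp(int groupSize, int globalSize) {
        int remainder = globalSize % groupSize;
        return remainder == 0 ? globalSize : globalSize + groupSize - remainder;
    }

    public static void main(String[] args) {
        int localWorkSize = 256;  // assumed; must not exceed the device's max work group size
        int elementCount = 1000;  // assumed input length
        int globalWorkSize = roundUp(localWorkSize, elementCount);
        System.out.println(globalWorkSize); // 1024
    }
}
```

With a rounded-up size, some work-items fall past the end of the data, so the kernel needs a guard such as `if (get_global_id(0) < size)` before it touches the buffers.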
Real-world case study
● Calculation tool supporting the Programmatische Aanpak Stikstof (the Dutch Programmatic Approach to Nitrogen).
● http://www.aerius.nl
Tips & tricks 
● Managing CL code
– getResourceAsStream()?
– Java constants → #define
– Locale? Oops!
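One way to read the last two points: Java constants are turned into #define lines and prepended to the kernel source, and the "Oops" is presumably number formatting that follows the default locale, which can emit a decimal comma that OpenCL C cannot parse. The sketch below rests on that assumption; the class, method and constant names are hypothetical.

```java
import java.util.Locale;

public class KernelDefines {
    /** Renders a Java constant as an OpenCL C preprocessor definition. */
    static String define(String name, double value) {
        // Locale.ROOT forces a decimal point regardless of the JVM's default locale
        return String.format(Locale.ROOT, "#define %s %.6f\n", name, value);
    }

    public static void main(String[] args) {
        double emissionFactor = 0.5; // hypothetical constant shared by Java and the kernel
        System.out.print(define("EMISSION_FACTOR", emissionFactor)); // #define EMISSION_FACTOR 0.500000

        // For comparison: the same format under a Dutch locale usually yields a decimal comma
        System.out.println(String.format(Locale.forLanguageTag("nl-NL"), "%.6f", emissionFactor));
    }
}
```

The generated header can then be concatenated in front of the text read from the .cl resource before the program is built.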
Tips & tricks 
● Unit testing
– Separate test kernels
– Test cases in batches
kernel void testDifficultCalculation(const int testCount,
        global const double* distance, global double* results) {
    const int testId = get_global_id(0);
    // Guard: the global work size may be larger than the number of test cases
    if (testId < testCount) {
        results[testId] = difficultCalculation(distance[testId]);
    }
}
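On the Java side, a batch test along these lines would run all cases in one kernel launch and then compare the results array with expected values. The helper below is a minimal sketch of that comparison only; its name and the tolerance are assumptions, and the expected values would come from a CPU implementation of the same calculation.

```java
public class BatchAssert {
    /** Returns the index of the first result outside the tolerance, or -1 if all match. */
    static int firstMismatch(double[] expected, double[] actual, double tolerance) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < expected.length; i++) {
            // Written as !(diff <= tolerance) so that NaN results also count as mismatches
            if (!(Math.abs(expected[i] - actual[i]) <= tolerance)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double[] expected = {1.0, 2.0, 3.0};   // e.g. computed on the CPU
        double[] fromGpu = {1.0, 2.5, 3.0};    // e.g. read back from the results buffer
        System.out.println(firstMismatch(expected, fromGpu, 1e-9)); // 1
    }
}
```

Reporting the index makes it easy to map a failure back to the test case that produced it.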
Direct memory management 
● -XX:MaxDirectMemorySize=??M 
● ByteBuffer.allocateDirect(int capacity) 
– Max 2 GB per buffer
● Garbage collection comes too late
– Only triggered by a heap collection
– Free manually:
– ((sun.nio.ch.DirectBuffer) myBuffer).cleaner().clean();
● VisualVM plugin for direct buffers
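The 2 GB ceiling follows from allocateDirect taking an int byte count. As a small sketch, the helper below allocates an off-heap float buffer and checks that limit up front. The class and method names are illustrative, and native byte order is an assumption about what the OpenCL binding expects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBuffers {
    /** Allocates an off-heap float buffer in the platform's native byte order. */
    static FloatBuffer allocateFloats(int count) {
        long bytes = (long) count * Float.BYTES;
        if (count < 0 || bytes > Integer.MAX_VALUE) {
            // allocateDirect takes an int, so one buffer cannot exceed Integer.MAX_VALUE bytes
            throw new IllegalArgumentException("buffer too large or negative: " + bytes + " bytes");
        }
        return ByteBuffer.allocateDirect((int) bytes)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer buffer = allocateFloats(1024);
        System.out.println(buffer.isDirect());  // true
        System.out.println(buffer.capacity());  // 1024
    }
}
```

Larger data sets have to be split over several buffers, and the total still counts against -XX:MaxDirectMemorySize.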
GPU vs CPU 
● GPUs perform fewer checks than CPUs
– Division by zero
– Out-of-bounds checks
– Test on the CPU first
Portability
● OpenCL is portable; its performance is not
– Memory sizes differ
– Memory latencies differ
– Work group sizes differ
– Compute devices differ
– OpenCL implementations differ
● So develop for the production hardware
Finally
● Float vs. double
– Double the precision
– Half the performance
– Double support is optional
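To make the precision side of that trade-off concrete, here is a small Java illustration. It is not from the talk; it relies only on float having a 24-bit significand, and the class and method names are made up.

```java
public class FloatVsDouble {
    /** True when adding 1 to 2^24 is lost in single precision. */
    static boolean floatLosesIncrement() {
        float f = 16_777_216f; // 2^24, the last point where every integer is representable as float
        return f + 1f == f;
    }

    /** The same check in double precision. */
    static boolean doubleLosesIncrement() {
        double d = 16_777_216d;
        return d + 1d == d;
    }

    public static void main(String[] args) {
        System.out.println(floatLosesIncrement());  // true
        System.out.println(doubleLosesIncrement()); // false
    }
}
```

Whether that loss matters depends on the calculation, which is why it is worth checking before paying the performance cost of double.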
Conclusion
● When to use it?
– When performance is really needed
– When the problem has high concurrency
– When the problem is partitionable
Questions?
Setting up OpenCL test on Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz 
Warming up OpenCL test 
[thread 32003 also had an error][thread 33027 also had an error] 
## 
A fatal error has been detected by the Java Runtime Environment: 
## 
SIGSEGV[thread 32515 also had an error] 
(0xb)[thread 32771 also had an error] 
[thread 32259 also had an error] 
at pc=0x00000001250ded70, pid=99851, tid=29475 
## 
JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26) 
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode bsd-amd64 compressed oops) 
# Problematic frame: 
# [thread 17415 also had an error] 
C [cl_kernels+0x1d70] sort_wrapper+0x1b0 
## 
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again 
## 
An error report file with more information is saved as: 
# /Users/arjanl/Documents/opencl/workspace/opencl-test/jogamp/hs_err_pid99851.log 
[thread 31763 also had an error] 
## 
If you would like to submit a bug report, please visit: 
# http://bugreport.sun.com/bugreport/crash.jsp 
#


Java gpu computing

  • 1. Java GPU Computing Maarten Steur & Arjan Lamers
  • 2. ● Overzicht OpenCL ● Simpel voorbeeld ● Casus ● Tips & tricks ● Vragen
  • 4. Afkortingen ● CPU, GPU, APU ● Khronos: OpenCL, OpenGL ● Nvidia: CUDA ● JogAmp JOCL, JavaCL, JOCL
  • 5. GPU vergeleken met CPU ● Veel simpele cores ● Veel high bandwidth geheugen ● Intel core i7 GeForce GT 650M 8 cores 384 cores 180 Gflops 650 Gflops
  • 6. Programmeer model ● Definieer stream (flow) ● Run in parallel
  • 7. Gebruik ● Algorithme: – Hoge Concurrency – Partitioneerbaar ● Maar: – Extra latency door on- en offloaden op de GPU – Extra complexiteit
  • 10. Voorbeeld (MacBook Pro) Platform name: Apple Platform profile: FULL_PROFILE Platform spec version: OpenCL 1.2 Platform vendor: Apple Device 16925696 HD Graphics 4000 Driver:1.2(Aug 17 2014 20:29:07) Max work group size:512 Global mem size: 1073741824 Local mem size: 65536 Max clock freq: 1200 Max compute units: 16 Device 16918272 GeForce GT 650M Driver:8.26.28 310.40.55b01 Max work group size:1024 Global mem size: 1073741824 Local mem size: 49152 Max clock freq: 900 Max compute units: 2 Device 4294967295 Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz Driver:1.1 Max work group size:1024 Global mem size: 17179869184 Local mem size: 32768 Max clock freq: 2600 Max compute units: 8
  • 12. Application / Kernel ● Schrijf .cl files in C variant ● Kernels zijn de 'publieke' functies ● Java Bytecode – Aparapi (OpenCL) – RootBeer (CUDA)
  • 14. Parallel sort kernel void sort(global const float* in, global float* out, int size) { int i = get_global_id(0); // current thread float id = in[i]; int pos = 0; for (int j=0;j<size;j++) { float jd = in[j]; // in[j] < in[i] ? bool smaller = (jx < ix) || (jx == ix && j < i); pos += (smaller)?1:0; } out[pos] = id; }
  • 15. Java GPU Computing
      CLContext globalContext = CLContext.create();
      CLDevice device = globalContext.getMaxFlopsDevice(Type.GPU);
      CLContext context = CLContext.create(device);
      CLCommandQueue queue = device.createCommandQueue();
      CLProgram program = context.createProgram(
          First8GpuComputing.class.getResourceAsStream("MyTask.cl")
      ).build();
      You can also build for specific devices: build(device)
  • 16. Java GPU Computing
      CLBuffer<FloatBuffer> inBuffer = context.createFloatBuffer(input.length, READ_ONLY);
      CLBuffer<FloatBuffer> outBuffer = context.createFloatBuffer(input.length, WRITE_ONLY);
      mapToBuffer(inBuffer.getBuffer(), workLoad);
  • 17. Java GPU Computing
      CLBuffer<FloatBuffer> inBuffer = context.createFloatBuffer(input.length, READ_ONLY);
      CLBuffer<FloatBuffer> outBuffer = context.createFloatBuffer(input.length, WRITE_ONLY);
      mapToBuffer(inBuffer.getBuffer(), workLoad);
      CLKernel kernel = program.createCLKernel("MyTask");
      kernel.putArgs(inBuffer, outBuffer).putArg(workLoad.length);
  • 18. Java GPU Computing
      CLBuffer<FloatBuffer> inBuffer = context.createFloatBuffer(input.length, READ_ONLY);
      CLBuffer<FloatBuffer> outBuffer = context.createFloatBuffer(input.length, WRITE_ONLY);
      mapToBuffer(inBuffer.getBuffer(), workLoad);
      CLKernel kernel = program.createCLKernel("MyTask");
      kernel.putArgs(inBuffer, outBuffer).putArg(workLoad.length);
      queue.putWriteBuffer(inBuffer, false)
           .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
           .putReadBuffer(outBuffer, true);
      FloatBuffer output = outBuffer.getBuffer();
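
Slide 18 passes `globalWorkSize` and `localWorkSize` to `put1DRangeKernel` without showing how they are chosen. In OpenCL 1.x, when a local work size is given, the global work size has to be a multiple of it, so a common approach is to round the element count up. The sketch below assumes that approach; the helper name `roundUpToMultiple` and the numbers are illustrative and do not come from the slides.

```java
final class WorkSizes {
    /** Returns the smallest multiple of localWorkSize that is >= elementCount. */
    static int roundUpToMultiple(int localWorkSize, int elementCount) {
        if (localWorkSize <= 0 || elementCount < 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int remainder = elementCount % localWorkSize;
        return remainder == 0 ? elementCount : elementCount + localWorkSize - remainder;
    }

    public static void main(String[] args) {
        int localWorkSize = 256;  // assumed; must not exceed the device's max work group size
        int elementCount = 1000;  // assumed input length
        System.out.println(roundUpToMultiple(localWorkSize, elementCount)); // prints 1024
    }
}
```

Rounding up launches work items that have no element to process, so the kernel needs a guard such as the `if (testId < testCount)` check in the test kernel on slide 24. The sort kernel on slide 14 has no such guard.
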
  • 20. Case study from practice ● Calculation tool supporting the Programmatische Aanpak Stikstof (Programmatic Approach to Nitrogen). ● http://www.aerius.nl
  • 23. Tips & tricks ● CL management – getResourceAsStream()? – Java constants → #define – Locale? Oops!
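
The "Locale? Oops!" on slide 23 refers to number formatting: when Java constants are written into the kernel source as `#define` lines, a default locale that uses a decimal comma produces `1,0` instead of `1.0`, which the OpenCL compiler does not read as one number. A minimal sketch of a locale-safe way to generate such lines follows; the class name, method names and the `WIND_SPEED` constant are hypothetical.

```java
import java.util.Locale;

final class KernelDefines {
    /** Formats one #define line; Locale.ROOT guarantees '.' as the decimal separator. */
    static String define(String name, double value) {
        return "#define " + name + " " + String.format(Locale.ROOT, "%.6f", value) + "\n";
    }

    /** Prepends the generated defines to the kernel source loaded from the .cl file. */
    static String withDefines(String defines, String kernelSource) {
        return defines + kernelSource;
    }

    public static void main(String[] args) {
        // The pitfall: a Dutch locale formats the decimal separator as a comma.
        System.out.println(String.format(Locale.forLanguageTag("nl-NL"), "%.1f", 1.0)); // prints 1,0
        // The locale-independent variant.
        System.out.print(withDefines(define("WIND_SPEED", 1.5), "kernel void noop() {}\n"));
    }
}
```
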
  • 24. Tips & tricks ● Unit testing – Separate test kernels – Test cases in batches
      kernel void testDifficultCalculation(const int testCount,
              global const double* distance, global double* results) {
          const int testId = get_global_id(0);
          if (testId < testCount) {
              results[testId] = difficultCalculation(distance[testId]);
          }
      }
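
On the Java side, a batched test kernel like the one on slide 24 returns a whole buffer of results that has to be compared with a buffer of expected values. The helper below is one possible way to do that comparison and to report which test cases failed; it is a sketch with made-up names, not the helper used in the project.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchAssert {
    /** Returns the indices of all test cases whose result deviates more than the tolerance. */
    static List<Integer> mismatches(double[] expected, double[] actual, double tolerance) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("batch sizes differ");
        }
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < expected.length; i++) {
            // Written as a negated <= so that NaN results count as failures too.
            if (!(Math.abs(expected[i] - actual[i]) <= tolerance)) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        double[] expected = {1.0, 2.0, 3.0};
        double[] actual = {1.0, 2.5, Double.NaN}; // stand-in for values read back from the device
        System.out.println(mismatches(expected, actual, 1e-9)); // prints [1, 2]
    }
}
```

In a JUnit test the returned list would typically be asserted to be empty, with the failing indices in the failure message.
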
  • 25. Direct memory management ● -XX:MaxDirectMemorySize=??M ● ByteBuffer.allocateDirect(int capacity) – Max 2 GB per buffer ● Garbage collection comes too late – Triggered by heap collection – Release manually – ((sun.nio.ch.DirectBuffer) myBuffer).cleaner().clean(); ● VisualVM plugin for direct buffers
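
The 2 GB limit on slide 25 follows from `ByteBuffer.allocateDirect(int capacity)` taking an `int` number of bytes. A size computed in floats can silently overflow before it reaches that call, so it helps to do the arithmetic in `long` and fail loudly. A minimal sketch, with a hypothetical helper name:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class DirectBuffers {
    /** Allocates a direct FloatBuffer, failing with ArithmeticException above the int byte limit. */
    static FloatBuffer allocateDirectFloats(long floatCount) {
        long bytes = Math.multiplyExact(floatCount, (long) Float.BYTES);
        int capacity = Math.toIntExact(bytes); // throws if the size does not fit in an int
        return ByteBuffer.allocateDirect(capacity)
                .order(ByteOrder.nativeOrder()) // native byte order, as device transfers expect
                .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer small = allocateDirectFloats(1024);
        System.out.println(small.capacity() + " floats, direct=" + small.isDirect());
        try {
            allocateDirectFloats(1L << 30); // 4 GiB of floats, too large for one buffer
        } catch (ArithmeticException e) {
            System.out.println("too large for a single direct buffer");
        }
    }
}
```

The total amount of direct memory is still bounded by `-XX:MaxDirectMemorySize`, so a request under 2 GB can fail with an `OutOfMemoryError` as well.
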
  • 26. GPU vs CPU ● GPUs check less than CPUs – Division by zero – Out-of-bounds checks – Test on the CPU first
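
Because a floating-point division by zero does not stop a kernel, as slide 26 warns, the symptom only shows up as infinities or NaNs in the output buffer. One cheap safeguard is to scan the results after reading them back. The sketch below uses a made-up helper name and a plain array in place of the device buffer; it cannot detect out-of-bounds reads or writes.

```java
final class ResultSanity {
    /** Counts results that are NaN or infinite, the usual trace of a division by zero. */
    static int countNonFinite(float[] results) {
        int count = 0;
        for (float value : results) {
            if (!Float.isFinite(value)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        float zero = 0f;
        float[] results = {1f, 1f / zero, zero / zero}; // 1.0, Infinity, NaN
        System.out.println(countNonFinite(results)); // prints 2
    }
}
```
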
  • 27. Portability ● OpenCL is portable, its performance is not – Memory sizes differ – Memory latencies differ – Work group sizes differ – Compute devices differ – OpenCL implementations differ ● So develop for the production hardware
  • 28. Finally ● Float vs double – Double the precision – Half the performance – Double support is optional
  • 30. Conclusion ● When to use it? – When the performance is really needed – When the problem has high concurrency – When the problem is partitionable
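
Whether the extra precision of double on slide 28 is needed depends on the calculation. Accumulating many values is a typical case where float runs out of digits: a float has a 24-bit significand, so once a running sum reaches 2^24 = 16777216, adding 1 no longer changes it. The sketch below shows this in plain Java; float and double arithmetic in OpenCL follow the same IEEE 754 formats, but the example itself is illustrative and not from the talk.

```java
final class PrecisionDemo {
    /** Adds 1.0f the given number of times using float arithmetic. */
    static float sumOnesAsFloat(int count) {
        float sum = 0f;
        for (int i = 0; i < count; i++) {
            sum += 1f;
        }
        return sum;
    }

    /** Adds 1.0 the given number of times using double arithmetic. */
    static double sumOnesAsDouble(int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += 1.0;
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 20_000_000;
        System.out.println(sumOnesAsFloat(count));  // prints 1.6777216E7, the sum got stuck at 2^24
        System.out.println(sumOnesAsDouble(count)); // prints 2.0E7
    }
}
```
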
  • 31. Questions?
      Setting up OpenCL test on Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
      Warming up OpenCL test
      [thread 32003 also had an error][thread 33027 also had an error]
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      # SIGSEGV[thread 32515 also had an error] (0xb)[thread 32771 also had an error] [thread 32259 also had an error] at pc=0x00000001250ded70, pid=99851, tid=29475
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode bsd-amd64 compressed oops)
      # Problematic frame:
      # [thread 17415 also had an error] C [cl_kernels+0x1d70] sort_wrapper+0x1b0
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # An error report file with more information is saved as:
      # /Users/arjanl/Documents/opencl/workspace/opencl-test/jogamp/hs_err_pid99851.log
      [thread 31763 also had an error]
      #
      # If you would like to submit a bug report, please visit:
      # http://bugreport.sun.com/bugreport/crash.jsp
      #

Editor's Notes

  1. We are Arjan & Maarten. Arjan: software architect, interested in scalability and performance. Maarten: senior developer, interested in performance, concurrency and 3D.
  2. Working at the Ministry of Economic Affairs on the Aerius project.
  3. PAS: Programmatische Aanpak Stikstof (Programmatic Approach to Nitrogen). Balancing the environment and economic development. Calculation tool: monitoring targets and supporting permit applications.
  4. Calculates concentrations/depositions. Export for a permit application. Compare multiple situations. OpenCL application: road traffic. Speed matters because users are waiting for the result.
  5. Import a set of sources. Calculate per source/calculation-point pair. Sum the results per calculation point. Emission of the road, distance to the road, wind speed, wind direction, ozone concentration, location.
  6. Getting creative with text files. Load the OpenCL file and pre-process it. Add Java constants by means of #define. Locale: 1.0 vs 1,0. Configurable options. Time for testing!
  7. Add test kernels, in test mode only. JUnit test function: buffers with test values, buffers with expected results. Testing → 'challenges' with direct memory.
  8. Not enough memory → direct memory size. Max 2 GB per buffer. First run fine, second run fails? → Garbage collection is triggered by heap space. Buffer release → free the memory manually. Sun classes → JVM-specific. Handy tool: plugin for VisualVM.
  9. Division by zero → no problem, but the results are worthless. Reading/writing outside allocated memory? CPU → crash. GPU → no problem (the values change per test run). Test on the CPU first! (But that is still no guarantee.) Even more device differences...
  10. "OpenCL is portable, its performance is not." OpenCL is not always portable either: "Write once, debug anywhere"? Develop for the production hardware/drivers.
  11. Performance or precision? Is double really needed? Double support is optional, but high-end devices usually have it.
  12. Only when the performance is needed AND the problem shows high concurrency. Being partitionable usually helps.
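
The calculation outlined in note 5 (one contribution per source and calculation point, summed per calculation point) has the shape that maps well onto a GPU: every calculation point can be handled by its own work item, independently of the others. The Java sketch below shows only that structure. The `contribution` formula is a placeholder invented for this example and has nothing to do with the actual dispersion model; all names are hypothetical.

```java
final class DepositionSketch {
    /** Placeholder contribution of one source to one point; NOT the real model. */
    static double contribution(double emission, double distance) {
        return emission / (1.0 + distance * distance);
    }

    /**
     * Sums the contributions of all sources for each calculation point.
     * distances[p][s] is the distance from point p to source s.
     */
    static double[] totalPerPoint(double[] emissions, double[][] distances) {
        double[] totals = new double[distances.length];
        // On a GPU, each iteration of this outer loop would be one work item.
        for (int p = 0; p < distances.length; p++) {
            double sum = 0.0;
            for (int s = 0; s < emissions.length; s++) {
                sum += contribution(emissions[s], distances[p][s]);
            }
            totals[p] = sum;
        }
        return totals;
    }

    public static void main(String[] args) {
        double[] emissions = {2.0, 4.0};
        double[][] distances = {{0.0, 1.0}, {1.0, 0.0}};
        double[] totals = totalPerPoint(emissions, distances);
        System.out.println(totals[0] + " " + totals[1]); // prints 4.0 5.0
    }
}
```

Summing per calculation point inside one work item avoids concurrent writes to the same output element, which is what makes the problem partitionable in the sense of the conclusion slide.
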