GPU Programming

Roberto Bonvallet
Departamento de Informática
Universidad Técnica Federico Santa María

June 2010
CPU vs GPU peak performance
CPU and GPU architectures

    [Figure: block diagrams of a CPU (Control, ALUs, Cache, DRAM) and a GPU (DRAM)]
Nvidia Tesla architecture
Task and data parallelism

    Task parallelism:
        distributed processing
        distributed memory
        message passing
    Data parallelism:
        same instruction on different data
        shared memory
Thread and memory hierarchies

    Thread hierarchy:
        grid of blocks
        blocks of threads
    Memory hierarchy:
        global memory (large, slow)
        shared memory (per-block, small, fast)
        registers (per-thread, small, fast)
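To make the two hierarchies concrete, here is a minimal sketch (not taken from the slides) of a per-block sum. The names `block_sum`, `gpu_sum` and `BLOCK_SIZE` are hypothetical, and the block size of 256 is an assumption. Each thread finds its place in the grid, reads one value from global memory into a register, and the block combines its values through shared memory.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define BLOCK_SIZE 256  /* threads per block (assumed; must be a power of two) */

/* Each block sums BLOCK_SIZE consecutive elements of `in` and writes
   one partial sum to out[blockIdx.x]. Launch with blockDim.x == BLOCK_SIZE. */
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[BLOCK_SIZE];              /* shared memory: one copy per block */

    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* position of this thread in the grid */
    float v = (i < n) ? in[i] : 0.0f;               /* register: private to this thread */
    tile[threadIdx.x] = v;
    __syncthreads();                                /* wait until the whole tile is loaded */

    /* tree reduction inside the block */
    for (int s = blockDim.x / 2; s > 0; s /= 2) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];                  /* result goes back to global memory */
}

/* Hypothetical host helper: sums n floats using block_sum,
   then adds the per-block partial sums on the CPU. */
float gpu_sum(const float *h_in, int n) {
    if (n <= 0) return 0.0f;
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float *d_in, *d_out;
    cudaMalloc((void **) &d_in, n * sizeof(float));
    cudaMalloc((void **) &d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);

    float *partial = (float *) malloc(blocks * sizeof(float));
    cudaMemcpy(partial, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += partial[b];
    free(partial);
    cudaFree(d_in);
    cudaFree(d_out);
    return total;
}
```

Calling `gpu_sum(values, n)` from host code launches one block per 256 elements. Error checking is omitted to keep the sketch short.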
Matrix-matrix multiplication

    By elements:  $c_{ij} = \sum_k a_{ik} b_{kj}$

    By blocks:    $C_{ij} = \sum_k A_{ik} B_{kj}$

    Multiplication kernel:
        initialize element of $C_{ij}$ = 0
        for each k:
            fetch element of $A_{ik}$, $B_{kj}$ into shared memory
            synchronize
            compute element of $C_{ij} = C_{ij} + A_{ik} B_{kj}$
            synchronize
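The kernel outline above can be sketched in CUDA as follows. This is an illustration under stated assumptions, not the author's code: the matrices are square, stored row-major, and n is a multiple of the tile width. The names `matmul_tiled`, `matmul` and `TILE` are hypothetical.

```cuda
#include <cuda_runtime.h>

#define TILE 16  /* tile width (assumed); one block computes one TILE x TILE block C_ij */

/* C = A * B for n x n row-major matrices, n a multiple of TILE. */
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;                        /* initialize this thread's element of C_ij */

    for (int k = 0; k < n / TILE; ++k) {     /* for each k */
        /* fetch one element of A_ik and one of B_kj into shared memory */
        As[threadIdx.y][threadIdx.x] = A[row * n + k * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(k * TILE + threadIdx.y) * n + col];
        __syncthreads();                     /* both tiles are fully loaded */

        /* C_ij = C_ij + A_ik B_kj, restricted to this thread's element */
        for (int t = 0; t < TILE; ++t)
            acc += As[threadIdx.y][t] * Bs[t][threadIdx.x];
        __syncthreads();                     /* all reads done before tiles are overwritten */
    }
    C[row * n + col] = acc;
}

/* Hypothetical host wrapper: copies inputs, launches the kernel, copies C back. */
void matmul(const float *hA, const float *hB, float *hC, int n) {
    size_t bytes = (size_t) n * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc((void **) &dA, bytes);
    cudaMalloc((void **) &dB, bytes);
    cudaMalloc((void **) &dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid(n / TILE, n / TILE);
    matmul_tiled<<<grid, block>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

Each element of A and B is read from global memory once per tile instead of once per multiply, which is the point of staging the blocks in shared memory. The second synchronization keeps a fast thread from overwriting a tile that slower threads are still reading.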
Nvidia C1060

    Core clock                          602 MHz
    Multiprocessors                     30
    Thread processors                   240 = 30 × 8
    Memory size                         4 GB
    Memory bandwidth                    102.4 GB/s
    Single-precision peak performance   933.12 Gflop/s
    Double-precision peak performance   77.76 Gflop/s
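The two peak figures can be reproduced from the table under a few assumptions that are not on the slide: a thread-processor (shader) clock of 1.296 GHz, which is a different clock domain from the 602 MHz core clock; three single-precision flops per thread processor per cycle (a multiply-add plus a multiply); and one double-precision multiply-add unit per multiprocessor.

```latex
\begin{align*}
P_{\text{single}} &= 240 \times 1.296\ \text{GHz} \times 3 = 933.12\ \text{Gflop/s} \\
P_{\text{double}} &= 30 \times 1.296\ \text{GHz} \times 2 = 77.76\ \text{Gflop/s} \\
\frac{P_{\text{single}}}{\text{memory bandwidth}} &= \frac{933.12}{102.4} \approx 9.1\ \text{flop per byte}
\end{align*}
```

The last ratio is a rough guide only: a kernel that performs fewer than about nine single-precision operations per byte fetched from global memory is likely to be limited by bandwidth rather than arithmetic.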
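CUDA programming

    Array allocation and copying
    // allocate mem_size bytes on the device
    cudaMalloc((void **) &dev_p, mem_size);

    // host -> device: the destination is the first argument
    cudaMemcpy(dev_p, host_p, mem_size,
               cudaMemcpyHostToDevice);

    [...]

    // device -> host
    cudaMemcpy(host_p, dev_p, mem_size,
               cudaMemcpyDeviceToHost);

    cudaFree(dev_p);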
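A slightly fuller sketch of the same allocate, copy, free sequence, with return codes checked. `cudaMemcpy` takes the destination pointer first and the source second, so the host-to-device copy writes into the device pointer. The helper name `roundtrip` and its signature are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

/* Hypothetical helper: copies n floats from host_in to the device and
   back into host_out. Returns 0 on success, -1 on any CUDA error. */
int roundtrip(const float *host_in, float *host_out, size_t n) {
    size_t mem_size = n * sizeof(float);
    float *dev_p = NULL;

    cudaError_t err = cudaMalloc((void **) &dev_p, mem_size);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return -1;
    }

    /* host -> device, then device -> host; stop at the first failure */
    err = cudaMemcpy(dev_p, host_in, mem_size, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, dev_p, mem_size, cudaMemcpyDeviceToHost);

    cudaFree(dev_p);  /* release device memory on both paths */

    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}
```

In a real program the kernel launches would sit between the two copies, where the slide shows `[...]`.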
CUDA programming

    Kernel definition
    __global__ void
    vector_sum(float *a, float *b, float *c) {
        // global index of this thread within the grid;
        // assumes the grid covers the arrays exactly
        int i = blockIdx.x * blockDim.x +
                   threadIdx.x;
        c[i] = a[i] + b[i];
    }

    Kernel launch
    vector_sum<<<grid_size, block_size,
                 sh_mem_size>>>(a, b, c);
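Putting definition and launch together, a minimal end-to-end sketch might look like the following. It differs from the slide in one respect, which is an addition of this example: the kernel takes the length n and guards the store, so the grid may be rounded up when n is not a multiple of the block size. The names `vector_sum_n` and `vector_sum_host` and the block size of 256 are assumptions.

```cuda
#include <cuda_runtime.h>

/* Variant of the slide's kernel with a bounds check (added for this sketch). */
__global__ void vector_sum_n(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               /* threads past the end of the arrays do nothing */
        c[i] = a[i] + b[i];
}

/* Hypothetical host wrapper: c = a + b for n floats held in host memory. */
void vector_sum_host(const float *a, const float *b, float *c, int n) {
    size_t mem_size = n * sizeof(float);
    float *dev_a, *dev_b, *dev_c;
    cudaMalloc((void **) &dev_a, mem_size);
    cudaMalloc((void **) &dev_b, mem_size);
    cudaMalloc((void **) &dev_c, mem_size);
    cudaMemcpy(dev_a, a, mem_size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, mem_size, cudaMemcpyHostToDevice);

    int block_size = 256;                                /* threads per block (assumed) */
    int grid_size = (n + block_size - 1) / block_size;   /* round up to cover all n */
    vector_sum_n<<<grid_size, block_size>>>(dev_a, dev_b, dev_c, n);

    /* this copy runs after the kernel has finished on the default stream */
    cudaMemcpy(c, dev_c, mem_size, cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
}
```

The third launch parameter from the slide, the dynamic shared memory size, is left out here because this kernel uses no shared memory; it defaults to zero.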
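Vortex Methods

    Fluid discretized as vortices (x, y, α)

    Vortex interaction, with $\mathbf{x} = (x, y)$:

        $K(x, y) = -\frac{1}{2\pi \lVert \mathbf{x} \rVert^{2}} (-y, x)$

    Biot-Savart law:

        $u(\mathbf{x}) = \sum_p \alpha_p K(\mathbf{x} - \mathbf{x}_p)$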
GPU velocity evaluation
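
The slide's figure is not recoverable, so what follows is only a sketch of how the Biot-Savart sum could be evaluated directly on the GPU, with one thread per evaluation point. It assumes the kernel has a squared norm in its denominator, that velocities are evaluated at the vortex positions themselves, and that the singular self-interaction is simply skipped; a production vortex method would more likely use a smoothed kernel. The names `velocity` and `velocity_host` are hypothetical.

```cuda
#include <cuda_runtime.h>

#define TWO_PI 6.283185307179586f

/* Thread i computes (u[i], v[i]) = sum_p alpha_p K(x_i - x_p),
   with K(x, y) = -1 / (2 pi (x^2 + y^2)) * (-y, x). */
__global__ void velocity(const float *x, const float *y, const float *alpha,
                         float *u, float *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float xi = x[i], yi = y[i];
    float ui = 0.0f, vi = 0.0f;
    for (int p = 0; p < n; ++p) {
        float dx = xi - x[p];
        float dy = yi - y[p];
        float r2 = dx * dx + dy * dy;
        if (r2 > 0.0f) {                 /* assumption: skip the singular term at r = 0 */
            float s = -alpha[p] / (TWO_PI * r2);
            ui += s * (-dy);
            vi += s * dx;
        }
    }
    u[i] = ui;
    v[i] = vi;
}

/* Hypothetical host wrapper for n vortices held in host arrays. */
void velocity_host(const float *x, const float *y, const float *alpha,
                   float *u, float *v, int n) {
    size_t bytes = n * sizeof(float);
    float *dx, *dy, *da, *du, *dv;
    cudaMalloc((void **) &dx, bytes);
    cudaMalloc((void **) &dy, bytes);
    cudaMalloc((void **) &da, bytes);
    cudaMalloc((void **) &du, bytes);
    cudaMalloc((void **) &dv, bytes);
    cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(da, alpha, bytes, cudaMemcpyHostToDevice);

    int block_size = 128;                               /* assumed */
    int grid_size = (n + block_size - 1) / block_size;
    velocity<<<grid_size, block_size>>>(dx, dy, da, du, dv, n);

    cudaMemcpy(u, du, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(v, dv, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx); cudaFree(dy); cudaFree(da); cudaFree(du); cudaFree(dv);
}
```

Every thread reads all n vortices from global memory, so the cost is quadratic in n. Staging the vortices in shared memory tile by tile, as in the matrix multiplication kernel, is the usual next step.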
