GPU Computing:
The Democratization of Parallel Computing
               David Luebke
              NVIDIA Research
Tutorial Speakers


         David Luebke       NVIDIA Research

         Kevin Skadron      University of Virginia

         Michael Garland    NVIDIA Research

         John Owens         University of California Davis




© NVIDIA Corporation 2007
Tutorial Schedule

1:30 – 1:55                 Introduction & Motivation                        Luebke

1:55 – 2:15                 Manycore architectural trends                    Skadron

2:15 – 3:15                 CUDA model & programming                         Garland

3:15 – 3:30                 Break

3:30 – 4:00                 GPU architecture & implications                  Luebke

4:00 – 5:00                 Advanced data-parallel programming               Owens

5:00 – 5:30                 Architectural lessons & research opportunities   Skadron


Parallel Computing’s Golden Age


         1980s and early ’90s: a golden age for parallel computing
                   Particularly data-parallel computing


         Architectures
                   Connection Machine, MasPar, Cray
                   True supercomputers: incredibly exotic, powerful, expensive


         Algorithms, languages, & programming models
                   Solved a wide variety of problems
                   Various parallel algorithmic models developed
                   P-RAM, V-RAM, circuit, hypercube, etc.


Parallel Computing’s Dark Age


  But…impact of data-parallel computing limited
     Thinking Machines sold 7 CM-1s (100s of systems total)
     MasPar sold ~200 systems


  Commercial and research activity subsided
     Massively-parallel machines replaced by clusters
     of ever-more powerful commodity microprocessors
     Beowulf, Legion, grid computing, …


  Massively parallel computing lost momentum to
  the inexorable advance of commodity technology
Enter the GPU


         GPU = Graphics Processing Unit
                   Chip in computer video cards, PlayStation 3, Xbox, etc.
                   Two major vendors: NVIDIA and ATI (now AMD)






         GPUs are massively multithreaded manycore chips
                   NVIDIA Tesla products have up to 128 scalar processors
                   Over 12,000 concurrent threads in flight
                   Over 470 GFLOPS sustained performance


         Users across science & engineering disciplines are
         achieving 100x or better speedups on GPUs

         CS researchers can use GPUs as a research platform
         for manycore computing: architecture, programming languages, numerics, …


Enter CUDA


         CUDA is a scalable parallel programming model and a
         software environment for parallel computing
                   Minimal extensions to familiar C/C++ environment
                   Heterogeneous serial-parallel programming model


         NVIDIA’s TESLA GPU architecture accelerates CUDA
                   Exposes the computational horsepower of NVIDIA GPUs
                   Enables general-purpose GPU computing


         CUDA also maps well to multicore CPUs!


The Democratization of Parallel Computing
         GPU Computing with CUDA brings data-parallel
         computing to the masses
                   Over 46,000,000 CUDA-capable GPUs sold
                   A “developer kit” costs ~$200 (for 500 GFLOPS)


         Data-parallel supercomputers are everywhere!
                   CUDA makes this power accessible
                   We’re already seeing innovations in data-parallel
                   computing


         Massively parallel computing has become a
         commodity technology!
GPU Computing:
  Motivation
[Figure: reported application speedups on GPUs: 17X, 45X, 100X, 13–457X, 110–240X, 35X]
GPUs Are Fast


         Theoretical peak performance: 518 GFLOPS

         Sustained microbenchmark performance:
                   Raw math: 472 GFLOPS (8800 Ultra)
                   Raw bandwidth: 80 GB per second (Tesla C870)


         Actual application performance:
                   Molecular dynamics: 290 GFLOPS
                   (VMD ion placement)




GPUs Are Getting Faster, Faster




Manycore GPU – Block Diagram
              G80 (launched Nov 2006 – GeForce 8800 GTX)
              128 Thread Processors execute kernel threads
              Up to 12,288 parallel threads active
              Per-block shared memory (PBSM) accelerates processing
              [Block diagram] Host → Input Assembler → Thread Execution Manager
              → 8 groups of Thread Processors, each backed by PBSM (16 PBSM blocks in all)
              → Load/store → Global Memory
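
The numbers on this slide can be cross-checked against the diagram. Reading its 16 PBSM blocks as 16 multiprocessors (an inference from the diagram, not something the slide states), 128 / 16 gives 8 thread processors per multiprocessor and 12,288 / 16 gives 768 active threads per multiprocessor. A minimal sketch, assuming a CUDA-capable device at index 0, that asks the CUDA runtime for the same quantities on whatever GPU is installed:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the first device; index 0 is an assumption for this sketch.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }

    // Upper bound on threads active at once across the whole chip.
    int activeThreads = prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor;

    printf("device:                     %s\n", prop.name);
    printf("multiprocessors:            %d\n", prop.multiProcessorCount);
    printf("threads per multiprocessor: %d\n", prop.maxThreadsPerMultiProcessor);
    printf("max active threads:         %d\n", activeThreads);
    printf("shared memory per block:    %zu bytes\n", prop.sharedMemPerBlock);
    printf("global memory:              %zu bytes\n", prop.totalGlobalMem);
    return 0;
}
```

The reported values differ from one GPU generation to the next, so the output will only match the slide's figures on G80-class hardware.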
CUDA Programming Model
Heterogeneous Programming


   CUDA = serial program with parallel kernels, all in C
       Serial C code executes in a CPU thread
       Parallel kernel C code executes in thread blocks
       across multiple processing elements


          Serial Code
          Parallel Kernel:   KernelA<<< nBlk, nTid >>>(args);   ...
          Serial Code
          Parallel Kernel:   KernelB<<< nBlk, nTid >>>(args);   ...
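
The serial/parallel alternation above can be sketched as one small program. This is an illustrative example, not code from the tutorial: the kernel name `saxpy`, the buffer names, and the problem size are all hypothetical, while `nBlk` and `nTid` follow the slide's launch notation. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread updates one element, y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last, partly filled block
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Serial code: runs in a CPU thread and prepares the input.
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // The device has its own memory, so the inputs are copied over explicitly.
    float *dx, *dy;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Parallel kernel: nBlk blocks of nTid threads each, enough to cover n elements.
    const int nTid = 128;
    const int nBlk = (n + nTid - 1) / nTid;
    saxpy<<<nBlk, nTid>>>(n, 3.0f, dx, dy);

    // Serial code again: this copy waits for the kernel to finish first.
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hy[i] != 5.0f) ++mismatches;            // 3 * 1 + 2 is exact in float
    printf("%d mismatches\n", mismatches);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return mismatches != 0;
}
```

Rounding the block count up and guarding with `i < n` lets the same launch work for sizes that are not a multiple of the block size.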
GPU Computing with CUDA:
A Highly Multithreaded Coprocessor

         The GPU is a highly parallel compute device
                   serves as a coprocessor for the host CPU
                   has its own device memory on the card
                   executes many threads in parallel


         Parallel kernels run a single program in many threads

         GPU threads are extremely lightweight
                   Thread creation and context switching are essentially free


         GPU expects 1000s of threads for full utilization
CUDA: Programming GPU in C

Philosophy: provide the minimal set of extensions necessary to expose the GPU’s power


Declaration specifiers to indicate where things live
    __global__ void KernelFunc(...);        // kernel function, runs on device
    __device__ int GlobalVar;               // variable in device memory
    __shared__ int SharedVar;               // variable in per-block shared memory


Extend function invocation syntax for parallel kernel launch
    KernelFunc<<<500, 128>>>(...);          // launch 500 blocks w/ 128 threads each


Special variables for thread identification in kernels
    dim3 threadIdx;     dim3 blockIdx;     dim3 blockDim;       dim3 gridDim;


Intrinsics that expose specific operations in kernel code
    __syncthreads();                        // barrier synchronization among the threads of a block
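
The four kinds of extension listed above can be seen working together in a per-block sum. This is a minimal sketch rather than code from the tutorial: the kernel name `blockSum`, the `THREADS` constant, and the all-ones test data are hypothetical, and the launch shape of 500 blocks of 128 threads is borrowed from the slide's example. The kernel assumes it is launched with exactly `THREADS` threads per block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 128   // threads per block; the tree reduction needs a power of two

// Hypothetical kernel: each block sums THREADS consecutive inputs into one output.
__global__ void blockSum(const int *in, int *out, int n)
{
    __shared__ int partial[THREADS];                // one copy per thread block

    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    partial[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();                                // every load is now visible to the block

    // Tree reduction in shared memory: half as many threads stay active each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();                            // reached by all threads of the block
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = partial[0];               // one result per block
}

int main()
{
    const int nBlk = 500;
    const int n = nBlk * THREADS;
    int *hIn = new int[n];
    int *hOut = new int[nBlk];
    for (int i = 0; i < n; ++i) hIn[i] = 1;

    int *dIn, *dOut;
    cudaMalloc((void **)&dIn, n * sizeof(int));
    cudaMalloc((void **)&dOut, nBlk * sizeof(int));
    cudaMemcpy(dIn, hIn, n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<nBlk, THREADS>>>(dIn, dOut, n);      // launch 500 blocks w/ 128 threads each
    cudaMemcpy(hOut, dOut, nBlk * sizeof(int), cudaMemcpyDeviceToHost);

    long total = 0;
    for (int b = 0; b < nBlk; ++b) total += hOut[b];
    printf("total = %ld, expected = %d\n", total, n);

    cudaFree(dIn); cudaFree(dOut);
    delete[] hIn; delete[] hOut;
    return total != n;
}
```

The barrier sits outside the `if` so that every thread in the block reaches it; threads in different blocks never wait on each other, which is why the final sum over blocks is done on the host here.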
Decoder Ring
            GeForce®    Entertainment
            Quadro®     Design & Creation
            Tesla™      High Performance Computing

            Architecture: TESLA
            Chips: G80, G84, G92, …
A New Platform: Tesla


         HPC-oriented product line
                   C870: board            (1 GPU)
                   D870: deskside unit    (2 GPUs)
                   S870: 1U server unit   (4 GPUs)




Conclusion


         GPUs are massively parallel manycore computers
                   Ubiquitous - most successful parallel processor in history
                   Useful - users achieve huge speedups on real problems


         CUDA is a powerful parallel programming model
                   Heterogeneous - mixed serial-parallel programming
                   Scalable - hierarchical thread execution model
                   Accessible - minimal but expressive changes to C


         They provide tremendous scope for innovative,
         impactful research

Questions?

   David Luebke
dluebke@nvidia.com

Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Último (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Introduction to GPU Programming

  • 1. GPU Computing: The Democratization of Parallel Computing David Luebke NVIDIA Research
  • 2. Tutorial Speakers David Luebke NVIDIA Research Kevin Skadron University of Virginia Michael Garland NVIDIA Research John Owens University of California Davis © NVIDIA Corporation 2007
  • 3. Tutorial Schedule
      1:30 – 1:55   Introduction & Motivation                        Luebke
      1:55 – 2:15   Manycore architectural trends                    Skadron
      2:15 – 3:15   CUDA model & programming                         Garland
      3:15 – 3:30   Break
      3:30 – 4:00   GPU architecture & implications                  Luebke
      4:00 – 5:00   Advanced data-parallel programming               Owens
      5:00 – 5:30   Architectural lessons & research opportunities   Skadron
  • 4. Parallel Computing’s Golden Age
      1980s, early ’90s: a golden age for parallel computing
        Particularly data-parallel computing
      Architectures
        Connection Machine, MasPar, Cray
        True supercomputers: incredibly exotic, powerful, expensive
      Algorithms, languages, & programming models
        Solved a wide variety of problems
        Various parallel algorithmic models developed
        P-RAM, V-RAM, circuit, hypercube, etc.
  • 5. Parallel Computing’s Dark Age
      But… impact of data-parallel computing limited
        Thinking Machines sold 7 CM-1s (100s of systems total)
        MasPar sold ~200 systems
      Commercial and research activity subsided
        Massively-parallel machines replaced by clusters of ever-more powerful commodity microprocessors
        Beowulf, Legion, grid computing, …
      Massively parallel computing lost momentum to the inexorable advance of commodity technology
  • 6. Enter the GPU
      GPU = Graphics Processing Unit
        Chip in computer video cards, PlayStation 3, Xbox, etc.
        Two major vendors: NVIDIA and ATI (now AMD)
  • 7. Enter the GPU
      GPUs are massively multithreaded manycore chips
        NVIDIA Tesla products have up to 128 scalar processors
        Over 12,000 concurrent threads in flight
        Over 470 GFLOPS sustained performance
      Users across science & engineering disciplines are achieving 100x or better speedups on GPUs
      CS researchers can use GPUs as a research platform for manycore computing: arch, PL, numeric, …
  • 8. Enter CUDA
      CUDA is a scalable parallel programming model and a software environment for parallel computing
        Minimal extensions to familiar C/C++ environment
        Heterogeneous serial-parallel programming model
      NVIDIA’s TESLA GPU architecture accelerates CUDA
        Expose the computational horsepower of NVIDIA GPUs
        Enable general-purpose GPU computing
      CUDA also maps well to multicore CPUs!
  • 9. The Democratization of Parallel Computing
      GPU Computing with CUDA brings data-parallel computing to the masses
        Over 46,000,000 CUDA-capable GPUs sold
        A “developer kit” costs ~$200 (for 500 GFLOPS)
      Data-parallel supercomputers are everywhere!
        CUDA makes this power accessible
        We’re already seeing innovations in data-parallel computing
      Massively parallel computing has become a commodity technology!
  • 10. GPU Computing: Motivation
  • 11. GPU Computing: Motivation (figure: application speedups of 17X, 35X, 45X, 100X, 110-240X, and 13–457x)
  • 12. GPUs Are Fast
      Theoretical peak performance: 518 GFLOPS
      Sustained μbenchmark performance:
        Raw math: 472 GFLOPS (8800 Ultra)
        Raw bandwidth: 80 GB per second (Tesla C870)
      Actual application performance:
        Molecular dynamics: 290 GFLOPS (VMD ion placement)
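The 518 GFLOPS peak on slide 12 can be reproduced with simple arithmetic. The sketch below assumes a 1.35 GHz processor clock and three floating-point operations per processor per cycle (a multiply-add plus a multiply); neither figure appears on the slide, so treat both as background assumptions rather than the speaker's own derivation.

```latex
\[
\underbrace{128}_{\text{thread processors}}
\times \underbrace{1.35\ \text{GHz}}_{\text{assumed clock}}
\times \underbrace{3\ \tfrac{\text{FLOP}}{\text{cycle}}}_{\text{assumed MAD + MUL}}
= 518.4\ \text{GFLOPS}
\]
```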
  • 13. GPUs Are Getting Faster, Faster
  • 14. Manycore GPU – Block Diagram
      G80 (launched Nov 2006 – GeForce 8800 GTX)
      128 Thread Processors execute kernel threads
      Up to 12,288 parallel threads active
      Per-block shared memory (PBSM) accelerates processing
      Block diagram (figure): Host, Input Assembler, Thread Execution Manager, eight groups of Thread Processors, sixteen PBSM blocks, Load/store, Global Memory
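The thread count on slide 14 follows from the processor count. Dividing the two numbers on the slide gives the threads kept in flight per processor. The second line adds a background assumption that is not on the slide: that G80 groups its processors into 16 multiprocessors, each holding up to 768 resident threads.

```latex
\[
\frac{12{,}288\ \text{threads}}{128\ \text{thread processors}} = 96\ \text{threads per processor}
\]
\[
16\ \text{multiprocessors (assumed)} \times 768\ \tfrac{\text{threads}}{\text{multiprocessor}}\ \text{(assumed)} = 12{,}288\ \text{threads}
\]
```

Keeping many more threads resident than there are processors is what lets the hardware hide memory latency by switching to another thread.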
  • 16. Heterogeneous Programming
      CUDA = serial program with parallel kernels, all in C
        Serial C code executes in a CPU thread
        Parallel kernel C code executes in thread blocks across multiple processing elements
      Execution sequence (figure):
        Serial Code
        Parallel Kernel   KernelA<<< nBlk, nTid >>>(args);   ...
        Serial Code
        Parallel Kernel   KernelB<<< nBlk, nTid >>>(args);   ...
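To make the serial-plus-kernel pattern of slide 16 concrete, here is a minimal sketch of a host function that launches one parallel kernel. The names `vecAdd` and `vecAddOnGpu` are hypothetical and chosen for illustration; `nBlk` and `nTid` mirror the slide's launch syntax. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Parallel kernel (hypothetical example): each thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last, partly filled block
        c[i] = a[i] + b[i];
}

// Serial host code: runs in a CPU thread, copies data to the device,
// launches the kernel, and copies the result back.
std::vector<float> vecAddOnGpu(const std::vector<float>& a,
                               const std::vector<float>& b)
{
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    std::vector<float> c(n, 0.0f);
    if (n == 0) return c;                           // a grid of zero blocks is not a valid launch

    size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, bytes);                 // device memory lives on the card
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int nTid = 128;                                 // threads per block
    int nBlk = (n + nTid - 1) / nTid;               // enough blocks to cover n elements
    vecAdd<<<nBlk, nTid>>>(dA, dB, dC, n);          // parallel kernel launch

    // This blocking copy waits for the kernel to finish before reading dC.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Calling `vecAddOnGpu(a, b)` from ordinary C++ code returns the element-wise sum; the caller never sees the device pointers, which keeps the serial and parallel parts cleanly separated.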
  • 17. GPU Computing with CUDA: A Highly Multithreaded Coprocessor
      The GPU is a highly parallel compute device that
        serves as a coprocessor for the host CPU
        has its own device memory on the card
        executes many threads in parallel
      Parallel kernels run a single program in many threads
        GPU threads are extremely lightweight
        Thread creation and context switching are essentially free
        GPU expects 1000’s of threads for full utilization
  • 18. CUDA: Programming GPU in C
      Philosophy: provide minimal set of extensions necessary to expose power
      Declaration specifiers to indicate where things live
        __global__ void KernelFunc(...);   // kernel function, runs on device
        __device__ int GlobalVar;          // variable in device memory
        __shared__ int SharedVar;          // variable in per-block shared memory
      Extend function invocation syntax for parallel kernel launch
        KernelFunc<<<500, 128>>>(...);     // launch 500 blocks w/ 128 threads each
      Special variables for thread identification in kernels
        dim3 threadIdx; dim3 blockIdx; dim3 blockDim; dim3 gridDim;
      Intrinsics that expose specific operations in kernel code
        __syncthreads();                   // barrier synchronization within kernel
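The extensions listed on slide 18 can be seen working together in a small block-level sum. This is a sketch under stated assumptions, not code from the tutorial: `blockSum`, `sumOnGpu`, and `THREADS_PER_BLOCK` are hypothetical names, the block size is assumed to be a power of two, and CUDA error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

#define THREADS_PER_BLOCK 128   // assumed power of two, matching the slide's 128 threads

// Each block sums its slice of the input in per-block shared memory
// and writes one partial result.
__global__ void blockSum(const int* in, int* partial, int n)
{
    __shared__ int scratch[THREADS_PER_BLOCK];      // visible to all threads of this block

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;
    scratch[tid] = (i < n) ? in[i] : 0;             // pad the last block with zeros
    __syncthreads();                                // all loads done before anyone reads

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            scratch[tid] += scratch[tid + stride];
        __syncthreads();                            // reached by every thread in the block
    }

    if (tid == 0)
        partial[blockIdx.x] = scratch[0];
}

// Host wrapper: launches the kernel and adds the per-block partial sums on the CPU.
long long sumOnGpu(const std::vector<int>& data)
{
    int n = static_cast<int>(data.size());
    if (n == 0) return 0;

    int nBlk = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    int *dIn = nullptr, *dPartial = nullptr;
    cudaMalloc((void**)&dIn, n * sizeof(int));
    cudaMalloc((void**)&dPartial, nBlk * sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<nBlk, THREADS_PER_BLOCK>>>(dIn, dPartial, n);

    std::vector<int> partial(nBlk);
    cudaMemcpy(partial.data(), dPartial, nBlk * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    long long total = 0;
    for (int b = 0; b < nBlk; ++b)
        total += partial[b];
    return total;
}
```

The final pass over the partial sums is done on the host to keep the example short; a second kernel launch over `partial` would be the usual choice for large inputs.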
  • 19. Decoder Ring
      GeForce®: Entertainment
      Quadro®: Design & Creation
      Tesla™: High Performance Computing
      Architecture: TESLA
      Chips: G80, G84, G92, …
  • 20. A New Platform: Tesla
      HPC-oriented product line
        C870: board (1 GPU)
        D870: deskside unit (2 GPUs)
        S870: 1u server unit (4 GPUs)
  • 21. Conclusion
      GPUs are massively parallel manycore computers
        Ubiquitous - most successful parallel processor in history
        Useful - users achieve huge speedups on real problems
      CUDA is a powerful parallel programming model
        Heterogeneous - mixed serial-parallel programming
        Scalable - hierarchical thread execution model
        Accessible - minimal but expressive changes to C
      They provide tremendous scope for innovative, impactful research
  • 22. Questions? David Luebke dluebke@nvidia.com