Research in GPU Computing




                     Cao Thanh Tung
Outline

 ●   Introduction to GPU Computing
              –   Past:      Graphics Processing and GPGPU
              –   Present:   CUDA and OpenCL
              –   A bit on the architecture
 ●   Why GPU?
 ●   GPU vs. Multi-core and Distributed
 ●   Open problems.
 ●   Where does this go?

19-Jan-2011                            Computing Students talk   2
Introduction to GPU Computing

 ●   Who has access to 1,000 processors?




                                                   YOU




 ●   In the past
              –   GPU = Graphics Processing Unit




              –   GPGPU = General Purpose computation using GPUs




 ●   Now
               –   GPU = General Processing Unit (“General” replaces “Graphics”)

               –   We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)

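              // Excerpt: sum the collision forces on particle `index` from the other particles in one grid cell.
              __device__ float3 collideCell(int3 gridPos, uint index...
              {
                  uint gridHash = calcGridHash(gridPos);      // hash of this grid cell
                  ...
                  for(uint j=startIndex; j<endIndex; j++) {   // particles stored in this cell
                      if (j != index) {                       // skip the particle itself
                          ...
                          force += collideSpheres(...);
                      }
                  }
                  return force;
              }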

 ●   A (just a little) bit on the
     architecture of the latest
     NVIDIA GPU (Fermi)
       –   Very simple cores (even simpler
             than an Intel Atom core)
       –   Little cache




Why GPU?

 ●   Performance




 ●   People have used it, and it works.
              –   Bio-Informatics
              –   Finance
              –   Fluid Dynamics
              –   Data-mining
              –   Computer Vision
              –   Medical Imaging
              –   Numerical Analytics



 ●   A new, promising area
              –   Fast growing
              –   Ubiquitous
              –   New paradigm → new problems, new challenges




GPU vs. Multi-core

 ●   Many more threads of computation are required:
              –   A GPU has far more cores than a multi-core CPU.
              –   A GPU core is nowhere near as powerful as a CPU core.
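
To make "a lot more threads" concrete, here is a minimal sketch of an element-wise vector add in CUDA. The names addKernel and addOnGpu are hypothetical, chosen for illustration; the launch sizes (256 threads per block, at most 1024 blocks) are assumptions, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Each thread handles elements i, i + stride, i + 2*stride, ...
// so any launch size covers all n elements (a grid-stride loop).
__global__ void addKernel(const float* a, const float* b, float* out, int n)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies the inputs to the GPU, launches many
// lightweight threads, and copies the result back. Error checks omitted.
std::vector<float> addOnGpu(const std::vector<float>& a, const std::vector<float>& b)
{
    int n = static_cast<int>(a.size());   // assumes a.size() == b.size()
    std::vector<float> out(n);
    if (n == 0) return out;

    size_t bytes = n * sizeof(float);
    float *dA, *dB, *dOut;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;                                   // assumed launch size
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    blocks = std::min(blocks, 1024);                             // the stride loop covers the rest
    addKernel<<<blocks, threadsPerBlock>>>(dA, dB, dOut, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    return out;
}
```

Calling addOnGpu on two equally sized vectors launches thousands of threads where a multi-core CPU version would use a handful, and each thread does only a tiny amount of work.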




 ●   Challenges:
              –   Not all problems can easily be broken into many small
                    sub-problems to be solved in parallel.
              –   Race conditions are a much more serious problem.
              –   Atomic operations are still feasible, but locking is a performance killer.
                    Lock-free algorithms are strongly preferred.
              –   Memory access is a bottleneck (memory is not nearly as parallel as the cores).
              –   Debugging is a nightmare.
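
The point about atomic operations versus locks can be sketched with a byte histogram. This is an illustrative example, not code from the talk: histogramKernel and histogramOnGpu are hypothetical names, the 64 x 256 launch size is an assumption, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Many threads may hit the same bin at once. A plain bins[v]++ would be a
// read-modify-write race; atomicAdd makes each increment indivisible
// without taking a lock.
__global__ void histogramKernel(const unsigned char* data, int n, unsigned int* bins)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        atomicAdd(&bins[data[i]], 1u);
    }
}

// Hypothetical host helper: returns 256 counts, one per byte value.
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char>& data)
{
    const int kBins = 256;
    std::vector<unsigned int> bins(kBins, 0u);
    int n = static_cast<int>(data.size());
    if (n == 0) return bins;

    unsigned char* dData;
    unsigned int* dBins;
    cudaMalloc(&dData, n);
    cudaMalloc(&dBins, kBins * sizeof(unsigned int));
    cudaMemcpy(dData, data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, kBins * sizeof(unsigned int));

    histogramKernel<<<64, 256>>>(dData, n, dBins);   // assumed launch size

    cudaMemcpy(bins.data(), dBins, kBins * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dData); cudaFree(dBins);
    return bins;
}
```

Even without locks, all atomic updates to one bin are serialized by the hardware, so heavily skewed input still runs slowly. That is the memory bottleneck from the list above showing up in practice.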




GPU vs. Distributed

 ●   A GPU allows much cheaper communication between
     different threads.
 ●   GPU memory is still limited compared to a distributed
     system.
 ●   GPU cores are not completely independent processors:
              –   They need fine-grained parallelism.
              –   Reaching the scalability of a distributed system is difficult.




Open problems

 ●   Data-structures
 ●   Algorithms
 ●   Tools
 ●   Theory




 ●   Data-structures
              –   Requirement: must handle a very high level of concurrent access.
              –   Common data structures such as dynamic arrays, priority queues, and
                    hash tables are not well suited to the GPU.
              –   Some existing work: k-d trees, quadtrees, read-only hash tables...




 ●   Algorithms
              –   Most sequential algorithms need a serious redesign to make good
                   use of such a huge number of cores.
                        ●   Our computational geometry research: use discrete-space
                             computation to approximate the continuous-space
                             result.
              –   Traditional parallel algorithms may or may not work.
                        ●   Usual assumption: an unbounded number of processors.
                        ●   No serious study of this so far!
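
As one illustration of discrete-space computation, the sketch below labels every cell of a grid with its nearest site, which approximates a continuous Voronoi diagram. It is a brute-force stand-in chosen for clarity, not the algorithm from the research mentioned above (the slides do not describe it). The names discreteVoronoiKernel and discreteVoronoi and the 16 x 16 block size are assumptions, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per grid cell: label the cell with the index of its nearest site.
__global__ void discreteVoronoiKernel(const int2* sites, int numSites,
                                      int width, int height, int* label)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;   // threads past the grid edge do nothing

    int best = -1;
    long long bestDist = 0;
    for (int s = 0; s < numSites; ++s) {
        long long dx = sites[s].x - x;
        long long dy = sites[s].y - y;
        long long d = dx * dx + dy * dy;     // squared distance, exact in integers
        if (best < 0 || d < bestDist) { best = s; bestDist = d; }
    }
    label[y * width + x] = best;             // ties go to the lower site index
}

// Hypothetical host helper: returns a row-major width*height label array.
std::vector<int> discreteVoronoi(const std::vector<int2>& sites, int width, int height)
{
    if (sites.empty() || width <= 0 || height <= 0) return {};
    size_t cells = static_cast<size_t>(width) * height;
    std::vector<int> label(cells);

    int2* dSites;
    int* dLabel;
    cudaMalloc(&dSites, sites.size() * sizeof(int2));
    cudaMalloc(&dLabel, cells * sizeof(int));
    cudaMemcpy(dSites, sites.data(), sites.size() * sizeof(int2), cudaMemcpyHostToDevice);

    dim3 block(16, 16);                      // assumed block shape
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    discreteVoronoiKernel<<<grid, block>>>(dSites, static_cast<int>(sites.size()),
                                           width, height, dLabel);

    cudaMemcpy(label.data(), dLabel, cells * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dSites); cudaFree(dLabel);
    return label;
}
```

Every cell is independent, so the work maps naturally onto one thread per cell. The price is that the answer is only as precise as the grid resolution.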



 ●   Tools
              –   Programming language: a better language or model to express
                    parallel algorithms?
              –   Compiler: optimizing GPU code? Auto-parallelization?
                        ●   There is some work on translating OpenMP to CUDA.
              –   Debugging tools? Maybe a whole new “art of debugging” is needed.


              –   Software engineering currently lags far behind hardware
                    development.


 ●   Theory
              –   Some traditional approaches:
                        ●   PRAM (CRCW, EREW): too general.
                        ●   SIMD: too restricted.
              –   Big-O analysis may not be good enough.
                        ●   Time complexity is relevant, but work complexity is more
                              important.
                        ●   Most GPU computing papers report only actual running
                             time.
              –   Performance modeling for the GPU, anyone?
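
One standard way to see why work complexity matters more than time complexity is Brent's bound, sketched below. The symbols are introduced here and are not from the slides: W is the total number of operations (work), D is the length of the longest dependency chain (the running time with unlimited processors), p is the number of processors actually available, and T_p is the running time on p processors.

```latex
T_p \;\le\; \frac{W}{p} + D
% Example: prefix sum (scan) over n elements
%   step-efficient (Hillis-Steele) scan:  W = O(n \log n),  D = O(\log n)
%   work-efficient (Blelloch) scan:       W = O(n),         D = O(\log n)
```

Both scans have the same depth, so an analysis that assumes unlimited processors cannot tell them apart. On a real GPU, p is far smaller than n for large inputs, the W/p term dominates, and the scan that does less work is expected to win.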

Where does this go?

 ●   Intel/AMD already have 6-core, 12-thread processors
     (maybe more).
 ●   SeaMicro has a server with 512 Atom dual-core processors.
 ●   AMD Fusion: CPU + GPU.


 ●   The GPU may not be around forever, but massively multithreaded
     computing is definitely the future.


Where to start?

 ●   Check your PC.
              –   If it is not yet old enough to go to primary school, there's
                      a high chance it has a GPU.
 ●   Go to the NVIDIA or ATI website, download a development
     toolkit, and you're ready to go.
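
Once a toolkit is installed, a first program might simply ask the CUDA runtime what hardware is present. The sketch below assumes the NVIDIA CUDA toolkit; countCudaDevices and printCudaDevices are hypothetical helper names.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Returns the number of CUDA-capable devices, or 0 if the runtime
// reports an error (for example, no driver installed).
int countCudaDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    return count;
}

// Prints the name, compute capability, multiprocessor count and
// global memory size of each device.
void printCudaDevices()
{
    int count = countCudaDevices();
    if (count == 0) {
        std::printf("No CUDA-capable GPU found.\n");
        return;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        std::printf("Device %d: %s, compute capability %d.%d, "
                    "%d multiprocessors, %zu MB global memory\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024 * 1024));
    }
}
```

Calling printCudaDevices() from main and compiling the file with nvcc is enough to check whether the machine is ready for GPU programming.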




THANK YOU

 ●   Any questions? Just ask.
 ●   Any suggestions? What are you waiting for?
 ●   Any problem or solution to discuss? Let's have a private talk
     somewhere (j/k).




 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 

CSTalks - GPGPU - 19 Jan

  • 1. Research in GPU Computing Cao Thanh Tung
  • 2. Outline
      ● Introduction to GPU Computing
          – Past: Graphics Processing and GPGPU
          – Present: CUDA and OpenCL
          – A bit on the architecture
      ● Why GPU?
      ● GPU vs. Multi-core and Distributed
      ● Open problems
      ● Where does this go?
      19-Jan-2011 Computing Students talk
  • 5. Introduction to GPU Computing
      ● Who has access to 1,000 processors? YOU
  • 6. Introduction to GPU Computing
      ● In the past
          – GPU = Graphics Processing Unit
  • 11. Introduction to GPU Computing
      ● In the past
          – GPGPU = General Purpose computation using GPUs
  • 12. Introduction to GPU Computing
      ● Now
          – GPU = General (previously Graphics) Processing Unit

        __device__ float3 collideCell(int3 gridPos, uint index...
        {
            uint gridHash = calcGridHash(gridPos);
            ...
            for(uint j=startIndex; j<endIndex; j++) {
                if (j != index) {
                    ...
                    force += collideSpheres(...);
                }
            }
            return force;
        }
  • 13. Introduction to GPU Computing
      ● Now
          – We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)
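The `collideCell` fragment on the slides is only a `__device__` helper, so it does not show how work actually reaches the GPU. As a minimal sketch of a complete CUDA round trip (not code from the talk: `addVectors` and `addOnGpu` are hypothetical names, and 256 threads per block is an assumed, untuned choice), the following adds two arrays with one thread per element:

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Kernel (hypothetical example): each thread adds one pair of elements.
__global__ void addVectors(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // the last block may be partly idle
        out[i] = a[i] + b[i];
    }
}

// Host wrapper: copy inputs to the GPU, launch the kernel, copy the result back.
std::vector<float> addOnGpu(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    std::vector<float> out(n);
    if (n == 0) return out;
    size_t bytes = n * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;                                  // assumed, not tuned
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // round up
    addVectors<<<blocks, threadsPerBlock>>>(dA, dB, dOut, n);

    // Blocking copy: waits for the kernel and surfaces earlier launch errors.
    cudaError_t err = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

Calling `addOnGpu({1, 2, 3}, {10, 20, 30})` should return a vector holding 11, 22 and 33. Error handling is reduced to a single assert to keep the sketch short; real code would check every CUDA call.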
  • 14. Introduction to GPU Computing
      ● A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
          – Very simple core (even simpler than the Intel Atom)
          – Little cache
  • 16. Why GPU?
      ● Performance
  • 17. Why GPU?
      ● People have used it, and it works.
          – Bio-Informatics
          – Finance
          – Fluid Dynamics
          – Data-mining
          – Computer Vision
          – Medical Imaging
          – Numerical Analytics
  • 18. Why GPU?
      ● A new, promising area
          – Fast growing
          – Ubiquitous
          – New paradigm → new problems, new challenges
  • 19. GPU vs. Multi-core
      ● A lot more threads of computation are required:
          – The GPU has many more “cores” than a multi-core CPU.
          – A GPU core is nowhere near as powerful as a CPU core.
  • 20. GPU vs. Multi-core
      ● Challenges:
          – Not all problems can easily be broken into many small sub-problems to be solved in parallel.
          – Race conditions are much more serious.
          – Atomic operations are still doable, but locking is a performance killer. Lock-free algorithms are much preferable.
          – Memory access is a bottleneck (memory is not that parallel).
          – Debugging is a nightmare.
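To illustrate the point about race conditions and atomic operations, here is a hedged sketch of a byte histogram (an assumed example, not taken from the talk; `countBins` and `histogramOnGpu` are hypothetical names). Many threads can hit the same bin at once, so the increment uses `atomicAdd` rather than a lock:

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Kernel (hypothetical example): one thread per input byte.
__global__ void countBins(const unsigned char* data, int n, unsigned int* bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // A plain `bins[data[i]]++` is a read-modify-write race when two
        // threads see the same value. atomicAdd makes the update indivisible
        // without any lock.
        atomicAdd(&bins[data[i]], 1u);
    }
}

// Host wrapper: returns 256 counts, one per possible byte value.
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char>& data)
{
    const int kBins = 256;
    std::vector<unsigned int> bins(kBins, 0);
    int n = static_cast<int>(data.size());
    if (n == 0) return bins;

    unsigned char* dData = nullptr;
    unsigned int* dBins = nullptr;
    cudaMalloc(&dData, n);
    cudaMalloc(&dBins, kBins * sizeof(unsigned int));
    cudaMemcpy(dData, data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, kBins * sizeof(unsigned int));

    int threads = 256;                          // assumed, not tuned
    int blocks = (n + threads - 1) / threads;   // round up
    countBins<<<blocks, threads>>>(dData, n, dBins);

    // Blocking copy: waits for the kernel and surfaces earlier launch errors.
    cudaError_t err = cudaMemcpy(bins.data(), dBins, kBins * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;

    cudaFree(dData);
    cudaFree(dBins);
    return bins;
}
```

When most inputs fall into a few bins, the atomic updates to those bins are serialized by the hardware, which is one reason per-block partial histograms in shared memory are a common refinement.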
  • 21. GPU vs. Distributed
      ● GPU allows much cheaper communication between different threads.
      ● GPU memory is still limited compared to a distributed system.
      ● GPU cores are not completely independent processors
          – Need fine-grained parallelism
          – Reaching the scalability of a distributed system is difficult.
  • 22. Open problems
      ● Data-structures
      ● Algorithms
      ● Tools
      ● Theory
  • 23. Open problems
      ● Data-structures
          – Requirement: able to handle a very high level of concurrent access.
          – Common data-structures like dynamic arrays, priority queues or hash tables are not very suitable for the GPU.
          – Some existing work: kD-tree, quad-tree, read-only hash table...
  • 24. Open problems
      ● Algorithms
          – Most sequential algorithms need serious re-design to make good use of such a huge number of cores.
              ● Our computational geometry research: use discrete-space computation to approximate the continuous-space result.
          – Traditional parallel algorithms may or may not work.
              ● Usual assumption: infinite number of processors
              ● No serious study on this so far!
  • 25. Open problems
      ● Tools
          – Programming language: Better language or model to express parallel algorithms?
          – Compiler: Optimize GPU code? Auto-parallelization?
              ● There's some work on OpenMP-to-CUDA translation.
          – Debugging tool? Maybe a whole new “art of debugging” is needed.
          – Software engineering is currently far behind hardware development.
  • 26. Open problems
      ● Theory
          – Some traditional approaches:
              ● PRAM: CRCW, EREW. Too general.
              ● SIMD: Too restricted.
          – Big-O analysis may not be good enough.
              ● Time complexity is relevant, but work complexity is more important.
              ● Most GPU computing work reports only actual running time.
          – Performance Modeling for GPU, anyone?
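The distinction between time complexity and work complexity can be made concrete with a sum. In the sketch below (an assumed example, not from the talk; `blockSum` and `sumOnGpu` are hypothetical names), each block of 256 threads reduces its 256 inputs in log2(256) = 8 parallel steps while performing 255 additions in total, so the number of steps is logarithmic but the work stays linear in the input size:

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // must be a power of two for the halving loop

// Kernel (hypothetical example): tree reduction of one block's inputs.
__global__ void blockSum(const int* in, int n, int* blockTotals)
{
    __shared__ int partial[kThreads];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0;  // pad the last block with zeros
    __syncthreads();

    // Each pass halves the number of active threads: 128 + 64 + ... + 1 = 255 adds.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();  // every thread reaches this, active or not
    }
    if (threadIdx.x == 0) {
        blockTotals[blockIdx.x] = partial[0];
    }
}

// Host wrapper. Assumes each block's partial sum fits in an int.
long long sumOnGpu(const std::vector<int>& values)
{
    int n = static_cast<int>(values.size());
    if (n == 0) return 0;
    int blocks = (n + kThreads - 1) / kThreads;

    int *dIn = nullptr, *dTotals = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotals, blocks * sizeof(int));
    cudaMemcpy(dIn, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kThreads>>>(dIn, n, dTotals);

    std::vector<int> totals(blocks);
    cudaError_t err = cudaMemcpy(totals.data(), dTotals, blocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;
    cudaFree(dIn);
    cudaFree(dTotals);

    long long sum = 0;
    for (int t : totals) sum += t;  // final pass on the CPU, for brevity
    return sum;
}
```

By contrast, a parallel algorithm that finishes in few steps but performs asymptotically more total operations than the sequential one can lose in practice once the input is much larger than the number of cores, which is why running time alone says little about an algorithm.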
  • 27. Where does this go?
      ● Intel/AMD already have 6-core, 12-thread processors (maybe more).
      ● SeaMicro has a server with 512 Atom dual-core processors.
      ● AMD Fusion: CPU + GPU.
      ● The GPU may not be around forever, but massive multithreading is definitely the future of computing.
  • 28. Where to start?
      ● Check your PC.
          – If it's not yet old enough to go to primary school, there's a high chance it has a GPU.
      ● Go to the NVIDIA/ATI website, download a development toolkit, and you're ready to go.
  • 29. THANK YOU
      ● Any questions? Just ask.
      ● Any suggestions? What are you waiting for?
      ● Any problem or solution to discuss? Let's have a private talk somewhere (j/k)