INTRODUCTION TO CUDA
Prepared for Geek Camp Singapore 2011
                                  Raymond Tay
THE FREE LUNCH IS OVER – HERB SUTTER
WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS …
NVIDIA GPUS FLOPS
  • FLOPS: floating-point operations per second. It measures how many
    floating-point operations a GPU can perform each second. More is better.
  • GPUs beat CPUs.
NVIDIA GPUS MEMORY BANDWIDTH
  • NVIDIA's GPUs are massively parallel processors, so high memory
    bandwidth is needed to keep them supplied with data. It plays a big
    role in high-performance computing.
  • GPUs beat CPUs.
GPU VS CPU
  CPU
  • Optimised for low-latency access to cached data sets
  • Control logic for out-of-order and speculative execution

  GPU
  • Optimised for data-parallel, throughput computation
  • Architecture tolerant of memory latency
  • More transistors dedicated to computation
I DON’T KNOW C/C++, SHOULD I LEAVE?
  Relax, no worries. Not to fret.

  Your Brain Asks:
  Wait a minute, why should I learn the C/C++ SDK?

  CUDA Answers:
  Efficiency!!!
WHAT DO I NEED TO BEGIN WITH CUDA?
  • An NVIDIA CUDA-enabled graphics card, e.g. one based on the Fermi architecture
HOW DOES CUDA WORK
  [Diagram labelled "PCI Bus"]

1.  Copy input data from CPU memory to
    GPU memory
2.  Load GPU program and execute,
    caching data on chip for performance
3.  Copy results from GPU memory to CPU
    memory
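
The three steps above can be mimicked in plain C to make the data flow concrete. This is only an analogy, not CUDA code: a heap buffer stands in for GPU memory, `memcpy` stands in for the transfers across the PCI bus (in a real CUDA program these would be `cudaMemcpy` calls), and `fake_kernel` stands in for the GPU program. Both function names are hypothetical and were made up for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a GPU program: doubles every element
   of the buffer that plays the role of GPU memory. */
static void fake_kernel(float *dev, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dev[i] *= 2.0f;
    }
}

/* Mimics the copy-in, execute, copy-out pattern.
   Returns 0 on success, -1 if the stand-in "device" allocation fails. */
int run_on_fake_device(const float *host_in, float *host_out, size_t n) {
    assert(host_in != NULL && host_out != NULL);

    float *dev = malloc(n * sizeof *dev);    /* stand-in for GPU memory */
    if (dev == NULL) {
        return -1;
    }
    memcpy(dev, host_in, n * sizeof *dev);   /* step 1: CPU memory -> "GPU" memory */
    fake_kernel(dev, n);                     /* step 2: execute the "GPU" program  */
    memcpy(host_out, dev, n * sizeof *dev);  /* step 3: "GPU" memory -> CPU memory */
    free(dev);
    return 0;
}
```

The point of the sketch is the shape of the program: the computation never touches the caller's buffers directly, so both copies are part of the cost of every offloaded task.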
EXAMPLE: BLOCK CYPHER
// CPU PROGRAM
void host_shift_cypher(unsigned int *input_array,
   unsigned int *output_array, unsigned int shift_amount,
   unsigned int alphabet_max, unsigned int array_length)
{
  for(unsigned int i=0;i<array_length;i++)
  {
       unsigned int element = input_array[i];
       unsigned int shifted = element + shift_amount;
       if(shifted > alphabet_max)
       {
         shifted = shifted % (alphabet_max + 1);
       }
       output_array[i] = shifted;
  }
}

int main() {
  // input_array, output_array and the other arguments are assumed to be
  // set up beforehand (not shown on the slide)
  host_shift_cypher(input_array, output_array,
     shift_amount, alphabet_max, array_length);
}

// GPU PROGRAM
__global__ void shift_cypher(unsigned int *input_array,
   unsigned int *output_array, unsigned int shift_amount,
   unsigned int alphabet_max, unsigned int array_length)
{
  unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
  if ( tid < array_length )   // the last block may contain surplus threads
  {
    unsigned int shifted = input_array[tid] + shift_amount;
    if ( shifted > alphabet_max )
      shifted = shifted % (alphabet_max + 1);
    output_array[tid] = shifted;
  }
}

int main() {
  // input_array and output_array are assumed to already be in GPU memory
  // round up so that a partial final block is still launched
  dim3 dimGrid((array_length + block_size - 1) / block_size);
  dim3 dimBlock(block_size);
  shift_cypher<<<dimGrid,dimBlock>>>(input_array,
     output_array, shift_amount, alphabet_max, array_length);
}
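
The GPU program gives each thread one array element through `threadIdx.x + blockIdx.x * blockDim.x`. The sketch below emulates that mapping on the CPU in plain C, assuming the grid is rounded up to a whole number of blocks and that out-of-range indices are skipped. `blocks_needed` and `emulate_shift_cypher` are hypothetical names used only for illustration, and the two loops merely stand in for threads that a GPU would run in parallel.

```c
#include <assert.h>

/* Smallest number of blocks such that blocks * block_size >= n. */
unsigned int blocks_needed(unsigned int n, unsigned int block_size) {
    assert(block_size > 0);
    return n / block_size + (n % block_size != 0);
}

/* Visits every (block, thread) pair that a launch would create and applies
   the shift wherever the global index falls inside the array.
   Assumes alphabet_max + 1 does not overflow. */
void emulate_shift_cypher(const unsigned int *in, unsigned int *out,
                          unsigned int shift, unsigned int alphabet_max,
                          unsigned int n, unsigned int block_size) {
    unsigned int grid = blocks_needed(n, block_size);
    for (unsigned int block = 0; block < grid; ++block) {
        for (unsigned int thread = 0; thread < block_size; ++thread) {
            /* same formula as threadIdx.x + blockIdx.x * blockDim.x */
            unsigned int tid = thread + block * block_size;
            if (tid < n) {
                /* wrap around once the shifted value passes alphabet_max */
                out[tid] = (in[tid] + shift) % (alphabet_max + 1);
            }
        }
    }
}
```

With 5 elements and a block size of 2, three blocks are needed and the sixth thread finds its index out of range, which is why the range check matters whenever the array length is not a multiple of the block size.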
EXAMPLE: VECTOR ADDITION
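// CUDA CODE
__global__ void VecAdd(const float* A, const float* B, float* C,
    unsigned int N)
{
  // one thread per element; unsigned to match N
  unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < N)   // skip surplus threads in the last block
    C[i] = A[i] + B[i];
}

// C CODE
// (same name as the kernel: the two versions are alternatives,
//  not meant to be compiled into one file)
void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
  for (unsigned int i = 0; i < N; ++i)
    C[i] = A[i] + B[i];
}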
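
The `i < N` test in the CUDA version exists because threads are launched in whole blocks, so the total thread count can exceed N. A small sketch of that arithmetic in plain C follows; `idle_threads` is a hypothetical helper name, and the block size of 256 in the note after it is only an example value.

```c
#include <assert.h>

/* Number of threads launched past the end of an n-element array when the
   grid is rounded up to whole blocks. These are the threads that fail the
   `i < N` test and do nothing. */
unsigned int idle_threads(unsigned int n, unsigned int block_size) {
    assert(block_size > 0);
    unsigned int remainder = n % block_size;
    return remainder == 0 ? 0 : block_size - remainder;
}
```

For N = 1000 and 256 threads per block, four blocks give 1024 threads, so `idle_threads(1000, 256)` is 24. Without the guard, those 24 threads would read and write past the end of the arrays.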
DEBUGGER
  CUDA-GDB
  • Based on GDB
  • Linux
  • Mac OS X

  Parallel Nsight
  • Plugin inside Visual Studio
VISUAL PROFILER & MEMCHECK
  Profiler
  • Microsoft Windows
  • Linux
  • Mac OS X
  • Analyze performance

  CUDA-MEMCHECK
  • Microsoft Windows
  • Linux
  • Mac OS X
  • Detect memory access errors
WHERE’S CUDA AT IN 2011?
  • 60,000 researchers use it to aid drug discovery
  • 470 universities teach CUDA
WHERE’S CUDA AT IN 2011? (PART 2..)
  • NVIDIA Showcase (1000+ applications)
ADDITIONAL RESOURCES
  • CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
  • CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
  • CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
  • NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
  • GPGPU (http://gpgpu.org)
  • CUDA By Example, by Jason Sanders & Edward Kandrot (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0)
  • GPU Computing Gems Emerald Edition, Editor in Chief: Prof Hwu Wen-Mei (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/)
CUDA LIBRARIES
  • Visit http://developer.nvidia.com/cuda-tools-ecosystem#Libraries
  • Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV, GPU AI-Tree Search, GPU AI-Path Finding
  • Many of the libraries are hosted on Google Code. Many more gems in there too!
THANK YOU
  @RaymondTayBL
