For Dummies
From a Dummy


Ngobrol Ilmiah PPIS #1
16 December 2012
M. Alfian Amrizal
Tohoku University
• Introduction to Parallel Computing
• GPU as an Accelerator




• Classical science
  – Nature, observation, theory
  – Physical experiments
• Modern science
  – Numerical simulations (pictured: SX-9, Tohoku University)
(Images: blogs.sundaymercury.net, conserve-energy-future.com)

• Quantum chemistry
• Cosmology
• CFD (computational fluid dynamics)
• Medicine
• Material design
(Images: autoevolution.com, scidacreview.org, physicsworld.com, albertkents.com, solid.me.tut.ac.jp)

• Supercomputer
         –      The most powerful computers that can be built [2]
         –      ENIAC (1946), one of the first electronic computers ⇒ 350 multiplications/sec
         –      Today's supercomputers: > 1,000,000,000 × ENIAC
         –      Today's processor speed: only ~1,000,000 × ENIAC (?)
         –      The remaining gap is covered by “parallel computing”
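
The two ratios on this slide imply how much of a supercomputer's speed has to come from parallelism rather than from a faster single processor. A rough back-of-the-envelope sketch, using only the slide's round numbers (the function and variable names are illustrative, not from the talk):

```python
# Round numbers taken from the slide; all are order-of-magnitude estimates.
ENIAC_MULT_PER_SEC = 350               # ENIAC, 1946
SUPERCOMPUTER_SPEEDUP = 1_000_000_000  # today's supercomputer vs. ENIAC
PROCESSOR_SPEEDUP = 1_000_000          # today's single processor vs. ENIAC


def parallelism_factor(system_speedup: int, processor_speedup: int) -> float:
    """Speedup that a faster single processor alone cannot explain."""
    return system_speedup / processor_speedup


factor = parallelism_factor(SUPERCOMPUTER_SPEEDUP, PROCESSOR_SPEEDUP)
supercomputer_mult_per_sec = ENIAC_MULT_PER_SEC * SUPERCOMPUTER_SPEEDUP

print(f"Speedup left for parallelism: about {factor:,.0f}x")
print(f"Implied supercomputer rate: about {supercomputer_mult_per_sec:.1e} mult/sec")
```

With these estimates, roughly a factor of a thousand has to come from running many processors at once.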




(Images: cbc.ca, datacenterknowledge.com, allvoices.com)

CPU: The brain of the
computer; all data is
processed here

Memory: The computer's
scratch pad; programs
are loaded and run here

GPU: For graphics
processing; used as an
accelerator in HPC

Storage: Holds data
and program files

• The free lunch is over!
  – Heat
  – Power restrictions
  – Transistor size
  – CPUs aren't getting any faster

• Multicomputers
  – Distributed-memory parallel computer
• Multicore (Core 1, Core 2, ...)
  – Shared-memory parallel computer
    (e.g. dual core, quad core, etc.)

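
The difference between the two memory models can be sketched in a few lines. This is only a toy illustration: both versions run as threads inside one Python process, so the "distributed" version merely imitates private memory plus messages. A real multicomputer would use separate machines and a message-passing library such as MPI. All names here are hypothetical.

```python
import queue
import threading

DATA = list(range(1, 101))  # toy workload: sum the numbers 1..100
NUM_WORKERS = 4
CHUNKS = [DATA[i::NUM_WORKERS] for i in range(NUM_WORKERS)]


def shared_memory_sum() -> int:
    """Multicore style: all workers update one variable in shared memory."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # shared data must be protected from concurrent updates
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in CHUNKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum() -> int:
    """Multicomputer style: workers keep private data and send messages."""
    mailbox = queue.Queue()  # stands in for the interconnect network

    def worker(chunk):
        mailbox.put(sum(chunk))  # the only way to share a result is to send it

    threads = [threading.Thread(target=worker, args=(c,)) for c in CHUNKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(NUM_WORKERS))


print(shared_memory_sum(), message_passing_sum())
```

Both functions compute the same sum. The shared-memory version needs a lock around the common variable, while the message-passing version never touches another worker's data.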
• Trends in HPC system design
     –    More nodes/processors/cores
     –    Deep memory hierarchies
     –    Non-uniform interconnect network
     –    Accelerators ← today's topic

(Figure: two system diagrams built from the labels N, P, C, M, and A, presumably node, processor, core, memory, and accelerator.
 Left: "Good old days!" One proc. / node, one core / proc., uniform network.
 Right: "Too complicated ..." How can we fully exploit the potential?)

• Programmers need to learn both hardware and
  software

(Figure: Markus Pueschel)

• We need powerful computers
• CPU speed can no longer keep increasing
• Go parallel:
  – Multicomputer
  – Multicore
• System complexity requires programmers
  to learn both HW and SW

• Introduction to Parallel Computing
• GPU as an Accelerator

• Power is the problem
  – System size is limited by the power budget
• Heterogeneous systems are promising
  – CPU + accelerator (= GPU)
  – CPUs and GPUs have their own strengths and
    weaknesses
  – CPU: few cores, high frequency (a few GHz)
  – GPU: ~1000 cores, lower frequency (hundreds of MHz)

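
Why many slow cores can beat a few fast ones becomes clearer with a rough peak-rate estimate: cores times frequency. The sketch below is a simplification. The core counts and clock rates are hypothetical example values, not measurements of any real chip, and the model ignores vector units, memory bandwidth, and everything else that limits real performance.

```python
def peak_ops_per_sec(cores: int, frequency_hz: float, ops_per_cycle: int = 1) -> float:
    """Idealized peak rate: every core completes ops_per_cycle operations per cycle."""
    return cores * frequency_hz * ops_per_cycle


# Hypothetical example values chosen to match the slide's "few cores, GHz"
# versus "1000 cores, MHz" contrast.
cpu_peak = peak_ops_per_sec(cores=4, frequency_hz=3.0e9)
gpu_peak = peak_ops_per_sec(cores=1000, frequency_hz=700e6)

print(f"CPU peak: {cpu_peak:.2e} ops/s")
print(f"GPU peak: {gpu_peak:.2e} ops/s")
print(f"GPU/CPU ratio: {gpu_peak / cpu_peak:.1f}x")
```

The GPU only reaches this peak when the work can actually be split across all of its cores. For a single serial task, the CPU's faster core still wins.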
• Graphics Processing Unit (GPU)
      – Originally developed for quickly generating 2D and
        3D graphics, images, and video
      – Highly parallel processor
      – GPUs are more power-efficient than CPUs for parallel workloads [3]

*Image from nvidia.com

• CPU and GPU are very different
  processors
  – CPU: latency-oriented design (= speculative)
  – GPU: throughput-oriented design (= parallel)

CPU (one task after another):
          task 1 → task 2 → task 3 → task 4          → time

GPU (all tasks at the same time):
          task 1
          task 2
          task 3
          task 4                                     → time

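
The timeline above can be turned into a small calculation. The sketch assumes, purely for illustration, that a single task takes three times longer on a simple GPU core than on a CPU core. The function names and timings are hypothetical.

```python
def serial_time(task_times):
    """One core runs the tasks back to back."""
    return sum(task_times)


def parallel_time(task_times, cores):
    """Greedy schedule: each task goes to the core that becomes free first."""
    finish = [0.0] * cores
    for t in task_times:
        earliest = finish.index(min(finish))
        finish[earliest] += t
    return max(finish)


# Hypothetical timings: 1 time unit per task on the CPU core,
# 3 time units per task on a (slower) GPU core.
cpu_total = serial_time([1, 1, 1, 1])
gpu_total = parallel_time([3, 3, 3, 3], cores=4)

print(f"CPU finishes all four tasks at t = {cpu_total}")
print(f"GPU finishes all four tasks at t = {gpu_total}")
```

Each individual task is slower on the GPU (higher latency), yet all four finish sooner (higher throughput). The advantage grows with the number of independent tasks, as long as there are enough cores to run them.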
• Speculative execution by branch prediction is
      effective at shortening execution time, but
      it makes the hardware complicated

                                       A = 2;
                                       B = 3;
                                       C = A+B;
                                       D = A*B;
                                       E = A-B;
                                       if ( C > 4 )
                                       {
                                         A = 0;
                                       }
                                       B = 0;

(Figure: "E  D  C  ?", the instructions in flight, where "?" marks what follows the not-yet-resolved branch)

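
To give a feel for what branch prediction means, here is a minimal sketch of one classic textbook scheme, a 2-bit saturating counter. The slides do not say which predictor a CPU uses, so the scheme and the branch history below are assumptions for illustration only.

```python
def count_mispredictions(outcomes, initial_state=2):
    """Run a 2-bit saturating counter over a sequence of branch outcomes.

    States 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    state = initial_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at both ends.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# Hypothetical history: a loop branch taken 9 times, then not taken at loop exit.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(loop_branch))
```

For this regular pattern the predictor is wrong only once, at the loop exit, so speculation pays off almost every time. An irregular pattern such as alternating taken/not taken defeats it far more often, and the extra hardware for guessing and undoing wrong guesses is the complexity the slide refers to.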
• CPUs have large cache memories and
  control units
• GPUs devote more hardware resources
  to ALUs

• Many simple cores
  – No speculation features
     • Simplicity allows more cores on a chip
     • Fast context switching thanks to the simple core design

(Figure: timeline of GPU core A. While one thread waits on a memory access,
 the core context-switches to another thread's computation, so computation
 and memory access overlap in time.)

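
The effect of fast context switching can be sketched with a tiny cycle-by-cycle simulation. It assumes a context switch costs nothing and that every thread repeats the same pattern of computation followed by a memory access. The cycle counts are hypothetical, and real GPU schedulers are more elaborate.

```python
def simulate_core(num_threads, compute_cycles, memory_cycles, total_cycles):
    """Return the fraction of cycles in which the core does computation."""
    # Per thread: compute cycles left in its current burst, and the cycle
    # at which its outstanding memory access completes.
    remaining = [compute_cycles] * num_threads
    ready_at = [0] * num_threads
    busy = 0
    for cycle in range(total_cycles):
        # Zero-cost context switch: pick any thread not waiting on memory.
        runnable = [i for i in range(num_threads) if ready_at[i] <= cycle]
        if not runnable:
            continue  # every thread is waiting on memory; the core idles
        i = runnable[0]
        remaining[i] -= 1
        busy += 1
        if remaining[i] == 0:
            # Burst finished: issue a memory access and start waiting.
            remaining[i] = compute_cycles
            ready_at[i] = cycle + 1 + memory_cycles
    return busy / total_cycles


# Hypothetical pattern: 2 cycles of computation, then a 6-cycle memory access.
for n in (1, 2, 4):
    print(n, "thread(s):", simulate_core(n, 2, 6, 80))
```

With one thread the core sits idle during every memory access. With enough threads to switch between, the memory waits are hidden behind other threads' computation and the core stays busy.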
• CPU and GPU are very different
  processors
  – Each has its own strengths and weaknesses
    • CPUs have a few big cores to shorten execution
      time
    • GPUs have many simple cores to increase
      throughput
  – Use the CPU for serial execution and the GPU for
    parallel execution

[1] Levin, E. “Grand challenges to computational
science.” Communications of the ACM
32(12):1456-1457, December 1989.

[2] Kaufmann, William J. III, and Larry L. Smarr.
Supercomputing and the Transformation.

[3] Nvidia. “Doing more with less of a scarce
resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html


Heterogeneous Parallel Computing with GPU: From a Dummy for Dummies
