6.963 CUDA@MIT / IAP09

Supercomputing on your desktop:
Programming the next generation of cheap
and massively parallel hardware using CUDA

Lecture 03
Nicolas Pinto (MIT)

CUDA - Basics #2


Tuesday, January 13, 2009

During this course, we'll try to “ ” and use existing material ;-)
(adapted for 6.963)

Today
yey!!

Language
Compilation
API
Threading Model
Memory Model

CUDA Language

• CUDA defines a language that is similar to C/C++
• Allows programmers to easily move existing code to CUDA
• Lessens learning curve

Spring 2008, Johns Hopkins University
Copyright © Matthew Bolitho 2008

• How is CUDA C/C++ different from standard C/C++?

• Syntactic extensions:
   – Declaration Qualifiers
   – Built-in Variables
   – Built-in Types
   – Execution Configuration

• Declspec = declaration specifier / declaration qualifier
• A modifier applied to declarations of:
   – Variables
   – Functions
• Examples: const, extern, static

• CUDA uses the following declaration qualifiers for variables:
   – __device__
   – __shared__
   – __constant__
• Only apply to global variables

__device__
• Declares that a global variable is stored on the device
• The data resides in global memory
• Has lifetime of the entire application
• Accessible to all GPU threads
• Accessible to the CPU via API

__shared__
• Declares that a global variable is stored on the device
• The data resides in shared memory
• Has lifetime of the thread block
• Accessible to all threads, one copy per thread block

__shared__ (continued)
• If not declared as volatile, reads from different threads are not visible unless a synchronization barrier is used
• Not accessible from CPU

__constant__
• Declares that a global variable is stored on the device
• The data resides in constant memory
• Has lifetime of entire application
• Accessible to all GPU threads (read only)
• Accessible to CPU via API (read-write)

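The three variable qualifiers can be seen side by side in a small sketch. Everything named here (N, scale, result, reverse_scale, run_qualifier_demo) is made up for illustration and is not from the slides; the runtime calls are the standard cudaMemcpyToSymbol / cudaMemcpyFromSymbol API, which is one way the CPU reaches __constant__ and __device__ variables "via API".

```cuda
#include <cuda_runtime.h>

#define N 256  // hypothetical problem size, equal to one thread block

__constant__ float scale;     // constant memory: read-only for GPU threads
__device__ float result[N];   // global memory: lives for the whole application

// Reverses `in` inside one block through shared memory, then scales it.
__global__ void reverse_scale(const float *in)
{
    __shared__ float tile[N]; // shared memory: one copy per thread block
    int i = threadIdx.x;
    tile[i] = in[i];
    __syncthreads();          // barrier: makes every write to tile visible to all threads
    result[i] = scale * tile[N - 1 - i];
}

// Hypothetical host helper: returns 0 when the device output is as expected.
int run_qualifier_demo(void)
{
    float h_in[N], h_out[N], h_scale = 2.0f;
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    float *d_in = 0;
    if (cudaMalloc((void **)&d_in, sizeof(h_in)) != cudaSuccess) return 1;
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(h_scale));  // CPU writes constant memory

    reverse_scale<<<1, N>>>(d_in);

    // CPU reads the __device__ array back through the API.
    cudaError_t err = cudaMemcpyFromSymbol(h_out, result, sizeof(h_out));
    cudaFree(d_in);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < N; ++i)
        if (h_out[i] != 2.0f * (float)(N - 1 - i)) return 1;
    return 0;
}
```

Calling run_qualifier_demo() from main and checking for a zero return is enough to exercise all three memory spaces. The __syncthreads() call is the synchronization barrier the __shared__ slide refers to.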
• CUDA uses the following declspecs for functions:
   – __device__
   – __host__
   – __global__

__device__
• Declares that a function is compiled to, and executes on the device
• Callable only from another function on the device

__host__
• Declares that a function is compiled to and executes on the host
• Callable only from the host
• Functions without any CUDA declspec are host by default
• Can use __host__ and __device__ together

__global__
• Declares that a function is compiled to and executes on the device
• Callable from the host
• Used as the entry point from host to device

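A minimal sketch of how the three function qualifiers combine. The function names (square, square_plus_one, apply, run_function_qualifier_demo) are hypothetical and chosen only for this example.

```cuda
#include <cuda_runtime.h>

// Both qualifiers together: compiled once for the host and once for the device.
__host__ __device__ float square(float x) { return x * x; }

// Device-only helper: callable only from other device code.
__device__ float square_plus_one(float x) { return square(x) + 1.0f; }

// Kernel: the entry point from host to device.
__global__ void apply(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square_plus_one(data[i]);
}

// Hypothetical host helper: returns 0 when device and host results agree.
int run_function_qualifier_demo(void)
{
    const int n = 8;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = 0;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    apply<<<1, n>>>(d, n);
    cudaError_t err = cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i)
        if (h[i] != square((float)i) + 1.0f) return 1;  // square() reused on the host
    return 0;
}
```

The host-side check reuses square(), which works only because it carries __host__ as well as __device__. Calling square_plus_one() from the host would be rejected by the compiler.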
• CUDA provides a set of built-in vector types:
   – char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4,
   – short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4,
   – int1, uint1, int2, uint2, int3, uint3, int4, uint4,
   – long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4,
   – float1, float2, float3, float4

• Can construct a vector type with special function:
   make_{typename}(...)
• Can access elements of a vector type with “.x”, “.y”, “.z”, “.w”:
   vecvar.x

• dim3 is a special vector type
• Same as uint3, except can be constructed from a scalar to form a vector:
   (scalar, 1, 1)

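The vector types and dim3 are plain structs that can also be used in host code, so their behaviour can be checked without launching a kernel. A small sketch, where run_vector_type_demo is a hypothetical helper name and make_float4 / make_int2 are the standard constructors for float4 and int2:

```cuda
#include <cuda_runtime.h>

// Host-side sketch: returns 0 when every component has the expected value.
int run_vector_type_demo(void)
{
    float4 v = make_float4(1.0f, 2.0f, 3.0f, 4.0f);  // make_<type name> constructor
    int2 p = make_int2(7, 9);
    float sum = v.x + v.y + v.z + v.w;               // components accessed by name

    dim3 grid(16);      // built from a scalar: the missing components default to 1
    dim3 block(8, 4);   // z defaults to 1

    if (sum != 10.0f) return 1;
    if (p.x != 7 || p.y != 9) return 1;
    if (grid.x != 16 || grid.y != 1 || grid.z != 1) return 1;
    if (block.x != 8 || block.y != 4 || block.z != 1) return 1;
    return 0;
}
```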
• CUDA provides four global, built-in variables
   – threadIdx, blockIdx, blockDim, gridDim
• Typed as a dim3 or uint3
• Accessible only from device code
• Cannot take address
• Cannot assign value

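A common way to use all four built-in variables together is a grid-stride loop. The sketch below is illustrative only: the kernel and helper names are hypothetical, and the launch shape (4 blocks of 64 threads) is deliberately smaller than the array so that the stride matters.

```cuda
#include <cuda_runtime.h>

// Each thread starts at its global index and strides by the total thread count.
__global__ void write_global_index(int *out, int n)
{
    int first  = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    int stride = gridDim.x * blockDim.x;                 // total threads in the grid
    for (int i = first; i < n; i += stride)
        out[i] = i;
}

// Hypothetical host helper: returns 0 when out[i] == i for every element.
int run_builtin_variable_demo(void)
{
    const int n = 1000;
    int h[n];
    int *d = 0;
    if (cudaMalloc((void **)&d, n * sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(d, 0xFF, n * sizeof(int));  // poison so untouched elements are detected

    write_global_index<<<4, 64>>>(d, n);   // 256 threads cover 1000 elements

    cudaError_t err = cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i)
        if (h[i] != i) return 1;
    return 0;
}
```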
• CUDA provides syntactic sugar to launch the execution of kernels:
   Func<<<GridDim, BlockDim>>>(Arguments, …)
   – Func is a __global__ function
• The compiler turns this type of statement into a block of code that configures, and launches the kernel

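As a sketch of the launch syntax with a two-dimensional execution configuration: the kernel fill and the helper run_launch_demo are hypothetical names, and the 16x16 block shape is an arbitrary choice for illustration, not a recommendation from the slides.

```cuda
#include <cuda_runtime.h>

// Writes each element's linear index into a width x height matrix.
__global__ void fill(float *m, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)            // the grid may overhang the matrix
        m[y * width + x] = (float)(y * width + x);
}

// Hypothetical host helper: returns 0 when the whole matrix was filled correctly.
int run_launch_demo(void)
{
    const int width = 100, height = 60;
    static float h[width * height];
    float *d = 0;
    if (cudaMalloc((void **)&d, sizeof(h)) != cudaSuccess) return 1;

    dim3 block(16, 16);                                  // BlockDim
    dim3 grid((width + block.x - 1) / block.x,           // GridDim, rounded up
              (height + block.y - 1) / block.y);
    fill<<<grid, block>>>(d, width, height);             // Func<<<GridDim, BlockDim>>>(Arguments)

    cudaError_t launch = cudaGetLastError();             // reports a bad configuration
    cudaError_t copy = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (launch != cudaSuccess || copy != cudaSuccess) return 1;

    for (int i = 0; i < width * height; ++i)
        if (h[i] != (float)i) return 1;
    return 0;
}
```

Rounding the grid size up and guarding with the bounds test inside the kernel is the usual way to handle sizes that are not a multiple of the block shape.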
• CUDA defines a language that is similar to C/C++
• Important Differences:
   – Runtime Library
   – Functions
   – Classes, Structs, Unions

• Code that runs on the device can't use normal C/C++ runtime library functions
   – No printf, fread, malloc, etc
• Most math functions have a device equivalent

• On a CUDA device, there is no stack
• By default, all function calls are inlined
   – Can use __noinline__ to prevent (CUDA 1.1)
• All local variables, function arguments are stored in registers
• No function recursion
• No function pointers

• CUDA supports some C++ features for device code. E.g.:
   – Template functions
• Classes are supported inside .cu source, but must be host only
• Structs/Unions work on device code as per C

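Template functions in device code can be sketched as follows. All names (clamp_value, clamp_all, run_clamp) are hypothetical, and the helper is itself a template so the same check runs for int and float.

```cuda
#include <cuda_runtime.h>

// Device-side template function: instantiated per element type.
template <typename T>
__device__ T clamp_value(T v, T lo, T hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

// Template kernel that applies the device function to every element.
template <typename T>
__global__ void clamp_all(T *data, int n, T lo, T hi)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = clamp_value(data[i], lo, hi);
}

// Hypothetical host helper: clamps 0..7 into [2, 5]; returns 0 on success.
template <typename T>
int run_clamp(void)
{
    const int n = 8;
    T h[n];
    for (int i = 0; i < n; ++i) h[i] = (T)i;

    T *d = 0;
    if (cudaMalloc((void **)&d, n * sizeof(T)) != cudaSuccess) return 1;
    cudaMemcpy(d, h, n * sizeof(T), cudaMemcpyHostToDevice);
    clamp_all<T><<<1, n>>>(d, n, (T)2, (T)5);
    cudaError_t err = cudaMemcpy(h, d, n * sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i) {
        T expected = i < 2 ? (T)2 : (i > 5 ? (T)5 : (T)i);
        if (h[i] != expected) return 1;
    }
    return 0;
}
```

Usage would look like run_clamp<int>() and run_clamp<float>(), each producing its own kernel instantiation.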
Common Runtime Component: Mathematical Functions
• pow, sqrt, cbrt, hypot
• exp, exp2, expm1
• log, log2, log10, log1p
• sin, cos, tan, asin, acos, atan, atan2
• sinh, cosh, tanh, asinh, acosh, atanh
• ceil, floor, trunc, round
• Etc.
   – When executed on the host, a given function uses the C runtime implementation if available
   – These functions are only supported for scalar types, not vector types

© David Kirk/NVIDIA and Wen-mei W. Hwu
Urbana, Illinois, August 18-22, 2008

Device Runtime Component: Mathematical Functions
• Some mathematical functions (e.g. sinf(x)) have a less accurate, but faster device-only version (e.g. __sinf(x))
   – __powf
   – __logf, __log2f, __log10f
   – __expf
   – __sinf, __cosf, __tanf

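The accuracy trade-off can be measured directly. The sketch below assumes the single-precision pair sinf / __sinf and an input range of [0, pi]; the kernel name sines and the helper max_sine_difference are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Evaluates the regular and the fast device-only sine for each input.
__global__ void sines(const float *x, float *accurate, float *fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = sinf(x[i]);    // regular math-library version
        fast[i]     = __sinf(x[i]);  // faster, less accurate intrinsic
    }
}

// Hypothetical host helper: returns the largest |sinf - __sinf| seen,
// or -1.0f if a CUDA call failed.
float max_sine_difference(void)
{
    const int n = 256;
    float h_x[n], h_acc[n], h_fast[n];
    for (int i = 0; i < n; ++i) h_x[i] = 3.14159265f * (float)i / (float)(n - 1);

    float *d = 0;  // one allocation holding x, accurate and fast back to back
    if (cudaMalloc((void **)&d, 3 * n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemcpy(d, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    sines<<<1, n>>>(d, d + n, d + 2 * n, n);
    cudaError_t e1 = cudaMemcpy(h_acc, d + n, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(h_fast, d + 2 * n, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float diff = fabsf(h_acc[i] - h_fast[i]);
        if (diff > worst) worst = diff;
    }
    return worst;
}
```

On this input range the difference is expected to be small but usually nonzero. Outside a moderate range around zero the intrinsic's error grows, which is the reason it is kept as a separate opt-in function.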
CUDA Compilation

• CUDA source files end in “.cu”
   – Contain a mix of device and host code/data
• Compiled by nvcc
   – nvcc is really a wrapper around a more complex compilation process

Input
• Normal .c, .cpp source files
• CUDA .cu source code files

Output
• Object/executable code for host
• .cubin executable code for the device

• For .c and .cpp files, nvcc invokes the native C/C++ compiler for the system (eg: gcc/cl)
• For .cu files, it is a little more complicated…

nvcc compilation flow (diagram):
   .cu → cpp → .cu → cudafe
   cudafe → .c → cpp → .o → linker
   cudafe → .stub.c → cpp → .o → linker
   cudafe → .gpu.c → nvopencc → .ptx → ptxas → .cubin

• To see the steps performed by nvcc, use the --dryrun and --keep command line options

• How is a .cubin linked with the rest of the program?
• Can be:
   – Loaded as a file at runtime
   – Embedded in data segment
   – Embedded as a resource

Programming is fun, until:
• The program crashes
• It produces the wrong result

But, there are many debugging techniques
• Debugging software (eg: gdb, Visual Studio)
• printf

Debug
• CUDA programming is even less fun
   – There is no debugger
   – There is no printf
• Debugging code on the device is very hard
   – Can try to write intermediate results to memory and copy back to host to examine
   – Emulation mode

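The "write intermediate results to memory and copy back" technique can be sketched like this. The extra debug buffer and all names (square_plus_one_traced, run_debug_buffer_demo) are hypothetical additions for illustration.

```cuda
#include <cuda_runtime.h>

// Kernel under inspection: out[i] = in[i]^2 + 1.
// The `debug` buffer exists only to expose the intermediate square.
__global__ void square_plus_one_traced(const float *in, float *out, float *debug, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float sq = in[i] * in[i];
        debug[i] = sq;           // intermediate result, written for the host to examine
        out[i]   = sq + 1.0f;
    }
}

// Hypothetical host helper: returns 0 when both the final and the
// intermediate values match what the host computes.
int run_debug_buffer_demo(void)
{
    const int n = 16;
    float h_in[n], h_out[n], h_debug[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d = 0;  // one allocation holding in, out and debug back to back
    if (cudaMalloc((void **)&d, 3 * n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    square_plus_one_traced<<<1, n>>>(d, d + n, d + 2 * n, n);
    cudaError_t launch = cudaGetLastError();  // launch errors are otherwise silent

    cudaError_t e1 = cudaMemcpy(h_out, d + n, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(h_debug, d + 2 * n, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (launch != cudaSuccess || e1 != cudaSuccess || e2 != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i) {
        float sq = (float)(i * i);
        if (h_debug[i] != sq || h_out[i] != sq + 1.0f) return 1;
    }
    return 0;
}
```

Once the kernel is trusted, the debug parameter can be removed again, since the extra global-memory write costs bandwidth.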
Emu
• By using a compiler flag, you can emulate by running all code on the host
• Compiler flag: --device-emulation
• Good for most debugging: can use gdb/printf
• Not a true emulation:
   – Race Conditions, Memory model differences, etc

Device Emulation Mode Pitfalls
• Emulated device threads execute sequentially, so simultaneous accesses of the same memory location by multiple threads could produce different results.
• Dereferencing device pointers on the host or host pointers on the device can produce correct results in device emulation mode, but will generate an error in device execution mode

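The first pitfall is easy to reproduce with a shared counter. In the sketch below, the commented-out plain increment is the racy form whose result depends on how threads interleave; atomicAdd is shown as one standard way to make the outcome independent of execution order. The names count_threads and count_launched_threads are hypothetical.

```cuda
#include <cuda_runtime.h>

// Every thread increments the same memory location.
__global__ void count_threads(int *counter)
{
    // *counter += 1;      // racy: sequential emulation would hide lost updates
    atomicAdd(counter, 1); // indivisible read-modify-write: same result either way
}

// Hypothetical host helper: returns the final counter value, or -1 on a CUDA error.
int count_launched_threads(int blocks, int threads)
{
    int h = 0;
    int *d = 0;
    if (cudaMalloc((void **)&d, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice);

    count_threads<<<blocks, threads>>>(d);

    cudaError_t err = cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? h : -1;
}
```

With the atomic version, count_launched_threads(8, 128) counts every one of the 1024 threads; with the plain increment, the device result would typically be far smaller while a sequential emulation would still report the full count.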
Floating Point
• Results of floating-point computations will slightly differ because of:
   – Different compiler outputs, instruction sets
   – Use of extended precision for intermediate results
      • There are various options to force strict single precision on the host

lkit
                                                                                      oo
                                                                                     T
                            CUDA Toolkit

                                                 Application Software
                                           Industry Standard C Language

                                                                      Libraries
                                                                       !quot;%&'(
                                             !quot;##$                                     !quot;)**

                                                             CUDA Compiler             CUDA Tools
                                  GPU:card, system
                                                                  +        !quot;#$#%&    '()*++(#,,*-./01-
                                   Multicore CPU




                                    4 cores

                      M02: High Performance Computing with CUDA




                            CUDA Many-core + Multi-core support

                                 C CUDA Application
                                   Many-core path:   NVCC -> Many-core PTX code -> PTX to Target Compiler -> Many-core
                                   Multi-core path:  NVCC --multicore -> Multi-core CPU C code -> gcc and MSVC -> Multi-core

                        CUDA Compiler: nvcc

                             Any source file containing CUDA language extensions (.cu)
                             must be compiled with nvcc

                             NVCC is a compiler driver
                                   Works by invoking all the necessary tools and compilers like
                                   cudacc, g++, cl, ...

                             NVCC can output:
                                   Either C code (CPU Code)
                                        That must then be compiled with the rest of the application using another tool
                                   Or PTX or object code directly

                             An executable with CUDA code requires:
                                   The CUDA core library (cuda)
                                   The CUDA runtime library (cudart)



                        CUDA Compiler: nvcc

                             Important flags:

                                   -arch sm_13                    Enable double precision (on compatible hardware)

                                   -G                             Enable debug for device code

                                   --ptxas-options=-v             Show register and memory usage

                                   --maxrregcount <N>             Limit the number of registers

                                   -use_fast_math                 Use fast math library



                        Compiling CUDA for Multi-Core
                                 Using the "--multicore" compile switch with the NVCC compiler
                                 generates C code for multi-core CPU

                                 Performance scales linearly with more cores

                                 Control the number of cores with the environment variable
                                 CUDA_NROF_CORES=n

                                 C/C++ CUDA Application -> NVCC --multicore -> Multicore CPU C Code
                                 -> gcc / MSVC -> Multicore Optimized Application


                        GPU Tools

                             Profiler
                                   Available now for all supported OSs
                                   Command-line or GUI
                                   Sampling signals on GPU for:
                                        Memory access parameters
                                        Execution (serialization, divergence)
                             Debugger
                                   Runs on the GPU
                             Emulation mode
                                   Compile and execute in emulation on CPU
                                   Allows CPU-style debugging in GPU source







                                               CUDA
                                                 API





                      • The CUDA API consists of three parts:
                            – The host API
                            – The device API
                            – The common API




                                                       Johns Hopkins University
                                                       Copyright © Matthew Bolitho





                      • The CUDA Host API provides functions for:
                            – Device management
                            – Memory management
                            – Stream management
                            – Event management
                            – Texture management
                            – OpenGL/DirectX interoperability



                      • The Host API is exposed as two different APIs:
                            – The low level Device API (prefix: cu)
                            – The high level Runtime API (prefix: cuda)

                      • Some things can be done through both APIs, others are specialized
                      • Can be mixed together (with care)


                      • All GPU computing is performed on a device
                      • To allocate memory, run a program, etc on the hardware, we need a device context
                      • Device contexts are bound 1:1 with host threads (just like OpenGL)
                            – So, each host thread may have at most one device context
                            – And, each device context is accessible from only one host thread

                      • All device API calls return an error/success code of type: CUresult
                      • All runtime API calls return an error/success code of type cudaError_t
                      • An integer value with zero = no error
                      • cudaGetLastError, cudaGetErrorString
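
As a hedged illustration of the error codes above, the sketch below checks a runtime API return value and prints its description. The helper names `cudaOk` and `provokeError` are hypothetical and were introduced only for this example; only `cudaSetDevice`, `cudaGetLastError` and `cudaGetErrorString` come from the runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the slides): returns true on success,
// otherwise prints the runtime's description of the error code.
static bool cudaOk(cudaError_t err, const char* what)
{
    if (err == cudaSuccess) return true;   // cudaSuccess is zero: no error
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Deliberately asks for an invalid device ordinal so the call reports an
// error. On a machine without a usable device the call also fails, possibly
// with a different (still nonzero) code.
static cudaError_t provokeError()
{
    cudaError_t err = cudaSetDevice(-1);
    cudaOk(err, "cudaSetDevice(-1)");
    cudaGetLastError();                    // reads and clears the stored error
    return err;
}
```

Wrapping every call this way keeps failures from going unnoticed; kernel launches return nothing, so `cudaGetLastError` is the usual way to pick up their launch errors.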


                      • Runtime API calls automatically initialize
                      • Device API calls must call cuInit





                      • The first (optional) step is to enumerate the available devices
                            – cuDeviceGetCount
                            – cuDeviceGet
                            – cuDeviceGetName
                            – cuDeviceTotalMem
                            – cuDeviceGetAttribute
                            – ...
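
The enumeration step could look roughly like the sketch below, which uses only the device API calls listed above. The function name `listDevices` is an assumption for this example, and the signatures follow current toolkits (for instance `cuDeviceTotalMem` takes a `size_t*`), which may differ from the 2009 headers. Link against the driver library (`-lcuda`).

```cuda
#include <cstdio>
#include <cuda.h>   // device (driver) API

// Hypothetical helper: prints one line per device and returns the device
// count, or -1 if the driver could not be initialized or queried.
static int listDevices()
{
    if (cuInit(0) != CUDA_SUCCESS) return -1;   // must precede other cu* calls

    int count = 0;
    if (cuDeviceGetCount(&count) != CUDA_SUCCESS) return -1;

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        char name[256];
        size_t totalBytes = 0;
        int multiprocessors = 0;

        if (cuDeviceGet(&dev, i) != CUDA_SUCCESS) continue;
        if (cuDeviceGetName(name, (int)sizeof(name), dev) != CUDA_SUCCESS) continue;
        if (cuDeviceTotalMem(&totalBytes, dev) != CUDA_SUCCESS) continue;
        if (cuDeviceGetAttribute(&multiprocessors,
                                 CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
                                 dev) != CUDA_SUCCESS) continue;

        std::printf("device %d: %s, %zu bytes, %d multiprocessors\n",
                    i, name, totalBytes, multiprocessors);
    }
    return count;
}
```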

                      • Once we choose a device with cuDeviceGet we get a device handle of type CUdevice
                      • Can now create a context with cuCtxCreate





                      • Runtime API provides a simplified interface for creating a context:
                            – cudaGetDeviceCount
                            – cudaSetDevice
                      • And the useful:
                            – cudaChooseDevice
                        Device Management


                             CPU can query and select GPU devices
                                   cudaGetDeviceCount( int* count )
                                   cudaSetDevice( int device )
                                   cudaGetDevice( int *current_device )
                                   cudaGetDeviceProperties( cudaDeviceProp* prop,
                                                               int device )
                                   cudaChooseDevice( int *device, cudaDeviceProp* prop )

                             Multi-GPU setup:
                                   device 0 is used by default
                                   one CPU thread can control one GPU
                                        multiple CPU threads can control the same GPU

                                          – calls are serialized by the driver
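
A short sketch of the query-and-select calls listed above, using the runtime API. The helper name `pickFirstDevice` and the choice of device 0 are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the properties of every device, selects
// device 0, and returns the ordinal reported by cudaGetDevice
// (or -1 if no device is usable).
static int pickFirstDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return -1;

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        std::printf("device %d: %s, compute capability %d.%d, %d multiprocessors\n",
                    d, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount);
    }

    if (cudaSetDevice(0) != cudaSuccess) return -1;   // device 0 is the default anyway

    int current = -1;
    if (cudaGetDevice(&current) != cudaSuccess) return -1;
    return current;
}
```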



                      • Once we have a context (CUcontext), we can allocate memory, call a GPU function, etc.
                      • Context is implicitly associated with the creating thread

                      • To synchronize all threads (CPU host with GPU threads) call cuCtxSynchronize
                      • Waits for all GPU tasks to finish


                      • Allocate/Free memory:
                            – cuMemAlloc, cuMemFree

                      • Initialize memory:
                            – cuMemset

                      • Copy memory:
                            – cuMemcpyHtoD, cuMemcpyDtoH, cuMemcpyDtoD

                      • When allocating memory for the host, can use malloc / new / mmap
                      • Or use cuMemAllocHost, cuMemFreeHost

                      • These functions allocate host memory that is page-locked

                      • Performance improved for copy to/from page-locked host memory
                      • Allocate/Free memory:
                            – cudaMalloc, cudaFree

                      • Initialize memory:
                            – cudaMemset

                      • Copy memory:
                            – cudaMemcpy
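
The four runtime calls above fit together as in the following sketch: allocate on the device, initialize, copy back, free. The helper name `memsetRoundTrip` and the byte pattern are assumptions for this example.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: fills a device buffer with a byte pattern, copies it
// back to the host and checks every byte. Returns false if any CUDA call
// fails or the data does not match. Assumes n > 0.
static bool memsetRoundTrip(size_t n)
{
    unsigned char* d_buf = nullptr;
    if (cudaMalloc((void**)&d_buf, n) != cudaSuccess) return false;

    std::vector<unsigned char> h_buf(n, 0);
    bool ok = cudaMemset(d_buf, 0x5A, n) == cudaSuccess
           && cudaMemcpy(h_buf.data(), d_buf, n,
                         cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_buf);   // always release the device allocation

    for (size_t i = 0; ok && i < n; ++i) ok = (h_buf[i] == 0x5A);
    return ok;
}
```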


                      • cuMemAlloc, etc allocate linear memory
                      • Can also allocate array memory (2D)

                      • Arrays are created with a specific width and height and element type
                      • Memory layout is optimized (e.g. packing) by runtime

                      • cuArrayCreate, cuArrayDestroy
                      • cuMemcpyDtoA, cuMemcpyHtoA, etc


                      • A module is a blob of GPU code/data along with some type information
                            – .cubin files

                      • A module is created by loading a cubin with cuModuleLoad or cuModuleLoadData

                      • Module can be unloaded with cuModuleUnload

                      • Loading a module also copies it to the device

                      • Can then get the address of functions and global variables:
                            – cuModuleGetFunction
                            – cuModuleGetGlobal
                            – cuModuleGetTexRef



                      • Once a module is loaded, and we have a function pointer, we can call a function

                      • We must set up the execution environment first





                      • Execution environment includes:
                            – Thread Block Size
                            – Shared Memory Size
                            – Function Parameters
                            – Grid Size





                      • Thread Block Size:
                            – cuFuncSetBlockShape

                      • Shared Memory Size:
                            – cuFuncSetSharedSize

                      • Function Parameters:
                            – cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv

                      • Grid Size is set at the same time as the function invocation:
                            – cuLaunchGrid





                      • Recall: In .cu files, we can use the <<< >>> function invocation "operator"

                      • The compiler generates all the device API calls needed to set up the execution environment
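
For illustration, a runtime API launch in which the execution environment (grid size, block size, shared memory size, parameters) is written with the `<<< >>>` syntax. The kernel `scale` and the helper `scaleOnDevice` are assumptions made up for this sketch.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread scales one element.
__global__ void scale(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: multiplies every element of `host` by `factor` on the
// device. Returns false if any CUDA call fails. Assumes `host` is non-empty.
static bool scaleOnDevice(std::vector<float>& host, float factor)
{
    int n = (int)host.size();
    size_t bytes = n * sizeof(float);
    float* d_data = nullptr;
    if (cudaMalloc((void**)&d_data, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d_data, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(256);                           // thread block size
        dim3 grid((n + block.x - 1) / block.x);    // grid size
        size_t sharedBytes = 0;                    // dynamic shared memory size
        scale<<<grid, block, sharedBytes>>>(d_data, factor, n);   // parameters
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(host.data(), d_data, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    return ok;
}
```

The four items inside and after the angle brackets correspond one-to-one to the execution environment that the device API sets up with separate calls.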





                      • A stream is a sequence of operations that occur in order. E.g.
                            1. Copy data from host to device
                            2. Execute device function
                            3. Copy data from device to host





                      • A stream is a sequence of operations that occur in order

                      • Different streams can be used to manage concurrency. E.g.
                        Overlapping memory copy from one stream
                        with the function execution from another
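
A sketch of two streams, each running the copy, execute, copy sequence on its own half of a buffer, written with the runtime API. The names `addOne` and `twoStreamAddOne` are assumptions for this example; whether copies and kernels actually overlap depends on the hardware. `cudaDeviceSynchronize` is the current name of the call that waits for all device work.

```cuda
#include <cuda_runtime.h>

__global__ void addOne(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: processes two chunks of nPerStream floats, one chunk
// per stream. Returns true if every element was incremented exactly once.
static bool twoStreamAddOne(int nPerStream)
{
    const int kStreams = 2;
    const int total = kStreams * nPerStream;
    size_t chunkBytes = nPerStream * sizeof(float);

    float* h = nullptr;   // page-locked host memory, needed for async copies
    float* d = nullptr;
    if (cudaMallocHost((void**)&h, kStreams * chunkBytes) != cudaSuccess) return false;
    if (cudaMalloc((void**)&d, kStreams * chunkBytes) != cudaSuccess) {
        cudaFreeHost(h);
        return false;
    }
    for (int i = 0; i < total; ++i) h[i] = (float)i;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    int threads = 256;
    int blocks = (nPerStream + threads - 1) / threads;
    for (int s = 0; s < kStreams; ++s) {
        float* hs = h + s * nPerStream;
        float* ds = d + s * nPerStream;
        // Operations within one stream run in order; the two streams are
        // independent of each other and may overlap.
        cudaMemcpyAsync(ds, hs, chunkBytes, cudaMemcpyHostToDevice, streams[s]);
        addOne<<<blocks, threads, 0, streams[s]>>>(ds, nPerStream);
        cudaMemcpyAsync(hs, ds, chunkBytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    bool ok = cudaDeviceSynchronize() == cudaSuccess;   // wait for both streams
    for (int i = 0; ok && i < total; ++i) ok = (h[i] == (float)i + 1.0f);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}
```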



                      • Events are a way of determining the progress of a stream
                      • Events provide a "marker" in a stream at a specific position
                      • A holder of an event handle can:
                            – Wait for an event to occur
                            – Measure the time that occurred between two events
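
Both uses of an event handle can be sketched with the runtime API: record two events around some device work, wait for the second one, then measure the time between them. The helper name `timeMemsetMs` and the choice of `cudaMemset` as the timed work are assumptions for this example.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns the time in milliseconds the device spent on a
// cudaMemset of `bytes` bytes, or -1.0f if a CUDA call fails.
static float timeMemsetMs(size_t bytes)
{
    void* d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);       // marker before the work (default stream)
    cudaMemset(d_buf, 0, bytes);
    cudaEventRecord(stop, 0);        // marker after the work

    float ms = -1.0f;
    if (cudaEventSynchronize(stop) != cudaSuccess ||          // wait for the event
        cudaEventElapsedTime(&ms, start, stop) != cudaSuccess) {
        ms = -1.0f;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    return ms;
}
```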




                                               CUDA
       Execution and Threading Model




                      Execution Model

                        Software        Hardware
                        Thread          Thread Processor
                        Thread Block    Multiprocessor
                        Grid            Device

                        Threads are executed by thread processors

                        Thread blocks are executed on multiprocessors

                        Thread blocks do not migrate

                        Several concurrent thread blocks can reside on one multiprocessor,
                        limited by multiprocessor resources (shared memory and register file)

                        A kernel is launched as a grid of thread blocks

                        Only one kernel can execute on a device at one time

                                                                       © 2008 NVIDIA Corporation.




                     CUDA Uses Extensive Multithreading

                  • CUDA threads express fine-grained data parallelism
                            – Map threads to GPU threads or CPU vector elements
                            – Virtualize the processors
                            – You must rethink your algorithms to be aggressively parallel

                  • CUDA thread blocks express coarse-grained parallelism
                            – Map blocks to GPU thread arrays or CPU threads
                            – Scale transparently to any number of processors

                  • GPUs execute thousands of lightweight threads
                            – One DX10 graphics thread computes one pixel fragment
                            – One CUDA thread computes one result (or several results)
                            – Provide hardware multithreading & zero-overhead scheduling



                        CUDA Programming Model


                             Parallel code (kernel) is launched and executed on a
                             device by many threads
                             Threads are grouped into thread blocks
                             Parallel code is written for a thread
                                   Each thread is free to execute a unique code path
                                   Built-in thread and block ID variables




                        Thread Hierarchy


                             Threads launched for a parallel section are
                             partitioned into thread blocks
                                   Grid = all blocks for a given launch
                             Thread block is a group of threads that can:
                                   Synchronize their execution
                                   Communicate via shared memory
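
A minimal sketch of both capabilities: the threads of one block exchange values through shared memory and synchronize before reading them. The kernel `reverseInBlock`, the helper `reverseWithinBlocks`, and the sizes are assumptions chosen for this example.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 64

// Reverses the elements handled by each block. Threads communicate through
// the shared array and synchronize so that every write is visible before
// any thread reads.
__global__ void reverseInBlock(int* data)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;

    tile[t] = data[base + t];
    __syncthreads();                       // barrier for the whole block
    data[base + t] = tile[blockDim.x - 1 - t];
}

// Hypothetical helper: runs the kernel on 4 blocks and checks the result.
static bool reverseWithinBlocks()
{
    const int kBlocks = 4;
    const int n = kBlocks * BLOCK_SIZE;
    int h[n];
    for (int i = 0; i < n; ++i) h[i] = i;

    int* d = nullptr;
    if (cudaMalloc((void**)&d, sizeof(h)) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverseInBlock<<<kBlocks, BLOCK_SIZE>>>(d);
        ok = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);

    // Element t of block b must now hold the value that was at position
    // (BLOCK_SIZE - 1 - t) of the same block.
    for (int b = 0; ok && b < kBlocks; ++b)
        for (int t = 0; ok && t < BLOCK_SIZE; ++t)
            ok = (h[b * BLOCK_SIZE + t] == b * BLOCK_SIZE + (BLOCK_SIZE - 1 - t));
    return ok;
}
```

Threads in different blocks never touch each other's `tile`, which is why the reversal stays local to each block.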




                        IDs and Dimensions

                            Threads:
                                3D IDs, unique within a block
                            Blocks:
                                2D IDs, unique within a grid
                            Dimensions set at launch time
                                Can be different for each grid (kernel launch)
                            Built-in variables:
                                threadIdx, blockIdx
                                blockDim, gridDim

                            [Figure: a Device running Grid 1, a 3 x 2 array of blocks, Block (0, 0) to Block (2, 1); Block (1, 1) is expanded into a 5 x 3 array of threads, Thread (0, 0) to Thread (4, 2).]
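
The built-in variables can be combined into a global position for each thread. The following is a minimal sketch, not code from the lecture: the kernel name `fill_global_index`, the helper `run_index_demo`, and the 13 x 5 array size are assumptions chosen so that a 5 x 3 block gives a 3 x 2 grid like the one in the figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread derives its global (x, y) position from the built-in variables
// and stores the flattened index at that position.
__global__ void fill_global_index(int *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)   // the grid may overshoot the array edges
        out[y * width + x] = y * width + x;
}

// Hypothetical helper: launches the kernel and verifies the result on the host.
bool run_index_demo()
{
    const int width = 13, height = 5, n = width * height;
    int h_out[n];
    int *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) return false;

    dim3 block(5, 3);                               // blockDim: 5 x 3 threads
    dim3 grid((width + block.x - 1) / block.x,      // gridDim.x = 3
              (height + block.y - 1) / block.y);    // gridDim.y = 2
    fill_global_index<<<grid, block>>>(d_out, width, height);

    // cudaMemcpy waits for the kernel on the default stream to finish.
    bool ok = cudaMemcpy(h_out, d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == i);
    return ok;
}

int main()
{
    bool ok = run_index_demo();
    printf("index demo %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads fall outside the array.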



                  Programming Model

                            A kernel is executed as a grid of thread blocks

                            A thread block is a batch of threads that can
                            cooperate with each other by:
                                     Sharing data through shared memory
                                     Synchronizing their execution

                            Threads from different blocks cannot cooperate

                            [Figure: the Host launches Kernel 1 and Kernel 2; on the Device, Kernel 1 runs as Grid 1 (a 3 x 2 array of blocks) and Kernel 2 runs as Grid 2; Block (1, 1) is expanded into a 5 x 3 array of threads.]
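
Cooperation inside a block can be illustrated with a small sketch in which every thread reads a value that another thread of the same block placed in shared memory. The names `reverse_within_block`, `run_reverse_demo`, and `BLOCK_SIZE` are assumptions made for this example, not identifiers from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 64   // threads per block (an arbitrary choice for this sketch)

// Reverses the elements handled by each block, independently of other blocks.
__global__ void reverse_within_block(const int *in, int *out)
{
    __shared__ int tile[BLOCK_SIZE];           // visible to this block only
    int base = blockIdx.x * blockDim.x;
    int t = threadIdx.x;

    tile[t] = in[base + t];                    // each thread stages one element
    __syncthreads();                           // wait until the whole tile is written
    out[base + t] = tile[blockDim.x - 1 - t];  // read an element staged by another thread
}

// Hypothetical helper: runs the kernel on 4 blocks and checks the result.
bool run_reverse_demo()
{
    const int blocks = 4, n = blocks * BLOCK_SIZE;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = NULL, *d_out = NULL;
    if (cudaMalloc((void **)&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    reverse_within_block<<<blocks, BLOCK_SIZE>>>(d_in, d_out);
    bool ok = cudaMemcpy(h_out, d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int b = 0; ok && b < blocks; ++b)
        for (int t = 0; ok && t < BLOCK_SIZE; ++t)
            ok = (h_out[b * BLOCK_SIZE + t] == b * BLOCK_SIZE + (BLOCK_SIZE - 1 - t));
    return ok;
}

int main()
{
    bool ok = run_reverse_demo();
    printf("reverse demo %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

Without the `__syncthreads()` barrier a thread could read a tile entry before its owner has written it.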

                  © NVIDIA Corporation 2006




                        Blocks must be independent


                             Any possible interleaving of blocks should be valid
                                   presumed to run to completion without pre-emption
                                   can run in any order
                                   can run concurrently OR sequentially


                             Blocks may coordinate but not synchronize
                                   shared queue pointer: OK
                                   shared lock: BAD … can easily deadlock


                             Independence requirement gives scalability
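
The "shared queue pointer" case can be sketched as a global counter that threads advance atomically to claim work. No thread ever waits for another block, so the result is the same whether blocks run concurrently or one after another. This is an illustrative sketch only: `drain_queue`, `run_queue_demo`, and the sizes are assumptions, and `atomicAdd` on global memory requires hardware that supports global atomics.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Every thread repeatedly claims the next unprocessed item from a shared counter.
__global__ void drain_queue(unsigned int *next_item, int *items, unsigned int n_items)
{
    for (;;) {
        unsigned int i = atomicAdd(next_item, 1u);  // claim one item index
        if (i >= n_items) break;                    // queue is empty
        items[i] += 1;                              // "process" the claimed item
    }
}

// Hypothetical helper: every item must end up processed exactly once.
bool run_queue_demo()
{
    const unsigned int n_items = 10000;
    unsigned int *d_next = NULL;
    int *d_items = NULL;
    if (cudaMalloc((void **)&d_next, sizeof(unsigned int)) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_items, n_items * sizeof(int)) != cudaSuccess) {
        cudaFree(d_next);
        return false;
    }
    cudaMemset(d_next, 0, sizeof(unsigned int));
    cudaMemset(d_items, 0, n_items * sizeof(int));

    drain_queue<<<8, 64>>>(d_next, d_items, n_items);

    std::vector<int> h_items(n_items);
    bool ok = cudaMemcpy(&h_items[0], d_items, n_items * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_next);
    cudaFree(d_items);
    for (unsigned int i = 0; ok && i < n_items; ++i) ok = (h_items[i] == 1);
    return ok;
}

int main()
{
    bool ok = run_queue_demo();
    printf("queue demo %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

A lock shared between blocks would differ in one key way: a block spinning on the lock may be waiting for a block that has not been scheduled yet, which is how the deadlock mentioned on the slide arises.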



                         Hardware Multithreading

                                      Hardware allocates resources to blocks
                                            blocks need: thread slots, registers, shared memory
                                            blocks don’t run until resources are available
                                      Hardware schedules threads
                                            threads have their own registers
                                            any thread not waiting for something can run
                                            context switching is free – every cycle
                                      Hardware relies on threads to hide latency
                                            i.e., parallelism is necessary for performance

                                      [Figure: SM block diagram with the labels MT IU, SP, and Shared Memory.]
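
The per-block resource limits that the hardware enforces can be read from the runtime. The sketch below uses `cudaGetDeviceProperties`; the helper name `print_block_limits` and the choice of device 0 are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the limits that bound how blocks fit on the device.
bool print_block_limits(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;

    printf("device:                  %s\n", prop.name);
    printf("multiprocessors (SMs):   %d\n", prop.multiProcessorCount);
    printf("max threads per block:   %d\n", prop.maxThreadsPerBlock);
    printf("registers per block:     %d\n", prop.regsPerBlock);
    printf("shared memory per block: %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);
    printf("warp size:               %d\n", prop.warpSize);
    return true;
}

int main()
{
    return print_block_limits(0) ? 0 : 1;
}
```

A kernel whose blocks use fewer registers and less shared memory than these limits leaves room for more resident blocks, which gives the scheduler more threads to switch between while others wait.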


                         SIMT Thread Execution

                                      Groups of 32 threads formed into warps
                                            always executing same instruction
                                            shared instruction fetch/dispatch
                                            some become inactive when code path diverges
                                            hardware automatically handles divergence
                                      Warps are the primitive unit of scheduling
                                      SIMT execution is an implementation choice
                                            sharing control logic leaves more space for ALUs
                                            largely invisible to programmer
                                            must understand for performance, not correctness

                                      [Figure: SM block diagram with the labels MT IU, SP, and Shared Memory.]
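
To make divergence concrete, the sketch below contrasts a branch that splits the threads of each warp with a branch that is uniform across each warp. Both kernels produce correct results, which is the sense in which divergence is a performance concern rather than a correctness one. The kernel and helper names are assumptions made for this example, and no timings are claimed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Even and odd threads of the same warp take different paths (divergent).
__global__ void branch_per_thread(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0) out[i] = 2 * i;
    else                      out[i] = -i;
}

// All threads of a warp take the same path (no divergence inside a warp).
__global__ void branch_per_warp(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ((threadIdx.x / warpSize) % 2 == 0) out[i] = 2 * i;
    else                                   out[i] = -i;
}

// Hypothetical helper: checks both kernels against a host reference.
bool run_divergence_demo()
{
    const int threads = 128, blocks = 4, n = threads * blocks;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    const int ws = prop.warpSize;   // warp size as reported by the device

    int h_a[n], h_b[n];
    int *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) return false;

    branch_per_thread<<<blocks, threads>>>(d_out);
    bool ok = cudaMemcpy(h_a, d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    branch_per_warp<<<blocks, threads>>>(d_out);
    ok = ok && cudaMemcpy(h_b, d_out, n * sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);

    for (int i = 0; ok && i < n; ++i) {
        int t = i % threads;        // threadIdx.x of the thread that wrote element i
        ok = (h_a[i] == ((t % 2 == 0) ? 2 * i : -i)) &&
             (h_b[i] == (((t / ws) % 2 == 0) ? 2 * i : -i));
    }
    return ok;
}

int main()
{
    bool ok = run_divergence_demo();
    printf("divergence demo %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```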




                      Transparent Scalability

                                 Hardware is free to schedule thread blocks on any processor
                                         A kernel scales across parallel multiprocessors

                                 [Figure: a kernel grid of eight blocks, Block 0 to Block 7, mapped onto two devices. One device runs the blocks two at a time (0-1, 2-3, 4-5, 6-7); the other runs them four at a time (0-3, 4-7).]
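
In code, transparent scalability means the launch is sized by the problem and never mentions the number of multiprocessors. The sketch below launches eight blocks, as in the figure, and prints the multiprocessor count only to show that the kernel does not depend on it. The names `scale` and `run_scaling_demo` are assumptions for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Multiplies every element by a; each block handles its own chunk.
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Hypothetical helper: same launch and same result on any number of SMs.
bool run_scaling_demo()
{
    const int threads = 128, blocks = 8, n = threads * blocks;
    float h_x[n];
    for (int i = 0; i < n; ++i) h_x[i] = (float)i;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    printf("%d blocks scheduled across %d multiprocessors\n",
           blocks, prop.multiProcessorCount);

    float *d_x = NULL;
    if (cudaMalloc((void **)&d_x, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<blocks, threads>>>(d_x, 2.0f, n);
    bool ok = cudaMemcpy(h_x, d_x, n * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_x);

    // Small integers doubled in float are exact, so equality is safe here.
    for (int i = 0; ok && i < n; ++i) ok = (h_x[i] == 2.0f * (float)i);
    return ok;
}

int main()
{
    bool ok = run_scaling_demo();
    printf("scaling demo %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```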


                                                                                       © 2008 NVIDIA Corporation.




IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

More Related Content

Similar to IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Formation des internes de santé publique.
Formation des internes de santé publique.Formation des internes de santé publique.
Formation des internes de santé publique.Réseau Pro Santé
 
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...GPFLR
 
Filtrowanie treści - dylematy operatorów serwisów społecznościowych
Filtrowanie treści - dylematy operatorów serwisów społecznościowychFiltrowanie treści - dylematy operatorów serwisów społecznościowych
Filtrowanie treści - dylematy operatorów serwisów społecznościowychCendoo
 
Wierzbowski
WierzbowskiWierzbowski
Wierzbowskicendoo1
 
The Mythology of Big Data
The Mythology of Big DataThe Mythology of Big Data
The Mythology of Big Datamark madsen
 
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμούη νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμούKostas Panagio
 
Plone Foundation Annual Meeting, Budapest 2009
Plone Foundation Annual Meeting, Budapest 2009Plone Foundation Annual Meeting, Budapest 2009
Plone Foundation Annual Meeting, Budapest 2009Steve McMahon
 
Mario chaves
Mario chavesMario chaves
Mario chavesAvantica
 
Microformats, Building Blocks of the Semantic Web
Microformats, Building Blocks of the Semantic WebMicroformats, Building Blocks of the Semantic Web
Microformats, Building Blocks of the Semantic WebChris Griego
 
Capitulo 3.9 Gestion Tributaria Mipro ERP
Capitulo 3.9 Gestion Tributaria Mipro ERPCapitulo 3.9 Gestion Tributaria Mipro ERP
Capitulo 3.9 Gestion Tributaria Mipro ERPDeath User
 
Contrail Project, OW2con11, Nov 24-25, Paris
Contrail Project, OW2con11, Nov 24-25, ParisContrail Project, OW2con11, Nov 24-25, Paris
Contrail Project, OW2con11, Nov 24-25, ParisOW2
 

Similar to IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT) (20)

OpenSSO Microsoft Interop
OpenSSO Microsoft InteropOpenSSO Microsoft Interop
OpenSSO Microsoft Interop
 
Formation des internes de santé publique.
Formation des internes de santé publique.Formation des internes de santé publique.
Formation des internes de santé publique.
 
OSGi - beyond the myth
OSGi -  beyond the mythOSGi -  beyond the myth
OSGi - beyond the myth
 
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
An Effort to Restore from Imperata Grassland to Secondary Forest in Samboja L...
 
Social Radar ROI
Social Radar ROISocial Radar ROI
Social Radar ROI
 
Ph 35
Ph 35Ph 35
Ph 35
 
Harpo
HarpoHarpo
Harpo
 
Filtrowanie treści - dylematy operatorów serwisów społecznościowych
Filtrowanie treści - dylematy operatorów serwisów społecznościowychFiltrowanie treści - dylematy operatorów serwisów społecznościowych
Filtrowanie treści - dylematy operatorów serwisów społecznościowych
 
Wierzbowski
WierzbowskiWierzbowski
Wierzbowski
 
The Mythology of Big Data
The Mythology of Big DataThe Mythology of Big Data
The Mythology of Big Data
 
Salah Prez Cogifactory
Salah Prez CogifactorySalah Prez Cogifactory
Salah Prez Cogifactory
 
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμούη νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
η νεκρανάσταση του πιο βρώμικου αντικομμουνισμού
 
Plone Foundation Annual Meeting, Budapest 2009
Plone Foundation Annual Meeting, Budapest 2009Plone Foundation Annual Meeting, Budapest 2009
Plone Foundation Annual Meeting, Budapest 2009
 
Mario chaves
Mario chavesMario chaves
Mario chaves
 
Microformats, Building Blocks of the Semantic Web
Microformats, Building Blocks of the Semantic WebMicroformats, Building Blocks of the Semantic Web
Microformats, Building Blocks of the Semantic Web
 
Capitulo 3.9 Gestion Tributaria Mipro ERP
Capitulo 3.9 Gestion Tributaria Mipro ERPCapitulo 3.9 Gestion Tributaria Mipro ERP
Capitulo 3.9 Gestion Tributaria Mipro ERP
 
Pres Project
Pres ProjectPres Project
Pres Project
 
Ph 2
Ph 2Ph 2
Ph 2
 
Contrail Project, OW2con11, Nov 24-25, Paris
Contrail Project, OW2con11, Nov 24-25, ParisContrail Project, OW2con11, Nov 24-25, Paris
Contrail Project, OW2con11, Nov 24-25, Paris
 
hdemia1
hdemia1hdemia1
hdemia1
 

More from npinto

"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)npinto
 
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...npinto
 
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...npinto
 
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...npinto
 
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)npinto
 
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...npinto
 
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...npinto
 
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...npinto
 
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...npinto
 
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...npinto
 
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...npinto
 
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...npinto
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...npinto
 
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...npinto
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)npinto
 
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)npinto
 
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...npinto
 
[Harvard CS264] 05 - Advanced-level CUDA Programming
[Harvard CS264] 05 - Advanced-level CUDA Programming[Harvard CS264] 05 - Advanced-level CUDA Programming
[Harvard CS264] 05 - Advanced-level CUDA Programmingnpinto
 
[Harvard CS264] 04 - Intermediate-level CUDA Programming
[Harvard CS264] 04 - Intermediate-level CUDA Programming[Harvard CS264] 04 - Intermediate-level CUDA Programming
[Harvard CS264] 04 - Intermediate-level CUDA Programmingnpinto
 
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basicsnpinto
 

More from npinto (20)

"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)"AI" for Blockchain Security (Case Study: Cosmos)
"AI" for Blockchain Security (Case Study: Cosmos)
 
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
 
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
 
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
 
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
 
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
 
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
 
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
 
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
 
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
 
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
 
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
 
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
 
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
 
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
 
[Harvard CS264] 05 - Advanced-level CUDA Programming
[Harvard CS264] 05 - Advanced-level CUDA Programming[Harvard CS264] 05 - Advanced-level CUDA Programming
[Harvard CS264] 05 - Advanced-level CUDA Programming
 
[Harvard CS264] 04 - Intermediate-level CUDA Programming
[Harvard CS264] 04 - Intermediate-level CUDA Programming[Harvard CS264] 04 - Intermediate-level CUDA Programming
[Harvard CS264] 04 - Intermediate-level CUDA Programming
 
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
 

Recently uploaded

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

  • 1. IAP09 CUDA@MIT / 6.963. Supercomputing on your desktop: Programming the next generation of cheap and massively parallel hardware using CUDA. Lecture 03, Nicolas Pinto (MIT). CUDA - Basics #2. Tuesday, January 13, 2009
  • 2. During this course, we’ll try to “ ” and use existing material ;-) (adapted for 6.963)
  • 3. Today: yey!!
  • 4. Language; Compilation; API; Threading Model; Memory Model
  • 5. CUDA Language
  • 6. [Language] CUDA defines a language that is similar to C/C++. Allows programmers to easily move existing code to CUDA. Lessens learning curve. ([…] Spring 2008, Johns Hopkins University, Copyright © Matthew Bolitho 2008)
  • 7. [Language] CUDA defines a language that is similar to C/C++. How is CUDA C/C++ different from standard C/C++?
  • 8. [Language] CUDA defines a language that is similar to C/C++. Syntactic extensions: Declaration Qualifiers; Built-in Variables; Built-in Types; Execution Configuration
  • 9. [Language] Declspec = declaration specifier / declaration qualifier. A modifier applied to declarations of: Variables; Functions. Examples: const, extern, static
  • 10. [Language] CUDA uses the following declaration qualifiers for variables: __device__; __shared__; __constant__. Only apply to global variables
  • 11. [Language] __device__: Declares that a global variable is stored on the device. The data resides in global memory. Has lifetime of the entire application. Accessible to all GPU threads. Accessible to the CPU via API
  • 12. [Language] __shared__: Declares that a global variable is stored on the device. The data resides in shared memory. Has lifetime of the thread block. Accessible to all threads, one copy per thread block
  • 13. [Language] __shared__ (continued): If not declared as volatile, reads from different threads are not visible unless a synchronization barrier is used. Not accessible from CPU
  • 14. [Language] __constant__: Declares that a global variable is stored on the device. The data resides in constant memory. Has lifetime of entire application. Accessible to all GPU threads (read only). Accessible to CPU via API (read-write)
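The three variable qualifiers from the preceding slides (`__device__`, `__shared__`, `__constant__`) can be sketched in one small program. This is only an illustration under stated assumptions: the names `d_scale`, `c_offset`, `scale_and_reverse` and the host helper `run_scale_and_reverse` are hypothetical, the runtime API call `cudaMemcpyToSymbol` stands in for "accessible to the CPU via API", and a CUDA-capable GPU is assumed to be present.

```cuda
#include <cuda_runtime.h>

__device__   float d_scale = 2.0f;   // global memory, lives for the whole application
__constant__ float c_offset;         // constant memory, read-only from device code

// Scales each element, then reverses the block's 64 values through shared memory.
__global__ void scale_and_reverse(float *data)
{
    __shared__ float tile[64];       // one copy per thread block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = data[i] * d_scale + c_offset;
    __syncthreads();                 // barrier: all writes to tile are now visible
    data[i] = tile[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper: runs one block of 64 threads on 0..63, returns element 0.
float run_scale_and_reverse(float offset)
{
    const int n = 64;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = 0;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_offset, &offset, sizeof(float)); // host writes constant memory
    scale_and_reverse<<<1, n>>>(d);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[0];                     // holds the transformed value of input element 63
}
```

Calling `run_scale_and_reverse(1.0f)` reads back the value computed from the last input element, since the block reverses its tile through shared memory.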
  • 15. [Language] CUDA uses the following declspecs for functions: __device__; __host__; __global__
  • 16. [Language] __device__: Declares that a function is compiled to, and executes on the device. Callable only from another function on the device
  • 17. [Language] __host__: Declares that a function is compiled to and executes on the host. Callable only from the host. Functions without any CUDA declspec are host by default. Can use __host__ and __device__ together
  • 18. [Language] __global__: Declares that a function is compiled to and executes on the device. Callable from the host. Used as the entry point from host to device
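A minimal sketch of the three function qualifiers, assuming a CUDA-capable GPU and the runtime API. All function names here (`square`, `square_plus_one`, `apply`, `run_apply`) are hypothetical and exist only for illustration.

```cuda
#include <cuda_runtime.h>

// Compiled for both the host and the device.
__host__ __device__ float square(float x) { return x * x; }

// Device-only helper: callable only from code running on the device.
__device__ float square_plus_one(float x) { return square(x) + 1.0f; }

// Kernel: the entry point from host to device.
__global__ void apply(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = square_plus_one(v[i]);
}

// Hypothetical host helper: sends one value through the kernel and returns the result.
float run_apply(float x)
{
    float *d = 0;
    cudaMalloc((void **)&d, sizeof(float));
    cudaMemcpy(d, &x, sizeof(float), cudaMemcpyHostToDevice);
    apply<<<1, 32>>>(d, 1);          // 32 threads launched, only thread 0 passes the bounds check
    float out = 0.0f;
    cudaMemcpy(&out, d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

The same `square` function is usable from ordinary host code, which is the point of combining `__host__` and `__device__`.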
  • 19. [Language] CUDA provides a set of built-in vector types: char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4; short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4; int1, uint1, int2, uint2, int3, uint3, int4, uint4; long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4; float1, float2, float3, float4
  • 20. [Language] Can construct a vector type with a special function: make_<type name>(…). Can access elements of a vector type with .x, .y, .z, .w
  • 21. [Language] dim3 is a special vector type. Same as uint3, except can be constructed from a scalar to form a vector: (scalar, 1, 1)
  • 22. [Language] CUDA provides four global, built-in variables: threadIdx, blockIdx, blockDim, gridDim […]. Accessible only from device code. Cannot take address. Cannot assign value
  • 23. [Language] CUDA provides syntactic sugar to launch the execution of kernels: Func<<<GridDim, BlockDim>>>(…); where Func is a __global__ function. The compiler turns this type of statement into a block of code that configures, and launches the kernel
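The launch syntax and the built-in variables can be seen together in a vector addition. This is a sketch, not code from the lecture: `vec_add` and `run_vec_add` are hypothetical names, the block size of 256 is an arbitrary choice, and a CUDA-capable GPU is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes one element; the built-in variables give its global index.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard: the last block may be partly idle
}

// Hypothetical host helper: adds a[i] = i and b[i] = 2i, returns the last sum.
float run_vec_add(int n)
{
    std::vector<float> a(n), b(n), c(n);
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    size_t bytes = n * sizeof(float);
    float *da = 0, *db = 0, *dc = 0;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(256);                         // dim3 from a scalar: (256, 1, 1)
    dim3 grid((n + block.x - 1) / block.x);  // enough blocks to cover n elements
    vec_add<<<grid, block>>>(da, db, dc, n); // Func<<<GridDim, BlockDim>>>(arguments)

    cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return c[n - 1];
}
```

Rounding the grid size up and guarding with `i < n` is a common way to handle sizes that are not a multiple of the block size.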
  • 24. [Language] CUDA defines a language that is similar to C/C++. Important differences: Runtime Library; Functions; Classes, Structs, Unions
  • 25. [Language] Code that runs on the device can't use normal C/C++ Runtime Library functions. No printf, fread, malloc, etc. Most math functions have device equivalent
  • 26. [Language] On a CUDA device, there is no stack. By default, all function calls are inlined. Can use __noinline__ to prevent (CUDA 1.1). All local variables, function arguments are stored in registers. No function recursion. No function pointers
  • 27. [Language] CUDA supports some C++ features for device code. E.g.: Template functions. Classes are supported inside .cu source, but must be host only. Structs/Unions work on device code as per C
  • 28. [Language] Common Runtime Component: Mathematical Functions. pow, sqrt, cbrt, hypot; exp, exp2, expm1; log, log2, log10, log1p; sin, cos, tan, asin, acos, atan, atan2; sinh, cosh, tanh, asinh, acosh, atanh; ceil, floor, trunc, round; etc. When executed on the host, a given function uses the C runtime implementation if available. These functions are only supported for scalar types, not vector types. (© David Kirk/NVIDIA and Wen-mei W. Hwu, Urbana, Illinois, August 18-22, 2008)
  • 29. [Language] Device Runtime Component: Mathematical Functions. Some mathematical functions (e.g. sin(x)) have a less accurate, but faster device-only version (e.g. __sin(x)): __pow; __log, __log2, __log10; __exp; __sin, __cos, __tan
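To see the accuracy trade-off concretely, the sketch below compares the standard and the fast device-only sine on the same inputs. Note that the slide writes `__sin(x)` as shorthand; in the CUDA toolkit the single-precision intrinsics are spelled with an `f` suffix (`__sinf`, `__cosf`, `__expf`, `__logf`, `__powf`). The names `sin_error` and `max_fast_sin_error` are hypothetical, and a CUDA-capable GPU is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Stores |sinf(x) - __sinf(x)| for each input: accurate version vs fast intrinsic.
__global__ void sin_error(const float *x, float *err, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) err[i] = fabsf(sinf(x[i]) - __sinf(x[i]));
}

// Hypothetical host helper: largest difference over n samples in [0, 1).
float max_fast_sin_error(int n)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (float)i / (float)n;

    size_t bytes = n * sizeof(float);
    float *dx = 0, *derr = 0;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&derr, bytes);
    cudaMemcpy(dx, h.data(), bytes, cudaMemcpyHostToDevice);
    sin_error<<<(n + 255) / 256, 256>>>(dx, derr, n);
    cudaMemcpy(h.data(), derr, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx); cudaFree(derr);

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) if (h[i] > worst) worst = h[i];
    return worst;
}
```

When the file is compiled with `-use_fast_math`, `sinf` is itself mapped to the fast intrinsic, so the reported difference would be zero.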
  • 30. CUDA Compilation
  • 31. [Compilation] CUDA source files end in .cu. Contain a mix of device and host code/data. Compiled by nvcc. nvcc is really a wrapper around a more complex compilation process
  • 32. [Compilation] Input: Normal .c, .cpp source files; CUDA .cu source code files. Output: Object/executable code for host; .cubin executable code for the device
  • 33. [Compilation] For .c and .cpp files, nvcc invokes the native C/C++ compiler for the system (eg: gcc/cl). For .cu files, it is a little more complicated…
  • 34. [Compilation] [Diagram of the nvcc compilation flow; recoverable stage names: cudafe, cpp, linker, nvopencc, ptxas, cubin]
  • 35. [Compilation] To see the steps performed by nvcc, use the --dryrun and --keep command line options
  • 36. [Compilation] How is a cubin linked with the rest of the program? Can be: Loaded as a file at runtime; Embedded in data segment; Embedded as a resource
  • 37. [Compilation] Programming is fun… until: The program crashes; It produces the wrong result. But, there are many debugging techniques: Debugging software (eg: gdb, Visual Studio); printf
  • 38. [Debug] CUDA programming is even less fun: There is no debugger; There is no printf. Debugging code on the device is very hard: Can try to write intermediate results to memory and copy back to host to examine; Emulation mode
  • 39. [Emu] By using a compiler flag, you can emulate by running all code on the host. Compiler flag: --device-emulation. Good for most debugging: can use gdb/printf. Not a true emulation: Race conditions, Memory model differences, etc
  • 40. [Emu] Device Emulation Mode Pitfalls. Emulated device threads execute sequentially, so simultaneous accesses of the same memory location by multiple threads could produce different results. Dereferencing device pointers on the host or host pointers on the device can produce correct results in device emulation mode, but will generate an error in device execution mode
  • 41. [Emu] Floating Point. Results of floating-point computations will slightly differ because of: Different compiler outputs, instruction sets; Use of extended precision for intermediate results. There are various options to force strict single precision on the host
  • 42. [Toolkit] CUDA Toolkit. Application Software (Industry Standard C Language). Libraries: cuFFT, cuBLAS, cuDPP. CUDA Compiler: […] Fortran. CUDA Tools: Debugger […]. GPU: card, system. Multicore CPU: 4 cores. (M02: High Performance Computing with CUDA)
  • 43. [Toolkit] CUDA Many-core + Multi-core support. C CUDA Application → NVCC: PTX code → PTX to Target Compiler → Many-core. C CUDA Application → NVCC --multicore: Multi-core CPU C code → gcc and MSVC → Multi-core
  • 44. [Toolkit] CUDA Compiler: nvcc. Any source file containing CUDA language extensions (.cu) must be compiled with nvcc. NVCC is a compiler driver: works by invoking all the necessary tools and compilers like cudacc, g++, cl, ... NVCC can output: either C code (CPU code), that must then be compiled with the rest of the application using another tool; or PTX or object code directly. An executable with CUDA code requires: the CUDA core library (cuda); the CUDA runtime library (cudart)
  • 45. [Toolkit] CUDA Compiler: nvcc. Important flags: -arch sm_13: enable double precision (on compatible hardware); -G: enable debug for device code; --ptxas-options=-v: show register and memory usage; --maxrregcount <N>: limit the number of registers; -use_fast_math: use fast math library
  • 46. [Toolkit] Compiling CUDA for Multi-Core. Using the “--multicore” compile switch with the NVCC compiler generates C code for multi-core CPU. Performance scales linearly with more cores. Control number of cores with environment variable CUDA_NROF_CORES=n. Flow: C/C++ CUDA Application → NVCC --multicore → Multicore CPU C Code → gcc / MSVC → Multicore Optimized Application
  • 47. [Toolkit] GPU Tools. Profiler: available now for all supported OSs; command-line or GUI; sampling signals on GPU for memory access parameters and execution (serialization, divergence). Debugger: runs on the GPU. Emulation mode: compile and execute in emulation on CPU; allows CPU-style debugging in GPU source
  • 48. CUDA API
  • 49. [API] The CUDA API consists of three parts: The host API; The device API; The common API
  • 50. [API] The CUDA host API provides functions for: Device management; Memory management; Stream management; Event management; Texture management; OpenGL/Direct3D interoperability
  • 51. [API] The host API is exposed as two different layers: The low level Device API (prefix: cu); The high level Runtime API (prefix: cuda). Some things can be done through both APIs, others are specialized. Can be mixed together (with care)
  • 52. [API] All GPU computing is performed on a device. To allocate memory, run a program, etc on the hardware, we need a device context. Device contexts are bound 1:1 with host threads (just like OpenGL). So, each host thread may have at most one device context. And, each device context is accessible from only one host thread
  • 53. [API] All device API calls return an error/success code of type: CUresult. All runtime API calls return an error/success code of type cudaError_t. An integer value with zero = no error. cudaGetLastError, cudaGetErrorString
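The error-code convention above suggests checking every runtime call. The following is a sketch under stated assumptions: `check`, `noop`, `launch_and_check` and `bad_launch_is_reported` are hypothetical helpers, and `cudaDeviceSynchronize` is a runtime call from toolkits newer than these slides. A CUDA-capable GPU is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reports a failed runtime call in readable form; returns true on success.
bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void noop() {}

// A kernel launch returns no code itself, so the launch error is fetched afterwards.
bool launch_and_check()
{
    noop<<<1, 1>>>();
    if (!check(cudaGetLastError(), "kernel launch")) return false;
    return check(cudaDeviceSynchronize(), "kernel execution");
}

// Zero threads per block is an invalid configuration; the error shows up in cudaGetLastError.
bool bad_launch_is_reported()
{
    noop<<<1, 0>>>();
    cudaError_t err = cudaGetLastError();   // fetching the error also clears it
    return err != cudaSuccess;
}
```

Checking both after the launch and after synchronization separates configuration errors from errors raised while the kernel runs.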
  • 54. [API] Runtime API calls automatically initialize. Device API calls must call cuInit
  • 55. [API] The first (optional?) step is to enumerate the available devices: cuDeviceGetCount; cuDeviceGet; cuDeviceGetName; cuDeviceTotalMem; cuDeviceGetAttribute; …
  • 56. [API] Once we choose a device with cuDeviceGet, we get a device handle of type CUdevice. Can now create a context with cuCtxCreate
  • 57. [API] Runtime API provides a simplified interface for creating a context: cudaGetDeviceCount; cudaSetDevice. And the useful: cudaChooseDevice
  • 58. [API] Device Management. CPU can query and select GPU devices: cudaGetDeviceCount( int* count ); cudaSetDevice( int device ); cudaGetDevice( int *current_device ); cudaGetDeviceProperties( cudaDeviceProp* prop, int device ); cudaChooseDevice( int *device, cudaDeviceProp* prop ). Multi-GPU setup: device 0 is used by default; one CPU thread can control one GPU; multiple CPU threads can control the same GPU (calls are serialized by the driver)
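A short sketch of the query-and-select calls listed above, using the runtime API. The helper names `list_devices` and `select_first_device` are hypothetical, and the example assumes at least one CUDA-capable GPU is installed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints one line per device and returns how many devices were found.
int list_devices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, compute capability %d.%d, %d multiprocessors\n",
               dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
    }
    return count;
}

// Selects device 0 explicitly and returns the device now current (-1 if none).
int select_first_device()
{
    if (list_devices() == 0) return -1;
    cudaSetDevice(0);
    int current = -1;
    cudaGetDevice(&current);
    return current;
}
```

Device 0 is what the runtime would pick by default anyway; calling `cudaSetDevice` only makes the choice explicit.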
  • 59. [API] Once we have a context (CUcontext), can allocate memory, call a GPU function etc. Context is implicitly associated with creating thread. To synchronize all threads (CPU host with GPU threads), call cuCtxSynchronize: waits for all GPU tasks to finish
  • 60. [API] Allocate/Free memory: cuMemAlloc, cuMemFree. Initialize memory: cuMemset. Copy memory: cuMemcpyHtoD, cuMemcpyDtoH, cuMemcpyDtoD
  • 61. [API] When allocating memory for the host, can use malloc/new. Or use cuMemAllocHost, cuMemFreeHost. These functions allocate host memory that is page-locked. Performance improved for copy to/from page-locked host memory
  • 62. [API] Allocate/Free memory: cudaMalloc, cudaFree. Initialize memory: cudaMemset. Copy memory: cudaMemcpy
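The runtime memory calls fit together as in the sketch below: allocate, initialize, copy in, copy out, free. The helper name `roundtrip_sum` is hypothetical and a CUDA-capable GPU is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Zeroes a device buffer, overwrites its first half with ones, reads it all back.
// Returns the sum of the values read back.
int roundtrip_sum(int n)
{
    std::vector<int> ones(n, 1), back(n, -1);
    size_t bytes = n * sizeof(int);

    int *d = 0;
    cudaMalloc((void **)&d, bytes);                                // allocate
    cudaMemset(d, 0, bytes);                                       // initialize every byte to 0
    cudaMemcpy(d, ones.data(), bytes / 2, cudaMemcpyHostToDevice); // copy in the first half
    cudaMemcpy(back.data(), d, bytes, cudaMemcpyDeviceToHost);     // copy everything out
    cudaFree(d);                                                   // free

    int sum = 0;
    for (int i = 0; i < n; ++i) sum += back[i];
    return sum;
}
```

The single `cudaMemcpy` function covers all directions; the last argument selects host-to-device, device-to-host or device-to-device.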
  • 63. [API] Can also allocate array memory (2D). Arrays are created with a specific width and height and element type. Memory layout is optimized (e.g. packing) by runtime. cuArrayCreate; cuArrayDestroy; cuMemcpyDtoA; cuMemcpyHtoA; …
  • 64. [API] A module is a blob of GPU code/data along with some type information: .cubin files. A module is created by loading a cubin with cuModuleLoad or cuModuleLoadData. Module can be unloaded with cuModuleUnload
  • 65. [API] Loading a module also copies it to the device. Can then get the address of functions and global variables: cuModuleGetFunction; cuModuleGetGlobal; cuModuleGetTexRef
  • 66. [API] Once a module is loaded, and we have a function pointer, we can call a function. We must setup the execution environment first
  • 67. [API] Execution environment includes: Thread Block Size; Shared Memory Size; Function Parameters; Grid Size
  • 68. [API] Thread Block Size: cuFuncSetBlockShape. Shared Memory Size: cuFuncSetSharedSize. Function Parameters: cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv
  • 69. [API] Grid Size is set at the same time as the function invocation: cuLaunchGrid
  • 70. [API] […] <<< >>> […]. The compiler generates calls to all device API to setup the execution environment
  • 71. [API] A stream is a sequence of operations that occur in order. E.g. 1. Copy data from host to device; 2. Execute device function; 3. Copy data from device to host
  • 72. [API] A stream is a sequence of operations that occur in order. Different streams can be used to manage concurrency. E.g. Overlapping memory copy from one stream with the function execution from another
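The copy, execute, copy pattern can be issued on two streams so that work in one stream may overlap with work in the other. This is a sketch using the runtime API, not code from the lecture: `add_one` and `run_two_streams` are hypothetical names, page-locked host memory is used because asynchronous copies need it to overlap, and whether overlap actually happens depends on the hardware.

```cuda
#include <cuda_runtime.h>

__global__ void add_one(int *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1;
}

// Splits 2*n integers into two halves, one per stream. Returns the last element.
int run_two_streams(int n)
{
    size_t half = n * sizeof(int);
    int *h = 0, *d = 0;
    cudaMallocHost((void **)&h, 2 * half);   // page-locked host memory
    cudaMalloc((void **)&d, 2 * half);
    for (int i = 0; i < 2 * n; ++i) h[i] = i;

    cudaStream_t s[2];
    for (int k = 0; k < 2; ++k) cudaStreamCreate(&s[k]);

    for (int k = 0; k < 2; ++k) {            // operations within one stream stay in order
        int *hp = h + k * n, *dp = d + k * n;
        cudaMemcpyAsync(dp, hp, half, cudaMemcpyHostToDevice, s[k]);
        add_one<<<(n + 255) / 256, 256, 0, s[k]>>>(dp, n);
        cudaMemcpyAsync(hp, dp, half, cudaMemcpyDeviceToHost, s[k]);
    }
    for (int k = 0; k < 2; ++k) {
        cudaStreamSynchronize(s[k]);         // wait until this stream has drained
        cudaStreamDestroy(s[k]);
    }

    int last = h[2 * n - 1];
    cudaFree(d);
    cudaFreeHost(h);
    return last;
}
```

The fourth launch parameter selects the stream; the third is the dynamic shared memory size, zero here.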
  • 73. [API] Events are a way of determining the progress of a stream. Events provide a ‘marker’ in a stream at a specific position. A holder of an event handle can: Wait for an event to occur; Measure the time that occurred between two events
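Both uses of an event handle, waiting and timing, appear in the sketch below, written against the runtime API. The names `add_one` and `time_add_one_ms` are hypothetical and a CUDA-capable GPU is assumed.

```cuda
#include <cuda_runtime.h>

__global__ void add_one(int *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1;
}

// Returns the time in milliseconds between two events placed around one kernel.
float time_add_one_ms(int n)
{
    int *d = 0;
    cudaMalloc((void **)&d, n * sizeof(int));
    cudaMemset(d, 0, n * sizeof(int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);               // marker in stream 0, before the kernel
    add_one<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop, 0);                // marker in stream 0, after the kernel
    cudaEventSynchronize(stop);              // wait for the stop event to occur

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // time elapsed between the two events

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}
```

Because the events are recorded on the device timeline, the measurement excludes host-side launch overhead that a CPU timer would include.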
  • 74. CUDA Execution and Threading Model
  • 75. [Threading] Execution Model (Software → Hardware). Thread → Thread Processor: threads are executed by thread processors. Thread Block → Multiprocessor: thread blocks are executed on multiprocessors; thread blocks do not migrate; several concurrent thread blocks can reside on one multiprocessor, limited by multiprocessor resources (shared memory and register file). Grid → Device: a kernel is launched as a grid of thread blocks; only one kernel can execute on a device at one time. (© 2008 NVIDIA Corporation)
  • 76. [Threading] CUDA Uses Extensive Multithreading. CUDA threads express fine-grained data parallelism: map threads to GPU threads or CPU vector elements; virtualize the processors; you must rethink your algorithms to be aggressively parallel. CUDA thread blocks express coarse-grained parallelism: map blocks to GPU thread arrays or CPU threads; scale transparently to any number of processors. GPUs execute thousands of lightweight threads: one DX10 graphics thread computes one pixel fragment; one CUDA thread computes one result (or several results); provide hardware multithreading & zero-overhead scheduling
  • 77. [Threading] CUDA Programming Model. Parallel code (kernel) is launched and executed on a device by many threads. Threads are grouped into thread blocks. Parallel code is written for a thread: each thread is free to execute a unique code path; built-in thread and block ID variables
  • 78. [Threading] Thread Hierarchy. Threads launched for a parallel section are partitioned into thread blocks. Grid = all blocks for a given launch. Thread block is a group of threads that can: synchronize their execution; communicate via shared memory
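Synchronizing and communicating through shared memory is what makes a per-block reduction possible. The sketch below sums 256 values per block; `block_sum` and `run_block_sum` are hypothetical names, and the kernel assumes it is launched with exactly 256 threads per block (a power of two).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Tree reduction inside one block: threads cooperate through shared memory.
__global__ void block_sum(const int *in, int *out)
{
    __shared__ int partial[256];
    int t = threadIdx.x;
    partial[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                          // all loads are done before anyone reads
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (t < stride) partial[t] += partial[t + stride];
        __syncthreads();                      // finish this level before the next one
    }
    if (t == 0) out[blockIdx.x] = partial[0]; // one result per block
}

// Hypothetical host helper: 4 blocks of 256 ones; returns the sum of the block results.
int run_block_sum()
{
    const int blocks = 4, threads = 256, n = blocks * threads;
    std::vector<int> h(n, 1), r(blocks, 0);
    int *din = 0, *dout = 0;
    cudaMalloc((void **)&din, n * sizeof(int));
    cudaMalloc((void **)&dout, blocks * sizeof(int));
    cudaMemcpy(din, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<blocks, threads>>>(din, dout);
    cudaMemcpy(r.data(), dout, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(din); cudaFree(dout);
    return r[0] + r[1] + r[2] + r[3];
}
```

The final combination across blocks happens on the host here, because threads from different blocks cannot synchronize inside the kernel.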
  • 79. [Threading] IDs and Dimensions. Threads: 3D IDs, unique within a block. Blocks: 2D IDs, unique within a grid. Dimensions set at launch time; can be unique for each section. Built-in variables: threadIdx, blockIdx, blockDim, gridDim. [Figure: a device running Grid 1, made of blocks (0, 0) to (2, 1); Block (1, 1) is expanded into threads (0, 0) to (4, 2)]
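With 2D block and grid dimensions, each thread can derive a unique 2D coordinate from the built-in variables. In the sketch below `fill_2d` and `count_mismatches` are hypothetical names and the 16 x 16 block shape is an arbitrary choice; a CUDA-capable GPU is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread writes its own linear index into a row-major width x height array.
__global__ void fill_2d(int *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // row
    if (x < width && y < height) out[y * width + x] = y * width + x;
}

// Hypothetical host helper: counts elements that did not receive their own index.
int count_mismatches(int width, int height)
{
    int n = width * height;
    std::vector<int> h(n, -1);
    int *d = 0;
    cudaMalloc((void **)&d, n * sizeof(int));
    cudaMemset(d, 0xFF, n * sizeof(int));            // every int starts as -1

    dim3 block(16, 16);                              // dimensions are set at launch time
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    fill_2d<<<grid, block>>>(d, width, height);

    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    int bad = 0;
    for (int i = 0; i < n; ++i) if (h[i] != i) ++bad;
    return bad;
}
```

Every element is written by exactly one thread, so a result of zero means the grid covered the whole array.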
  • 80. Threading: Programming Model
      • A kernel is executed as a grid of thread blocks
      • A thread block is a batch of threads that can cooperate with each other by:
          – Sharing data through shared memory
          – Synchronizing their execution
      • Threads from different blocks cannot cooperate
      [Figure: the host launches Kernel 1 on Grid 1 and Kernel 2 on Grid 2 on the device; Grid 1 is a 3 × 2 array of blocks, and Block (1, 1) is expanded into a 5 × 3 array of threads.]
      © NVIDIA Corporation 2006
  • 81. Threading: Blocks must be independent
      • Any possible interleaving of blocks should be valid
          – presumed to run to completion without pre-emption
          – can run in any order
          – can run concurrently OR sequentially
      • Blocks may coordinate but not synchronize
          – shared queue pointer: OK
          – shared lock: BAD … can easily deadlock
      • Independence requirement gives scalability
      M02: High Performance Computing with CUDA
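The "shared queue pointer: OK" case can be sketched with `atomicAdd`: threads from any block reserve a slot in a global output queue without ever waiting on another block. The names `keep_positive` and `run_keep_positive` and the test data are hypothetical, and the order of items in the queue is not deterministic.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Threads append positive inputs to a global queue. atomicAdd hands out
// unique slots; no block ever waits for another, so any block order works.
__global__ void keep_positive(const int *in, int n, int *queue, int *tail)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0) {
        int slot = atomicAdd(tail, 1);  // returns the old tail value
        queue[slot] = in[i];
    }
}

// Hypothetical helper: returns 0 if the queue holds exactly the positive inputs.
int run_keep_positive(int n)
{
    std::vector<int> in(n), queue(n, 0);
    int expected = 0;
    for (int i = 0; i < n; ++i) {
        in[i] = (i % 3 == 0) ? 1 : -1;   // every third element is positive
        if (in[i] > 0) ++expected;
    }
    int *din = NULL, *dqueue = NULL, *dtail = NULL;
    cudaMalloc(&din, n * sizeof(int));
    cudaMalloc(&dqueue, n * sizeof(int));
    cudaMalloc(&dtail, sizeof(int));
    cudaMemcpy(din, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dtail, 0, sizeof(int));   // queue starts empty
    int threads = 128;
    keep_positive<<<(n + threads - 1) / threads, threads>>>(din, n, dqueue, dtail);
    int tail = -1;
    cudaMemcpy(&tail, dtail, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(queue.data(), dqueue, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(din); cudaFree(dqueue); cudaFree(dtail);
    if (tail != expected) return -1;
    int errors = 0;
    for (int i = 0; i < tail; ++i)
        if (queue[i] != 1) ++errors;     // every queued item must be a positive input
    return errors;
}

int main()
{
    int status = run_keep_positive(1000);
    printf("status %d\n", status);
    return status != 0;
}
```

A spin lock shared between blocks would not be safe in the same way: a block holding the lock may be waiting for a block that has not been scheduled yet.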
  • 82. Threading: Hardware Multithreading
      • Hardware allocates resources to blocks
          – blocks need: thread slots, registers, shared memory
          – blocks don’t run until resources are available
      • Hardware schedules threads
          – threads have their own registers
          – any thread not waiting for something can run
          – context switching is free (every cycle)
      • Hardware relies on threads to hide latency
          – i.e., parallelism is necessary for performance
      [Figure: an SM containing an MT IU, SPs, and shared memory.]
      M02: High Performance Computing with CUDA
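One way to observe the resource limits described above is to ask the runtime how many blocks of a kernel can be resident on one multiprocessor. The sketch below assumes a toolkit recent enough to provide `cudaOccupancyMaxActiveBlocksPerMultiprocessor` (it postdates this lecture); the kernel `uses_shared` and the helper `resident_blocks` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A kernel whose per-block shared-memory footprint is WORDS * 4 bytes.
// Requires blockDim.x <= WORDS.
template <int WORDS>
__global__ void uses_shared(float *out)
{
    __shared__ float tile[WORDS];
    tile[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = tile[(threadIdx.x + 1) % blockDim.x];
}

// Hypothetical helper: blocks of the given kernel that fit on one multiprocessor
// at the given block size, or -1 if the query fails.
template <int WORDS>
int resident_blocks(int threads_per_block)
{
    int blocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks, uses_shared<WORDS>, threads_per_block, 0 /* dynamic shared bytes */);
    return (err == cudaSuccess) ? blocks : -1;
}

int main()
{
    int small = resident_blocks<128>(128);    // 512 bytes of shared memory per block
    int large = resident_blocks<4096>(128);   // 16 KB of shared memory per block
    printf("resident blocks per multiprocessor: %d (small) vs %d (large)\n", small, large);
    return (small < 0 || large < 0);
}
```

The kernel is never launched here; the query only reports how the block's thread, register, and shared-memory needs compare with what one multiprocessor offers. The actual numbers depend on the device.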
  • 83. Threading: SIMT Thread Execution
      • Groups of 32 threads formed into warps
          – always executing same instruction
          – shared instruction fetch/dispatch
          – some become inactive when code path diverges
          – hardware automatically handles divergence
      • Warps are the primitive unit of scheduling
      • SIMT execution is an implementation choice
          – sharing control logic leaves more space for ALUs
          – largely invisible to programmer
          – must understand for performance, not correctness
      [Figure: an SM containing an MT IU, SPs, and shared memory.]
      M02: High Performance Computing with CUDA
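To make "performance, not correctness" concrete, the sketch below contrasts a branch that splits every warp with one that is uniform within each warp. Both kernels produce correct results; only the first forces the hardware to run both paths for each warp. The kernel names and the helper `run_branches` are assumptions for illustration, and no timing is measured here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Even and odd threads of the SAME warp take different paths (divergent).
__global__ void branch_per_thread(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0) out[i] = 2 * i;
    else                      out[i] = i + 1;
}

// All threads of a warp take the same path (no divergence inside a warp).
__global__ void branch_per_warp(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ((threadIdx.x / warpSize) % 2 == 0) out[i] = 2 * i;
    else                                   out[i] = i + 1;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on query failure.
int run_branches(bool per_warp)
{
    const int threads = 128, blocks = 4, n = threads * blocks;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    int *d = NULL;
    cudaMalloc(&d, n * sizeof(int));
    if (per_warp) branch_per_warp<<<blocks, threads>>>(d);
    else          branch_per_thread<<<blocks, threads>>>(d);
    std::vector<int> h(n, -1);
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        int t = i % threads;                           // threadIdx.x of element i
        int group = per_warp ? t / prop.warpSize : t;  // what the branch tested
        int expected = (group % 2 == 0) ? 2 * i : i + 1;
        if (h[i] != expected) ++errors;
    }
    return errors;
}

int main()
{
    int a = run_branches(false);
    int b = run_branches(true);
    printf("per-thread branch: %d mismatches, per-warp branch: %d mismatches\n", a, b);
    return (a != 0 || b != 0);
}
```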
  • 84. Threading: Transparent Scalability
      • Hardware is free to schedule thread blocks on any processor
      • A kernel scales across parallel multiprocessors
      [Figure: one kernel grid of eight blocks, Block 0 to Block 7, scheduled on two devices with different numbers of multiprocessors.]
      © 2008 NVIDIA Corporation.