RTC Workshop, Durham, UK, April 2011


Real-time processing for the Advanced Technology Solar Telescope
Vivek Venugopal (vivekv@nso.edu)
National Solar Observatory
Sunspot, New Mexico, USA




Wednesday, April 13, 2011
Advanced Technology Solar Telescope




Adaptive Optics system

[Figure: adaptive optics loop. Labeled elements: uncorrected light, tip/tilt mirror, deformable mirror (DM), beamsplitter, corrected light, Shack-Hartmann lenslet array, CCD camera, and processors. The processors output the DM drive signal and the tilt drive signal.]

HOAO Real-time system
[Figure: HOAO real-time data flow, partitioned between FPGA and GPU.]

Main path: WFS camera -> X -> cross-correlation slope computation -> offscale slope detection -> X -> matrix multiply -> actuator servos -> deformable mirror
Tip/tilt path: average slope -> tip/tilt servos -> tip/tilt mirror
Side processes: data collection, Zernike offload process
Inputs: dark field, flat field, reference image, offscale slope tolerance, slope offsets, reconstruction matrix, actuator offsets, actuator gains, servo parameters




Camera format
Each camera half: 480 columns x 960 rows, 12 channels per FPGA

     • 12 channels processed per FPGA
     • 5 packets to receive a complete row

Channel #   Packet   Pixel positions (16 per packet)
1, 2        0         77  76  73  72  53  52  49  48  29  28  25  24   5   4   1   0
            1        173 172 169 168 149 148 145 144 125 124 121 120 101 100  97  96
            2        269 268 265 264 245 244 241 240 221 220 217 216 197 196 193 192
            3        365 364 361 360 341 340 337 336 317 316 313 312 293 292 289 288
            4        461 460 457 456 437 436 433 432 413 412 409 408 389 388 385 384

3, 4        0         85  84  81  80  61  60  57  56  37  36  33  32  13  12   9   8
            1        181 180 177 176 157 156 153 152 133 132 129 128 109 108 105 104
            2        277 276 273 272 253 252 249 248 229 228 225 224 205 204 201 200
            3        373 372 369 368 349 348 345 344 325 324 321 320 301 300 297 296
            4        469 468 465 464 445 444 441 440 421 420 417 416 397 396 393 392

5, 6        0         93  92  89  88  69  68  65  64  45  44  41  40  21  20  17  16
            1        189 188 185 184 165 164 161 160 141 140 137 136 117 116 113 112
            2        285 284 281 280 261 260 257 256 237 236 233 232 213 212 209 208
            3        381 380 377 376 357 356 353 352 333 332 329 328 309 308 305 304
            4        477 476 473 472 453 452 449 448 429 428 425 424 405 404 401 400

7, 8        0         79  78  75  74  55  54  51  50  31  30  27  26   7   6   3   2
            1        175 174 171 170 151 150 147 146 127 126 123 122 103 102  99  98
            2        271 270 267 266 247 246 243 242 223 222 219 218 199 198 195 194
            3        367 366 363 362 343 342 339 338 319 318 315 314 295 294 291 290
            4        463 462 459 458 439 438 435 434 415 414 411 410 391 390 387 386

9, 10       0         87  86  83  82  63  62  59  58  39  38  35  34  15  14  11  10
            1        183 182 179 178 159 158 155 154 135 134 131 130 111 110 107 106
            2        279 278 275 274 255 254 251 250 231 230 227 226 207 206 203 202
            3        375 374 371 370 351 350 347 346 327 326 323 322 303 302 299 298
            4        471 470 467 466 447 446 443 442 423 422 419 418 399 398 395 394

11, 12      0         95  94  91  90  71  70  67  66  47  46  43  42  23  22  19  18
            1        191 190 187 186 167 166 163 162 143 142 139 138 119 118 115 114
            2        287 286 283 282 263 262 259 258 239 238 235 234 215 214 211 210
            3        383 382 379 378 359 358 355 354 335 334 331 330 311 310 307 306
            4        479 478 475 474 455 454 451 450 431 430 427 426 407 406 403 402
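
The numbers in the camera format table follow a regular pattern, which the sketch below reproduces. It is an observation drawn from the table values, not a description of the camera's documented readout. The names `PATTERN`, `PAIR_BASE`, and `packet_positions` are hypothetical, and treating both channels of a pair as carrying the same positions is an assumption based on how the table groups them.

```python
# Positions of the 16 pixels in packet 0 of channels 1 and 2, as read from the table.
PATTERN = (77, 76, 73, 72, 53, 52, 49, 48, 29, 28, 25, 24, 5, 4, 1, 0)
# Extra offset per channel pair: (1,2), (3,4), (5,6), (7,8), (9,10), (11,12).
PAIR_BASE = (0, 8, 16, 2, 10, 18)
POSITIONS_PER_PACKET = 96  # 6 channel pairs x 16 pixels
PACKETS_PER_ROW = 5        # 5 x 96 = 480 columns


def packet_positions(channel: int, packet: int) -> list[int]:
    """Pixel positions carried by one packet of one channel (channels are 1-based)."""
    if not 1 <= channel <= 12:
        raise ValueError("channel must be in 1..12")
    if not 0 <= packet < PACKETS_PER_ROW:
        raise ValueError("packet must be in 0..4")
    base = PAIR_BASE[(channel - 1) // 2] + POSITIONS_PER_PACKET * packet
    return [base + offset for offset in PATTERN]


print(packet_positions(11, 4)[:4])  # [479, 478, 475, 474]
```

Taken over the six channel pairs and five packets, the generated positions cover 0 to 479 exactly once, which agrees with the 480-column row and the 5 packets needed to receive it.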




Pixel unpacking
     • FPGA receives camera data over the fiber channel interface through 12 transceivers with a 9.42 ns clock period
     • Pixel unpacking is implemented as an FSM with 2 modes (10 states/mode)
     • 16 pixels (10 bits/pixel) are written to the FIFO

[Figure: bit layout of the 20 packed bytes (160 bits) that carry 16 pixels, with the bit positions each field occupies after unpacking.]

Byte   Packed bits   Contents (MSB first)             Unpacked bit positions
0      7-0           Pixel 0 [9:2]                    9-2
1      15-8          Pixel 1 [9:2]                    49-42
2      23-16         Pixel 2 [9:2]                    89-82
3      31-24         Pixel 3 [9:2]                    129-122
4      39-32         Pixel 0 [1:0], Pixel 4 [9:4]     1-0, 19-14
5      47-40         Pixel 1 [1:0], Pixel 5 [9:4]     41-40, 59-54
6      55-48         Pixel 2 [1:0], Pixel 6 [9:4]     81-80, 99-94
7      63-56         Pixel 3 [1:0], Pixel 7 [9:4]     121-120, 139-134
8      71-64         Pixel 4 [3:0], Pixel 8 [9:6]     13-10, 29-26
9      79-72         Pixel 5 [3:0], Pixel 9 [9:6]     53-50, 69-66
10     87-80         Pixel 6 [3:0], Pixel 10 [9:6]    93-90, 109-106
11     95-88         Pixel 7 [3:0], Pixel 11 [9:6]    133-130, 149-146
12     103-96        Pixel 8 [5:0], Pixel 12 [9:8]    25-20, 39-38
13     111-104       Pixel 9 [5:0], Pixel 13 [9:8]    65-60, 79-78
14     119-112       Pixel 10 [5:0], Pixel 14 [9:8]   105-100, 119-118
15     127-120       Pixel 11 [5:0], Pixel 15 [9:8]   145-140, 159-158
16     135-128       Pixel 12 [7:0]                   37-30
17     143-136       Pixel 13 [7:0]                   77-70
18     151-144       Pixel 14 [7:0]                   117-110
19     159-152       Pixel 15 [7:0]                   157-150
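
The bit layout on this slide can be read as four interleaved byte lanes: bytes 0, 4, 8, 12, 16 carry pixels 0, 4, 8, 12 as one 40-bit MSB-first stream, and likewise for the other three lanes. The sketch below is a software illustration of that reading, not the FPGA state machine from the slide. The function names are hypothetical, and the slot ordering in `to_fifo_word` is inferred from the second row of numbers in the diagram.

```python
def unpack_pixels(packed: bytes) -> list[int]:
    """Recover 16 ten-bit pixels from 20 packed bytes."""
    if len(packed) != 20:
        raise ValueError("expected 20 packed bytes (16 pixels x 10 bits)")
    pixels = [0] * 16
    for lane in range(4):
        # Bytes lane, lane+4, ..., lane+16 form one 40-bit stream, first byte on top.
        stream = 0
        for k in range(5):
            stream = (stream << 8) | packed[lane + 4 * k]
        # The stream holds pixels lane, lane+4, lane+8, lane+12, most significant first.
        for j in range(4):
            pixels[lane + 4 * j] = (stream >> (30 - 10 * j)) & 0x3FF
    return pixels


def to_fifo_word(pixels: list[int]) -> int:
    """Place each pixel in the 160-bit word at the slot suggested by the diagram."""
    word = 0
    for p, value in enumerate(pixels):
        slot = 4 * (p % 4) + p // 4  # pixel 1 -> bits 49-40, pixel 4 -> bits 19-10
        word |= (value & 0x3FF) << (10 * slot)
    return word
```

For example, a buffer with byte 0 set to 0xFF and the top two bits of byte 4 set yields pixel 0 equal to 1023 and all other pixels equal to 0.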

Dark and flat correction
     • Dark pixel and flat pixel values are stored in RAM
     • The flat-corrected products are concatenated and written to the FIFO
     • The accumulated flat value can be used to update the reference image

[Figure: per-pixel datapath, repeated for each pixel lane (labeled pixel0, pixel1, ..., pixel16). The 10-bit pixel minus the 8-bit dark_pixel gives a 10-bit difference; the difference multiplied by the 8-bit flat_pixel gives an 18-bit flat_product; an 8-bit accumulator produces flat_acc.]
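
The datapath on this slide amounts to (pixel - dark_pixel) x flat_pixel per lane, followed by concatenation of the products. A minimal sketch of that arithmetic is given below. The function names are hypothetical, clamping negative differences to zero is an assumption (the slide does not say how underflow is handled), and the accumulator is left out because its input is not clear from the diagram.

```python
def dark_flat_correct(pixels, dark, flat):
    """Apply (pixel - dark) * flat to each lane.

    pixels are 10-bit values; dark and flat are 8-bit values.
    """
    products = []
    for pixel, d, f in zip(pixels, dark, flat):
        difference = max(pixel - d, 0)   # assumption: negative results clamp to 0
        products.append(difference * f)  # at most 1023 * 255, which fits in 18 bits
    return products


def concatenate(products, width=18):
    """Pack the products into one wide word, lane 0 in the lowest bits."""
    mask = (1 << width) - 1
    word = 0
    for lane, value in enumerate(products):
        word |= (value & mask) << (lane * width)
    return word
```

With 16 lanes of 18 bits each, the concatenated word is 288 bits wide, which matches the 288 shown leaving the dark-flat correction stage on the block diagram.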
Pixel unpacking & Dark and flat correction

[Figure: FPGA block diagram for 12 channels (1/2 camera), with a shared synchronizer/counters block. Each channel (channel 1, channel 2, ..., channel 12) has the same chain; the numbers in parentheses are the bus widths shown on the diagram.]

Per channel: Receiver -> (16) -> Data unpack -> (160) -> FIFO -> (160) -> Dark-flat correction/accumulator -> (288) -> PCIe system bus
Memories: dark and flat value RAM (256), reference image RAM (128 per channel)
Timing labels: 206.8 ns at the data unpack stage, 20 ns at the dark and flat value RAM

Receive and unpack side: clock period = 9.42 ns, clock rate = 106.15 MHz
Correction side: clock period = 5 ns, clock rate = 200 MHz
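
As a quick check on the figures in the block diagram, the sketch below converts the two clock periods to rates and multiplies out the per-pixel bit widths for a 16-pixel word. The helper names are hypothetical, and the mapping of each bus to a per-pixel width (10-bit pixels, 8-bit dark plus 8-bit flat, 18-bit products, 8-bit accumulators) is an assumption that happens to reproduce the widths on the diagram.

```python
PIXELS_PER_WORD = 16  # 16 pixels are unpacked and processed together


def clock_rate_mhz(period_ns: float) -> float:
    """Clock rate in MHz for a clock period given in nanoseconds."""
    return 1000.0 / period_ns


def bus_width(bits_per_pixel: int, pixels: int = PIXELS_PER_WORD) -> int:
    """Width in bits of a bus that carries one value per pixel."""
    return bits_per_pixel * pixels


print(f"{clock_rate_mhz(9.42):.2f} MHz")  # 106.16 MHz (the slide quotes 106.15)
print(f"{clock_rate_mhz(5):.2f} MHz")     # 200.00 MHz
print(bus_width(10), bus_width(8 + 8), bus_width(18), bus_width(8))  # 160 256 288 128
```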
Nvidia Tesla C2050
     •   Nvidia Tesla C2050: 14 streaming multiprocessors with 32 cores each (SIMD), clocked at 1.15 GHz
     •   3 GB on-board RAM
     •   Kernel-based execution
     •   1.03 TFLOPS single precision
     •   515.2 GFLOPS double precision

[Figure: GPU block diagram showing Multiprocessor 1, Multiprocessor 2, ..., Multiprocessor 14. Each multiprocessor contains an instruction cache, two warp schedulers with dispatch units, a register file, two groups of cores (Core 1 to Core 16), load/store units 1 to 16, SFU 1 to SFU 4, an interconnection network, 64 KB shared memory/L1 cache, and a uniform cache.]

    Reference: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
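
The peak-throughput figures can be cross-checked from the core count and clock on this slide. The sketch below assumes each core retires one fused multiply-add (counted as 2 floating-point operations) per cycle and that double precision runs at half the single-precision rate; both are assumptions, and `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(multiprocessors: int, cores_per_mp: int, clock_ghz: float,
                flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak in GFLOPS: cores x clock x operations per cycle."""
    return multiprocessors * cores_per_mp * clock_ghz * flops_per_core_per_cycle


single = peak_gflops(14, 32, 1.15)  # 448 cores at 1.15 GHz
double = single / 2                 # assumption: half rate for double precision
print(f"single: {single:.1f} GFLOPS, double: {double:.1f} GFLOPS")
```

Under these assumptions the single-precision peak comes to about 1030 GFLOPS, and halving it reproduces the 515.2 GFLOPS double-precision figure.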

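Process mapping and partitioning

[Figure: processing chain split between FPGA and GPU, operating on 20x20-pixel images.]

FPGA: raw pixels (20x20) and dark pixels (20x20) -> dark correction; flat pixels (20x20) -> flat correction
GPU: reference pixels (20x20) -> 2D cross-correlation -> find maximum -> x and y interpolation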

Correlation routines
1. FFT correlation
   flat corrected image -> FFT; reference image -> FFT; complex conjugate multiplication of the two spectra; IFFT

2. 7x7 correlation
   The original reference image (26x26 pixels) is cut into 49 regions (Region 1, Region 2, ..., Region 49), each a precomputed reference of 20x20 pixels.
   Precomputed reference pixels: 20x20 (49 regions)
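
The two routines on this slide can be sketched in plain Python as follows. This is an illustration only, not the GPU kernels behind the talk: `dft2` is a naive discrete Fourier transform standing in for an FFT, the row-major ordering of the 49 regions is an assumption, and using a plain sum of products as the 7x7 similarity measure is also an assumption. All function names are hypothetical.

```python
import cmath


def dft2(a, inverse=False):
    """Naive 2D DFT of a square array (list of rows); a stand-in for an FFT."""
    n = len(a)
    sign = 1 if inverse else -1
    w = [[cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n)]
         for k in range(n)]
    # Transform along x for every row, then along y for every column.
    rows = [[sum(a[y][x] * w[u][x] for x in range(n)) for u in range(n)]
            for y in range(n)]
    out = [[sum(rows[y][u] * w[v][y] for y in range(n)) for u in range(n)]
           for v in range(n)]
    scale = n * n if inverse else 1
    return [[value / scale for value in row] for row in out]


def fft_correlate(image, reference):
    """Circular cross-correlation: IDFT(DFT(image) * conj(DFT(reference)))."""
    n = len(image)
    f, g = dft2(image), dft2(reference)
    product = [[f[v][u] * g[v][u].conjugate() for u in range(n)]
               for v in range(n)]
    return [[value.real for value in row] for row in dft2(product, inverse=True)]


def reference_regions(reference, size=20):
    """Cut every size x size window out of the reference, in row-major order."""
    span = len(reference) - size + 1  # 26 - 20 + 1 = 7 positions per axis
    return [[row[rx:rx + size] for row in reference[ry:ry + size]]
            for ry in range(span) for rx in range(span)]


def region_correlate(image, regions):
    """Sum of products between the image and each precomputed region."""
    return [sum(p * r for img_row, ref_row in zip(image, region)
                for p, r in zip(img_row, ref_row))
            for region in regions]
```

With `fft_correlate`, an image that is a circular shift of the reference by (dy, dx) produces its largest output at row dy, column dx. With `region_correlate`, the 49 scores form the 7x7 correlation map when read in row-major order.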
find_max and interpolation routines
     •    Find the maximum value and its index
     •    Find the x and y shifts using the interpolation equations, where $x_s$ and $y_s$ are the shifted x and y indices of the maximum and $\mathrm{out}(\text{row}, \text{column})$ is the array that was searched:

\[
\begin{aligned}
\mathrm{num}_x &= \mathrm{max\_value} - \mathrm{out}(y_s,\; x_s - 1) \\
\mathrm{den}_x &= 2 \cdot \mathrm{max\_value} - \mathrm{out}(y_s,\; x_s - 1) - \mathrm{out}(y_s,\; x_s + 1) \\
x &= (x_s - 0.5) + \frac{\mathrm{num}_x}{\mathrm{den}_x} \\
\mathrm{num}_y &= \mathrm{max\_value} - \mathrm{out}(y_s - 1,\; x_s) \\
\mathrm{den}_y &= 2 \cdot \mathrm{max\_value} - \mathrm{out}(y_s - 1,\; x_s) - \mathrm{out}(y_s + 1,\; x_s) \\
y &= (y_s - 0.5) + \frac{\mathrm{num}_y}{\mathrm{den}_y}
\end{aligned}
\]
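
The interpolation equations above translate directly into code. The sketch below is a plain-Python illustration rather than the GPU routine; the function names are hypothetical, the input is assumed to be a 2D list indexed as `out[row][column]`, and rejecting a maximum on the border is an added safeguard that the slide does not mention.

```python
def find_max(out):
    """Return (max_value, y_index, x_index) of a 2D array given as a list of rows."""
    best_value, best_y, best_x = out[0][0], 0, 0
    for y, row in enumerate(out):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_y, best_x = value, y, x
    return best_value, best_y, best_x


def interpolate_shift(out):
    """Sub-pixel (x, y) position of the peak, using the equations on the slide."""
    max_value, y_idx, x_idx = find_max(out)
    if not (0 < y_idx < len(out) - 1 and 0 < x_idx < len(out[0]) - 1):
        raise ValueError("maximum lies on the border; a neighbour is missing")
    num_x = max_value - out[y_idx][x_idx - 1]
    den_x = 2 * max_value - out[y_idx][x_idx - 1] - out[y_idx][x_idx + 1]
    num_y = max_value - out[y_idx - 1][x_idx]
    den_y = 2 * max_value - out[y_idx - 1][x_idx] - out[y_idx + 1][x_idx]
    if den_x == 0 or den_y == 0:
        raise ValueError("flat peak; interpolation is undefined")
    x = (x_idx - 0.5) + num_x / den_x
    y = (y_idx - 0.5) + num_y / den_y
    return x, y
```

These equations are a three-point parabolic fit along each axis, so when the samples around the peak lie on an exact parabola the interpolated position equals the true vertex.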

GPU results
[Figure: two bar charts of time in us against number of images, with one bar each for Tesla C1060 and Tesla C2050 in every group.]

FFT correlation (time in us): 1 image: 1188 and 1510; 50 images: 1619 and 1889
7x7 correlation (time in us): 1 image: 278 and 313; 50 images: 279 and 307; 584 images: 281 and 301

     Note: Lower time indicates better performance

Reconstruction routine

     [Figure: the x and y shifts for 1750 sub-aperture images (3500 values) are multiplied by the 1900 x 3500 reconstruction matrix, giving accumulated values for 1900 actuators.]
     [Figure: bar chart of time in us (log scale, 1 to 100000) by device: Tesla C1060, Tesla C2050, DSP and CPU; bar labels include 46769, 964, 956 and 229 us.]

     • 1750 sub-aperture x and y shifts
     • 1900 x 3500 reconstruction matrix (1900 actuators by 3500 shift values)
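The reconstruction step is a matrix-vector product: each actuator value is the accumulated sum of the shifts weighted by one row of the reconstruction matrix. The sketch below shows the idea at a small size. The function name `reconstruct` and the ordering of the shift vector (all x shifts followed by all y shifts) are assumptions for illustration; the slides do not specify the layout.

```python
def reconstruct(recon_matrix, x_shifts, y_shifts):
    """Multiply the reconstruction matrix by the stacked shift vector.

    recon_matrix: one row per actuator, one column per shift value
                  (1900 x 3500 in the system described here).
    x_shifts, y_shifts: one value per sub-aperture (1750 each).
    Assumed ordering of the shift vector: all x shifts, then all y shifts.
    """
    shifts = list(x_shifts) + list(y_shifts)
    for row in recon_matrix:
        if len(row) != len(shifts):
            raise ValueError("matrix width must equal the number of shifts")
    # One accumulated value per actuator.
    return [sum(w * s for w, s in zip(row, shifts)) for row in recon_matrix]


# Toy example: 2 sub-apertures (4 shift values) and 3 actuators.
toy_matrix = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
actuators = reconstruct(toy_matrix, [1.0, 2.0], [3.0, 4.0])
```

At full size the same product involves 1900 x 3500 = 6.65 million multiply-accumulate operations per frame, which is why it is timed separately on each device.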

Xilinx design flow
     [Figure: flow chart with an implementation column and a design verification column.]

     Implementation:
     •    Design entry
     •    Design synthesis
     •    Design implementation: optimization, mapping, placement, routing
     •    Bitstream generation
     •    Download to Xilinx FPGA

     Design verification:
     •    Functional simulation
     •    Static timing analysis
     •    Timing simulation (after back annotation)
     •    In-circuit verification

Cross-correlation
     •    Configure 400x392 (49x8 bits/pixel) RAM bank (RAM0-RAM19) with pre-computed reference pixels
     •    Multiply each pixel with corresponding reference pixel

     [Figure: the 18-bit flatcorr_value (flat_product0) is multiplied by each of the 49 8-bit reference pixels ref_pixel0 to ref_pixel48 (392-bit ref_pixel bus), giving the 26-bit products xcorr_product0 to xcorr_product48, which together form the 1274-bit xcorr_value_per pixel.]
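The bus widths on this slide follow from the multiplier sizes: an 18-bit value times an 8-bit value fits in 26 bits, and 49 such products occupy 49 x 26 = 1274 bits. The sketch below models one pixel's multiplier array and checks those widths. It assumes unsigned values and packs product 0 into the least significant bits; both are illustrative assumptions, and the function names are hypothetical.

```python
N_REF = 49          # reference pixels per incoming pixel
PIXEL_BITS = 18     # width of flatcorr_value
REF_BITS = 8        # width of each reference pixel
PRODUCT_BITS = PIXEL_BITS + REF_BITS   # 26 bits per product


def xcorr_products(flatcorr_value, ref_pixels):
    """Multiply one flat-corrected pixel by its 49 reference pixels."""
    if not 0 <= flatcorr_value < (1 << PIXEL_BITS):
        raise ValueError("flatcorr_value does not fit in 18 bits")
    if len(ref_pixels) != N_REF:
        raise ValueError("expected 49 reference pixels")
    if any(not 0 <= r < (1 << REF_BITS) for r in ref_pixels):
        raise ValueError("reference pixel does not fit in 8 bits")
    return [flatcorr_value * r for r in ref_pixels]


def pack_products(products):
    """Concatenate the products into one wide word (assumed order:
    product 0 in the least significant 26 bits)."""
    word = 0
    for i, p in enumerate(products):
        word |= p << (i * PRODUCT_BITS)
    return word


# Worst case: largest 18-bit pixel against the largest 8-bit reference pixels.
worst = xcorr_products((1 << PIXEL_BITS) - 1, [(1 << REF_BITS) - 1] * N_REF)
packed = pack_products(worst)
```

In the worst case every product still fits in 26 bits, and the packed word is exactly 1274 bits wide.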

Cross-correlation
     [Figure: the multiplier array is replicated for 16 pixels in parallel, flat_product0 to flat_product15. For each pixel, the 18-bit flatcorr_value is multiplied by the 49 8-bit reference pixels ref_pixel0 to ref_pixel48 (392-bit ref_pixel bus), giving the 26-bit products xcorr_product0 to xcorr_product48 and a 1274-bit xcorr_value_per pixel.]
Sub-aperture format
     •    Sub-aperture regions in 480 columns x 1 row per channel
     •    Accumulate pixels per sub-aperture in each channel

     [Table: for channels 1 to 12, the sub-aperture number (0 to 23) assigned to each of the 16 columns labeled 15 down to 0.]
     [Figure: three subap_accumulator blocks, for channels #1,#2,#7,#8, channels #3,#4,#9,#10 and channels #5,#6,#11,#12. Each takes xcorr_pixel0 to xcorr_pixel15 (1274 bits each) and produces subap0_acc to subap23_acc (1715 bits each).]
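The accumulation step can be pictured as a scatter-add: each pixel's 49 correlation products are added into the running totals of the sub-aperture that the pixel belongs to. The sketch below is a software model under stated assumptions, not the FPGA logic. The function name and the `subap_index` lookup are hypothetical stand-ins for the channel format table. As an observation, 1715 = 49 x 35, which would be consistent with 49 accumulators of 35 bits each, but the slides do not state this.

```python
def accumulate_subapertures(xcorr_pixels, subap_index, n_subaps=24):
    """Sum per-pixel correlation products into per-sub-aperture totals.

    xcorr_pixels: one list of products per pixel (49 products in the design).
    subap_index:  sub-aperture number of each pixel (hypothetical lookup,
                  standing in for the channel format table).
    Returns one list of accumulated products per sub-aperture.
    """
    n_products = len(xcorr_pixels[0])
    acc = [[0] * n_products for _ in range(n_subaps)]
    for products, subap in zip(xcorr_pixels, subap_index):
        for k, value in enumerate(products):
            acc[subap][k] += value
    return acc


# Toy example: 3 pixels with 2 products each, spread over 2 sub-apertures.
toy_acc = accumulate_subapertures(
    [[1, 2], [10, 20], [100, 200]],
    subap_index=[0, 1, 0],
    n_subaps=2,
)
```

Pixels 0 and 2 land in sub-aperture 0 and pixel 1 in sub-aperture 1, so the totals are kept separate per sub-aperture even though the pixels arrive interleaved.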

Top level design


     [Figure: block diagram of the top-level design.
      Control: xcorr_sm (xcorr state machine) drives channel_cycle_count, subap_row_count, addr_decoder_ce, xcorr_pixel_ce, subap_acc_ce, subap_acc_12ch_ce and flat_fifo_rd.
      Reference image: refim_fetch_addr_decoder (address decoder) addresses the RAM bank (RAM0-RAM19), which supplies refim_in (392 bits) x16.
      Per channel (channel1_top to channel12_top): FCFPGA data unpack, dark_flat_acc_top, Flatcorr_FIFO and xcorr_pixel_channel in sequence (bus labels 160, 288 and 288), producing xcorr_pixel (1274 bits) x16.
      Accumulation: ch1278_subap_accumulator and ch561112_subap_accumulator each output subap_acc_out (1715 bits) x24; the 24subap_12ch_accumulator combines them into the final subap_acc_out (1715 bits) x24.]

Synthesis estimates for Virtex-6 FPGA
     •    Dark and flat correction only: 288 of 687,360 resources used (<1%)
     •    Correlation for a single channel, up to the sub-aperture accumulator within the channel (without the final 12-channel accumulation): 2,578 of 687,360 resources used (<1%)

     Device utilization summary:
          Slice Logic Utilization:
          Number of Slice Registers:     992448 out of 687360    144% (*)
          Number of Slice LUTs:         1126081 out of 343680    327% (*)
            Number used as Logic:       1125853 out of 343680    327% (*)
            Number used as Memory:          228 out of  99200
              Number used as SRL:            37

FPGA timing

     [Figure: timing diagram showing, in order: Rxdata from transceiver; unpacked data written to FIFO; unpacked data read from FIFO; dark-flat output; input to xcorr_pixel module; output from xcorr_pixel; output from sub-aperture accumulator per channel. Interval labels: 123.73 ns, 40 ns, 95 ns, 15 ns, 40 ns, 20 ns, 16 ns and 91 ns.]

     •   Each data packet is available from the FIFO after 95 ns
     •   95 ns * 5 packets * 10 rows = 4.75 us to read the data from the FIFO
     •   Total latency for computing the 960 rows x 480 columns = 4.75 us * (960/20) = 228 us.
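The latency figures above can be checked with integer arithmetic in nanoseconds. The constant names below are labels chosen for this sketch; the numbers are the ones quoted in the bullets.

```python
PACKET_NS = 95      # time for one data packet to become available from the FIFO
PACKETS = 5         # packets per read, as quoted
ROWS = 10           # rows per read, as quoted
TOTAL_ROWS = 960    # rows in the full frame
ROW_GROUP = 20      # divisor used in the (960/20) term

# Time to read one block of data from the FIFO.
block_read_ns = PACKET_NS * PACKETS * ROWS

# Number of blocks in a full frame, and the resulting total latency.
n_blocks = TOTAL_ROWS // ROW_GROUP
total_latency_ns = block_read_ns * n_blocks

block_read_us = block_read_ns / 1000
total_latency_us = total_latency_ns / 1000
```

The block read time comes to 4750 ns (4.75 us), and 48 blocks give 228000 ns (228 us), matching the quoted total.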

GPU vs FPGA vs DSP
     [Timeline figure. Time marks: 100 us, 225 us, 300.93 us. Bars: camera
     readout; data transfer through PCIe x16; C2050 GPU 1, GPU 2, GPU 3;
     FPGA; DSP. A second camera readout, PCIe x16 transfer and C2050 GPU 1
     bar follow at the right of the timeline.]

     •   C2050 GPU throughput = 525.93 us
     •   FPGA throughput = 250 us
     •   96 DSPs throughput = 495 us
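To compare the three figures on this slide, the sketch below converts each one to an equivalent frame rate. It assumes that each "throughput" value is the time to process one frame, which the slide does not state explicitly. The dictionary and function names are illustrative.

```python
import math

# Per-frame times in microseconds, copied from the slide.
# Assumption: each "throughput" figure is the time to process one frame.
PER_FRAME_US = {
    "C2050 GPU": 525.93,
    "FPGA": 250.0,
    "96 DSPs": 495.0,
}


def frames_per_second(per_frame_us: float) -> float:
    """Frame rate that corresponds to a given per-frame time."""
    return 1e6 / per_frame_us


def ranked_fastest_first() -> list:
    """Platform names ordered from shortest to longest per-frame time."""
    return sorted(PER_FRAME_US, key=PER_FRAME_US.get)


# The 225 us and 300.93 us marks on the timeline add up to the GPU figure.
GPU_MARKS_MATCH = math.isclose(225.0 + 300.93, PER_FRAME_US["C2050 GPU"])

if __name__ == "__main__":
    for name in ranked_fastest_first():
        rate = frames_per_second(PER_FRAME_US[name])
        print(f"{name:10s} {PER_FRAME_US[name]:8.2f} us  {rate:8.1f} frames/s")
```

Under that assumption the FPGA figure corresponds to 4000 frames per second, roughly twice the rate of either alternative.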

Conclusions




     [Figure panels: GPU, FPGA]



     •    DSP: excellent performance but not cost-effective
     •    GPU: fast SIMD architectures - suitable for certain tasks
     •    FPGA: MIMD architectures, custom I/O, meets latency and
          throughput constraints
     Slide idea: David Pellerin, Impulse Accelerated Technology

Future work

               Resources               Virtex-6 XC6VLX550T   Virtex-7 XC7V2000T
               Slice logic resources   549,888               1,954,560
               I/O pins                840                   850
               GTX transceivers        36                    36


     •    Investigate the performance improvement from mapping the find_max,
          interpolation and reconstruction matrix calculation routines onto
          the FPGA
     •    Promising because of the increased logic density of Virtex-7 FPGAs
     •    Throughput is sustained even if the processes are partitioned over
          multiple FPGAs

Wednesday, April 13, 2011
Discussion

                            Questions





Backup
                       Device utilization summary:
                       Selected Device : 6vlx550tff1759-2

                       Slice Logic Utilization:

                        Number of Slice Registers:            992448 out of 687360 144% (*)
                        Number of Slice LUTs:                1126081 out of 343680 327% (*)

                            Number used as Logic:            1125853 out of 343680 327% (*)

                            Number used as Memory:              228 out of 99200     0%
                              Number used as SRL:             228

                       Slice Logic Distribution:

                        Number of LUT Flip Flop pairs used: 1509605

                            Number with an unused Flip Flop: 517157 out of 1509605        34%
                            Number with an unused LUT:          383524 out of 1509605     25%

                            Number of fully used LUT-FF pairs: 608924 out of 1509605      40%

                            Number of unique control sets:      221
                       IO Utilization:

                        Number of IOs:                   88

                        Number of bonded IOBs:                  80 out of   840    9%
                            IOB Flip Flops/Latches:           25

                       Specific Feature Utilization:

                        Number of BUFG/BUFGCTRLs:                     36 out of   32 112% (*)
                       WARNING:Xst:1336 - (*) More than 100% of Device resources are used
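The percentages in the report can be rechecked from the "used out of available" pairs. The sketch below copies those pairs from the report and flags every resource that exceeds the device. The names are illustrative, and the use of truncating integer division is an inference from the printed numbers (for example 80 out of 840 is shown as 9%), not something taken from the XST documentation.

```python
# (used, available) pairs copied from the utilization report above.
REPORT = {
    "Slice Registers": (992448, 687360),
    "Slice LUTs": (1126081, 343680),
    "LUTs used as Logic": (1125853, 343680),
    "LUTs used as Memory": (228, 99200),
    "Bonded IOBs": (80, 840),
    "BUFG/BUFGCTRLs": (36, 32),
}


def percent_used(used: int, available: int) -> int:
    """Integer utilization percentage, truncated (assumed rounding rule)."""
    return used * 100 // available


def over_budget(report: dict) -> list:
    """Names of resources whose usage exceeds what the device provides."""
    return [name for name, (used, avail) in report.items() if used > avail]


if __name__ == "__main__":
    for name, (used, avail) in REPORT.items():
        flag = " (*)" if used > avail else ""
        print(f"{name:22s} {used:8d} of {avail:7d} {percent_used(used, avail):4d}%{flag}")
    print("over budget:", ", ".join(over_budget(REPORT)))
```

The resources flagged here are the same ones marked (*) in the report: slice registers, slice LUTs (including those used as logic) and the global clock buffers.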

Pre-computed reference
        0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25
       26    27    28    29    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46    47    48    49    50    51
       52    53    54    55    56    57    58    59    60    61    62    63    64    65    66    67    68    69    70    71    72    73    74    75    76    77
       78    79    80    81    82    83    84    85    86    87    88    89    90    91    92    93    94    95    96    97    98    99   100   101   102   103
      104   105   106   107   108   109   110   111   112   113   114   115   116   117   118   119   120   121   122   123   124   125   126   127   128   129
      130   131   132   133   134   135   136   137   138   139   140   141   142   143   144   145   146   147   148   149   150   151   152   153   154   155
      156   157   158   159   160   161   162   163   164   165   166   167   168   169   170   171   172   173   174   175   176   177   178   179   180   181
      182   183   184   185   186   187   188   189   190   191   192   193   194   195   196   197   198   199   200   201   202   203   204   205   206   207
      208   209   210   211   212   213   214   215   216   217   218   219   220   221   222   223   224   225   226   227   228   229   230   231   232   233
      234   235   236   237   238   239   240   241   242   243   244   245   246   247   248   249   250   251   252   253   254   255   256   257   258   259
      260   261   262   263   264   265   266   267   268   269   270   271   272   273   274   275   276   277   278   279   280   281   282   283   284   285
      286   287   288   289   290   291   292   293   294   295   296   297   298   299   300   301   302   303   304   305   306   307   308   309   310   311
      312   313   314   315   316   317   318   319   320   321   322   323   324   325   326   327   328   329   330   331   332   333   334   335   336   337
      338   339   340   341   342   343   344   345   346   347   348   349   350   351   352   353   354   355   356   357   358   359   360   361   362   363
      364   365   366   367   368   369   370   371   372   373   374   375   376   377   378   379   380   381   382   383   384   385   386   387   388   389
      390   391   392   393   394   395   396   397   398   399   400   401   402   403   404   405   406   407   408   409   410   411   412   413   414   415
      416   417   418   419   420   421   422   423   424   425   426   427   428   429   430   431   432   433   434   435   436   437   438   439   440   441
      442   443   444   445   446   447   448   449   450   451   452   453   454   455   456   457   458   459   460   461   462   463   464   465   466   467
      468   469   470   471   472   473   474   475   476   477   478   479   480   481   482   483   484   485   486   487   488   489   490   491   492   493
      494   495   496   497   498   499   500   501   502   503   504   505   506   507   508   509   510   511   512   513   514   515   516   517   518   519
      520   521   522   523   524   525   526   527   528   529   530   531   532   533   534   535   536   537   538   539   540   541   542   543   544   545
      546   547   548   549   550   551   552   553   554   555   556   557   558   559   560   561   562   563   564   565   566   567   568   569   570   571
      572   573   574   575   576   577   578   579   580   581   582   583   584   585   586   587   588   589   590   591   592   593   594   595   596   597
      598   599   600   601   602   603   604   605   606   607   608   609   610   611   612   613   614   615   616   617   618   619   620   621   622   623
      624   625   626   627   628   629   630   631   632   633   634   635   636   637   638   639   640   641   642   643   644   645   646   647   648   649
      650   651   652   653   654   655   656   657   658   659   660   661   662   663   664   665   666   667   668   669   670   671   672   673   674   675


Real-time processing for ATST

  • 1. RTC Workshop, Durham, UK, April 2011 Real-time processing for the Advanced Technology Solar Telescope Vivek Venugopal (vivekv@nso.edu) National Solar Observatory Sunspot, New Mexico, USA Wednesday, April 13, 2011
  • 2. Advanced Technology Solar Telescope ! Wednesday, April 13, 2011
  • 3. Adaptive Optics system Uncorrected Tip/Tilt light Mirror Deformable Mirror (DM) Tilt drive signal DM drive signal Corrected Processors Beamsplitter light Shack-Hartmann Lenslet Array CCD Camera " Wednesday, April 13, 2011
  • 4. HOAO Real-time system Actuator gains Offscale Recon- Dark Reference slope Slope struction Actuator Flat field image field tolerance offsets matrix offsets FPGA GPU Deformable mirror Cross- Offscale WFS correlation Matrix Actuator Camera X slope slope detection X multiply servos Servo computation parameters Average Tip/Tilt slope servos Tip/Tilt mirror Data Zernike collection offload process Wednesday, April 13, 2011
  • 5. Camera format Channel # 480 columns x 480 columns x 0 77 76 73 72 53 52 49 48 29 28 25 24 5 4 1 0 960 rows 960 rowschannels 12 channels 4 463 462 459 458 439 438 435 434 415 414 411 410 391 390 387 386 per FPGA per FPGA 0 87 86 83 82 63 62 59 58 39 38 35 34 15 14 11 10 9 1 183 182 179 178 159 158 155 154 135 134 131 130 111 110 107 106 • 12 channels processed per 10 2 3 279 375 278 374 275 371 274 370 255 351 254 350 251 347 250 346 231 327 230 326 227 323 226 322 207 303 206 302 203 299 202 298 4 FPGA 0 471 470 467 466 447 446 443 442 423 422 419 418 399 398 395 394 95 94 91 90 71 70 67 66 47 46 43 42 23 22 19 18 • 5 packets to receive a 11 12 1 2 191 287 190 286 187 283 186 282 167 263 166 262 163 259 162 258 143 239 142 238 139 235 138 234 119 215 118 214 115 211 114 210 3 383 382 379 378 359 358 355 354 335 334 331 330 311 310 307 306 complete row 4 479 478 475 474 455 454 451 450 431 430 427 426 407 406 403 402 # Wednesday, April 13, 2011
  • 6. Pixel unpacking
      [Figure: bit-level mapping of bytes 0-19 (bits 0-159) onto pixels 0-15.]
      • FPGA receives camera data using the fiber channel interface through 12 transceivers @ 9.42 ns
      • Pixel unpacking implemented using FSM with 2 modes (10 states/mode)
      • 16 pixels (10 bits/pixel) written to FIFO
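The unpacking step on this slide turns each 160-bit word (20 bytes) into 16 ten-bit pixels. The following is a minimal Python sketch of that idea, assuming a simple contiguous little-endian layout. The camera's real bit order is interleaved across bytes, as the slide's figure shows, and is not reproduced here. The function names `unpack_pixels` and `pack_pixels` are hypothetical.

```python
PIXELS_PER_WORD = 16   # from the slide: 16 pixels written to the FIFO
BITS_PER_PIXEL = 10    # from the slide: 10 bits/pixel
WORD_BYTES = PIXELS_PER_WORD * BITS_PER_PIXEL // 8  # 160 bits = 20 bytes


def unpack_pixels(word: bytes) -> list[int]:
    """Split a 20-byte word into 16 ten-bit pixels.

    Assumption: pixel 0 occupies the 10 least significant bits,
    pixel 1 the next 10, and so on (not the camera's real bit order).
    """
    if len(word) != WORD_BYTES:
        raise ValueError(f"expected {WORD_BYTES} bytes, got {len(word)}")
    value = int.from_bytes(word, "little")
    mask = (1 << BITS_PER_PIXEL) - 1
    return [(value >> (i * BITS_PER_PIXEL)) & mask for i in range(PIXELS_PER_WORD)]


def pack_pixels(pixels: list[int]) -> bytes:
    """Inverse of unpack_pixels, used here only to build test input."""
    value = 0
    for i, p in enumerate(pixels):
        value |= (p & ((1 << BITS_PER_PIXEL) - 1)) << (i * BITS_PER_PIXEL)
    return value.to_bytes(WORD_BYTES, "little")
```

In hardware the same selection is done by the state machine the slide mentions; the sketch only illustrates the 20-byte to 16-pixel bookkeeping.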
  • 7. Dark and flat correction
      [Diagram, repeated per pixel: the 8-bit dark_pixel is subtracted from the 10-bit pixel, the 10-bit result is multiplied by the 8-bit flat_pixel to give an 18-bit flat_product, and an accumulator produces flat_acc.]
      • Dark pixel and flat pixel stored in RAM
      • Flat corrected product is concatenated and written to FIFO
      • Flat accumulated value can be used to update the reference image
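The per-pixel arithmetic on this slide can be sketched in a few lines of Python. This is an illustration under assumptions rather than the FPGA design itself: clamping negative differences to zero is assumed, and the helper names `dark_flat_correct` and `correct_word` are hypothetical.

```python
def dark_flat_correct(pixel: int, dark_pixel: int, flat_pixel: int) -> int:
    """Return the flat-corrected product for one pixel.

    Widths follow the slide: 10-bit pixel, 8-bit dark and flat values,
    18-bit product. Clamping negative differences to zero is an assumption.
    """
    if not 0 <= pixel < 1 << 10:
        raise ValueError("pixel must fit in 10 bits")
    if not (0 <= dark_pixel < 1 << 8 and 0 <= flat_pixel < 1 << 8):
        raise ValueError("dark_pixel and flat_pixel must fit in 8 bits")
    diff = max(pixel - dark_pixel, 0)      # 10-bit dark-subtracted value
    return diff * flat_pixel               # at most 1023 * 255 < 2**18


def correct_word(pixels, dark, flat, flat_acc):
    """Correct a group of pixels and add each product to its accumulator."""
    products = [dark_flat_correct(p, d, f) for p, d, f in zip(pixels, dark, flat)]
    new_acc = [a + q for a, q in zip(flat_acc, products)]
    return products, new_acc
```

The 18-bit product width follows directly from multiplying a 10-bit value by an 8-bit value.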
  • 8. Pixel unpacking & Dark and flat correction
      [Block diagram. 12 channels (1/2 camera), each with a receiver, FIFO, data unpack and dark-flat correction/accumulator stage; shared synchronizer/counters and a RAM holding the dark and flat reference image values; output to the PCIe system bus. Bus widths marked: 256, 128, 160, 16, 160, 288. Timing labels: 206.8 ns, 20 ns. Clock domains: clock period = 9.42 ns (clock rate = 106.15 MHz) and clock period = 5 ns (clock rate = 200 MHz).]
  • 9. Nvidia Tesla C2050 GPU
      [Diagram: multiprocessors 1 to 14, each with instruction cache, warp schedulers, dispatch units, register file, cores, load/store units 1-16, SFUs 1-4, interconnection network, 64 KB shared memory/L1 cache, uniform cache.]
      • Nvidia Tesla C2050: 14 streaming multi-processors with 32 cores each (SIMD) clocked at 1.15 GHz
      • 3 GB on-board RAM
      • Kernel-based execution
      • 1.288 TFLOPS single precision
      • 515.2 GFLOPS double precision
      Reference: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
  • 10. Process mapping and partitioning
      [Diagram. Inputs: raw pixels 20x20, dark pixels 20x20, flat pixels 20x20, reference pixels 20x20. Stages: dark correction, flat correction, 2D cross-correlation, find maximum, x and y interpolation. Partition labels: FPGA, GPU.]
  • 11. Correlation routines
      1. FFT correlation
         [Diagram: flat corrected image and reference image, each through an FFT, then complex conjugate multiplication, then IFFT.]
      2. 7x7 correlation
         [Diagram: original reference image, 26x26 pixels; precomputed Region 1 reference to precomputed Region 49 reference (20x20 pixels each); precomputed reference pixels 20x20 (49 regions).]
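The 7x7 correlation on this slide can be read as follows: a 26x26 reference holds 7 x 7 = 49 shifted 20x20 regions, and the image is correlated against each one. Below is a minimal pure-Python sketch of that reading, assuming an unnormalized sum of products as the correlation measure. The function names `precompute_regions` and `correlate_7x7` are hypothetical.

```python
SUBAP = 20      # sub-aperture image is 20x20 pixels (from the slide)
REF = 26        # original reference image is 26x26 pixels (from the slide)
SHIFTS = REF - SUBAP + 1   # 7 shifts per axis, 49 regions in total


def precompute_regions(reference):
    """Cut the 49 shifted 20x20 regions out of the 26x26 reference."""
    return [
        [[row[dx:dx + SUBAP] for row in reference[dy:dy + SUBAP]]
         for dx in range(SHIFTS)]
        for dy in range(SHIFTS)
    ]


def correlate_7x7(image, regions):
    """Return the 7x7 map of sum-of-products between image and each region."""
    return [
        [sum(image[i][j] * regions[dy][dx][i][j]
             for i in range(SUBAP) for j in range(SUBAP))
         for dx in range(SHIFTS)]
        for dy in range(SHIFTS)
    ]
```

Precomputing the regions once trades memory for time, which matches the slide's emphasis on precomputed references.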
  • 12. find_max and interpolation routines
      • Find the maximum value and its index
      • Find x and y shifts using the interpolation equations:
        $\text{num}_x = \text{max\_value} - \text{out}(\text{shifted\_y\_index},\ \text{shifted\_x\_index} - 1)$
        $\text{den}_x = 2 \cdot \text{max\_value} - \text{out}(\text{shifted\_y\_index},\ \text{shifted\_x\_index} - 1) - \text{out}(\text{shifted\_y\_index},\ \text{shifted\_x\_index} + 1)$
        $x = (\text{shifted\_x\_index} - 0.5) + \dfrac{\text{num}_x}{\text{den}_x}$
        $\text{num}_y = \text{max\_value} - \text{out}(\text{shifted\_y\_index} - 1,\ \text{shifted\_x\_index})$
        $\text{den}_y = 2 \cdot \text{max\_value} - \text{out}(\text{shifted\_y\_index} - 1,\ \text{shifted\_x\_index}) - \text{out}(\text{shifted\_y\_index} + 1,\ \text{shifted\_x\_index})$
        $y = (\text{shifted\_y\_index} - 0.5) + \dfrac{\text{num}_y}{\text{den}_y}$
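The two routines on this slide translate almost directly into code. Here is a minimal Python sketch, assuming the correlation map's own row and column indices play the role of the shifted indices and that the peak is not on the border. The names `find_max` and `interpolate_shift` are hypothetical.

```python
def find_max(out):
    """Return (max_value, y_index, x_index) of a 2D correlation map."""
    best = (out[0][0], 0, 0)
    for y, row in enumerate(out):
        for x, value in enumerate(row):
            if value > best[0]:
                best = (value, y, x)
    return best


def interpolate_shift(out):
    """Apply the slide's interpolation equations around the maximum.

    Assumptions: the maximum is not on the border of the map and the
    denominators are non-zero; both cases raise ValueError here.
    """
    max_value, y, x = find_max(out)
    if not (0 < y < len(out) - 1 and 0 < x < len(out[0]) - 1):
        raise ValueError("maximum lies on the border")
    num_x = max_value - out[y][x - 1]
    den_x = 2 * max_value - out[y][x - 1] - out[y][x + 1]
    num_y = max_value - out[y - 1][x]
    den_y = 2 * max_value - out[y - 1][x] - out[y + 1][x]
    if den_x == 0 or den_y == 0:
        raise ValueError("flat peak, interpolation undefined")
    return (x - 0.5) + num_x / den_x, (y - 0.5) + num_y / den_y
```

For a peak with equal neighbors on both sides, the ratio is 0.5 and the interpolated position equals the integer index, so the 0.5 offset in the equations cancels out in the symmetric case.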
  • 13. GPU results
      [Two bar charts comparing Tesla C1060 and Tesla C2050: FFT correlation (time in us, axis up to 2200; plotted values 1889, 1619, 1510, 1188) and 7x7 correlation (time in us, axis up to 400; plotted values 313, 307, 301, 278, 279, 281). x-axis: no. of images (1, 50, 584).]
      Note: lower time indicates better performance
  • 14. Reconstruction routine
      [Diagram: x and y shifts for 1750 sub-aperture images (3500 values) multiplied by the reconstruction matrix (1900x3500) give accumulated values for 1900 actuators.]
      [Bar chart, log scale: time in us per device (Tesla C1060, Tesla C2050, DSP, CPU); plotted values 46769, 964, 956, 229.]
      • 1750 sub-aperture x and y shifts
      • 3500 x 1900 reconstruction matrix
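The reconstruction step is a matrix-vector product. The sketch below shows it in plain Python under one assumption: all x and y shifts are placed in a single vector of 2 x 1750 = 3500 values, in an order that matches the matrix columns. The function name `reconstruct` is hypothetical.

```python
def reconstruct(matrix, shifts):
    """Multiply the reconstruction matrix by the vector of x and y shifts.

    matrix: one row per actuator, one column per shift value
            (1900 x 3500 on the slide).
    shifts: all x and y shifts in one list (2 * 1750 = 3500 on the slide).
    Returns one accumulated value per actuator.
    """
    if any(len(row) != len(shifts) for row in matrix):
        raise ValueError("matrix width must match number of shifts")
    return [sum(m * s for m, s in zip(row, shifts)) for row in matrix]
```

At full size this is 1900 x 3500 = 6.65 million multiply-accumulate operations per frame, which is why the slide compares devices on this routine.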
  • 15. Xilinx design flow
      [Flow chart. Implementation path: design entry, design synthesis, design implementation (optimization, mapping, placement, routing), bitstream generation, download to Xilinx FPGA. Design verification path: functional simulation, static timing analysis, back annotation, timing simulation, in-circuit verification.]
  • 16. Cross-correlation
      [Diagram: the 18-bit flat_product0 (flatcorr_value) is multiplied by each 8-bit reference pixel ref_pixel0 to ref_pixel48, taken from the 392-bit ref_pixel input, giving 26-bit products xcorr_product0 to xcorr_product48; together these form the 1274-bit xcorr_value_per pixel.]
      • Configure 400x392 (49x8 bits/pixel) RAM bank (RAM0-RAM19) with pre-computed reference pixels
      • Multiply each pixel with corresponding reference pixel
  • 17. Cross-correlation
      [Diagram: the multiplier structure of the previous slide repeated for flat_product0, flat_product1, ..., flat_product15; each 18-bit flat_product is multiplied by the 8-bit ref_pixel0 to ref_pixel48 (392-bit ref_pixel input), giving 26-bit xcorr_product0 to xcorr_product48 and a 1274-bit xcorr_value_per pixel.]
      • Configure 400x392 (49x8 bits/pixel) RAM bank (RAM0-RAM19) with pre-computed reference pixels
      • Multiply each pixel with corresponding reference pixel
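The bus widths on the two cross-correlation slides follow from simple arithmetic: 49 reference pixels of 8 bits give 392 bits, and 49 products of 18 + 8 = 26 bits give 1274 bits. A minimal Python sketch of one multiplier bank follows; placing ref_pixel0 in the lowest bits of each word is an assumption, and the function name `xcorr_pixel` is borrowed from the module name on the slides only as a label.

```python
REGIONS = 49          # 7x7 precomputed reference regions
FLAT_BITS = 18        # width of flat_product
REF_BITS = 8          # width of each reference pixel
PROD_BITS = FLAT_BITS + REF_BITS   # 26-bit xcorr_product


def xcorr_pixel(flat_product: int, ref_word: int) -> int:
    """Multiply one flat_product by the 49 reference pixels in ref_word.

    ref_word is the 392-bit RAM word; placing ref_pixel0 in the lowest
    8 bits is an assumption, as is the same ordering in the result.
    """
    result = 0
    for k in range(REGIONS):
        ref_pixel = (ref_word >> (k * REF_BITS)) & ((1 << REF_BITS) - 1)
        product = flat_product * ref_pixel          # fits in 26 bits
        result |= product << (k * PROD_BITS)
    return result
```

On the FPGA the 49 multiplications run in parallel; the loop here only models the data layout.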
  • 18. Sub-aperture format
      [Figure: map of channel numbers and sub-aperture indices (0-23) across the camera. Three subap_accumulator blocks, for channels #1,#2,#7,#8, channels #3,#4,#9,#10 and channels #5,#6,#11,#12, each take the 1274-bit xcorr_pixel0 to xcorr_pixel15 inputs and produce the 1715-bit subap0_acc to subap23_acc outputs.]
      • Sub-aperture regions in 480 columns x 1 row per channel
      • Accumulate pixels per sub-aperture in each channel
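The 1715-bit accumulator width is not explained on the slide. One reading consistent with the numbers is 49 accumulators of 35 bits each, where 35 = 26 product bits plus 9 growth bits, enough to sum the 400 pixels of a 20x20 sub-aperture without overflow. The Python check below works through that assumption; it is an inference from the widths, not a statement of the actual design.

```python
import math

REGIONS = 49
PROD_BITS = 26                    # width of one xcorr_product
PIXELS_PER_SUBAP = 20 * 20        # one 20x20 sub-aperture

growth_bits = math.ceil(math.log2(PIXELS_PER_SUBAP))   # 9
acc_bits = PROD_BITS + growth_bits                      # 35


def accumulate(products):
    """Sum the products of one correlation region over a sub-aperture."""
    total = 0
    for p in products:
        total += p
    return total
```

Under this reading, 49 x 35 = 1715 bits, which matches the subap_acc width shown in the figure.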
  • 19. Top level design
      [Block diagram. channel1_top to channel12_top, each containing FCFPGA, data unpack, dark_flat_acc_top, Flatcorr_FIFO and xcorr_pixel_channel, with bus widths 288, 288 and 160 marked, refim_in (392 bits) x16 as input and xcorr_pixel (1274 bits) x16 as output. Shared blocks: RAM bank (RAM0-RAM19) with address decoder (refim_fetch_addr_decoder, addr_decoder_ce); xcorr state machine (xcorr_sm) with signals xcorr_pixel_ce, subap_acc_ce, subap_acc_12ch_ce and flat_fifo_rd; counters channel_cycle_count and subap_row_count. Accumulators: ch1278_subap_accumulator and ch561112_subap_accumulator, each with subap_acc_out (1715 bits) x24, feeding 24subap_12ch_accumulator with subap_acc_out (1715 bits) x24.]
  • 20. Synthesis estimates for Virtex-6 FPGA
      • Implement dark, flat correction only: resources used 288 out of 687,360 (1%)
      • Implement the correlation for single channel up to the sub-aperture accumulator within the channel (without the final 12 channel accumulation): resources used 2,578 out of 687,360 (1%)
      Device utilization summary:
      Slice Logic Utilization:
        Number of Slice Registers: 992448 out of 687360   144% (*)
        Number of Slice LUTs:      1126081 out of 343680  327% (*)
        Number used as Logic:      1125853 out of 343680  327% (*)
        Number used as Memory:     228 out of 99200
        Number used as SRL:        37
  • 21. FPGA timing
      [Timing diagram. Signals: Rxdata from transceiver; unpacked data written to FIFO; unpacked data read from FIFO; dark-flat output; input to xcorr_pixel module; output from xcorr_pixel; output from sub-aperture accumulator per channel. Labelled intervals: 123.73 ns, 40 ns, 95 ns, 15 ns, 40 ns, 20 ns, 16 ns, 91 ns.]
      • Each data packet is available from the FIFO after 95 ns
      • 95 ns * 5 packets * 10 rows = 4.75 us to read the data from the FIFO
      • Total latency for computing the 960 rows x 480 columns = 4.75 us * (960/20) = 228 us.
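The latency figures on this slide can be reproduced with integer arithmetic in nanoseconds. The short Python check below uses only the numbers given in the bullets; the variable names are hypothetical.

```python
PACKET_NS = 95          # each packet is available from the FIFO after 95 ns
PACKETS_PER_ROW = 5     # 5 packets to receive a complete row
ROWS_PER_BLOCK = 10     # rows per block, as used on the slide
CAMERA_ROWS = 960
BLOCKS = CAMERA_ROWS // 20   # the slide's (960/20) factor

block_read_ns = PACKET_NS * PACKETS_PER_ROW * ROWS_PER_BLOCK
total_latency_ns = block_read_ns * BLOCKS

print(block_read_ns / 1000, "us per block")     # 4.75 us
print(total_latency_ns / 1000, "us in total")   # 228.0 us
```

Both results agree with the slide: 4750 ns per block and 228000 ns for the full 960 x 480 frame.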
  • 22. GPU vs FPGA vs DSP
      [Timing diagram. Stages: camera readout, data transfer through PCIe x16, C2050 GPU 1, C2050 GPU 2, C2050 GPU 3. Labelled durations: 100 us, 225 us, 300.93 us.]
      • C2050 GPU throughput = 525.93 us
      • FPGA throughput = 250 us
      • DSP (96 DSPs) throughput = 495 us
  • 23. Conclusions
      • DSP: excellent performance but not cost-effective
      • GPU: fast SIMD architectures - suitable for certain tasks
      • FPGA: MIMD architectures, custom I/O, meets latency and throughput constraints
      Slide idea: David Pellerin, Impulse Accelerated Technology
  • 24. Future work
      Resources               Virtex-6 XC6VLX550T   Virtex-7 XC7V2000T
      Slice logic resources   549,888               1,954,560
      I/O pins                840                   850
      GTX transceivers        36                    36
      • Investigate performance improvement after mapping the find_max, interpolation and reconstruction matrix calculation routines on the FPGA
      • Promising because of increased logic density in Virtex-7 FPGAs
      • Throughput sustained even if the processes are partitioned over multiple FPGAs
  • 25. Discussion Questions
  • 26. Backup
      Device utilization summary:
      Selected Device: 6vlx550tff1759-2
      Slice Logic Utilization:
        Number of Slice Registers: 992448 out of 687360   144% (*)
        Number of Slice LUTs:      1126081 out of 343680  327% (*)
        Number used as Logic:      1125853 out of 343680  327% (*)
        Number used as Memory:     228 out of 99200       0%
        Number used as SRL:        228
      Slice Logic Distribution:
        Number of LUT Flip Flop pairs used:   1509605
        Number with an unused Flip Flop:      517157 out of 1509605  34%
        Number with an unused LUT:            383524 out of 1509605  25%
        Number of fully used LUT-FF pairs:    608924 out of 1509605  40%
        Number of unique control sets:        221
      IO Utilization:
        Number of IOs:             88
        Number of bonded IOBs:     80 out of 840  9%
        IOB Flip Flops/Latches:    25
      Specific Feature Utilization:
        Number of BUFG/BUFGCTRLs:  36 out of 32  112% (*)
      WARNING:Xst:1336 - (*) More than 100% of Device resources are used
  • 27. Pre-computed reference
  • 28. Pre-computed reference