High Speed Data Ingestion and Processing for MWA

Stewart Gleadow (and the team from MWA)
School of Physics, University of Melbourne, Victoria 3010, Australia
gleadows@unimelb.edu.au

The MWA radio telescope requires the interaction of hardware and software systems at close to link capacity, with minimal transmission loss and maximum throughput. Using the parallel thread architecture described below, we aim to operate high-speed network connections and process data products simultaneously.


1   MWA REAL TIME SYSTEM

The Murchison Widefield Array (MWA) is a low-frequency radio telescope currently being deployed in Western Australia using 512 dipole-based antennas. With over 130,000 baselines and around 800 fine frequency channels, there is a significant computational challenge facing the Real Time System (RTS) software. A prototype system with 32 antennas is presently being used to test the hardware and software solutions end-to-end.

Before calibration and imaging can occur, the RTS must ingest and integrate correlated data at high speed: around 0.5 Gbit/s per network interface on a Beowulf-style cluster. The data is transferred using UDP packets over Gigabit Ethernet, with as close to zero data loss as possible.

[Figure: basic structure of the MWA, from antennas to output data products. Hardware: ANTENNAS / BEAMFORMERS -> RECEIVERS -> CORRELATOR; software: REAL TIME SYSTEM -> OUTPUT / STORAGE. The main high-speed hardware-to-software interface is at the input from the correlator to the RTS.]

For the 32-tile demonstration, each of four computing nodes receives:
•  correlations for both polarizations from all antennas
•  192 x 40 kHz frequency channels
•  ~0.5 Gbit/s of data (see the sizing sketch below)
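As a rough sense of scale, here is a minimal back-of-envelope sketch using only the figures quoted on this poster (the ~0.5 Gbit/s per-node rate and the 50 ms and 8 s cadences discussed in the next section); the derived numbers are approximate.

/* Rough per-node data volumes implied by the figures quoted on this poster.
 * The ~0.5 Gbit/s rate and the 50 ms / 8 s cadences are taken from the text;
 * everything derived here is approximate. */
#include <stdio.h>

int main(void)
{
    const double rate_bits_per_s = 0.5e9;  /* ~0.5 Gbit/s per network interface */
    const double dump_period_s   = 0.05;   /* correlator output every 50 ms (Section 2) */
    const double rts_cadence_s   = 8.0;    /* RTS integration cadence (Section 2) */

    double mb_per_dump    = rate_bits_per_s * dump_period_s / 8.0 / 1e6;
    double mb_per_cadence = rate_bits_per_s * rts_cadence_s / 8.0 / 1e6;

    printf("per 50 ms correlator dump: ~%.1f MB\n", mb_per_dump);     /* ~3.1 MB */
    printf("per 8 s RTS cadence:       ~%.0f MB\n", mb_per_cadence);  /* ~500 MB */
    return 0;
}

The roughly 500 MB per node per 8 s cadence is what makes buffering raw packets for a full cadence unattractive, motivating the intermediate integration stage described in Section 3.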



2   DATA INGESTION CHALLENGE

The MWA hardware correlator sends out packet data representing a full set of visibilities and channels every 50 ms, which leaves only tens of µs per packet. The RTS runs on an 8 second cadence, so visibilities need to be integrated up to this level.

In order to avoid overflows or loss in the network card and kernel memory, a custom buffering system is required. The goal is to allow the correlator, the network interface, and the main RTS calibration and imaging to run in parallel, without losing data in between.

UDP does not guarantee successful transmission, but in our testing, with a direct Gigabit Ethernet connection (no switch), there is no packet loss other than from buffer overflows. These occur only when packets are not read from the network interface fast enough.

In order to operate at close to gigabit speeds, a hierarchy of parallel threads is required. Each thread does only a small amount of processing, so that it operates quickly while still delivering the higher-level data products required by the rest of the calibration and imaging processes.

[Figure: the ingestion pipeline CORRELATOR -> PACKET READER -> VISIBILITY INTEGRATOR -> MAIN RTS, with each stage integrating to a coarser time resolution: individual packets (~20 µs), 20 µs up to 1 s, 1 s up to 8 s, and finally the 8 s RTS cadence. Each stage holds two buffers (Buffer One and Buffer Two).]

Each thread uses double buffers (shown in the diagram), so that there is one set of data currently being filled by each thread, and another that is already full and being passed on to the next level. This allows each thread to operate in parallel, while each set of data still passes through each phase in the order it arrived from the correlator (a minimal code sketch of this hand-off follows).
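The following sketch illustrates the double-buffer hand-off between two pipeline stages, assuming a POSIX threads implementation; the buffer size, field names and functions (hand_off, consume) are illustrative and are not taken from the actual RTS code.

/*
 * Minimal double-buffer hand-off between two pipeline stages
 * (e.g. packet reader -> visibility integrator), assuming POSIX threads.
 * All names and sizes here are illustrative, not the actual RTS code.
 */
#include <pthread.h>
#include <stddef.h>

#define BUF_SAMPLES (1 << 20)        /* capacity of one buffer (illustrative) */

typedef struct {
    float data[2][BUF_SAMPLES];      /* two buffers: one filling, one draining */
    int   fill;                      /* index of the buffer currently being filled */
    int   ready;                     /* non-zero when the other buffer is full */
    pthread_mutex_t lock;            /* initialise with pthread_mutex_init() */
    pthread_cond_t  full;            /* initialise with pthread_cond_init() */
} double_buffer;

/*
 * Producer side: called when the current buffer is full. Swap buffers so the
 * producer can keep writing immediately while the consumer drains the other.
 */
void hand_off(double_buffer *db)
{
    pthread_mutex_lock(&db->lock);
    db->ready = 1;
    db->fill ^= 1;                   /* start filling the other buffer */
    pthread_cond_signal(&db->full);
    pthread_mutex_unlock(&db->lock);
}

/*
 * Consumer side: block until a full buffer is available, then process it
 * outside the lock while the producer fills the other one in parallel.
 * For brevity this assumes the consumer always keeps up with the producer,
 * which is exactly the condition required to avoid data loss.
 */
void consume(double_buffer *db, void (*process)(const float *, size_t))
{
    pthread_mutex_lock(&db->lock);
    while (!db->ready)
        pthread_cond_wait(&db->full, &db->lock);
    int drain = db->fill ^ 1;        /* the buffer the producer just released */
    db->ready = 0;
    pthread_mutex_unlock(&db->lock);

    process(db->data[drain], BUF_SAMPLES);
}

In a system of this shape, a stage such as the visibility integrator acts as the consumer of one such buffer pair and the producer of the next, so each level of the hierarchy overlaps its work with the levels above and below it.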


3   THREADED HIERARCHY

When approaching link capacity, one thread is dedicated to constantly reading packets from the network interface, to avoid buffer overflows and packet loss. In order to operate at close to gigabit speeds, a hierarchy of parallel threads is required.

Buffering all packets for 8 seconds would introduce heavy memory requirements. Hence, an intermediate thread working at a mid-level time resolution is required.

Theoretical network performance is difficult to achieve using small packets, because the overhead of encoding, decoding and notification becomes too much for the network interface and operating system.

[Figure, left: effective bandwidth (Mbit/s) over UDP for datagram sizes from 0 to 2000 bytes, with the original and new packet sizes marked. Figure, below: percentage packet loss against UDP payload size. Tests performed by Steve Ord, Harvard-Smithsonian Center for Astrophysics.]

The poor network performance for small packets is caused by the kernel becoming flooded with interrupts faster than it can service them, to the point where not all interrupts are handled and packets start to be dropped as requests are ignored. These results prompted a move from 388 byte to 1540 byte packets. A sketch of the dedicated packet-reading thread is given below.
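The sketch below illustrates a dedicated packet-reading thread of this kind, assuming a standard POSIX UDP socket; the port number, buffer sizes and the append_to_fill_buffer() hand-off are hypothetical, not the actual RTS code.

/*
 * Minimal sketch of a dedicated packet-reading thread over a POSIX UDP socket.
 * Port, buffer sizes and append_to_fill_buffer() are hypothetical placeholders.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PAYLOAD_BYTES 1540               /* packet size adopted after the tests above */
#define RCVBUF_BYTES  (8 * 1024 * 1024)  /* large kernel receive buffer to ride out stalls */

void *packet_reader(void *arg)
{
    (void)arg;
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Ask the kernel for a bigger receive buffer, so a short scheduling delay
     * in the downstream integrator does not immediately cause drops. */
    int sz = RCVBUF_BYTES;
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &sz, sizeof sz);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(12345);        /* placeholder port */
    bind(sock, (struct sockaddr *)&addr, sizeof addr);

    unsigned char pkt[PAYLOAD_BYTES];
    for (;;) {
        /* Do as little work as possible per packet: read it, hand it to the
         * next buffering stage, and return to recvfrom() straight away. */
        ssize_t n = recvfrom(sock, pkt, sizeof pkt, 0, NULL, NULL);
        if (n <= 0)
            continue;
        /* append_to_fill_buffer(pkt, (size_t)n);   hypothetical hand-off */
    }

    close(sock);                                 /* not reached in this sketch */
    return NULL;
}

Kept this minimal, the reader thread spends essentially all of its time in recvfrom(), while integration to coarser time resolutions happens on other cores via the double-buffer hand-off sketched in Section 2.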




4   CONCLUSION

While the new generation of radio telescopes poses great computational challenges, it is also pushing the boundaries of network capacity and performance. A combination of high-quality network hardware and multiple-core processors is required in order to receive and process data simultaneously. Depending on the level of processing and integration required, and as a trade-off between memory usage and performance, parallel threads may be required at multiple levels.

The architecture described above has been tested on Intel processors and network interfaces, running Ubuntu Linux, successfully receiving, processing and integrating many gigabytes of data without missing a single packet. Further work involves testing the architecture in a switched network environment and deploying the system in the field in late 2009.
