 
Experience with 100Gbps Network Applications

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel,
Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney

DIDC – Delft, the Netherlands
June 19, 2012
 
Experience with 100Gbps Network Applications

Mehmet Balman
Computational Research Division
Lawrence Berkeley National Laboratory

DIDC – Delft, the Netherlands
June 19, 2012
Outline

•  A recent 100Gbps demo by ESnet and Internet2 at SC11

•  Two applications:
   •  Visualization of remotely located data (Cosmology)
   •  Data movement of large datasets with many files (Climate analysis)

Our experience with application design issues and host tuning strategies to scale to 100Gbps rates
The Need for 100Gbps networks

Modern science is data driven and collaborative in nature.
•  The largest collaborations are the most likely to depend on distributed architectures.
   •  LHC (distributed architecture): data generation, distribution, and analysis.
   •  The volume of data produced by genomic sequencers is rising exponentially.
   •  In climate science, researchers must analyze observational and simulation data located at facilities around the world.
ESG (Earth Systems Grid)

•  Over 2,700 sites
•  25,000 users

•  IPCC Fifth Assessment Report (AR5): 2PB
•  IPCC Fourth Assessment Report (AR4): 35TB
100Gbps networks arrived

•  Increasing network bandwidth is an important step toward tackling ever-growing scientific datasets.

   •  In the 1Gbps to 10Gbps transition (10 years ago), applications did not run 10 times faster simply because more bandwidth was available.

In order to take advantage of the higher network capacity, we need to pay close attention to application design and host tuning issues.
Applications’ Perspective

•  Increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications’ perspective.

•  Real-time streaming and visualization of cosmology data
   •  How does high network capacity enable remotely located scientists to gain insight from large data volumes?

•  Data distribution for climate science
   •  How can scientific data movement and analysis between geographically disparate supercomputing facilities benefit from high-bandwidth networks?
The SC11 100Gbps demo

•  100Gbps connection between ANL (Argonne), NERSC at LBNL, ORNL (Oak Ridge), and the SC11 booth (Seattle)
Demo Configuration

RTT:  Seattle – NERSC   16ms
      NERSC – ANL       50ms
      NERSC – ORNL      64ms
Visualizing the Universe at 100Gbps

•  VisaPult for streaming data
•  Paraview for rendering

•  For visualization purposes, occasional packet loss is acceptable (using UDP)

•  90Gbps of the bandwidth is used for the full dataset
   •  4x10Gbps NICs (4 hosts)
•  10Gbps of the bandwidth is used for 1/8 of the same dataset
   •  10Gbps NIC (one host)
Demo Configuration

[Diagram: 16 sender hosts at NERSC, fed from a flash-based GPFS cluster over InfiniBand, stream through the NERSC router and the 100G pipe to the SC11 booth router; four 10GigE receive/render hosts (H1–H4) handle the high-bandwidth stream, a separate receive/render/vis host handles the low-bandwidth stream, and both drive the high- and low-bandwidth displays over Gigabit Ethernet.]

The 1Gbps connection is used for synchronization and communication of the rendering application, not for transfer of the raw data.
UDP shuffling

•  UDP packets include position (x, y, z) information (1024^3 matrix).

•  MTU is 9000; the largest possible payload is 8972 bytes (MTU size minus IP and UDP headers).

•  560 points in each packet, 8968 bytes
   •  3 integers (x, y, z) + a floating point value per point
   (a packing sketch follows the packet diagram below)
[UDP packet layout:  Batch#, n | X1Y1Z1D1 | X2Y2Z2D2 | … | XnYnZnDn  — in the final run n = 560, packet size is 8968 bytes]
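A minimal Python sketch of how one such payload might be assembled, assuming 32-bit little-endian integers for the batch number, point count, and x, y, z indices, and a 32-bit float for the data value (the exact field order and endianness are not specified on the slide):

    import socket
    import struct

    POINTS_PER_PACKET = 560   # 8-byte header + 560 * 16 bytes = 8968 bytes

    def pack_points(batch_no, points):
        # Header: batch number and point count n.
        payload = struct.pack("<ii", batch_no, len(points))
        for x, y, z, value in points:
            # Each point: 3 integers (x, y, z) + one floating point data value.
            payload += struct.pack("<iiif", x, y, z, value)
        return payload

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    points = [(x, 0, 0, 1.0) for x in range(POINTS_PER_PACKET)]
    sock.sendto(pack_points(0, points), ("127.0.0.1", 9000))   # hypothetical receiver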
Data Flow

•  For each time step, the input data is split into 32 streams along the z-direction; each stream contains a contiguous slice of size 1024 × 1024 × 32 (see the slab-splitting sketch after the diagram).

•  32 streams for the 90Gbps demo (high bandwidth)
•  4 streams for the 10Gbps demo (low bandwidth, 1/8 of the data)
[Flow of data: GPFS flash → stager → shuffler (via /dev/shm) on the send server at NERSC; across the network to the receiver → render software (via /dev/shm) on the receive server at the SC booth.]
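A minimal sketch of the slab decomposition described above, using NumPy purely for illustration (the array here is a single example field; the demo read its time steps from GPFS rather than generating them):

    import numpy as np

    def split_z_slabs(volume, n_streams=32):
        # Split a (1024, 1024, 1024) array into contiguous slabs along z,
        # one slab per stream; with 32 streams each slab is 1024 x 1024 x 32.
        nz = volume.shape[2]
        slab = nz // n_streams
        return [volume[:, :, i * slab:(i + 1) * slab] for i in range(n_streams)]

    timestep = np.zeros((1024, 1024, 1024), dtype=np.float32)  # example field
    slabs = split_z_slabs(timestep)   # 32 slabs for the 90Gbps (high-bandwidth) run
    low_bw = slabs[:4]                # 4 slabs, i.e. 1/8 of the data, for the 10Gbps run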
Performance Optimization

•  Each 10Gbps NIC in the system is bound to a specific core.
•  Receiver processes are also bound to the same core.
•  Renderer processes are bound to the same NUMA node but a different core (accessing the same memory region).
   (a CPU-affinity sketch follows the diagram below)
[Diagram: one NUMA node with its memory banks and cores; the 10G port, Receiver 1/2, and Render 1/2 all sit on cores of the same node.]
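A minimal sketch of pinning a receiver process to the core that services its NIC, assuming Linux; the core numbers are purely illustrative, as the slide does not give the actual core-to-NIC mapping:

    import os

    NIC_CORE = 2      # hypothetical core servicing the 10G NIC's interrupts
    RENDER_CORE = 3   # hypothetical sibling core on the same NUMA node

    # Receiver process: run on the same core as the NIC so packets are
    # consumed on the CPU where the driver delivers them.
    os.sched_setaffinity(0, {NIC_CORE})

    # The renderer (a separate process in the demo) would make the same call
    # with RENDER_CORE, staying on the same NUMA node as the shared buffers.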
Network Utilization

•  2.3TB of data moved from NERSC to the SC11 booth in Seattle in ≈ 3.4 minutes.

•  For each timestep, corresponding to 16GB of data, it took ≈ 1.4 seconds to transfer and ≈ 2.5 seconds to render before the image was updated.

•  Peak ≈ 99Gbps
•  Average ≈ 85Gbps
Demo	
  
Climate Data Distribution

•  ESG data nodes
   •  Data replication in the ESG Federation

•  Local copies
   •  Data files are copied into temporary storage in HPC centers for post-processing and further climate analysis.
Climate Data over 100Gbps

•  Data volume in climate applications is increasing exponentially.

•  An important challenge in managing ever-increasing data sizes in climate science is the large variance in file sizes.
   •  Climate simulation data consists of a mix of relatively small and large files with an irregular file-size distribution in each dataset.
      •  Many small files
Keep the data channel full

[Diagram: with FTP, each file needs its own request/response round trip (request a file → send file, repeated per file); with an RPC-style channel, one request keeps data flowing back continuously (request data → send data).]

•  Concurrent transfers
•  Parallel streams

The lots-of-small-files problem!
   Are file-centric tools the answer?
•  A bigger pipe is not necessarily higher speed end-to-end (the distance is the same):
   –  latency is still a problem.

[Diagram: request a dataset → send data, over a 100Gbps pipe feeding a 10Gbps pipe.]
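To see why latency dominates for file-centric tools on a long path, here is a rough back-of-the-envelope sketch; the 4MB file size is illustrative, the RTT is the NERSC–ORNL value quoted earlier, and the model simply assumes the pipe sits idle for one round trip per file request:

    rtt = 0.064                  # NERSC - ORNL round-trip time from the demo configuration
    file_size = 4 * 1024 * 1024  # a small 4MB climate file (illustrative size)
    per_stream_gbps = file_size * 8 / rtt / 1e9
    print(per_stream_gbps)       # ~0.52 Gbps per stream; ~190 such streams to fill 100Gbps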
Block-based

[Diagram: front-end threads on the sender fill memory blocks; the blocks travel over the network into memory blocks on the receiver, where front-end threads consume them.]

Memory caches are logically mapped between client and server.
Moving Climate Files Efficiently
Advantages

•  Decoupling I/O and network operations
   •  front-end (I/O, processing)
   •  back-end (networking layer)

•  Not limited by the characteristics of the file sizes
   On-the-fly tar approach: bundling and sending many files together

•  Dynamic data channel management
   Can increase/decrease the parallelism level both in the network communication and in the I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants).
Demo Configuration

Disk to memory / reading from GPFS (NERSC); max 120Gbps read performance.
The SC11 100Gbps Demo

•  CMIP3 data (35TB) from the GPFS filesystem at NERSC
   •  Block size 4MB
   •  Each block’s data section was aligned according to the system page size (see the alignment sketch below).
   •  1GB cache both at the client and the server

•  At NERSC, 8 front-end threads on each host read data files in parallel.
•  At ANL/ORNL, 4 front-end threads process received data blocks.
•  4 parallel TCP streams (four back-end threads) were used for each host-to-host connection.
  
83Gbps throughput
MemzNet: memory-mapped zero-copy network channel

[Diagram: front-end threads on each side access local memory blocks; the blocks are exchanged over the network, so the memory caches are logically mapped between client and server.]
ANI 100Gbps testbed

[Diagram: ANI Middleware Testbed. NERSC hosts (nersc-diskpt-1/2/3, nersc-app) and ANL hosts (anl-mempt-1/2/3, anl-app), each with multiple 10GE NICs (Myricom, Chelsio, Mellanox, HotLava), connect through site switches and ANI 100G routers across the ANI 100G network, with 1GE management links to the site routers and ESnet. Note: ANI 100G routers and 100G wave available until summer 2012; testbed resources after that subject to funding availability. Updated December 11, 2011.]
ANI 100Gbps testbed / SC11 100Gbps demo

[The ANI Middleware Testbed diagram repeated from the previous slide, here labeled with both the ANI 100Gbps testbed and the SC11 100Gbps demo.]
Many TCP Streams

[Chart: ANI testbed 100Gbps (10x10G NICs, three hosts) — throughput vs. the number of parallel streams (1, 2, 4, 8, 16, 32, 64 streams, 5-minute intervals); TCP buffer size is 50M.]

[Chart: ANI testbed 100Gbps (10x10G NICs, three hosts) — interface traffic vs. the number of concurrent transfers (1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals); TCP buffer size is 50M.]
Performance at the SC11 demo

[Charts: GridFTP vs. MemzNet throughput during the demo.]

TCP buffer size is set to 50MB.
Performance results in the 100Gbps ANI testbed
Host tuning

With proper tuning, we achieved 98Gbps using only 3 sending hosts, 3 receiving hosts, 10 10GE NICs, and 10 TCP flows.
NIC/TCP Tuning

•  We are using Myricom 10G NICs (100Gbps testbed)
   •  Download the latest driver/firmware from the vendor site
      •  The driver version in RHEL/CentOS is fairly old
   •  Enable MSI-X
   •  Increase txqueuelen:
       /sbin/ifconfig eth2 txqueuelen 10000
   •  Increase interrupt coalescence:
       /usr/sbin/ethtool -C eth2 rx-usecs 100

•  TCP tuning:
       net.core.rmem_max = 67108864
       net.core.wmem_max = 67108864
       net.core.netdev_max_backlog = 250000
100Gbps = It’s full of frames!

•  Problem:
   •  Interrupts are very expensive
   •  Even with jumbo frames and driver optimization, there are still too many interrupts.

•  Solution:
   •  Turn off Linux irqbalance (chkconfig irqbalance off)
   •  Use /proc/interrupts to get the list of interrupts
   •  Dedicate an entire processor core to each 10G interface
   •  Use /proc/irq/<irq-number>/smp_affinity to bind rx/tx queues to a specific core (see the sketch below).
Host Tuning Results

[Chart: throughput in Gbps (0–45 scale), without tuning vs. with tuning, for interrupt coalescing (TCP), interrupt coalescing (UDP), IRQ binding (TCP), and IRQ binding (UDP).]
Conclusion

•  Host tuning & host performance
•  Multiple NICs and multiple cores
•  The effect of the application design

   •  TCP/UDP buffer tuning, using jumbo frames, and interrupt coalescing.
   •  Multi-core systems: IRQ binding is now essential for maximizing host performance.
Acknowledgements

Peter Nugent, Zarija Lukic, Patrick Dorn, Evangelos Chaniotakis, John Christman, Chin Guok, Chris Tracy, Lauren Rotman, Jason Lee, Shane Canon, Tina Declerck, Cary Whitney, Ed Holohan, Adam Scovel, Linda Winkler, Jason Hill, Doug Fuller, Susan Hicks, Hank Childs, Mark Howison, Aaron Thomas, John Dugan, Gopal Vaswani
Questions?

Contact:
  •  Mehmet Balman  mbalman@lbl.gov
  •  Brian Tierney  bltierney@es.net
Balman stork cw09Balman stork cw09
Balman stork cw09
 
Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...
 
Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...
 
Dynamic adaptation balman
Dynamic adaptation balmanDynamic adaptation balman
Dynamic adaptation balman
 
Cybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-posterCybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-poster
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balman
 
Opening ndm2012 sc12
Opening ndm2012 sc12Opening ndm2012 sc12
Opening ndm2012 sc12
 
Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation
 
2011 agu-town hall-100g
2011 agu-town hall-100g2011 agu-town hall-100g
2011 agu-town hall-100g
 
Rdma presentation-kisti-v2
Rdma presentation-kisti-v2Rdma presentation-kisti-v2
Rdma presentation-kisti-v2
 

Último

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Último (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

HPDC 2012 presentation - June 19, 2012 - Delft, The Netherlands

• 8. The SC11 100Gbps demo • 100Gbps connection between ANL (Argonne), NERSC at LBNL, ORNL (Oak Ridge), and the SC11 booth (Seattle)
• 9. Demo Configuration • RTT: Seattle – NERSC 16ms; NERSC – ANL 50ms; NERSC – ORNL 64ms
• 10. Visualizing the Universe at 100Gbps • VisaPult for streaming data • ParaView for rendering • For visualization purposes, occasional packet loss is acceptable (using UDP) • 90Gbps of the bandwidth is used for the full dataset • 4x10Gbps NICs (4 hosts) • 10Gbps of the bandwidth is used for 1/8 of the same dataset • 10Gbps NIC (one host)
• 11. Demo Configuration [Diagram: 16 sender hosts at NERSC/LBL read from a flash-based GPFS cluster over the InfiniBand cloud and feed the 100G pipe through the NERSC and booth routers to four receive/render hosts (H1–H4) on 10GigE, plus a visualization server and low/high-bandwidth displays; a separate 1 GigE path carries control traffic.] The 1Gbps connection is used for synchronization and communication of the rendering application, not for transfer of the raw data.
• 12. UDP shuffling • UDP packets include position (x, y, z) information (1024^3 matrix) • MTU is 9000; the largest possible payload is therefore 8972 bytes (MTU size minus IP and UDP headers) • 560 points in each packet, 8968 bytes • 3 integers (x, y, z) + a floating-point value per point [Packet layout: Batch#, n | X1 Y1 Z1 D1 | X2 Y2 Z2 D2 | … | Xn Yn Zn Dn; in the final run n = 560, packet size is 8968 bytes]
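The exact wire format used in the demo is not published beyond this slide. A minimal sketch of how a sender could pack one such datagram, assuming 4-byte integers and floats and a header consisting only of the batch number and point count (both assumptions), is:

    /* Sketch: pack one shuffler datagram of 560 (x, y, z, value) points.
     * Field widths and byte order are assumptions; the slide only states
     * 3 integers + 1 float per point and 8968 bytes total. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    #define POINTS_PER_PACKET 560

    struct point { int32_t x, y, z; float d; };          /* 16 bytes per point */

    struct shuffle_packet {
        int32_t batch;                                   /* time step / batch number */
        int32_t n;                                       /* number of points that follow */
        struct point pts[POINTS_PER_PACKET];
    };                                                   /* 8 + 560*16 = 8968 bytes */

    /* Send one packet over an already-connected UDP socket. */
    static ssize_t send_points(int sock, int32_t batch,
                               const struct point *pts, int32_t n)
    {
        struct shuffle_packet pkt;
        pkt.batch = batch;
        pkt.n     = n;
        memcpy(pkt.pts, pts, (size_t)n * sizeof(struct point));
        return send(sock, &pkt, 8 + (size_t)n * sizeof(struct point), 0);
    }

With n = 560 the datagram is exactly 8968 bytes, fitting one 9000-byte jumbo frame after IP and UDP headers.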
• 13. Data Flow • For each time step, the input data is split into 32 streams along the z-direction; each stream contains a contiguous slice of size 1024 ∗ 1024 ∗ 32. • 32 streams for the 90Gbps demo (high bandwidth) • 4 streams for the 10Gbps demo (low bandwidth, 1/8 of the data) [Flow of data: GPFS/flash → Stager → Shuffler via /dev/shm on the send server at NERSC → network → Receiver via /dev/shm → Render SW on the receive server at the SC booth]
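As a small illustration of this decomposition (assuming 16 bytes per point, i.e. the x, y, z, value tuple from the previous slide, and a z-major contiguous layout; the actual on-disk format is not described on the slide), the byte range handled by each stream can be computed as:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t nx = 1024, ny = 1024, nz = 1024;
        const uint64_t point_bytes = 16;                 /* assumed: x, y, z, value */
        const uint64_t slice_z     = 32;                 /* 1024 * 1024 * 32 points per stream */
        const uint64_t slice_bytes = nx * ny * slice_z * point_bytes;   /* 512 MiB */

        for (uint64_t i = 0; i < nz / slice_z; i++)      /* 32 streams per time step */
            printf("stream %2llu: offset %12llu, length %llu bytes\n",
                   (unsigned long long)i,
                   (unsigned long long)(i * slice_bytes),
                   (unsigned long long)slice_bytes);
        return 0;
    }

Under these assumptions the 32 slices add up to 16 GiB per time step, consistent with the transfer figures on slide 15.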
• 14. Performance Optimization • Each 10Gbps NIC in the system is bound to a specific core. • Receiver processes are also bound to the same core • Renderers run on the same NUMA node but a different core (accessing the same memory region) [Diagram: a NUMA node with its local memory, the 10G port's interrupt core shared with Receiver 1/2, and Render 1/2 on neighbouring cores]
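The slide does not show how the binding was done; one common way on Linux, sketched here under the assumption that a receiver process is pinned to a single core chosen to match the NIC's NUMA node, is sched_setaffinity (the core number below is a placeholder):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Pin the calling process (e.g. a receiver) to one core so that it stays
     * on the same NUMA node as the NIC it services. */
    static void pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            exit(1);
        }
    }

    int main(void)
    {
        pin_to_core(2);   /* hypothetical core number; chosen per NUMA layout */
        /* ... receive loop would run here ... */
        return 0;
    }

A renderer process would call the same helper with a different core on the same NUMA node, so both sides share the memory region without crossing the interconnect.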
• 15. Network Utilization • 2.3TB of data from NERSC to the SC11 booth in Seattle in ≈ 3.4 minutes • For each timestep, corresponding to 16GB of data, it took ≈ 1.4 seconds to transfer and ≈ 2.5 seconds for rendering before the image was updated. • Peak ≈ 99Gbps. • Average ≈ 85Gbps
• 17. Climate Data Distribution • ESG data nodes • Data replication in the ESG Federation • Local copies • data files are copied into temporary storage in HPC centers for post-processing and further climate analysis.
• 18. Climate Data over 100Gbps • Data volume in climate applications is increasing exponentially. • An important challenge in managing ever-increasing data sizes in climate science is the large variance in file sizes. • Climate simulation data consists of a mix of relatively small and large files with an irregular file size distribution in each dataset. • Many small files
• 19. Keep the data channel full [Diagram: an RPC-style exchange (request data, send data) versus FTP-style per-file exchanges (request a file, send file), showing the idle time added by each round trip] • Concurrent transfers • Parallel streams
• 20. lots-of-small-files problem! File-centric tools? • Not necessarily high-speed (over the same distance) • Latency is still a problem [Diagram: request a dataset / send data over a 100Gbps pipe vs a 10Gbps pipe]
• 21. Block-based [Diagram: on each side, front-end threads access memory blocks; the blocks are exchanged between the two memory caches over the network] Memory caches are logically mapped between client and server.
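The slide describes the mechanism only at this level. As a rough sketch of the idea (the field names and sizes below are assumptions, not the actual wire format), each memory block travelling over the data channel could carry a small descriptor so the receiver can place it into the logically mapped cache regardless of which file it came from:

    #include <stdint.h>

    #define BLOCK_SIZE (4u * 1024 * 1024)     /* 4MB blocks, as used at SC11 */

    /* Hypothetical descriptor prepended to each memory block on the wire. */
    struct block_desc {
        uint64_t file_id;      /* which file in the dataset the block belongs to */
        uint64_t offset;       /* byte offset of the block within that file */
        uint32_t length;       /* number of valid payload bytes (<= BLOCK_SIZE) */
        uint32_t flags;        /* e.g. last-block-of-file marker */
    };

    struct block {
        struct block_desc desc;
        unsigned char payload[BLOCK_SIZE];
    };

With a layout like this, front-end threads fill blocks from whatever files are ready, and back-end threads send whichever blocks are full, so the network parallelism is decoupled from file boundaries.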
• 22. Moving climate files efficiently
• 23. Advantages • Decoupling I/O and network operations • front-end (I/O, processing) • back-end (networking layer) • Not limited by the characteristics of the file sizes: an on-the-fly tar approach, bundling and sending many files together • Dynamic data channel management: can increase/decrease the parallelism level both in the network communication and in I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants).
• 24. Demo Configuration • Disk to memory / reading from GPFS (NERSC); max 120Gbps read performance
• 25. The SC11 100Gbps Demo • CMIP3 data (35TB) from the GPFS filesystem at NERSC • Block size 4MB • Each block's data section was aligned according to the system page size. • 1GB cache both at the client and the server • At NERSC, 8 front-end threads on each host for reading data files in parallel. • At ANL/ORNL, 4 front-end threads for processing received data blocks. • 4 parallel TCP streams (four back-end threads) were used for each host-to-host connection.
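A minimal sketch of allocating one such block with its data section aligned to the system page size, using posix_memalign (the actual cache management in the demo code is certainly more involved than this):

    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t block_size = 4ul * 1024 * 1024;            /* 4MB blocks */
        const size_t page_size  = (size_t)sysconf(_SC_PAGESIZE);
        void *block = NULL;

        /* Align the block's data section on a page boundary, as on the slide. */
        if (posix_memalign(&block, page_size, block_size) != 0) {
            perror("posix_memalign");
            return 1;
        }
        printf("allocated %zu bytes aligned to %zu-byte pages at %p\n",
               block_size, page_size, block);
        free(block);
        return 0;
    }

A 1GB cache corresponds to roughly 256 such blocks per side.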
• 27. MemzNet: memory-mapped zero-copy network channel [Same architecture diagram as slide 21: front-end threads on each side access memory blocks, which travel over the network; memory caches are logically mapped between client and server]
• 28. ANI 100Gbps testbed [Diagram of the ANI Middleware Testbed: NERSC and ANL sites connected through the ANI 100G network via 100G site routers, with 10G uplinks to ESnet and 1GE management switches on each side; at NERSC, hosts nersc-diskpt-1/2/3 and nersc-app with multiple 10GE (MM) NICs (Myricom, Chelsio, HotLava); at ANL, hosts anl-mempt-1/2/3 and anl-app with Myricom and Mellanox 10GE NICs. Note: ANI 100G routers and the 100G wave available until summer 2012; testbed resources after that subject to funding availability. Updated December 11, 2011]
• 29. ANI 100Gbps testbed [Same testbed diagram as the previous slide, shown alongside the SC11 100Gbps demo configuration]
• 30. Many TCP Streams • ANI testbed 100Gbps (10x10G NICs, three hosts): throughput vs the number of parallel streams [1, 2, 4, 8, 16, 32, 64 streams - 5min intervals], TCP buffer size is 50M
• 31. ANI testbed 100Gbps (10x10G NICs, three hosts): interface traffic vs the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs - 5min intervals], TCP buffer size is 50M
• 32. Performance at the SC11 demo [Chart comparing GridFTP and MemzNet throughput]; TCP buffer size is set to 50MB
  • 33. Performance  results  in  the  100Gbps   ANI  testbed  
• 34. Host tuning • With proper tuning, we achieved 98Gbps using only 3 sending hosts, 3 receiving hosts, 10 10GE NICs, and 10 TCP flows
• 35. NIC/TCP Tuning • We are using Myricom 10G NICs (100Gbps testbed) • Download the latest driver/firmware from the vendor site • The version of the driver in RHEL/CentOS is fairly old • Enable MSI-X • Increase txqueuelen: /sbin/ifconfig eth2 txqueuelen 10000 • Increase interrupt coalescence: /usr/sbin/ethtool -C eth2 rx-usecs 100 • TCP tuning: net.core.rmem_max = 67108864, net.core.wmem_max = 67108864, net.core.netdev_max_backlog = 250000
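The sysctl settings above only raise the system-wide ceilings; an application still has to request large buffers or rely on kernel autotuning. A hedged sketch of requesting a 50MB send buffer on a TCP socket (the demo applications may have relied on autotuning instead; the helper name is ours):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Request a large socket send buffer, matching the 50MB TCP buffer size
     * used in the testbed runs.  The kernel honours the request only up to
     * net.core.wmem_max, so the sysctl ceilings above must be raised first. */
    static int set_sndbuf(int sock, int bytes)
    {
        if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) != 0) {
            perror("setsockopt(SO_SNDBUF)");
            return -1;
        }
        return 0;
    }

The receiving side would do the same with SO_RCVBUF, once per parallel stream, before connecting.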
• 36. 100Gbps = It's full of frames! • Problem: • Interrupts are very expensive • Even with jumbo frames and driver optimization, there are still too many interrupts. • Solution: • Turn off Linux irqbalance (chkconfig irqbalance off) • Use /proc/interrupts to get the list of interrupts • Dedicate an entire processor core to each 10G interface • Use /proc/irq/<irq-number>/smp_affinity to bind rx/tx queues to a specific core.
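The binding itself is a one-line write of a CPU mask into the smp_affinity file; a small helper sketch follows (the IRQ number 63 and core 4 are placeholders, the real numbers come from /proc/interrupts, and root privileges are required):

    #include <stdio.h>

    /* Bind one interrupt to one core by writing a hex CPU mask to
     * /proc/irq/<irq>/smp_affinity. */
    static int bind_irq(int irq, int core)
    {
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
        f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fprintf(f, "%x\n", 1u << core);   /* single-core mask, e.g. core 4 -> 10 */
        fclose(f);
        return 0;
    }

    int main(void)
    {
        return bind_irq(63, 4);   /* placeholder IRQ/core; check /proc/interrupts first */
    }

Repeating this for every rx/tx queue of a 10G interface keeps all of its interrupts on the dedicated core mentioned above.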
• 37. Host Tuning Results [Bar chart: throughput in Gbps with and without tuning, for interrupt coalescing (TCP and UDP) and IRQ binding (TCP and UDP)]
• 38. Conclusion • Host tuning & host performance • Multiple NICs and multiple cores • The effect of the application design • TCP/UDP buffer tuning, using jumbo frames, and interrupt coalescing. • Multi-core systems: IRQ binding is now essential for maximizing host performance.
• 39. Acknowledgements • Peter Nugent, Zarija Lukic, Patrick Dorn, Evangelos Chaniotakis, John Christman, Chin Guok, Chris Tracy, Lauren Rotman, Jason Lee, Shane Canon, Tina Declerck, Cary Whitney, Ed Holohan, Adam Scovel, Linda Winkler, Jason Hill, Doug Fuller, Susan Hicks, Hank Childs, Mark Howison, Aaron Thomas, John Dugan, Gopal Vaswani
• 40. Questions? Contact: • Mehmet Balman mbalman@lbl.gov • Brian Tierney bltierney@es.net