Petabyte Scale Data Challenge
 - Worldwide LHC Computing Grid

                 ASGC/Jason Shih
            Computex, Jun 2nd, 2010
Outline
  Objectives & Milestones
  WLCG experiment and ASGC Tier-1 Center
  Petabyte Scale Challenge
  Storage Management System
  System Architecture, Configuration and
 Performance
Objectives

 Building a sustainable research and collaboration infrastructure
 Supporting research through e-Science, for data-intensive sciences and applications that require cross-disciplinary distributed collaboration
ASGC Milestone
  Operational since the deployment of LCG0 in 2002
  ASGC CA established in 2005 (IGTF in the same year)
  Tier-1 Center responsibility since 2005
  Federated Taiwan Tier-2 center (Taiwan Analysis Facility, TAF) is also collocated at ASGC
  Representative of the EGEE e-Science Asia Federation since joining EGEE in 2004
  Providing Asia Pacific Regional Operation Center (APROC) services to the region-wide WLCG/EGEE production infrastructure since 2005
  Initiated the Avian Flu Drug Discovery Project in collaboration with EGEE in 2006
  EUAsiaGrid Project started in April 2008
LHC First Beam – Computing at the Petascale


  LHCb: B-physics, CP Violation
  ALICE: Heavy ions, pp
  CMS: General Purpose, pp, heavy ions
  ATLAS: General Purpose, pp, heavy ions
Size of LHC Detector

[Figure: the ATLAS and CMS detectors shown to scale against Bld. 40]
Standard Cosmology

 Good model from 0.01 sec after Big Bang
 Supported by considerable observational evidence
Elementary Particle Physics
 From the Standard Model into the unknown: towards energies of 1 TeV and beyond: the Terascale
Towards Quantum Gravity
 From the unknown into the unknown...
[Figure axes: Energy, Density, Temperature vs. Time]
http://www.damtp.cam.ac.uk/user/gr/public/bb_history.html
UNESCO Information Preservation debate, April 2007 - Jamie Shiers@cern ch
WLCG Timeline

 First Beam on LHC, Sep. 10, 2008
 Severe Incident after 3w operation (3.5TeV)
 Max CERN/T1-ASGC Point2Point Inbound: 9.3 Gbps
ASGC - Introduction
 1. Most Reliable T1: 98.83%
 2. Very Highly Performing and most Stable Site in CCRC08
 Asia Pacific Regional Operation Center
 A Worldwide Grid Infrastructure
   >250 sites, 48 countries
   >68,000 CPUs, >25 PetaBytes
   >10,000 users, >200 VOs
   >150,000 jobs/day
 Grid Application Platform
   Lightweight Problem Solving Framework
 Avian Flu Drug Discovery
   Best Demo Award of EGEE’07
 Large Hadron Collider (LHC)
Collaborating e-Infrastructures




 TWGRID
 EUAsiaGrid
 Potential for linking ~80 countries
 “Production” = Reliable, sustainable, with commitments to quality of service
WLCG Computing Model
   - The Tier Structure
 Tier-0 (CERN)
   Data recording
   Initial data reconstruction
   Data distribution
 Tier-1 (11 centres)
   Permanent storage
   Re-processing
   Analysis
 Tier-2 (~130 centres)
   Simulation
   End-user analysis
Enabling Grids for E-sciencE




 Archeology
 Astronomy
 Astrophysics
 Civil Protection
 Comp. Chemistry
 Earth Sciences
 Finance
 Fusion
 Geophysics
 High Energy Physics
 Life Sciences
 Multimedia
 Material Sciences
 …

EGEE-II INFSO-RI-031688, EGEE07, Budapest, 1-5 October 2007
Why Petabyte? Challenges

 Why Petabyte?
   Experiment Computing Model
   Comparing with conventional data management
 Challenges
   Performance: LAN and WAN activities
    Sufficient B/W between CPU Farm
    Eliminate Uplink Bottleneck (Switch Tiers)
   Fast response to Critical Events
    Fabric Infrastructure & Service Level Agreement
   Scalability and Manageability
    Robust DB engine (Oracle RAC)
    KB and Adequate Administration (Training)
Tier Model and Data Management Components
WLCG Experiment Computing Model
ATLAS T1 Data Flow
[Diagram: data flows between Tier-0, tape, disk buffer, CPU farm, disk storage, the other Tier-1s and the Tier-2s of each T1, plus simulation and analysis data flow]

 Stream                        File size     File rate   Files/day   Bandwidth   Volume/day
 RAW                           1.6 GB/file   0.02 Hz     1.7K        32 MB/s     2.7 TB/day
 ESD1, ESD2                    0.5 GB/file   0.02 Hz     1.7K        10 MB/s     0.8 TB/day
 AODm1, AODm2                  500 MB/file   0.04 Hz     3.4K        20 MB/s     1.6 TB/day
 AODm2 (0.036 Hz flows)        500 MB/file   0.036 Hz    3.1K        18 MB/s     1.44 TB/day
 AODm2 (0.004 Hz flows)        500 MB/file   0.004 Hz    0.34K       2 MB/s      0.16 TB/day
 AOD2                          10 MB/file    0.2 Hz      17K         2 MB/s      0.16 TB/day
 RAW + ESD2 + AODm2 (tape)     -             0.044 Hz    3.74K       44 MB/s     3.66 TB/day
 RAW + ESD (2x) + AODm (10x)   -             1 Hz        85K         720 MB/s    -
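The per-stream figures in the ATLAS T1 data flow follow from two inputs: file size and file rate. The following is a minimal Python sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and an 86,400-second day. The function name `stream_load` is illustrative and not part of any WLCG tool.

```python
SECONDS_PER_DAY = 86_400

def stream_load(file_size_mb, rate_hz):
    """Return (MB/s, files/day, TB/day) for one data stream."""
    bandwidth_mb_s = file_size_mb * rate_hz
    files_per_day = rate_hz * SECONDS_PER_DAY
    volume_tb_day = bandwidth_mb_s * SECONDS_PER_DAY / 1_000_000
    return bandwidth_mb_s, files_per_day, volume_tb_day

# File size (MB) and file rate (Hz) as quoted on the slide.
streams = {
    "RAW": (1600, 0.02),
    "ESD": (500, 0.02),
    "AODm": (500, 0.04),
    "AOD": (10, 0.2),
}

for name, (size_mb, rate_hz) in streams.items():
    bw, files, vol = stream_load(size_mb, rate_hz)
    print(f"{name}: {bw:.0f} MB/s, {files:.0f} files/day, {vol:.2f} TB/day")
```

The bandwidths reproduce the slide values. The daily volumes come out slightly higher than the slide (about 2.76 rather than 2.7 TB/day for RAW), which suggests the slide rounded down or assumed a shorter effective day.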
WLCG Tier-1
   - Defined Minimum Levels of Services.
 Response time is defined as the maximum delay before action is taken.
 Mean time to repair the service is also crucial, but it is covered only indirectly through the required availability target.
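An availability target bounds repair time only indirectly: it fixes the total downtime allowed in a period, and the mean repair time then depends on how many incidents occur. The sketch below illustrates this, using the 98.83% figure quoted for ASGC earlier in the deck purely as an example input. The 30-day period and the incident count are assumptions, and both function names are hypothetical.

```python
def allowed_downtime_hours(availability, period_hours):
    """Total downtime permitted by an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours

def max_mean_repair_hours(availability, period_hours, incidents):
    """Upper bound on mean time to repair if `incidents` outages occur."""
    if incidents <= 0:
        raise ValueError("incidents must be a positive integer")
    return allowed_downtime_hours(availability, period_hours) / incidents

period = 30 * 24  # assumed 30-day reporting period, in hours
budget = allowed_downtime_hours(0.9883, period)
print(f"Downtime budget: {budget:.2f} h per 30 days")
print(f"Mean repair bound with 4 incidents: "
      f"{max_mean_repair_hours(0.9883, period, 4):.2f} h")
```

The same budget can be spent on one long outage or several short ones, which is why the availability target alone does not pin down the repair time for any single incident.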
WLCG MoU & ASGC Resource Level
   - Pledged Resources and Projection
                  Year       CPU (HEP2k6)   Disk (PB)   Tape (PB)
                  End 2009   29.5K          2.6         2.4
                  MoU 2009   20K            3.0         3.0
                  MoU 2010   28K            3.5         3.5
[Chart: CPU, Disk and Tape against CPU MoU, Disk MoU and Tape MoU, 2005-2010; one axis 0-6000 in KSI2k, the other 0-6000 in TeraByte]
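The table of pledged resources can be read as a fulfilment ratio: installed capacity at the end of 2009 divided by each MoU pledge. The following is a small Python sketch using only the numbers in the table. The dictionary keys and the `fulfilment` helper are illustrative names.

```python
# Values copied from the table; CPU is in thousands of HEP2k6 units.
installed_end_2009 = {"cpu_k": 29.5, "disk_pb": 2.6, "tape_pb": 2.4}
pledges = {
    "MoU 2009": {"cpu_k": 20.0, "disk_pb": 3.0, "tape_pb": 3.0},
    "MoU 2010": {"cpu_k": 28.0, "disk_pb": 3.5, "tape_pb": 3.5},
}

def fulfilment(installed, pledged):
    """Return installed/pledged for every resource in the pledge."""
    return {name: installed[name] / pledged[name] for name in pledged}

for label, pledge in pledges.items():
    ratios = fulfilment(installed_end_2009, pledge)
    summary = ", ".join(f"{name} {ratio:.0%}" for name, ratio in ratios.items())
    print(f"{label}: {summary}")
```

On these figures CPU exceeds both pledges, while disk and tape are below them.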
Data Management System
CASTOR V1
  CERN Advanced STORage
  Satisfactorily serving tens of thousands of requests/day per TB of disk cache
  Limitation: 1M files in cache
  Tape movement API not flexible

CASTOR V2
  DB-centric architecture
  Scheduling feature
  GSI and Kerberos
  Resource management
  Resource handling
CASTOR Configurations
   - Current Infrastructure

Shared core services
   Serving: ATLAS and CMS
   Services:
     Stager, NS, DLF, Repack, and LSF
   DB cluster
     Two DB clusters (SRM and NS)
     5 services (DB) split into two clusters
     5 Oracle instances
  Total capacity: 0.63PB and 0.7PB for CMS and ATLAS respectively
     Current usage: 63% and 44% for CMS and ATLAS
CASTOR Configurations (cont’)
   - Disk Cache
 Disk pools & servers
   Performance (IOPS)
    With 0.5kB IO size: 76.4k and 54k for read and write respectively
    Decreases slightly, by around 9% for both read and write, when the IO size increases to 4kB
   80 disk servers (+6 to come online by the end of the 3rd week of Oct)
     Total capacity: 1.67PB (0.3PB allocated dynamically)
     Current usage: 0.79PB (~58% usage)
   14 disk pools (8 for ATLAS and 3 for CMS, another three for bio, SAM, and dynamic)
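Two of the disk cache figures above can be cross-checked with simple arithmetic. The sketch below assumes 1 kB = 1000 bytes and assumes that the ~58% usage is measured against the capacity that is not dynamically allocated. Both are assumptions, and the helper names are hypothetical.

```python
def usage_fraction(used_pb, total_pb, dynamic_pb=0.0):
    """Used capacity as a fraction of the statically allocated capacity."""
    return used_pb / (total_pb - dynamic_pb)

def throughput_mb_s(iops, io_size_kb):
    """Convert an IOPS figure at a given IO size into MB/s."""
    return iops * io_size_kb / 1000.0

# Usage: 0.79PB used, 1.67PB total, 0.3PB allocated dynamically.
print(f"Against total capacity:  {usage_fraction(0.79, 1.67):.1%}")
print(f"Against static capacity: {usage_fraction(0.79, 1.67, 0.3):.1%}")

# IOPS at 0.5kB, then about 9% lower at 4kB.
for label, iops in (("read", 76_400), ("write", 54_000)):
    small = throughput_mb_s(iops, 0.5)
    large = throughput_mb_s(iops * 0.91, 4.0)
    print(f"{label}: {small:.1f} MB/s at 0.5kB, {large:.1f} MB/s at 4kB")
```

Only the second usage calculation lands near 58%, which is consistent with the assumption that the dynamic 0.3PB is excluded. The IOPS conversion also shows that a 9% drop in operations per second at 4kB still corresponds to a much higher byte throughput than at 0.5kB.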
Disk Pool Configuration
   - T1 MSS (CASTOR)
[Chart: Install Capacity and Free Capacity (Total Capacity (TB), axis 0-450) and Num of Disk Servers (axis 0-16) per disk pool]
Distribution of Free Capacity
     - Per Disk Servers vs. per Pool
[Bar chart: Free Capacity (TB), axis 0-250, per disk pool]
 Disk pools: atlasGROUPDISK, atlasHotDisk, atlasMCDISK, atlasMCTAPE, atlasPrdD0T1, atlasPrdD1T0, atlasScratchDisk, atlasStage, biomedD1T0, cmsLTD0T1, cmsPrdD1T0, cmsWANOUT, dteamD0T0, Standby
Storage Server Generation
     - Drive vs. Total Capacity
[Chart: Total Capacity of Storage Generation (TB) vs. Number of Raid Subsystem]
   Raid subsystems   Total capacity
   6                 238TB
   18                235.5TB
   23                683TB
   37                741TB
CASTOR Configurations (cont’)
   - Core Service Overview

Service Type   OS Level          Release    Remark
Core           SLC 4.7/x86-64    2.1.7-19   Stager/NS/DLF
SRM            SLC 4.7/x86-64    2.7-18     3 head nodes
Disk Svr.      SLC 4.7/x86-64    2.1.7-19   80 in Q3 2009 (20+ in Q4)
Tape Svr.      SLC 4.7/32 + 64   2.1.8-8    x86-64 OS deployed
CASTOR Configurations (cont’)
     - CMS Disk Cache: Current Resource Level
 Space Token /   Capacity /    Disk      TapePool /
 Disk Pool       Job Limit     Servers   Capacity
 cmsLTD0T1       278TB/488     9         *
 cmsPrdD1T0      284TB/1560    13
 cmsWanOut       72TB/220      4
* Depends on tape family.
CASTOR Configurations (cont’)
    - Atlas Disk Cache: Current Resource Level


 Space Token        Cap/JobLimit   DiskServers   TapePool/Cap.
 atlasMCDISK        163TB/790      8             -
 atlasMCTAPE        38TB/80        2             atlasMCtp/39TB
 atlasPrdD1T0       278TB/810      15            -
 atlasPrdD0T1       61TB/210       3             atlasPrdtp/105TB
 atlasGROUPDISK     19TB/40        1             -
 atlasScratchDisk   28TB/80        1             -
 atlasHotDisk       2TB/40         2             -
 Total              950TB/1835     46            -
IDC Collocation
 Facility installation completed on Mar 27th
 Tape system delayed after Apr 9th
   Realignment
   RMA for faulty parts
Storage Farm
 ~110 RAID subsystems deployed since 2003
 Supporting both Tier-1 and Tier-2 storage fabric
 DAS connection to front-end blade servers
   Flexible switching of front-end servers according to performance requirements
   4-8G Fibre Channel connectivity
CASTOR Configurations (cont’)
   - Tape Pool


 Tape Pool         Capacity (TB)/Usage   Drive Dedication   LTO3/4 Mixed
 atlasMCtp         8.98/40%              N                  Y
 atlasPrdtp        101/65%               N                  Y
 cmsCSA08cruzet    15.6/46%              N                  N
 cmsCSA08reco      5/0%                  N                  N
 cmsCSAtp          639/99%               N                  Y
 cmsLTtp           34.4/44%              N                  N
 dteamTest         3.5/1%                N                  N
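The capacity and usage columns of the tape pool table can be combined into used capacity per pool and an overall fill level. The following is a minimal Python sketch using the table values. The 90% threshold for flagging a pool is a hypothetical choice, not a figure from the slides.

```python
# (capacity in TB, usage in percent), copied from the tape pool table.
tape_pools = {
    "atlasMCtp": (8.98, 40),
    "atlasPrdtp": (101.0, 65),
    "cmsCSA08cruzet": (15.6, 46),
    "cmsCSA08reco": (5.0, 0),
    "cmsCSAtp": (639.0, 99),
    "cmsLTtp": (34.4, 44),
    "dteamTest": (3.5, 1),
}

def used_tb(capacity_tb, usage_pct):
    """Used capacity in TB from a capacity and a usage percentage."""
    return capacity_tb * usage_pct / 100.0

total_capacity = sum(cap for cap, _ in tape_pools.values())
total_used = sum(used_tb(cap, pct) for cap, pct in tape_pools.values())

# Hypothetical alert threshold: flag pools at or above 90% usage.
nearly_full = [name for name, (_, pct) in tape_pools.items() if pct >= 90]

print(f"Total capacity: {total_capacity:.1f} TB")
print(f"Total used:     {total_used:.1f} TB ({total_used / total_capacity:.1%})")
print(f"Nearly full:    {nearly_full}")
```

Because cmsCSAtp holds most of the capacity and is almost full, it dominates the overall fill level.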
MSS Monitoring Services
Std. Nagios Probes
  NRPE + customized plugins
  SMS to OSE/SM for all types of critical alarms
Availability metrics
Tape metrics (SLS)
Throughput, capacity & scheduler per VO and Diskpool
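A customized Nagios plugin is a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below shows what a disk pool usage check could look like. It is not the plugin used at ASGC: the function name, the thresholds and the sample numbers are all assumptions.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_pool_usage(used_tb, total_tb, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for one disk pool.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if total_tb <= 0:
        return UNKNOWN, "UNKNOWN - total capacity must be positive"
    pct = 100.0 * used_tb / total_tb
    # Performance data after the pipe: label=value;warn;crit
    perfdata = f"usage={pct:.1f}%;{warn_pct};{crit_pct}"
    if pct >= crit_pct:
        state, label = CRITICAL, "CRITICAL"
    elif pct >= warn_pct:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"{label} - pool is {pct:.1f}% full | {perfdata}"

# Sample values only; a deployed plugin would read them from the storage
# system and finish with sys.exit(status) so that NRPE can relay the state.
status, line = check_pool_usage(used_tb=790.0, total_tb=1370.0)
print(line)
```

Returning the exit code from a function instead of exiting inside it keeps the threshold logic testable without a running Nagios instance.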
MSS Tape System
    - Expansion/Upgrade Planning
Before incident:
   LTO3 * 8 + LTO4 * 4
   720TB with LTO3
   530TB with LTO4
May 2009:
   Two LTO3 drives
   MES: 6 LTO4 drives end of May
   Capacity: 1.3PB (old, LTO3,4 mixed) + 0.8PB (LTO4)
New S54 model introduced in mid-2009
   2K slots with tier model
   Required:
     Upgrade ALMS
     Enhanced gripper
MES Q3 2009
  18 LTO4 drives
  HA implementation resumes in Q4
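The tape capacities above can be sanity-checked and turned into a rough cartridge count. The sketch below assumes native (uncompressed) cartridge capacities of 400 GB for LTO3 and 800 GB for LTO4 and decimal units. The helper name is illustrative, and the actual cartridge inventory is not given in the slides.

```python
import math

# Native cartridge capacity in GB for each LTO generation.
NATIVE_GB = {"LTO3": 400, "LTO4": 800}

def cartridges_needed(capacity_tb, generation):
    """Minimum number of cartridges that hold capacity_tb uncompressed."""
    return math.ceil(capacity_tb * 1000 / NATIVE_GB[generation])

old_tb = 720 + 530   # LTO3 + LTO4 capacity before the incident
new_tb = 800         # the additional 0.8PB on LTO4

print(f"Old library: {old_tb} TB (the slide rounds this to 1.3PB)")
print(f"Combined:    {(old_tb + new_tb) / 1000:.2f} PB")
print(f"LTO3 cartridges for 720TB: {cartridges_needed(720, 'LTO3')}")
print(f"LTO4 cartridges for 530TB: {cartridges_needed(530, 'LTO4')}")
print(f"LTO4 cartridges for 0.8PB: {cartridges_needed(new_tb, 'LTO4')}")
```

Counts of this kind give a lower bound on the library slots that a given capacity requires.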
Expansion Planning
 2008
   0.5PB expansion of Tape system in Q2
   Meet MoU target by mid-Nov.
   1.3MSI2k per rack, based on the recent E5450 processor.
 2009 Q1
     150 SMP/QC blade servers
     RAID subsystems: considering 2TB per drive
     42TB net capacity per chassis and 0.75PB in total
 2009 Q3-4
    18 LTO4 drives – mid of Oct.
    330 Xeon QC (SMP, Intel 5450) blades servers
    2nd phase TAPE MES - 5 LTO4 drives + HA
    3rd phase TAPE MES – 6 LTO4 drives
    ETA 0.8PB expansion delivery: mid of Nov
Computing/Storage System Infrastructure

[Diagram: network layout of the computing and storage systems. Recoverable labels:]
  Data Center - C3 Archive Room
  ASGC CASTOR2 Disk Farm
  CASTOR2 Disk servers
  CASTOR2 Tape Servers
  BladeCenter
  20 x Quanta Blades - WN
  Core Services - CE, RB, DPM, PX, BDII etc.
  Switch modules: 10GBASE-X (10G4X 41611), ports 1-4
  2 * GE (LX) to 4F M160 (links to HK, JP Tier-2s)
  2 * GE (LX) to 4F TaipeiGigaPoP-7609 (links to TW Tier-2s)
  4 * GE (SX) to ASGC Distribution




                                                                                                                                                                                                                                                                  Switch in Rack#49
                                                                                                                                                                               Diag
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 4




                                                                                                                                                                               Stat
                                                                                                                                                                                                                                                                         10/100/1000BASE-T                                                                                                                                                                                                                        G48T            41511

                                                                                                                                                                                       1


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 5




                                                                                                                                                                               Diag
    1     2           3    4   5   6   7       8        9        10       11       12       13       14
                                                                                                                                                                    Diag
                                                                                                                                                                                  25




                                                                                                                                                                               Stat
                                                                                                                                                                                           1   25        2   26        3   27        4   28            5



                                                                                                                                                                                                                                                            (links to Tier-1 Servers)
                                                                                                                                                                                                                                                           29        6   30       7    31       8    32        9   33        10   34        11   35        12    36    13        37   14    38   15    39       16   40   17    41   18   42   19    43   20        44    21        45            22    46        23




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              G 4 8X a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       47        24   48




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 4 1 54 2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 A




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 6
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 B


                          64 x IBM HS20
                                                                                                                                                                    Stat




                                                                                                                                                                           1       25          2    26        3   27        4   28            5   29            6   30        7   31        8   32        9   23        10   34        11   35        12    36        13    37        14   38    15   39        16   40   17   41    18   42    19   43        20    44        21        45            22    46        23   47        24    48




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 7



                          Blade system -
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 8


                                WN
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 9

                                               BladeCenter




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 10




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         DC SMR 48V / 100A
                                           1       2         3        4        5        6        7        8   9   10   11   12   13   14




                                                             142 x IBM HS21
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Battery   Battery
                                                              Blade system -                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               #1 + #2   #3 + #4
                                                                   WN
Throughput of WLCG Experiments
 Throughput is defined as job efficiency x number of running jobs
 Characteristics of the four LHC experiments show that the inefficiency
is due to poor coding
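
The throughput definition on this slide (job efficiency multiplied by the number of running jobs) can be sketched as below. This is only an illustration: the slide does not say how job efficiency is measured, so the sketch assumes CPU time divided by wall-clock time. The function names and the numbers are hypothetical.

```python
def job_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Assumed definition (not stated on the slide): CPU time / wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def throughput(efficiency: float, running_jobs: int) -> float:
    """Throughput as defined on the slide: job efficiency x number of running jobs."""
    if running_jobs < 0:
        raise ValueError("running_jobs must be non-negative")
    return efficiency * running_jobs


if __name__ == "__main__":
    # Made-up numbers: 45 minutes of CPU time in a 60-minute job, 1000 jobs running.
    eff = job_efficiency(cpu_seconds=2700.0, wall_seconds=3600.0)
    print(throughput(eff, running_jobs=1000))
```

Under this assumption, 1000 running jobs at 75% efficiency do the work of 750 fully efficient job slots, which is why poorly written job code lowers throughput even when the farm is full.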
Reliability from Different Perspectives
Summary

 Deploy a highly scalable DM system and a performance-driven
storage infrastructure
   Eliminate possible complexity in the SRM abstraction layer
   Resource utilization, provisioning and optimization
 From POC to production, the challenges remain:
   Data Challenge, Service Challenge, CCRC08, STEP09, etc.
   The motivation appears clear for: medical, climate, cosmological
   Operation-wide:
     Robust database setup
     KB for fabric infrastructure operation
     Fast enough event processing and documentation
 Look beyond the data management use cases in WLCG:
   Commonality with many other disciplines in the EGEE infrastructure
   Actively participate in e-Science collaboration within the region

Petabyte scale data challenge

  • 1. Petabyte Scale Data Challenge - Worldwide LHC Computing Grid. ASGC/Jason Shih, Computex, Jun 2nd, 2010
  • 2. Outline Objectives & Milestones WLCG experiment and ASGC Tier-1 Center Petabyte Scale Challenge Storage Management System System Architecture, Configuration and Performance
  • 3. Objectives: Build a sustainable research and collaboration infrastructure. Support e-Science research in data-intensive sciences and in applications that require cross-disciplinary distributed collaboration.
  • 4. ASGC Milestones: Operational since the deployment of LCG0 in 2002. ASGC CA established in 2005 (IGTF in the same year). Tier-1 Center responsibility since 2005. The federated Taiwan Tier-2 center (Taiwan Analysis Facility, TAF) is also collocated at ASGC. Representative of the EGEE e-Science Asia Federation since joining EGEE in 2004. Providing Asia Pacific Regional Operation Center (APROC) services to the region-wide WLCG/EGEE production infrastructure since 2005. Initiated the Avian Flu Drug Discovery Project in collaboration with EGEE in 2006. EUAsiaGrid Project started in April 2008.
  • 5. LHC First Beam - Computing at the Petascale. LHCb: B-physics, CP violation. ALICE: heavy ions, pp. CMS: general purpose, pp, heavy ions. ATLAS: general purpose, pp, heavy ions.
  • 6. Size of LHC Detector ATLAS Bld. 40 CMS
  • 7. Standard Cosmology: good model from 0.01 sec after the Big Bang, supported by considerable observational evidence. [Figure: energy, density, and temperature vs. time.] Elementary Particle Physics: from the Standard Model into the unknown, towards energies of 1 TeV and beyond (the Terascale). Towards Quantum Gravity: from the unknown into the unknown... http://www.damtp.cam.ac.uk/user/gr/public/bb_history.html (UNESCO Information Preservation debate, April 2007 - Jamie Shiers@cern ch)
  • 8. WLCG Timeline First Beam on LHC, Sep. 10, 2008 Severe Incident after 3w operation (3.5TeV)
  • 9. ASGC - Introduction. Max CERN/T1-ASGC point-to-point inbound: 9.3 Gbps. 1. Most reliable T1: 98.83%. 2. Very highly performing and most stable site in CCRC08. Asia Pacific Regional Operation Center. A worldwide Grid infrastructure: >250 sites, 48 countries; >68,000 CPUs, >25 PetaBytes; >10,000 users, >200 VOs; >150,000 jobs/day. Best Demo Award of EGEE’07. Grid Application Platform: Lightweight Problem Solving Framework. Avian Flu Drug Discovery. Large Hadron Collider (LHC).
  • 10. Collaborating e-Infrastructures TWGRID EUAsiaGrid Potential for linking ~80 countries “Production” = Reliable, sustainable, with commitments to quality of service
  • 11. WLCG Computing Model - The Tier Structure. Tier-0 (CERN): data recording, initial data reconstruction, data distribution. Tier-1 (11 centres): permanent storage, re-processing, analysis. Tier-2 (~130 centres): simulation, end-user analysis.
  • 12. Enabling Grids for E-sciencE: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, … (EGEE-II INFSO-RI-031688; EGEE07, Budapest, 1-5 October 2007)
  • 13. Why Petabyte? Challenges. Why Petabyte? Experiment computing model; comparison with conventional data management. Challenges: Performance: LAN and WAN activities; Sufficient B/W between CPU Farm; eliminate uplink bottleneck (switch tiers); fast response to critical events; fabric infrastructure & Service Level Agreement; scalability and manageability; robust DB engine (Oracle RAC); KB and adequate administration (training).
  • 14. Tier Model and Data Management Components
  • 16. ATLAS T1 Data Flow [Figure: data flow between Tier-0, the T1 tape, disk buffer, disk storage and CPU farm, the other Tier-1s, and each Tier-2, plus simulation and analysis data flow. Each stream is labelled with file size, file rate, files/day, bandwidth, and volume/day. RAW: 1.6 GB/file, 0.02 Hz, 1.7K f/day, 32 MB/s, 2.7 TB/day. ESD: 0.5 GB/file, 0.02 Hz, 1.7K f/day, 10 MB/s, 0.8 TB/day. AODm: 500 MB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day. AOD: 10 MB/file, 0.2 Hz, 17K f/day, 2 MB/s, 0.16 TB/day. Combined flow to tape (RAW, ESD2, AODm2): 0.044 Hz, 3.74K f/day, 44 MB/s, 3.66 TB/day.]
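The per-stream labels on this slide follow from two inputs, file size and file rate. The sketch below recomputes files/day, bandwidth, and daily volume from those inputs. It assumes decimal units (1 TB = 1,000,000 MB) and a full 86,400-second day, so the results come out slightly above the slide's rounded figures; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400  # assumption: full day of continuous data taking


def stream_rates(file_size_mb, file_rate_hz):
    """Return (files per day, bandwidth in MB/s, volume in TB/day)."""
    files_per_day = file_rate_hz * SECONDS_PER_DAY
    bandwidth_mb_s = file_size_mb * file_rate_hz
    # Decimal units: 1 TB = 1,000,000 MB
    volume_tb_day = bandwidth_mb_s * SECONDS_PER_DAY / 1_000_000
    return files_per_day, bandwidth_mb_s, volume_tb_day


# Stream parameters as read from the slide (file size in MB, rate in Hz)
streams = {
    "RAW": (1600, 0.02),
    "ESD": (500, 0.02),
    "AODm": (500, 0.04),
    "AOD": (10, 0.2),
}

for name, (size_mb, rate_hz) in streams.items():
    files, mb_s, tb_day = stream_rates(size_mb, rate_hz)
    print(f"{name}: {files:.0f} f/day, {mb_s:.0f} MB/s, {tb_day:.2f} TB/day")
```

For RAW this gives roughly 1.7K files/day and 32 MB/s, in line with the slide's labels.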
  • 17. WLCG Tier-1 - Defined Minimum Levels of Service. The defined response time refers to the maximum delay before action is taken. The mean time to repair the service is also crucial, but it is covered only indirectly through the required availability target.
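To illustrate how an availability target bounds repair time indirectly, the sketch below converts a target into the total downtime it permits over a period. The 99% target and the 30-day period are hypothetical values chosen for illustration, not figures from the MoU.

```python
def max_downtime_hours(availability, period_hours):
    """Total downtime allowed over a period for a given availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


# Hypothetical example: 99% availability over a 30-day month (720 hours)
allowed = max_downtime_hours(0.99, 30 * 24)
print(f"Allowed downtime: {allowed:.1f} hours per 30 days")
```

All outages in the period, including the time spent repairing each one, have to fit inside that budget, which is why the repair time does not need a separate target.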
  • 18. WLCG MoU & ASGC Resource Level - Pledged Resources and Projection. End 2009: CPU 29.5K HEP2k6, disk 2.6 PB, tape 2.4 PB. MoU 2009: CPU 20K HEP2k6, disk 3.0 PB, tape 3.0 PB. MoU 2010: CPU 28K HEP2k6, disk 3.5 PB, tape 3.5 PB. [Figure: CPU (KSI2k), disk (TB), and tape (TB) against their MoU targets, 2005-2010.]
  • 19. Data Management System. CASTOR V1 (CERN Advanced STORage): satisfactorily serving 10s of 1K Req/day/TB of disk cache; limitations: 1M files in cache, tape movement API not flexible. CASTOR V2: DB-centric architecture, scheduling feature, GSI and Kerberos, resource management, resource handling.
  • 20. CASTOR Configurations - Current Infrastructure. Shared core services serving Atlas and CMS. Services: Stager, NS, DLF, Repack, and LSF. DB cluster: two DB clusters (SRM and NS); 5 services (DB) split into two clusters; 5 Oracle instances. Total capacity: 0.63PB and 0.7PB for CMS and Atlas, respectively. Current usage: 63% and 44% for CMS and Atlas.
  • 21. CASTOR Configurations (cont’) - Disk Cache. Disk pools & servers. Performance (IOPS): with 0.5kB IO size, 76.4k for read and 54k for write; both decrease by around 9% when the IO size is increased to 4kB. 80 disk servers (+6 will be online at the end of the 3rd week of Oct). Total capacity: 1.67PB (0.3PB allocated dynamically). Current usage: 0.79PB (~58% usage). 14 disk pools (8 for Atlas, 3 for CMS, and another three for bio, SAM, and dynamic).
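The ~58% usage figure does not match 0.79 PB out of the full 1.67 PB. One reading, offered here as an assumption rather than something the slide states, is that usage is measured against the statically allocated capacity only (total minus the 0.3 PB dynamic share). The sketch below computes both ratios; the function name is hypothetical.

```python
def usage_fraction(used_pb, capacity_pb):
    """Fraction of capacity in use."""
    if capacity_pb <= 0:
        raise ValueError("capacity must be positive")
    return used_pb / capacity_pb


total_pb = 1.67    # total disk cache capacity from the slide
dynamic_pb = 0.30  # dynamically allocated share from the slide
used_pb = 0.79     # current usage from the slide

# Assumption: the quoted percentage excludes the dynamic share
static_pb = total_pb - dynamic_pb

print(f"vs. total capacity:  {usage_fraction(used_pb, total_pb):.1%}")
print(f"vs. static capacity: {usage_fraction(used_pb, static_pb):.1%}")
```

Against the static 1.37 PB the ratio rounds to 58%, which is consistent with the slide; against the full 1.67 PB it is about 47%.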
  • 22. Disk Pool Configuration - T1 MSS (CASTOR) [Figure: per-disk-pool bar chart of total capacity (TB), installed capacity, free capacity, and number of disk servers.]
  • 23. Distribution of Free Capacity - Per Disk Servers vs. per Pool [Figure: free capacity (TB) per disk pool. Pools: Standby, dteamD0T0, cmsWANOUT, cmsPrdD1T0, cmsLTD0T1, biomedD1T0, atlasStage, atlasScratchDisk, atlasPrdD1T0, atlasPrdD0T1, atlasMCTAPE, atlasMCDISK, atlasHotDisk, atlasGROUPDISK.]
  • 24. Storage Server Generation - Drive vs. Total Capacity [Figure: total capacity of storage generation (TB) vs. number of RAID subsystems. Data labels: 37, 23, 6, 18 subsystems; 741TB, 683TB, 238TB, 235.5TB.]
  • 25. CASTOR Configurations (cont’) - Core Service Overview. Columns: Services Type; OS Level; Release; Remark. Core: SLC 4.7/x86-64; 2.1.7-19; Stager/NS/DLF. SRM: SLC 4.7/x86-64; 2.7-18; 3 head nodes. Disk Svr.: SLC 4.7/x86-64; 2.1.7-19; 80 Q3 2k9 (20+ in Q4). Tape Svr.: SLC 4.7/32 + 64; 2.1.8-8; x86-64 OS deployed.
  • 26. CASTOR Configurations (cont’) - CMS Disk Cache: Current Resource Level. Columns: Disk Pool; Space Token Capacity/Job Limit; Disk Servers; TapePool/Capacity. cmsLTD0T1: 278TB/488; 9; *. cmsPrdD1T0: 284TB/1560; 13. cmsWanOut: 72TB/220; 4. (* Dep. on tape family.)
  • 27. CASTOR Configurations (cont’) - Atlas Disk Cache: Current Resource Level. Columns: Space Token; Cap/JobLimit; DiskServers; TapePool/Cap. atlasMCDISK: 163TB/790; 8; -. atlasMCTAPE: 38TB/80; 2; atlasMCtp/39TB. atlasPrdD1T0: 278TB/810; 15; -. atlasPrdD0T1: 61TB/210; 3; atlasPrdtp/105TB. atlasGROUPDISK: 19TB/40; 1; -. atlasScratchDisk: 28TB/80; 1; -. atlasHotDisk: 2TB/40; 2; -. Total: 950TB/1835; 46; -.
  • 28. IDC Collocation. Facility installation completed on Mar 27th. Tape system delayed until after Apr 9th: realignment, RMA for faulty parts.
  • 29. Storage Farm. ~110 RAID subsystems deployed since 2003. Supporting both Tier-1 and Tier-2 storage fabric. DAS connection to front-end blade servers. Flexible switching of front-end servers according to performance requirements. 4-8G Fibre Channel connectivity.
  • 30. CASTOR Configurations (cont’) - Tape Pool. Columns: Tape Pool; Capacity (TB)/Usage; Drive Dedication; LTO3/4 Mixed. atlasMCtp: 8.98/40%; N; Y. atlasPrdtp: 101/65%; N; Y. cmsCSA08cruzet: 15.6/46%; N; N. cmsCSA08reco: 5/0%; N; N. cmsCSAtp: 639/99%; N; Y. cmsLTtp: 34.4/44%; N; N. dteamTest: 3.5/1%; N; N.
  • 31. MSS Monitoring Services. Std. Nagios probes: NRPE + customized plugins; SMS to OSE/SM for all types of critical alarms. Availability metrics. Tape metrics (SLS). Throughput, capacity & scheduler per VO and disk pool.
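As an illustration of what a customized plugin run through NRPE might look like, here is a minimal sketch of a free-capacity check for a disk pool. It follows the standard Nagios plugin conventions (exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one status line with optional performance data after `|`). The function name, thresholds, and command-line arguments are hypothetical; this is not ASGC's actual plugin.

```python
import sys

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_pool_free(pool, free_tb, total_tb, warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, status_line) for a pool's free capacity.

    warn_pct and crit_pct are hypothetical thresholds: the check alerts
    when the free percentage drops below them.
    """
    if total_tb <= 0 or free_tb < 0 or free_tb > total_tb:
        return UNKNOWN, f"UNKNOWN - {pool}: invalid capacity data"
    free_pct = 100.0 * free_tb / total_tb
    # Perfdata thresholds use range syntax "N:" (alert when value < N)
    perfdata = f"free_pct={free_pct:.1f}%;{warn_pct:g}:;{crit_pct:g}:;0;100"
    if free_pct < crit_pct:
        return CRITICAL, f"CRITICAL - {pool}: {free_pct:.1f}% free | {perfdata}"
    if free_pct < warn_pct:
        return WARNING, f"WARNING - {pool}: {free_pct:.1f}% free | {perfdata}"
    return OK, f"OK - {pool}: {free_pct:.1f}% free | {perfdata}"


# Usage (hypothetical): check_pool_free.py POOL FREE_TB TOTAL_TB
if __name__ == "__main__" and len(sys.argv) == 4:
    code, line = check_pool_free(sys.argv[1], float(sys.argv[2]), float(sys.argv[3]))
    print(line)
    sys.exit(code)
```

How the free and total capacities are obtained (for example from the stager) is left out on purpose, since the slides do not describe it.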
  • 32. MSS Tape System - Expansion/Upgrade Planning. Before incident: LTO3 * 8 + LTO4 * 4; 720TB with LTO3, 530TB with LTO4. May 2009: two LTO3 drives; MES: 6 LTO4 drives at end of May. Capacity: 1.3PB (old, LTO3/4 mixed) + 0.8PB (LTO4). New S54 model introduced in mid-2009: 2K slots with tier model; required: upgrade ALMS, enhanced gripper. MES Q3 2009: 18 LTO4 drives. HA implementation resumes in Q4.
  • 33. Expansion Planning. 2008: 0.5PB expansion of tape system in Q2; meet MoU target by mid-Nov.; 1.3MSI2k per rack based on the recent E5450 processor. 2009 Q1: 150 SMP/QC blade servers; RAID subsystems to consider 2TB per drive; 42TB net capacity per chassis and 0.75PB in total. 2009 Q3-4: 18 LTO4 drives (mid-Oct.); 330 Xeon QC (SMP, Intel 5450) blade servers; 2nd phase tape MES: 5 LTO4 drives + HA; 3rd phase tape MES: 6 LTO4 drives; ETA of 0.8PB expansion delivery: mid-Nov.
  • 34. Computing/Storage System Infrastructure [Figure: ASGC data center and archive room layout. Storage: CASTOR2 disk farm, CASTOR2 tape servers, CASTOR2 disk servers. Network: 2 * GE (LX) to 4F M160 (links to HK, JP Tier-2s); 2 * GE (LX) to 4F TaipeiGigaPoP-7609 (links to TW Tier-2s); 4 * GE (SX) to ASGC distribution switch in Rack#49 (links to Tier-1 servers). Compute: 20 x Quanta blades (WN); core services (CE, RB, DPM, PX, BDII etc.); 64 x IBM HS20 blade system (WN); 142 x IBM HS21 blade system (WN). Power: DC SMR 48V / 100A with batteries #1 + #2 and #3 + #4.]
  • 35. Throughput of WLCG Experiments. Throughput is defined as job efficiency x number of running jobs. The characteristics of the 4 LHC experiments show that inefficiency is due to poor coding.
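The throughput definition can be sketched in a few lines. The slide does not say how job efficiency is measured; the sketch assumes the common ratio of CPU time to wall-clock time, aggregated over the running jobs. The `Job` type and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cpu_seconds: float   # CPU time consumed so far
    wall_seconds: float  # wall-clock time elapsed so far


def job_efficiency(jobs):
    """Aggregate efficiency, assumed here to be total CPU time / total wall time."""
    total_wall = sum(j.wall_seconds for j in jobs)
    if total_wall == 0:
        return 0.0
    return sum(j.cpu_seconds for j in jobs) / total_wall


def throughput(jobs):
    """Throughput as defined on the slide: job efficiency x number of running jobs."""
    return job_efficiency(jobs) * len(jobs)


# Hypothetical sample: one CPU-bound job and one job waiting mostly on I/O
running = [Job(cpu_seconds=90, wall_seconds=100), Job(cpu_seconds=50, wall_seconds=100)]
print(f"efficiency={job_efficiency(running):.2f}, throughput={throughput(running):.2f}")
```

Under this definition the throughput reads as the equivalent number of fully busy job slots, so a site running many inefficient jobs can score below a site running fewer efficient ones.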
  • 36. Reliability from Different Perspectives
  • 37. Summary. Deploy a highly scalable DM system and a performance-driven storage infrastructure. Eliminate possible complexity of the SRM abstraction layer. Resource utilization, provisioning and optimization. From POC to production, the challenges remain: Data Challenge, Service Challenge, CCRC08, STEP09, etc. Motivation appears clear for: medical, climate, cosmological. Operation-wide: robust database setup; KB for fabric infrastructure operation; fast enough event processing and documentation. Consider use cases beyond data management in WLCG: commonality with many other disciplines in the EGEE infrastructure; actively participate in e-Science collaboration within the region.