Distributed Data Management for LHC
Dirk Duellmann
CERN, Geneva
Accelerating Science and Innovation
[Slides: "The Status of the Higgs Search", J. Incandela for the CMS Collaboration, July 4th 2012 (H → γγ candidate event); "ATLAS: Status of SM Higgs searches", 4/7/2012 (evolution of the excess with time)]
Founded in 1954: “Science for Peace”
Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark,
Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway,
Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and
the United Kingdom
Candidate for Accession: Romania
Associate Members in the Pre-Stage to Membership: Israel, Serbia
Applicant States: Cyprus, Slovenia, Turkey
Observers to Council: India, Japan, the Russian Federation, the United
States of America, Turkey, the European Commission and UNESCO
~ 2300 staff
~ 1050 other paid personnel
~ 11000 users
Budget (2012) ~1000 MCHF
CERN: 20 member states
Global Science: 11000 scientists
Stars and Planets only account for a small percentage of the universe!
CERN / May 2011
Ø 27 kilometre circle
Ø proton collisions at 7+7 TeV
Ø 10.000 magnets
Ø 8000 km super-conducting cables
Ø 120 t of liquid Helium
The Large Hadron Collider
The largest super conducting
installation in the word
Precision! The 27 km long ring is sensitive to <1 mm changes:
• Tides
• Stray currents
• Rainfall
The ATLAS Cavern
• 140 000 m³ of rock removed
• 53 000 m³ of concrete
• 6 000 tons of steel reinforcement
• 55 meters long
• 30 meters wide
• 53 meters high (a 10-storey building)
A collision at LHC (26 June 2009)
The Data Acquisition for one Experiment
Tier 0 at CERN: Acquisition, first reconstruction, storage & distribution
• 1.25 GB/sec (ions)
• 2011: 400-500 MB/sec
• 2011: 4-6 GB/sec
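As a rough illustration of what such sustained rates mean in yearly volume, here is a minimal Python sketch; the duty-cycle factor (fraction of the year actually spent taking data) is an illustrative assumption, not a figure from the slides.

# Rough conversion of sustained ingest rates into yearly data volume.
# The duty_cycle value is an illustrative assumption, not a slide figure.
SECONDS_PER_YEAR = 365 * 24 * 3600

def yearly_volume_pb(rate_gb_per_s, duty_cycle=0.3):
    """Petabytes per year for a sustained rate in GB/s (decimal units)."""
    return rate_gb_per_s * SECONDS_PER_YEAR * duty_cycle / 1e6  # GB -> PB

for rate in (0.45, 1.25, 4.0, 6.0):  # 400-500 MB/s is roughly 0.45 GB/s
    print(f"{rate:4.2f} GB/s -> ~{yearly_volume_pb(rate):.1f} PB/year")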
The LHC Computing Challenge
• Signal/Noise: 10^-13 (10^-9 offline)
• Data volume
  – High rate × large number of channels × 4 experiments
  → ~15 petabytes of new data each year (→ ~30 PB in 2012)
• Compute power
  – Event complexity × number of events × thousands of users
  → 200 k CPUs (→ 300 k CPU)
  → 45 PB of disk storage (→ 170 PB)
• Worldwide analysis & funding
  – Computing funding locally in major regions & countries
  – Efficient analysis everywhere
  → GRID technology
CERN Computer Centre
CERN computer centre:
• Built in the 70s on the CERN site
• ~3000 m² (in three machine rooms)
• 3.5 MW for equipment
A recent extension:
• Located at Wigner (Budapest, Hungary)
• ~1000 m²
• 2.7 MW for equipment
• Connected to CERN with 2x100 Gb links
World Wide Grid – what and why?
• A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
• Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
• The resources are distributed, for funding and sociological reasons
• Our task was to make use of the resources available to us, no matter where they are located
Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution
Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis
Tier-2 (~130 centres):
• Simulation
• End-user analysis
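To make the division of labour above concrete, here is a minimal Python sketch of the tier/role mapping; only the roles come from the slide, while the lookup helper and its name are illustrative.

# Illustrative model of the WLCG tier roles listed above.
# The role mapping comes from the slide; the lookup helper is hypothetical.
TIER_ROLES = {
    "Tier-0": {"data recording", "initial reconstruction", "data distribution"},
    "Tier-1": {"permanent storage", "re-processing", "analysis"},
    "Tier-2": {"simulation", "end-user analysis"},
}

def tiers_for(task):
    """Return the tiers that would normally run a given type of work."""
    return [tier for tier, roles in TIER_ROLES.items() if task in roles]

print(tiers_for("simulation"))  # ['Tier-2']
print(tiers_for("analysis"))    # ['Tier-1']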
CPU – around the Tiers
• The grid really works
• All sites, large and small, can contribute
  – And their contributions are needed!
[Pie charts: "CPU delivered, January 2011" by site: CERN, BNL, CNAF, KIT, NL LHC/Tier-1, RAL, FNAL, CC-IN2P3, ASGC, PIC, NDGF, TRIUMF, Tier-2; "Tier-2 CPU delivered by country, January 2011": USA, UK, France, Germany, Italy, Russian Federation, Spain, Canada, Poland, Switzerland, Slovenia, Czech Republic, China, Portugal, Japan, Sweden, Israel, Romania, Belgium, Austria, Hungary, Taipei, Australia, Republic of Korea, Norway, Turkey, Ukraine, Finland, India, Pakistan, Estonia, Brazil, Greece]
Evolution of capacity: CERN & WLCG
[Charts: "WLCG CPU Growth" and "WLCG Disk Growth" split by Tier-2, Tier-1 and CERN, 2008-2013; "CERN Computing Capacity", 2005-2013, annotated with "what we thought was needed at LHC start" versus "what we actually used at LHC start!"]
2013/14: modest increases to process "parked data"
2015 → budget limited?
  - experiments will push trigger rates
  - flat budgets give ~20%/year growth
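A small Python sketch of the "flat budget" rule of thumb quoted above, roughly 20% more capacity per year for the same money; the starting value and the horizon are illustrative assumptions.

# Capacity projection under a flat budget, assuming ~20%/year growth for
# the same spend (growth figure from the slide; starting point and horizon
# are illustrative assumptions).
def project(capacity0, growth=0.20, years=5):
    caps = [capacity0]
    for _ in range(years):
        caps.append(caps[-1] * (1 + growth))
    return caps

for year, cap in enumerate(project(30.0), start=2012):  # ~30 PB handled in 2012
    print(year, f"{cap:.1f} PB")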
  
LHC Networking
• Relies on
  – OPN, GEANT, US-LHCNet
  – NRENs & other national & international providers
Computing model evolution
Evolution of computing models: Hierarchy → Mesh
Physics Storage @ CERN: CASTOR and EOS
Storage systems developed at CERN
CASTOR and EOS use the same commodity disk servers:
• RAID-1 for CASTOR
  – 2 copies in the mirror
• JBOD with RAIN for EOS
  – Replicas spread over different disk servers
  – Tunable redundancy
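A minimal sketch of the replica-placement idea described above: copies of a file spread over distinct disk servers, with the replica count tunable per file. This illustrates the concept only; it is not EOS code, and the server names and policy are hypothetical.

import random

# Spread N copies of a file over distinct disk servers (the "tunable
# redundancy" idea from the slide). Server names and policy are hypothetical.
SERVERS = [f"diskserver{i:02d}" for i in range(1, 11)]

def place_replicas(path, n_replicas=2):
    if n_replicas > len(SERVERS):
        raise ValueError("not enough distinct servers for the requested redundancy")
    return {path: random.sample(SERVERS, n_replicas)}

print(place_replicas("/eos/lhc/2012/run/file.root", n_replicas=3))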
  
CASTOR - Physics Data Archive
Data:
• ~90 PB of data on tape; 250 M files
• Up to 4.5 PB of new data per month
• Over 10 GB/s (R+W) peaks
Infrastructure:
• ~52K tapes (1 TB, 4 TB, 5 TB)
• 9 robotic libraries (IBM and Oracle)
• 80 production + 30 legacy tape drives
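A quick back-of-envelope check of the archive figures above (average file size, and how the tape count relates to total capacity); the assumed average tape capacities are illustrative, since the actual mix of 1/4/5 TB media is not given.

# Back-of-envelope check of the CASTOR archive figures quoted above.
data_pb = 90          # ~90 PB on tape (slide figure)
files = 250e6         # ~250 M files (slide figure)
print(f"average file size ~{data_pb * 1e9 / files:.0f} MB")  # ~360 MB

# ~52K tapes of mixed 1/4/5 TB capacity: total depends on the (unknown) mix.
for avg_tb in (2.0, 3.0):  # assumed average capacity per tape, illustrative
    print(f"assuming {avg_tb} TB/tape on average -> ~{52_000 * avg_tb / 1000:.0f} PB")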
EOS Usage at CERN Today
[Dashboard figures from the slide: 44.8 PB; 32.1 PB; 136 (279) million; 20.7k]
Availability and Performance
[Plots: "Archival & Data Distribution" and "User Analysis"; usage with peaks during pp 2012 and pA 2013 running]
CERN openlab in a nutshell
• A science-industry partnership to drive R&D and innovation with over a decade of success
• Evaluate state-of-the-art technologies in a challenging environment and improve them
• Test in a research environment today what will be used in many business sectors tomorrow
• Train next generation of engineers/employees
• Disseminate results and outreach to new audiences
Ongoing R&D: e.g. Cloud Storage
• CERN openlab
  – joint project since Jan 2012
  – testing scaling and TCO gains with prototype applications
• Huawei S3 storage appliance (0.8 PB)
  – logical replication
  – fail-in-place
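For flavour, a hedged sketch of how a prototype application might exercise an S3-compatible appliance like the one above, using boto3; the endpoint, bucket, object key and credentials are placeholders, not details from the project.

import boto3

# Hypothetical S3-compatible endpoint and credentials; only the idea of
# testing applications against an S3 appliance comes from the slide.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.cern.ch",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.put_object(Bucket="prototype-tests", Key="run01/results.root", Body=b"...")
obj = s3.get_object(Bucket="prototype-tests", Key="run01/results.root")
print(obj["ContentLength"], "bytes read back")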
Thanks for your attention!
More at http://cern.ch
Accelerating Science and Innovation
