HEP and its data
What is the trouble? A possible way forward

Salvatore Mele, CERN
Costs, Benefits and Motivations for Digital Preservation
Nice, November 28th 2008
High-Energy Physics (or Particle Physics)
HEP aims to understand how our Universe works:
— by discovering the most elementary constituents
of matter and energy
— by probing their interactions
— by exploring the basic nature of space and time

In other words, HEP tries to answer two eternal questions:
— "What is the world made of?"
— "What holds it together?"

Build the largest scientific instruments ever, to reach energy densities close to the Big Bang; write theories to predict and describe the observed phenomena
CERN: European Organization for
     Nuclear Research (since 1954)
•   The world leading HEP laboratory, Geneva (CH)
•   2500 staff (mostly engineers, from Europe)
•   9000 users (mostly physicists, from 580 institutes in 85 countries)
•   3 Nobel prizes (Accelerators, Detectors, Discoveries)
•   Invented the web
•   Just switched on the 27-km (6bn€) LHC accelerator
•   Runs a 1-million-object Digital Library

CERN Convention (1953): ante-litteram Open Access mandate
“… the results of its experimental and theoretical work shall be
  published or otherwise made generally available”
The Large Hadron Collider

• Largest scientific instrument ever built: 27 km in circumference
• The "coolest" place in the Universe: −271 °C
• 10,000 people involved in its design and construction
• Worldwide budget of 6 bn€

• Will collide protons to reproduce conditions at the birth of the Universe...
  ...40 million times a second
The LHC experiments:
   about 100 million "sensors" each
   [think of your 6 MP digital camera...
...taking 40 million pictures a second]
[Photos: the ATLAS and CMS detectors; a five-storey building for scale]
The LHC data
• 40 million events (pictures) per second
• Select (on the fly) the ~200 interesting
  events per second to write on tape
• “Reconstruct” data and convert for analysis:
  “physics data” [inventing the grid...]
    (×4 experiments, ×15 years)   Per event   Per year
    Raw data                       1.6 MB     3200 TB
    Reconstructed data             1.0 MB     2000 TB
    Physics data                   0.1 MB      200 TB
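The yearly volumes in the table follow directly from the trigger rate and the per-event sizes. A quick sanity check (the ~1e7 seconds of effective data taking per accelerator year is an assumed rule-of-thumb figure, not stated on the slide):

```python
# Sanity check of the LHC data-volume table above.
# Assumption (not on the slide): ~1e7 seconds of effective data
# taking per accelerator year, a common HEP rule of thumb.
SECONDS_PER_YEAR = 1e7

def yearly_volume_tb(events_per_second, mb_per_event):
    """Yearly data volume in TB (1 TB = 1e6 MB, decimal units)."""
    return events_per_second * mb_per_event * SECONDS_PER_YEAR / 1e6

raw = yearly_volume_tb(200, 1.6)      # ~200 raw events/s written to tape
reco = yearly_volume_tb(200, 1.0)     # reconstructed data
physics = yearly_volume_tb(200, 0.1)  # physics data

print(round(raw), round(reco), round(physics))  # 3200 2000 200, matching the table
```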
Preservation, re-use and (Open)
Access to HEP data



Problem
            Opportunity
                          Challenge
Some other HEP facilities
       (recently stopped or about to stop)
No real long-term archival strategy... against billions of invested funds!

Energy frontier: TEVATRON@FNAL, HERA@DESY, LEP@CERN
Precision frontier: BABAR@SLAC, BELLE@KEK, KLOE@LNF
Why shall we care?


• We cared enough to produce these data in the first instance
• Unique, extremely costly, non-reproducible
• We might need to go back to the past (it has happened)

• A peculiar community (the web, arXiv, the grid...)
• A "worst-case scenario": complex data, no preservation... "If it works here, it will work in many other places"
Preservation, re-use and (open)
 access continua (who and when)
• The same researchers who took the data, after the
  closure of the facility (~1 year, ~10 years)
• Researchers working at similar experiments at the
  same time (~1 day, week, month, year)
• Researchers of future experiments (~20 years)
• Theoretical physicists who may want to re-interpret the data (~1 month, ~1 year, ~10 years)
• Theoretical physicists who may want to test future
  ideas (~1 year, ~10 years, ~20 years)
Data preservation, circa 2004: 140 pages of tables
What is the trouble with preserving HEP data?

Where to put them?
Hardware migration?
Software migration/emulation?
HEP, Open Access & Repositories
• HEP is decades ahead in thinking Open Access:
  – Mountains of paper preprints shipped around the
    world for 40 years (at author/institute expenses!)
  – Launched arXiv (1991), archetypal Open Archive
  – >90% of HEP production self-archived in repositories
  – 100% of HEP production indexed in SPIRES (community-run database, the first WWW server on US soil)
• OA is second nature: posting on arXiv before
  submitting to a journal is common practice
  – No mandate, no debate. Author-driven.
• HEP scholars have a tradition of arXiving their output (alas, only articles) somewhere
 ⇒ Data repositories would be a natural concept
What is the trouble with preserving HEP data?

Where to put them?
Hardware migration?
Software migration/emulation?
Life-cycle of previous-generation CERN experiment L3 at LEP
1984   Start of construction
1989   Start of data taking
2000   End of data taking
2002   End of in-silico experiments
2005   End of (most) data analysis
  Storage and migration of data
  at the CERN computing centre
1993   ~150,000 9-track → 3480 (0.2 GB)
1997   ~250,000 3480 → Redwood (20 GB)
2001   ~25,000 Redwood → 9940A (60 GB)
2004   ~5,000 9940A → 9940B (200 GB)
2007   ~22,000 9940B → T10000A (500 GB)
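The capacities above imply a steep growth curve. A small sketch using only the first and last figures from the table (the doubling time is derived here, not stated on the slide):

```python
# Tape-capacity growth implied by the migration table above: cartridges
# went from 0.2 GB (3480, 1993) to 500 GB (T10000A, 2007). The doubling
# time is computed from those two endpoints only.
import math

first_gb, last_gb = 0.2, 500
years = 2007 - 1993

growth = last_gb / first_gb                      # 2500x in 14 years
doubling_years = years * math.log(2) / math.log(growth)
print(f"{growth:.0f}x in {years} years; capacity doubled roughly every "
      f"{doubling_years:.1f} years")
```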
What is the trouble with preserving HEP data?

Where to put them?
Hardware migration?
Software migration/emulation?
       Computing environment of
       the L3 experiment at LEP
1989-2001    VAX for data taking
1986-1994    IBM for data analysis
1992-1998    Apollo (HP) workstations
1996-2001    SGI mainframe
1997-2007    Linux boxes
What is the trouble with preserving HEP data?

Where to put them?
Hardware migration?
Software migration/emulation?

The HEP data!
Preserving HEP data?
• The HEP data model is highly complex. Data are traditionally not re-used as in Astronomy or Climate science.
• Raw data → calibrated data → skimmed data → high-level objects → physics analyses → results.
• All of the above is duplicated for in-silico experiments, necessary to interpret the highly complex data.
• Final results depend on the grey literature on calibration constants, human knowledge and the algorithms needed for each pass... oral tradition!
• Years of training for a successful analysis
[Figure: a CD stack holding one year of LHC data would be ~20 km tall, against a stratospheric balloon (30 km), Concorde (15 km) and Mt. Blanc (4.8 km)]
Need a "parallel way" to publish/preserve/re-use/OpenAccess
• In addition to experiment data models, elaborate a
  parallel format for (re-)usable high-level objects
   – In times of need (to combine data of “competing”
     experiments) this approach has worked
   – Embed the “oral” and “additional” knowledge
• A format understandable and thus re-usable by
  practitioners in other experiments and theorists
• Start from tables and work back towards primary data
• How much additional work? 1%, 5%, 10%?
Issues with the “parallel” way
• A small fraction of a big number gives a large number
• Activity in competition with research time
• 1000s of person-years for parallel data models need enormous (impossible?) academic incentives for realization... or additional (external) funds
• Need insider knowledge to produce parallel data
• Address issues of (open) access, credit, accountability,
  “careless measurements”, “careless discoveries”,
  reproducibility of results, depth of peer-reviewing
• A monolithic way of doing business needs rethinking
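The first bullet ("a small fraction of a big number gives a large number") is easy to make concrete. In this sketch the 5,000 person-years total is a hypothetical figure chosen for illustration; the slides only say "1000s person-years":

```python
# "A small fraction of a big number gives a large number."
# Illustration only: the total-effort figure below is a hypothetical
# assumption, not a number from the slides.
EFFORT_PERSON_YEARS = 5000  # assumed scale of a large HEP experiment

# The 1%, 5%, 10% overheads are the fractions asked about on the
# previous slide ("How much additional work?").
extra_effort = {f: EFFORT_PERSON_YEARS * f for f in (0.01, 0.05, 0.10)}
for fraction, extra in extra_effort.items():
    print(f"{fraction:.0%} extra work = {extra:.0f} person-years")
```

Even the most optimistic 1% overhead amounts to decades of person-power, which is why the slide points to academic incentives or external funds.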
Preservation, re-use and (Open)
   Access to HEP data... first steps!
• Outgrowing an institutionalized state of denial
• A difficult and costly way ahead
• An issue which is starting to surface on the agenda
  — Part of an integrated digital-library vision
  — CERN joined the Alliance for Permanent Access
  — CERN part of the FP7 PARSE.Insight project
•PARSE.Insight case study on HEP preservation:
  — HEP tradition: community-inspired and community-built
  (e-)infrastructures for research
  — Understand the desires and concerns of the community
  — Chart a way out for data preservation in the "worst-case scenario" of HEP, benefiting other communities
  — Debates and workshops planned in next months
HEP Community Survey - First findings
500+ scientists answered the survey in its first week: 50% from Europe, 25% from the US. 34% left their e-mail address to hear more; 17% want to be interviewed to say more!


[Pie chart of responses on the importance of data preservation: 39.0%, 28.8%, 24.2%, 7.7%, 0.4%]


• 92%: data preservation is important, very important or crucial
• 40%: think important HEP data have been lost
• 45%: access to old data would have improved my scientific results
• 74%: a trans-national infrastructure for long-term preservation of all kinds of scientific data should be built

  Two dozen more questions about dos and don'ts in HEP preservation
Conclusions
• HEP spearheaded (Open) Access to Scientific
  Information: 50 years of preprints, 16 of repositories
  ... but data preservation is not yet on the radar
• Heterogeneous continua to preserve data for
• No insurmountable technical problems
• The issue is the data model itself
   – (Primary) data intelligible only to their producers
   – A monolithic culture needs a paradigm shift
   – Preservation implies massive person-power costs
• Need cultural consensus and financial support
     Preservation is appearing on the agenda...
         Powerful synergy between CERN and
 the Alliance for Permanent Access & PARSE.Insight
           Exciting times are ahead!

Shaman Project  HemmjeShaman Project  Hemmje
Shaman Project Hemmje
 
Scalable Services For Digital Preservation Ross King
Scalable Services For Digital Preservation Ross KingScalable Services For Digital Preservation Ross King
Scalable Services For Digital Preservation Ross King
 
Risks Benefits And Motivations Seamus Ross
Risks Benefits And Motivations Seamus RossRisks Benefits And Motivations Seamus Ross
Risks Benefits And Motivations Seamus Ross
 
Representation Information Steve Rankin
Representation Information Steve RankinRepresentation Information Steve Rankin
Representation Information Steve Rankin
 
Preservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian UpshallPreservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian Upshall
 
Platter Colin Rosenthal
Platter Colin RosenthalPlatter Colin Rosenthal
Platter Colin Rosenthal
 
Planets Testbed Brian Aitken
Planets Testbed Brian AitkenPlanets Testbed Brian Aitken
Planets Testbed Brian Aitken
 

Recently uploaded

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Recently uploaded (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Preservation And Reuse In High Energy Physics Salvatore Mele

  • 1. HEP and its data What is the trouble? A possible way forward Salvatore Mele Costs, Benefits and Motivations for Digital Preservation CERN Nice - November 28th 2008
  • 2. High-Energy Physics (or Particle Physics) HEP aims to understand how our Universe works: — by discovering the most elementary constituents of matter and energy — by probing their interactions — by exploring the basic nature of space and time In other words, try to answer two eternal questions: — “What is the world made of?” — “What holds it together?” Build the largest scientific instruments ever to reach energy densities close to the Big Bang; write theories to predict and describe the observed phenomena
  • 3. CERN: European Organization for Nuclear Research (since 1954) • The world-leading HEP laboratory, Geneva (CH) • 2500 staff (mostly engineers, from Europe) • 9000 users (mostly physicists, from 580 institutes in 85 countries) • 3 Nobel prizes (Accelerators, Detectors, Discoveries) • Invented the web • Just switched on the 27-km (6bn€) LHC accelerator • Runs a 1-million-object Digital Library CERN Convention (1953): ante-litteram Open Access mandate “… the results of its experimental and theoretical work shall be published or otherwise made generally available”
  • 4. The Large Hadron Collider •Largest scientific instrument ever built, 27km of circumference •The “coolest” place in the Universe -271˚C •10000 people involved in its design and construction •Worldwide budget of 6bn€ •Will collide protons to reproduce conditions at the birth of the Universe... ...40 million times a second
  • 5. The LHC experiments: about 100 million “sensors” each [think your 6MP digital camera... ...taking 40 million pictures a second] ATLAS CMS five-storey building
  • 6. The LHC data • 40 million events (pictures) per second • Select (on the fly) the ~200 interesting events per second to write on tape • “Reconstruct” data and convert for analysis: “physics data” [inventing the grid...] (x4 experiments x15 years) • Raw data: 1.6 MB per event, 3200 TB per year • Reconstructed data: 1.0 MB per event, 2000 TB per year • Physics data: 0.1 MB per event, 200 TB per year
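The per-year volumes on the slide follow directly from the kept event rate and the per-event sizes once an effective live time is assumed. A quick sanity check — the ~10^7 seconds of data taking per year is my assumption (a common accelerator rule of thumb), not a figure stated on the slide:

```python
# Back-of-the-envelope check of the LHC data volumes quoted above.
EVENTS_PER_SECOND = 200   # events kept after the on-line selection
SECONDS_PER_YEAR = 1e7    # assumed effective data-taking time (not on the slide)

sizes_mb = {"raw": 1.6, "reconstructed": 1.0, "physics": 0.1}  # MB per event

for name, mb in sizes_mb.items():
    tb_per_year = EVENTS_PER_SECOND * mb * SECONDS_PER_YEAR / 1e6  # MB -> TB
    print(f"{name}: {tb_per_year:.0f} TB/year")
# reproduces the slide: raw 3200, reconstructed 2000, physics 200 TB/year
```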
  • 7. Preservation, re-use and (Open) Access to HEP data Problem Opportunity Challenge
  • 8. Some other HEP facilities (recently stopped or about to stop) No real long-term archival strategy... ...against billions of invested funds! Energy frontier HERA@DESY TEVATRON@FNAL BELLE@KEK KLOE@LNF LEP@CERN BABAR@SLAC Precision frontier
  • 9. Why shall we care? • We cared about producing these data in the first instance • Unique, extremely costly, non-reproducible • Might need to go back to the past (it happened) • A peculiar community (the web, arXiv, the grid...) • “Worst-case scenario”: complex data, no preservation... “If it works here, it will work in many other places”
  • 10. Preservation, re-use and (open) access continua (who and when) • The same researchers who took the data, after the closure of the facility (~1 year, ~10 years) • Researchers working at similar experiments at the same time (~1 day, week, month, year) • Researchers of future experiments (~20 years) • Theoretical physicists who may want to re-interpret the data (~1 month, ~1 year, ~10 years) • Theoretical physicists who may want to test future ideas (~1 year, ~10 years, ~20 years)
  • 11. Data preservation, circa 2004 140 pages of tables
  • 12. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 13. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 14. HEP, Open Access & Repositories • HEP is decades ahead in thinking Open Access: – Mountains of paper preprints shipped around the world for 40 years (at author/institute expense!) – Launched arXiv (1991), archetypal Open Archive – >90% HEP production self-archived in repositories – 100% HEP production indexed in SPIRES (community-run database, first WWW server on US soil) • OA is second nature: posting on arXiv before submitting to a journal is common practice – No mandate, no debate. Author-driven. • HEP scholars have the tradition of arXiving their output (alas, articles only) somewhere Data repositories would be a natural concept
  • 15. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 16. Life-cycle of previous-generation CERN experiment L3 at LEP 1984 Begin of construction 1989 Start of data taking 2000 End of data taking 2002 End of in-silico experiments 2005 End of (most) data analysis Storage and migration of data at the CERN computing centre 1993 ~150’000 9-track → 3480 0.2GB 1997 ~250’000 3480 → Redwood 20GB 2001 ~25’000 Redwood → 9940 60GB 2004 ~5’000 9940A → 9940B 200GB 2007 ~22’000 9940B → T1000A 500GB
  • 17. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 18. Life-cycle of previous-generation CERN experiment L3 at LEP 1984 Begin of construction 1989 Start of data taking 2000 End of data taking 2002 End of in-silico experiments 2005 End of (most) data analysis Computing environment of the L3 experiment at LEP 1989-2001 VAX for data taking 1986-1994 IBM for data analysis 1992-1998 Apollo (HP) workstations 1996-2001 SGI mainframe 1997-2007 Linux boxes
  • 19. Life-cycle of previous-generation CERN experiment L3 at LEP 1984 Begin of construction 1989 Start of data taking 2000 End of data taking 2002 End of in-silico experiments 2005 End of (most) data analysis Computing environment of the L3 experiment at LEP 1989-2001 VAX for data taking 1986-1994 IBM for data analysis 1992-1998 Apollo (HP) workstations 1996-2001 SGI mainframe 1997-2007 Linux boxes
  • 20. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation? The HEP data!
  • 21. Preserving HEP data? [Graphic: height comparison — balloon (30 km), stack of CDs holding 1 year of LHC data (~20 km), Concorde (15 km), Mt. Blanc (4.8 km)] • The HEP data model is highly complex. Data are traditionally not re-used as in Astronomy or Climate science. • Raw data → calibrated data → skimmed data → high-level objects → physics analyses → results. • All of the above duplicated for in-silico experiments, necessary to interpret the highly-complex data. • Final results depend on the grey literature on calibration constants, human knowledge and algorithms needed for each pass... oral tradition! • Years of training for a successful analysis
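The reduction chain above is why the data alone are not enough: every pass folds in knowledge that lives outside the files. A minimal, entirely hypothetical sketch (all names and numbers invented) of how each stage consumes experiment-specific constants, selections and algorithms:

```python
# Hypothetical illustration: each reduction pass depends on knowledge
# that is NOT stored with the data (constants, cuts, algorithms).

def calibrate(raw_event, calibration_constants):
    # "grey literature": constants tuned per run period, documented in notes
    return {k: v * calibration_constants.get(k, 1.0) for k, v in raw_event.items()}

def skim(events, selection):
    # "oral tradition": which cuts define a usable sample
    return [e for e in events if selection(e)]

def to_high_level(event):
    # algorithms (jet finding, particle ID, ...) frozen in the software stack
    return {"energy_sum": sum(event.values())}

raw = [{"ch1": 2.0, "ch2": 3.0}, {"ch1": 0.1, "ch2": 0.2}]     # invented raw events
constants = {"ch1": 1.1, "ch2": 0.9}                            # invented constants
calibrated = [calibrate(e, constants) for e in raw]
skimmed = skim(calibrated, lambda e: sum(e.values()) > 1.0)     # invented cut
physics = [to_high_level(e) for e in skimmed]
print(physics)
```

Without `constants` and the selection cut, the surviving `physics` objects cannot be reproduced from `raw` — which is exactly the preservation problem the slide describes.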
  • 22. Need a “parallel way” to publish/preserve/re-use/OpenAccess • In addition to experiment data models, elaborate a parallel format for (re-)usable high-level objects – In times of need (to combine data of “competing” experiments) this approach has worked – Embed the “oral” and “additional” knowledge • A format understandable and thus re-usable by practitioners in other experiments and theorists • Start from tables and work back towards primary data • How much additional work? 1%, 5%, 10%?
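One way to picture such a parallel format: package the high-level numbers together with the contextual knowledge a re-user would otherwise obtain only by oral tradition. This is purely illustrative — the field names and values below are invented, not a format proposed on the slide:

```python
# Illustrative only: a possible self-describing "parallel format" for one
# high-level result, with provenance and systematics embedded alongside
# the numbers. All field names and values are hypothetical.
import json

result = {
    "measurement": "example differential cross-section",
    "bins_gev": [[0, 10], [10, 20], [20, 40]],
    "values_pb": [3.2, 1.1, 0.4],
    "systematics": {"luminosity": 0.02, "selection": 0.05},
    "provenance": {
        "software": "reco-v4.2 (hypothetical)",
        "calibration_notes": "run-period B constants, see internal note",
    },
}

print(json.dumps(result, indent=2))  # plain text: readable outside the experiment
```

The point is not JSON specifically, but that a theorist or a member of another experiment can read the object without the producing experiment's software stack.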
  • 23. Issues with the “parallel” way • A small fraction of a big number gives a large number • Activity in competition with research time • 1000s of person-years for parallel data models need enormous (impossible?) academic incentives for realization... or additional (external) funds • Need insider knowledge to produce parallel data • Address issues of (open) access, credit, accountability, “careless measurements”, “careless discoveries”, reproducibility of results, depth of peer-reviewing • A monolithic way of doing business needs rethinking
  • 24. Preservation, re-use and (Open) Access to HEP data... first steps! • Outgrowing an institutionalized state of denial • A difficult and costly way ahead • An issue which starts surfacing on the agenda — Part of an integrated digital-library vision — CERN joined the Alliance for Permanent Access — CERN part of the FP7 PARSE.Insight project • PARSE.Insight case study on HEP preservation: — HEP tradition: community-inspired and community-built (e-)infrastructures for research — Understand the desires and concerns of the community — Chart a way out for data preservation in the “worst-case scenario” of HEP, benefiting other communities — Debates and workshops planned in the coming months
  • 25. HEP Community Survey - First findings 500+ scientists answered the survey in its first week: 50% from Europe, 25% from the US. 34% left their e-mail address to hear more; 17% want to be interviewed to say more! [Chart: distribution of responses — 0.4% / 7.7% / 24.2% / 39.0% / 28.8%] • 92%: data preservation is important, very important or crucial • 40%: think important HEP data have been lost • 45%: access to old data would have improved my scientific results • 74%: a trans-national infrastructure for long-term preservation of all kinds of scientific data should be built Two dozen more questions about dos and don’ts in HEP preservation
  • 26. Conclusions • HEP spearheaded (Open) Access to Scientific Information: 50 years of preprints, 16 of repositories... but data preservation is not yet on the radar • Heterogeneous continua to preserve data for • No insurmountable technical problems • The issue is the data model itself – (Primary) data intelligible only to the producers – A monolithic culture needs a paradigm shift – Preservation implies massive person-power costs • Need cultural consensus and financial support Preservation is appearing on the agenda... Powerful synergy between CERN and the Alliance for Permanent Access & PARSE.Insight Exciting times are ahead!