PETRA III/EuXFEL data archiving
Martin Gasthuber / Sergey Yakubov, May 2019
DESY Campus Hamburg – much more communities
[Campus map: PETRA III (synchrotron radiation source, highest brilliance), FLASH (VUV & soft X-ray free-electron laser), European XFEL (X-ray free-electron laser – atomic structure & fs dynamics of complex matter), plus CHyN, CXNS, CWS, HARBOR, NanoLab and MPI-SD]
sources of data
• 3 active accelerators on-site (all photon science) – PETRA III, FLASH and EuXFEL
• currently 30 active experimental areas (called beamlines) - operated in parallel
• more in preparation
• PETRA IV (future) – expect 10⁴–10⁵ × more data
• majority of generated data is analyzed within a few months (cooling)
• have two independent copies asap (raw & calibration data)
DESY datacenter - resources interacting with ARCHIVER
data processing resources before archiving
• HPC cluster – 400 nodes, 30,000 cores, large InfiniBand fabric
• GPFS – 30 building blocks, >30PB, all InfiniBand connected
• BeeGFS - 3PB, InfiniBand connected
• LHC computing - Analysis Facility + Tier-2, 1000 nodes, 30,000 cores
• 50-60% more resources outside the datacenter (mostly at experimental stations)
current archiving capabilities
• dCache - 5 large instances, >50PB capacity, >120 building blocks, Tape gateway
• Tape – 2 x SL8500 (15,000 slots), 25 x LTO8 + 8 x LTO6 drives, >80PB capacity
data life cycle as of today - from the cradle to the grave
• new archive service connected to ‘Core-FS’ and/or behind dCache, to fit seamlessly into the existing workflow
• this scenario will most likely use the fully automated (API/CLI) archive system interface
site manager & administrative workflows
integration, setup and control - workflow-derived requirements
• fully networked service allowing vertical and horizontal scaling (obvious)
• wide range of authentication methods usable (besides local site ones) – X.509, OpenID, eduGAIN, … - more is better
• used to ‘authenticate’ and usable in ‘ACL’-like authorization settings (the identity or DN)
• role-based service selection (archive profiles: user → set of roles → set of archive profiles) - see the sketch after this list
• delegation model for administration - site admin + group admins (with site-admin-defined limits/pre-selections)
• ‘archive profiles’ (dependent on site data policy and community contracts) define major parameters and limits, i.e. QoS definitions
• wide-area access
• HTTP*-based - allows platform-independent tools and standard firewall configs (i.e. WebDAV, …)
• mobile devices (tablet, phone, …) (tools + protocols) not excluded
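A minimal sketch (in Python) of the role-based profile selection above; the role table, profile names and function are illustrative assumptions, not part of any existing ARCHIVER interface:

```python
# Hypothetical mapping: role name -> set of archive profile names it grants.
role_table = {
    "site-admin": {"any"},
    "beamline-manager": {"beamline-raw", "beamline-calib"},
    "scientist": {"personal-results"},
}

def profiles_for_user(user_roles: set[str]) -> set[str]:
    """Union of archive profiles reachable through all of a user's roles."""
    selected: set[str] = set()
    for role in user_roles:
        selected |= role_table.get(role, set())
    return selected

# user -> set of roles -> set of archive profiles
print(profiles_for_user({"scientist", "beamline-manager"}))
# {'personal-results', 'beamline-raw', 'beamline-calib'}
```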
end user workflows - I
individual scientist – managing self-generated and self-managed private scientific data
• individual scientist archiving important work (i.e. publication, partial analysis results, …) – DOI required
• key metrics
• Single archive size: average 10-100 GB
• Files in archive: average 10,000
• Total archive size per user: 5 TB
• Duration: 5-10 years
• Ingest rates: 10-100 MB/s (more is better)
• Encryption: not required, nice to have
• browser based interaction (authentication, data transfers, metadata query/ingest)
• cli tools usable for data ingest
• metadata query
• starting from a single string input (like a Google search) - interactive/immediate selection response (see the sketch after this list)
• other methods (i.e. referencing/finding through experiment managing services) used in addition
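A toy sketch of the single-string metadata query: one free-text term is matched against all metadata fields and hits are returned immediately for interactive narrowing. Records and field names are invented for illustration:

```python
# Invented example records; a real index would live in the metadata store.
archive_index = [
    {"doi": "10.5281/zenodo.0000001", "title": "SAXS analysis run 42",
     "owner": "jdoe", "keywords": "publication saxs"},
    {"doi": "10.5281/zenodo.0000002", "title": "FLASH beamtime summary",
     "owner": "jdoe", "keywords": "partial-results"},
]

def search(term: str) -> list[dict]:
    """Google-style match: a hit if the term occurs in any metadata field."""
    term = term.lower()
    return [rec for rec in archive_index
            if any(term in str(v).lower() for v in rec.values())]

print(search("saxs"))   # immediate selection response for interactive use
```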
end user workflows - II
beamline manager – mix of automated and experiment-specific/manual archive interaction
• beamline (experimental station)-specific + experiment-specific, medium size and rate
• key size parameters
• Single archive size: average 5 TB
• Files in archive: average 150,000
• Total archive size per beamline: 400 TB, doubles every year
• Duration: 10 years
• Ingest rates: 1-2 GB/s
• Encryption: not required
• 3rd-party copy - ‘gather’ all data from various primary storage systems - controlled from a single point
• local (to site) data transport should be RDMA-based and operate (efficiently) on networks faster than 10 Gb/s
• data encryption in transit not required
• API + CLI for seamless automation - i.e. API manifested as a REST API (see the sketch after this list)
• CLI on Linux, API should support all platforms (incl. Windows ;-)
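A hedged sketch of what the REST-manifested API could look like for automated ingest from a beamline; endpoint paths, JSON fields and the token handling are assumptions, not a published interface:

```python
import os
import requests

BASE = "https://archiver.example.org/api/v1"   # hypothetical endpoint
TOKEN = "…"                                    # e.g. an OpenID bearer token

def create_archive(profile: str, metadata: dict) -> str:
    """Open a new archive object under a given archive profile."""
    r = requests.post(f"{BASE}/archives",
                      json={"profile": profile, "metadata": metadata},
                      headers={"Authorization": f"Bearer {TOKEN}"})
    r.raise_for_status()
    return r.json()["archive_id"]

def ingest_file(archive_id: str, path: str) -> None:
    """Stream one file into the (still open) archive object."""
    name = os.path.basename(path)
    with open(path, "rb") as f:
        r = requests.put(f"{BASE}/archives/{archive_id}/files/{name}",
                         data=f,   # streamed upload, no random access needed
                         headers={"Authorization": f"Bearer {TOKEN}"})
    r.raise_for_status()
```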
end user workflows - III
integrated data archiving for large standardized beamline/facility experiments
• large collaboration or site managing and controlling archive operations on behalf of (all experiments) - all automated and large scale
• all inherited from previous workflow - except the manual part - all interaction automated
• key size parameters
• Single archive size: average 400 TB
• Files in archive: average 25,000
• Total archive size per beamline: 10s of PB, doubles every year
• Duration: 10 years
• Ingest rates: 10-100 GB/s, for periods of 20-50 min
• Encryption: not required
• bulk recall - planned re-analyses require bulk restore operations at decent rates (50% of the ingest rate) to feed the compute engine (see the sketch after this list)
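A back-of-the-envelope check of the numbers above, as runnable Python; the rate and size figures come straight from this slide:

```python
GB, TB = 10**9, 10**12

# peak ingest: 100 GB/s sustained for a 50 min burst
burst_bytes = 100 * GB * 50 * 60
print(burst_bytes / TB, "TB per burst")   # ~300 TB, consistent with the
                                          # ~400 TB average single archive

# bulk recall at 50% of the ingest rate to feed the compute engine
recall_rate = 0.5 * 100 * GB              # 50 GB/s
print(400 * TB / recall_rate / 3600, "h to restage one 400 TB archive")
```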
left over…
other thoughts, requirements and options
• life cycle of archive objects (not bound to a single access session): create, fill with (meta)data, close (data becomes immutable), query - see the sketch at the end of this list
• archive objects could be related to existing ones - i.e. containing new versions of derived data
• archive service should generate and handle DOIs (Zenodo) for durable external references
• all data access should be ‘stream’ based
• no random access (within a file) is required
• recalls of pre-selected files out of single archive object
• asynchronous notifications on (selectable) conditions (events); supports interaction (external state) with DBs external to the archive system
• i.e. archive object is saved, verified (as condition)
• deployment scenarios
• main services and esp. metadata store/query
• local on site
• cloud (using remote service and storage/handling hardware)
• bit stream preservation layer
• local only
• tiered - local and remote (i.e. remote tape) - remote could be ‘cooperating lab’, public cloud, …
• (streaming) protocol to transfer data between tiers should adhere to the ‘wide-area access’ considerations above (standards-based)
• Billing
• any ‘non-local’ deployment requires billing services and methods (obvious), separated into service and storage costs (at least)
• external storage resource - long term predictable costs/contracts preferred (less ‘pay as you go’)
• per-user and per-group billing (a user may be a member of several groups, and groups might be nested) - see the billing sketch at the end of this list
• encryption - ‘nice to have’ in all cases; expect issues with local ‘key management’ services
• pre-/post-encryption and decryption of data in motion and/or at rest is a valid alternative
• (Meta)data formats
• no special (known to the archive service) data formats required, thus no format conversions (without user interaction) required
• metadata needs to be ‘exportable’ to new/updated instances
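A compact sketch of the archive-object life cycle and event notifications listed above (create → fill → close → immutable, with ‘saved’/‘verified’ events); class and method names are illustrative only:

```python
class ArchiveObject:
    """Toy model: mutable while open, immutable after close()."""
    def __init__(self, profile: str, parent=None):
        self.profile = profile
        self.parent = parent   # e.g. the object a new version derives from
        self.files, self.metadata = {}, {}
        self.closed = False

    def add(self, name: str, data: bytes, **meta) -> None:
        assert not self.closed, "archive object is immutable after close()"
        self.files[name] = data
        self.metadata[name] = meta

    def close(self, on_event=None) -> None:
        """Freeze the object, then fire asynchronous-style notifications."""
        self.closed = True
        if on_event:
            on_event("saved", self)
            on_event("verified", self)   # e.g. after checksum verification

obj = ArchiveObject(profile="beamline-raw")
obj.add("run42.h5", b"...", detector="example")
obj.close(on_event=lambda ev, o: print(ev, "-", len(o.files), "file(s)"))
```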
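And a minimal sketch of per-user/per-group cost attribution with nested groups, as required under ‘Billing’; the group tree and usage numbers are invented:

```python
group_parent = {"beamline-p11": "petra3", "petra3": "desy"}  # child -> parent

def billing_chain(group: str) -> list[str]:
    """All groups (innermost first) that see a user's charge aggregated."""
    chain = [group]
    while group in group_parent:
        group = group_parent[group]
        chain.append(group)
    return chain

usage_tb = {"alice": 120, "bob": 40}                          # per-user usage
user_group = {"alice": "beamline-p11", "bob": "beamline-p11"}

totals: dict[str, int] = {}
for user, tb in usage_tb.items():
    for g in billing_chain(user_group[user]):
        totals[g] = totals.get(g, 0) + tb
print(totals)   # {'beamline-p11': 160, 'petra3': 160, 'desy': 160}
```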