Quality and capacity expansion of thematic services in EOSC-SYNERGY
1. www.eosc-synergy.eu
26/03/2021
Ignacio Blanquer (UPV), Alberto Azevedo (LNEC), Thiago Emmanuel Pereira
(UFCG), Manuel Pavesio-Blanco (INDRA), Salvador Capella-Gutierrez (BSC),
Laura del Cano (CSIC-CNB), Rubio-Montero Antonio Juan (CIEMAT), Jan Astalos
(IISAS), Tobias Kerzenmacher (KIT).
2. www.eosc-synergy.eu
EOSC-SYNERGY in a nutshell
22 partners in 10 countries
(ES, PT, FR, UK, DE, NL, CZ, SK, PL and BR)
Promote EOSC High Quality
Services
Software quality as a service, FAIRness evaluation
and quality certification badges
Thematic Services Integration
10 thematic services addressing 4 scientific
areas (Earth Observation, Environment,
Biomedicine and Astrophysics)
Alignment at the Policy Level
Collaboration with regional projects on
landscaping activities, gap analysis and
contribution to EOSC policies
Capacity Expansion at the
Infrastructure level
Integration of services and resources from
the RIs of the consortium partners
Skills development
Environment for tutorials with a dedicated
MOOC platform, course methodology and a
Hackathon-as-a-service platform
3. www.eosc-synergy.eu 3
The Thematic Services Virtuous Cycle
Increase the capacity,
performance,
reliability and/or
functionality
By means of best practices for
adopting common EOSC core
tools and services.
Increase service
quality
FAIR data practices and
software quality assessment.
Increase relevance
of National
Thematic Services
By expanding the use of the
mature national services in
an international scope.
Increase the
number of users
By means of the integration
in EOSC and the training.
[Cycle diagram: steps 01 to 04; scientific areas: Biomedicine, Astrophysics, Earth Observation, Environment]
4. www.eosc-synergy.eu 4
0. Thematic Services
Earth Observation
- WORSICA - Water Monitoring Sentinel Cloud Platform
- SAPS - Surface Energy Balance Automated Processing Service
- GCORE - Acquire, catalogue & process EOS data
Environment
- UMSA - Untargeted Mass-Spectrometry Analysis
- MSWSS - Water Supply Systems modeling and analysis
- O3AS - Ozone Analysis Service
- SDSWAS - A service related to the mineral dust forecast
Biomedicine
- SCIPION - CryoEM data processing for Structural Biology
- OpenEBench - ELIXIR benchmarking and technical monitoring platform
Astrophysics
- LAGO - Latin American Giant cosmic ray Observatory
5. www.eosc-synergy.eu
1. Increase the capacity, performance, reliability and/or
functionality: Improvements due to EOSC-Synergy
- Integration of standardized AAI IdPs to facilitate user management.
- Improvement of processing backends by replacing single computing
instances with batch job queues, container management
platforms or clients to high-throughput computing backends.
- Publishing the output results in persistent repositories.
- Improving repeatability and platform-agnosticism by describing
the application topologies as code using standard TOSCA language.
- PID annotation of output data and integration in official harvesters.
- Self-management of resources to reduce maintenance costs.
6. www.eosc-synergy.eu
1. Increase the capacity, performance, reliability and/or
functionality: Adoption of EOSC Services
Service | AAI | Workload Mng. | Resource Mng. | Data Storage
WORSICA | EGI Check-in | ArcCE, Batch (SLURM) | IM (TOSCA) | Nextcloud, Dataverse
G-Core | Kerberos, LDAP & CAS, User/pwd | GCore + K8s | IM / EC3 | ElasticSearch for the catalogue
SAPS | EGI Check-in | K8s | IM / EC3 | OpenStack Swift
Scipion | EGI Check-in | Batch (SLURM) | IM / EC3 | Local + S3
LAGO | eduTEAMS + EGI Check-in | Batch (SLURM) | Local clusters & IM+EC3 | EGI DataHub, ONEDATA
SDS-WAS | B2ACCESS | Batch (SLURM) | EC3 | B2HANDLE / B2SAFE
UMSA | EGI Check-in & Life Science AAI | Batch (SLURM) in IM/EC3 (in Galaxy) | IM / EC3 | Local + S3
MSWSS | EGI Check-in | Batch (SLURM) in EC3 (in Galaxy) | IM / EC3 | Local + ONEDATA
O3AS | EGI Check-in | Cluster batch (SLURM) & K8s | IM | WebDAV
OpenEBench | Life Sciences AAI | GA4GH WES/TES stack + NextFlow | one | Local + B2SHARE
Legend: services already integrated at MS18 (PM16); services planned by MS19 (PM24); services in EOSC Marketplace.
7. www.eosc-synergy.eu 7
2. Increase service quality: Software Quality
Assessment
- The quality of the TSs increases through three paths:
- Evaluation of software quality of the components
adopted.
- Evaluation of the software quality of the adaptation
performed in the TSs in the frame of EOSC-SYNERGY.
- Evaluation of the service quality by the adoption of
monitoring and CI/CD pipelines.
- Currently:
- 5 TS have adopted the JePL for software quality
(WORSICA, SAPS, LAGO, OpenEBench, O3AS).
- Already involves 14 TS repositories.
- Best practices created for Python and Java
(https://u.i3m.upv.es/s71hd).
8. www.eosc-synergy.eu
2. Increase service quality: FAIR Data
evaluation
- Services consume and produce Data.
- In some cases the output Data is the major
outcome.
- Data produced will be evaluated according
to the FAIR principles.
- Checking compliance with the individual
DMPs (http://hdl.handle.net/10261/219309).
- Using automatic evaluation tools (T3.3).
- Registering them on Scientific Community
Repositories (such as EMPIAR).
9. www.eosc-synergy.eu 9
3. Increase the number of users: Integration
in EOSC & Training
- Services will be registered in the EOSC
Marketplace Portal
- Once MS19 is reached.
- Demonstration videos available
for explaining the improvements
to the Thematic Services and the
new usage models.
https://learn.eosc-synergy.eu/
10. www.eosc-synergy.eu 10
4. Increase relevance: Measuring success -
Metrics and KPIs
05 Cross Fertilization
● Number of code transfers.
● Number of joint dissemination actions.
● Number of synergies among thematic services not reflected above.
04 Scientific Impact
● Number of publications acknowledging the service.
● Number of communications (talks, panels, posters, etc.).
● Number of individual training hours on the service.
03 Usability
● Performance, Scalability.
● Learning curve, Error management, Robustness.
● Completion, Interoperability, Convenience.
02 Service Usage
● Number of service accesses in a given time / accumulated.
● CPU hours / RAM size in a given time / accumulated.
● Max. capacity / capability experimented (vCPUs & memory).
● Max. throughput in service accesses.
01 User Community
● Number of direct/indirect users in a given period / accumulated.
● Number of centres/countries.
● Number of recurrent users.
Sources: Accounting & Deployment Services (https://bit.ly/2LGpiFp), VOs (operations-portal.egi.eu/vo/), Questionnaires, Publication archives
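As an illustration of how the User Community and Service Usage counts on this slide could be derived, the following minimal sketch computes them from an access log. The log contents, the function name `usage_kpis` and the definition of a "recurrent" user (more than one access so far) are assumptions made for the example and do not come from the project.

```python
from collections import Counter
from datetime import date

# Hypothetical access log: (user_id, access_date) pairs.
ACCESS_LOG = [
    ("alice", date(2021, 1, 10)),
    ("bob", date(2021, 1, 12)),
    ("alice", date(2021, 2, 3)),
    ("carol", date(2021, 2, 20)),
    ("alice", date(2021, 2, 25)),
]


def usage_kpis(log, start, end):
    """Return per-period and accumulated counts for the window [start, end]."""
    in_period = [(user, day) for user, day in log if start <= day <= end]
    accumulated = [(user, day) for user, day in log if day <= end]
    visits = Counter(user for user, _ in accumulated)
    return {
        "accesses_in_period": len(in_period),
        "accesses_accumulated": len(accumulated),
        "users_in_period": len({user for user, _ in in_period}),
        "users_accumulated": len(visits),
        # Assumption: a recurrent user has accessed the service more than once.
        "recurrent_users": sum(1 for count in visits.values() if count > 1),
    }


kpis = usage_kpis(ACCESS_LOG, date(2021, 2, 1), date(2021, 2, 28))
print(kpis)
```

In practice the input would come from the accounting and VO sources named above rather than from an in-memory list.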
11. www.eosc-synergy.eu 11
4. Increase relevance: Measuring success - Metrics and
KPIs: Linkage to e-Infrastructures
Over 1 Million CPU core hours in the last year.
https://bit.ly/2LGpiFp
90 registered users in VOs
Sustained increase of deployments
12. www.eosc-synergy.eu 12
Conclusions
- EOSC-SYNERGY aims to build capacities in EOSC through the
development of ten data-intensive thematic services oriented to different
scientific disciplines.
- The adaptation, improvement and quality assessment of those services on a
Federated Data Infrastructure strongly aligns with the objectives of EOSC
(*) and will produce best practices and experiences.
- A key factor for the success of EOSC (**) is performance: how EOSC operates
as an ecosystem and how the resources are used and acknowledged by the users.
- All the services consume services from the EOSC catalogue, which will provide
feedback on the usability and relevance of the model.
(*) Draft EOSC partnership proposal: “..It aims to accelerate the deployment and consolidation of an open, trusted, virtual, federated environment in Europe to store,
share and re-use research data across borders and scientific disciplines and provide access to rich array of related services..”.
(**) Solutions for a Sustainable EOSC: An Iron Lady report from the EOSC Sustainability Working Group, Draft 16 September 2020.
13. www.eosc-synergy.eu
Questions & Contact
Ignacio Blanquer Espert
iblanque@dsic.upv.es
Instituto de Instrumentación Para la
Imagen Molecular
Universitat Politècnica de València
www.eosc-synergy.eu
@EOSC_synergy
Editor's Notes
Authentication and Authorization Infrastructure (AAI). All cases require users to be authenticated and authorised. In some cases, users who access the platform need to delegate their credentials so that it can access data or processing resources on their behalf. In those cases, a coherent single sign-on mechanism is mandatory. Other cases may require an AAI linked to popular scientific IdPs and implement the authentication via Virtual Organization membership.
Workload Management. Most of the cases involve the execution of a set of batch jobs, so workload managers should be integrated. This makes it possible to handle a larger capacity. Options range from a standard batch queue (SLURM), optionally enhanced with automatic elasticity, to Kubernetes for the orchestration of containers.
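To make the batch-queue option concrete, here is a minimal sketch that generates a SLURM job-array script, so that a set of independent jobs goes through the queue instead of running on a single instance. The helper name `slurm_array_script`, the job name and the `./process_tile.sh` command are hypothetical; the `#SBATCH` options and the `SLURM_ARRAY_TASK_ID` variable are standard SLURM features.

```python
def slurm_array_script(job_name, command, n_tasks, cpus_per_task=1,
                       time_limit="01:00:00"):
    """Build an sbatch script that runs `command` once per array index."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # One array element per independent unit of work.
        f"#SBATCH --array=0-{n_tasks - 1}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --time={time_limit}",
        # %x = job name, %A = array job ID, %a = array task index.
        "#SBATCH --output=%x_%A_%a.out",
        "",
        # Each task receives its own index as the first argument.
        f'{command} "$SLURM_ARRAY_TASK_ID"',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage: four independent processing tasks.
script = slurm_array_script("tile-processing", "./process_tile.sh", n_tasks=4)
print(script)
```

The resulting text would be saved to a file and submitted with `sbatch`; how each thematic service actually splits its work is not described in the slides.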
Resource Management. Most of the thematic services require deploying a virtual infrastructure to host the services that provide the functionality and the processing. In most cases, Infrastructure Manager (IM) or Elastic Compute Clusters in the Cloud (EC3) can provide the capability of defining a virtual infrastructure as code and deploying it on the cloud.
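The idea of a virtual infrastructure defined as code can be sketched as follows: a small cluster topology is built as plain data using the vocabulary of the TOSCA Simple Profile (`tosca.nodes.Compute` with `num_cpus` and `mem_size` host properties). The helper names, node names and sizes are assumptions for illustration, and the result is deliberately incomplete: a template that IM or EC3 would actually deploy needs further details (images, networks, software configuration) that the slides do not give.

```python
import json


def compute_node(num_cpus, mem_size):
    """Describe one virtual machine as a TOSCA Compute node template."""
    return {
        "type": "tosca.nodes.Compute",
        "capabilities": {
            "host": {"properties": {"num_cpus": num_cpus, "mem_size": mem_size}}
        },
    }


def cluster_topology(front_end, workers):
    """Assemble a front-end node and worker nodes into one topology."""
    nodes = {"front_end": front_end}
    for index, worker in enumerate(workers):
        nodes[f"worker_{index}"] = worker
    return {
        "tosca_definitions_version": "tosca_simple_yaml_1_0",
        "topology_template": {"node_templates": nodes},
    }


# Hypothetical sizing: one front end and two identical workers.
topology = cluster_topology(
    front_end=compute_node(2, "4 GB"),
    workers=[compute_node(4, "8 GB") for _ in range(2)],
)

# TOSCA documents are normally written in YAML; JSON is used here only to
# avoid a third-party dependency in the example.
print(json.dumps(topology, indent=2))
```

Because the topology is ordinary data, it can be versioned, reviewed and redeployed on a different cloud, which is the repeatability benefit the slides attribute to describing topologies as code.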
Data Storage. The services need storage that is connected to the processing and can be accessed efficiently. There is a wide range of solutions, from EGI DataHub and B2SHARE to local solutions based on Nextcloud, Dataverse, Elasticsearch and WebDAV.
Two usage modes:
- Users deploy their own service on Cloud resources (e.g. SAPS, SCIPION).
- A single instance of the service serves multiple users (e.g. WORSICA, OpenEBench).
AAI: EGI Check-in or B2ACCESS, mainly for interacting with the infrastructure.
- Few cases in which users will use federated credentials to access the services, mainly related to storage.
Resource Management
- IM or EC3 with recipes for K8s, SLURM, Galaxy or dedicated clusters.
Job Scheduling
- Mostly container-oriented (e.g. gCore, SAPS...).
- Jobs based on SLURM or Galaxy.
Storage
- EGI DataHub, Dataverse, B2SAFE, B2SHARE