www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
B2SAFE
How to replicate your data using EUDAT’s B2SAFE
Version 3
November 2015
This work is licensed under the Creative
Commons CC-BY 4.0 licence.
Attribution: EUDAT – www.eudat.eu
Replicate Research Data Safely
eudat.eu/b2safe
B2SAFE
B2SAFE is a robust, safe and highly available service which allows
community and departmental repositories to implement data
management policies on research data across multiple administrative
domains in a trustworthy manner.
The ideal solution for communities with no archival facility that want to:
replicate research data into secure data stores
archive and preserve research data in the long term
bring data close to powerful compute resources
co-locate data with different communities
benefit from economies of scale
Features:
large-scale storage
robust and highly available
permanent PIDs
Where is B2SAFE in the EUDAT suite?
B2SAFE: Replicate Research Data Safely
Better safe than sorry…
In today’s rich data-storage ecosystems, large data centres must offer a robust, safe and highly available replication service to allow community and departmental repositories to replicate their research data:
to guard against data loss in long-term archiving and preservation,
to optimize access for users from different regions, and
to bring data closer to powerful computers for compute-intensive analysis.
“I want to replicate my collection X to two data centres and store the collection safely for 10 years”.
B2SAFE Features (1/2)
Based on the execution of auditable data policy rules and the
use of persistent identifiers (PIDs).
Respects the rights of the data owners to define the access
rights for their data and to decide how and when they are
made publicly referenceable.
Employs Data Policy Manager to allow centrally managed,
community-defined data policies.
B2SAFE Features (2/2)
Uses site rule-engines to implement and enforce policy
rules.
Aggregates data from different disciplines into a
storage system of trustworthy and capable data service
providers.
Supports repository packages (e.g. DSPACE, FEDORA)
and a lightweight HTTP-based solution.
Who can benefit?
Small and medium-sized repositories
  lacking the capacity to store data over longer periods of time
  without long-term funding for the preservation of their data
  without adequate compute capacity for data-intensive computational services
Data producers and data consumers
  who need to be sure that trusted centres are taking care of their data
  who want to access added-value services on data sources of interest to them
  who wish to perform interdisciplinary research on top of data from the heterogeneous EUDAT communities
What makes B2SAFE unique
Data are stored in the EUDAT Collaborative Data
Infrastructure (CDI) with known policies. Therefore, data
are stored in transparent infrastructures across Europe.
Communities can benefit from the professionally
managed EUDAT infrastructure and concentrate their
effort and budget on their core research.
EUDAT is building a suite of additional services relevant
for the “engine under the hood” of e-science
infrastructures (e.g. EPOS, EMSO, CLARIN).
Data are stored next to HTC & HPC servers ideal for
compute-intensive data processing.
How can you use B2SAFE?
Any community or departmental data repository can
use B2SAFE. EUDAT experts can help set up the following
required technologies:
Persistent Identifiers (PIDs).
Metadata describing the properties and context of
the data being replicated.
iRODS (recommended) or similar data management
technology for federation.
To help these groups use the B2SAFE service, EUDAT
offers documentation, training material and a service
helpdesk.
For more information please email: eudat-safereplication@postit.csc.fi
Safe Replication with B2SAFE
[Diagram: the EUDAT CDI domain of registered data, showing three data centre stores, PIDs and the EPIC service.]
What happens?
Data from the community repository is replicated in other data centres distributed across Europe.
What happens step by step?
[Diagram labels: community repository; community centre (iRODS); data centre stores 1 and 2 (iRODS, PID). Steps shown: assignment of a unique identifier (PID) to the Digital Object (DO), using the community's own PID system or iRODS rules; data ingestion; data replication based on community policy; PID assignment.]
Original DO and replicas
ROR: Repository of Records, the repository where the data was first stored.
PPID: Parent PID, the persistent identifier associated with the source object in a replication chain. If the chain has only two elements, the master copy and the first replica, then PPID = ROR.
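The ROR and PPID definitions can be illustrated with a minimal Python sketch. It assumes, as the equality PPID = ROR suggests, that the ROR field of a replica holds the PID of the master copy. The `ReplicaRecord` class, the `build_chain` function and the PID strings are hypothetical names used only for illustration and are not part of B2SAFE.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaRecord:
    """Hypothetical record kept for one object in a replication chain."""
    pid: str
    ror: Optional[str]   # assumed: PID of the master copy in the repository of records
    ppid: Optional[str]  # PID of the object this one was replicated from


def build_chain(pids: List[str]) -> List[ReplicaRecord]:
    """pids[0] is the master copy; every later PID is a replica of the one before it."""
    if not pids:
        return []
    master = pids[0]
    # The slides do not say what the master's own record holds, so it is left empty here.
    records = [ReplicaRecord(pid=master, ror=None, ppid=None)]
    for parent, child in zip(pids, pids[1:]):
        records.append(ReplicaRecord(pid=child, ror=master, ppid=parent))
    return records


if __name__ == "__main__":
    for record in build_chain(["pid-master", "pid-replica-1", "pid-replica-2"]):
        print(record)
```

In this sketch the first replica has PPID equal to ROR, while the second replica keeps the same ROR but points to the first replica as its parent.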
EUDAT partners are already using B2SAFE
[Diagram: community centres (CLARIN, ENES, VPH, Lifewatch, EPOS) and EUDAT centres (CINECA, BSC, EPCC). Example request: “Replicate my collection X to three data centres”.]
EPOS
EUDAT and the EPOS community set up a collaboration to
provide safe back-up and service redundancy to the Italian
seismology community. The automated data transfer between
the EPOS community and EUDAT was set up as follows:
EPOS joined the EUDAT CDI.
EUDAT defined a specific policy with EPOS.
The iRODS irsync tool was chosen to achieve the best
performance.
To achieve an hourly synchronization, the checksum
sync and file-age limit options are used.
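As a rough sketch of what one such synchronization pass might look like, the Python function below only assembles an irsync argument list and does not run it. The slides do not give the exact command line, so the mapping of "checksum sync" to `-K` and of "file-age limit" to `--age` is an assumption, and the paths and zone name are hypothetical.

```python
from typing import List


def build_irsync_command(source_dir: str,
                         target_collection: str,
                         max_age_minutes: int = 60,
                         verify_checksum: bool = True) -> List[str]:
    """Assemble an irsync argument list for one synchronization pass."""
    if max_age_minutes <= 0:
        raise ValueError("max_age_minutes must be positive")
    command = ["irsync", "-r"]              # -r: recurse into sub-directories
    if verify_checksum:
        command.append("-K")                # -K: calculate and verify checksums
    # --age: skip source files older than this many minutes
    command += ["--age", str(max_age_minutes)]
    # The "i:" prefix marks the target as an iRODS collection
    command += [source_dir, "i:" + target_collection]
    return command


if __name__ == "__main__":
    # Hypothetical local directory and iRODS collection
    print(" ".join(build_irsync_command("/data/seismic",
                                        "/exampleZone/home/epos/seismic")))
```

A scheduler such as cron could invoke a command like this once per hour; running several such commands on disjoint sub-directories would correspond to the "multiple irsync processes" mentioned for the INGV transfer.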
How to replicate the INGV data to B2SAFE - The process
Transfer: iRODS irsync tool, running multiple irsync processes.
The data archive, so far, amounts to 28.6 TB (7,500,000 files).
Each digital object ingested by CINECA has been registered and assigned a Persistent Identifier (PID).
The PIDs are registered in the PID registry, which is hosted at SURFsara and based on the EPIC service.
[Diagram labels: PID Registry; EUDAT CDI – CINECA node]
Experimental features
The current B2SAFE implementation supports only a
simple messaging model, the synchronous one. Messaging is
an experimental feature that provides the results of an
asynchronous (server-side triggered) replication process. The
messages are posted to a queue which can be accessed via
an HTTP interface.
Users who ingest data into B2SAFE via GridFTP are not
able to retrieve the PID of the object. Metadata management
is an experimental feature that supports this
functionality. When enabled, it provides a set of metadata
properties for each data object, storing them in a JSON file
placed in (nearly) the same path as the related data
object.
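The idea of a JSON metadata file stored next to each data object can be sketched as follows. This is only an illustration under stated assumptions: the slides do not specify the file name, its exact location or the property names, so the `.metadata.json` suffix, the `"PID"` key and the example identifier are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(data_object: Path) -> Path:
    """Hypothetical convention: same directory, '.metadata.json' appended to the name."""
    return data_object.with_name(data_object.name + ".metadata.json")


def write_sidecar(data_object: Path, properties: dict) -> Path:
    """Store the metadata properties of a data object in a JSON file beside it."""
    target = sidecar_path(data_object)
    target.write_text(json.dumps(properties, indent=2, sort_keys=True),
                      encoding="utf-8")
    return target


def read_pid(data_object: Path) -> str:
    """Look up the PID of a data object from its sidecar file."""
    properties = json.loads(sidecar_path(data_object).read_text(encoding="utf-8"))
    return properties["PID"]  # "PID" is an assumed property name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        obj = Path(tmp) / "waveform.dat"
        obj.write_bytes(b"example payload")
        write_sidecar(obj, {"PID": "00000/example-object"})
        print(read_pid(obj))
```

A client that can list and download files over GridFTP could then fetch the sidecar file alongside the data object and read the PID from it.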
B2SAFE Summary
B2SAFE offers:
functionality to replicate datasets across different
data centres in a safe and efficient way
a long-term solution for archiving and preserving
research data
an entry point to bring data closer to powerful
computers for compute-intensive analysis
Future features
Easy setup. B2SAFE provides a script to build rpm and deb
packages. The plan is to provide downloadable, easy-to-install
packages (i.e. click-install-run).
New extensions - connectors. For now, it is possible to ingest
into B2SAFE data stored on a file system or in a DSPACE
repository. New connectors for FEDORA and ePRINTS are
planned.
Improve the service with “dynamic data” (streaming data)
capabilities.
Further integration with B2ACCESS.
Support authorization on the basis of community access rules.
Hands-on material
Material on B2SAFE hands-on (part 6)
Based on iRODS
Hands-on tutorial which shows how to:
Manage data across iRODS zones by policies
Employ PIDs to track data in a distributed storage environment
https://github.com/EUDAT-Training/B2SAFE-B2STAGE-Training
Training module which provides hands-on material for:
EUDAT B2SAFE
iRODS4
B2HANDLE
and the EUDAT B2STAGE service.
Thanks
For more info: https://www.eudat.eu/services/b2safe
Authors and contributors: Themis Zamani (GRNET), Claudio Cacciari (Cineca)
Thank you

Mais conteúdo relacionado

Mais de EUDAT

EUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT
 
EUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT
 
EUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT
 
EUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT
 
EUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT
 
EUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT
 
Rob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesRob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesEUDAT
 
Ariyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationAriyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationEUDAT
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its servicesEUDAT
 
Using B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotUsing B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotEUDAT
 
OpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekOpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekEUDAT
 
European Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEuropean Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEUDAT
 
Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...EUDAT
 
FAIRness of training materials
FAIRness of training materialsFAIRness of training materials
FAIRness of training materialsEUDAT
 
Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...EUDAT
 
Draft Governance Framework for the EOSC
Draft Governance Framework for the EOSCDraft Governance Framework for the EOSC
Draft Governance Framework for the EOSCEUDAT
 
Building Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersBuilding Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersEUDAT
 
ENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science ThemeENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science ThemeEUDAT
 
Data for Science Service Portfolio
Data for Science Service PortfolioData for Science Service Portfolio
Data for Science Service PortfolioEUDAT
 
The ENVRI user landscape
The ENVRI user landscapeThe ENVRI user landscape
The ENVRI user landscapeEUDAT
 

Mais de EUDAT (20)

EUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdf
 
EUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdf
 
EUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdf
 
EUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdf
 
EUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdf
 
EUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdf
 
Rob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesRob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT services
 
Ariyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationAriyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentation
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its services
 
Using B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotUsing B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto Pilot
 
OpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekOpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last week
 
European Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEuropean Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshop
 
Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...
 
FAIRness of training materials
FAIRness of training materialsFAIRness of training materials
FAIRness of training materials
 
Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...
 
Draft Governance Framework for the EOSC
Draft Governance Framework for the EOSCDraft Governance Framework for the EOSC
Draft Governance Framework for the EOSC
 
Building Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersBuilding Interoperable AAI for Researchers
Building Interoperable AAI for Researchers
 
ENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science ThemeENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science Theme
 
Data for Science Service Portfolio
Data for Science Service PortfolioData for Science Service Portfolio
Data for Science Service Portfolio
 
The ENVRI user landscape
The ENVRI user landscapeThe ENVRI user landscape
The ENVRI user landscape
 

Último

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Último (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

B2 safe how to replicate your data| www.eudat.eu |

  • 1. www.eudat.euEUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 B2SAFE How to replicate your data using EUDAT’s B2SAFE Version 3 November 2015 This work is licensed under the Creative Commons CC-BY 4.0 licence. Attribution: EUDAT – www.eudat.eu
  • 2. Replicate Research Data Safely eudat.eu/b2safe www.eudat.eu B2SAFE B2SAFE is a robust, safe and highly available service which allows community and departmental repositories to implement data management policies on research data across multiple administrative domains in a trustworthy manner.
  • 3. eudat.eu/b2safe replicate research data into secure data stores archive and preserve research data in the long-term bring data close to powerful compute resources co-locate data with different communities benefit from economies of scale The ideal solution for communities with no facility for archival to: Features: large-scale storage robust and highly available permanent PIDs
  • 4. eudat.eu/b2safe Where is B2SAFE in the EUDAT suite? B2SAFE Replicate Research Data Safely
  • 5. eudat.eu/b2safe Better safe than sorry…. to guard against data loss in long-term archiving and preservation, to optimize access for users from different regions, and to bring data closer to powerful computers for compute-intensive analysis. In today’s rich data-storage ecosystems, large data centres must offer a robust, safe and highly available replication service to allow community and departmental repositories to replicate their research data: “I want to replicate my collection X to two data centres and store the collection safely for 10 years”.
  • 6. eudat.eu/b2safe B2SAFE Features (1/2) Based on the execution of auditable data policy rules and the use of persistent identifiers (PIDs). Respects the rights of the data owners to define the access rights for their data and to decide how and when they are made publicly referenceable. Employs Data Policy Manager to allow centrally managed, community-defined data policies. B2SAFE Training
  • 7. eudat.eu/b2safe B2SAFE Features (2/2) Uses site rule-engines to implement and enforce policy rules. Aggregates data from different disciplines into a storage system of trustworthy and capable data service providers. Supports repository packages (e.g. DSPACE, FEDORA) and a lightweight HTTP-based solution. B2SAFE Training
  • 8. eudat.eu/b2safe Who can benefit? Small and medium-sized repositories lacking the capacity to store data over longer periods of time without long-term funding for the preservation of their data without adequate compute capacity for data-intensive computational services Data producers and data consumers who need to be sure that trusted centres are taking care of their data who want to access added- value services on data sources of interest to them who wish to perform interdisciplinary research on top of data from the heterogeneous EUDAT communities
  • 9. eudat.eu/b2safe What makes B2SAFE unique Data are stored in the EUDAT Collaborative Data Infrastructure (CDI) with known policies. Therefore, data are stored in transparent infrastructures across Europe. Communities can benefit from the professionally managed EUDAT infrastructure and concentrate their effort and budget on their core research. EUDAT is building a suite of additional services relevant for the “engine under the hood” of e-science infrastructures (e.g. EPOS, EMSO, CLARIN). Data are stored next to HTC & HPC servers ideal for compute - intensive data processing.
  • 10. eudat.eu/b2safe How can you use B2SAFE? Any community and departmental data repositories can use B2SAFE. EUDAT experts can help setup the followed requered technologies Persistent Identifiers (PIDs). Metadata describing the properties and context of the data being replicated. iRODS (recommended) or similar data management technology for federation. To help these groups use the B2SAFE service, EUDAT offers documentation, training material and a service helpdesk. For more information please email: eudat-safereplication@postit.csc.fi
  • 11. eudat.eu/b2safe Safe Replication with B2SAFE EUDAT CDI Domain of registered data PIDPID Data Centre Store Data Centre Store Data Centre Store EPIC service
  • 12. eudat.eu/b2safe What happens? Data from the Community repository is replicated in other data centres….. …distributed across Europe.
  • 13. eudat.eu/b2safe What happens step by step? iRods PID Data Center Store 1 Community repository Digital Object (DO) unique identifier (PID) to the DO PID Data ingestion Data replication own PID system OR iRODS rules iRods CommunityCentre iRods PID Data Center Store 2 Based on community policy PID assignment
  • 14. eudat.eu/b2safe ROR: Repository of Records, the repository where data was stored first. PPID: Parent PID, the persistent identifier associated to the source object in a replication chain. If the chain has only two elements, the master copy and the first replica, then the PPID = ROR. Original DO and replicas
  • 15. eudat.eu/b2safe EUDAT partners are already using B2SAFE
  • 16. eudat.eu/b2safe Community centre EUDAT centre CLARIN ENES VPH Lifewatch Replicate my collection X to three data centres CINECA BSC EPCC EPOS
  • 17. eudat.eu/b2safe EPOS EUDAT and EPOS community set up a collaboration to provide safe back-up and service redundancy to the Italian seismologist community. The set up of the automated data transfer between EPOS community and EUDAT is: EPOS joined the EUDAT CDI EUDAT defined a specific policy with EPOS The iRODS irsync protocol was chosen to achieve the best performance. In order to achieve an hourly synchronization, checksum sync and file-age limit options are used.
  • 18. eudat.eu/b2safe How to replicate the INGV data to B2SAFE - The process Each digital object ingested by CINECA has been registere assigning to it a Persistent Identifier (PID) iRODS irsync tool, running multiple irsync processes The data archive, so far, amount to 28,6 TB 7500000 files PID Registry EUDAT CDI – CINECA node The PIDs are registered into the PID registry, which is hosted at SURFsara and based on the EPIC service
  • 19. eudat.eu/b2safe Experimental features The current B2SAFE implementation is able to support only a simple messaging model: the synchronous one. Messaging is an experimental feature that provides the results in case of asynchronous (server side triggered) replication process. The messages are posted to a queue which can be accessed via an HTTP interface. The users who ingest data into B2SAFE via GridFTP are not able to retrieve the pid of the object. Metadata management is an experimental feature, that supports this functionality. When enabled it provides a set of metadata properties for each data object, storing them into a file (json), placed in (nearly) the same path of the related data object.
  • 20. eudat.eu/b2safe B2SAFE Summary B2SAFE offers: functionality to replicate datasets across different data centres in a safe and efficient way long-term solution for archiving and preserving research data an entry point to bring data closer to powerful computers for compute-intensive analysis
  • 21. eudat.eu/b2safe Future features Easy setup. B2SAFE provides a script to build rpm and deb packages. The plan is to provide downloadable, easy-to-install packages (i.e. click-install-run). New extensions - connectors. For now, it is possible to ingest data into B2SAFE that is stored on a file system or in a DSPACE repository. New connectors for FEDORA and ePRINTS are planned. Improve the service with “dynamic data” (streaming data) capabilities. Further integration with B2ACCESS. Support authorization on the basis of community access rules.
  • 22. eudat.eu/b2safe Hands-on material Material on B2SAFE hands-on (part 6) Based on iRODS Hands-on tutorial which shows how to: Manage data across iRODS zones by policies Employ PIDs to track data in a distributed storage environment https://github.com/EUDAT-Training/B2SAFE-B2STAGE-Training Training module which provides hands-on material for: EUDAT B2SAFE, iRODS4, B2HANDLE and the EUDAT B2STAGE service.
  • 23. eudat.eu/b2safe Thanks For more info: https://www.eudat.eu/services/b2safe
  • 24. www.eudat.eu Authors Contributors This work is licensed under the Creative Commons CC-BY 4.0 licence EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 Themis Zamani, GRNET Claudio Cacciari, Cineca Thank you
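
The INGV figures from slide 18 and its editor's note (a theoretical link of 100 Mbit/s, roughly 16 Mbit/s per irsync process, about 15 GB received per day, and 28.6 TB replicated in three months) can be sanity-checked with a little arithmetic. The following is only a back-of-the-envelope sketch: the function names are hypothetical, "3 months" is assumed to mean 90 days, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
import math

# Figures quoted on slide 18 and in its editor's note.
LINK_MBIT_S = 100.0         # theoretical INGV-CINECA bandwidth
PER_PROCESS_MBIT_S = 16.0   # average rate of a single irsync process
DAILY_INGEST_GB = 15.0      # data INGV receives per day
ARCHIVE_TB = 28.6           # size of the archive replicated so far

# Assumptions made for this sketch (not stated in the slides).
REPLICATION_DAYS = 90       # "3 months" taken as 90 days
BITS_PER_GB = 8e9           # decimal units: 1 GB = 10^9 bytes
BITS_PER_TB = 8e12          # decimal units: 1 TB = 10^12 bytes


def processes_to_fill_link(link_mbit_s: float, per_process_mbit_s: float) -> int:
    """Largest number of processes whose combined rate stays within the link."""
    return math.floor(link_mbit_s / per_process_mbit_s)


def average_rate_mbit_s(size_tb: float, days: float) -> float:
    """Average sustained rate needed to move size_tb in the given number of days."""
    seconds = days * 24 * 3600
    return size_tb * BITS_PER_TB / seconds / 1e6


def hours_to_transfer(size_gb: float, rate_mbit_s: float) -> float:
    """Hours needed to move size_gb at a constant rate."""
    return size_gb * BITS_PER_GB / (rate_mbit_s * 1e6) / 3600


parallel = processes_to_fill_link(LINK_MBIT_S, PER_PROCESS_MBIT_S)
bulk_rate = average_rate_mbit_s(ARCHIVE_TB, REPLICATION_DAYS)
daily_hours = hours_to_transfer(DAILY_INGEST_GB, PER_PROCESS_MBIT_S)

print(f"irsync processes that fit in the link: {parallel}")
print(f"average rate of the bulk replication: {bulk_rate:.1f} Mbit/s")
print(f"one day's data over a single process: {daily_hours:.1f} h")
```

Under these assumptions, six processes fit within the theoretical link, the three-month bulk replication corresponds to an average of a little under 30 Mbit/s, and a single process would need about two hours for one day's data.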

Editor's Notes

  1. This presentation introduces B2SAFE, one of the core EUDAT services, explaining why it is needed and how you can actually replicate your data. The main topics are: B2SAFE: What, Who, How. B2SAFE: Safe Replication in EUDAT Step by Step. B2SAFE Example: The EPOS Community. B2SAFE: What’s next?
  2. B2SAFE: B2SAFE is a robust, safe and highly available service which allows community and departmental repositories to implement data management policies on research data across multiple administrative domains in a trustworthy manner.
  3. So let’s begin by looking at the EUDAT service suite and how it helps communities with data management. B2SAFE in the EUDAT service suite is ..
  4. Where is B2SAFE in the EUDAT suite?: B2SAFE = Replicate Research Data Safely. B2SAFE is the cornerstone of EUDAT's service for safe replication. B2SAFE and other B2 Services: B2SHARE: B2SHARE is a user-friendly, reliable and trustworthy way for researchers, scientific communities and citizen scientists to store and share small-scale research data from diverse contexts. The service is professionally managed and data is safely preserved via the B2SAFE replication service. B2STAGE: In conjunction with B2SAFE, replicate community data sets, ingesting them onto EUDAT storage resources for long-term preservation. B2STAGE is a reliable, efficient, light-weight and easy-to-use service to transfer research data sets between EUDAT storage resources and high-performance computing (HPC) workspaces. B2FIND: B2FIND is a discovery service based on metadata steadily harvested from research data collections from EUDAT data centres and other repositories. The service offers faceted browsing and, in particular, makes it possible to discover data stored through the B2SAFE and B2SHARE services.
  5. Better safe than sorry….: “I want to replicate my collection X to two data centres and store the collection safely for 10 years”. There’s been a data explosion. The volume of data output is increasing. Scientists, projects and communities use and produce data sets for research purposes. These data are managed and stored in repositories. So the biggest challenge is being able to guard them and keep them safe. B2SAFE is a solution to this problem. In today’s rich data-storage ecosystems, large data centres must offer a robust, safe and highly available replication service….
  6. The Data Policy Manager aims to provide policy management via a central interface. Data policies are centrally managed via the Data Policy Manager, and the policy rules are implemented and enforced by site-local rule engines (iRODS rules). This allows data managers to manage the policies on their data (e.g. set replication policies for community data), including for B2SAFE.   B2SAFE is based on the execution of auditable data policy rules and the use of persistent identifiers (PIDs). It respects the rights of the data owners to define the access rights for their data and to decide how and when they are made publicly referenceable. It employs the Data Policy Manager to allow centrally managed, community-defined data policies.
  7. Uses site rule-engines to implement and enforce policy rules. Aggregates data from different disciplines into a storage system of trustworthy and capable data service providers. Supports repository packages (e.g. DSPACE, FEDORA) and a lightweight HTTP-based solution.
  8. Repositories. Data producers and consumers
  9. SLIDE DATA
  10. How can you use B2SAFE?: There are some prerequisites, namely the following required technologies:   Persistent Identifiers (PIDs). Metadata describing the properties and context of the data being replicated. iRODS (recommended) or similar data management technology for federation.   If you have questions or need help installing or using these technologies, EUDAT offers documentation, training material and a service helpdesk. Any community or departmental data repository can use B2SAFE. EUDAT experts can help set up the B2SAFE module and its prerequisites.
  11. Let's take a look at a simple example. We are within the EUDAT infrastructure and a Community Center wants to replicate a digital object (DO) to 3 EUDAT Data Centers.
  12. It is actually simple: data from the community repository is replicated to other data centres across Europe. But let's see the process step by step.
  13. The B2SAFE module is a set of iRODS rules which can be put together in workflows enabling data replication and PID management.   Prerequisites for a Community Center: Persistent Identifiers (PIDs): the Community Center is responsible for assigning a unique identifier (PID) to the Digital Object. iRODS (recommended) or similar data management technology for federation. B2SAFE module enabled. Steps: The Community Center assigns a PID to the DO. Based on the community policy, the replication starts via iRODS rules. We want to replicate the DO to EUDAT Data Center 1. A predefined B2SAFE rule is called which sends a PID creation request to the PID service in use. The replication process is triggered by invoking the B2SAFE replication rule at the client side. The B2SAFE module ensures that the replica at EUDAT DC1 is assigned a unique PID (handle). The EUDAT DC1 replica is ready. We want to replicate the DO to EUDAT Data Center 2. We follow the same process. The B2SAFE module ensures that the replica at EUDAT DC2 is assigned a unique PID (handle). The EUDAT DC2 replica is ready.   But let’s discuss the actual DO and its replicas.
  14. Main Acronyms: ROR: Repository of Records, the repository where data was stored first (it controls the replication process). PID: Persistent identifier associated with a digital object or with a whole collection. PPID: Parent PID, the persistent identifier associated with the source object in a replication chain. If the chain has only two elements, the master copy and the first replica, then PPID = ROR. Procedure (first picture, with CC, DC1 and DC2): The Community Center owns a DO that it wants to replicate across different data centres (EUDAT Data Center 1 and EUDAT Data Center 2, as shown in the picture). Before performing the 'safe replication' procedure, the Community Center is responsible for assigning a unique identifier (PID) to the Digital Object. This PID of the original digital object will be used as the value of the ROR (Repository Of Record) in the handle record of the first replica and as a parent PID for the handle records of all other replicas. So the Community Center assigns PID1x to the DO. Now we want to replicate the DO to EUDAT Data Center 1. The B2SAFE module ensures that the replica at EUDAT DC1 is assigned a unique PID (handle). The PID is PID1y and the handle record contains RoR: a reference to the original source of the replica (typically the community centre). RoR = PID1x. Now we want to replicate the DO to EUDAT Data Center 2. The B2SAFE module ensures that the replica at EUDAT DC2 is assigned a unique PID (handle). The PID is PID1z and the handle record contains RoR: a reference to the original source of the replica (typically the community centre), RoR = PID1x; and PPID: a reference to the replica created by B2SAFE from this Community Center to another Data Center (in our example EUDAT Data Center 1), PPID = PID1y.   This results in a tree structure of PID records identifying all replicas and the "flow" of replication. PID handle records: For each replica, links to the instance at a data centre are stored for one or more protocols, e.g. a URL to access the object via the iRODS protocol and via HTTP. In addition to the link, the B2SAFE service stores a checksum for that specific replica. This information is intended to be used to perform integrity checks. abc/xyz: The PID of the DO. a) loc: various links, locations (one location is the PID of the EUDAT Data Center 1 replica). b) checksum for integrity. def/uvw: The PID of the Data Center 1 replica. a) loc: various links, locations (one location is the PID of the EUDAT Data Center 2 replica). b) checksum for integrity. c) RoR: reference to the original source of the replica. ghi/rst: The PID of the Data Center 2 replica. a) loc: various links, locations. b) checksum for integrity. c) RoR: reference to the original source of the replica. d) PPID: reference to the replica created by B2SAFE from this Community Center to another Data Center (in our example EUDAT Data Center 1).
  15. EUDAT partners are already using B2SAFE: A number of communities are already using the EUDAT B2 Suite.
  16. Replicate my collection X to three data centres: CLARIN, EPOS and LifeWatch are using B2SAFE and are replicating data on behalf of their communities to EUDAT Data Centers. Let’s have a look at the EPOS Community.
  17. EPOS: Community example: The European Plate Observing System (EPOS) is the integrated solid-Earth sciences research infrastructure. The EPOS community currently uses (or is planning to use) services like B2SAFE, B2DROP, B2STAGE, B2ACCESS, and B2HANDLE. About B2SAFE: this service is currently being used to facilitate long-term preservation of seismological datasets that are enriched with persistent identifiers (PIDs) and replicated onto external data facilities. To achieve that, the basic setup is:   EPOS joined the EUDAT CDI. EUDAT defined a specific policy with EPOS (on when, how and what to do with data). The iRODS irsync protocol was chosen to achieve the best performance. But let's discuss the actual process.
  18. All these stations transmit data via satellite links, internet, wire lines, radio modems and optical fibres.   Facts: The chosen entry point to the EUDAT CDI is the CINECA node, located in Bologna, Italy. INGV receives around 15 GB/day (about 5 TB/year). The connection between INGV and CINECA is via the internet and has a theoretical bandwidth of 100 Mbit/s. INGV receives the data at its data centre in Rome. Process: The iRODS irsync tool is used, running multiple irsync processes to aggregate bandwidth, since each process reaches an average of 16 Mbit/s. Each digital object ingested by CINECA has been registered by assigning it a Persistent Identifier (PID). The PIDs are registered in the PID registry, which is hosted at SURFsara and based on the EPIC service. The CINECA node acts as the main archive for the data (master archive). Numbers: The replication of the whole archive has taken 3 months. The INGV data archive so far amounts to 28.6 TB and 7,500,000 files. During the periodic synchronization process, about 3,000 PIDs/day are created. So far 7,100,000 PIDs are registered in the PID registry.
  19. Messaging: The current B2SAFE implementation supports only a simple messaging model: the synchronous one. There are scenarios where other models are required, for example the asynchronous one via a message queue, or publish/subscribe. Messaging is an experimental feature that provides the results of an asynchronous (server-side triggered) replication process. The messaging feature supports a message queue for the asynchronous model. The messages are posted to a queue which can be accessed via an HTTP interface. Users who ingest data into B2SAFE via GridFTP are not able to get back the PID of the object unless it is written to a file object; that is the rationale for the metadata feature. Metadata Management: The current B2SAFE implementation doesn’t support the retrieval of the PID of an object. For example, users who ingest data into B2SAFE via GridFTP are not able to retrieve the PID of the object: the PID is minted but the user cannot retrieve it. Metadata management is an experimental feature that supports this functionality. When enabled, it provides a set of metadata properties for each data object, storing them in a JSON file placed in (nearly) the same path as the related data object.
  20. Let’s close by looking briefly at what B2SAFE actually is. B2SAFE Summary.
  21. And for the future, some of our main concerns are:   Easy setup: B2SAFE provides a script to build rpm and deb packages. The idea is to provide an easy-to-install, click-install-run package. New extensions – connectors: Communities use different types of repositories to store their data. The most common repositories are DSPACE, Fedora and ePrints. At the moment, a DSPACE connector to ingest data into B2SAFE is implemented. New connectors for FEDORA and ePRINTS are planned. Improve the service with “dynamic data” (streaming data) capabilities: Dynamic data may produce data streams that are temporarily incomplete (and that may consequently fill up over time, automatically or after manual intervention). Dynamic data has been a challenging subject for B2SAFE. It is difficult to keep consistency between data objects which are subject to change and are replicated in a distributed environment. B2SAFE wants to support dynamic data for the communities that have to deal with it. Further integration with B2ACCESS: B2ACCESS is an easy-to-use and secure authentication and authorization platform developed by EUDAT. It is a federated cross-infrastructure authentication and authorization framework for user identification and community-defined access control enforcement. It will be available for B2SAFE. Support authorization on the basis of community access rules, e.g. based on user memberships in EUDAT groups/communities.
  22. This training module provides hands-on material for iRODS4, EUDAT B2SAFE, B2HANDLE (based on Handle version 8) and B2STAGE. It provides install files which show how the training machines are set up and give users an idea of how to install the software stack themselves. The training material itself is targeted at scientist end-users and site admins. The order of the markdown files reflects the proposed curriculum of the training. Each component takes about 1 hour. The main component on B2SAFE replication is Chapter 6, the EUDAT B2SAFE hands-on. This hands-on tutorial illustrates how B2SAFE rules can be employed to manage data across iRODS zones by policies. The tutorial makes use of the icommands. If you are not familiar with iRODS, please first follow the tutorial on using iRODS. The tutorial will guide you through registering data and replicating data with the B2SAFE module from the B2SAFE administrator's perspective. As B2SAFE admin you will copy data which a user ingested into the iRODS instance to another location in iRODS. You will register the data, thereby building the so-called repository of records, and replicate the collection to another iRODS server using the B2SAFE rules. Throughout the whole replication workflow, data is tracked by persistent identifiers. The training explains and illustrates how this is achieved.
  23. Thank you all. For more information about B2SAFE you may visit https://www.eudat.eu/services/b2safe
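
The chain of PID records walked through in note 14 (an original object abc/xyz, a first replica def/uvw, and a second replica ghi/rst copied from the first) can be modelled in a few lines. This is only an illustrative in-memory sketch, not the B2SAFE or Handle API: the `HandleRecord` class, the function names, the example URLs and the choice of SHA-256 as checksum are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HandleRecord:
    """Hypothetical stand-in for a PID handle record."""
    pid: str
    locations: List[str]          # links to the object and to its replicas
    checksum: str                 # stored for integrity checks
    ror: Optional[str] = None     # PID of the original object
    ppid: Optional[str] = None    # PID of the object this replica was copied from


def checksum_of(data: bytes) -> str:
    # Assumption: SHA-256; the slides do not name the checksum algorithm.
    return hashlib.sha256(data).hexdigest()


def register_original(registry: Dict[str, HandleRecord], pid: str,
                      url: str, data: bytes) -> HandleRecord:
    record = HandleRecord(pid, [url], checksum_of(data))
    registry[pid] = record
    return record


def register_replica(registry: Dict[str, HandleRecord], pid: str,
                     source_pid: str, url: str, data: bytes) -> HandleRecord:
    source = registry[source_pid]
    record = HandleRecord(
        pid=pid,
        locations=[url],
        checksum=checksum_of(data),
        ror=source.ror or source.pid,   # always points at the original object
        ppid=source.pid,                # the immediate source in the chain
    )
    source.locations.append(pid)        # the source record links to its replica
    registry[pid] = record
    return record


def chain_to_origin(registry: Dict[str, HandleRecord], pid: str) -> List[str]:
    """Follow PPID links from a replica back to the original object."""
    chain = [pid]
    while registry[chain[-1]].ppid is not None:
        chain.append(registry[chain[-1]].ppid)
    return chain


def is_intact(record: HandleRecord, data: bytes) -> bool:
    """Integrity check: compare the stored checksum with the data at hand."""
    return record.checksum == checksum_of(data)


registry: Dict[str, HandleRecord] = {}
payload = b"example digital object"
# The URLs below are placeholders, not real EUDAT endpoints.
register_original(registry, "abc/xyz", "irods://cc.example.org/zone/do", payload)
register_replica(registry, "def/uvw", "abc/xyz", "irods://dc1.example.org/zone/do", payload)
register_replica(registry, "ghi/rst", "def/uvw", "irods://dc2.example.org/zone/do", payload)

print(chain_to_origin(registry, "ghi/rst"))
```

For the first replica the sketch sets PPID equal to RoR, matching the two-element case in the acronym list; for the second replica RoR still points at the original while PPID points at the first replica, which is what gives the tree its "flow" of replication.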