Use-cases for the ARCHIVER project
The European Bioinformatics Institute
Tony Wildish
wildish@ebi.ac.uk
What is EMBL-EBI?
• Europe’s home for biological data services, research and training
• A trusted data provider for the life sciences
• Part of the European Molecular Biology Laboratory, an
intergovernmental research organisation
• International: 650 members of staff from 66 nations
• Home of the ELIXIR Technical hub.
Our mission
• Deliver excellent research
• Train the next generation of scientists
• Engage with industry
• Coordinate bioinformatics in Europe
• Deliver scientific services
The European Molecular Biology Laboratory
• 6 sites in Europe, >1700 personnel, 80+ nationalities
• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Rome, Italy: Mouse Biology
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology
Data resources at EMBL-EBI
Database interactions
• Our collaborative community
facilitates social, scientific and
technical interactions
• This image shows internal
interactions between data
resources, as determined by
the exchange of data.
• The width of each internal arc is
weighted according to the number
of different data types exchanged.
Increasing Data, Increasing Analysis
Storage growth at EBI
• ~40-50% per year
• i.e. doubling every two
years
• No reason to expect
that to slow down
EGA and ENA account for
the bulk of the data
• DNA sequences
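The link between the quoted annual growth rate and the doubling time can be checked with a short calculation. This is a minimal sketch that assumes constant compound growth; the function name `doubling_time_years` is illustrative and not from the slides.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.40 for 40% per year.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    for rate in (0.40, 0.50):
        years = doubling_time_years(rate)
        print(f"{rate:.0%} per year -> doubles every {years:.2f} years")
```

Under that assumption, 40% per year gives a doubling time of roughly 2.06 years and 50% per year roughly 1.71 years, which is consistent with "doubling every two years".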
Who uses EMBL-EBI services?
See the live map at www.ebi.ac.uk/about/our-impact
Where does our data come from?
Data characteristics
DNA sequence data
○ The bulk of our data; files range from a few MB up to many tens of GB
○ With ‘long-read’ sequencing technology, can we expect file sizes to increase?
Lifetime
○ EBI has custodial responsibility, most of our data is stored ‘forever’
○ Data is immutable (but may be versioned)
Analyses
○ Assembly: stream/index whole file, then random access string matching
○ Query: byte-range lookup
Access
○ POSIX, FTP, HTTP, S3…
○ Data discovery by portal lookup, dedicated portals with cross-references
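The byte-range lookup mentioned under Analyses maps naturally onto HTTP range requests, which S3-style object stores also accept. The sketch below only builds and parses the headers and performs no network access; the helper names `range_header` and `parse_content_range` are hypothetical, and nothing here describes how EMBL-EBI's services are actually implemented.

```python
from __future__ import annotations

import re


def range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, hence the "- 1".
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def parse_content_range(value: str) -> tuple[int, int, int | None]:
    """Parse 'bytes first-last/total' as returned with a 206 response.

    The total is None when the server reports it as '*' (unknown).
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)
```

A client would pass the returned header to its HTTP library of choice and check the `Content-Range` of the response to confirm that the server honoured the requested range.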
Privacy, security
Public
○ Available without authorisation or identification – anonymous FTP
Private, secure
○ Apply to a committee for access, individually encrypted copy provided if granted
Collaboration
○ Team of people with access, varying degrees (R/O, R/W), fluctuating membership
Embargo
○ Public after analysis/publication, or after time window expires
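The four access categories above can be expressed as a small decision function. This is a hedged sketch only: the `Policy`, `Dataset`, `can_read` and `can_write` names are hypothetical, and the rule that team members can read an embargoed dataset before its release date is an assumption, not something the slides state.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Policy(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    COLLABORATION = "collaboration"
    EMBARGO = "embargo"


@dataclass
class Dataset:
    policy: Policy
    approved: set = field(default_factory=set)   # users granted access by a committee
    members: dict = field(default_factory=dict)  # user -> "ro" or "rw"
    embargo_until: Optional[date] = None


def can_read(ds: Dataset, user: Optional[str], today: date) -> bool:
    """Return True if `user` (None means anonymous) may read the dataset."""
    if ds.policy is Policy.PUBLIC:
        return True
    if ds.policy is Policy.PRIVATE:
        return user in ds.approved
    if ds.policy is Policy.COLLABORATION:
        return user in ds.members
    # EMBARGO: public once the time window has expired.
    if ds.embargo_until is not None and today >= ds.embargo_until:
        return True
    # Assumption: before release, only team members can read.
    return user in ds.members


def can_write(ds: Dataset, user: Optional[str]) -> bool:
    """Only read/write members of a collaboration may modify data."""
    return ds.policy is Policy.COLLABORATION and ds.members.get(user) == "rw"
```

Fluctuating membership then amounts to adding or removing entries in `members`; a real system would also need auditing and the per-user encryption mentioned for private data, which this sketch leaves out.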
“EMBL on FIRE” - Background
The FIle REplication (FIRE) project started in the Systems and Networking team in 2008
○ Provide efficient, reliable, scalable and replicated data storage (for disaster recovery)
○ Provide a cost-effective and vendor-independent solution
○ Different storage technologies on Replica A and Replica B to mitigate possible data loss
Projects using FIRE include:
○ 1000 Genomes (G1K)
○ European Nucleotide Archive (ENA)
○ European Genome-phenome Archive (EGA)
○ Human Induced Pluripotent Stem Cells Initiative (HIPSCI)
○ Functional Annotation of Animal Genomes (FAANG)
○ BioImaging Data Archive
5-year plan
○ 2018: Stability with 1 PB/month ingress
○ 2019: Become S3-like cloud with metadata features
○ 2020: Ingress 2 PB/month, egress 60 PB/month
○ 2021: Metadata explorer, ingress 3 PB/month
○ 2022: Not yet defined
“EMBL on FIRE” - Challenges
Cost-effective scaling:
○ Can cloud-based storage offer a cost-efficient approach?
○ How do ingest rates affect this model?
○ Current use is ~1PB download, 2 billion requests, per month
Cost-effective analysis:
○ As the data volume grows, we expect users to switch to cloud-based analysis platforms. How can we effectively distribute/present the data for analysis?
○ Need a hybrid/multi-cloud model that blurs the boundaries between on-premises and public cloud
○ Long tail of analysis, effectively no ‘cold data’ -> tiered storage not a panacea
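To put the current usage figures in context, the monthly totals can be converted to average sustained rates. The sketch assumes a 30-day month and decimal units (1 PB = 10^15 bytes); both are simplifications chosen for illustration, and the function names are not from the slides.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def pb_per_month_to_gbit_per_s(pb: float) -> float:
    """Average line rate in Gbit/s for a monthly volume in decimal PB."""
    return pb * 1e15 * 8 / SECONDS_PER_MONTH / 1e9


def requests_per_second(requests_per_month: float) -> float:
    """Average request rate for a monthly request count."""
    return requests_per_month / SECONDS_PER_MONTH


if __name__ == "__main__":
    print(f"1 PB/month ~ {pb_per_month_to_gbit_per_s(1):.2f} Gbit/s sustained")
    print(f"2e9 requests/month ~ {requests_per_second(2e9):.0f} requests/s")
    print(f"mean transfer per request ~ {1e15 / 2e9 / 1e6:.1f} MB")
```

On these assumptions, ~1 PB of downloads per month averages out to roughly 3 Gbit/s, and 2 billion requests per month to about 770 requests per second, or about 0.5 MB per request on average. Peak rates will be higher than these averages.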
Caching in the cloud
Why?
○ Increasing data volumes strain our in-house compute resources
○ Many of our data products have regular release cycles, e.g. quarterly
○ Downstream processing becoming a bottleneck, unable to keep up
○ Bandwidth for access to data
○ Some workflows require specialized hardware, e.g. >> 1 TB RAM
○ Prefer to move to the cloud as soon as is cost-effective
How?
○ Hybrid-cloud model, extend on-premises resources transparently into multiple clouds
Caching in the cloud
[Figure: architecture diagram. Labels: EMBL-EBI Data Centre Space (Clusters, NFS, Object Store, Research Team, Service Team, Public Service); JANET – UK Academic Network; Public Clouds (Cache); Users]
Caching in the cloud
Which data do we cache?
○ Which data is most likely to be used in the future? When?
○ Half our data is less than 2 years old
○ Long tail of analysis, not use-once-and-forget
○ Need monitoring of access patterns and knowledge of file relationships
○ Some a priori knowledge of requirements, but not complete
How much data to cache?
○ Trade-off long-term caching vs. cost of upload/download of data, available bandwidth
Cache lifetime?
○ Instrument workflows with caching hints?
○ Process-mining to determine which files are used in what manner for a given workflow?
○ How much can we automate this vs. requiring the user to tell us?
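The trade-off between keeping data cached and re-transferring it later can be framed as a break-even time. The following is a simplified sketch, not a policy from the slides: the prices are placeholders supplied by the caller, and the model ignores bandwidth limits, request charges and the value of faster access.

```python
def break_even_months(storage_price_per_gb_month: float,
                      transfer_price_per_gb: float) -> float:
    """Months of cached storage that cost as much as one re-transfer.

    Both prices are hypothetical inputs in the same currency per GB.
    """
    if storage_price_per_gb_month <= 0:
        raise ValueError("storage price must be positive")
    if transfer_price_per_gb < 0:
        raise ValueError("transfer price must not be negative")
    return transfer_price_per_gb / storage_price_per_gb_month


def keep_cached(expected_months_to_next_use: float,
                storage_price_per_gb_month: float,
                transfer_price_per_gb: float) -> bool:
    """Keep the file if storing it until its next expected use is no more
    expensive than evicting it and transferring it again."""
    limit = break_even_months(storage_price_per_gb_month, transfer_price_per_gb)
    return expected_months_to_next_use <= limit
```

The hard part, as the questions above suggest, is estimating `expected_months_to_next_use`; that is where access-pattern monitoring, workflow hints or process mining would feed in.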
Caching in the cloud and FIRE?
Cache vs. archive:
○ Cache lifetime goes to infinity -> archive
Moving target
○ Need a process that can evolve over time, over many orders of magnitude
○ Tools & technologies may change, must be fluid
Testing plans
Functionality
○ Ingest + download with multiple clients, rate ~PB/month
○ Clients distributed around several clouds, several locations
○ Byte-range download for subsets of large files
Performance
○ Sustained functionality over long periods – days, not minutes
Security
○ Test RBAC functionality, reliability, usability, latency (e.g. if eventually consistent)
Accounting, billing
○ Ability to get near-realtime ‘cost’ reports, predictions, alerts, breakdowns…
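One way to exercise the byte-range functionality in such a test is to split a file into ranges, fetch each range separately and confirm that the reassembled content matches a checksum of the whole. The sketch below uses an in-memory byte string as a stand-in for the store under test; `plan_ranges`, `fetch_range` and `verify_ranged_download` are hypothetical helper names, and a real test would replace `fetch_range` with ranged requests issued by multiple clients.

```python
import hashlib


def plan_ranges(size: int, chunk: int) -> list:
    """Inclusive (first, last) byte ranges covering a file of `size` bytes."""
    if size < 0 or chunk <= 0:
        raise ValueError("size must be >= 0 and chunk must be > 0")
    return [(start, min(start + chunk, size) - 1)
            for start in range(0, size, chunk)]


def fetch_range(blob: bytes, first: int, last: int) -> bytes:
    """Stand-in for a ranged download from the storage service under test."""
    return blob[first:last + 1]


def verify_ranged_download(blob: bytes, chunk: int) -> bool:
    """Check that the ranges, fetched in order, reproduce the original file."""
    expected = hashlib.sha256(blob).hexdigest()
    digest = hashlib.sha256()
    for first, last in plan_ranges(len(blob), chunk):
        digest.update(fetch_range(blob, first, last))
    return digest.hexdigest() == expected
```

The same range plan could be distributed across clients in several clouds to cover the multi-client and multi-location items above, with timing added around `fetch_range` for the sustained-performance test.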
Summary
○ Data growing fast, ~doubling every two years
○ Don’t expect this to slow down anytime soon
○ Cloud-migration for user community just beginning
○ Actively pushing to accelerate this
○ Need a hybrid/multi-cloud storage solution
○ Flexible, performant, cost-effective
