e-Biothon
V. Breton (breton@clermont.in2p3.fr)
LPC Clermont-Ferrand, IdGC
CNRS-IN2P3
http://france-grilles.fr
Credit: N. Bard, A. Franc, JF Gibrat
Extreme Performance Computational Science workshop
Tokyo, April 15th 2014
Table of contents
• What are the computing challenges of life sciences?
• France Grilles: a multidisciplinary distributed e-infrastructure for science
• E-Biothon: an HPC platform for research in life sciences
Generalities on sequencing
• Genome = DNA sequence (4 nucleotides: A, C, G, T)
– Smallest non-viral genome: Carsonella ruddii (0.16 Mbp)
– Largest genome: Polychaos dubium (670 Gbp)
Sanger technology: 500 bp sequences
454 technology: 10^5 reads of 450 to 600 bp
Illumina technology: 10^6 reads of 100 bp
Current projects (Tara): 10^7 reads of 100 to 400 bp
Explosion of data set size
Data analysis?
Algorithms?
Heuristics?
Tara @ http://oceans.taraexpeditions.org/
Evolution of sequencing
techniques
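A rough sense of the data volumes behind these figures comes from multiplying the number of reads by the read length. The sketch below does that arithmetic, reading the read counts as powers of ten (10^5, 10^6, 10^7), which is an assumption about the slide's original formatting. The names `Run` and `total_bases` are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Run:
    """One sequencing run: read count and read-length range in base pairs."""
    name: str
    reads: int
    min_len: int
    max_len: int


def total_bases(run: Run) -> Tuple[int, int]:
    """Return the (low, high) estimate of bases produced by one run."""
    return run.reads * run.min_len, run.reads * run.max_len


# Read counts are assumed to be powers of ten (see the note above).
RUNS = [
    Run("454", 10**5, 450, 600),
    Run("Illumina", 10**6, 100, 100),
    Run("Current projects (Tara)", 10**7, 100, 400),
]

for run in RUNS:
    low, high = total_bases(run)
    print(f"{run.name}: {low / 1e6:.0f} to {high / 1e6:.0f} Mbp")
```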
Data production is distributed
2558 high-throughput "next-generation" sequencing facilities in the world,
located in 920 centers (only 10 with more than 15 machines)
Source: omicspmaps.com
Data production grows faster than Moore's law
Sequencing scenarios
• Interest in a new genome requires assembly
– The process of taking a large number of short DNA sequences and putting them back together to create a representation of the original
– Algorithms based on read overlapping benefit from large RAM (1 TB) -> HPC
• Working with a reference genome requires comparative analysis
– Alignment algorithms (BLAST) find regions of local similarity between sequences
– Phylogeny algorithms (PhyML) infer evolutionary relationships between genomes
– Comparative analyses are easily parallelized at data level -> HTC
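To make the idea of overlap-based assembly concrete, here is a toy greedy assembler. It is a minimal sketch, not the method of any particular assembler: real tools use more elaborate graph-based approaches and tolerate sequencing errors. The function names and the `min_len` threshold are illustrative assumptions. The all-against-all overlap search keeps every read in memory at once, which hints at why large RAM helps.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        # Quadratic all-against-all search over the current contigs.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge: contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["AGCTTAGG", "TAGGCATC", "CATCCGA"]
print(greedy_assemble(reads))
```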
Summary
• Life sciences have specific computational challenges
– Data production grows faster than Moore's law
– Permanent need to compare new data with existing data
• Life sciences needs can be addressed effectively on multidisciplinary IT infrastructures (e-infrastructures)
– HPC resources are best suited to genome assembly
– Grid/cloud HTC resources are well suited to comparative analysis
• Life sciences are among the main users of the French national grid/cloud production infrastructure
France Grilles
• Is a Scientific Interest Group…
– Created in 2010 by 8 partners: CEA, CNRS, CPU, INRA, INRIA, INSERM, MESR, RENATER…
– To steer and coordinate the national strategy in the fields of grids and clouds
• Vision:
– Build and operate a national distributed computing infrastructure open to all sciences and to developing countries
France Grilles model
• France Grilles does not own the resources
– Resources are owned by user communities
• France Grilles provides a framework
– To share resources, expertise and know-how
– To promote innovation and initiatives
– To foster collaboration at national and international levels
– To reach out to the long tail of users
France Grilles resources
France-Grilles backbone: LCG-France
France-Grilles spine: CC-IN2P3
EGI from 2010 to 2013
2010-2013: from 14 regional to 34 operations centres in 53 countries,
from 188,000 jobs/day with 80,000 cores on 250 Resource Centres
to 1,200,000 jobs/day with 430,000 cores on 337 Resource Centres
Technologies
• Grids
• Clouds
• Desktops
Talk by S. Newhouse, Madrid, Sept. 2013
France Grilles, a partner of EGI
Provide a common framework to all user communities
Provide an open environment for fruitful disciplinary and multidisciplinary research
[Chart: over 1500 scientific publications, June 2010 to April 2014]
Web portal
Users
479 registered users in Nov 2013 (175 in France)
Most used robot certificate in EGI (http://go.egi.eu/wiki.robot.users)
Neuro-image analysis
Cancer therapy simulation
Prostate radiotherapy plan simulated with GATE (L. Grevillot and D. Sarrut)
Image simulation
Echocardiography simulated with
FIELD-II (O. Bernard et al)
Modeling and optimization of
distributed computing systems
Acceleration yielded by non-clairvoyant
task replication (R. Ferreira da Silva et al)
Brain tissue segmentation
with Freesurfer
Scientific applications
Infrastructure
Supported by EGI Infrastructure
Uses biomed VO (most used EGI VO for life sciences in 2013)
VIP accounts for ~25% of biomed's activity
VIP consumes ~50 CPU years every month
DIRAC
France-Grilles
Application as a service
File transfer to/from grid
Virtual Imaging Platform:
http://www.creatis.insa-lyon.fr/vip
Collaborations with dedicated life sciences infrastructures
• Institut Français de Bioinformatique (computing and storage resources at IDRIS)
• France Génomique (computing and storage resources at TGCC)
• France Life Imaging (infrastructure for biomedical imaging)
• E-Biothon
• Telethon: every year, fundraising by French media for the French Muscular Dystrophy Association (AFM)
• From Telethon to Decrypthon
– Computing infrastructure (IBM)
– Research projects (CNRS)
– Human resources (AFM)
• From Decrypthon to E-Biothon
E-Biothon: history
e-Biothon: an HPC platform for research in life sciences
[Diagram labels: user support; Blue Gene/P machines; technical support; Blue Gene/P operation; web access portal]
E-Biothon: infrastructure
• 2 IBM Blue Gene/P racks with 200 TB storage
– 2 x 1024 4-core nodes
– up to 28 TFlops peak performance
• SysFera-DS web access to computing resources
• 2 modes:
– Standard (MPI)
– HTC (1024 independent tasks in parallel)
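The quoted peak performance can be cross-checked from the node counts on this slide. The sketch below assumes the commonly published Blue Gene/P figures of an 850 MHz clock and 4 floating-point operations per cycle per core; neither number appears on the slide, so treat both as assumptions.

```python
RACKS = 2
NODES_PER_RACK = 1024
CORES_PER_NODE = 4

# Assumptions not stated on the slide: published Blue Gene/P characteristics.
CLOCK_HZ = 850e6
FLOPS_PER_CYCLE_PER_CORE = 4


def total_cores() -> int:
    """Number of cores across both racks."""
    return RACKS * NODES_PER_RACK * CORES_PER_NODE


def peak_tflops() -> float:
    """Theoretical peak in TFlops: cores x clock x flops per cycle."""
    return total_cores() * CLOCK_HZ * FLOPS_PER_CYCLE_PER_CORE / 1e12


print(f"{total_cores()} cores, about {peak_tflops():.1f} TFlops peak")
```

Under these assumptions the result is consistent with the "up to 28 TFlops" figure.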
E-Biothon vision is to offer a service to
the user communities in life sciences
• 2013-2014: first 3 projects
– Jean-François Gibrat et al. (MIGALE platform, INRA Jouy-en-Josas)
– Olivier Gascuel, Stéphane Guindon and Vincent Lefort (CNRS Montpellier)
– Yec’han Laizet, Philippe Chaumeil, Jean-Marc Frigerio, Stéphanie Mariette, Sophie Gerber, Alain Franc (INRA BioGeCo – Bordeaux)
• > 2014: open call for projects (IFB)
Studying synteny over a wide range of microbial genomes
• Definition: similar blocks of genes in the same relative positions in the genome
• Interest: the study of synteny can show how the genome is cut and pasted in the course of evolution
• The MIGALE team at INRA designed an analysis pipeline to compute synteny between 2 genomes and store it in a database
• E-Biothon impact: change in scale, with the capacity to compute synteny between 2000 complete bacterial genomes (7 million comparisons)
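All-against-all genome comparison is a natural fit for independent tasks, since each pair can be processed without knowledge of the others. The sketch below enumerates unordered genome pairs and groups them into batches; the default batch size of 1024 mirrors the platform's HTC mode, while the function names and genome labels are hypothetical. How the actual pipeline counts a comparison is not stated here, so the totals from this sketch need not match the figure quoted above.

```python
from itertools import combinations, islice
from typing import Iterable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]


def pair_count(n_genomes: int) -> int:
    """Number of unordered pairs among n genomes: n * (n - 1) / 2."""
    return n_genomes * (n_genomes - 1) // 2


def genome_pairs(genomes: Sequence[str]) -> Iterator[Pair]:
    """Lazily yield every unordered pair of genomes."""
    return combinations(genomes, 2)


def batches(tasks: Iterable[Pair], size: int = 1024) -> Iterator[List[Pair]]:
    """Group independent tasks into lists of at most `size` items."""
    it = iter(tasks)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Small illustration: 6 hypothetical genomes, batches of 4 tasks.
genomes = [f"genome_{i}" for i in range(6)]
work = list(batches(genome_pairs(genomes), size=4))
print(pair_count(len(genomes)), [len(b) for b in work])
```

Because the pairs are generated lazily, the full list never has to be held in memory; each batch could be handed to a separate set of workers.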
PhyML
Phylogenetics is the study of evolutionary relationships among groups of organisms
PhyML is a software package that estimates maximum likelihood phylogenies from alignments of nucleotide or amino acid sequences
The original PhyML publication in 2007 is the most cited in environment and ecology (> 6000 citations).
E-Biothon impact: change in scale in the resources made available to PhyML users
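As a minimal illustration of maximum likelihood estimation on aligned sequences, the sketch below computes the evolutionary distance between two aligned nucleotide sequences under the Jukes-Cantor (JC69) model, where the maximum likelihood estimate has the closed form d = -(3/4) ln(1 - 4p/3), with p the proportion of differing sites. This is only the two-sequence special case and is not PhyML's algorithm, which searches over tree topologies and branch lengths for a whole alignment. The function names and the gap handling are assumptions made for the example.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jc69_distance(a: str, b: str) -> float:
    """Maximum likelihood distance (substitutions per site) under JC69."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the estimate is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(jc69_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site.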
Characterizing biodiversity
According to botanical theory, biodiversity is organized in species, genera, families, orders: is it confirmed in the distance between sequences?
Study of biodiversity in French Guiana
16000 different tree species in the Amazonian forest (≈ 300 in Europe)
More biodiversity in 10000 m2 of forest in French Guiana than in Europe
Decrypthon added value
Change in scale (from the local mesocenter in Bordeaux)
Millions of reads
Exact distance computation without heuristics (alignment scores)
Terabytes of data produced every week
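An exact alignment score, as opposed to a heuristic one, can be computed by dynamic programming. The sketch below implements the Smith-Waterman local alignment score with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration, since the scoring scheme actually used in the project is not given here.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between `a` and `b` (exact, no heuristics).

    Uses two rows of the dynamic programming matrix, so memory is O(len(b))
    while time remains O(len(a) * len(b)).
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                             # start a new local alignment
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,             # gap in b
                current[j - 1] + gap,          # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The quadratic cost per pair of reads, multiplied by millions of reads, is what makes this kind of exact computation demanding at scale.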
Conclusion
• Both HPC and HTC resources are increasingly needed to address life sciences data and computing challenges:
– As sequencing technologies keep evolving, data production grows faster than Moore's law and is increasingly distributed
– Biological data need to be constantly compared to each other (phylogenetics, comparative genomics analysis)
• France is developing complementary HPC and HTC infrastructures for life sciences
– Institut Français de Bioinformatique, France Génomique
– E-Biothon: an HPC platform for research in life sciences
– France Grilles: a multidisciplinary grid/cloud production infrastructure
2558 next-generation sequencers in the world
Are life sciences specific w.r.t. computing?
What is specific to life sciences:
- As sequencing technologies keep evolving, data production grows faster than Moore's law
- Biological data need to be constantly compared to each other (phylogenetics, comparative genomics analysis)
What is not specific?
- Data production is distributed
- Multiscale modeling

E-Biothon Platform Accelerates Genomics Research

  • 1. e-Biothon V. Breton (breton@clermont.in2p3.fr) LPC Clermont-Ferrand, IdGC CNRS-IN2P3 http://france-grilles.fr Credit: N. Bard, A. Franc, JF Gibrat Extreme Performance Computational Science workshop Tokyo, April 15th 2014
  • 2. Table of content 2 • What are the computing challenges of life sciences? • France Grilles: a multidisciplinarydistributede- infrastructure for science • E-Biothon: an HPC platform for research in life sciences
  • 3. Generalities on sequencing • Genome = DNA sequence (4 nucleotids: A, C, G, T) – Smallest non viral genome: Carsonellaruddii (0,16Mbp) – Largestgenome: Polychaosdubium(670Gbp)
  • 4. Sanger technology 500 bpsequences 454 technology 105reads of 450 to 600bp seq. Illumina Technology 106 reads of 100 bpseq. Currentprojects(Tara) 107reads of 100 to 400 bpseq. Explosion of data set size Data analysis ? Algorithms? Heuristics? Tara @ http://oceans.taraexpeditions.org/ Evolution of sequencing techniques
  • 5. Data production is distributed • 2558 high-throughput "next generation" sequencing facilities in the world, located in 920 centers (only 10 with more than 15 machines) • Source: omicspmaps.com
  • 7. Sequencing scenarios • Interest in a new genome requires assembly – The process of taking a large number of short DNA sequences and putting them back together to create a representation of the original – Algorithms based on read overlapping benefit from large RAM (1 TB) -> HPC • Working with a reference genome requires comparative analysis – Alignment algorithms (BLAST) find regions of local similarity between sequences – Phylogeny algorithms (PhyML) build evolutionary relationships between genomes – Comparative analyses are easily parallelized at the data level -> HTC
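The data-level parallelism mentioned in slide 7 can be sketched as follows. This is only a minimal illustration, assuming the query sequences fit in memory as plain strings and that each chunk would be handed to an independent comparison job. The function name `split_into_chunks` and the toy sequences are hypothetical, and no actual BLAST or PhyML call is made.

```python
from typing import List, Sequence


def split_into_chunks(items: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split items into at most n_chunks contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    # Never create more chunks than items; keep one (empty) chunk for empty input.
    n_chunks = min(n_chunks, len(items)) or 1
    base, extra = divmod(len(items), n_chunks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# Hypothetical toy queries; each chunk could be processed by a separate job.
queries = ["ACGT", "GGCA", "TTAG", "CCGA", "ATAT"]
chunks = split_into_chunks(queries, 3)
```

Because the chunks share no state, each one can run on a different core or grid node, which is the property that makes comparative analysis a good fit for HTC resources.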
  • 8. Summary • Life sciences have specific computational challenges – Data production grows faster than Moore's law – Permanent need to compare new data to existing ones • Life sciences needs can be relevantly addressed on multidisciplinary IT infrastructures (e-infrastructures) – HPC resources are best fitted for genome assembly – Grid/cloud HTC resources are well fitted for comparative analysis • Life sciences are among the main users of the French national grid/cloud production infrastructure
  • 9. France Grilles • Is a Scientific Interest Group… – Created in 2010 by 8 partners: CEA, CNRS, CPU, INRA, INRIA, INSERM, MESR, RENATER… – To steer and coordinate the national strategy in the fields of grids and clouds • Vision: – Build and operate a national distributed computing infrastructure open to all sciences and to developing countries
  • 10. France Grilles model • France Grilles does not own the resources – Resources are owned by user communities • France Grilles provides a framework – To share resources, expertise and know-how – To promote innovation and initiatives – To foster collaboration at national and international levels – To reach out to the long tail of users
  • 12. EGI from 2010 to 2013 • 2010-2013: from 14 regional to 34 operations centres in 53 countries, from 188,000 jobs/day with 80,000 cores on 250 Resource Centres to 1,200,000 jobs/day with 430,000 cores on 337 Resource Centres • Technologies: grids, clouds, desktops • Talk by S. Newhouse, Madrid, Sept. 2013 • France Grilles, a partner of EGI
  • 13. Provide a common framework to all user communities
  • 14. Provide an open environment for fruitful disciplinary and multidisciplinary research • Over 1500 scientific publications, June 2010 – April 2014 [chart on a logarithmic axis from 1 to 1000; individual values not recoverable]
  • 15. Virtual Imaging Platform: http://www.creatis.insa-lyon.fr/vip • Web portal • Users – 479 registered users in Nov 2013 (175 in France) – Most used robot certificate in EGI (http://go.egi.eu/wiki.robot.users) • Scientific applications – Neuro-image analysis: brain tissue segmentation with Freesurfer – Cancer therapy simulation: prostate radiotherapy plan simulated with GATE (L. Grevillot and D. Sarrut) – Image simulation: echocardiography simulated with FIELD-II (O. Bernard et al) – Modeling and optimization of distributed computing systems: acceleration yielded by non-clairvoyant task replication (R. Ferreira da Silva et al) • Infrastructure – Supported by EGI infrastructure – Uses biomed VO (most used EGI VO for life sciences in 2013) – VIP accounts for ~25% of biomed's activity – VIP consumes ~50 CPU years every month – DIRAC France-Grilles • Application as a service • File transfer to/from grid
  • 16. Collaborations with dedicated life sciences infrastructures • Institut Français de Bioinformatique (computing and storage resources at IDRIS) • France Génomique (computing and storage resources at TGCC) • France Life Imaging (infrastructure for biomedical imaging) • E-Biothon
  • 17. E-Biothon: history • Telethon: every year, fundraising by French media for the French Muscular Dystrophy Association (AFM) • From Telethon to Decrypthon – Computing infrastructure (IBM) – Research projects (CNRS) – Human resources (AFM) • From Decrypthon to E-Biothon
  • 18. e-Biothon: an HPC platform for research in life sciences • Diagram labels: User support • Blue Gene/P machines • Technical support • User support • Blue Gene/P operation • Web access portal
  • 19. E-Biothon: infrastructure • 2 IBM Blue Gene/P racks with 200 TB of storage – 2 x 1024 4-core nodes – Up to 28 TFlops peak performance • SysFera-DS web access to computing resources • 2 modes: – Standard (MPI) – HTC (1024 independent tasks in parallel)
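The peak figure on slide 19 can be cross-checked with a short calculation. The rack, node and core counts come from the slide; the 850 MHz clock and the 4 floating-point operations per cycle per core are assumptions based on the published Blue Gene/P node design and do not appear in the presentation.

```python
# Figures stated on the slide.
RACKS = 2
NODES_PER_RACK = 1024
CORES_PER_NODE = 4

# Assumptions (not on the slide): Blue Gene/P core clock and
# floating-point operations per cycle (dual FPU with fused multiply-add).
CLOCK_HZ = 850e6
FLOPS_PER_CYCLE = 4


def peak_tflops(racks: int, nodes_per_rack: int, cores_per_node: int,
                clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak performance in TFlops (no efficiency losses)."""
    total_cores = racks * nodes_per_rack * cores_per_node
    return total_cores * clock_hz * flops_per_cycle / 1e12


peak = peak_tflops(RACKS, NODES_PER_RACK, CORES_PER_NODE,
                   CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"{peak:.2f} TFlops")
```

Under these assumptions the result is just under 28 TFlops, consistent with the "up to 28 TFlops" quoted on the slide. Sustained performance on real workloads would be lower.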
  • 20. The E-Biothon vision is to offer a service to the user communities in life sciences • 2013-2014: first 3 projects – Jean-François Gibrat et al. (MIGALE platform, INRA Jouy-en-Josas) – Olivier Gascuel, Stéphane Guindon and Vincent Lefort (CNRS Montpellier) – Yec’han Laizet, Philippe Chaumeil, Jean-Marc Frigerio, Stéphanie Mariette, Sophie Gerber, Alain Franc (INRA BioGeCo, Bordeaux) • > 2014: open call for projects (IFB)
  • 21. Studying synteny over a wide range of microbial genomes • Definition: similar blocks of genes in the same relative positions in the genome • Interest: the study of synteny can show how the genome is cut and pasted in the course of evolution • The MIGALE team at INRA designed an analysis pipeline to compute synteny between 2 genomes and store it in a database • E-Biothon impact: change in scale, with the capacity to compute synteny between 2000 complete bacterial genomes (7 million comparisons)
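An all-against-all genome comparison of the kind described on slide 21 can be laid out as independent tasks, as in the sketch below. It assumes each unordered pair of genomes is one task and assigns pairs to slots round-robin; the function name `pairwise_tasks` and the genome identifiers are hypothetical. The slide does not state how its figure of 7 million comparisons is counted or how the MIGALE pipeline schedules work, so the sketch makes no attempt to reproduce either.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def pairwise_tasks(genome_ids: Sequence[str], n_slots: int) -> List[List[Pair]]:
    """Distribute all unordered genome pairs round-robin over n_slots task lists."""
    if n_slots < 1:
        raise ValueError("n_slots must be >= 1")
    slots: List[List[Pair]] = [[] for _ in range(n_slots)]
    for k, pair in enumerate(combinations(genome_ids, 2)):
        slots[k % n_slots].append(pair)
    return slots


# Hypothetical identifiers; 4 genomes give 4 * 3 / 2 = 6 unordered pairs.
ids = ["g1", "g2", "g3", "g4"]
slots = pairwise_tasks(ids, 3)
```

With `n_slots` set to the number of independent tasks the platform runs at once, every slot receives a near-equal share of the pairs and no communication between slots is needed.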
  • 22. PhyML • Phylogenetics is the study of evolutionary relationships among groups of organisms • PhyML is a software that estimates maximum likelihood phylogenies from alignments of nucleotide or amino acid sequences • The PhyML original publication in 2007 is the most cited in environment and ecology (> 6000 citations) • E-Biothon impact: change in scale in the resources made available to PhyML users
  • 24. According to botanical theory, biodiversity is organized in species, genera, families, orders: is it confirmed in the distance between sequences?
  • 25. Study of biodiversity in French Guiana • 16000 different tree species in the Amazonian forest (≈ 300 in Europe) • More biodiversity in 10000 m2 of forest in French Guiana than in Europe • Decrypthon added value – Change in scale (from the local mesocenter in Bordeaux) – Millions of reads – Exact distance computation without heuristics (alignment scores) – Terabytes of data produced every week
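To illustrate what an exact, heuristic-free alignment score looks like, here is a minimal sketch of the Smith-Waterman local alignment score with a linear gap penalty. Smith-Waterman is used here as a standard textbook example; the slides do not say which algorithm or scoring scheme the project used, and the `match`, `mismatch` and `gap` values below are arbitrary assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


score = smith_waterman_score("ACGTTGCA", "TTGC")
```

The cost is proportional to the product of the two sequence lengths for every pair of reads, which is why running it over millions of reads without heuristics calls for a change in scale in computing resources.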
  • 26. Conclusion • Both HPC and HTC resources are increasingly needed to address life sciences data and computing challenges: – As sequencing technologies keep evolving, data production grows faster than Moore's law and is increasingly distributed – Biological data need to be constantly compared to each other (phylogenetics, comparative genomic analysis) • France is developing complementary HPC and HTC infrastructures for life sciences – Institut Français de Bioinformatique, France Génomique – E-Biothon: an HPC platform for research in life sciences – France Grilles: a multidisciplinary grid/cloud production infrastructure
  • 30. Are life sciences specific w.r.t. computing? • What is specific to life sciences: – As sequencing technologies keep evolving, data production grows faster than Moore's law – Biological data need to be constantly compared to each other (phylogenetics, comparative genomic analysis) • What is not specific? – Data production is distributed – Multiscale modeling