Large scale NGS pipelines using the MOLGENIS platform: processing the Genome of the Netherlands  Morris Swertz , UMC Groningen, Netherlands and members of BBMRI-NL, NBIC, MOLGENIS BOSC 2011, July 15, Vienna
At BOSC 2010 we demonstrated the MOLGENIS software toolkit
Model (xml) –> Generator (java) –> Use (web): Animal Observatory, NextGenSeq, Mutation database, Model organisms
Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12, http://www.molgenis.org
Get stuff for free as others build it already
Connect to annotation services; plug in rich analysis tools; connect to statistics; UML documentation of your model; edit & trace your data; import/export to Excel
find.investigation()
102 downloaded
obs<-find.observedvalue(
43,920 downloaded
#some calculation
add.inferredvalue(res)
36 added
Three steps: Model –> Generate –> Use
Three steps: Model –> Generate –> Use
[Generator console log, abridged because file paths are garbled in this transcript: FormScreenGen, MenuScreenGen and PluginScreenGen report generated .java files; PluginScreenFTLTemplateGen and PluginScreenJavaTemplateGen report "Skipped because exists" for handwritten plugin files; MolgenisServletContextGen, SoapApiGen, CsvExportGen, CsvImportByNameGen and CopyMemoryToDatabaseGen report their generated files]
Real example: generates 150 files, 30k lines of Java, MySQL, CXF, Tomcat config, and R code + docs
Three steps: Model –> Generate –> Use
Currently: Towards an integrated app suite
XGAP for GWAS/GWL Disease specific databases BBMRI biobank catalogue GWAS central data manager NGS cyber infrastructure MAGE-TAB microarray AnimalDB
Motivation: GWAS revolution in human genetics
GREAT! Ankylosing Spondylitis, Celiac Disease, Crohn’s disease, Multiple Sclerosis, Psoriasis, Rheumatoid Arthritis, Systemic Lupus Erythematosus, Type 1 Diabetes, Ulcerative Colitis
BUT … these explain a small part of heritability
Missing heritability? Where might it be hiding?
However: Sequencing candidate loci implicates unknown (rare) variants
More insight into the specific genetic architecture of individual populations is crucial
First analysis of 1000G project data shows that the majority of the newly identified and rare variants are population specific (and there are no Dutch in 1000G). Durbin et al., Nature 2010
[Chart labels: common, known, new]
Idea 1: sequence 1000 independent Dutch chromosomes
Biobanks * analysis teams
Idea 2: let's impute 100,000 existing Dutch GWAS data sets
Imputation is the process of inferring missing or untyped genetic variants from typed flanking genetic variants, based on the known local linkage disequilibrium (LD) relationship.
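A toy sketch of the imputation idea, not the method used in GoNL: the reference panel, the `?` marker for untyped sites, and the majority-vote rule are all assumptions made for illustration. Production tools use statistical haplotype models rather than exact matching.

```python
from collections import Counter

# Hypothetical reference panel: each string is one sequenced haplotype over 5 sites.
# Sites 1 and 3 are in LD here: C at site 1 travels with T at site 3, and T with C.
REFERENCE_PANEL = ["ACGTA", "ACGTA", "ATGCA", "ATGCA", "ATGCA", "ACGTA"]


def impute(haplotype, panel, missing="?"):
    """Fill untyped sites using reference haplotypes that agree at all typed sites."""
    typed = [i for i, allele in enumerate(haplotype) if allele != missing]
    matches = [ref for ref in panel if all(ref[i] == haplotype[i] for i in typed)]
    if not matches:
        return haplotype  # no supporting reference haplotype, leave the gaps as-is
    filled = []
    for i, allele in enumerate(haplotype):
        if allele != missing:
            filled.append(allele)
        else:
            # Majority vote among the matching reference haplotypes.
            filled.append(Counter(ref[i] for ref in matches).most_common(1)[0][0])
    return "".join(filled)


print(impute("AC??A", REFERENCE_PANEL))
print(impute("ATG?A", REFERENCE_PANEL))
```

The typed allele at site 1 decides which reference haplotypes match, and those haplotypes supply the untyped allele at site 3. This is why a population-specific reference panel matters for imputing population-specific variants.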
GoNL: sequence 1000 independent Dutch chromosomes
Imputation of existing GWAS: ~100,000 Dutch samples with GWAS data
Further analysis: structural variation, population genetics, de novo mutations, mitochondrial DNA
This is an open national project: please contact debakker@broadinstitute.org, m.a.swertz@rug.nl, [email_address] for analysis ideas.
Challenge 1: Data storage
Challenge 2: Alignment, Variant Calling, and QC pipelines
Alignment (~1 week): alignment to human genome (Build 37); clean up alignment (mark duplicates, realignment, recalibration); quality control
Variant calling (~1 week): SNP calling; indel calling; variant filtering
QC: Immunochip concordance
2300 lanes * 15 analysis steps => 34,500 commands needed

/data/gcc/tools/bwa-0.5.8c_patched/bwa aln /data/gcc/resources/hg19/indices/human_g1k_v37.fa /data/gcc/rawdata/ngs/in-house/28may11/24173/110303_SN163_0393_L6_A80MP0ABXX_AGAGAT_1.fq.gz -t 4 -f /data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe01.bwa_align_pair1.ftl.human_g1k_v37.2011_05_30_20_22.1.sai

/data/gcc/tools/bwa_45_patched/bwa sampe -P -p illumina -i L6 -m 24173 -l A80MP0ABXX /data/gcc/resources/hg19/indices/human_g1k_v37.fa /data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe01.bwa_align_pair1.ftl.human_g1k_v37.2011_05_30_20_22.1.sai /data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe02.bwa_align_pair2.ftl.human_g1k_v37.2011_05_30_20_22.2.sai /data/gcc/rawdata/ngs/in-house/28may11/24173/110303_SN163_0393_L6_A80MP0ABXX_AGAGAT_1.fq.gz /data/gcc/rawdata/ngs/in-house/28may11/24173/110303_SN163_0393_L6_A80MP0ABXX_AGAGAT_2.fq.gz -f /data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe03.bwa_sampe.ftl.human_g1k_v37.2011_05_30_20_22.sam

java -jar -Xmx3g /data/gcc/tools/picard-tools-1.32/SamFormatConverter.jar INPUT=/data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe03.bwa_sampe.ftl.human_g1k_v37.2011_05_30_20_22.sam OUTPUT=/data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe04.sam_to_bam.ftl.human_g1k_v37.2011_05_30_20_22.bam VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=2000000 TMP_DIR=/local

java -jar -Xmx3g /data/gcc/tools/picard-tools-1.32/SortSam.jar INPUT=/data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe04.sam_to_bam.ftl.human_g1k_v37.2011_05_30_20_22.bam OUTPUT=/data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe05.sam_sort.ftl.human_g1k_v37.2011_05_30_20_22.sorted.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=1000000 TMP_DIR=/local

java -jar -Xmx3g /data/gcc/tools/picard-tools-1.32/BuildBamIndex.jar INPUT=/data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe05.sam_sort.ftl.human_g1k_v37.2011_05_30_20_22.sorted.bam OUTPUT=/data/gcc/rawdata/ngs/in-house/28may11/results/24173/24173.393_L6.HSpe05.sam_sort.ftl.human_g1k_v37.2011_05_30_20_22.sorted.bam.bai VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=1000000 TMP_DIR=/local
Challenge 3: > 200,000 compute hours
Compute power; network and storage I/O
Challenge 4: Did we analyze it all? Correctly? Completely?
Batches: UModqR 60, HUMcriR 90, HUMhxsR 222, HUMrutR 235, HUMjxbR 153, HUMsnrR 10
Kickstart the project building on NBIC/BioAssist
Solution 1: GPFS shared data storage
2,000 TB; 750 x 3 TB disks; 3200 tapes; GPFS
http://www.bbmriwiki.nl/wiki/DataManagement
http://www.rug.nl/target/index
Solution 2: data management via sample-lane worksheet
sample flowcell lane lib machine date file
A24a FC80R35ABXX L3 HUMhxsRJODIAAPE I433 101119 101119_I433_FC80R35ABXX_L3_HUMhxsRJODIAAPE
A24a FC80F2RABXX L3 HUMhxsRJODIABPE I481 101120 101120_I481_FC80F2RABXX_L3_HUMhxsRJODIABPE
A24a FC80GHKABXX L2 HUMhxsRJODIBAPE I114 101202 101202_I114_FC80GHKABXX_L2_HUMhxsRJODIBAPE
A24b FC80R35ABXX L4 HUMhxsRJPDIAAPE I433 101119 101119_I433_FC80R35ABXX_L4_HUMhxsRJPDIAAPE
A24b FC80F2RABXX L4 HUMhxsRJPDIABPE I481 101120 101120_I481_FC80F2RABXX_L4_HUMhxsRJPDIABPE
A24b FC80GHKABXX L3 HUMhxsRJPDIBAPE I114 101202 101202_I114_FC80GHKABXX_L3_HUMhxsRJPDIBAPE
A24b FC81C8UABXX L3 HUMhxsRJPDIBAPE I340 110114 110114_I340_FC81C8UABXX_L3_HUMhxsRJPDIBAPE
A24c FC80R35ABXX L5 HUMhxsRJQDIAAPE I433 101119 101119_I433_FC80R35ABXX_L5_HUMhxsRJQDIAAPE
A24c FC80F2RABXX L6 HUMhxsRJQDIABPE I481 101120 101120_I481_FC80F2RABXX_L6_HUMhxsRJQDIABPE
A24c FC80GHKABXX L4 HUMhxsRJQDIBAPE I114 101202 101202_I114_FC80GHKABXX_L4_HUMhxsRJQDIBAPE
A25a FC80R35ABXX L6 HUMhxsRJRDIAAPE I433 101119 101119_I433_FC80R35ABXX_L6_HUMhxsRJRDIAAPE
A25a FC81C8UABXX L2 HUMhxsRJRDIAAPE I340 110114 110114_I340_FC81C8UABXX_L2_HUMhxsRJRDIAAPE
A25a FC80F54ABXX L7 HUMhxsRJRDIABPE I171 101122 101122_I171_FC80F54ABXX_L7_HUMhxsRJRDIABPE
A25a FC80GHKABXX L5 HUMhxsRJRDIBAPE I114 101202 101202_I114_FC80GHKABXX_L5_HUMhxsRJRDIBAPE
A25b FC80R35ABXX L7 HUMhxsRJSDIAAPE I433 101119 101119_I433_FC80R35ABXX_L7_HUMhxsRJSDIAAPE
A25b FC80EE1ABXX L5 HUMhxsRJSDIABPE I171 101122 101122_I171_FC80EE1ABXX_L5_HUMhxsRJSDIABPE
A25b FC80GHKABXX L6 HUMhxsRJSDIBAPE I114 101202 101202_I114_FC80GHKABXX_L6_HUMhxsRJSDIBAPE
A25b FC80GHJABXX L1 HUMhxsRJSDIBAPE I117 101208 101208_I117_FC80GHJABXX_L1_HUMhxsRJSDIBAPE
A25c FC80R35ABXX L8 HUMhxsRJTDIAAPE I433 101119 101119_I433_FC80R35ABXX_L8_HUMhxsRJTDIAAPE
A25c FC80F54ABXX L5 HUMhxsRJTDIABPE I171 101122 101122_I171_FC80F54ABXX_L5_HUMhxsRJTDIABPE
A25c FC80GHKABXX L7 HUMhxsRJTDIBAPE I114 101202 101202_I114_FC80GHKABXX_L7_HUMhxsRJTDIBAPE
A25c FC81C7KABXX L5 HUMhxsRJTDIBAPE I125 110115 110115_I125_FC81C7KABXX_L5_HUMhxsRJTDIBAPE
A26a FC80PEWABXX L5 HUMhxsRJUDIAAPE I198 101120 101120_I198_FC80PEWABXX_L5_HUMhxsRJUDIAAPE
A26a FC80F2RABXX L7 HUMhxsRJUDIABPE I481 101120 101120_I481_FC80F2RABXX_L7_HUMhxsRJUDIABPE
A26a FC80GHKABXX L8 HUMhxsRJUDIBAPE I114 101202 101202_I114_FC80GHKABXX_L8_HUMhxsRJUDIBAPE
A26b FC80N58ABXX L5 HUMhxsRJVDIAAPE I245 101120 101120_I245_FC80N58ABXX_L5_HUMhxsRJVDIAAPE
A26b FC80PNWABXX L2 HUMhxsRJVDIABPE I453 101119 101119_I453_FC80PNWABXX_L2_HUMhxsRJVDIABPE
A26b FC80G37ABXX L1 HUMhxsRJVDIBAPE I127 101126 101126_I127_FC80G37ABXX_L1_HUMhxsRJVDIBAPE
A26c FC80LDLABXX L1 HUMhxsRJWDIAAPE I453 101119 101119_I453_FC80LDLABXX_L1_HUMhxsRJWDIAAPE
A26c FC80PNWABXX L3 HUMhxsRJWDIABPE I453 101119 101119_I453_FC80PNWABXX_L3_HUMhxsRJWDIABPE
A26c FC80G37ABXX L2 HUMhxsRJWDIBAPE I127 101126 101126_I127_FC80G37ABXX_L2_HUMhxsRJWDIBAPE
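A minimal sketch of how such a sample-lane worksheet could be read and sanity-checked, assuming a space-separated text file with the column names shown on the slide. The helper names (`read_worksheet`, `expected_file`, `lanes_per_sample`) are hypothetical, and the naming rule `date_machine_flowcell_lane_lib` is inferred from the rows above rather than from any documented convention.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the worksheet above; a real worksheet would be read from disk.
WORKSHEET = """\
sample flowcell lane lib machine date file
A24a FC80R35ABXX L3 HUMhxsRJODIAAPE I433 101119 101119_I433_FC80R35ABXX_L3_HUMhxsRJODIAAPE
A24a FC80F2RABXX L3 HUMhxsRJODIABPE I481 101120 101120_I481_FC80F2RABXX_L3_HUMhxsRJODIABPE
A24b FC80R35ABXX L4 HUMhxsRJPDIAAPE I433 101119 101119_I433_FC80R35ABXX_L4_HUMhxsRJPDIAAPE
"""


def read_worksheet(text):
    """Parse the worksheet into one dict per lane, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=" "))


def expected_file(row):
    """Rebuild the file name from the other columns (inferred naming rule)."""
    return "_".join([row["date"], row["machine"], row["flowcell"], row["lane"], row["lib"]])


def lanes_per_sample(rows):
    """Group lane files by sample, so each sample's lanes can be merged later."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sample"]].append(row["file"])
    return dict(grouped)


rows = read_worksheet(WORKSHEET)
mismatches = [row["file"] for row in rows if row["file"] != expected_file(row)]
print(f"{len(rows)} lanes, {len(mismatches)} file-name mismatches")
print(lanes_per_sample(rows))
```

Checking the file column against the other columns is one cheap way to catch worksheet typos before any compute time is spent.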
(of course it is a bit more advanced than that)
Solution 3: auto-generate all computational protocols
Generate scripts: 1. Create SampleLane list 2. Generate pipeline from templates 3. Submit to compute cluster
Template: bwa aln ${lane}
Generated: bwa aln FC80R35ABXX_L3.fq.gz
15 templates => 34,500 scripts
http://www.bbmriwiki.nl/svn/ngs_pipelines/templates/ngs/
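The template step can be sketched as follows. This is an illustration only: Python's `string.Template` stands in for the project's own template engine because it happens to accept the same `${lane}` placeholder, and the step name `bwa_align` and the second lane value are hypothetical.

```python
from string import Template

# One template per analysis step; the slide shows "bwa aln ${lane}" as an example.
TEMPLATES = {"bwa_align": Template("bwa aln ${lane}")}


def generate_scripts(templates, lanes):
    """Render every template for every lane, keyed by (lane, step)."""
    scripts = {}
    for lane in lanes:
        for step, template in templates.items():
            scripts[(lane, step)] = template.substitute(lane=lane)
    return scripts


lanes = ["FC80R35ABXX_L3.fq.gz", "FC80F2RABXX_L3.fq.gz"]
for command in generate_scripts(TEMPLATES, lanes).values():
    print(command)
```

With 15 templates and 2300 lanes the same loop yields 2300 * 15 = 34,500 commands, which is why the scripts are generated rather than written by hand.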
Solution 4: distributed compute efforts > 200,000 hours
RUG CIT/Target: ~900 lanes done, ~240 per week, 360 cpus
AMC/BigGrid: ~250 lanes done, ~30 per week, ~270 cpus
EMC, Hubrecht, other BigGrid
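A back-of-the-envelope sketch of what these throughput figures imply, assuming the 2300-lane total from the earlier slide and assuming both sites keep their current weekly rate. The projection is illustrative arithmetic, not a figure from the talk.

```python
TOTAL_LANES = 2300  # from the "2300 lanes * 15 analysis steps" slide

# Approximate per-site figures as given on this slide.
SITES = {
    "RUG CIT/Target": {"done": 900, "per_week": 240},
    "AMC/BigGrid": {"done": 250, "per_week": 30},
}


def weeks_remaining(total, sites):
    """Return (lanes still to do, weeks needed at the combined current rate)."""
    done = sum(site["done"] for site in sites.values())
    rate = sum(site["per_week"] for site in sites.values())
    remaining = max(total - done, 0)
    return remaining, remaining / rate


remaining, weeks = weeks_remaining(TOTAL_LANES, SITES)
print(f"{remaining} lanes remaining, about {weeks:.1f} weeks at the current rate")
```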
Solution 5: a tool to submit and monitor compute jobs
Solution 6: REST based services
http://www.molgenis.org/wiki/MolgenisRestInterface
http://www.molgenis.org/wiki/MolgenisRinterface
curl -d 'data_type_input=org.molgenis.pheno.Individual&data_input=Name,Descriptio%0AInd1,Desc1%0AInd2,Desc2&data_action=ADD&data_silent=F&submit_input=submit' http://vm7.target.rug.nl/ngs_test/api/add
source("http://a.host:8080/molgenis_ngs/api/R")
res <- find.NgsSample();
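The same form post can be prepared from Python. This is a sketch under assumptions: the form field names and URL are copied from the curl example on this slide, the helper `build_add_request` is hypothetical, and the column name `Description` is a guess at the header the slide shows as `Descriptio`. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ADD_URL = "http://vm7.target.rug.nl/ngs_test/api/add"  # URL as shown on the slide


def build_add_request(entity_type, header, rows, url=API_ADD_URL):
    """Build a form-encoded POST that adds CSV rows of the given entity type."""
    csv_text = "\n".join(",".join(fields) for fields in [header, *rows])
    form = {
        "data_type_input": entity_type,
        "data_input": csv_text,  # urlencode turns the newlines into %0A
        "data_action": "ADD",
        "data_silent": "F",
        "submit_input": "submit",
    }
    # Supplying a body makes urllib treat the request as a POST.
    return Request(url, data=urlencode(form).encode("ascii"))


request = build_add_request(
    "org.molgenis.pheno.Individual",
    ["Name", "Description"],  # assumed column names
    [["Ind1", "Desc1"], ["Ind2", "Desc2"]],
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the object to `urllib.request.urlopen` would submit it, provided the server is reachable.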
All working together (beta)
MOLGENIS user interface for NGS (Java)
Protocol catalogue (FreeMarker), e.g. bwa aln ${lane}
Lane & sample metadata and QC reports (MySQL)
MOLGENIS/compute: generate 'ProtocolApplications'; submit and monitor (GridGain)
Petabyte file storage (GPFS, GridFS?)
Compute cluster (PBS, Grid?)
Data & protocols; result exploration; test & play
Download demo from DropBox
Alignment results
Alignment (~1 week): alignment to human genome (Build 37); clean up alignment (mark duplicates, realignment, recalibration); quality control
Variant calling (~1 week): individual SNP calling; indel calling; variant filtering
>94% reads aligned; >13x avg coverage
SNP calling result (GoNL Pilot Chr20 – 1KG Phase I)
1KG estimated Chr20 Ti/Tv: 2.36
Set               SNPs     %dbSNP  Ti/Tv
GoNL Pilot only   16,045   2.05    2.20
Intersection      177,389  65.91   2.41
1KG Phase 1 only  648,284  10.23   2.36
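The overlap can be expressed as fractions of each call set. The snippet below is illustrative arithmetic on the three SNP counts from this slide; the function name `shared_fraction` is hypothetical.

```python
# Chr20 SNP counts from the slide.
GONL_ONLY = 16_045
INTERSECTION = 177_389
KG_ONLY = 648_284


def shared_fraction(only, shared):
    """Fraction of one call set (its private SNPs plus shared SNPs) that is shared."""
    return shared / (only + shared)


gonl_shared = shared_fraction(GONL_ONLY, INTERSECTION)
kg_shared = shared_fraction(KG_ONLY, INTERSECTION)
print(f"GoNL pilot SNPs also in 1KG Phase 1: {gonl_shared:.1%}")
print(f"1KG Phase 1 SNPs also in GoNL pilot: {kg_shared:.1%}")
```

Roughly nine in ten GoNL pilot SNPs on chromosome 20 are also in 1KG Phase 1, while only about one in five of the 1KG Phase 1 SNPs appear in the GoNL pilot.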
Next…
Get all as open source:
GoNL - http://www.nlgenome.nl
MOLGENIS - http://www.molgenis.org
Analysis team - http://www.bbmriwiki.nl
Contact? [email_address]

How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

D02-NextGenSeq-MOLGENIS

  • 1. Large scale NGS pipelines using the MOLGENIS platform: processing the Genome of the Netherlands. Morris Swertz, UMC Groningen, Netherlands, and members of BBMRI-NL, NBIC, MOLGENIS. BOSC 2011, July 15, Vienna
  • 2. At BOSC 2010 we demonstrated the MOLGENIS software toolkit. Use (web): Animal Observatory, NextGenSeq, Mutation database, Model organisms. Model (xml). Generator (java). Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12, http://www.molgenis.org
  • 3. Get stuff for free as others build it already: connect to annotation services; plug in rich analysis tools; connect to statistics; UML documentation of your model; edit & trace your data; import/export to Excel. find.investigation() 102 downloaded; obs<-find.observedvalue( 43,920 downloaded; #some calculation; add.inferredvalue(res) 36 added
  • 4. Three steps: Model –> Generate –> Use Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12, http://www.molgenis.org
  • 5. Three steps: Model –> Generate –> Use
       9200 INFO [FormScreenGen] generated generatedavaicreenopMenuainrotocolsForm.java
       9293 INFO [FormScreenGen] generated generatedavaicreenopMenuainrotocolsrotocolMenuarametersForm.java
       9325 INFO [FormScreenGen] generated generatedavaicreenopMenuainrotocolsrotocolMenurotocolComponentsForm.java
       9496 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesntologyTermsForm.java
       9528 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesntologySourcesForm.java
       9606 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesntologySourcesntologyTermsForm.java
       9638 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesodeListsForm.java
       9700 INFO [FormScreenGen] generated generatedavaicreenopMenuainntologiesodeListsodesForm.java
       9965 INFO [MenuScreenGen] generated generatedavaicreenopMenuMenu.java
       10012 INFO [MenuScreenGen] generated generatedavaicreenopMenuainMenu.java
       10059 INFO [MenuScreenGen] generated generatedavaicreenopMenuainnvestigationsnvestigationMenuMenu.java
       10152 INFO [MenuScreenGen] generated generatedavaicreenopMenuainnvestigationsnvestigationMenurotocolApplicationsrotocolApplicationMenuMenu.java
       10230 INFO [MenuScreenGen] generated generatedavaicreenopMenuainbservationTargetsMenu.java
       10293 INFO [MenuScreenGen] generated generatedavaicreenopMenuainrotocolsrotocolMenuMenu.java
       10324 INFO [MenuScreenGen] generated generatedavaicreenopMenuainntologiesMenu.java
       11354 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuaineportPlugin.java
       11557 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuainntologiesntologyManagerPlugin.java
       11604 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuodel_documentationPlugin.java
       11604 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuprojectApiPlugin.java
       11620 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuttpApiPlugin.java
       11635 INFO [PluginScreenGen] generated Molgenis33Workspaceolgenis4phenotypeeneratedavaicreenopMenuebServicesApiPlugin.java
       11651 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavalugineportnvestigationOverview.ftl
       11807 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginntologyBrowserntologyBrowserPlugin.ftl
       11807 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuocumentationScreen.ftl
       11807 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuprojectApiScreen.ftl
       11823 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuttpAPiScreen.ftl
       11823 WARN [PluginScreenFTLTemplateGen] Skipped because exists: handwrittenavaluginopmenuoapApiScreen.ftl
       11854 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavalugineportnvestigationOverview.java
       12057 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginntologyBrowserntologyBrowserPlugin.java
       12072 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuocumentationScreen.java
       12088 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuprojectApiScreen.java
       12088 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuttpAPiScreen.java
       12088 WARN [PluginScreenJavaTemplateGen] Skipped because exists: handwrittenavaluginopmenuoapApiScreen.java
       12103 INFO [MolgenisServletContextGen] generated WebContentETA-INFontext.xml
       12259 INFO [SoapApiGen] generated generatedavaioapApi.java
       12353 INFO [CsvExportGen] generated generatedavaoolssvExport.java
       12431 INFO [CsvImportByNameGen] generated generatedavaoolssvImportByName.java
       12636 INFO [CopyMemoryToDatabaseGen] generated generatedavaioolsopyMemoryToDatabase.java
       Real example: Generates 150 files, 30k lines of Java, MySQL, CXF, Tomcat config, and R code + docs
  • 6. Three steps: Model –> Generate –> Use Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12, http://www.molgenis.org
  • 7. Currently: Towards an integrated app suite XGAP for GWAS/GWL Disease specific databases BBMRI biobank catalogue GWAS central data manager NGS cyber infrastructure MAGE-TAB microarray AnimalDB Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12, http://www.molgenis.org
  • 8.
  • 9.
  • 10. Motivation: GWAS revolution in human genetics
  • 11. Motivation: GWAS revolution in human genetics
  • 12. Motivation: GWAS revolution in human genetics
  • 13. Motivation: GWAS revolution in human genetics
  • 14. Motivation: GWAS revolution in human genetics
  • 15. GREAT! Ankylosing Spondylitis Celiac Disease Crohn’s disease Multiple Sclerosis Psoriasis Rheumatoid Arthritis Systemic Lupus Erythematosus Type 1 Diabetes Ulcerative Colitis
  • 16. BUT … these explain a small part of heritability
  • 17. Missing heritability? Where might it be hiding?
  • 18. However: Sequencing candidate loci implicates unknown (rare) variants
  • 19. More insight into the specific genetic architecture of individual populations is crucial First analysis of 1000G project data Durbin et al., Nature 2010 common known
  • 20. More insight into the specific genetic architecture of individual populations is crucial First analysis of 1000G project data shows that the majority of the newly identified and rare variants are population specific (and there are no Dutch in 1000G) Durbin et al., Nature 2010 common known new
  • 21.
  • 22. Idea 2: let's impute 100,000 existing Dutch GWAS data. Imputation is the process of inferring any missing or untyped genetic variants from typed flanking genetic variants, based on the known local LD relationship. GWAS data
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30. Challenge 2: Alignment, Variant Calling, and QC pipelines. Alignment (~ 1 week): alignment to human genome (Build 37); clean up alignment (mark duplicates, realignment, recalibration); quality control. Variant calling (~ 1 week): SNP calling; indel calling; variant filtering. QC: Immunochip concordance
  • 31.
  • 32.
  • 33. Challenge 4: Did we analyze it all? Correctly? Completely? Batches: UModqR 60, HUMcriR 90, HUMhxsR 222, HUMrutR 235, HUMjxbR 153, HUMsnrR 10
  • 34.
  • 35.
  • 36.
  • 37. Solution 2: data management via sample-lane worksheet
       sample flowcell    lane lib             machine date   file
       A24a   FC80R35ABXX L3   HUMhxsRJODIAAPE I433    101119 101119_I433_FC80R35ABXX_L3_HUMhxsRJODIAAPE
       A24a   FC80F2RABXX L3   HUMhxsRJODIABPE I481    101120 101120_I481_FC80F2RABXX_L3_HUMhxsRJODIABPE
       A24a   FC80GHKABXX L2   HUMhxsRJODIBAPE I114    101202 101202_I114_FC80GHKABXX_L2_HUMhxsRJODIBAPE
       A24b   FC80R35ABXX L4   HUMhxsRJPDIAAPE I433    101119 101119_I433_FC80R35ABXX_L4_HUMhxsRJPDIAAPE
       A24b   FC80F2RABXX L4   HUMhxsRJPDIABPE I481    101120 101120_I481_FC80F2RABXX_L4_HUMhxsRJPDIABPE
       A24b   FC80GHKABXX L3   HUMhxsRJPDIBAPE I114    101202 101202_I114_FC80GHKABXX_L3_HUMhxsRJPDIBAPE
       A24b   FC81C8UABXX L3   HUMhxsRJPDIBAPE I340    110114 110114_I340_FC81C8UABXX_L3_HUMhxsRJPDIBAPE
       A24c   FC80R35ABXX L5   HUMhxsRJQDIAAPE I433    101119 101119_I433_FC80R35ABXX_L5_HUMhxsRJQDIAAPE
       A24c   FC80F2RABXX L6   HUMhxsRJQDIABPE I481    101120 101120_I481_FC80F2RABXX_L6_HUMhxsRJQDIABPE
       A24c   FC80GHKABXX L4   HUMhxsRJQDIBAPE I114    101202 101202_I114_FC80GHKABXX_L4_HUMhxsRJQDIBAPE
       A25a   FC80R35ABXX L6   HUMhxsRJRDIAAPE I433    101119 101119_I433_FC80R35ABXX_L6_HUMhxsRJRDIAAPE
       A25a   FC81C8UABXX L2   HUMhxsRJRDIAAPE I340    110114 110114_I340_FC81C8UABXX_L2_HUMhxsRJRDIAAPE
       A25a   FC80F54ABXX L7   HUMhxsRJRDIABPE I171    101122 101122_I171_FC80F54ABXX_L7_HUMhxsRJRDIABPE
       A25a   FC80GHKABXX L5   HUMhxsRJRDIBAPE I114    101202 101202_I114_FC80GHKABXX_L5_HUMhxsRJRDIBAPE
       A25b   FC80R35ABXX L7   HUMhxsRJSDIAAPE I433    101119 101119_I433_FC80R35ABXX_L7_HUMhxsRJSDIAAPE
       A25b   FC80EE1ABXX L5   HUMhxsRJSDIABPE I171    101122 101122_I171_FC80EE1ABXX_L5_HUMhxsRJSDIABPE
       A25b   FC80GHKABXX L6   HUMhxsRJSDIBAPE I114    101202 101202_I114_FC80GHKABXX_L6_HUMhxsRJSDIBAPE
       A25b   FC80GHJABXX L1   HUMhxsRJSDIBAPE I117    101208 101208_I117_FC80GHJABXX_L1_HUMhxsRJSDIBAPE
       A25c   FC80R35ABXX L8   HUMhxsRJTDIAAPE I433    101119 101119_I433_FC80R35ABXX_L8_HUMhxsRJTDIAAPE
       A25c   FC80F54ABXX L5   HUMhxsRJTDIABPE I171    101122 101122_I171_FC80F54ABXX_L5_HUMhxsRJTDIABPE
       A25c   FC80GHKABXX L7   HUMhxsRJTDIBAPE I114    101202 101202_I114_FC80GHKABXX_L7_HUMhxsRJTDIBAPE
       A25c   FC81C7KABXX L5   HUMhxsRJTDIBAPE I125    110115 110115_I125_FC81C7KABXX_L5_HUMhxsRJTDIBAPE
       A26a   FC80PEWABXX L5   HUMhxsRJUDIAAPE I198    101120 101120_I198_FC80PEWABXX_L5_HUMhxsRJUDIAAPE
       A26a   FC80F2RABXX L7   HUMhxsRJUDIABPE I481    101120 101120_I481_FC80F2RABXX_L7_HUMhxsRJUDIABPE
       A26a   FC80GHKABXX L8   HUMhxsRJUDIBAPE I114    101202 101202_I114_FC80GHKABXX_L8_HUMhxsRJUDIBAPE
       A26b   FC80N58ABXX L5   HUMhxsRJVDIAAPE I245    101120 101120_I245_FC80N58ABXX_L5_HUMhxsRJVDIAAPE
       A26b   FC80PNWABXX L2   HUMhxsRJVDIABPE I453    101119 101119_I453_FC80PNWABXX_L2_HUMhxsRJVDIABPE
       A26b   FC80G37ABXX L1   HUMhxsRJVDIBAPE I127    101126 101126_I127_FC80G37ABXX_L1_HUMhxsRJVDIBAPE
       A26c   FC80LDLABXX L1   HUMhxsRJWDIAAPE I453    101119 101119_I453_FC80LDLABXX_L1_HUMhxsRJWDIAAPE
       A26c   FC80PNWABXX L3   HUMhxsRJWDIABPE I453    101119 101119_I453_FC80PNWABXX_L3_HUMhxsRJWDIABPE
       A26c   FC80G37ABXX L2   HUMhxsRJWDIBAPE I127    101126 101126_I127_FC80G37ABXX_L2_HUMhxsRJWDIBAPE
  • 38.
  • 39.
  • 40.
  • 41. Solution 5: a tool to submit and monitor compute jobs
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47. Alignment results. Alignment (~ 1 week): alignment to human genome (Build 37); clean up alignment (mark duplicates, realignment, recalibration); quality control. Variant calling (~ 1 week): individual SNP calling; indel calling; variant filtering. Results: >94% reads aligned, >13x avg coverage
  • 48. SNP calling result (GoNL Pilot Chr20 – 1KG Phase I)
       Set               SNPs     %dbSNP  Ti/Tv
       GoNL Pilot only   16,045   2.05    2.20
       1KG Phase 1 only  648,284  10.23   2.36
       Intersection      177,389  65.91   2.41
       1KG estimated Chr20 Ti/Tv: 2.36
  • 49.
  • 50.
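
The sample-lane worksheet on slide 37 suggests that each `file` value is the row's other fields joined with underscores in the order date, machine, flowcell, lane, lib. The following is a minimal Python sketch of that convention, not the MOLGENIS implementation: `LaneRecord`, `parse_file_name`, and `lanes_per_sample` are hypothetical names introduced here for illustration, and the field order is an assumption read off the rows shown on the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class LaneRecord(NamedTuple):
    """One worksheet row; field names follow the slide's column headers."""
    sample: str
    flowcell: str
    lane: str
    lib: str
    machine: str
    date: str

    def file_name(self) -> str:
        # Assumed order, as seen in the slide's `file` column:
        # date_machine_flowcell_lane_lib
        return "_".join((self.date, self.machine, self.flowcell, self.lane, self.lib))


def parse_file_name(sample: str, file_name: str) -> LaneRecord:
    """Split a worksheet file name back into its five underscore-separated fields."""
    parts = file_name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {file_name!r}")
    date, machine, flowcell, lane, lib = parts
    return LaneRecord(sample, flowcell, lane, lib, machine, date)


def lanes_per_sample(records):
    """Group file names by sample so all lanes of one sample can be handled together."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.sample].append(rec.file_name())
    return dict(grouped)


if __name__ == "__main__":
    # Two rows copied from the slide 37 worksheet.
    rows = [
        parse_file_name("A24a", "101119_I433_FC80R35ABXX_L3_HUMhxsRJODIAAPE"),
        parse_file_name("A24a", "101120_I481_FC80F2RABXX_L3_HUMhxsRJODIABPE"),
    ]
    for sample, files in lanes_per_sample(rows).items():
        print(sample, files)
```

Keeping one record per sample-lane combination means a sample sequenced on several flowcells appears as several rows, so grouping by sample gives the set of lane files that belong to it.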

Editor's Notes

  1. Phase information; accurate haplotypes. Better characterization of structural variation. Detection of de novo variants and new mutation rates.