SlideShare uma empresa Scribd logo
1 de 16
QBI’s Centre for Brain Genomics The informatics side of things [Sprengben [why not get a friend]] September 8, 2011
Objective of QBI’s Centre for Brain genomics On-time delivery Reliable data production Convincing data Easy delivery Perkel JM. Coding your way out of a problem. Nat Methods. 2011 Jun PMID: 21716280.
Birdseye view of facility’s workflow September 8, 2011
Detailed workflow September 8, 2011 Cbot HiSeq 30 diff.  programs CASAVA Raw sequence reads projects flowcell HiSeq cluster cluster
Overview of Production Informatics framework September 8, 2011 Automatic Manual Processing                             Evaluation Run/ Data/ MakeFastq.sh trigger.sh  armed trigger.sh  html Unaligned/ bwa/, reCaAl/, variant/ Summary.html //clusterstorage Apache, IGV, R, UCSC //cluster-vm
Trigger.sh September 8, 2011 Keeping data separate from scripts Automating verification, quality control and summary HTML generation Rerunning pipeline from every point
Flexible generic names: header #Programs BWA="/clusterdata/hiseq_apps/bin/$MODE/bwa" SAMTOOLS="/clusterdata/hiseq_apps/bin/$MODE/samtools" IGVTOOLS="/clusterdata/hiseq_apps/bin/$MODE/igvtools/IGVTools/igvtools.jar” # Task names TASKFASTQC="fastQC" TASKBWA="bwa" TASKRCA="reCalAln” #Fileabb READONE="read1" READTWO="read2" FASTQ="fastq.gz" ALN="aln" # aligned  September 8, 2011
Config.txt September 8, 2011 #******************** # Tasks #******************** mappingBWA="1"  recalibrateQualScore="1"  #******************** # Paths #******************** FASTA="/clusterdata/resources/hg19/hg19.fasta"  SEQREG=chr1:229994688-230071581" DBSNP="/clusterdata/resources/hg19/snpdb132.vcf"  #******************** # PARAMETER #******************** LIBRARY="QBI” ADDPARAMBWA=“--force single”  Specifics what to do, e.g. mapping and recalibration  Specifics where to find resources  Customizes stanardsripts for this project
call trigger.shconfig.txtarmed trigger.shconfig.txthtml September 8, 2011 s_1_read1.fastq s_1_read2.fastq s_2_read1.fastq s_2_read2.fastq s_3_read1.fastq s_3_read2.fastq s_4_read1.fastq s_4_read2.fastq s_1.bam s_2.bam s_1.ashrr.bam s_2.ashrr.bam s_3.bam s_4.bam s_3.ashrr.bam s_4.ashrr.bam Sub1_s_1.out Sub1_s_2.out Sub2_s_3.out Sub2_s_4.out Sub1_s_1.out Sub1_s_2.out Sub2_s_3.out Sub2_s_4.out
Summary.html Project Cards September 8, 2011 Sequence statistics Run check  points Data Visualization Mapping stats Download Interesting Regions
Scaffold of pbsScripts.sh: Error catching September 8, 2011 Code example for setting up what errors to look out for # QCVARIABLES, loosing reads, unmapped read,no such file,file not found,bwa.sh: line Output in Summary.html >>>>>>>>>> Errors QC_PASS .. 0 have We are loosing reads/184 QC_PASS .. 0 have for unmapped read/184 QC_PASS .. 0 have no such file/184 QC_PASS .. 0 have file not found/184 QC_PASS .. 0 have bwa.sh: line/184
Scaffold of pbsScripts.sh: checkpoints September 8, 2011 qsub -by -jy [PBSOPTIONS] pbsScript.sh -k HISEQINF [PARAMETERS] Code example for setting up checkpoints in the pbsScript.sh echo “********* mapping” $BWA aln -t $THREADS $FASTA $f > $OUT/${n/$FASTQ/sai} $BWA aln -t $THREADS $FASTA ${f/$READONE/$READTWO} > $OUT/${n/$READONE.$FASTQ/$READTWO.sai} Output in Summary.html >>>>>>>>>> CheckPoints QC_PASS .. 184 have mapping/184 QC_PASS .. 184 have sorting and bam-conversion/184 QC_PASS .. 184 have mark duplicates/184 QC_PASS .. 184 have statistics/184 QC_PASS .. 184 have coverage track/184
Availability: tailored to skills 1 2 3 Website  RStudio Command line
The big picture Covering all aspects of: design*, set-up*, maintenance*, usage  (*except cluster) Documentation: Project Server //project 5 TB raw data 750 GB processed data 57 GB external data 7 project-cards 10 Projects, 6 HiSeq-Runs  40 wiki pages, 250 Tasks, 551h logged 160 Commits 35 external programs 41 custom scripts (4197 lines of code) Application Backup/Version Control Data Warehousing Statistic  Analysis HiSeq Output RSudio Raw Data Quality Control Project Cards Processed Data Processed Data Rsync Hypothesis Generation Software BWA, GATK, samtools, etc. Custom Scripts Custom Scripts Version Control Data Processing and Analysis External Genomic Resources Cluster Genomes, Annotation, etc. Project Server Content Galaxy Visualization IGV Genome Browser //cluster-vm //clusterstorage //groupshare, //ethan
Three things to remember Reliable data production Projects have all a similar structure and are processed in the same way Convincing data All steps are tightly quality controlled and the QC report is accessible Easy delivery We tailored data availability to skill-levels (webpage, Rstudio, console On time delivery Production informatics has priority on the cluster September 8, 2011 ( )
Next week NGS Discussion group:  Methylation analysis 	Kevin Dudley and Danay Baker-Andresen September 8, 2011

Mais conteúdo relacionado

Destaque

Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Reid Robison
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedSri Ambati
 
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Wesley De Neve
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive
 

Destaque (6)

Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
 
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
 
Machine learning
Machine learningMachine learning
Machine learning
 

Semelhante a Qbi Centre for Brain genomics (Informatics side)

Django - Python MVC Framework
Django - Python MVC FrameworkDjango - Python MVC Framework
Django - Python MVC FrameworkBala Kumar
 
Jboss World 2011 Infinispan
Jboss World 2011 InfinispanJboss World 2011 Infinispan
Jboss World 2011 Infinispancbo_
 
Shibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - InstallfestShibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - InstallfestJISC.AM
 
Intro To Mvc Development In Php
Intro To Mvc Development In PhpIntro To Mvc Development In Php
Intro To Mvc Development In Phpfunkatron
 
Instrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con GitlabInstrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con GitlabSoftware Guru
 
Deploy Rails Application by Capistrano
Deploy Rails Application by CapistranoDeploy Rails Application by Capistrano
Deploy Rails Application by CapistranoTasawr Interactive
 
Scaling up development of a modular code base
Scaling up development of a modular code baseScaling up development of a modular code base
Scaling up development of a modular code baseRobert Munteanu
 
Buying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twiceBuying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twiceAl Zindiq
 
Internet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian ThilmanyInternet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian ThilmanyChristian Thilmany
 
Workshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfastWorkshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfastMichelangelo van Dam
 
Create a web-app with Cgi Appplication
Create a web-app with Cgi AppplicationCreate a web-app with Cgi Appplication
Create a web-app with Cgi Appplicationolegmmiller
 
Service Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMixService Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMixBruce Snyder
 
Introduction To ASP.NET MVC
Introduction To ASP.NET MVCIntroduction To ASP.NET MVC
Introduction To ASP.NET MVCAlan Dean
 
Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013Michelangelo van Dam
 
Augustus Overview Open Source Analytics
Augustus Overview  Open Source AnalyticsAugustus Overview  Open Source Analytics
Augustus Overview Open Source Analyticsjtrussell
 
Front End Website Optimization
Front End Website OptimizationFront End Website Optimization
Front End Website OptimizationGerard Sychay
 
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020data://disrupted®
 

Semelhante a Qbi Centre for Brain genomics (Informatics side) (20)

Django - Python MVC Framework
Django - Python MVC FrameworkDjango - Python MVC Framework
Django - Python MVC Framework
 
Jboss World 2011 Infinispan
Jboss World 2011 InfinispanJboss World 2011 Infinispan
Jboss World 2011 Infinispan
 
Shibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - InstallfestShibboleth 2.0 SP slides - Installfest
Shibboleth 2.0 SP slides - Installfest
 
Bioinformatica 10-11-2011-p6-bioperl
Bioinformatica 10-11-2011-p6-bioperlBioinformatica 10-11-2011-p6-bioperl
Bioinformatica 10-11-2011-p6-bioperl
 
Intro To Mvc Development In Php
Intro To Mvc Development In PhpIntro To Mvc Development In Php
Intro To Mvc Development In Php
 
Instrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con GitlabInstrumentación de entrega continua con Gitlab
Instrumentación de entrega continua con Gitlab
 
Deploy Rails Application by Capistrano
Deploy Rails Application by CapistranoDeploy Rails Application by Capistrano
Deploy Rails Application by Capistrano
 
Scaling up development of a modular code base
Scaling up development of a modular code baseScaling up development of a modular code base
Scaling up development of a modular code base
 
Buying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twiceBuying a Ferrari for your teenager? You may want to think twice
Buying a Ferrari for your teenager? You may want to think twice
 
Internet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian ThilmanyInternet Explorer 8 for Developers by Christian Thilmany
Internet Explorer 8 for Developers by Christian Thilmany
 
Workshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfastWorkshop quality assurance for php projects - phpbelfast
Workshop quality assurance for php projects - phpbelfast
 
Create a web-app with Cgi Appplication
Create a web-app with Cgi AppplicationCreate a web-app with Cgi Appplication
Create a web-app with Cgi Appplication
 
Service Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMixService Oriented Integration With ServiceMix
Service Oriented Integration With ServiceMix
 
Beyond Unit Testing
Beyond Unit TestingBeyond Unit Testing
Beyond Unit Testing
 
Introduction To ASP.NET MVC
Introduction To ASP.NET MVCIntroduction To ASP.NET MVC
Introduction To ASP.NET MVC
 
Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013Workshop quality assurance for php projects - ZendCon 2013
Workshop quality assurance for php projects - ZendCon 2013
 
Augustus Overview Open Source Analytics
Augustus Overview  Open Source AnalyticsAugustus Overview  Open Source Analytics
Augustus Overview Open Source Analytics
 
Php frameworks
Php frameworksPhp frameworks
Php frameworks
 
Front End Website Optimization
Front End Website OptimizationFront End Website Optimization
Front End Website Optimization
 
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
 

Mais de Denis C. Bauer

Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Denis C. Bauer
 
Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteDenis C. Bauer
 
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataDenis C. Bauer
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisDenis C. Bauer
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingDenis C. Bauer
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysisDenis C. Bauer
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expressionDenis C. Bauer
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseqDenis C. Bauer
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variantsDenis C. Bauer
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Denis C. Bauer
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Denis C. Bauer
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencingDenis C. Bauer
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to BioinformaticsDenis C. Bauer
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runsDenis C. Bauer
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDenis C. Bauer
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site predictionDenis C. Bauer
 

Mais de Denis C. Bauer (20)

Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research
 
Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynote
 
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysis
 
Trip Report Seattle
Trip Report SeattleTrip Report Seattle
Trip Report Seattle
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome Sequencing
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysis
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expression
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseq
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variants
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runs
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genome
 
ReliF
ReliFReliF
ReliF
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site prediction
 

Último

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Qbi Centre for Brain genomics (Informatics side)

  • 1. QBI’s Centre for Brain Genomics The informatics side of things [Sprengben [why not get a friend]] September 8, 2011
  • 2. Objective of QBI’s Centre for Brain genomics On-time delivery Reliable data production Convincing data Easy delivery Perkel JM. Coding your way out of a problem. Nat Methods. 2011 Jun PMID: 21716280.
  • 3. Birdseye view of facility’s workflow September 8, 2011
  • 4. Detailed workflow September 8, 2011 Cbot HiSeq 30 diff. programs CASAVA Raw sequence reads projects flowcell HiSeq cluster cluster
  • 5. Overview of Production Informatics framework September 8, 2011 Automatic Manual Processing Evaluation Run/ Data/ MakeFastq.sh trigger.sh armed trigger.sh html Unaligned/ bwa/, reCaAl/, variant/ Summary.html //clusterstorage Apache, IGV, R, UCSC //cluster-vm
  • 6. Trigger.sh September 8, 2011 Keeping data separate from scripts Automating verification, quality control and summary HTML generation Rerunning pipeline from every point
  • 7. Flexible generic names: header #Programs BWA="/clusterdata/hiseq_apps/bin/$MODE/bwa" SAMTOOLS="/clusterdata/hiseq_apps/bin/$MODE/samtools" IGVTOOLS="/clusterdata/hiseq_apps/bin/$MODE/igvtools/IGVTools/igvtools.jar” # Task names TASKFASTQC="fastQC" TASKBWA="bwa" TASKRCA="reCalAln” #Fileabb READONE="read1" READTWO="read2" FASTQ="fastq.gz" ALN="aln" # aligned September 8, 2011
  • 8. Config.txt September 8, 2011 #******************** # Tasks #******************** mappingBWA="1" recalibrateQualScore="1" #******************** # Paths #******************** FASTA="/clusterdata/resources/hg19/hg19.fasta" SEQREG=chr1:229994688-230071581" DBSNP="/clusterdata/resources/hg19/snpdb132.vcf" #******************** # PARAMETER #******************** LIBRARY="QBI” ADDPARAMBWA=“--force single” Specifics what to do, e.g. mapping and recalibration Specifics where to find resources Customizes stanardsripts for this project
  • 9. call trigger.shconfig.txtarmed trigger.shconfig.txthtml September 8, 2011 s_1_read1.fastq s_1_read2.fastq s_2_read1.fastq s_2_read2.fastq s_3_read1.fastq s_3_read2.fastq s_4_read1.fastq s_4_read2.fastq s_1.bam s_2.bam s_1.ashrr.bam s_2.ashrr.bam s_3.bam s_4.bam s_3.ashrr.bam s_4.ashrr.bam Sub1_s_1.out Sub1_s_2.out Sub2_s_3.out Sub2_s_4.out Sub1_s_1.out Sub1_s_2.out Sub2_s_3.out Sub2_s_4.out
  • 10. Summary.html Project Cards September 8, 2011 Sequence statistics Run check points Data Visualization Mapping stats Download Interesting Regions
  • 11. Scaffold of pbsScripts.sh: Error catching September 8, 2011 Code example for setting up what errors to look out for # QCVARIABLES, loosing reads, unmapped read,no such file,file not found,bwa.sh: line Output in Summary.html >>>>>>>>>> Errors QC_PASS .. 0 have We are loosing reads/184 QC_PASS .. 0 have for unmapped read/184 QC_PASS .. 0 have no such file/184 QC_PASS .. 0 have file not found/184 QC_PASS .. 0 have bwa.sh: line/184
  • 12. Scaffold of pbsScripts.sh: checkpoints September 8, 2011 qsub -by -jy [PBSOPTIONS] pbsScript.sh -k HISEQINF [PARAMETERS] Code example for setting up checkpoints in the pbsScript.sh echo “********* mapping” $BWA aln -t $THREADS $FASTA $f > $OUT/${n/$FASTQ/sai} $BWA aln -t $THREADS $FASTA ${f/$READONE/$READTWO} > $OUT/${n/$READONE.$FASTQ/$READTWO.sai} Output in Summary.html >>>>>>>>>> CheckPoints QC_PASS .. 184 have mapping/184 QC_PASS .. 184 have sorting and bam-conversion/184 QC_PASS .. 184 have mark duplicates/184 QC_PASS .. 184 have statistics/184 QC_PASS .. 184 have coverage track/184
  • 13. Availability: tailored to skills 1 2 3 Website RStudio Command line
  • 14. The big picture Covering all aspects of: design*, set-up*, maintenance*, usage (*except cluster) Documentation: Project Server //project 5 TB raw data 750 GB processed data 57 GB external data 7 project-cards 10 Projects, 6 HiSeq-Runs 40 wiki pages, 250 Tasks, 551h logged 160 Commits 35 external programs 41 custom scripts (4197 lines of code) Application Backup/Version Control Data Warehousing Statistic Analysis HiSeq Output RSudio Raw Data Quality Control Project Cards Processed Data Processed Data Rsync Hypothesis Generation Software BWA, GATK, samtools, etc. Custom Scripts Custom Scripts Version Control Data Processing and Analysis External Genomic Resources Cluster Genomes, Annotation, etc. Project Server Content Galaxy Visualization IGV Genome Browser //cluster-vm //clusterstorage //groupshare, //ethan
  • 15. Three things to remember Reliable data production Projects have all a similar structure and are processed in the same way Convincing data All steps are tightly quality controlled and the QC report is accessible Easy delivery We tailored data availability to skill-levels (webpage, Rstudio, console On time delivery Production informatics has priority on the cluster September 8, 2011 ( )
  • 16. Next week NGS Discussion group: Methylation analysis Kevin Dudley and Danay Baker-Andresen September 8, 2011

Notas do Editor

  1. http://www.haynesboone.com/files/ImageControl/64f36756-3f0f-4254-b7bb-d9b447ae14d5/c8cd574b-4e35-4071-8a35-007febd928ee/Presentation/Image/mainImage_perspective.jpg