Enabling Large Scale Sequencing Studies through Science as a Service (ScaaS) Justin H. Johnson Director of Bioinformatics EdgeBio Washington DC, USA
Agenda Who We Are NGS at 30K Challenges and Enabling Through ScaaS Transcriptome Projects Exome Projects Ion Torrent Data
Life Tech Service Provider
Contract Research Division Five SOLiD4 sequencing platforms One Life Technologies 5500XL Two Ion Torrent PGMs Automation through Caliper Sciclone & Biomek FX Life Technologies Preferred Service Provider Agilent Certified Service Provider Commercial partnerships with companies such as CLCBio, DNANexus and Genologics MD/PhD & Masters Level Scientists and Bioinformaticians IT Infrastructure of >100 CPUs and >100TB storage
Edge BioServ Scientific Advisory Board
Elaine Mardis, Ph.D. Co-Director, Genome Sequencing Center, Washington University School of Medicine
Sam Levy, Ph.D. Director of Genome Sciences, Scripps Translational Science Institute, Scripps Genomic Medicine
Michael Zody, M.S. Chief Technologist, Broad Institute
Ken Dewar, Ph.D. Assistant Professor, McGill University and Genome Quebec
Steven Salzberg, Ph.D. Director, Center for Bioinformatics and Computational Biology, University of Maryland
Gabor Marth, Ph.D. Professor of Bioinformatics, Boston College
Elliott Margulies, Ph.D. Investigator, Genome Informatics Section, National Human Genome Research Institute, National Institutes of Health
Machines and Vendors GnuBio
Obligatory NGS Exponential Growth Slide Nature Biotechnology Volume 26 Number 10 October 2008
Ultra High Throughput + Lower Cost = Broader Applications
Experimental Design Considerations
Choice of Library Construction
Depth of coverage
Re$ources
Number of Replicates
Number of Samples and Control
Etc…
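The depth-of-coverage and resource items above are linked by the usual back-of-envelope estimate, mean depth = reads × read length / target size. The sketch below only illustrates that arithmetic: the function names and the example numbers are assumptions made for illustration, not figures from the talk, and the estimate ignores off-target reads, duplicates, and uneven coverage.

```python
import math


def expected_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Mean depth of coverage, C = N * L / G (assumes every read lands on target)."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


def reads_needed(coverage: float, read_length: int, target_size: int) -> int:
    """Number of reads required to reach a mean depth, rounded up."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * target_size / read_length)


if __name__ == "__main__":
    # Hypothetical design: 50 bp reads, 50 Mb target, 30x mean depth.
    n = reads_needed(30, 50, 50_000_000)
    print(f"reads needed: {n:,}")
    print(f"depth check: {expected_coverage(n, 50, 50_000_000):.1f}x")
```

In practice the read budget would be padded for mapping rate and on-target fraction, which depend on the library construction chosen.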
Flexibility with Standards and Scale
Then (CE) – The Norm: 10 machines, 30–360 days, 1 project
Now (Illumina/SOLiD/454) – Scale: 1 machine, 14 days, 30 projects
Now (Ion Torrent) – Flexibility: 1 machine, 1 day, 1 project
Future (CLCBio, Nexus, Open Source): Standardization of analysis
Partial List of Mappers
* BFAST - Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
* Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed (BWT) index. Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
* BWA - Heng Li's BWT alignment program, a progression from MAQ. BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence. C++ source.
* ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.
* Exonerate - Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
* GenomeMapper - A short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads with either ungapped or gapped alignments. A tool created by the 1001 Genomes project. Source for POSIX.
* GMAP - Genomic Mapping and Alignment Program for mRNA and EST sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.
* gnumap - The Genomic Next-generation Universal MAPper is designed to accurately map sequence data obtained from next-generation sequencing machines (specifically Solexa/Illumina) back to a genome of any size. It seeks to align reads from nonunique repeats using statistics. From authors at Brigham Young University. C source/Unix.
* MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Designed particularly for Illumina, with preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. Features extensive supporting tools for DIP/SNP detection, etc. C++ source.
* MOSAIK - Produces gapped alignments using the Smith-Waterman algorithm. Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX.
* MrFAST and MrsFAST - Designed to map short reads generated with the Illumina platform to reference genome assemblies in a fast and memory-efficient manner. Robust to INDELs, and MrsFAST has a bisulphite mode. Authors are from the University of Washington. C source.
* MUMmer - A modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg, most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.
* Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.
* PASS - Supports Illumina, SOLiD and Roche-FLX data formats and allows the user to modulate very finely the sensitivity of the alignments. Spaced seed initial filter, then NW dynamic algorithm to a SW(like) local alignment. Authors are from CRIBI in Italy. Win/Linux.
* RMAP - Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL (published in BMC Bioinformatics). POSIX OS required.
* SeqMap - Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS's.
* SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystems' colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. POSIX.
* Slider - An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC.
* SOAP - Short Oligonucleotide Alignment Program. A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The updated version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
* SSAHA - Sequence Search and Alignment by Hashing Algorithm. A tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.
* SOCS - Aligns SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string search algorithm, which uses hashing to reduce the set of possible matches, drastically increasing search speed. Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
* SWIFT - The SWIFT suite is a software collection for fast index-based sequence comparison. It contains SWIFT, a fast local alignment search guaranteeing to find epsilon-matches between two sequences, and SWIFT BALSAM, a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT BALSAM).
* SXOligoSearch - A commercial platform offered by the Malaysian-based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.
* Vmatch - A versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements. Essentially a large string matching toolbox. POSIX.
* Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads generated by next-generation sequencing technology back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly, with speed being a critical priority. Commercial. Supports Illumina and SOLiD data.
Courtesy of SeqAnswers.com
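The SOCS entry in the list above mentions Rabin-Karp, where a rolling hash filters candidate positions before any character-by-character comparison. The sketch below is the plain textbook exact-match version, not the iterative mismatch-tolerant variant SOCS is described as using; the function name, hash base, and modulus are illustrative assumptions.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_000_007) -> list[int]:
    """Return the start positions of exact occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    hits = []
    for i in range(n - m + 1):
        # Equal hashes only nominate a candidate; compare the actual
        # characters to rule out hash collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


if __name__ == "__main__":
    # Toy reference and read, made up for illustration.
    print(rabin_karp("ACGTACGTGACG", "ACG"))
```

The hash comparison is constant time per position, so the expensive string comparison runs only where a match is plausible.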
Evolving Sequencing & Analysis Methods to Enable Genomic Research
Real World Examples - Scale
1500+ Sample Epigenetic Study
Challenges
Tracking (LIMS)
QC (Automation and Standardization)
Delivery (Automation and Standardization)
Solution
CLC Bio and Genologics
Custom Algorithms
HPC and Storage
Onsite 100 TB NAS
S3 for Backup and Delivery
Mammalian Transcriptome Complexity [Figure: genomic DNA annotated with transcription start sites (TSS), translation start sites (ATG), polyadenylation signals (pA), polyadenylation (AAA), protein coding regions, non-coding regions, spliced introns, microRNAs (miRNA), tiRNA, PASR and TASR] Courtesy of Life Technologies
RNA-Seq
Yet based on well-established methodologies
Substantial Benefits over Hybridization-Based Methods
Better quantitative gene expression performance (DGE)
In addition, can allow a comprehensive view of transcription (Whole Transcriptome)
Transcriptome projects overview
Identification of imprinted genes contributing to specific brain regions by whole transcriptome sequencing
24-sample cohort for basic human expression and variant analysis in diseased patients
32-sample cohort looking at novel splice junctions, gene fusions, and differential expression of colon cancer samples over a time series
Collaboration with Scripps Translational on Colon Cancer Transcriptomes
Sample Sourcing for Transcriptome Projects
Blood: Large quantities of sample available, but with limited utility in transcriptome analysis
Tissue: Needle biopsy most common, but sample quantity very low
Surgical section: Larger quantities available, but limited utility; need laser capture microdissection to provide useful results, sample quantity very low
FFPE Slides: Very useful in clinical research, but sample amount and quality are low
Unamplified vs Amplified
Prostate cancer cell line (VCaP) from CPDR
Well-characterized differential expression upon the addition of androgens
Compared transcriptomes from a single pool of RNA
Unamplified, ribosomally depleted (RiboMinus™)
Amplified, no ribosomal depletion required
Two pipelines for analysis
Amplification Gives Different Results Gene Expression in Unstimulated Cells Unamp Amplified 14,075 2112 1071
Spearman’s Correlation from 2 Pipelines
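Spearman's correlation compares two pipelines by the rank order of their expression estimates rather than by the raw values, so it is insensitive to differences in scale or normalization. The sketch below shows one way to compute it with average ranks for ties; the gene values are made up for illustration and are not data from the talk.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-gene expression estimates from two pipelines.
    pipe_a = [12.0, 0.5, 300.1, 45.2, 45.2, 7.8]
    pipe_b = [10.4, 0.9, 280.0, 50.3, 39.9, 9.1]
    print(f"Spearman rho = {spearman(pipe_a, pipe_b):.3f}")
```

Where SciPy is available, `scipy.stats.spearmanr` provides the same statistic together with a p-value.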
RNA-Seq Analysis Between Pipelines is Either Concordant Amplified, Stimulated, Pipe A Amplified, Stimulated, Pipe B
Or not… Unamplified, Stimulated, Pipe A Unamplified, Stimulated, Pipe B
Even if you remove all SNORA and SNORD Unamplified, Stimulated, Pipe A Unamplified, Stimulated, Pipe B
PolyA Selection vs Ribosomal Depletion [Figure legend: NM RefSeq, NR RefSeq, Histones (circles), SNORD/SNORA, rRNA dots] Courtesy of Life Technologies
Not what you want to hear…
Join discordance
Scripting
Visualization
Filtering techniques based on YOUR data.
Exome and Targeted Resequencing
Fine map a region

 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Enabling Large Scale Sequencing Studies through Science as a Service

  • 1. Enabling Large Scale Sequencing Studies through Science as a Service (ScaaS) Justin H. Johnson Director of Bioinformatics EdgeBio Washington DC, USA
  • 2. Agenda Who We Are NGS at 30K Challenges and Enabling Through ScaaS Transcriptome Projects Exome Projects Ion Torrent Data
  • 3.
  • 4. Life Tech Service Provider
  • 5. Contract Research Division Five SOLiD4 sequencing platforms One Life Technologies 5500XL Two Ion Torrent PGMs Automation thru Caliper Sciclone & BiomekFX Life Technologies Preferred Service Provider Agilent Certified Service Provider Commercial partnerships with companies such as CLCBio, DNANexus and Genologics MD/PhD & Masters Level Scientists and Bioinformaticians IT Infrastructure of >100 CPUs and >100TB storage
  • 6. Edge BioServ Scientific Advisory Board Elaine Mardis, Ph.D. Co-Director, Genome Sequencing Center Washington University School of Medicine Sam Levy, Ph.D. Director of Genome Sciences Scripps Translational Science Institute Scripps Genomic Medicine Michael Zody, M.S. Chief Technologist Broad Institute Ken Dewar, Ph.D. Assistant Professor McGill University and Genome Quebec Steven Salzberg, Ph.D. Director, Center for Bioinformatics and Computational Biology University of Maryland Gabor Marth, Ph.D. Professor of Bioinformatics Boston College Elliott Margulies, Ph.D. Investigator Genome Informatics Section National Human Genome Research Institute National Institutes of Health
  • 7.
  • 9. Obligatory NGS Exponential Growth Slide Nature Biotechnology Volume 26 Number 10 October 2008
  • 10. Ultra High Throughput + Lower Cost = Broader Applications
  • 11.
  • 12.
  • 13.
  • 14. Choice of Library Construction
  • 18. Number of Samples and Control
  • 19.
  • 20. Flexibility with Standards and Scale Then (CE) – The Norm 10 Machines, 30 – 360 Days, 1 Project Now (Illumina/SOLiD/454) – Scale 1 machine, 14 Days, 30 Projects Now (Ion Torrent) - Flexibility 1 machine, 1 Day, 1 Project. Future (CLCBio, Nexus, Open Source) Standardization of analysis
  • 21. Partial List of Mappers
      * BFAST - Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
      * Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed (BWT) index. Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
      * BWA - Heng Li's BWT Alignment program, a progression from Maq. BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence. C++ source.
      * ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.
      * Exonerate - Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
      * GenomeMapper - GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments. A tool created by the 1001 Genomes project. Source for POSIX.
      * GMAP - GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.
      * gnumap - The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. It seeks to align reads from nonunique repeats using statistics. From authors at Brigham Young University. C source/Unix.
      * MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. Features extensive supporting tools for DIP/SNP detection, etc. C++ source.
      * MOSAIK - MOSAIK produces gapped alignments using the Smith-Waterman algorithm. Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX.
      * MrFAST and MrsFAST - mrFAST & mrsFAST are designed to map short reads generated with the Illumina platform to reference genome assemblies in a fast and memory-efficient manner. Robust to INDELs, and MrsFAST has a bisulphite mode. Authors are from the University of Washington. C as source.
      * MUMmer - MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg, most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.
      * Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.
      * PASS - It supports Illumina, SOLiD and Roche-FLX data formats and allows the user to modulate very finely the sensitivity of the alignments. Spaced seed initial filter, then NW dynamic algorithm to a SW(like) local alignment. Authors are from CRIBI in Italy. Win/Linux.
      * RMAP - Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.
      * SeqMap - Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS's.
      * SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystems' colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. POSIX.
      * Slider - An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC.
      * SOAP - SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The updated version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
      * SSAHA - SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.
      * SOCS - Aligns SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string search algorithm, which uses hashing to reduce the set of possible matches, drastically increasing search speed. Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
      * SWIFT - The SWIFT suite is a software collection for fast index-based sequence comparison. It contains: SWIFT, a fast local alignment search guaranteeing to find epsilon-matches between two sequences; and SWIFT BALSAM, a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT BALSAM).
      * SXOligoSearch - SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.
      * Vmatch - A versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements. Essentially a large string matching toolbox. POSIX.
      * Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, generated by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly with speed being a critical priority. Commercial. Supports Illumina and SOLiD data.
    Courtesy of SeqAnswers.com
  • 22. Evolving Sequencing & Analysis Methods to Enable Genomic Research
  • 23.
  • 25. QC (Automation and Standardization)
  • 26.
  • 27. CLC Bio and Genologics
  • 31.
  • 32.
  • 33. Mammalian Transcriptome Complexity [Figure: genomic DNA annotated with transcription start sites (TSS), translation start sites (ATG), polyadenylation signals (pA), polyadenylation (AAA), spliced introns, protein coding and non-coding regions, and small RNAs (tiRNA, PASR, TASR, miRNA)] Courtesy of Life Technologies
  • 34.
  • 35. Yet based on well-established methodologies
  • 36. Substantial Benefits over Hybridization-Based Methods
  • 37. Better quantitative gene expression performance (DGE)
  • 38. In addition, can allow a comprehensive view of transcription (Whole Transcriptome)
  • 40. Identification of imprinted genes contributing to specific brain regions by whole transcriptome sequencing
  • 41. 24 sample cohort for basic human expression and variant analysis in diseased patients.
  • 42. 32 Sample cohort looking at novel splice junctions, gene fusions, and differential expression of colon cancer samples over a time series
  • 43.
  • 44. Sample Sourcing for Transcriptome Projects Blood: Large quantities of sample available, but with limited utility in transcriptome analysis Tissue: Needle biopsy most common, but sample quantity very low Surgical section: Larger quantities available, but limited utility; need laser capture microdissection to provide useful results, sample quantity very low FFPE Slides: Very useful in clinical research but amount of sample and quality low.
  • 45. Unamplified vs Amplified Prostate Cancer Cell Line (VCaP) from CPDR Well characterized Differential Expression upon the addition of androgens. Compared transcriptome from a single pool of RNA Unamplified, ribosomally depleted (Ribominus™) Amplified, no ribosomal depletion required Two Pipelines for analysis
  • 46. Amplification Gives Different Results Gene Expression in Unstimulated Cells Unamp Amplified 14,075 2,112 1,071
  • 48.
  • 49. RNA-Seq Analysis Between Pipelines is Either Concordant Amplified, Stimulated, Pipe A Amplified, Stimulated, Pipe B
  • 50. Or not… Unamplified, Stimulated, Pipe A Unamplified, Stimulated, Pipe B
  • 51. Even if you remove all SNORA and SNORD Unamplified, Stimulated, Pipe A Unamplified, Stimulated, Pipe B
  • 52. NM refseq NR refseq Histones (circles) SNORD/SNORA rRNA dots PolyA Selection vs Ribosomal Depletion Courtesy of Life Technologies
  • 53.
  • 54.
  • 58.
  • 59.
  • 60. Fine map a region
  • 62. Catalogue variants for downstream filtering and identification of causative mutation(s)
  • 63. Exome and Targeted Resequencing projects overview
  • 64. Identification of the genetic basis of colorectal cancer through exome sequencing
  • 65. 600+ sample cohort to identify the genetic basis of a novel syndrome
  • 66. Exome sequencing of Tumor/Normal Leukemia patients to identify novel mutations present in tumor samples
  • 67.
  • 68. Targeted Capture Technologies Nimblegen SeqCap EZ Agilent SureSelect Nimblegen SeqCap EZ Febit HybSelect Agilent SureSelect LR-PCR Raindance Technologies Fluidigm 20Kb 1 MB 2 MB 3 MB 4 MB 5 MB 30-50MB Exome Genomic Region Captured
  • 69.
  • 70. Ultimately Comes to Variation Coverage Project Design Cohorts Cancer Algorithms a Solved Problem? Single open source pipelines Single commercial pipelines Proprietary internal algorithms. A mixture?
  • 73. EdgeBio Exon Coverage Statistics: How well is the exome covered?* * Data from fragment runs; since moving to PE, seeing 15% improvement
  • 74. Venter Genome - Algorithms PLoS Genetics 2008 vol 4 issue 8 e1000160 ~21K SNP in exons (29MB Targeted) 36,206 expected SNPs for 50MB Kit
  • 75. 3 Tools and Associated SNP Counts Software A 45,511 Software B 29,814 Software C 40,964
  • 76. Software B v. Software A A 45,511 B 29,814 21,250 24,261 8,564 Union: 54,075 Intersection: 21,250 Not to Scale
  • 77. Software B v. Software C C 40,964 B 29,814 23,456 17,508 6,358 Union: 47,322 Intersection 23,456
  • 78. Software A v. Software C C 40,964 A 45,511 30,773 10,191 14,738 Union: 55,702 Intersection: 30,773
  • 79. A 45,511 B 29,814 13,130 4,750 1,608 19,642 3,814 11,131 6,377 Union: 60,452 Intersection: 19,642 Voting Scheme (2/3): 36,195 C 40,964
  • 80.
  • 81.
  • 85. Better algorithms for variant calling
  • 87. Standardization of algorithms for variant calling
  • 88.
  • 89.
  • 90. Ion Torrent PGM Longer, Accurate Reads in 2.5 Hours Microbial & Viral Resequencing Microbial & Viral De novo Applications Eukaryotic Amplicon Sequencing Metagenomics WGS 16S Surveys
  • 94. Real World Examples – Speed Rapidly sequenced the genome of the Escherichia coli strain from European outbreak “…[University of Münster & Life Tech] received the samples on Monday, began sequencing that evening, and began analyzing the data on Wednesday…” “…Justin Johnson, director of bioinformatics at EdgeBio, assembled and analyzed the raw reads made publicly available by BGI using CLC Bio's software…Johnson said his analysis took just a couple of hours…”
  • 95. Acknowledgements CPDR (Center for Prostate Disease Research) Collaboration Shyh-Han Tan, Ph.D. Dana-Farber Cancer Institute Collaboration Andrew Lane M.D., Ph.D.; David Weinstock M.D.; Oliver Weigert M.D., Ph.D. Scripps Translational Health Samuel Levy Sequencing Team led by Joy Adigun EdgeBio Research IFX led by John Seed, Ph.D. and Quang Nguyen M.D., Ph.D.
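
The SNP-count comparison on slides 75 to 79 can be checked with a few lines of Python. This is a minimal sketch, not the pipeline used for the slides: the assignment of the seven numbers on slide 79 to Venn regions is inferred from the pairwise slides (76 to 78), and the function names and the toy variant tuples are hypothetical. It shows a "2 of 3" voting scheme over call sets and confirms that the region counts reproduce the per-tool totals, the union, the intersection, and the voting figure.

```python
from collections import Counter


def consensus_calls(call_sets, min_votes=2):
    """Return the variants reported by at least `min_votes` callers."""
    votes = Counter()
    for calls in call_sets:
        votes.update(set(calls))  # each caller votes once per variant
    return {variant for variant, n in votes.items() if n >= min_votes}


# Region counts from the three-way Venn slide. Which number belongs to which
# region is an inference from the pairwise slides, not stated in the source.
REGIONS = {
    frozenset("A"): 13_130,
    frozenset("B"): 4_750,
    frozenset("C"): 6_377,
    frozenset("AB"): 1_608,
    frozenset("BC"): 3_814,
    frozenset("AC"): 11_131,
    frozenset("ABC"): 19_642,
}


def summarize(regions, min_votes=2):
    """Derive per-tool totals, union, full intersection and voted count."""
    tools = sorted(set().union(*regions))
    per_tool = {
        t: sum(n for members, n in regions.items() if t in members)
        for t in tools
    }
    union = sum(regions.values())
    intersection = regions[frozenset(tools)]
    voted = sum(n for members, n in regions.items() if len(members) >= min_votes)
    return per_tool, union, intersection, voted


if __name__ == "__main__":
    # Toy call sets: (chromosome, position, ref, alt) tuples, made up for illustration.
    a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
    b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")}
    c = {("chr1", 200, "C", "T"), ("chr2", 50, "G", "A"), ("chr3", 7, "T", "C")}
    print(sorted(consensus_calls([a, b, c])))
    print(summarize(REGIONS))
```

With the slide's numbers, `summarize` gives totals of 45,511 (A), 29,814 (B) and 40,964 (C), a union of 60,452, an intersection of 19,642, and 36,195 calls supported by at least two tools. The last figure is close to the 36,206 SNPs expected for the 50MB kit on slide 74.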

Editor's Notes

  1. Evolving Sequencing Methods to Enable Genomic Research
  2. Every house is built with a sturdy foundation.
  3. Evolving Sequencing Methods to Enable Genomic Research
  4. Because of this…
  5. We have this…
  6. Which allows this… As a CRO, we especially see this happening with those who previously could not use these applications because of access or finances.
  7. But with constantly expanding applications come…
  8. How does one stay technically relevant in a dramatically changing landscape?
  9. With sequencing becoming ubiquitous, it is not as simple as sequence, then science… There are many questions to answer and much expertise to be gained to make each project successful. We spend upwards of 25% of our time in this phase.
  10. We now have the issues of scale, compressed timelines, and standardization of sample prep and informatics.
  11. To illustrate the informatics challenge of standardization… Each can be run in hundreds of combinations to produce answers. All different.
  12. But when challenges are addressed, there can be immense power in discovery and eventually diagnostics. I will quickly mention 2 current projects that highlight and address some of the challenges, then jump into Transcriptome, Exome, and Ion Torrent sequencing.
  13. I can share a bit more of the findings later on.
  14. Key Points: New approach enabled by NGS, but it’s based on mature methods. In highlighting the two benefits, say “which is called” DGE/RNA-Seq respectively to initially define the two terms. The next two slides clarify these definitions. Old slide below: A (somewhat) new approach to RNA profiling using sequencing rather than hybridization. Variations on the theme have been used since the mid-90s: EST sequencing, SAGE, LongSAGE, MPSS. However, limitations and cost of sequencing technology, as well as lack of a finished, well-annotated genome reference, had prevented broad use vs. microarrays. Digital Gene Expression using next-gen sequencing: a transformative technology. Improved sensitivity, dynamic range, and linearity over microarrays. Removes background and biases seen with microarrays. Can provide a comprehensive view of splicing and transcription, if desired (not required). Now dozens of published papers validating the approach. Aggressively competing vendors making it better/faster/cheaper.
  15. We’re no longer in the early stages of technology adoption, and the biology is becoming more important. More accurate biology requires more refined samples, and that leads to issues in NGS which generally has voracious material requirements. Amplification is generally the solution, but this leads to additional problems in analysis.
  16. The total number of genes expressed in the union of the two methods was 17,258. The picture was the same for stimulated cells, except that androgen stimulation very slightly reduced the complexity of the transcriptome, with a total of 17,128 genes expressed in the union of the two methods. Different results don’t necessarily mean one is more accurate, just different. This analysis was performed with a single RNA-Seq pipeline. We subsequently discovered that different pipelines also give different results.
  17. You can conclude that samples differing in biology (the very significant changes associated with androgen stimulation) are more closely correlated than samples differing in the method of sample preparation. The data also suggest that the choice of analytic tools may have a greater impact than the biology as well.
  18. Sequencing used to consume most of a project’s cost; now the inverse is true.
  19. The next slide shows the difference between ribosomal depletion and poly(A)+ selection in the distribution of genes. Integrating the informatics pipelines with sample preparation methods and researcher’s needs is critical. There are amplification methods that don’t have the particular bias of the method used in these studies.
  20. Again, the goal of the project is paramount when choosing and designing a capture.
  21. Start to lose your return on investment.
  22. Wrap up with Ion Torrent…
  23. Nature Precedings