The document provides information about Golden Helix's VarSeq 2.4.0 software and its VSClinical ACMG workflow module. It summarizes a presentation given on the topic, including details about Golden Helix as a company, the content covered in the presentation, and examples that will be demonstrated including analysis of variants, copy number variants, and structural variants from PacBio long read sequencing data using VSClinical.
Streamlined NGS Workflow for Mendelian Disorder Analysis
1. VarSeq 2.4.0: VSClinical ACMG Workflow
from the User Perspective
June 7, 2023
Presented by Rana Smalling, PhD, Field Application Scientist and Solomon
Reinman, Technical Field Application Scientist
4. NIH Grant Funding Acknowledgments
• Research reported in this publication was supported by the National Institute Of General Medical Sciences of
the National Institutes of Health under:
o Award Number R43GM128485-01
o Award Number R43GM128485-02
o Award Number 2R44 GM125432-01
o Award Number 2R44 GM125432-02
o Montana SBIR/STTR Matching Funds Program Grant Agreement Number 19-51-RCSBIR-005
• PI is Dr. Andreas Scherer, CEO of Golden Helix.
• The content is solely the responsibility of the authors and does not necessarily represent the official views of the
National Institutes of Health.
5. Who Are We?
Golden Helix is a global bioinformatics company founded in 1998
Filtering and Annotation
ACMG & AMP Guidelines
Clinical Reports
CNV Analysis
CNV Analysis
GWAS | Genomic Prediction
Large-N Population Studies
RNA-Seq
Large-N CNV-Analysis
Variant Warehouse
Centralized Annotations
Hosted Reports
Sharing and Integration
Pipeline: Run Workflows
8. The Golden Helix Difference
FLEXIBLE DEPLOYMENT
On premise or in a private
cloud
BUSINESS MODEL
Annual fee for software,
training and support
CLIENT CENTRIC
Unlimited support from the
very beginning
SINGLE SOLUTION
Comprehensive cancer and
germline diagnostics
SCALABILITY
Gene panels to whole
exomes or genomes
THROUGHPUT
Automated pipeline
capabilities
QUALITY
Clinical reports correct the
first time
9. Today’s Presenters
Rana Smalling, PhD
Field Application Scientist
VarSeq 2.4.0: Structural Variants and Advanced Automation in VSClinical ACMG
Solomon Reinman
Technical Field Application
Scientist
10. Content Overview
• Streamlined NGS workflow
• Efficient and precise NGS analysis of germline
disorders with Golden Helix software
• Evaluation of multiple variant types
• VSClinical ACMG supports the spectrum of
variant types for Mendelian disorder analysis
• Demonstration
1) Product demo with PacBio long read example
2) Review fusion analysis in new ACMG workflow
11. NGS Clinical Workflow
Golden Helix provides comprehensive data analytics software that scales across gene panels, whole exomes, and whole genomes
DNA Extraction in Wet
Lab and Sequence
Generation
Interpretation and
Result Reporting
Primary
Read Processing and
Quality Filtering
Alignment and Variant
Calling
Secondary
*Golden Helix provides
Secondary Analysis through
a reseller agreement
Tertiary
Golden Helix’s software and
primary focus
Comprehensive
secondary and tertiary
analysis solutions for
primary data
aggregated by all
commercially available
sequencers
Type          Size
Gene Panel    Small (100 MB)
Whole Exome   Medium (1 GB)
Whole Genome  Large (100 GB)
Cancer use case
Hereditary use case
Process Analysis
… and scales across multiple
data set sizes for cancer and
hereditary use cases
Filtering and Annotation
Data Warehousing
Workflow Automation
Golden Helix works with all major
sequencers…
12. Role of Long Read Sequencing in NGS analysis
• Nature Methods named long-read sequencing its Method of the Year 2022 (announced in January 2023)
• Fulfilled wish lists for genomics labs doing large scale projects
• Long-read sequencing enabled the Telomere-to-Telomere Consortium (T2T) and
Vertebrate Genomes Project (VGP)
• Better precision and accuracy with lower coverage
• More accurate coverage in difficult to capture regions
o Regulatory sequences, pseudogenes, centromeres, Alu elements, short tandem repeats,
LINE1 elements and long repeats
• More confident SV and CNV calling
• Fewer gaps in genome assemblies mean more accurate assessment of deletions
and duplications with multi-species implications
• Long-read is better at detecting complex genome rearrangements and structural
variants often seen in cancer.
• The affordability and quality of long-read sequencing continues to improve
• 95-99% accuracy and rising in 2023 compared to 30-40% in 2010
PacBio read length histogram
https://www.pacb.com/technology/hifi-sequencing/how-it-works/
13. Sentieon's DNAscope is optimized for long-read data
Literature from PacBio and Sentieon support DNAscope as the optimal variant caller
• Pre-built, robust pipelines from Sentieon to process SNPs, indels,
and SVs for PacBio and Oxford Nanopore data and a growing list
of other sequencers
• DNAscope LongRead from Sentieon is accurate, fast and efficient
• 4 hours on a 16-core machine for 30x HiFi samples
• >99.83% precision and recall on most recent GIAB benchmark
dataset
• Compared to other machine learning approaches to long-read
variant calling, Sentieon is user-friendly and fast
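For readers less familiar with the benchmark terms, here is a minimal sketch of how precision and recall are computed from true-positive, false-positive, and false-negative call counts. The counts in the usage note are made up for illustration; they are not GIAB or Sentieon results, and the function name is hypothetical.

```python
from typing import Tuple

def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) for a variant call set.

    tp: calls that match the truth set
    fp: calls that are absent from the truth set
    fn: truth-set variants that were not called
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision/recall undefined without any calls or truth variants")
    precision = tp / (tp + fp)  # share of reported calls that are correct
    recall = tp / (tp + fn)     # share of true variants that were found
    return precision, recall
```

With illustrative counts of 9,983 true positives, 17 false positives, and 17 false negatives, `precision_recall(9983, 17, 17)` gives 0.9983 for both values.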
15. More than one type of genetic mutation can drive disorders
• Mutations that activate genes:
o Missense
o In-frame insertions/deletions
o Fusions
o Copy number amplifications
• Mutations that inhibit or disable genes:
o Gene deletions
o Loss of function nonsense, frameshift indels
o Disabling fusions, structural variants
o Genomic Signatures that describe overall state of
the mutated genome
Comprehensive Genomic Profiling For Germline Disorders
Copy Number
Rearrangements
Base Substitutions
Deletions
Insertions
Genomic Signatures
16. Increased Adoption of Structural Variants in NGS
Use in both Oncology and Germline testing
• SNPs/indels, CNVs and SVs are implicated in genetic disorders
• Structural variants' relevance to cancer is well established; now extended to ACMG
germline scoring
• Rationale for including SVs in NGS tests for Mendelian disorders
• Increased affordability and accuracy of long-read technology (PacBio,
Nanopore, etc.)
• Kits that simplify and integrate RNA detection with DNA
• Adoption of whole genome sequencing which enables structural variant
calling
• Increased diagnostic yield from comprehensive tests covering multiple
variant types.
17. Multi-Variant Type Analysis Workflow
1. Import Wizard
• Variant types identified on import based on VCF info
• Variants, CNVs, Break-end Pairs
• One VCF/multiple VCFs merged by name in header
• Import long read sequencing VCFs and alignment files from any secondary caller into VarSeq
(PacBio, Oxford Nanopore, Element Biosciences, Illumina, other)
2. Variant Type Specific Tables and Filters
• Annotate and filter variant types individually
• Gene impact analysis
• Type-specific annotations and algorithms
3. VSClinical Analysis
• Import from VarSeq tables or use Evaluation Script
• Only analyze filtered or “marked” variants, CNVs, fusions
• Import sample QC details, phenotype, clinical features
4. Integrated Reporting
• Report sections: Primary, Secondary, VUS
• All variant types can be reported in any section
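Step 1 of the workflow above routes records into type-specific tables based on VCF content. The sketch below shows one way such routing could work for a single VCF data line, using the standard `SVTYPE` INFO key and symbolic or break-end `ALT` notation. The function names and routing rules are assumptions for illustration only; they are not VarSeq's import logic.

```python
import re

# Symbolic copy number alleles such as <DEL>, <DUP>, <CNV>, <DUP:TANDEM>
CNV_ALT = re.compile(r"^<(DEL|DUP|CNV)(:[^>]*)?>$")

def parse_info(info: str) -> dict:
    """Turn a VCF INFO string into a dict; flag entries map to True."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def classify_record(line: str) -> str:
    """Return 'variant', 'cnv', 'breakend', or 'other_sv' for one VCF data line.

    Hypothetical routing rules:
      - SVTYPE=BND or square brackets in ALT -> break-end
      - SVTYPE or symbolic ALT of DEL/DUP/CNV -> CNV
      - any other SVTYPE or symbolic ALT      -> other structural variant
      - everything else                       -> small variant
    """
    fields = line.rstrip("\n").split("\t")
    alt, info = fields[4], parse_info(fields[7])
    svtype = info.get("SVTYPE")
    if svtype == "BND" or "[" in alt or "]" in alt:
        return "breakend"
    if svtype in {"DEL", "DUP", "CNV"} or CNV_ALT.match(alt):
        return "cnv"
    if svtype is not None or alt.startswith("<"):
        return "other_sv"
    return "variant"
```

A real importer would also need to handle multi-allelic records and caller-specific conventions, which this sketch ignores.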
18. Interpreting Break-ends or fusions in VarSeq
Break-end Location
• Rearrangements within and in between
chromosomes
• Coding genes, non-coding genes,
introns
Orientation and SV Type
• Translocations
• Deletions
• Duplications
• Inversions
Effect on Gene
• In-frame Fusion
• Frameshift fusion
• Transcript ablation,
frameshift, start
loss
• Transcript fusion
• Non-functional
rearrangement
• Intronic, intergenic,
upstream,
downstream, start
gain
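The location and orientation categories above can be read directly from a break-end `ALT` string as defined in the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`). The following sketch parses that notation and applies a simple heuristic to name the likely rearrangement type. The class, function names, and the "-like" labels are assumptions for illustration and do not describe how VarSeq annotates break-ends.

```python
import re
from typing import NamedTuple

# Local bases, bracket, mate "chrom:pos", same bracket, local bases
BND_ALT = re.compile(
    r"^(?P<pre>[ACGTNacgtn.]*)(?P<br>[\[\]])(?P<chrom>.+):(?P<pos>\d+)(?P=br)(?P<post>[ACGTNacgtn.]*)$"
)

class Breakend(NamedTuple):
    mate_chrom: str
    mate_pos: int
    joined_after: bool        # True for t[p[ and t]p]: mate piece follows the local base
    mate_extends_right: bool  # True for '[' brackets: mate piece runs rightward from p

def parse_bnd_alt(alt: str) -> Breakend:
    """Parse a VCF break-end ALT string into mate position and orientation."""
    m = BND_ALT.match(alt)
    if m is None or bool(m["pre"]) == bool(m["post"]):
        raise ValueError(f"not a paired break-end ALT: {alt!r}")
    return Breakend(m["chrom"], int(m["pos"]), bool(m["pre"]), m["br"] == "[")

def classify_breakend(chrom: str, pos: int, bnd: Breakend) -> str:
    """Heuristic rearrangement label from break-end location and orientation."""
    if bnd.mate_chrom != chrom:
        return "translocation"
    if bnd.joined_after != bnd.mate_extends_right:
        # t]p] or [p[t: the mate piece is joined in reverse-complement orientation
        return "inversion-like"
    # Same-strand joins: direction of the jump decides deletion vs duplication
    forward_jump = bnd.mate_pos > pos if bnd.joined_after else bnd.mate_pos < pos
    return "deletion-like" if forward_jump else "duplication-like"
```

For example, `classify_breakend("1", 1000, parse_bnd_alt("A[1:5000["))` returns `"deletion-like"`, because the sequence to the right of position 5000 is joined directly after position 1000. The effect on the gene (in-frame fusion, transcript ablation, and so on) additionally requires transcript annotation, which is outside this sketch.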
19. VarSeq Suite: Automate Inputs and Outputs
• VarSeq is built for flexibility and automation
• Modular set of capabilities
• Support all records in VCF 4.3
• Built-in support of custom pipelines
• VCFs with all types of variants
• Custom callers output such as ArcherDx
• Sample or patient info
• VSPipeline automation of VSClinical input and
outputs
• Reduction or elimination of custom work
20. Examples for Product Demonstration
• PacBio long-read example (Microcephaly indicated by gene panel screening)
• Small variants
• STAG1, OCA2, AP3B2, MKS1
• Copy number variants
• APC6+1247 genes
• HNF1B+413 genes
• Complex structural variant
• RAB3GAP1::CDNF
• VSClinical ACMG guidelines evaluation
• Evaluation and report on 3 variant types
24. 25 Licenses for 25 Months
Celebrating 25 Years in Business
• Limited quantity
• Licenses are 25-month license periods
• Available to new customers only
• Orders must be received by June 15, 2023
• Visit goldenhelix.com/forms/25-for-25 or
scan the QR code below
25. Conferences
European Human Genetics Conference, Booth #566
• June 10 – 13, 2023
• Glasgow, UK
• Monday, June 12, 12:00 - Corporate Satellite Talk (ALSH 1,
Level 0) Achieving Economic Success as an NGS Lab:
Strategy and Implementation
AMP Europe, Milan, Italy, Booth #14
• June 18 – 20, 2023
• Milan, Italy
• Monday, June 19, 1:00 – Industry Symposium Achieving
Economic Success as an NGS Lab: Strategy and
Implementation
Before we dive into the subject, I want to mention our appreciation for our grant funding from the NIH.
The research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under the listed awards.
We are also grateful to have received local grant funding from the state of Montana. Our PI is Dr. Andreas Scherer, who is also the CEO of Golden Helix. The content described today is the responsibility of the authors and does not officially represent the views of the NIH.
So with that covered, let's take just a few minutes to talk a little bit about our company, Golden Helix.
Golden Helix is a global bioinformatics software and analytics company that enables research and clinical practices to analyze large genomic datasets. We were founded in 1998 based on pharmacogenomics work performed at GlaxoSmithKline, which is still a primary investor in our company.
VarSeq, our flagship product, serves as a clinical tertiary analysis tool. At its core, it is a variant annotation and filtration engine. In addition, users have access to automated AMP or ACMG variant guidelines. VarSeq also has the capability to detect copy number variations, scaling from single-exon to large aneuploidy events. Lastly, the finalization of variant interpretation and classification is further optimized with the VarSeq clinical reporting capability. Users can integrate all of these features into a standardized workflow.
Paired with VarSeq are VSWarehouse and VSPipeline. VSWarehouse serves as a repository for the large amount of useful genomic data wrangled by our customers. Warehouse not only solves the issue of data storage for ever-increasing genomic content, but is also fully queryable and auditable, and it allows user access to be defined for project managers or collaborators. In tandem with this, VSPipeline, which will be a large part of today's discussion, allows for the automated execution of routine workflows, further optimizing users' ability to handle large amounts of data and throughput.
Lastly, our research platform, SVS, enables researchers to perform complex analysis and visualizations on genomic and phenotypic data. SVS has a range of tools to perform GWAS, genomic prediction, and RNA-Seq analysis, among other common research applications.
Our software has been very well received by the industry. We have been cited in thousands of peer-reviewed publications, and that’s a testament to our customer base.
We work with over 400 organizations all over the globe. This includes top-tier institutions like Stanford and Yale, government organizations like the NCI and NIH, clinics such as Sick Kids, and many other genetic testing labs. We now have well over 20,000 installs of our products and thousands of unique users.
So how is this relevant to you?
At Golden Helix, we focus on the seven pillars of customer success. Golden Helix offers a single software solution that encompasses germline, somatic, and CNV analysis. Our software is also highly scalable, supporting gene panel to whole genome sequencing workflows. With our complete automation capabilities, we now offer a FASTQ or VCF to report pipeline. Our software can be deployed locally or installed in the cloud, and our business model of an annual subscription per user means you can increase your workload without increasing analysis fees. And it goes without saying that our FAS team is here to support you on your analysis journey.
Today, Dr. Rana Smalling, a member of our Field Application Science team, and myself, Solomon Reinman, our technical field application scientist, have the pleasure of presenting. Not only are we delighted to be presenting the user perspective in VarSeq 2.4.0, but we look forward to showing off these capabilities to our current and future customers.
There are many exciting new features that we'll be exploring and introducing to our customers through various means as we roll out VarSeq 2.4.0, but today we want to focus on a few key points. We'll start by discussing the streamlined NGS workflows enabled by VarSeq 2.4.0. In particular, we're keen on showing off how long-read data from sequencers like those provided by PacBio have revamped analysis from small variants to large structural variants. We'll then delve into the inner workings of evaluating multiple variant types in VarSeq, and wrap things up with a detailed product demo showing these improvements to germline workflows.
Let's start with a bird's-eye view of an NGS clinical workflow, and explore how VarSeq handles the analysis of NGS data. VarSeq mainly encompasses the tertiary analysis steps of filtering and annotation, interpretation and result reporting, and workflow automation. However, its modular and flexible design makes it compatible with a variety of outputs from secondary analysis pipelines. Golden Helix software functions with all major sequencers, and our partnership with Sentieon allows users to establish industry-leading secondary analysis pipelines. Sentieon's benchmarks show unparalleled performance, and they currently offer models for calling small variants and structural variants with long read technology. We package and ship these models along with user-friendly scripts to get our users up to speed quickly analyzing long-read and short-read data alike. Of course, our customers utilize a large variety of secondary analysis pipelines, and VarSeq is fine-tuned to be compatible with virtually anything under the sun. In any case, today, we're excited to highlight how the new features we've implemented with VarSeq 2.4.0 work in tandem with the capabilities of long-read technology.
What are we getting out of long read in terms of variant detection?
Long-read sequencing is having its moment in NGS analysis methodology: Nature Methods named it Method of the Year 2022. This technology has enabled large-scale projects such as the T2T consortium.
What makes long-read sequencing so great is that it comes with better… and allows more confidence in calling structural variants and CNVs. A large part of this is the fact that the reads are just longer (point to the diagram), allowing you to sequence through those breakpoints and break-ends to capture a more accurate picture of structural genomic variation.
Affordability and quality have improved.
What are some of the benefits of long-read technology? Named Method of the Year 2022 by Nature Methods, long-read sequencing has already enabled large-scale projects such as the Telomere-to-Telomere, or T2T, consortium. In general, long-read technology enables better precision and accuracy with lower coverage, and is a boon to accuracy in difficult-to-capture regions. Long-read technology also affords more confident SV and CNV calling, which goes hand-in-hand with our recent updates to VSClinical ACMG in VarSeq 2.4.0. Lastly, with 95-99% accuracy in 2023, long-read sequencing is becoming steadily more affordable and cost-effective.
Should long-read data be the right choice for your lab, our partnership with Sentieon provides industry-leading precision, accuracy, and speed via DNAscope LongRead, Sentieon's newly updated algorithm for calling SNPs, indels, and SVs. Our pre-built pipelines, constructed with Sentieon's machine learning models, provide a fast and user-friendly framework to get you up and running quickly and effectively. In particular, Sentieon has worked hard with PacBio's engineers to develop and benchmark excellent variant calling capabilities.
Whether or not you choose to take advantage of Sentieon, let's jump into how VarSeq facilitates the analysis and reporting of NGS data. Bear in mind that, as excited as we are about long-read data, the rest of this webcast applies to any NGS data.
Thanks, Solomon, for the overview of long-read sequencing and how we partner with Sentieon to bring users a comprehensive solution for secondary and tertiary analysis of NGS germline data. I also want to note that our tertiary analysis with VarSeq is compatible with inputs from any standard long-read secondary pipeline, such as those offered by PacBio and Oxford Nanopore.
What we want to focus on today is how this year's updates to VarSeq facilitate comprehensive analysis of all your variant types for germline analysis. We have made updates to all three stages of our VarSeq variant workflow. Earlier this year, with VarSeq 2.3.0, we upgraded our import methodology to bring in SNPs/indels, CNVs, and breakends with the associated filtering and annotation capabilities, which covers stage 1 of the analysis. Then, for stages 2 and 3, we've made some changes to VSClinical evaluation and reporting that our users have already begun to appreciate. Similar to what we did for our somatic variant analysis earlier this year, VSClinical ACMG in 2.4.0 now allows our users to import fusions and create clinical reports for these events in the germline context, and it has been enhanced with evaluation scripts. And to wrap it all up into one package, VSClinical ACMG users can now fully automate a germline project workflow using VSPipeline, going from the initial VCFs and BAMs all the way to the final clinical report with a simple command-line script.
What these updates mean for the user is that VarSeq now captures the variant types implicated in all kinds of diseases, in both the cancer and germline disorder contexts. The variants that drive diseases include mutations that activate genes, like missense variants, insertions, and copy number amplifications, and of course the disabling mutations and large deletions that inactivate genes. That said, a comprehensive capture of the relevant variant types for NGS germline analysis would not be complete without fusions. These events often create new proteins with novel functions, or they can be disabling fusions; either can have potentially pathogenic effects. For some time we have fully supported analysis of SNPs/indels and copy number variations, and we are now happy to announce our increased support for fusions. These structural variants have always been relevant, but are becoming more and more feasible to analyze in the context of germline disorders as the technology for detecting them continues to improve.
Historically, fusion events have been a challenge to detect and analyze at a large scale in a clinical context. The most common ways to detect fusions have been targeted assays such as PCR, but the limitation there is that you can only identify known events. Using an NGS approach allows people to identify novel events, and we have definitely seen a greater push to adopt NGS for detection of structural variants.
Driving this adoption is the fact that methods like long-read sequencing, which greatly enable SV detection, have become much more accurate and affordable, as Solomon reviewed.
Furthermore, RNA analysis is used to detect specific types of fusions like exon skipping and RNA splice events, and there are now kits available to perform RNA fusion analysis alongside DNA sequencing in the same NGS run.
Another aspect is that, in the past, because the relevance of fusions is well established in the cancer context, only our users doing somatic variant analysis had an interface for clinically evaluating and reporting on gene fusions. Now we support this capability for our germline users with the new VSClinical ACMG in VarSeq 2.4.0.
Let’s review how VarSeq facilitates that comprehensive NGS analysis with our multivariant type workflow.
Our users can import both short and long read sequencing VCFs and BAM or CRAM files from any secondary caller (PacBio, Oxford Nanopore are the most popular for long read, with established file formats, but we’re remaining nimble to support long read from other secondary callers as well, so we look forward to working with the data types our users have on hand).
From the beginning, when you import your variants, our import wizard will automatically identify the different variant types and place them into separate tables for variants, CNVs, and fusions/breakends. Each variant type will also have its own specific annotations, algorithms, and gene impact analysis.
Then, updates in VSClinical allow users to import all filtered variants from all VarSeq tables simultaneously, thereby reducing the click rate for getting to your final report. This is achievable with evaluation scripts, just as we have done for VSClinical AMP. Other built-in evaluation scripts include those to import sample phenotypic info and quickly add or remove groups of variants, but really the sky is the limit when it comes to automation, as we allow users to create unlimited custom scripts, including scripts for API integration.
With these updates, we have also modified our shipped report templates in VSClinical to report all the variant types, including fusions.
So let's discuss fusions in more detail. Specifically, how might one interpret the events and determine whether a fusion makes its way into a report? Of course, not every breakend event will be impactful. Some will cause, for example, non-functional rearrangements, and others may fall into regions of the genome where they have no impact whatsoever. In VarSeq you can decipher the location and orientation of the breakends, and from that combination the potential effect of the fusion event.
The types of fusions that may be the most impactful will be your in-frame fusions, where you end up with a new fusion gene with a completely new function or new protein product, while in other cases, you may have an event that destroys the gene product, like a transcript ablation and then you may have a loss of function.
These are some of the fusion types that we want to bring into VSClinical for evaluation and clinical reporting, along with your small variants and CNVs.
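The distinction between an in-frame and a frameshift fusion comes down to codon phase at the junction. As a minimal sketch, assuming the number of coding bases on each side of the junction is already known from transcript annotation, the check is a comparison modulo 3. The function name and parameters are hypothetical, and real annotation also has to account for splicing, UTR breakpoints, and strand, which are ignored here.

```python
def fusion_frame(five_prime_cds_bases: int, three_prime_cds_offset: int) -> str:
    """Classify a fusion junction as 'in-frame' or 'frameshift'.

    five_prime_cds_bases: coding bases retained from the 5' partner gene
    three_prime_cds_offset: coding bases of the 3' partner that lie upstream
        of the junction and are therefore lost from the fusion transcript
    """
    if five_prime_cds_bases < 0 or three_prime_cds_offset < 0:
        raise ValueError("base counts must be non-negative")
    phase_out = five_prime_cds_bases % 3   # codon position where the 5' partner stops
    phase_in = three_prime_cds_offset % 3  # codon position where the 3' partner resumes
    # The reading frame is preserved only when the 3' partner resumes
    # at the same codon position at which the 5' partner left off.
    return "in-frame" if phase_out == phase_in else "frameshift"
```

For instance, a junction that keeps 301 coding bases of the 5' partner and skips the first 151 coding bases of the 3' partner stays in frame, while skipping 150 bases instead would shift the frame.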
On the right is an example in-frame fusion that we will explore in VSClinical. But first, let me hand back to Solomon, who will talk a bit more about automation and kick off our product demonstration.
Thanks for the extensive information on the current state of multivariant workflows, Rana! That might seem a little overwhelming, and while VarSeq's high degree of flexibility begets a comprehensive start-up, we've also worked hard to provide the tools for our customers to automate their workflows. No matter how complex and comprehensive a workflow you build in VarSeq, we can use automation to provide flexible imports, project creation, and exports. Our built-in support of custom pipelines and the ability to import such a comprehensive set of variants allows virtually any workflow to be automated and executed in a consistent manner, overall vastly minimizing click rate and improving throughput. Many steps that would have previously been manual within VSClinical ACMG can now be defined and implemented as evaluation scripts and carried out in a routine, reproducible fashion.
With all of that in mind, let's look at the inner workings of a VarSeq project showcasing some of our new capabilities.
In summary, today we touched on the highlights of what's new in VarSeq 2.4.0 and gave you a glimpse of these updates from the user perspective. Overall, we wanted to assure our users that as germline structural variant analysis and long-read tech become more mainstream, we've made sure that VarSeq can handle these data types, so we look forward to working with you and helping you analyze your data! So, thank you for tuning in, and now I will hand it back over to Casey to wrap up.
Question – Do you have secondary support for structural variant calling with short read sequencing? Solomon will answer this one (pair star fusion with Sentieon)
Question – Can you import phenotype data in the form of phenopackets from PacBio?
Question – Has GHI seen many labs utilizing long read data? Yes, we have seen customers using long read at a much higher frequency than expected, both for germline and somatic.
Question – Does your business model compensate for rerun of samples when setting up validation of workflows? You can run as many samples as you need to validate your pipeline.
Before wrapping up, we'd like to again state our appreciation for the grants included here. And with that, I'll hand things back to Casey to talk about some exciting marketing updates and take us through a Q&A session.
Again, I want to mention how grateful we are for grants such as this, which support the advancement and development of our software to create the high-quality software you'll see today.