Gagandeep Singh Anand - Poster - A Multifaceted Solution to Bioinformatics Research - full

A Multifaceted Solution to Bioinformatics Research
By Gagandeep Singh Anand – ganand@uwaterloo.ca
MOUNT SINAI HOSPITAL – SAMUEL LUNENFELD RESEARCH INSTITUTE
BACKGROUND
Pancreatic adenocarcinoma is the fourth leading cause of cancer death in Canada [1].
It is most often detected at an advanced stage and is frequently resistant to
chemo/radiotherapy, so 5-year survival rates are poor (<5%). Even after surgical
resection, the five-year survival rate is only 15-20% [2].
This study focuses primarily on Familial Pancreatic Cancer (FPC) cases, which result from
the accumulation of acquired genetic alterations. A few studies have associated pancreatic
cancer with mutations seen in other genetic syndromes (e.g. BRCA1/BRCA2, p16), but these
account for only a small fraction of hereditary pancreatic cancer.
FPC is difficult to study because the high fatality rate limits the pedigree data that is
available. It is therefore especially important to use a variety of bioinformatics tools
and to cross-reference the many available datasets.
RESULTS
Although a set of interesting regions has already been found by manual analysis of the CNV data,
there is still room for data quality assurance and for identifying regions that are harder to
detect manually.
The bioinformatics approach to gene / region localization gave the following overlap-based
results:
When overlapping the combined controls (DGV + our control CNVs, 25700) with the FPC gain (631)
and loss (266) CNVs, we find 565 FPC gain CNVs / 235 FPC deleted CNVs coinciding with a control,
and 240 FPC gain CNVs / 55 FPC deleted CNVs not overlapping any control.
When overlapping regions from 12 academic papers containing 1265 regions (27 hypo/methylated,
12 miRNAs and 1233 somatic PC CNVs), we find 63 FPC deleted and 240 FPC gain CNVs overlapping
other regions (somatic CNVs, miRNAs or hypo/methylated regions) that were observed in
cases of pancreatic adenocarcinoma.
After obtaining the sets of genes coinciding with FPC CNVs and with control CNVs, and then
determining the genes that fall in only one of the two sets, we find the following:
The difference between multi-tool and single-tool data manipulation is well illustrated by
the task of finding CNVs that do not overlap any control. Using MS Access to find all overlaps
with relational queries, and then Perl to derive the non-overlapping regions, gives an algorithm
that runs asymptotically in less than quadratic time. Using Perl alone to determine non-overlaps
between cases and controls is asymptotically quadratic. For very large n (many case and control
CNVs), determining non-overlaps with a single tool would be infeasible.
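As a rough illustration of the complexity argument above, the following Python sketch contrasts an all-pairs comparison with an indexed lookup. It is not the MS Access / Perl workflow used in the study. The function names, the (chromosome, start, end) tuple layout and the closed-interval convention are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def split_by_overlap_naive(cases, controls):
    """All-pairs scan: O(n * m) for n cases and m controls."""
    hit, miss = [], []
    for chrom, start, end in cases:
        overlaps = any(
            chrom == c_chrom and start <= c_end and c_start <= end
            for c_chrom, c_start, c_end in controls
        )
        (hit if overlaps else miss).append((chrom, start, end))
    return hit, miss


def split_by_overlap_indexed(cases, controls):
    """Sorted-index lookup: O((n + m) log m).

    For a case [start, end], the number of overlapping controls equals
    (controls starting at or before `end`) minus (controls ending before `start`).
    """
    starts, ends = defaultdict(list), defaultdict(list)
    for chrom, c_start, c_end in controls:
        starts[chrom].append(c_start)
        ends[chrom].append(c_end)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()

    hit, miss = [], []
    for chrom, start, end in cases:
        n_started = bisect_right(starts[chrom], end)  # control start <= case end
        n_finished = bisect_left(ends[chrom], start)  # control end < case start
        (hit if n_started > n_finished else miss).append((chrom, start, end))
    return hit, miss
```

Both functions return the same (overlapping, non-overlapping) split. The second sorts the controls once and answers each case with two binary searches, which is one way a database index or a pre-sorted table can avoid the quadratic scan.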
MATERIALS AND METHODS
A set of 124 cases and 1198 controls was analyzed using three computational algorithms
(dChip, CNAG, and the Partek® Genomics Suite HMM CNV detection algorithm). This yielded
266 regions of low copy number and 631 regions of elevated copy number.
Three different approaches have been used to further determine regions of interest:
1. Create a larger set of controls by combining the Database of Genomic Variants with the
   set of controls from the study, and compare it with our FPC cases.
2. Compare other types of highly conserved regions found in pancreatic adenocarcinoma
   cases (i.e. somatic CNVs, miRNAs, hypo/methylated regions), taken from various academic
   papers, against our FPC CNVs.
3. Instead of focusing on regions of non-overlap between FPC CNVs and control CNVs,
   use bioinformatics resources such as UCSC and Ensembl to determine the genes that
   coincide with our FPC CNVs and with the controls, then find the genes that do not
   appear in both sets. This gives a list of genes found only in FPC CNV regions.
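The gene-based comparison in the third approach can be sketched with plain set operations. This is a minimal Python example, not the UCSC / Ensembl workflow used in the study. The helper names, the (name, chromosome, start, end) gene tuples and the placeholder gene names are hypothetical.

```python
def genes_in_regions(genes, regions):
    """Return the names of genes that overlap at least one region.

    genes:   iterable of (name, chrom, start, end)
    regions: iterable of (chrom, start, end), closed intervals (an assumption)
    """
    found = set()
    for name, g_chrom, g_start, g_end in genes:
        for r_chrom, r_start, r_end in regions:
            if g_chrom == r_chrom and g_start <= r_end and r_start <= g_end:
                found.add(name)
                break  # one overlapping region is enough
    return found


def partition_genes(fpc_genes, ctrl_genes):
    """Split two gene sets into FPC-only, shared and control-only genes."""
    fpc, ctrl = set(fpc_genes), set(ctrl_genes)
    return {
        "fpc_only": fpc - ctrl,
        "shared": fpc & ctrl,
        "ctrl_only": ctrl - fpc,
    }
```

Running `genes_in_regions` once for the FPC CNVs and once for the control CNVs, then passing both results to `partition_genes`, gives the three groups; the `fpc_only` group corresponds to the list of genes found only in FPC CNV regions.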
BIBLIOGRAPHY
[1] Li D, et al. Pancreatic cancer. Lancet 2004;363:1049-57.
[2] Lin M, et al. Bioinformatics 2004;20:1233-40.
[3] Nannya Y, et al. Cancer Res 2005;65:6071-9.
[4] http://www.genome.gov//Pages/Hyperion//DIR/VIP/Glossary/Illustration/deletion.cfm
[5] http://www.genome.gov//Pages/Hyperion//DIR/VIP/Glossary/Illustration/duplication.cfm
[6] http://www.nature.com/nrc/journal/v3/n4/glossary/nrc1041_glossary.html
CONCLUSIONS
There is a clear difference between organizing data manually and organizing it in an automated way.
The combination of tools such as MS Access, Perl/Bioperl and Notepad++ gives the researcher the
ability to search dynamically across many different metrics. Together with bioinformatics resources
such as UCSC, Ensembl and DGV, this opens up a variety of research pipelines for the researcher.
By multi-tooling, automating and exploring more ways of associating data, the study pipeline
becomes more effective and efficient, which leads to a quicker turnaround and an earlier impact on
screening and treatment.
Our next steps involve formulating a new pipeline that automatically annotates many genes at
once and uses the annotations to prioritize these genes / regions of the genome. Relevant
technologies are Bioperl (a collection of Perl modules that facilitate the development of Perl
scripts for bioinformatics applications) and support vector machines (SVMs, a technique for
separating data points into classes) [6], such as CANDID.
PURPOSE AND HYPOTHESIS
1. Identify and annotate germline genomic alterations that predispose to familial pancreatic
   cancer (FPC).
2. Continue analysis and data mining of copy number variation (CNV: variability in copy
   number [>2 or <2] of >1Kb DNA regions) in FPC cases vs. controls to find "interesting"
   regions of inactivation and/or over-expression.
3. Do this with the aid of a more efficient pipeline that uses numerous programming
   languages and bioinformatics tools (e.g. Perl/Bioperl, MS Access, Notepad++, UCSC,
   Ensembl, Database of Genomic Variants (DGV), support vector machines (SVM)).
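The working definition of a CNV given above (copy number above or below 2 in a region longer than 1 Kb) can be written as a small check. This is only an illustrative Python sketch: the function name, the constants and the inclusive coordinate convention are assumptions, and it does not reproduce the segmentation performed by dChip, CNAG or Partek.

```python
NORMAL_COPIES = 2      # expected diploid copy number
MIN_LENGTH_BP = 1000   # the ">1Kb" threshold from the definition


def classify_cnv(start, end, copy_number):
    """Return "gain", "loss", or None for a segment with inclusive coordinates."""
    length = end - start + 1
    if length <= MIN_LENGTH_BP or copy_number == NORMAL_COPIES:
        return None  # too short, or copy number is normal
    return "gain" if copy_number > NORMAL_COPIES else "loss"
```

Segments labelled "gain" correspond to regions of elevated copy number and segments labelled "loss" to regions of low copy number.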
Figure: Deletion on a chromosome [4]
Figure: Duplication on a chromosome [5]
Figure: Genes coinciding with FPC and control (CTRL) CNVs
Only FPC AMP genes: 1864
Genes in FPC & CTRL AMPs: 1678
Only CTRL AMP genes: 14272
Only FPC DEL genes: 84
Genes in FPC & CTRL DELs: 428
Only CTRL DEL genes: 1257
