A Multifaceted Solution to Bioinformatics Research
By Gagandeep Singh Anand – ganand@uwaterloo.ca

Gagandeep Singh Anand - Poster - A Multifaceted Solution to Bioinformatics Research - full


MOUNT SINAI HOSPITAL, SAMUEL LUNENFELD RESEARCH INSTITUTE

BACKGROUND

Pancreatic adenocarcinoma is the fourth leading cause of cancer death in Canada [1]. It is most often detected at an advanced stage and is frequently resistant to chemotherapy and radiotherapy, so 5-year survival rates are poor (<5%). Even after surgical resection, the five-year survival rate is only 15-20% [2].

This study focuses primarily on Familial Pancreatic Cancer (FPC) cases, which result from the accumulation of acquired genetic alterations. A few studies have associated genetic mutations with other genetic syndromes (e.g. BRCA1/BRCA2, p16), but these account for only a small fraction of hereditary pancreatic cancer. FPC is difficult to study because the high fatality rate limits the available pedigree data. It is therefore especially important to use a variety of bioinformatics tools and to cross-reference the many available datasets. Although a set of interesting regions has already been found by manual analysis of CNV data, there is still room for data quality assurance and for finding regions that are harder to detect manually.

RESULTS

Using the bioinformatics approach to gene/region localization, the overlap-based results are as follows.

When overlapping the DGV plus our control CNVs (25700) with the FPC gain (631) and loss (266) CNVs, we find 565 FPC gain CNVs and 235 FPC deleted CNVs coinciding with a control, and 240 FPC gain CNVs and 55 FPC deleted CNVs not overlapping with any control.
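The case-versus-control overlap described above can be sketched in a few lines of Python. This is a minimal illustration, not the poster's MS Access and Perl workflow: the `(chromosome, start, end)` tuple layout, the inclusive coordinates, the function names and the toy coordinates are all assumptions. It classifies whole CNVs only and does not split a CNV into overlapping and non-overlapping sub-regions. Control CNVs are sorted and merged once, then each case CNV is looked up with a binary search, so the work grows roughly as (n + m) log m instead of comparing every case with every control.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

# (chromosome, start, end) with inclusive coordinates; this layout is an assumption.
Interval = Tuple[str, int, int]


def merge_by_chromosome(intervals: Iterable[Interval]) -> Dict[str, List[List[int]]]:
    """Group intervals by chromosome and merge those that overlap."""
    merged: Dict[str, List[List[int]]] = {}
    for chrom, start, end in sorted(intervals):
        runs = merged.setdefault(chrom, [])
        if runs and start <= runs[-1][1]:
            runs[-1][1] = max(runs[-1][1], end)  # extend the current merged run
        else:
            runs.append([start, end])  # begin a new merged run
    return merged


def split_by_control_overlap(
    cases: Iterable[Interval], controls: Iterable[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Return (cases overlapping any control, cases overlapping no control)."""
    merged = merge_by_chromosome(controls)
    starts = {chrom: [run[0] for run in runs] for chrom, runs in merged.items()}
    overlapping: List[Interval] = []
    non_overlapping: List[Interval] = []
    for chrom, start, end in cases:
        runs = merged.get(chrom, [])
        # Index of the last merged control run that begins at or before the case's end.
        i = bisect_right(starts.get(chrom, []), end) - 1
        if i >= 0 and runs[i][1] >= start:
            overlapping.append((chrom, start, end))
        else:
            non_overlapping.append((chrom, start, end))
    return overlapping, non_overlapping


# Toy coordinates, invented for illustration only.
cases = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
controls = [("chr1", 150, 300), ("chr1", 250, 400), ("chr2", 100, 200)]
hits, misses = split_by_control_overlap(cases, controls)
print("overlapping:", hits)
print("non-overlapping:", misses)
```

Merging the controls first is what makes a single binary search per case sufficient: merged runs are disjoint and sorted, so only the last run starting before the case's end can overlap it.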
When overlapping regions from 12 academic papers containing 1265 regions (27 hypo/methylated, 12 miRNAs and 1233 somatic PC CNVs), we find 63 FPC deleted and 240 FPC gain CNVs overlapping with other regions (somatic CNVs, miRNAs or hypo/methylated regions) that were observed in cases of pancreatic adenocarcinoma.

After obtaining the sets of genes coinciding with FPC CNVs and with control CNVs, and then determining the genes that are not shared between the two sets, we find the following (the counts appear among the poster's figure labels).

When evaluating multi-tool data manipulation against a single-tool approach, determining non-overlaps among CNVs is an instructive example. Using MS Access to find all overlaps referentially and then Perl to determine the regions of non-overlap gives an algorithm that is asymptotically faster than quadratic time. Using Perl alone to determine non-overlaps between cases and controls takes quadratic time. For very large n (many case and control CNVs), determining non-overlaps with a single tool would be infeasible.

MATERIALS AND METHODS

Using three computational algorithms (dChip, CNAG, and the Partek® Genomics Suite HMM CNV detection algorithm), a set of 124 cases and 1198 controls was analyzed. There were 266 regions of low copy number and 631 regions of elevated copy number. Three different approaches have been used to further determine regions of interest:

Create a larger set of controls by combining the Database of Genomic Variants with the set of controls from the study, and compare it with our FPC cases.

Compare other types of highly conserved regions found in pancreatic adenocarcinoma cases (e.g. somatic CNVs, miRNAs, hypo/methylated regions) from various academic papers against our FPC CNVs.
Instead of focusing on regions of non-overlap between FPC CNVs and control CNVs, use bioinformatics resources such as UCSC and Ensembl to determine the genes that coincide with our FPC CNVs and with controls, then find the genes that belong to one set but not the other. This gives a list of genes found only in FPC CNV regions.

BIBLIOGRAPHY

[1] Li D, et al. Pancreatic cancer. Lancet 2004;363:1049-57.
[2] Lin M, et al. Bioinformatics 2004;20:1233-40.
[3] Nannya Y, et al. Cancer Res 2005;65:6071-9.
[4] http://www.genome.gov//Pages/Hyperion//DIR/VIP/Glossary/Illustration/deletion.cfm
[5] http://www.genome.gov//Pages/Hyperion//DIR/VIP/Glossary/Illustration/duplication.cfm
[6] http://www.nature.com/nrc/journal/v3/n4/glossary/nrc1041_glossary.html

CONCLUSIONS

There is an apparent difference between organizing data manually and organizing it automatically. The combination of tools such as MS Access, Perl/Bioperl and Notepad++ lets the researcher search dynamically across many different metrics. Together with bioinformatics resources such as UCSC, Ensembl and DGV, this opens up a variety of research pipelines. Multi-tooling, automation and broader data association make for a more effective and efficient study pipeline, and hence a quicker turnaround and an earlier impact on screening and treatment.

Our next steps involve formulating a new pipeline that automates the simultaneous annotation of genes and uses the annotated information to prioritize genes and regions of the genome. Relevant technologies are Bioperl (a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications) and support vector machines (SVMs, a technique for separating data points into classes) [6], such as CANDID.
PURPOSE AND HYPOTHESIS

Identify and annotate germline genomic alterations that predispose to familial pancreatic cancer (FPC).

Continue analysis and data mining of copy number variation (CNV: variability in copy number [>2 or <2] of >1 kb DNA regions) in FPC cases vs. controls to find “interesting” regions of inactivation and/or over-expression.

Do this with the aid of a more efficient pipeline that uses numerous programming languages and bioinformatics tools (e.g. Perl/Bioperl, MS Access, Notepad++, UCSC, Ensembl, the Database of Genomic Variants (DGV), support vector machines (SVMs)).

Poster sections: BACKGROUND, PURPOSE AND HYPOTHESIS, MATERIALS AND METHODS, RESULTS, CONCLUSIONS, BIBLIOGRAPHY

Figures: Deletion on a chromosome [4]; Duplication on a chromosome [5]

Gene counts (figure labels):
Only FPC AMP genes: 1864
Only FPC DEL genes: 84
Only CTRL AMP genes: 14272
Only CTRL DEL genes: 1257
Genes in FPC & CTRL AMPs: 1678
Genes in FPC & CTRL DELs: 428
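The gene-level comparison behind these counts (genes found only in FPC CNVs, only in control CNVs, or in both) amounts to two set differences and one intersection. The sketch below shows that idea in Python under stated assumptions: the function name, the dictionary keys and the gene identifiers are placeholders invented for illustration, and the poster does not say how its own lists were computed.

```python
from typing import Dict, Iterable, Set


def partition_genes(
    case_genes: Iterable[str], control_genes: Iterable[str]
) -> Dict[str, Set[str]]:
    """Split genes into case-only, control-only and shared groups."""
    case_set, control_set = set(case_genes), set(control_genes)
    return {
        "only_case": case_set - control_set,      # genes seen only in case CNVs
        "only_control": control_set - case_set,   # genes seen only in control CNVs
        "shared": case_set & control_set,         # genes seen in both
    }


# Placeholder gene identifiers, invented for illustration only.
fpc_amp = ["GENE_A", "GENE_B", "GENE_C"]
ctrl_amp = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]

groups = partition_genes(fpc_amp, ctrl_amp)
for name, genes in groups.items():
    print(name, sorted(genes))
```

Running the same function once on the amplification gene lists and once on the deletion gene lists would yield the six groups named in the figure labels.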