A Multifaceted Solution to Bioinformatics Research
By Gagandeep Singh Anand – ganand@uwaterloo.ca
MOUNT SINAI HOSPITAL – SAMUEL LUNENFELD RESEARCH INSTITUTE
Pancreatic adenocarcinoma is the fourth leading cause of cancer death in Canada [1]. It is most often detected at an advanced stage and is frequently resistant to chemotherapy and radiotherapy; 5-year survival is poor (<5%). Even after surgical resection, the five-year survival rate is only 15-20% [2].
This study focuses primarily on Familial Pancreatic Cancer (FPC). Pancreatic cancer results from the accumulation of acquired genetic alterations, and a few studies have linked hereditary pancreatic cancer to mutations associated with other genetic syndromes (e.g., BRCA1/BRCA2, p16), but these account for only a small fraction of hereditary pancreatic cancer.
FPC is difficult to study because its high fatality rate leaves only limited pedigree data. This makes it especially important to use a variety of bioinformatics tools and to cross-reference the many available datasets.
Although a set of interesting regions has already been identified by manually analyzing the CNV data, there is still room for data quality assurance and for finding regions that are harder to detect by manual means.
Using a bioinformatics approach to gene/region localization, we obtained the following overlap-based results:
When the combined DGV and study control CNVs (25,700) are overlapped with the FPC gain (631) and loss (266) CNVs, 565 FPC gain CNVs and 235 FPC deletion CNVs coincide with a control, while 240 FPC gain CNVs and 55 FPC deletion CNVs do not overlap any control.
When the FPC CNVs are overlapped with 1,265 regions collected from 12 academic papers (27 hypo/methylated regions, 12 miRNAs, and 1,233 somatic pancreatic cancer CNVs), 63 FPC deletion CNVs and 240 FPC gain CNVs overlap at least one of these other regions, all of which were observed in cases of pancreatic adenocarcinoma.
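The literature comparison amounts to labelling each FPC CNV with the categories of published regions it touches. The Python sketch below assumes simple tuple layouts and made-up category labels (nothing here comes from the study's actual files). It uses a plain pairwise scan, which for roughly 900 FPC CNVs against 1,265 literature regions is about a million comparisons and therefore cheap.

```python
from collections import Counter

# Hypothetical layouts: an FPC CNV is (chrom, start, end, kind) with kind "gain"
# or "loss"; a literature region is (chrom, start, end, category). The category
# labels ("somatic_cnv", "mirna", "methylation") are illustrative assumptions.

def overlaps(chrom_a, start_a, end_a, chrom_b, start_b, end_b):
    """True if two inclusive intervals lie on the same chromosome and share a base."""
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a


def annotate(fpc_cnvs, lit_regions):
    """Pair each FPC CNV with the set of literature categories it overlaps."""
    return [
        (cnv, {cat for chrom, start, end, cat in lit_regions
               if overlaps(cnv[0], cnv[1], cnv[2], chrom, start, end)})
        for cnv in fpc_cnvs
    ]


def count_overlapping(annotated):
    """Count gain and loss CNVs that overlap at least one literature region."""
    return Counter(cnv[3] for cnv, cats in annotated if cats)


fpc = [("chr1", 100, 200, "loss"), ("chr1", 1000, 1100, "gain"),
       ("chr2", 50, 60, "gain")]
lit = [("chr1", 150, 180, "methylation"), ("chr1", 190, 400, "somatic_cnv"),
       ("chr2", 55, 58, "mirna")]
print(count_overlapping(annotate(fpc, lit)))
```

Keeping the category set per CNV (rather than a single yes/no) makes it possible to report separately how many CNVs coincide with somatic CNVs, miRNAs, or methylated regions.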
After obtaining the sets of genes that coincide with FPC CNVs and with control CNVs, and then determining which genes belong to only one of the two sets, we find the following:
Determining non-overlapping CNVs is a useful example of the difference between multi-tool and single-tool data manipulation. Using MS Access to find all overlaps relationally and then Perl to extract the non-overlapping regions gives an algorithm that is asymptotically sub-quadratic. Using Perl alone to find non-overlaps between cases and controls gives an algorithm that is asymptotically quadratic. For very large n (many case and control CNVs), the single-tool approach would be infeasible.
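To make the complexity contrast concrete, the Python sketch below implements both strategies in one language. It does not reproduce the poster's MS Access + Perl pipeline: the indexed version swaps in a sort plus binary search over per-chromosome control start positions (with a running maximum of control ends) as one assumed way of reaching sub-quadratic time, O((n + m) log m) for n case and m control CNVs, versus O(n·m) for the pairwise scan. The (chromosome, start, end) tuple layout with inclusive coordinates is also an assumption.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate

# Assumed CNV format: (chromosome, start, end), inclusive coordinates.

def overlaps(a, b):
    """True if two CNVs lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def non_overlapping_naive(cases, controls):
    """Quadratic: test every case CNV against every control CNV."""
    return [c for c in cases if not any(overlaps(c, k) for k in controls)]


def non_overlapping_indexed(cases, controls):
    """Sub-quadratic, O((n + m) log m): index controls once, then binary-search."""
    by_chrom = defaultdict(list)
    for chrom, start, end in controls:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # prefix_max[i] is the largest end among the first i + 1 controls by start.
        prefix_max = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, prefix_max)

    result = []
    for chrom, start, end in cases:
        if chrom in index:
            starts, prefix_max = index[chrom]
            # Only controls starting at or before the case end can overlap it; one
            # does iff the furthest-reaching of them ends at or after the case start.
            i = bisect_right(starts, end)
            if i > 0 and prefix_max[i - 1] >= start:
                continue  # overlaps at least one control
        result.append((chrom, start, end))
    return result


cases = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
controls = [("chr1", 150, 300), ("chr1", 700, 800), ("chr3", 1, 50)]
print(non_overlapping_indexed(cases, controls))
# → [('chr1', 500, 600), ('chr2', 10, 20)]
```

Both functions return case CNVs in their input order, so the naive version can serve as a reference when checking the faster one.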
Using three CNV detection algorithms (dChip, CNAG, and the Partek® Genomics Suite HMM CNV detection algorithm), we analyzed 124 cases and 1,198 controls. The FPC cases showed 266 regions of low copy number and 631 regions of elevated copy number.
Three approaches have been used to further narrow down regions of interest:
1. Build a larger control set by combining the Database of Genomic Variants (DGV) with the study's own controls, and compare it with our FPC cases.
2. Compare our FPC CNVs against other types of regions reported in pancreatic adenocarcinoma cases in various academic papers (e.g., somatic CNVs, miRNAs, hypo/methylated regions).
3. Instead of focusing on non-overlapping regions between FPC and control CNVs, use bioinformatics resources such as UCSC and Ensembl to determine the genes that coincide with our FPC CNVs and with the control CNVs, then find the genes that do not appear in both sets. This yields a list of genes found only in FPC CNV regions.
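The gene-level approach reduces to set arithmetic once each CNV list has been mapped to gene names. In this sketch the gene table layout, the gene names GENE_A to GENE_D, and the helper names genes_in_regions and partition_genes are all hypothetical; real coordinates would come from a UCSC or Ensembl annotation export. Running it separately on amplified and deleted CNVs gives the same kind of breakdown (FPC-only, control-only, and shared genes, for AMPs and DELs) as the results figure.

```python
# Hypothetical inputs: regions as (chrom, start, end); genes as
# (gene_name, chrom, start, end), e.g. rows exported from UCSC or Ensembl.

def genes_in_regions(regions, genes):
    """Names of genes overlapping at least one region (inclusive coordinates)."""
    return {
        name
        for name, g_chrom, g_start, g_end in genes
        if any(chrom == g_chrom and start <= g_end and g_start <= end
               for chrom, start, end in regions)
    }


def partition_genes(fpc_regions, ctrl_regions, genes):
    """Split genes into FPC-only, control-only and shared sets."""
    fpc = genes_in_regions(fpc_regions, genes)
    ctrl = genes_in_regions(ctrl_regions, genes)
    return {"only_fpc": fpc - ctrl, "only_ctrl": ctrl - fpc, "both": fpc & ctrl}


genes = [("GENE_A", "chr1", 100, 200), ("GENE_B", "chr1", 300, 400),
         ("GENE_C", "chr2", 10, 20), ("GENE_D", "chr3", 5, 9)]
fpc_amps = [("chr1", 150, 350)]
ctrl_amps = [("chr1", 390, 500), ("chr2", 1, 15)]
print(partition_genes(fpc_amps, ctrl_amps, genes))
# → {'only_fpc': {'GENE_A'}, 'only_ctrl': {'GENE_C'}, 'both': {'GENE_B'}}
```

Working on gene names rather than raw coordinates means two CNVs that cover the same gene through different breakpoints are treated as matching, which is the point of this approach.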
1. Li D, et al. Pancreatic cancer. Lancet 2004;363:1049-57.
2. Lin M, et al. Bioinformatics 2004;20:1233-40.
3. Nannya Y, et al. Cancer Res 2005;65:6071-9.
4. http://www.genome.gov//Pages/Hyperion//DIR/VIP/Glossary/Illustration/deletion.cfm
5. http://www.genome.gov//Pages/Hyperion//DIR/VIP/Glossary/Illustration/duplication.cfm
6. http://www.nature.com/nrc/journal/v3/n4/glossary/nrc1041_glossary.html
There is a clear difference between organizing data manually and organizing it with automated tools. Combining tools such as MS Access, Perl/Bioperl, and Notepad++ gives the researcher the ability to search dynamically across many different metrics. Together with bioinformatics resources such as UCSC, Ensembl, and DGV, this opens up a variety of research pipelines. By combining multiple tools, automating steps, and exploring more ways of associating data, we obtain a more effective and efficient study pipeline, and hence a quicker turnaround and an earlier impact on screening and treatment.
Our next step is to build a new pipeline that automatically annotates many genes at once and uses the annotations to prioritize genes and genomic regions. Relevant technologies include Bioperl (a collection of Perl modules that facilitates the development of Perl scripts for bioinformatics applications), support vector machines (SVMs, a technique for separating data points into classes) [6], and tools such as CANDID.
Identify and annotate germline genomic alterations that predispose to familial pancreatic cancer (FPC).
Continue the analysis and data mining of copy number variation (CNV: a copy number above or below 2 in a DNA region longer than 1 kb) in FPC cases vs. controls to find “interesting” regions of inactivation and/or over-expression.
Do this with a more efficient pipeline that uses several programming languages and bioinformatics tools (e.g., Perl/Bioperl, MS Access, Notepad++, UCSC, Ensembl, the Database of Genomic Variants (DGV), and support vector machines (SVMs)).
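The CNV definition in the objectives (copy number above or below 2 over a region longer than 1 kb) can be written as a simple classifier. This is only a literal encoding of that definition under assumed inclusive coordinates; the function name classify_cnv is hypothetical, and dChip, CNAG and Partek Genomics Suite apply their own statistical calls, which this sketch does not attempt to reproduce.

```python
from typing import Optional

NORMAL_COPY_NUMBER = 2   # diploid baseline from the CNV definition
MIN_LENGTH_BP = 1_000    # ">1 kb" length threshold from the CNV definition


def classify_cnv(start: int, end: int, copy_number: float) -> Optional[str]:
    """Return "gain", "loss" or None for a region under the stated CNV definition.

    Assumes inclusive coordinates, so the region length is end - start + 1.
    """
    length = end - start + 1
    if length <= MIN_LENGTH_BP:
        return None          # too short to count as a CNV
    if copy_number > NORMAL_COPY_NUMBER:
        return "gain"        # elevated copy number (amplification)
    if copy_number < NORMAL_COPY_NUMBER:
        return "loss"        # low copy number (deletion)
    return None              # normal diploid copy number


print(classify_cnv(1, 5000, 3))
# → gain
```

Copy-number estimates from array data are often non-integer, which is why the sketch accepts a float and compares against 2 rather than testing for specific integer states.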
Poster sections: Background; Purpose and Hypothesis; Materials and Methods; Results; Conclusions; Bibliography.
Figure: Deletion on a chromosome [4].
Figure: Duplication on a chromosome [5].
Figure: Genes in FPC and control (CTRL) CNV regions (AMP = amplified, DEL = deleted)
Only FPC AMP genes: 1,864
Only FPC DEL genes: 84
Only CTRL AMP genes: 14,272
Only CTRL DEL genes: 1,257
Genes in both FPC & CTRL AMPs: 1,678
Genes in both FPC & CTRL DELs: 428
