Presentation used for my oral Master's Thesis defense at the Universitat Autònoma de Barcelona. It shows the development of a Perl script that automatically generates a report of the somatic mutations found in a Normal/Tumor cancer experiment.
Normal/Tumor somatic mutations report tool
1. Development of a bioinformatics tool for the automated generation of a report of the somatic mutations found in a Normal/Tumor cancer experiment
Isaac Noguera Guixà
Universitat Autònoma de Barcelona
15th of July, 2014
Project tutor:
Dr. Raúl Tonda
Data Analysis team, Centre Nacional d’Anàlisi Genòmica (CNAG), PCB
Academic tutor: Dr. Miguel Perez-Enciso. Centre for Research in Agricultural Genomics (CRAG), UAB
Course 2013 - 2014
Master’s Thesis
2. Table of contents
Introduction
◦ Cancer genetics
◦ Cancer in Bioinformatics
Objectives
Material and methods
Results
Conclusions
3. Introduction
Figure: loss of normal growth control (normal cell; cell damage with no repair vs. cell suicide (apoptosis); 1st, 2nd and 3rd mutations; uncontrolled growth).
Yulug, I. (2006). Molecular basis of cancer [PowerPoint slides]. Retrieved from http://www.hugointernational.org/resources/Isik_Yulug_Molecular_Basis_of_cancer_bilingual.ppt
4. Introduction
Cancer in Bioinformatics
Figure: Normal/Tumor experiment (normal sample and tumor sample → read mapping and variant calling).
Lopez-Bigas, N. (2011). Identification of cancer drivers across tumor types [PowerPoint slides]. Retrieved from http://es.slideshare.net/nurialopezbigas/identification-of-cancer-drivers-across-tumor-types#
A variant is determined by its joint status in tumor-normal sequence pairs.
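The joint-status idea can be sketched as a small classifier over the GT (genotype) values of the NORMAL and TUMOR columns of a vcf record. This is a simplified illustration, not the CNAG pipeline's method: it ignores read depth, genotype likelihoods, the Fisher test p-value (FP) and loss-of-heterozygosity cases, and the category labels are hypothetical.

```python
def alleles(gt: str) -> list:
    # "0/1" and "0|1" (phased) both become ["0", "1"].
    return gt.replace("|", "/").split("/")


def classify(normal_gt: str, tumor_gt: str) -> str:
    """Simplified joint classification of one site (illustrative only)."""
    normal, tumor = alleles(normal_gt), alleles(tumor_gt)
    if "." in normal or "." in tumor:
        return "uncalled"   # a missing genotype in either sample
    if any(a != "0" for a in normal):
        return "germline"   # the ALT allele is already present in the normal sample
    if any(a != "0" for a in tumor):
        return "somatic"    # hom-ref in normal, ALT only in tumor
    return "reference"
```

Applied to the two records of the vcf excerpt in the Material and methods section, it returns "somatic" for Chr1 (NORMAL 0/0, TUMOR 0/1) and "germline" for Chr20 (NORMAL 1/1, TUMOR 0/1).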
5. Introduction
Cancer in Bioinformatics
Normal/Tumor experiment
Variant call format (vcf) (Danecek, P. et al., 2011)
6. Objectives
Main objective
Develop an automated tool to produce a report of the somatic variants found in a Normal/Tumor experiment
→ Process the output of the CNAG’s variant calling pipeline
→ Filter the somatic variants from it and extract relevant statistics from them
→ Identify those variants that are already known and annotated in cancer somatic mutations databases
→ Transform the processed data into tables and graphics for the report
→ Fill a report template, kept independent of the main script's code, with the processed data
→ Generate the report document in a printable format such as Portable Document Format (pdf)
→ Execute all these steps sequentially and automatically
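The objective of identifying variants already annotated in cancer somatic mutation databases can be sketched as an exact lookup on (chromosome, position, REF, ALT). The tool itself delegates this to SnpSift with a COSMIC file (see the -cosmic option); the sketch below only shows the matching idea, and the entry format, function names and IDs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def norm_chrom(chrom: str) -> str:
    # Treat "Chr1", "chr1" and "1" as the same chromosome name.
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def load_known(entries: List[Tuple[str, int, str, str, str]]) -> Dict[Key, str]:
    # entries: (chrom, pos, ref, alt, mutation_id), e.g. read from a COSMIC vcf.
    return {(norm_chrom(c), p, r, a): mid for c, p, r, a, mid in entries}


def known_id(variant: Key, known: Dict[Key, str]) -> Optional[str]:
    # Returns the database ID if this exact allele is annotated, else None.
    chrom, pos, ref, alt = variant
    return known.get((norm_chrom(chrom), pos, ref, alt))
```

Matching on the full allele rather than on position alone avoids reporting a different substitution at a known hotspot as "already annotated".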
Additional objective
Incorporate the developed tool as an additional step in the variant calling pipeline of the CNAG’s Data Analysis team
7. Material and methods
Basis of the developed tool:
Main script: Perl script using the Template module, in charge of input data processing and output data generation.
Template document: Template Toolkit script, i.e. LaTeX code with embedded R and Template Toolkit code.
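Keeping the template apart from the main script means the report layout can change without touching the processing code. The sketch below imitates only Template Toolkit's default [% name %] variable tags with a regular expression in Python; it does not implement TT directives such as FOREACH or IF, and the template text and variable names are hypothetical.

```python
import re

# Template text kept separate from the code (in practice loaded from a file).
TEMPLATE = r"""\section*{Somatic variants report: [% project %]}
Input file: \texttt{[% vcf %]}\\
Somatic variants (FP $\le$ [% pvalue %]): [% n_somatic %]
"""

# Characters that would break LaTeX if inserted verbatim.
LATEX_SPECIAL = {"_": r"\_", "%": r"\%", "&": r"\&", "#": r"\#"}


def latex_escape(value) -> str:
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in str(value))


def render(template: str, data: dict) -> str:
    # Replace each [% name %] tag with the LaTeX-escaped value of data["name"].
    return re.sub(r"\[%\s*(\w+)\s*%\]",
                  lambda m: latex_escape(data[m.group(1)]), template)
```

For example, render(TEMPLATE, {"project": "Tumor_01", "vcf": "sample.vcf", "pvalue": 0.05, "n_somatic": 12}) yields LaTeX in which the project name appears as Tumor\_01.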
8. Material and methods
Designed pipeline:
Input data (CNAG’s vcf) → Data processing (COSMICdb annotation, somatic variants filtering, output data storing/generation) + Template Toolkit document → Template processing → Noweb document → R Sweave → LaTeX document → pdflatex → Pdf document
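The last stages of the pipeline (Noweb → LaTeX → pdf) can be sketched as a chain of external commands that stops at the first failure. This assumes the template processing step has already written a Noweb file (report.Rnw is a hypothetical name) and that R and pdflatex are on the PATH; the actual script may invoke these tools differently.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(rnw: Path) -> List[List[str]]:
    """Commands turning the Noweb file produced by template processing into a pdf."""
    tex = rnw.with_suffix(".tex")
    return [
        ["R", "CMD", "Sweave", rnw.name],                    # .Rnw -> .tex, runs embedded R
        ["pdflatex", "-interaction=nonstopmode", tex.name],  # .tex -> .pdf
    ]


def run_pipeline(rnw: Path) -> Path:
    for cmd in build_commands(rnw):
        # check=True raises CalledProcessError as soon as a step fails.
        subprocess.run(cmd, cwd=rnw.parent, check=True)
    return rnw.with_suffix(".pdf")
```

Building the command list separately from running it keeps each step inspectable and testable without R or LaTeX installed.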
##INFO=<ID=FP,Number=1,Type=Float,Description="Fisher test P-value for somatic comparison.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
Chr1 883814 . A G 18.1 mrd10 DP=36;UPSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000496938|);FP=0.00604 GT:PL:DP 0/0:0,96,255:32 0/1:51,0,26:3
Chr20 126154 dbSNPBuildID=137;GMAF=0.1648 T A 64.7 mrp0.05 INDEL;EFF=FRAME_SHIFT(HIGH||||DEFB126|protein_coding|CODING|ENST00000382398|exon_20_126056_126392);FP=1 GT:PL:DP 1/1:255,255,0:274 0/1:253,0,45:26
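Reading such a record comes down to splitting the fixed columns, parsing the ';'-separated INFO field (where FP lives) and pairing each sample column with the FORMAT keys. A minimal sketch, assuming one data line at a time and the NORMAL/TUMOR sample order of the header above; a real parser would take sample names from the #CHROM line.

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    parsed: Dict[str, Union[str, bool]] = {}
    if info == ".":
        return parsed  # missing INFO field
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Entries without "=" are flags (e.g. INDEL) and are stored as True.
        parsed[key] = value if sep else True
    return parsed


def parse_record(line: str, samples=("NORMAL", "TUMOR")) -> dict:
    fields = line.split()  # vcf columns are tab-separated
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]
    keys = fmt.split(":")
    # Pair FORMAT keys (GT, PL, DP) with each sample's ':'-separated values.
    genotypes = {name: dict(zip(keys, col.split(":")))
                 for name, col in zip(samples, fields[9:])}
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt,
            "qual": qual, "filter": flt, "info": parse_info(info),
            "samples": genotypes}
```

With the Chr1 record above, float(record["info"]["FP"]) gives 0.00604 and record["samples"]["TUMOR"]["GT"] gives "0/1".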
9. Results
Script usage description:
usage: main.pl -f file [-template file] [-p value] [-s value] [-project "string"] [-cnv "string"] [-methods] [-cosmic file] [-h]
-h: print this help message
-f file: variant call format file (.vcf) to be analyzed
-template file: Template Toolkit file (.tt) to be used as the template; if not defined, the default (“reporttemplate.tt”) is used
-p value: add extra p-values to the default p-values (1, 0.05 and 0.001) used for somatic variant filtering
-s value: filter somatic variants only at the p-values given with this option
-cosmic file: COSMIC database file for SnpSift annotation (default “CosmicCodingMuts_v68”)
-cnv "string": path where the script looks for the Control-FREEC output; if found, it is added to the report
-project "string": add the project name to the report title page
-methods: print the methods appendix in the report (not printed by default)
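The interplay of -p and -s can be sketched with Python's argparse, which accepts single-dash long options such as -template. This only mirrors the option semantics described above; it is not the Perl script's code. Making the options repeatable, and letting -s override any -p values when both are given, are assumptions.

```python
import argparse
from typing import Dict, List, Optional

# Default thresholds listed in the -p option description.
DEFAULT_PVALUES = [1.0, 0.05, 0.001]


def build_parser() -> argparse.ArgumentParser:
    # Single-dash long options mirror the usage line above.
    parser = argparse.ArgumentParser(description="Somatic variants report (sketch)")
    parser.add_argument("-f", required=True, metavar="file")
    parser.add_argument("-template", default="reporttemplate.tt", metavar="file")
    parser.add_argument("-p", type=float, action="append", metavar="value")
    parser.add_argument("-s", type=float, action="append", metavar="value")
    parser.add_argument("-cosmic", default="CosmicCodingMuts_v68", metavar="file")
    parser.add_argument("-cnv", metavar="string")
    parser.add_argument("-project", metavar="string")
    parser.add_argument("-methods", action="store_true")
    return parser


def select_pvalues(extra: Optional[List[float]],
                   only: Optional[List[float]]) -> List[float]:
    # Assumption: -s replaces the defaults (and any -p values); -p only extends them.
    chosen = set(only) if only else set(DEFAULT_PVALUES) | set(extra or [])
    return sorted(chosen, reverse=True)


def count_passing(fps: List[float], pvalues: List[float]) -> Dict[float, int]:
    # Number of candidates whose Fisher p-value (FP) is <= each threshold.
    return {t: sum(fp <= t for fp in fps) for t in pvalues}
```

For instance, "-f input.vcf -p 0.01" selects the thresholds 1, 0.05, 0.01 and 0.001, while "-s 0.01" selects only 0.01.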
11. Conclusions
1) We developed a functional tool that automatically generates a report document of the somatic variants found in a Normal/Tumor experiment.
2) The content of the report is acceptable but it can be improved.
3) The tool has been successfully tested. It has also already been integrated into CNAG’s variant calling pipeline, where it runs as the last step.
4) The template document is independent of the main script. Together with the main script's set of configurable parameters, this makes the tool highly customizable.
5) Computational resources are not a limitation: the execution time and memory usage required by the tool do not appear to limit its use.
Tool's ultimate aim
To ease the transfer of information from basic research to clinical diagnosis.