Detecting Somatic Mutations in Impure Cancer Sample
- Ensemble Approach
김광중, 홍창범
KT GenomeCloud
2015 Korea Genome Organization (KOGO) Winter Symposium
Workshop on Bioinformatics Analysis Using NGS Data
2015.2.4~2.5
Overview
Challenge
Literature Search
Mutation callers
Comparison of mutation callers
Simple consensus approach
Integrated/Ensemble Approach
Summary
Challenge
Motivation
Sequencing reads from tumor samples are diluted by reads from normal cells
lower signal-to-noise ratio: mutant allele frequencies can be as low as 5%
at such frequencies SNVs cannot be called with high significance
The genomes of primary tumors are genetically heterogeneous, with frequent
rearrangements
copy number alterations
subclones
Highly sensitive and specific mutation-calling methods are needed
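The dilution effect can be put into numbers. The following is a minimal sketch, assuming a clonal or subclonal mutation and a simple mixture of tumor and normal cells; the function name `expected_vaf` and all of its parameters are illustrative and do not come from the slides.

```python
def expected_vaf(purity, mutant_copies=1, tumor_copies=2,
                 normal_copies=2, cancer_cell_fraction=1.0):
    """Expected variant allele frequency of a somatic mutation.

    purity: fraction of cells in the sample that are tumor cells (0..1)
    mutant_copies: copies of the mutant allele per mutated tumor cell
    tumor_copies / normal_copies: total copies of the locus per cell
    cancer_cell_fraction: fraction of tumor cells carrying the mutation
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mutant = purity * cancer_cell_fraction * mutant_copies
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant / total


# Heterozygous, copy-neutral mutation at different tumor purities
print(expected_vaf(0.9))                             # high-purity sample
print(expected_vaf(0.1))                             # heavily diluted sample
print(expected_vaf(0.5, cancer_cell_fraction=0.2))   # subclone in a 50% pure sample
```

Under these assumptions a heterozygous mutation in a sample with 10% tumor content, or one present in a 20% subclone of a 50% pure sample, is expected at an allele frequency of about 5%.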
Terminologies
Comparison of mutation callers
Literature Search
Comparison of mutation callers - Data Sets
whole genome sequencing (WGS)
melanoma sample and matched blood
90% tumor content
paired-end reads, ~50x coverage
whole exome sequencing (WES)
tumor samples
18 lung tumor-normal pairs
70~80% tumor content
paired-end reads, 63x coverage
cell lines
7 lung cancer cell lines
paired-end reads, 233x coverage
Comparison of mutation callers-Experiments
sSNV detection tools
Validation
PCR and direct sequencing of genomic DNA
on only deleted, functionally important sSNVs
Simulation
10 tumor-normal pairs (WES), 100x coverage
Comparison of mutation callers - Results (1)
Comparison of mutation callers - Results (2)
Synthetic data: 10 tumor-normal pairs (WES), 100x coverage
Comparison of mutation callers - Summary of Findings
VarScan2 performed the best, followed by MuTect
VarScan2: better at higher allele frequencies
MuTect: more sensitive at low allele frequencies
strand-bias filtering is useful
eliminates many false positives
strand bias is a common problem with Illumina sequencing data
still a challenge: how to distinguish sSNVs from alternate alleles present in the normal sample
to call ultra-rare sSNVs: targeted deep sequencing is recommended over WES or WGS
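A strand-bias filter of the kind mentioned above can be sketched in a few lines. This is only an illustration: the function name and both thresholds are hypothetical and are not the filters implemented by VarScan2 or MuTect.

```python
def passes_strand_filter(alt_forward, alt_reverse,
                         min_reads_per_strand=1, max_strand_fraction=0.9):
    """Return True if the variant-supporting reads are not strand-biased.

    alt_forward / alt_reverse: number of variant-supporting reads mapped
    to the forward and reverse strand. Both thresholds are hypothetical.
    """
    total = alt_forward + alt_reverse
    if total == 0:
        return False  # no supporting reads at all
    if min(alt_forward, alt_reverse) < min_reads_per_strand:
        return False  # support comes from one strand only
    # Reject calls where almost all support sits on a single strand
    return max(alt_forward, alt_reverse) / total <= max_strand_fraction


print(passes_strand_filter(5, 4))    # balanced support
print(passes_strand_filter(10, 0))   # forward strand only
print(passes_strand_filter(19, 1))   # heavily skewed support
```

A candidate whose supporting reads all map in one orientation is more likely a sequencing or alignment artifact than a real mutation, which is why such a filter removes many false positives.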
Simple consensus approach
Simple consensus approach - Data Sets
whole exome sequencing (WES)
27 ovarian tumors and their matched germline samples
HiSeq 2000 sequencer, using 100 bp paired-end reads
mean coverage ranged from 102~225x in tumors and 119~118x in germlines
validation
Sanger sequencing
somatic SNV detection programs
JointSNVMix2, MuTect, and SomaticSniper
implement sophisticated detection algorithms
used in major tumor sequencing studies
Simple consensus approach - Identification of somatic SNVs
MuTect (v 1.0.27783)
only the default parameter set was applied
kept calls not labeled as ‘REJECT’
JointSNVMix2 (v 0.7.5)
default prior genotype probabilities used for the training set
kept calls with a joint probability of 0.9999 or greater
SomaticSniper (v 1.0.0)
using joint genotyping mode (-J option)
default prior probability of a somatic mutation (0.01)
reads with a mapping quality of 0 were filtered
kept predictions with a ‘somatic score’ of 40 or greater
SAMtools mpileup
mapping qualities
directionality
depth of reads
total read depth of 8 or greater in both the tumor and the normal (T/N)
mutant allele frequency of ≥20% in tumor and ≥5% germline
mutant allele supported by reads mapping in both forward and reverse orientations
variant called only in the tumor (with the exception of the BRAF V600 and KRAS G12/13 hotspots)
combined total of 9,226 somatic SNV predictions
median of 321 predictions per sample (range 147~695)
SomaticSniper and JointSNVMix2 predicted the most mutations per sample (median 171 and 173)
MuTect was more conservative (median 115)
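The per-caller cutoffs listed on this slide can be expressed as simple predicates. The sketch below assumes each caller's output has already been parsed into dictionaries; the field names (`judgement`, `p_somatic`, `somatic_score`, `site`) are hypothetical placeholders, not the callers' actual output columns.

```python
# Field names are hypothetical; real caller output must be parsed into them first.

def keep_mutect(call):
    # MuTect: keep anything not labeled 'REJECT'
    return call["judgement"] != "REJECT"

def keep_jointsnvmix2(call):
    # JointSNVMix2: joint probability of 0.9999 or greater
    return call["p_somatic"] >= 0.9999

def keep_somaticsniper(call):
    # SomaticSniper: 'somatic score' of 40 or greater
    return call["somatic_score"] >= 40

CALLER_FILTERS = {
    "MuTect": keep_mutect,
    "JointSNVMix2": keep_jointsnvmix2,
    "SomaticSniper": keep_somaticsniper,
}

def filter_predictions(predictions):
    """Apply each caller's own cutoff to that caller's list of calls."""
    return {
        caller: [call for call in calls if CALLER_FILTERS[caller](call)]
        for caller, calls in predictions.items()
    }


# Toy input, not real data
predictions = {
    "MuTect": [{"site": "a", "judgement": "KEEP"},
               {"site": "b", "judgement": "REJECT"}],
    "JointSNVMix2": [{"site": "a", "p_somatic": 0.99995},
                     {"site": "c", "p_somatic": 0.99}],
    "SomaticSniper": [{"site": "a", "somatic_score": 40},
                      {"site": "d", "somatic_score": 39}],
}
kept = filter_predictions(predictions)
print({caller: [c["site"] for c in calls] for caller, calls in kept.items()})
```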
Simple consensus approach - Prediction Results
Simple consensus approach - Properties of Predictions
non-reference allele frequency in germline
SomaticSniper (S) and JointSNVMix2 (J) accept calls with a substantial number of germline reads carrying non-ref alleles
this lets a significant number of germline variants into the call set
MuTect (M) is much more stringent on evidence for non-ref alleles
non-reference allele frequency in tumor
calls made by only one or two programs have a lower proportion of non-ref reads
not sufficient allelic ratios to be predicted as somatic by every program
but enough support to rise above the thresholds of at least one program
Simple consensus approach-Validation Results
Simple consensus approach-Filtering Results
taking consensus between GATK Unified Genotyper
mate-pair rescue read filtering
minimum read depth of 10 in both the tumor and germline
2 true positive
Simple consensus approach - Summary of Findings
A powerful method for increasing the validation rate while maintaining maximum sensitivity
Similar effects are likely to influence other bioinformatics classification problems
May prove effective for a variety of genomics and bioinformatics analyses
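The consensus idea itself is a vote over call sets. The following is a minimal sketch, assuming each caller's predictions have been reduced to a set of `(chrom, pos, ref, alt)` tuples; the function name, the `min_callers` parameter, and the toy sites are illustrative only.

```python
from collections import Counter

def consensus_calls(call_sets, min_callers=2):
    """Return the sites predicted by at least `min_callers` callers.

    call_sets: dict mapping caller name -> iterable of hashable site keys,
    e.g. (chrom, pos, ref, alt) tuples.
    """
    votes = Counter()
    for calls in call_sets.values():
        votes.update(set(calls))  # each caller votes at most once per site
    return {site for site, n in votes.items() if n >= min_callers}


# Toy sites, not real calls
call_sets = {
    "MuTect":        {("17", 100, "C", "T"), ("17", 200, "G", "A")},
    "JointSNVMix2":  {("17", 100, "C", "T"), ("17", 300, "A", "G")},
    "SomaticSniper": {("17", 100, "C", "T"), ("17", 200, "G", "A")},
}
print(sorted(consensus_calls(call_sets, min_callers=2)))
print(sorted(consensus_calls(call_sets, min_callers=3)))
```

Raising `min_callers` trades sensitivity for a higher expected validation rate; setting it to 1 gives the plain union of all callers.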
Integrated/Ensemble Approach
Ensemble
Using multiple learning algorithms to obtain better predictive performance
(Three somatic SNV callers: SomaticSniper, MuTect, and VarScan2)
Integrated
For better performance, we use additional filtering
GATK Unified Genotyper: keep SNVs predicted in the tumor but not in the germline
Scoring system: helps us identify strong and relevant mutation candidates
Integrated/Ensemble Approach: Workflow
Subject: tumor.bam, normal.bam
MuTect -> somatic.vcf
VarScan2 (input from SAMtools mpileup) -> somatic.vcf
SomaticSniper -> somatic.vcf
GATK -> tumor.vcf
GATK -> normal.vcf
Consensus (gatk) -> somatic.vcf
filtered (GATK) -> somatic.vcf
Cosmic, CCLE: validate somatic list
validated (GATK) -> somatic.vcf
Integrated/Ensemble Approach: Data Sets (CGHub)
https://cghub.ucsc.edu/datasets/benchmark_download.html
CGHub
Cancer Genomics Hub
a resource of the National Cancer Institute
Cancer Genome Atlas (TCGA) consortium and related projects
Integrated/Ensemble Approach: TCGA Benchmark 4
Three parts to the mutation calling exercise:
derived from grade 3 breast ductal carcinomas (breast cancer)
HCC1143 (50x) vs. HCC1143 BL (60x)
HCC1954 (58x) vs. HCC1954 BL (71x)
Simulated normal contamination and subclone expansion for both
Total: 28 .bam files, ~4.3 TB
Integrated/Ensemble Approach: download using GeneTorrent
GeneTorrent
client software for downloading sequence data from CGHub’s repository
two main programs: gtdownload and cgquery
get public key
public key: https://cghub.ucsc.edu/software/downloads/cghub_public.key
TCGA key: approval to access the restricted data from the ICGC-DACO
download uuid (xml file)
Validation Data Sets
COSMIC
Catalogue of Somatic Mutations in Cancer: Cell Lines Project
Wellcome Trust Sanger Institute
http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/
CCLE
Cancer Cell Line Encyclopedia
Broad Institute and Novartis Institutes for BioMedical Research
http://www.broadinstitute.org/ccle/home
Validation Data Sets (18)
17:5445207-5445207
17:7577538-7577538
17:10411982-10411982
17:43364293-43364293
17:47892946-47892946
17:67538038-67538038
17:67012449-67012449
17:48538716-48538716
17:27936181-27936181
17:79650824-79650824
17:79638782-79638782
17:76528554-76528554
17:6683197-6683197
17:73235515-73235515
17:39505636-39505636
17:33310040-33310040
17:56083818-56083818
17:37374298-37374298
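Checking a caller's output against a list of validated positions like the one above is a matter of parsing the `chrom:start-end` strings and testing membership. This is a minimal sketch; the helper names and the toy candidate calls are illustrative and not part of the original workflow.

```python
def parse_region(region):
    """Parse a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)

def validated_calls(calls, regions):
    """Return the (chrom, pos) calls that fall inside any validated region."""
    parsed = [parse_region(r) for r in regions]
    return [
        (chrom, pos)
        for chrom, pos in calls
        if any(chrom == c and start <= pos <= end for c, start, end in parsed)
    ]


# Two of the validated positions listed above
regions = ["17:5445207-5445207", "17:7577538-7577538"]
# Toy candidate calls; only the first matches a validated position
calls = [("17", 7577538), ("17", 1234), ("1", 7577538)]
print(validated_calls(calls, regions))
```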
Java Application: version
Java version
Java 6 and Java 7 are both used on many systems
Select Java version
use “update-alternatives --config java”
MuTect runs on Java 6 / GATK runs on Java 7 :-(
Java Application: running options
-Xmx7g
sets the maximum heap size of the Java program (7 GB here)
default heap size when none is set: 64M (on older JVMs; the value is JVM-dependent)
“java.lang.OutOfMemoryError” occurs when the heap size is insufficient
-Djava.io.tmpdir=/tmp
sets a system property value
sets the temporary directory that Java will use
java [-java_options] -jar jarfile [jarfile_options]
java -Xmx10g -Djava.io.tmpdir=/tmp -jar muTect-1.1.4.jar --analysis_type MuTect --reference_sequence human_g1k_v37_decoy.fasta
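When several Java tools are launched from a script, the `java [-java_options] -jar jarfile [jarfile_options]` pattern can be assembled programmatically. The helper below is a hypothetical sketch (its name and defaults are assumptions); it only builds the argument list and does not run anything.

```python
def build_java_command(jar, jar_options, max_heap="7g", tmpdir="/tmp"):
    """Assemble 'java [-java_options] -jar jarfile [jarfile_options]' as a list.

    The returned list can be passed to subprocess.run() without a shell.
    """
    return [
        "java",
        f"-Xmx{max_heap}",               # maximum heap size
        f"-Djava.io.tmpdir={tmpdir}",    # temporary directory used by the JVM
        "-jar", jar,
        *jar_options,
    ]


cmd = build_java_command(
    "muTect-1.1.4.jar",
    ["--analysis_type", "MuTect",
     "--reference_sequence", "human_g1k_v37_decoy.fasta"],
    max_heap="10g",
)
print(" ".join(cmd))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file paths.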
SomaticSniper v1.0.4
bam-somaticsniper -J -F vcf -n HCC1143_Normal -t HCC1143_Tumor -f ${gatk_b37} ${input_bam1} ${input_bam2} HCC1143_chr17_somaticsniper.vcf
MuTect v1.1.4
java -jar muTect-1.1.4.jar --analysis_type MuTect \
  --reference_sequence ${gatk_b37} \
  --input_file:normal ${input_bam2} --input_file:tumor ${input_bam1} \
  --out HCC1143_chr17_mutect.out --vcf HCC1143_chr17_mutect.vcf \
  --coverage_file HCC1143_chr17_mutect.cov.wig.txt -nt 7 \
  --normal_sample_name normal --tumor_sample_name tumor -L 17
VarScan2 v2.3.7
samtools mpileup -f ${gatk_b37} -Q 20 -q 20 -B ${input_bam2} ${input_bam1} > hcc1143_chr17.mpileup
java -jar VarScan.v2.3.7.jar somatic hcc1143_chr17.mpileup HCC1143_chr17.varscan --mpileup 1 --output-vcf 1
GATK UnifiedGenotyper
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -L 17 \
  -o hcc1143_chr17.gatk.normal.vcf -I ${input_bam2} \
  --genotype_likelihoods_model BOTH -minIndelFrac 0.2 \
  --min_base_quality_score 17 \
  --standard_min_confidence_threshold_for_calling 30.0 \
  --standard_min_confidence_threshold_for_emitting 30.0 \
  --baq CALCULATE_AS_NECESSARY --baqGapOpenPenalty 30.0 \
  --defaultBaseQualities -1 --validation_strictness STRICT \
  --interval_merging ALL -R ${gatk_b37} -nt 7
GATK SelectVariants
Select variants from a VCF source
discordance: select all calls missing from mycalls but present in Hiscalls
concordance: select all calls made by both mycalls and Hiscalls
selectType SNP/MNP: select only variants of the given types (SNPs, MNPs)
-L: restrict the output VCF to a set of intervals
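The concordance and discordance selections are, at their core, set intersection and set difference over variant records. The sketch below illustrates only that logic on VCF text, keyed by `(chrom, pos, ref, alt)`; it is not how GATK SelectVariants is implemented, and the function names and toy records are assumptions.

```python
def variant_keys(vcf_text):
    """Collect (chrom, pos, ref, alt) keys from tab-delimited VCF text."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            keys.add((chrom, int(pos), ref, allele))
    return keys

def concordance(variants, comparison):
    """Calls present in both sets."""
    return variants & comparison

def discordance(variants, comparison):
    """Calls present in `variants` but absent from `comparison`."""
    return variants - comparison


# Toy records, not real calls
consensus_vcf = "#CHROM\tPOS\tID\tREF\tALT\n17\t100\t.\tC\tT\n17\t200\t.\tG\tA\n17\t300\t.\tA\tG\n"
normal_vcf = "#CHROM\tPOS\tID\tREF\tALT\n17\t200\t.\tG\tA\n"
tumor_vcf = "#CHROM\tPOS\tID\tREF\tALT\n17\t100\t.\tC\tT\n17\t200\t.\tG\tA\n"

consensus = variant_keys(consensus_vcf)
not_germline = discordance(consensus, variant_keys(normal_vcf))
filtered = concordance(not_germline, variant_keys(tumor_vcf))
print(sorted(filtered))
```

In this toy run the consensus calls are first stripped of anything also called in the normal sample, then restricted to calls that are also seen in the tumor sample.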
Ensemble approach - results & rank score
each filtered count (total variants count/filtered count)
SomaticSniper: 2,381/624
MuTect: 132,239/4,318
VarScan2: 89,986/1,457
concordance call (204 variants)
total 460 variants
exclude GATK germline calls: 324 variants
include GATK cancer sample calls: 204 variants
validation count (total variants count/validated count)
SomaticSniper: 2,381/9(+4)
MuTect: 132,239/13(+8)
VarScan2: 89,986/6(+1)
filtered consensus: 204/5
rank score: 1
rank score: 2
rank score: 5
rank score: 3
rank score: 4
Summary
Identifying somatic changes from tumor and matched normal sequence requires accurate detection of somatic point mutations with low allele frequencies in impure and heterogeneous cancer samples
Mutations called by multiple tools are of higher confidence than mutations called by a single tool
Utilizing multiple callers can be a powerful way to construct a list of final calls for one’s research
Multiple tools can be run in parallel, giving a faster total run-time
References
Wang, Q., Jia, P., Li, F., Chen, H., Ji, H., Hucks, D., et al. (2013). Detecting somatic point mutations in cancer genome
sequencing data: a comparison of mutation callers. Genome Medicine, 5(10), 91. doi:10.1186/gm495
Goode, D. L., Hunter, S. M., Doyle, M. A., Ma, T., Rowley, S. M., Choong, D., et al. (2013). A simple consensus approach
improves somatic mutation prediction accuracy. Genome Medicine, 5(9), 90. doi:10.1186/gm494
Roberts, N. D., Kortschak, R. D., Parker, W. T., Schreiber, A. W., Branford, S., Scott, H. S., et al. (2013). A comparative analysis
of algorithms for somatic SNV detection in cancer. Bioinformatics, 29(18), 2223–2230. doi:10.1093/bioinformatics/btt375
Xu, H., DiCarlo, J., Satya, R. V., Peng, Q., & Wang, Y. (2014). Comparison of somatic mutation calling methods in amplicon
and whole exome sequence data. BMC Genomics, 15(1), 244. doi:10.1186/1471-2164-15-244
Kim, S. Y., Jacob, L., & Speed, T. P. (2014). Combining calls from multiple somatic mutation-callers. BMC Bioinformatics,
15(1), 154–10. doi:10.1186/1471-2105-15-154
Löwer, M., Renard, B. Y., de Graaf, J., Wagner, M., Paret, C., Kneip, C., et al. (2012). Confidence-based Somatic Mutation
Evaluation and Prioritization. PLoS Computational Biology, 8(9), e1002714. doi:10.1371/journal.pcbi.1002714
Ding, J., Bashashati, A., Roth, A., Oloumi, A., Tse, K., Zeng, T., et al. (2012). Feature-based classifiers for somatic mutation
detection in tumour-normal paired sequencing data. Bioinformatics, 28(2), 167–175. doi:10.1093/bioinformatics/btr629
Fischer, A., Vázquez-García, I., Illingworth, C. J. R., & Mustonen, V. (2014). High-definition reconstruction of clonal
composition in cancer. CellReports, 7(5), 1740–1752. doi:10.1016/j.celrep.2014.04.055
Frampton, G. M., Fichtenholtz, A., Otto, G. A., Wang, K., Downing, S. R., He, J., et al. (2013). Development and validation of a
clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nature Biotechnology, 31(11), 1023–1031.
doi:10.1038/nbt.2696
Cibulskis, K., Lawrence, M. S., Carter, S. L., Sivachenko, A., Jaffe, D., Sougnez, C., et al. (2013). Sensitive detection of somatic
point mutations in impure and heterogeneous cancer samples. Nature Biotechnology, 31(3), 213–219. doi:10.1038/nbt.2514
Roth, A., Ding, J., Morin, R., Crisan, A., Ha, G., Giuliany, R., et al. (2012). JointSNVMix: a probabilistic model for accurate
detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics, 28(7), 907–913.
doi:10.1093/bioinformatics/bts053
Koboldt, D. C., Zhang, Q., Larson, D. E., Shen, D., McLellan, M. D., Lin, L., et al. (2012). VarScan 2: Somatic mutation and copy
number alteration discovery in cancer by exome sequencing. Genome Research, 22(3), 568–576. doi:10.1101/gr.129684.111
Larson, D. E., Harris, C. C., Chen, K., Koboldt, D. C., Abbott, T. E., Dooling, D. J., et al. (2012). SomaticSniper: identification of
somatic point mutations in whole genome sequencing data. Bioinformatics, 28(3), 311–317. doi:10.1093/bioinformatics/btr665

More Related Content

What's hot

Genomics & Epigenomics
Genomics & EpigenomicsGenomics & Epigenomics
Genomics & Epigenomics
gumccomm
 
BiPday 2014 --Creanza Teresa
BiPday 2014 --Creanza TeresaBiPday 2014 --Creanza Teresa
BiPday 2014 --Creanza Teresa
eventi-ITBbari
 

What's hot (20)

2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload
 
SPIN Workshop Microbial Genomics @NIST
SPIN Workshop Microbial Genomics @NISTSPIN Workshop Microbial Genomics @NIST
SPIN Workshop Microbial Genomics @NIST
 
The Presence and Persistence of Resistant and Stem Cell-Like Tumor Cells as a...
The Presence and Persistence of Resistant and Stem Cell-Like Tumor Cells as a...The Presence and Persistence of Resistant and Stem Cell-Like Tumor Cells as a...
The Presence and Persistence of Resistant and Stem Cell-Like Tumor Cells as a...
 
CHAVEZ_SESSION23_ACADEMICPAPER.docx
CHAVEZ_SESSION23_ACADEMICPAPER.docxCHAVEZ_SESSION23_ACADEMICPAPER.docx
CHAVEZ_SESSION23_ACADEMICPAPER.docx
 
Genomics & Epigenomics
Genomics & EpigenomicsGenomics & Epigenomics
Genomics & Epigenomics
 
Pre-clinical drug prioritization via prognosis-guided genetic interaction net...
Pre-clinical drug prioritization via prognosis-guided genetic interaction net...Pre-clinical drug prioritization via prognosis-guided genetic interaction net...
Pre-clinical drug prioritization via prognosis-guided genetic interaction net...
 
2015 04 22_time_labs_shared
2015 04 22_time_labs_shared2015 04 22_time_labs_shared
2015 04 22_time_labs_shared
 
Aug2015 deanna church analytical validation
Aug2015 deanna church analytical validationAug2015 deanna church analytical validation
Aug2015 deanna church analytical validation
 
Systems biology for Medicine' is 'Experimental methods and the big datasets
Systems biology for Medicine' is 'Experimental methods and the big datasetsSystems biology for Medicine' is 'Experimental methods and the big datasets
Systems biology for Medicine' is 'Experimental methods and the big datasets
 
Ca ncer proteomics
Ca ncer proteomicsCa ncer proteomics
Ca ncer proteomics
 
Sensors 12-16614 (1)
Sensors 12-16614 (1)Sensors 12-16614 (1)
Sensors 12-16614 (1)
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Paper presentation @DILS'07
Paper presentation @DILS'07Paper presentation @DILS'07
Paper presentation @DILS'07
 
10.1.1.80.2149
10.1.1.80.214910.1.1.80.2149
10.1.1.80.2149
 
2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ... 2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ...
 
Oncology Drug Pathway Analyzer Linkedin V1.0
Oncology Drug Pathway Analyzer Linkedin V1.0Oncology Drug Pathway Analyzer Linkedin V1.0
Oncology Drug Pathway Analyzer Linkedin V1.0
 
BiPday 2014 --Creanza Teresa
BiPday 2014 --Creanza TeresaBiPday 2014 --Creanza Teresa
BiPday 2014 --Creanza Teresa
 
Msc Thesis - Presentation
Msc Thesis - PresentationMsc Thesis - Presentation
Msc Thesis - Presentation
 
Clinical proteomics in diseases lecture, 2014
Clinical proteomics in diseases lecture, 2014Clinical proteomics in diseases lecture, 2014
Clinical proteomics in diseases lecture, 2014
 
Cncp 2010
Cncp 2010Cncp 2010
Cncp 2010
 

Viewers also liked

Perspectives of identifying Korean genetic variations
Perspectives of identifying Korean genetic variationsPerspectives of identifying Korean genetic variations
Perspectives of identifying Korean genetic variations
Hong ChangBum
 
Aug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencingAug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencing
GenomeInABottle
 
Kogo 2013-ngs galaxy
Kogo 2013-ngs galaxyKogo 2013-ngs galaxy
Kogo 2013-ngs galaxy
Hyungyong Kim
 
Explanation slides Somatic Mutations cancer
Explanation slides Somatic Mutations cancerExplanation slides Somatic Mutations cancer
Explanation slides Somatic Mutations cancer
meducationdotnet
 
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics WorkshopLopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Nuria Lopez-Bigas
 
worldwide population
worldwide populationworldwide population
worldwide population
Hong ChangBum
 
DESeq Paper Journal club
DESeq Paper Journal club DESeq Paper Journal club
DESeq Paper Journal club
avrilcoghlan
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vst
Qiang Kou
 

Viewers also liked (20)

Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachDetecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
 
Workshop 2011
Workshop 2011Workshop 2011
Workshop 2011
 
Genomics and BigData - case study
Genomics and BigData - case studyGenomics and BigData - case study
Genomics and BigData - case study
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo Protocol
 
Perspectives of identifying Korean genetic variations
Perspectives of identifying Korean genetic variationsPerspectives of identifying Korean genetic variations
Perspectives of identifying Korean genetic variations
 
Aug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencingAug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencing
 
Kogo 2013-ngs galaxy
Kogo 2013-ngs galaxyKogo 2013-ngs galaxy
Kogo 2013-ngs galaxy
 
Explanation slides Somatic Mutations cancer
Explanation slides Somatic Mutations cancerExplanation slides Somatic Mutations cancer
Explanation slides Somatic Mutations cancer
 
Normal/Tumor somatic mutations report tool
Normal/Tumor somatic mutations report toolNormal/Tumor somatic mutations report tool
Normal/Tumor somatic mutations report tool
 
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics WorkshopLopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
 
Incidental findings throughout multigene panel testing in cancer genetics
Incidental findings throughout multigene panel testing in cancer geneticsIncidental findings throughout multigene panel testing in cancer genetics
Incidental findings throughout multigene panel testing in cancer genetics
 
worldwide population
worldwide populationworldwide population
worldwide population
 
Part 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expressionPart 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expression
 
DESeq Paper Journal club
DESeq Paper Journal club DESeq Paper Journal club
DESeq Paper Journal club
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vst
 
Computational genomics approaches to precision medicine
Computational genomics approaches to precision medicineComputational genomics approaches to precision medicine
Computational genomics approaches to precision medicine
 
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
 
영어로 논문쓰기 - 읽기 쓰기 통합 전략을 중심으로
영어로 논문쓰기 - 읽기 쓰기 통합 전략을 중심으로영어로 논문쓰기 - 읽기 쓰기 통합 전략을 중심으로
영어로 논문쓰기 - 읽기 쓰기 통합 전략을 중심으로
 
R 기본-데이타형 소개
R 기본-데이타형 소개R 기본-데이타형 소개
R 기본-데이타형 소개
 
R 프로그래밍-향상된 데이타 조작
R 프로그래밍-향상된 데이타 조작R 프로그래밍-향상된 데이타 조작
R 프로그래밍-향상된 데이타 조작
 

Similar to Detecting Somatic Mutation - Ensemble Approach

coad_machine_learning
coad_machine_learningcoad_machine_learning
coad_machine_learning
Ford Sleeman
 
The quest for high confidence mutations in plasma: searching for a needle in ...
The quest for high confidence mutations in plasma: searching for a needle in ...The quest for high confidence mutations in plasma: searching for a needle in ...
The quest for high confidence mutations in plasma: searching for a needle in ...
Integrated DNA Technologies
 
Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6
stanislas547
 
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...
Kiogyf
 
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practiceAug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practice
GenomeInABottle
 

Similar to Detecting Somatic Mutation - Ensemble Approach (20)

Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...Developing a framework for for detection of low frequency somatic genetic alt...
Developing a framework for for detection of low frequency somatic genetic alt...
 
coad_machine_learning
coad_machine_learningcoad_machine_learning
coad_machine_learning
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene finding
 
Kshivets O. Cancer, Computer Sciences and Alive Supersystems
Kshivets O. Cancer, Computer Sciences and Alive SupersystemsKshivets O. Cancer, Computer Sciences and Alive Supersystems
Kshivets O. Cancer, Computer Sciences and Alive Supersystems
 
Personalized Medicine Through Tumor Sequencing
Personalized Medicine Through Tumor SequencingPersonalized Medicine Through Tumor Sequencing
Personalized Medicine Through Tumor Sequencing
 
Personalized Medicine through Tumor Sequencing
Personalized Medicine through Tumor SequencingPersonalized Medicine through Tumor Sequencing
Personalized Medicine through Tumor Sequencing
 
Comparison of breast cancer classification models on Wisconsin dataset
Comparison of breast cancer classification models on Wisconsin  datasetComparison of breast cancer classification models on Wisconsin  dataset
Comparison of breast cancer classification models on Wisconsin dataset
 
Maldi tof-ms analysis in identification of prostate cancer
Maldi tof-ms analysis in identification of prostate cancerMaldi tof-ms analysis in identification of prostate cancer
Maldi tof-ms analysis in identification of prostate cancer
 
Translating next generation sequencing to practice
Translating next generation sequencing to practiceTranslating next generation sequencing to practice
Translating next generation sequencing to practice
 
Genomics 2015 Keynote - Utilizing cancer sequencing in the clinic
Genomics 2015 Keynote - Utilizing cancer sequencing in the clinicGenomics 2015 Keynote - Utilizing cancer sequencing in the clinic
Genomics 2015 Keynote - Utilizing cancer sequencing in the clinic
 
The quest for high confidence mutations in plasma: searching for a needle in ...
The quest for high confidence mutations in plasma: searching for a needle in ...The quest for high confidence mutations in plasma: searching for a needle in ...
The quest for high confidence mutations in plasma: searching for a needle in ...
 
Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6
 
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...
 
Interrogating differences in expression of targeted gene sets to predict brea...
Interrogating differences in expression of targeted gene sets to predict brea...Interrogating differences in expression of targeted gene sets to predict brea...
Interrogating differences in expression of targeted gene sets to predict brea...
 
GLIIFCA 22 final (1)
GLIIFCA 22 final (1)GLIIFCA 22 final (1)
GLIIFCA 22 final (1)
 
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practiceAug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practice
 
FFPE Applications Solutions brochure
FFPE Applications Solutions brochureFFPE Applications Solutions brochure
FFPE Applications Solutions brochure
 
Translation of microarray data into clinically relevant cancer diagnostic tes...
Translation of microarray data into clinically relevant cancer diagnostic tes...Translation of microarray data into clinically relevant cancer diagnostic tes...
Translation of microarray data into clinically relevant cancer diagnostic tes...
 
Next generation sequencing in cancer treatment
Next generation sequencing in cancer treatment  Next generation sequencing in cancer treatment
Next generation sequencing in cancer treatment
 

More from Hong ChangBum

BioSMACK - Linux Live CD for GWAS
BioSMACK - Linux Live CD for GWASBioSMACK - Linux Live CD for GWAS
BioSMACK - Linux Live CD for GWAS
Hong ChangBum
 
Next-generation genomics: an integrative approach
Next-generation genomics: an integrative approachNext-generation genomics: an integrative approach
Next-generation genomics: an integrative approach
Hong ChangBum
 
RSS & Bioinformatics
RSS & BioinformaticsRSS & Bioinformatics
RSS & Bioinformatics
Hong ChangBum
 
Next Generation bio Research Infra
Next Generation bio Research InfraNext Generation bio Research Infra
Next Generation bio Research Infra
Hong ChangBum
 

More from Hong ChangBum (20)

Demo chapter3
Demo chapter3Demo chapter3
Demo chapter3
 
통계유전학워크샵
통계유전학워크샵통계유전학워크샵
통계유전학워크샵
 
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
Genome Wide SNP Analysis for Inferring the Population Structure and Genetic H...
 
BioSMACK - Linux Live CD for GWAS
BioSMACK - Linux Live CD for GWASBioSMACK - Linux Live CD for GWAS
BioSMACK - Linux Live CD for GWAS
 
Next-generation genomics: an integrative approach
Next-generation genomics: an integrative approachNext-generation genomics: an integrative approach
Next-generation genomics: an integrative approach
 
How to genome
How to genomeHow to genome
How to genome
 
RSS & Bioinformatics
RSS & BioinformaticsRSS & Bioinformatics
RSS & Bioinformatics
 
Genome Browser based on Google Maps API
Genome Browser based on Google Maps APIGenome Browser based on Google Maps API
Genome Browser based on Google Maps API
 
Korean Database of Genomic Variants
Korean Database of Genomic VariantsKorean Database of Genomic Variants
Korean Database of Genomic Variants
 
Dt Ccompanieslist
Dt CcompanieslistDt Ccompanieslist
Dt Ccompanieslist
 
DTC Companies List
DTC Companies ListDTC Companies List
DTC Companies List
 
My Project
My ProjectMy Project
My Project
 
Genome Browser
Genome BrowserGenome Browser
Genome Browser
 
GenomeBrowser
GenomeBrowserGenomeBrowser
GenomeBrowser
 
Desire
DesireDesire
Desire
 
Next Generation bio Research Infra
Next Generation bio Research InfraNext Generation bio Research Infra
Next Generation bio Research Infra
 
Cluster Drm
Cluster DrmCluster Drm
Cluster Drm
 
Cluster Drm
Cluster DrmCluster Drm
Cluster Drm
 
Platform Day
Platform DayPlatform Day
Platform Day
 
Linux Cluster and Distributed Resource Manager
Linux Cluster and Distributed Resource ManagerLinux Cluster and Distributed Resource Manager
Linux Cluster and Distributed Resource Manager
 


Detecting Somatic Mutation - Ensemble Approach

  • 1. Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach 김광중, 홍창범 KT GenomeCloud 2015 Korea Genome Organization Winter Symposium, Workshop on Bioinformatics Analysis Using NGS Data, 2015.2.4~2.5
  • 2. Overview Overview Challenge Literature Search Mutation callers Comparison of mutation callers Simple consensus approach Integrated/Ensemble Approach Summary
  • 3. Challenge Motivation Sequencing reads from tumor samples are diluted by normal cells lower signal-to-noise ratio: allele frequency 5% SNVs cannot be called with high significance The genomes of primary tumors are genetically heterogeneous with frequent rearrangements copy number alterations subclones Need highly sensitive and specific mutation-calling methods
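To make the dilution effect on slide 3 concrete, here is a minimal sketch of the expected mutant allele frequency as a function of tumor purity. It assumes a clonal heterozygous mutation at a diploid locus with no copy number change unless told otherwise. The function name `expected_vaf` and its parameters are illustrative and not part of the deck.

```python
def expected_vaf(purity, tumor_copies=2, normal_copies=2,
                 mutant_copies=1, clonal_fraction=1.0):
    """Expected fraction of reads that carry the mutant allele.

    purity          -- fraction of cells in the sample that are tumor cells
    tumor_copies    -- total copies of the locus in a tumor cell (assumed 2)
    normal_copies   -- total copies of the locus in a normal cell (assumed 2)
    mutant_copies   -- mutated copies per mutated tumor cell (assumed 1)
    clonal_fraction -- fraction of tumor cells carrying the mutation
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mutant = purity * clonal_fraction * mutant_copies
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant / total


if __name__ == "__main__":
    for purity in (0.9, 0.5, 0.1):
        print(f"purity={purity:.0%}  expected VAF={expected_vaf(purity):.1%}")
```

Under these assumptions a sample with 10% tumor content gives an expected allele frequency of 5%, which matches the low-frequency regime named on the slide. Subclonal mutations (`clonal_fraction` below 1) push the value lower still.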
  • 4. Terminologies Terminologies Challenge Literature Search Mutation Callers Comparison of mutation callers Simple consensus approach Integrated/Ensemble Approach Summary
  • 5. Comparison of mutation callers Literature Search
  • 6. Comparison of mutation callers-Data Sets Literature Search whole genome sequencing (WGS) melanoma sample and matched blood 90% tumor content paired-end reads, ~50x coverage whole exome sequencing (WES) tumor samples 18 lung tumor-normal pairs 70~80% tumor content paired-end reads, 63x cell lines 7 lung cancer cell lines paired-end reads, 233x
  • 7. Comparison of mutation callers-Experiments Literature Search sSNV detection tools Validation PCR and direct sequencing of genomic DNA on only deleted, functionally important sSNVs Simulation 10 tumor-normal pairs (WES), 100x coverage
  • 8. Comparison of mutation callers- Results (1) Literature Search
  • 9. Comparison of mutation callers- Results (2) Literature Search Synthetic data: 10 tumor-normal pairs (WES), 100x coverage
  • 10. Comparison of mutation callers- Summary of Findings Literature Search VarScan2 performed the best; MuTect follows VarScan2: better at higher allele frequencies MuTect: more sensitive with low allele frequencies strand-bias filtering is useful eliminate many false positives common problem with Illumina sequencing data still a challenge: how to discern sSNVs and normal alternate alleles? to call ultra-rare sSNVs: targeted deep sequencing recommended over WES or WGS
  • 12. Simple consensus approach-Data Sets Literature Search whole exome sequencing (WES) 27 ovarian tumors and their matched germline samples HiSeq 2000 sequencer, using 100 bp paired-end reads mean coverage ranged from 102~225x in tumor and 119~118x germlines validation Sanger sequencing somatic SNV detection programs JointSNVMix2, MuTect, and SomaticSniper implement sophisticated detection algorithms used in major tumor sequencing studies
  • 13. Simple consensus approach-identification somatic SNVs Literature Search MuTect (v 1.0.27783) only the default parameter set was applied not labeled as ‘REJECT’ JointSNVMix2 (v 0.7.5) default prior genotype probabilities used for training set joint probability 0.9999 or greater SomaticSniper (v 1.0.0) using joint genotyping mode (-J option) default prior probability of a somatic mutation (0.01) mapping quality of 0 were filtered predictions with a ‘somatic score’ of 40 or greater SAMtools mpileup mapping qualities directionality depth of reads
  • 14. Simple consensus approach-Prediction Results Literature Search total read depth of 8 or greater in both tumor and normal (T/N) mutant allele frequency of ≥20% in tumor and ≥5% germline mutant allele supported by reads mapping in both forward and reverse orientations variant call only in tumor (with the exception of the BRAF V600, KRAS G12/13 hotspots) combined total of 9,226 somatic SNV predictions median of 321 predictions per sample (range 147~695) SomaticSniper and JointSNVMix2: most mutations per sample (median 171, 173) MuTect was more conservative (median 115)
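The read-level criteria on slide 14 can be sketched as a small post-filter. This is only an illustration: the `SiteEvidence` record and the function name are hypothetical, and the counts would in practice come from something like a SAMtools mpileup summary. Only the depth, tumor allele-frequency, and strand-orientation criteria are implemented; the germline allele-frequency criterion is left out because its direction is not clear from the slide text.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Read counts at one candidate site (hypothetical record layout)."""
    tumor_depth: int
    normal_depth: int
    tumor_alt_forward: int   # mutant-allele reads on the forward strand
    tumor_alt_reverse: int   # mutant-allele reads on the reverse strand


def passes_post_filter(site, min_depth=8, min_tumor_vaf=0.20):
    """Apply the depth, tumor allele-frequency and strand criteria."""
    # Total read depth must reach the minimum in both tumor and normal.
    if site.tumor_depth < min_depth or site.normal_depth < min_depth:
        return False
    if site.tumor_depth == 0:
        return False  # guards the division when min_depth is set to 0
    # Mutant allele frequency in the tumor must reach the threshold.
    alt_reads = site.tumor_alt_forward + site.tumor_alt_reverse
    if alt_reads / site.tumor_depth < min_tumor_vaf:
        return False
    # The mutant allele must be seen in both read orientations.
    return site.tumor_alt_forward > 0 and site.tumor_alt_reverse > 0


if __name__ == "__main__":
    print(passes_post_filter(SiteEvidence(40, 30, 6, 5)))    # both strands, VAF 27.5%
    print(passes_post_filter(SiteEvidence(40, 30, 11, 0)))   # forward strand only
```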
  • 15. Simple consensus approach-Properties of Predictions Literature Search non-reference allele frequency in germline S,J substantial number of reads with non-ref alleles significant number of germline variants into the call set M is much more stringent on evidence for non-ref alleles non-reference allele frequency in tumor one or two programs have a lower proportion of non-ref reads not having sufficient allelic ratios to be predicted as somatic but enough support to rise above the thresholds of at least one program
  • 16. Simple consensus approach-Validation Results Literature Search
  • 17. Simple consensus approach-Filtering Results Literature Search taking consensus between GATK Unified Genotyper mate-pair rescue read filtering minimum read depth of 10 in both the tumor and germline 2 true positive
  • 18. Simple consensus approach-Summary of Findings Literature Search Powerful method for increasing the validation rate while maintaining maximum sensitivity Similar effects are likely to influence other bioinformatics classification problems Prove effective for a variety of genomics and bioinformatics analyses
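One simple way to express a consensus between callers is to keep every variant reported by at least a chosen number of programs. The sketch below assumes each caller's output has already been reduced to a set of `(chrom, pos, ref, alt)` tuples. The function name, the `min_callers` parameter, and the toy variants are illustrative and not taken from the cited study.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers=2):
    """Return variants reported by at least `min_callers` callers.

    calls_by_caller maps a caller name to an iterable of variant keys,
    here (chrom, pos, ref, alt) tuples.
    """
    votes = Counter()
    for calls in calls_by_caller.values():
        votes.update(set(calls))  # each caller votes once per variant
    return {variant for variant, n in votes.items() if n >= min_callers}


if __name__ == "__main__":
    # Toy data for illustration only.
    calls = {
        "MuTect": {("17", 100, "C", "T"), ("17", 200, "G", "A")},
        "SomaticSniper": {("17", 100, "C", "T"), ("17", 300, "A", "G")},
        "JointSNVMix2": {("17", 100, "C", "T"), ("17", 200, "G", "A")},
    }
    print(sorted(consensus_calls(calls, min_callers=2)))
    print(sorted(consensus_calls(calls, min_callers=3)))
```

Raising `min_callers` trades sensitivity for a higher expected validation rate, which is the trade-off the summary slide refers to.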
  • 19. Integrated/Ensemble Approach Integrated/Ensemble Approach Challenge Literature Search Mutation callers Comparison of mutation callers Simple consensus approach Integrated/Ensemble Approach Summary
  • 20. Integrated/Ensemble Approach Integrated/Ensemble Approach Ensemble Using multiple learning algorithms to obtain better predictive performance (Three somatic SNV callers: SomaticSniper, MuTect, and VarScan2) Integrated For better performance, we will use additional filtering GATK Unified Genotyper: filtering SNVs predicted in the tumor but not the germline Scoring system: help us to identify strong and relevant mutation candidates
  • 21. Integrated/Ensemble Approach Integrated/Ensemble Approach Subject tumor.bam normal.bam MuTect somatic.vcf VarScan2 somatic.vcf SomaticSniper somatic.vcf GATK tumor.vcf GATK normal.vcf Consensus (gatk) somatic.vcf filtered (GATK) somatic.vcf Cosmic, CCLE validate somatic list validated(GATK) somatic.vcf SAMtools mpileup
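The GATK-based filtering step in this workflow can be read as two set operations on the consensus calls: drop sites that GATK also calls in the normal sample, then keep only sites that GATK calls in the tumor sample. The sketch below assumes each VCF has already been reduced to a set of `(chrom, pos)` sites. The function name and the toy sites are hypothetical.

```python
def filter_consensus(consensus_sites, gatk_normal_sites, gatk_tumor_sites):
    """Keep consensus sites called by GATK in the tumor but not in the normal.

    All three arguments are sets of (chrom, pos) tuples.
    """
    # Sites that GATK also calls in the normal are treated as germline.
    not_germline = consensus_sites - gatk_normal_sites
    # Of the remainder, keep sites with GATK support in the tumor sample.
    return not_germline & gatk_tumor_sites


if __name__ == "__main__":
    # Toy sites for illustration only.
    consensus = {("17", 100), ("17", 200), ("17", 300), ("17", 400)}
    gatk_normal = {("17", 200), ("17", 999)}
    gatk_tumor = {("17", 100), ("17", 200), ("17", 300)}
    print(sorted(filter_consensus(consensus, gatk_normal, gatk_tumor)))
```

Matching on site alone is a simplification. Matching on the alternate allele as well would be stricter.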
  • 22. Integrated/Ensemble Approach: Data Sets (CGHub) Integrated/Ensemble Approach https://cghub.ucsc.edu/datasets/benchmark_download.html CGHub Cancer Genomics Hub a resource of the National Cancer Institute Cancer Genome Atlas (TCGA) consortium and related projects
  • 23. Integrated/Ensemble Approach: TCGA Benchmark 4 Integrated/Ensemble Approach Three parts to mutation calling exercise: derived from grade 3 breast ductal carcinomas (breast cancer) HCC1143 (50x) vs. HCC1143 BL (60x) HCC1954 (58x) vs. HCC1954 BL (71x) Simulate normal contamination and subclone expansion for both: Total: 28 .bam files, ~4.3 TB
  • 24. Integrated/Ensemble Approach: download using GeneTorrent Integrated/Ensemble Approach GeneTorrent client software for downloading sequence data from CGHub’s repository two main programs: gtdownload and cgquery get public key public key: https://cghub.ucsc.edu/software/downloads/cghub_public.key TCGA key: approval to access the restricted data from the ICGC-DACO download uuid (xml file) CGHub CGHub CGHub
  • 25. Validation Data Sets Integrated/Ensemble Approach COSMIC Catalogue of somatic mutations in cancer Cell Lines Project Wellcome Trust Sanger Institute http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/ CCLE Cancer Cell Line Encyclopedia Broad Institute and Novartis Institute for Biomedical Research http://www.broadinstitute.org/ccle/home
  • 26. Validation Data Sets (18) Integrated/Ensemble Approach 17:5445207-5445207 17:7577538-7577538 17:10411982-10411982 17:43364293-43364293 17:47892946-47892946 17:67538038-67538038 17:67012449-67012449 17:48538716-48538716 17:27936181-27936181 17:79650824-79650824 17:79638782-79638782 17:76528554-76528554 17:6683197-6683197 17:73235515-73235515 17:39505636-39505636 17:33310040-33310040 17:56083818-56083818 17:37374298-37374298
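The validation loci are written as `chrom:start-end` regions, so checking a call set against them amounts to parsing each region and testing whether a call falls inside it. A minimal sketch follows, using two of the loci listed on the slide. The helper names and the toy call list are hypothetical.

```python
def parse_locus(text):
    """Parse a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = text.strip().split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def validated_calls(calls, loci):
    """Return the calls, given as (chrom, pos), that fall inside any locus."""
    regions = [parse_locus(locus) for locus in loci]
    return [
        (chrom, pos)
        for chrom, pos in calls
        if any(chrom == c and start <= pos <= end for c, start, end in regions)
    ]


if __name__ == "__main__":
    loci = ["17:5445207-5445207", "17:7577538-7577538"]  # from the slide
    calls = [("17", 7577538), ("17", 1234567)]           # toy call set
    print(validated_calls(calls, loci))
```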
  • 27. Java Application: version Integrated/Ensemble Approach Java version Java 6 and Java 7 are both used in many systems Select Java version use “update-alternatives --config java” MuTect runs on Java 6 / GATK runs on Java 7 :-(
  • 28. Java Application: running options Integrated/Ensemble Approach -Xmx7g sets the maximum heap size of the Java program; the default memory size set for running a Java program is 64M; “java.lang.OutOfMemoryError” occurs when the heap size is insufficient -Djava.io.tmpdir=/tmp sets a system property value; sets the temporary directory that Java will use java [-java_options] -jar jarfile [jarfile_options] java -Xmx10g -Djava.io.tmpdir=/tmp -jar muTect-1.1.4.jar --analysis_type MuTect --reference_sequence human_g1k_v37_decoy.fasta
  • 29. SomaticSniper v1.0.4 Integrated/Ensemble Approach bam-somaticsniper -J -F vcf -n HCC1143_Normal -t HCC1143_Tumor -f ${gatk_b37} ${input_bam1} ${input_bam2} HCC1143_chr17_somaticsniper.vcf
  • 30. MuTect v1.1.4 Integrated/Ensemble Approach java -jar muTect-1.1.4.jar --analysis_type MuTect --reference_sequence ${gatk_b37} --input_file:normal ${input_bam2} --input_file:tumor ${input_bam1} --out HCC1143_chr17_mutect.out --vcf HCC1143_chr17_mutect.vcf --coverage_file HCC1143_chr17_mutect.cov.wig.txt -nt 7 --normal_sample_name normal --tumor_sample_name tumor -L 17
  • 31. VarScan2 v2.3.7 Integrated/Ensemble Approach samtools mpileup -f ${gatk_b37} -Q 20 -q 20 -B ${input_bam2} ${input_bam1} > hcc1143_chr17.mpileup java -jar VarScan.v2.3.7.jar somatic hcc1143_chr17.mpileup HCC1143_chr17.varscan --mpileup 1 --output-vcf 1
  • 32. GATK UnifiedGenotyper Integrated/Ensemble Approach java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -L 17 -o hcc1143_chr17.gatk.normal.vcf -I ${input_bam2} --genotype_likelihoods_model BOTH -minIndelFrac 0.2 --min_base_quality_score 17 --standard_min_confidence_threshold_for_calling 30.0 --standard_min_confidence_threshold_for_emitting 30.0 --baq CALCULATE_AS_NECESSARY --baqGapOpenPenalty 30.0 --defaultBaseQualities -1 --validation_strictness STRICT --interval_merging ALL -R ${gatk_b37} -nt 7
  • 33. GATK SelectVariants Integrated/Ensemble Approach Select variants from a VCF source discordance: select all calls missed in mycalls, but present in Hiscalls concordance: select all calls made by both mycalls and Hiscalls selectType MNP/SNP: select only multi-allelic SNPs and MNPs select restrict the output vcf to a set of intervals
  • 34. Ensemble approach - results & rank score Integrated/Ensemble Approach each filtered count (total variants count/filtered count) SomaticSniper: 2,381/624 MuTect: 132,239/4,318 VarScan2: 89,986/1,457 concordance call (204 variants) total 460 variants exclude gatk germlines: 324 variants include gatk cancer sample: 204 variants validation count (total variants count/validated count) SomaticSniper validate: 2,381/9(+4) MuTect: 132,239/13(+8) VarScan2: 89,986/6(+1) filtered consensus: 204/5 rank score: 1 rank score: 2 rank score: 5 rank score: 3 rank score: 4
  • 35. Summary Summary Challenge Literature Search Mutation callers Comparison of mutation callers Simple consensus approach Integrated/Ensemble Approach Summary
  • 36. Summary Summary Identifying somatic changes from tumor and matched normal sequence requires accurate detection of somatic point mutations with low allele frequencies in impure and heterogeneous cancer samples Mutations called by multiple tools are of higher-confidence than mutations called by single tools Utilizing multiple callers can be a powerful way to construct a list of final calls for one’s research Capable of running multiple tools in parallel, providing faster total run-time
  • 37. References References Wang, Q., Jia, P., Li, F., Chen, H., Ji, H., Hucks, D., et al. (2013). Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers. Genome Medicine, 5(10), 91. doi:10.1186/gm495 Goode, D. L., Hunter, S. M., Doyle, M. A., Ma, T., Rowley, S. M., Choong, D., et al. (2013). A simple consensus approach improves somatic mutation prediction accuracy. Genome Medicine, 5(9), 90. doi:10.1186/gm494 Roberts, N. D., Kortschak, R. D., Parker, W. T., Schreiber, A. W., Branford, S., Scott, H. S., et al. (2013). A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics, 29(18), 2223–2230. doi:10.1093/bioinformatics/btt375 Xu, H., DiCarlo, J., Satya, R. V., Peng, Q., & Wang, Y. (2014). Comparison of somatic mutation calling methods in amplicon and whole exome sequence data. BMC Genomics, 15(1), 244. doi:10.1186/1471-2164-15-244 Kim, S. Y., Jacob, L., & Speed, T. P. (2014). Combining calls from multiple somatic mutation-callers. BMC Bioinformatics, 15(1), 154–10. doi:10.1186/1471-2105-15-154 Löwer, M., Renard, B. Y., de Graaf, J., Wagner, M., Paret, C., Kneip, C., et al. (2012). Confidence-based Somatic Mutation Evaluation and Prioritization. PLoS Computational Biology, 8(9), e1002714. doi:10.1371/journal.pcbi.1002714 Ding, J., Bashashati, A., Roth, A., Oloumi, A., Tse, K., Zeng, T., et al. (2012). Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data. Bioinformatics, 28(2), 167–175. doi:10.1093/bioinformatics/btr629 Fischer, A., Vázquez-García, I., Illingworth, C. J. R., & Mustonen, V. (2014). High-definition reconstruction of clonal composition in cancer. Cell Reports, 7(5), 1740–1752. doi:10.1016/j.celrep.2014.04.055 Frampton, G. M., Fichtenholtz, A., Otto, G. A., Wang, K., Downing, S. R., He, J., et al. (2013). Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nature Biotechnology, 31(11), 1023–1031. doi:10.1038/nbt.2696 Cibulskis, K., Lawrence, M. S., Carter, S. L., Sivachenko, A., Jaffe, D., Sougnez, C., et al. (2013). Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnology, 31(3), 213–219. doi:10.1038/nbt.2514 Roth, A., Ding, J., Morin, R., Crisan, A., Ha, G., Giuliany, R., et al. (2012). JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics, 28(7), 907–913. doi:10.1093/bioinformatics/bts053 Koboldt, D. C., Zhang, Q., Larson, D. E., Shen, D., McLellan, M. D., Lin, L., et al. (2012). VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research, 22(3), 568–576. doi:10.1101/gr.129684.111 Larson, D. E., Harris, C. C., Chen, K., Koboldt, D. C., Abbott, T. E., Dooling, D. J., et al. (2012). SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics, 28(3), 311–317. doi:10.1093/bioinformatics/btr665