20170209 ngs for_cancer_genomics_101

Next Generation Sequencing for Cancer Genomics 101
2017/02/10
Ino de Bruijn
Jorge Reis-Filho Lab

Content
• DNA sequencing
– Targets
• DNA, genes, exons, introns
• RNA
• How do we analyze NGS data?
– Genetic changes
– Mutational Signatures
– RNASeq

Sequencing DNA in the modern era
• DNA sequencing converts physical DNA into a digital sequence
• In the 1980s
– Sanger sequencing
– Compare short regions of DNA
• Possible by hand
• Since the mid-2000s
– Parallelization of sequencing reactions
– Generates billions of DNA reads
• DNA read: short stretch of DNA
– Compare whole genomes
• Impossible by hand
CACGTCTAAGGGCGAAGAGCTGACTGCTTTTTT

Targeting parts of the genome
• The human genome has ~3 billion bases
• To be cost-effective:
– Focus on the part of the genome related to your subject

What is a gene?
• The human genome has ~3 billion bases
– 23 pairs of chromosomes
– Certain stretches of DNA (genes, ~20,000 in total) code for proteins, which perform a wide variety of functions in your body

Gene to protein
Exons comprise only ~1.5% of the genome

Different targets
• Whole Genome Sequencing (WGS)
– E.g. Rearrangements outside genes
• Whole Exome Sequencing (WES)
– E.g. Gene Discovery (Rare/unknown tumors)
• Custom target
– MSK-IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets)
• 410 genes related to cancer
• >15K patients profiled at MSKCC
– https://cbioportal.mskcc.org/study?id=mskimpact

NGS Principles - Coverage
Sequence the same part many times:
Coverage is the number of times a base is covered by a read
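
The definition above can be sketched in a few lines. This is a minimal illustration, assuming the reads are already aligned and represented as hypothetical (start, end) intervals (0-based, end-exclusive) on a single reference region; real pipelines read these from alignment files.

```python
def per_base_coverage(reads, region_length):
    """Count how many reads cover each base of a region.

    reads: iterable of (start, end) intervals, 0-based, end-exclusive
           (a simplified, hypothetical stand-in for real alignments).
    """
    coverage = [0] * region_length
    for start, end in reads:
        # Clip each read to the region before counting.
        for pos in range(max(start, 0), min(end, region_length)):
            coverage[pos] += 1
    return coverage


# Toy example: three overlapping reads on a 10-base region.
reads = [(0, 5), (3, 8), (4, 10)]
coverage = per_base_coverage(reads, 10)
mean_coverage = sum(coverage) / len(coverage)
print(coverage, mean_coverage)
```

Mean coverage summarizes the whole region, but the per-base values matter too: a region can have a high mean while individual bases are barely covered.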

NGS Principles - Coverage
• Not all reads retrieved are correct
– Many errors when sequencing
• DNA library prep protocol
• Sequencing error rate
• Sequencing groups of cells
– Certain genetic changes are present in only a small fraction of cells
• Need to sequence the same part multiple times to gain confidence
– How many times depends on the analysis and on what you expect to find
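
Why the required depth depends on what you expect to find can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not a real caller: each read independently carries the variant with probability equal to its allelic fraction, sequencing errors are ignored, and the hypothetical `min_alt_reads` threshold stands in for whatever evidence a caller requires.

```python
from math import comb


def detection_probability(depth, allelic_fraction, min_alt_reads):
    """P(at least min_alt_reads variant-carrying reads) under a binomial model."""
    p_fewer = sum(
        comb(depth, k)
        * allelic_fraction**k
        * (1 - allelic_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_fewer


# A variant present in 5% of reads, requiring at least 3 supporting reads
# (both numbers are illustrative assumptions).
for depth in (30, 100, 500):
    print(depth, round(detection_probability(depth, 0.05, 3), 3))
```

Under this model, a change present in a small fraction of cells is easy to miss at low depth and becomes very likely to be seen as depth increases.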

How to analyze the NGS data?
• Some might guess this is where the
bioinformatician comes in…
Too late - the bioinformatician should have been helping you design the experiment

How to analyze NGS data?
• Tons of different options
– What is the research question?
• Common analysis: identify genetic changes
in the tumor

Identify the genetic changes
Meyerson et al. Nat Rev Genet 2010

Identify the genetic changes
• Compare against the reference human genome
– Gives both germline and somatic mutations
• How to differentiate?
– Databases of common germline variants miss many
• Somatic mutations
– Take DNA from normal cells and tumor cells
– Filter out mutations that are also present in the normal
Identify mutations
• Automated pipelines do this
– Example: mutation calling tools take into account
• Number of reads carrying the mutation versus all reads (Mutation Allelic Fraction (MAF))
• Coverage at that position
• Read quality score
• If calling somatic mutations
– Whether the mutation is also present in the normal
• Every parameter makes assumptions about the data, so communicate the goal of the project
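
A toy filter combining the criteria listed above might look like the following. It is a sketch only: the function names and every threshold are hypothetical defaults chosen for illustration, and read quality scores are left out for brevity.

```python
def allelic_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that carry the mutation."""
    return alt_reads / total_reads if total_reads else 0.0


def passes_filters(tumor_alt, tumor_depth, normal_alt, normal_depth,
                   min_depth=20, min_fraction=0.05, max_normal_fraction=0.01):
    """Apply illustrative somatic filters; all thresholds are assumptions."""
    # Require enough coverage in both samples to trust the counts.
    if tumor_depth < min_depth or normal_depth < min_depth:
        return False
    # Require enough supporting evidence in the tumor.
    if allelic_fraction(tumor_alt, tumor_depth) < min_fraction:
        return False
    # Reject mutations that also show up in the matched normal.
    return allelic_fraction(normal_alt, normal_depth) <= max_normal_fraction


print(passes_filters(tumor_alt=12, tumor_depth=100, normal_alt=0, normal_depth=80))
print(passes_filters(tumor_alt=12, tumor_depth=100, normal_alt=10, normal_depth=80))
```

Each default encodes an assumption about the data. For example, a `min_fraction` of 0.05 would discard mutations present in only a very small fraction of cells, which is why the project goal has to be known before the thresholds are chosen.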

Categorize Mutations
• Silent/Nonsilent
– Does the mutation alter the phenotype?
• In exonic regions
– Synonymous: the amino acid sequence stays the same
– Nonsynonymous: changes the amino acid sequence of the protein
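
The synonymous/nonsynonymous distinction can be checked by translating the codon before and after the mutation. The sketch below builds the standard genetic code table and compares the two amino acids; the function name is hypothetical, and it assumes the codons are given in-frame on the coding strand.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same amino acid."""
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("GAA", "GTA"))  # Glu -> Val
```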

Categorize Mutations
• Oncogenesis
– Oncogenes (the gas)
• Cell growth
• Activation causes cancer
– Tumor Suppressor Genes (the brakes)
• DNA repair, slow down cell division
• Loss of function causes cancer
– Two Hit Hypothesis (Knudson 1971)

Mutational Signatures
• Find the mutational processes that were active
• Use the identified SNVs (single nucleotide variants) to determine which processes were active
– Use 1 base of context on both the 5’ and 3’ side
• .C. > .T.
• 6 base substitution classes
– C>A, C>G, C>T, T>A, T>C, T>G
• 4 possible bases on each side
• Total: 6 * 4 * 4 = 96 possible substitution types
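
The count of 96 can be reproduced by enumerating the combinations. The sketch below also shows how an SNV with a purine reference base (A or G) maps onto one of the six classes by reverse-complementing it; that is the commonly used convention, assumed here rather than stated on the slide, and the label format `A[C>T]G` is just one way of writing the context.

```python
from itertools import product

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_types():
    """Enumerate all 5'-flank[ref>alt]3'-flank labels, e.g. 'A[C>T]G'."""
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


def classify_snv(five, ref, alt, three):
    """Label an SNV by its pyrimidine-centred trinucleotide context."""
    if ref in "AG":
        # Report the change on the opposite strand: complement every base
        # and swap the flanks, since the strand is read in reverse.
        five, three = three.translate(COMPLEMENT), five.translate(COMPLEMENT)
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{five}[{ref}>{alt}]{three}"


print(len(mutation_types()))
print(classify_snv("C", "G", "A", "T"))
```

Counting how many SNVs of a tumor fall into each of the 96 types gives the profile from which mutational signatures are extracted.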

Mutational Signatures
Alexandrov, L.B. et al. Nature 2013
Biological processes generating somatic mutations in cancer samples
Dataset: 4,938,362 mutations from 7,042 cancers
Figure: three example signatures
• Aging signature: CG>TA transitions at NpCpG
• Defective DNA MMR signature: CG>TA transitions at NpCpG; CG>AT transversions at CpCpC
• POLE signature: C>A transversions at TpCpT; T>G at TpTpT
Figure legend: I, indels; T, transcriptional strand bias

Copy Number Analysis
Relative copy number: amplification, gain, neutral, loss, deletion

RNA-Seq Analysis
• Gene expression
– Find lowly or highly expressed genes
Figure: Breast Invasive Carcinoma (TCGA, Nature 2012); legend: AMP+Upreg, AMP, Upreg
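
One simple way to flag lowly or highly expressed genes is to compare each sample with the rest of the cohort using a z-score. The sketch below is an illustration only: the function names, the threshold of 2 and the expression values are all hypothetical, and it assumes the values for one gene are already normalized and comparable across samples.

```python
from statistics import mean, stdev


def expression_z_scores(values):
    """Z-score of each value relative to the cohort mean and sample SD."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def flag_expression(sample_to_expression, threshold=2.0):
    """Label each sample 'high', 'low' or 'normal' for one gene."""
    samples = list(sample_to_expression)
    scores = expression_z_scores([sample_to_expression[s] for s in samples])
    return {
        s: "high" if z >= threshold else "low" if z <= -threshold else "normal"
        for s, z in zip(samples, scores)
    }


# Invented expression values for a single gene across eight samples.
gene_expression = {
    "S1": 5.0, "S2": 5.2, "S3": 4.9, "S4": 5.1,
    "S5": 5.0, "S6": 4.8, "S7": 5.0, "S8": 9.0,
}
flags = flag_expression(gene_expression)
print(flags)
```

With few samples a single outlier inflates the standard deviation, so the largest reachable z-score is limited; larger cohorts give more stable calls.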

RNA-Seq Analysis
• Gene expression as prognosis indicator
Verhaak et al (JCI, 2013)

RNA-Seq Analysis
• Fusion gene detection
– E.g. TMPRSS2:ERGa (~50% of prostate cancers)
