Bioc4010 Sample Questions:
1. A) What is the base call accuracy of a base in an Illumina-sequenced short
read with a Q value of 20?
B) Is this better or worse than a Q value of 10?
Answer: A) Q20 corresponds to an error probability of 1 in 100, or 99% call accuracy.
B) Better. Q10 corresponds to an error probability of 1 in 10, or 90% call accuracy.
Formula: Q = -10 log10(P), where P is the probability that the base call is incorrect.
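The formula can be checked numerically. The following is a minimal Python sketch; the function names phred_to_error_prob and error_prob_to_phred are illustrative and do not come from any sequencing library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a Phred quality Q, inverting Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality Q for an error probability P."""
    return -10 * math.log10(p)


# Print error probability and call accuracy for a few common Q values.
for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: P(error) = {p:g}, accuracy = {(1 - p) * 100:g}%")
```

Every increase of 10 in Q lowers the error probability by a factor of 10, which is why Q20 is better than Q10.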
2. What two primary advantages does exome sequencing provide over whole
genome sequencing?
Answer: Cost and data reduction. Exome capture limits the sequencing to known
protein-coding genes and some miRNAs.
3. Split and sort the string ‘CAPTAINKIRK’ into its appropriate suffix array
Answer:
AINKIRK
APTAINKIRK
CAPTAINKIRK
INKIRK
IRK
K
KIRK
NKIRK
PTAINKIRK
RK
TAINKIRK
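The sorted suffixes above can be reproduced with a short Python sketch. It uses the naive approach of sorting start positions by the suffix each one begins, which is fine for a short string but would be too slow for a genome; the function name suffix_array is illustrative.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, ordered by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


text = "CAPTAINKIRK"
sa = suffix_array(text)

# Print each 0-based start position next to the suffix that begins there.
for start in sa:
    print(start, text[start:])
```

Strictly, the suffix array is the list of start positions; the sorted suffixes are what those positions point to.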
4. Given a base-quality score threshold of Q30, the following short read
alignment, and reference sequence, what is the genotype (two alleles, e.g.
G/C) at the indicated position? Base qualities for the position are listed to the
right of each read. The last sequence is the reference, and the asterisk marks the position.
AGCTCCCAGGGTCCAG                 Q29
          GTCCAGTCTCGGTT         Q40
      CAGGGTCCAGTC               Q47
           TCCAGTCTCGGTTCCATC    Q35
    CCCAGGGCCCAG                 Q50
        GGGTCCAGTCTC             Q31
   TCCCAGGGCC                    Q10
       AGGGTCCAGT                Q45
 GCTCCCAGGGCCCAGTCT              Q46
  CTCCCAGGGCCC                   Q33
     CCAGGGTCCAGTC               Q38
 GCTCCCAGGGCCCAGTCTCGG           Q41
      CAGGGTCCAGTCTCG            Q15
AGCTCCCAGGGTCCAGTCTCGGTTCCATCTA
           *
Answer: Discard the reads whose base quality at the position is below Q30 (the Q29,
Q10, and Q15 reads). Count the reference and alternate bases at the position among
the remaining reads (T = 6, C = 4). Both alleles are supported, so the genotype
called is T/C (heterozygous).
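The counting procedure in this answer can be sketched in Python. Two assumptions are made here: each read is placed on the reference at the ungapped offset with the fewest mismatches (the helper best_offset is hypothetical), and the genotype is taken from the two most frequent alleles. This is a naive count-based rule that mirrors the answer, not a statistical genotype caller.

```python
from collections import Counter

REFERENCE = "AGCTCCCAGGGTCCAGTCTCGGTTCCATCTA"
SITE = 11          # 0-based index of the starred position (reference base T)
MIN_QUALITY = 30   # Q30 threshold from the question

# (read sequence, base quality at the starred position)
READS = [
    ("AGCTCCCAGGGTCCAG", 29), ("GTCCAGTCTCGGTT", 40), ("CAGGGTCCAGTC", 47),
    ("TCCAGTCTCGGTTCCATC", 35), ("CCCAGGGCCCAG", 50), ("GGGTCCAGTCTC", 31),
    ("TCCCAGGGCC", 10), ("AGGGTCCAGT", 45), ("GCTCCCAGGGCCCAGTCT", 46),
    ("CTCCCAGGGCCC", 33), ("CCAGGGTCCAGTC", 38),
    ("GCTCCCAGGGCCCAGTCTCGG", 41), ("CAGGGTCCAGTCTCG", 15),
]


def best_offset(read: str, ref: str) -> int:
    """Ungapped placement of read on ref with the fewest mismatches (assumption)."""
    def mismatches(offset: int) -> int:
        return sum(a != b for a, b in zip(read, ref[offset:offset + len(read)]))
    return min(range(len(ref) - len(read) + 1), key=mismatches)


counts = Counter()
for seq, qual in READS:
    if qual < MIN_QUALITY:
        continue  # discard low-quality base calls
    offset = best_offset(seq, REFERENCE)
    if offset <= SITE < offset + len(seq):
        counts[seq[SITE - offset]] += 1

# Report the two most frequent alleles; a single allele means homozygous.
alleles = [base for base, _ in counts.most_common(2)]
genotype = "/".join(alleles if len(alleles) == 2 else alleles * 2)
print(dict(counts), genotype)
```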
5. Sort the following types of genetic variants into the categories: Potentially
Disease Causing, Unlikely to be Disease Causing
1. Splice Site
2. Non-Synonymous
3. Synonymous
4. Frameshift Indel
5. Stop Loss
6. Stop Gain
7. Intronic (Non-Splice Site)
8. Intergenic
Answer:
Potentially Disease Causing: 1, 2, 4, 5, 6
Unlikely to be Disease Causing: 3, 7, 8
6. What is the primary motivation for using “next gen” sequencing methods
and modern genomics approaches to diagnosing human genetic diseases?
Answer: Cost
7. What does the base quality of a sequencing read tell you?
Answer: The base quality encodes the estimated probability that the base call is
incorrect (Q = -10 log10(P)). Base call accuracy is also an acceptable answer.
8. What problem does binary search address?
Answer: Efficiently searching a sorted index of a genome (such as a suffix array),
since each comparison halves the remaining search range.
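As an illustration, here is a minimal Python sketch of binary search over a suffix array, using the string from question 3 in place of a genome. The function names suffix_array and find_occurrence are illustrative, and the naive index construction is an assumption made to keep the example short.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrence(text: str, sa: list[int], pattern: str) -> int:
    """Return one start position of pattern in text, or -1 if it is absent."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare the pattern with the same-length prefix of the middle suffix.
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1   # pattern sorts after this suffix: search upper half
        else:
            hi = mid       # pattern sorts at or before it: search lower half
    if lo < len(sa) and text.startswith(pattern, sa[lo]):
        return sa[lo]
    return -1


text = "CAPTAINKIRK"
sa = suffix_array(text)
print(find_occurrence(text, sa, "KIRK"))   # start position of KIRK
print(find_occurrence(text, sa, "SPOCK"))  # -1, pattern not present
```

Because the suffixes are sorted, the loop needs only about log2(n) comparisons for a text of length n, instead of scanning every position.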