2. Yaniv Erlich @erlichya 2/23/16
Outline
1. Surname inference from DNA
2. The power of genome-wide STR profiles
3. Rapid identification with a handheld device
Surname inference HipSTR Y-STRs MinION Summary
Advanced Strategies for DNA Identification
4. Correlation between Y-chr and surnames
[Figure labels: www.ysearch.org; Y chromosomes (Y); surnames Smith and Erlich]
ACGCACGC…
5. Databases of interest
www.smgf.org www.ysearch.org
140,000 publicly accessible surname-Ychr records
6. How to find surnames?
Estimating the time to the most recent common ancestor (TMRCA)
[Diagram: the target is compared with the i-th record in the db; each record carries a surname]
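The slide does not say which TMRCA estimator is used. As a minimal sketch of the idea, the code below assumes a simple infinite-alleles model, in which a marker still matches after t generations on each lineage with probability exp(-2·μ·t). The mutation rate, the marker panel, the haplotypes, and the function names are all hypothetical and chosen only for illustration.

```python
import math

# Assumed per-marker, per-generation mutation rate; not taken from the slides.
ASSUMED_MUTATION_RATE = 0.002


def tmrca_generations(target, record, mutation_rate=ASSUMED_MUTATION_RATE):
    """Estimate generations to the MRCA from two Y-STR haplotypes.

    With k matching markers among n shared markers, the maximum-likelihood
    estimate under the assumed model is t = ln(n / k) / (2 * mu).
    """
    shared = [marker for marker in target if marker in record]
    if not shared:
        raise ValueError("no markers in common")
    matches = sum(target[m] == record[m] for m in shared)
    if matches == 0:
        return math.inf  # no information beyond "very distant"
    return math.log(len(shared) / matches) / (2.0 * mutation_rate)


def closest_surname(target, database, mutation_rate=ASSUMED_MUTATION_RATE):
    """Return (surname, tmrca) for the record with the smallest TMRCA."""
    best = min(
        database,
        key=lambda rec: tmrca_generations(target, rec["markers"], mutation_rate),
    )
    return best["surname"], tmrca_generations(target, best["markers"], mutation_rate)


# Hypothetical haplotypes (repeat counts per marker), for illustration only.
target = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS458": 17}
database = [
    {"surname": "Smith",
     "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS458": 17}},
    {"surname": "Jones",
     "markers": {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS458": 17}},
]
surname, generations = closest_surname(target, database)
print(surname, round(generations, 1))
```

A real pipeline would likely use marker-specific mutation rates and a stepwise mutation model; the sketch only shows how a TMRCA score can rank database records so that the closest record suggests a surname.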
7. Empirical test to determine the probability of recovering a US surname
Procedure (x900):
1. Y-chr of a real person
2. Querying Ysearch and SMGF
3. Inferring surname (surname inference algorithm)
4. Comparing the predicted surname to the true one
For US Caucasian males:
• 12% Successful recoveries
• 5% Wrong recoveries
• 83% Unknown
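The three percentages above also imply how often the algorithm is right when it does return a surname. The short calculation below derives that figure; it assumes the reported percentages are exact, although they are probably rounded.

```python
# Percentages reported on the slide for US Caucasian males.
successful = 0.12
wrong = 0.05
unknown = 0.83

# Share of trials in which the algorithm returned any surname at all.
called = successful + wrong

# Of the surnames returned, the fraction that were correct.
precision = successful / called

print(f"surname returned in {called:.0%} of trials")
print(f"correct when returned: {precision:.1%}")
```

In other words, a surname is returned in about one trial in six, and a returned surname is correct roughly 12 times out of 17.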
8. Distribution of inferred surnames
Most of the inferred surnames are relatively rare
Inferred surname =~ zipcode
9. Putting it all together: the Venter case
www.ysearch.org:
lobSTR
DYS458: 17 repeats
Try it yourself: bit.ly/find_craig
We got a surname from whole-genome sequencing data
11. Outline
1. Surname inference from DNA
2. The power of genome-wide STR profiles
3. Rapid identification with a handheld device
12. HipSTR: a new STR calling algorithm
• Haplotype-based imputation, phasing, and genotyping of STRs
• Haplotype
– Robustness to stutter noise
• Imputation
– Recover STR dropouts from nearby SNPs.
• Phasing
– Resolve homoplasy between alleles
13. STR benchmarking results
HipSTR is the most accurate STR caller and runs ~10x faster than the next best method
[Figure: accuracy of STR callers; y-axis: Accuracy, 50 to 100]
14. HipSTR: solving homoplasy
• Can now correctly detect STRs with identical lengths but different sequences (homoplasy)
• Real example:
– Length-based genotype: -4/-4
– HipSTR genotype: (AGAT)8(ACAT)9 / (AGAT)10(ACAT)7
• HipSTR available at https://github.com/tfwillems/HipSTR
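To see why a length-based caller cannot separate the two alleles in the real example, the snippet below expands both repeat structures and compares them. It is only an illustration of homoplasy, not HipSTR code; the helper name `expand` is made up for this sketch.

```python
def expand(blocks):
    """Turn [(motif, count), ...] into the full repeat sequence."""
    return "".join(motif * count for motif, count in blocks)


# The two alleles from the slide's example.
allele_a = expand([("AGAT", 8), ("ACAT", 9)])
allele_b = expand([("AGAT", 10), ("ACAT", 7)])

# Both alleles have 17 repeat units, so their lengths are identical...
same_length = len(allele_a) == len(allele_b)

# ...but the sequences differ wherever one has AGAT and the other ACAT.
same_sequence = allele_a == allele_b
mismatches = sum(a != b for a, b in zip(allele_a, allele_b))

print(len(allele_a), len(allele_b), same_length, same_sequence, mismatches)
```

Both alleles are 68 bp long, so a length-based genotype reports them as identical, while a sequence-aware comparison finds two differing bases (one in each of the 9th and 10th repeat units).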
15. Chromosome-wide scan of Y-STRs
• Goal: scan every possible STR on the Y chromosome and assess mutation rates
• >200,000 transmissions = high accuracy.
• Leverage 1,500 whole-genome samples from around the world.
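One way to read ">200,000 transmissions = high accuracy" is through the sampling error of a rate estimate. The sketch below treats each transmission as an independent trial that either mutates or does not, which is a simplification, and uses a hypothetical per-generation mutation rate of 1e-3 for a single Y-STR; neither assumption comes from the slides.

```python
import math


def relative_standard_error(mutation_rate, transmissions):
    """Relative SE of a binomial rate estimate: sqrt((1 - mu) / (n * mu))."""
    return math.sqrt((1.0 - mutation_rate) / (transmissions * mutation_rate))


transmissions = 200_000   # lower bound quoted on the slide
assumed_rate = 1e-3       # hypothetical mutation rate for one Y-STR

# Expected number of observed mutation events at this locus.
expected_mutations = transmissions * assumed_rate

# Uncertainty of the estimated rate, relative to the rate itself.
rel_se = relative_standard_error(assumed_rate, transmissions)

print(f"expected mutations: {expected_mutations:.0f}")
print(f"relative standard error: {rel_se:.1%}")
```

Under these assumptions a locus would show about 200 mutation events and the rate estimate would carry a relative standard error of roughly 7%. Slower-mutating loci would be estimated less precisely from the same number of transmissions.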
18. Outline
1. Surname inference from DNA
2. The power of genome-wide STR profiles
3. Rapid identification with a handheld device
19. Oxford Nanopore MinION
Features:
• USB stick
• Portable
• Low throughput
• High error rate (10%)
Can we identify samples within minutes?
20. CODIS STRs are challenging
1. MinION has too many indels
– Many reads required to correct for errors
2. CODIS needs a PCR machine
Proposal: shotgun sequencing + Bayesian approach
21. Approach
Real-time sequencing data (reads)
Filter SNPs
Bayesian algorithm, based on prior knowledge:
• Allele frequencies in the population
• Error rate
Alignment to human genome
15% error in base calling
Genome db
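The slides do not spell out the Bayesian algorithm, so the following is only a minimal sketch of how such an update could work. It assumes biallelic SNPs, at most one read per SNP, a base-calling error that flips the observed allele, and a prior of one over the database size. All function names, the observation format, and the example values are hypothetical.

```python
import math


def read_likelihood(read_is_alt, alt_fraction, error_rate):
    """P(observed base) when a fraction alt_fraction of true alleles are alt."""
    p_alt = alt_fraction * (1.0 - error_rate) + (1.0 - alt_fraction) * error_rate
    return p_alt if read_is_alt else 1.0 - p_alt


def posterior_match(observations, prior, error_rate):
    """Posterior probability that the sample is the candidate individual.

    observations: iterable of (read_is_alt, candidate_genotype, pop_alt_freq),
    where candidate_genotype is the candidate's alt-allele count (0, 1, or 2).
    """
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be between 0 and 1")
    log_odds = math.log(prior) - math.log1p(-prior)
    for read_is_alt, genotype, pop_alt_freq in observations:
        # Match hypothesis: the read samples one of the candidate's two alleles.
        l_match = read_likelihood(read_is_alt, genotype / 2.0, error_rate)
        # Alternative: the read comes from a random person in the population.
        l_random = read_likelihood(read_is_alt, pop_alt_freq, error_rate)
        log_odds += math.log(l_match) - math.log(l_random)
    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical example: 30 reads that each hit a SNP where the candidate is
# homozygous for an allele carried by 10% of the population.
matching_reads = [(True, 2, 0.1)] * 30
p_after = posterior_match(matching_reads, prior=1e-7, error_rate=0.15)
print(p_after)
```

Each informative read shifts the log-odds by a small amount, so even with a high per-base error rate the posterior can move from a very small prior toward certainty once enough SNPs have been observed.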
22. Real experiment: me versus Venter
Probability of a match
6m30s
• Assuming a database of 10^7
• Retrospective analysis
23. Real experiment: Venter vs. me
[Figure: P(Erlich) and P(Venter), 0% to 100%, vs. # of reads returned from MinION (50 to 1000)]
• Bad flowcell: 50 min for detection
• My genome: 23andMe array
24. Summary
Advanced capabilities:
1. Surnames
2. Homoplasy detection
3. More Y-STRs
4. Imputation of missing markers
5. Rapid identification of DNA
25. Acknowledgements
*Thomas Willems (MIT)
*Melissa Gymrek (HST – Harvard/MIT)
*Sophie Zaaijer (Columbia University)
*Robert Piccone (Columbia University)
Chris Taylor-Smith (Sanger Institute)
David Poznik (Stanford University)
1000Y analysis group
* Supported by: NIJ 2014-DN-BX-K089