Calculating Guilt:
Using open-source software
in forensic DNA testing
Sarah Chenoweth
sarah@dreamwidth.org
@sarahquaint
Disclaimer
• All opinions are my own.
• Dammit, Jim, I’m a chemist, not a programmer.
• …or a statistician.
• slideshare.net/dreamwidth
Gameplan
• Forensic DNA 101
• What sort of profiles do I obtain?
• Statistics: giving weight to those profiles
• Open-or-not software for calculating these statistics
Anne Arundel
County Police
Crime Lab
DNA Technical Leader
source: Wikimedia commons
Rosalind was robbed.
• 23 pairs of chromosomes
• >3 billion base pairs
• ~2% is coding DNA (genes)
• ~20-40% is regulatory
• ~50% is highly repetitive
AATGAATGAATGAATGAATGAATGAATG <— 7 repeats
AATGAATGAATGAATGAATGAATGAATGAATGAATGAAT <— 9.3 repeats
STR = Short Tandem Repeat
On chromosome 11, there is an area called TH01,
where the STR “AATG” repeats over and over again.
On the chromosome from my mother, it repeats 7 times,
and on the one from my father, it repeats 9.3 times.
Source: National Human
Genome Research Institute
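The repeat notation above can be sketched in a few lines of Python. This is only an illustration: it assumes the motif starts at the first base and that any partial repeat sits at the end, as drawn on the slide. `str_allele` is a hypothetical helper name, not part of any forensic tool.

```python
def str_allele(sequence: str, motif: str) -> str:
    """Return the allele name for an STR region, e.g. "7" or "9.3".

    Assumption: the region starts on a full motif and any partial
    repeat is the trailing leftover, as in the slide's example.
    """
    full_repeats = 0
    position = 0
    # Count consecutive full copies of the motif from the start.
    while sequence.startswith(motif, position):
        full_repeats += 1
        position += len(motif)
    leftover_bases = len(sequence) - position
    if leftover_bases:
        # "9.3" means 9 full repeats plus 3 extra bases.
        return f"{full_repeats}.{leftover_bases}"
    return str(full_repeats)


maternal = "AATG" * 7
paternal = "AATG" * 9 + "AAT"
print(str_allele(maternal, "AATG"), str_allele(paternal, "AATG"))
```

In practice the allele is called from the measured fragment length rather than from the sequence itself, so this only mirrors the naming convention.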
You are not a special
snowflake.
• Most of your DNA, including your genes, is “highly
conserved”
• All humans are 99.9% identical
• Of course, 0.1% of 3 billion = 3 million base pairs of
variation
This is me.
It’s like an EAN on the back
of a book…
• A forensic DNA profile is the length of 23 STRs,
each between 100-500 base pairs in length
• <3% of 1% of 1% of your genome
• Unique “barcode”, except for identical siblings.
Receive evidence.
Sample. Extract.
Quantitate.
Amplify.
Measure.
What we receive.
What gives useful results.
Included or excluded?
• Single-source profiles are simple. But we mostly see
mixtures.
• DNA is the gold standard, carries a lot of weight.
• Must characterize all inclusions with a statistic.
• Make the qualitative statement (excluded, or
matches), characterize it with a quantitative
statistic, and let the trier of fact evaluate.
Nice, 2 person mixture
• 4 alleles at Penta E: 5,7,9,13
• Say this is an assault. We can
assume that the victim is present,
and we know the victim is 7,9.
• So: what are the odds that a random person in the
population is a 5,13?
Likelihood Ratio (LHR)
• How frequently do we see the 5 allele? About 4%
• How frequently do we see the 13 allele? About 5%
• At this one locus: 360 times more likely it’s Sarah &
Robert than Sarah & someone picked at random
from the population.
• Calculate this at all 22 loci, and multiply together:
1.6 x 10^23 (160,000,000,000,000,000,000,000)
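A minimal sketch of the arithmetic, assuming the textbook single-locus form: if the unknown contributor must be a 5,13 heterozygote, the chance of picking one at random is 2pq, so the LHR at that locus is 1 / (2pq). The function names are hypothetical. With the rounded 4% and 5% this gives 250 rather than the 360 on the slide; presumably the slide used unrounded database frequencies or a correction factor, which is an assumption here.

```python
import math


def single_locus_lhr(p: float, q: float) -> float:
    """LHR when the unknown contributor must be a p,q heterozygote.

    Assumption: Hardy-Weinberg proportions, no drop-out, no correction
    for population substructure.
    """
    return 1.0 / (2.0 * p * q)


def combined_lhr(per_locus_lhrs) -> float:
    """Multiply per-locus LHRs, treating the loci as independent."""
    return math.prod(per_locus_lhrs)


# Rounded frequencies from the slide: 5 allele ~4%, 13 allele ~5%.
penta_e = single_locus_lhr(0.04, 0.05)
print(round(penta_e))  # 250 with the rounded inputs
```

Multiplying 22 such values is what pushes the combined figure into the 10^23 range.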
The world is a dirty place.
A wretched hive of scum and
villainy.
“A reasonable degree of
scientific certainty.”
• DNA is a living, biological substance = messy
• Our testing procedure is super-sensitive. <10 cells
• The law wants a clear line between guilty and not
guilty; science is full of, “Well, maybe; it depends.”
• Our classic statistical tools can’t handle these
incomplete mixtures.
Nice, 2 person mixture…
Same… except.
That one little allele.
The 9 allele is just
below the threshold.
…now what?
• Only use the loci where the
suspect is present? That’s
horribly biased.
• Throw up our hands and refuse
to draw conclusions on partial
data? Also biased!
• The least awful solution is to
only use the loci that we know
have complete info: the ones
with two minor alleles.
source: my sister, who is the biological mother of this pouty kid.
The loci with 2 minor alleles:
4 out of 22: 18% of the data.
LHR: 1,400,000
Understating is just as bad
as overstating.
• Well, almost. The justice system is designed to err
on the side of caution, and benefit the defendant.
• Take a conservative approach.
• But not using all the data isn’t always conservative:
what if that was exculpatory information?
Probabilistic genotyping
Semi-continuous
• Considers the probability of drop out when calculating the LHR.
• Open source. Fast.
• Still doesn’t use all the data (peak height ratios, stutter).
Scenario:
The victim is: 20,20
The suspect is: 19,22
What is the probability the suspect is
a contributor, but the 19 dropped out?
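The scenario can be worked through with a toy semi-continuous model. This is a sketch under stated assumptions, not Lab Retriever's implementation: the evidence is assumed to show alleles 20 and 22, there is no drop-in, the victim's alleles never drop out, each heterozygous allele drops out independently with probability `d`, a homozygote drops out with probability `d * d`, and the allele frequencies are made up for illustration.

```python
from itertools import combinations_with_replacement


def genotype_prob(a, b, freqs):
    """Hardy-Weinberg genotype probability."""
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_evidence(genotype, observed, known, d):
    """P(observed alleles | minor contributor's genotype), no drop-in."""
    # Every observed allele must come from the victim or this genotype.
    if not observed <= (known | set(genotype)):
        return 0.0
    a, b = genotype
    # A homozygote is one "unit" that drops out with probability d * d.
    units = [(a, d * d)] if a == b else [(a, d), (b, d)]
    prob = 1.0
    for allele, p_drop in units:
        if allele in known:
            continue  # masked by the victim's peak either way
        prob *= (1 - p_drop) if allele in observed else p_drop
    return prob


def semi_continuous_lhr(suspect, observed, known, freqs, d):
    """LHR: victim + suspect versus victim + one unknown person."""
    numerator = prob_evidence(suspect, observed, known, d)
    denominator = sum(
        genotype_prob(a, b, freqs) * prob_evidence((a, b), observed, known, d)
        for a, b in combinations_with_replacement(sorted(freqs), 2)
    )
    return numerator / denominator


# Hypothetical frequencies; "other" lumps together every remaining allele.
freqs = {"19": 0.10, "20": 0.20, "22": 0.10, "other": 0.60}
lhr = semi_continuous_lhr(("19", "22"), {"20", "22"}, {"20"}, freqs, d=0.3)
print(f"LHR with 30% drop-out: {lhr:.2f}")
```

With `d=0` the same call returns 0, the classic exclusion: the suspect's 19 is missing and nothing explains its absence.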
Lab Retriever
• scieg.org/lab_retriever.html
• github.com/SCIEG/LabRetriever
Lab Retriever
• if we had a complete mixture: LHR = 1.6 x 10^23 =
160,000,000,000,000,000,000,000
• partial mixture, so we only use 4 loci for LHR =
1.4 x 10^6 = 1,400,000
• same partial mixture, semi-continuous LHR =
7.3 x 10^20 = 730,000,000,000,000,000,000
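Numbers this large are easier to compare as orders of magnitude. A small sketch using only the three LHRs quoted above; the dictionary labels are just descriptive names:

```python
import math

lhrs = {
    "complete mixture": 1.6e23,
    "partial mixture, 4 loci only": 1.4e6,
    "partial mixture, semi-continuous": 7.3e20,
}

# log10 turns each LHR into "how many orders of magnitude".
log_lhrs = {label: math.log10(value) for label, value in lhrs.items()}

for label, exponent in log_lhrs.items():
    print(f"{label}: about 10^{exponent:.1f}")
```

On this scale the four-locus approach throws away roughly 17 of the 23 orders of magnitude, while the semi-continuous model gives up only about 2.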
Probabilistic genotyping
Continuous
• Markov-chain Monte Carlo (MCMC) simulations.
• Uses all of the data, with fewer assumptions.
• Doesn’t just give you the best estimate: gives you a range.
Probable genotype of
the minor contributor:
AC: 40%
BC: 25%
CC: 20%
CQ: 15%
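One way genotype weights like these can feed a likelihood ratio is sketched below. It assumes the usual weighted form, where the suspect's genotype weight is divided by the weighted population frequency of all candidate genotypes; it is not STRmix code. The allele frequencies are hypothetical, and Q is taken to mean any allele other than A, B, or C.

```python
def genotype_freq(genotype, allele_freqs):
    """Hardy-Weinberg frequency of a genotype in the population."""
    a, b = genotype
    if a == b:
        return allele_freqs[a] ** 2
    return 2 * allele_freqs[a] * allele_freqs[b]


def lhr_from_weights(suspect, weights, allele_freqs):
    """Weight of the suspect's genotype over the weighted random-person term."""
    numerator = weights.get(suspect, 0.0)
    denominator = sum(
        weight * genotype_freq(genotype, allele_freqs)
        for genotype, weight in weights.items()
    )
    return numerator / denominator


# Weights from the slide for the minor contributor.
weights = {("A", "C"): 0.40, ("B", "C"): 0.25, ("C", "C"): 0.20, ("C", "Q"): 0.15}
# Hypothetical allele frequencies, chosen only for illustration.
allele_freqs = {"A": 0.10, "B": 0.20, "C": 0.15, "Q": 0.55}

print(f"Suspect is AC: LHR = {lhr_from_weights(('A', 'C'), weights, allele_freqs):.1f}")
print(f"Suspect is AB: LHR = {lhr_from_weights(('A', 'B'), weights, allele_freqs):.1f}")
```

A suspect whose genotype got no weight ends up with an LHR of 0 at this locus, and a suspect with a low-weight genotype gets a correspondingly smaller number rather than a flat include or exclude.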
STRmix
• Developed by the ESR (Environmental Science and
Research, NZ) and FSSA (Forensic Science South
Australia)
• Increasingly becoming the standard
• 20K USD initially, 5K/yr support contract
The justice system does not
embrace open source.
• The data is reliable: but is my interpretation?
• But I don’t tell “the whole truth, and nothing but the
truth.” I can only answer the questions I’m asked.
• Prosecutor misstatement: “That means there’s a one
in a quadrillion chance it’s someone else!”
• Defense misstatement: “She didn’t test the DNA of a
quadrillion people, so there’s no way that’s true!”
Currently, in forensic DNA:
• Binary statistics: yes
• Semi-continuous: yes
• Continuous: no
• Frequency databases: yes
• Data analysis: no
• CODIS: hell to the no
source: Wikimedia commons
Statistics are hard.
source: Bill Gacey @Flickr
There is too much.
Let me sum up:
• Transparency is the key to credibility.
• I need to document all my observations, results, and
calculations so they are reproducible.
• Open software is necessary for independent
verification.
Thank you.
sarah@dreamwidth.org
twitter: sarahquaint
slideshare.net/dreamwidth

Chenoweth os bridge 2015 pp

  • 1. Calculating Guilt: Using open-source software in forensic DNA testing Sarah Chenoweth sarah@dreamwidth.org @sarahquaint
  • 2. Disclaimer • All opinions are my own. • Dammit, Jim, I’m a chemist, not a programmer. • …or a statistician. • slideshare.net/dreamwidth
  • 3. Gameplan • Forensic DNA 101 • What sort of profiles do I obtain? • Statistics: giving weight to those profiles • Open-or-not software for calculating these statistics
  • 4. Anne Arundel County Police Crime Lab DNA Technical Leader source: Wikimedia commons
  • 5. Rosalind was robbed. • 23 pairs of chromosomes • >3 billion base pairs • ~2% is coding DNA (genes) • ~20-40% is regulatory • ~50% is highly repetitive
  • 6. AATGAATGAATGAATGAATGAATGAATG <— 7 repeats AATGAATGAATGAATGAATGAATGAATGAATGAATGAAT <— 9.3 repeats STR = Short Tandem Repeat On chromosome 11, there is an area called TH01, where the STR “AATG” repeats over and over again. On the chromosome from my mother, it repeats 7 times, and on the one from my father, it repeats 9.3 times. Source: National Human Genome Research Institute
  • 7. STR = Short Tandem Repeat
  • 8. You are not a special snowflake. • Most of your DNA, including your genes, is “highly conserved” • All humans are 99.9% identical • Of course, 0.1% of 3 billion = 3 million base pairs of variation
  • 10. It’s like an EAN on the back of a book… • A forensic DNA profile is the length of 23 STRs, each between 100-500 base pairs in length • <3% of 1% of 1% of your genome • Unique “barcode”, except for identical siblings.
  • 17. What gives useful results.
  • 18. Included or excluded? • Single-source profiles are simple. But we mostly see mixtures. • DNA is the gold standard, carries a lot of weight. • Must characterize all inclusions with a statistic. • Make the qualitative statement (excluded, or matches), characterize it with a quantitative statistic, and let the trier of fact evaluate.
  • 19. Nice, 2 person mixture
  • 20. Nice, 2 person mixture
  • 21. • 4 alleles at Penta E: 5,7,9,13 • Say this is an assault. We can assume that the victim is present, and we know the victim is 7,9. • So: what are the odds that a random person in the population is a 5,13? Likelihood Ratio (LHR)
  • 22. Likelihood Ratio (LHR) • How frequently do we see the 5 allele? About 4% • How frequently do we see the 13 allele? About 5% • At this one locus: 360 times more likely it’s Sarah & Robert than Sarah & someone picked at random from the population. • Calculate this at all 22 loci, and multiply together: 1.6 x 10^23 (160,000,000,000,000,000,000,000)
  • 23. The world is a dirty place.
  • 24. The world is a dirty place.
  • 25. A wretched hive of scum and villainy.
  • 26. A wretched hive of scum and villainy.
  • 27. “A reasonable degree of scientific certainty.” • DNA is a living, biological substance = messy • Our testing procedure is super-sensitive. <10 cells • The law wants a clear line between guilty and not guilty; science is full of, “Well, maybe; it depends.” • Our classic statistical tools can’t handle these incomplete mixtures.
  • 28. Nice, 2 person mixture…
  • 30. That one little allele. The 9 allele is just below the threshold.
  • 31. …now what? • Only use the loci where the suspect is present? That’s horribly biased. • Throw up our hands and refuse to draw conclusions on partial data? Also biased! • The least awful solution is to only use the loci that we know have complete info: the ones with two minor alleles. source: my sister, who is the biological mother of this pouty kid.
  • 32. The loci with 2 minor alleles: 4 out of 22, or 18% of the data. LHR: 1,400,000
  • 33. Understating is just as bad as overstating. • Well, almost. The justice system is designed to err on the side of caution, and benefit the defendant. • Take a conservative approach. • But not using all the data isn’t always conservative: what if that was exculpatory information?
  • 34. Probabilistic genotyping Semi-continuous • Considers the probability of drop out when calculating the LHR. • Open source. Fast. • Still doesn’t use all the data (peak height ratios, stutter). Scenario: The victim is: 20,20 The suspect is: 19,22 What is the probability the suspect is a contributor, but the 19 dropped out?
  • 35. Lab Retriever • scieg.org/lab_retriever.html • github.com/SCIEG/LabRetriever
  • 38. Lab Retriever • if we had a complete mixture, LHR = 1.6 x 10^23 = 160,000,000,000,000,000,000,000 • partial mixture, so we only use 4 loci for LHR = 1.4 x 10^6 = 1,400,000 • same partial mixture, semi-continuous LHR = 7.3 x 10^20 = 730,000,000,000,000,000,000
  • 39. Probabilistic genotyping Continuous • Markov-chain Monte Carlo (MCMC) simulations. • Uses all of the data, with fewer assumptions. • Doesn’t just give you the best estimate: gives you a range. Probable genotype of the minor contributor: AC: 40% BC: 25% CC: 20% CQ: 15%
  • 40. STRMix • Developed by the ESR (Environmental Science and Research, NZ) and FSSA (Forensic Science South Australia) • Increasingly becoming the standard • 20K USD initially, 5K/yr support contract
  • 41. The justice system does not embrace open source. • The data is reliable: but is my interpretation? • But I don’t tell “the whole truth, and nothing but the truth.” I can only answer the questions I’m asked. • Prosecutor misstatement: “That means there’s a one in a quadrillion chance it’s someone else!” • Defense misstatement: “She didn’t test the DNA of a quadrillion people, so there’s no way that’s true!”
  • 42. Currently, in forensic DNA: • Binary statistics: yes • Semi-continuous: yes • Continuous: no • Frequency databases: yes • Data analysis: no • CODIS: hell to the no source: Wikimedia commons
  • 43. Statistics are hard. source: Bill Gacey @Flickr
  • 44. There is too much. Let me sum up: • Transparency is the key to credibility. • I need to document all my observations, results, and calculations so they are reproducible. • Open software is necessary for independent verification.
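
The arithmetic on slides 21 and 22 can be sketched in a few lines of Python. This is a minimal sketch, not the lab's actual software. It assumes the textbook case where the victim's alleles are accounted for, the second contributor must be a 5,13 heterozygote, and a random person's chance of being 5,13 is 2pq under Hardy-Weinberg with no population-substructure correction. The function names are made up for illustration. With the rounded frequencies from slide 22 (4% and 5%) the result is 250 rather than the 360 on the slide; presumably the slide used unrounded database frequencies, which are not given here. The per-locus values passed to the second function are hypothetical placeholders, not casework numbers.

```python
import math


def single_locus_lr(p_a: float, p_b: float) -> float:
    """Likelihood ratio at one locus for an unknown heterozygote a,b.

    Numerator: probability of the evidence if the suspect (genotype a,b)
    is the second contributor, taken as 1.
    Denominator: probability that a random person is a,b, taken as
    2 * p_a * p_b (Hardy-Weinberg; an assumption of this sketch).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must be between 0 and 1")
    return 1.0 / (2.0 * p_a * p_b)


def combined_log10_lr(per_locus_lrs: list[float]) -> float:
    """Combine per-locus LRs by multiplication, assuming independent loci.

    Works in log10 space so the result reads directly as a power of ten.
    """
    return sum(math.log10(lr) for lr in per_locus_lrs)


# Penta E with the rounded frequencies from slide 22.
lr_penta_e = single_locus_lr(0.04, 0.05)
print(round(lr_penta_e))  # 250

# Hypothetical per-locus LRs, for illustration only.
example_loci = [lr_penta_e, 40.0, 12.0]
print(f"combined LR is about 10^{combined_log10_lr(example_loci):.1f}")
```

Multiplying across loci is what turns modest per-locus numbers into figures like 1.6 x 10^23, and it is also why dropping 18 of 22 loci shrinks the statistic so sharply.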

Editor's Notes

  1. Good afternoon! Okay, I should say off the bat: DNA cannot be used to calculate guilt. What you can calculate are statistics to give weight whenever you include an individual in a forensic DNA mixture. This talk is not a tutorial, or a how-to. What I’d like to do is familiarize you with a pretty complex topic, forensic DNA testing, and show you how it relates to open source software in ways you might not be familiar with.
  2. All opinions are my own: the police department I work for doesn’t know I’m here. Also, all photos are my own unless otherwise cited, and everything is licensed for noncommercial use. This is a lot of data crammed into 40 minutes, and some of these slides are rather dense. They are available on slideshare. Please feel free to raise your hand during the talk and let me know if I’ve lost you, but for longer questions, wait, and grab me afterwards. I will not be speaking about any specific cases or details, but I will make general mention of some violent crimes, including sexual assaults. Also, when I use the terms ‘male’ and ‘female,’ I’m referring to the chromosomal make up, not gender identity.
  3. So: I’m going to explain, as briefly as I can, what a forensic DNA profile is. I’ll show you examples of forensic DNA profiles, including the good, the bad, and the ugly. And finally, I’ll talk about what current and emerging options I have for calculating statistics, and why open source software is important to forensics.
  4. This is the county I work for, with a population of about 500,000. I’m a native and current Baltimorean; Baltimore is just to the north, and D.C. is the diamond-shaped void just to the west. I’m not speaking in an official capacity. But I do have fifteen years’ experience, which means I’ve had time to develop some strong opinions.
  5. You have, in nearly every cell of your body, a complete copy of your genome, all 23 pairs of chromosomes of it. That’s a lot of information: over 3 billion base pairs, made up of only 4 nucleotides. Surprisingly, only about 2% of the genome is genes, which are actual recipes for proteins. About half is repetitive sections with no known function. These repetitive sections are a few nucleotides repeated over and over: AATG, AATG, AATG, for long stretches.
  6. In fact, we have an acronym for those repetitive bits: STRs. Please note that I’m a scientist, and I work for the police, so I’m doubly obliged to have acronyms for everything. So: you can have complete repeats, or partial repeats, which are noted with a decimal point. Everyone has two copies, usually different lengths, sometimes the same length.
  7. This is an electropherogram. This is the actual output I see in the lab. I don’t see the A’s, T’s, C’s, and G’s, because the actual nucleotides don’t really matter. But I can measure how many times they repeat. This is the length of the STR, and that’s the top number in the box.
  8. Most of your DNA is “highly conserved”. Highly conserved means mostly the same from generation to generation. This indicates it has an important function, so there’s less variation. And that’s not useful for forensic identification. We want to measure highly variable areas, so we can distinguish between people. Because the STRs have no known function, mutations have no detrimental effect, so they tend to be highly variable. Now, I mentioned one STR, named TH01…
  9. I use a commercially-available DNA kit that looks at 24 specific areas of the genome. It includes 22 STRs (like I just described), a sex-marker gene, and one STR that’s only found on the Y chromosome. These peaks are called alleles. Where you see one tall peak, I have two copies that are the same length. Homozygous, vs heterozygous.
  10. Now, these STRs represent a very small proportion of your genome, and because they aren’t genes, they don’t provide information about your physical appearance. Just like a UPC or EAN barcode, it’s not useful in and of itself. You need to compare it to known DNA profiles. To do this, we also test oral swabs from individuals related to the case.
  11. So: how do I actually get from an item of evidence — like, say, an empty water bottle left in a stolen vehicle — to a forensic DNA profile? Well, we receive over 400 cases a year, with an incredible variety of items. This represents nearly all the items tested over a two year period (does not include bloodstains or semen).
  12. The first step is to open each item, one at a time, on a sterile surface, and take a cutting or a swabbing. Then we purify the DNA by breaking open the cells with a detergent, and washing away all the membranes, proteins, dirt, etc.
  13. Then, because the concentration of cells on different items is highly variable, we use up a bit of the purified DNA to measure how concentrated it is.
  14. This is a thermal cycler, which is essentially a chemical photocopier. It unzips the double strand of DNA, then uses each strand to make a copy. Repeat this over and over, and you exponentially increase the amount of DNA in about two hours. We also tag each copy with a fluorescent dye. Like a regular, paper photocopier, GIGO.
  15. Our analysis instrument uses capillary electrophoresis: the capillaries are those thin copper wires, and at one end is a platinum cathode, and at the other end, a platinum anode, and a high voltage runs through it. It separates the pieces by size, with shorter pieces moving faster than longer ones. Behind the black door is a laser, which excites the fluorescent tags on the STR copies, which are measured by a CCD camera.
  16. So: to recap, this is what I receive…
  17. …and this is what actually produces interpretable profiles. Okay! Congratulations, we’ve reached the end of the Science portion! You are all welcome to visit my lab if you’re ever in town. Now: onto the Math-y portion of the talk. I’m not going to hit you with formulas, or anything, just give you a high level overview of the concepts.
  18. Okay, great. When I obtain a clean, single source profile (like my own profile, which I showed you a few slides back), it’s a very simple matter to determine if it matches or doesn’t match the reference standards. But most evidence yields a mixture. Mixtures are always more complicated, because it’s usually impossible to tell with certainty which alleles belong to which person. With mixtures, it’s particularly important to provide a statistic, to give people an idea of what percentage of the population could fit into that mixture. In order to calculate any statistics, you need to know the approximate frequency of each allele in the population. The frequency database that I use was tabulated by a team at NIST (the National Institute of Standards and Technology), based on testing of several thousand unrelated individuals.
  19. You saw a single source profile when I showed you my DNA profile. It looks the same, whether it’s from an oral swab, or from a water bottle, or from a bloodstain. This is an example of a mixture of 1 part male DNA and 2 parts female DNA. I know this because I made this mixture for a validation study. This is actually me and my coworker Robert.
  20. Here’s the second half. Now, these peaks are all of nice, even height, all quite a bit above the interpretation threshold. This is lovely, very easy to interpret. When I get this kind of profile from actual evidence, I calculate a statistic called a likelihood ratio.
  21. I’m not going to go into the math, here, but the likelihood ratio compares two probabilities of the same event under different hypotheses. In forensics, this means I’m weighing the prosecutor’s hypothesis against the defense attorney’s hypothesis. The prosecutor is theorizing that this is a mixture of the victim and the defendant. The defense postulates this is the victim, and some other person selected at random who happens to have a very similar profile to the defendant. The LHR expresses those odds based on how common each allele is in the population.
  22. Where do these frequencies come from? Studies of large population groups. Specifically, I use a database compiled by NIST, the National Institute of Standards and Technology. (That final number is 160 sextillion.) The important point here is that the statistics are very strong when there’s a strong, complete mixture. Unfortunately, I don’t usually get pretty, full mixtures like this.
  23. This is a partial mixture, probably from two people. But there’s a lot of drop out. Take a look at D10 (middle of the blue row). You can see four peaks, but only one is labeled: only one is above the detection threshold for this instrument. The other three “dropped out”. Look right next to it, at D13. All the alleles dropped out. What’s more, though you can kind of see at least two peaks, it’s likely that others dropped out so completely that they’re not registering at all.
  24. This is the other half of that profile. However, this really isn’t even all that bad! …
  25. …this is a mixture of at least five people. [Look in the middle of the green.] This is a seatbelt from a car, from a homicide case I’m working on. Again, look at D13: only one allele. Does that mean it’s five homozygous people? No, just a lot of dropout.
  26. “A reasonable degree of scientific certainty.” This is the phrase that’s always echoing through my mind when evaluating data. A lot of the time, I don’t have complete profiles. Where’s the line between reasonably certain and standing on shaky ground? Especially with new, ultra-sensitive techniques, we can’t use the same old statistical models.
  27. This is that same 1:2 mixture of Robert and myself that I showed you…
  28. This is the same mixture, same two people, but a different ratio, of 1 part Robert and 9 parts me. So his peaks are going to be very short compared to mine. In fact, they might be so short that some could drop out. Specifically look at TPOX.
  29. I know I’m an 8,11 at TPOX, and Robert’s a 9,11. But his 9 allele isn’t present. Well, it kind of is, but not above the threshold, so I can’t say it’s there. Well, that’s just great, when I know this is Robert. But what if it’s an assault case, and the minor component matches the suspect except for one little allele?
  30. Well, I could choose which loci to use for the statistic based on which ones match the suspect. Just leave TPOX out of the calculation. But now I’m interpreting the evidence based on the suspect’s profile. That’s awful. I could call it inconclusive, but again, how do I actually know if anything’s missing? I have to evaluate the evidence independently of the suspect. The least awful solution…
  31. Now, I still have to make one assumption: that this is a mixture of two people. And I can also assume that I’m one of the two people. But I don’t need to make any assumptions about the suspect. I don’t even need the suspect’s profile in order to decide which loci to use for stats. Unfortunately, that’s only 4 loci, out of the 22 I tested. I’m using less than a fifth of the data.
  32. That sucks, because…
  33. The solution: better, more sophisticated statistical models. We call these probabilistic genotype models. These are just emerging, just being adopted by forensic labs. There are two classes…
  34. This is the best of the semi-continuous models currently available. Developed by professors from UC Berkeley, UCLA, and California State University. Lovely, simple UI, and I figured out how to use it in about four hours last Friday, when I suddenly realized I’d better write this talk. The hardest part was crunching my validation data to determine the probability of drop out.
  35. Here’s that 1:9 mixture of Robert and me…
  36. It calculated the LHR at each locus, for each of the three population groups that it has frequency data for; the product across all these loci is the final LHR in bold, at the bottom.
  37. 160 sextillion, 1.4 million, & 730 quintillion. Even better: I did not decide which loci to use. I gave it the full profile and the drop out frequency determined previously, from an earlier validation study, and it went from there. This is so much better and less biased. Also, open source. Anyone can reproduce what I did, including the defense, another DNA expert, you (if I gave you the profile).
  38. There is another class of probabilistic genotyping models: the continuous models. Not going to try and explain this math, because I only vaguely grasp it: it’s a sampling algorithm. Monte Carlo: a method of predicting the probabilities that various events are likely to occur in the future. Markov-chain: the most common way to build the future states from some present state (we know the frequency of alleles, so we can predict genotypes).
  39. This is the best of the continuous models available for forensic DNA labs.
  40. Why do I care so much about bias, impartiality, and transparency? Because in this adversarial system, I’ve seen both sides misstate, overstate, and flat-out lie about the significance of DNA. So I want to present and explain the information as clearly as possible to all the stakeholders. (I don’t know why I’m never picked for jury duty: I don’t believe either side, because they both try to twist my words.)
  41. So, what’s the current state of open source in forensic DNA testing? [just before next slide] And in conclusion…
  42. If people understood statistics, Vegas would be a sleepy spot in the desert.
  43. In my opinion, which is an expert opinion, which means I’m allowed to opine in courtrooms: