Catalog of formulae for forensic genetics ppt

COMPREHENSIVE CATALOG OF
STATISTICAL FORMULAE,
ALGORITHMS AND SOFTWARE – A
STEP TOWARDS GOOD STATISTICS
PRACTICE IN FORENSIC GENETICS
Nikita N. Khromov-Borisov,
Andrew G. Smolyanitsky
Forensic Medicine Bureau of Leningrad District
Saint Petersburg, Russia
Nikita.KhromovBorisov@gmail.com
Andrew.Smolyanitsky@yandex.ru

Quotations of the day
If your experiment requires statistics,
then you ought to have done a better experiment
Ernest Rutherford
Statistical thinking will one day be as necessary
for efficient citizenship as the ability to read and write
Herbert George Wells
Those who ignore Statistics are condemned to reinvent it
Bradley Efron
If Experimentation is the Queen of the Sciences,
then Statistical Methods must be regarded
as Guardians of the Royal Virtue
Myron Tribus

GSP – Good Statistics
Practice is what we need
Statistical methods, in their turn, must themselves be free of errors.
So there is an urgent need for a comprehensive catalog of
carefully checked and approved formulae,
together with the corresponding algorithms and software.
Unfortunately, some formulae were first published with errors that are
reproduced in subsequent sources.
Example of corrections:
Clayton T. M., Foreman L. A., Carracedo A.
FORENSIC SCIENCE INTERNATIONAL
Vol. 125: No. 2-3 p. 284-284, 2002
Motherless case in paternity testing by Lee et al.

Elements of the Statistical Design of
Experiment
Some formulae are in rare use or forgotten.
Example: Chakraborty’s formula for the sample size required
to reach the representativeness (reliability, “saturation”) of
the reference population samples. Human Biology 64 (1992)
141-159:
$$N_{min} = \frac{\ln\left[1 - (1 - \alpha)^{1/r}\right]}{4 \ln(1 - P_{min})}$$
N_min - minimum number of independent individuals to be analyzed,
α - probability of error,
r - number of alleles revealed by the system,
P_min - minimum allele frequency.

Minimum sample sizes required
for the reference population data
Minimum allele     No. of       Error             Sample size, No. of
frequency P_min    alleles r    α                 individuals N_min
0.01               2 - 25       0.001 - 0.0001    190 - 310
0.005              2 - 25       0.001 - 0.0001    380 - 620
0.001              2 - 25       0.001 - 0.0001    1900 - 3100
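The ranges in this table can be checked directly against Chakraborty's formula as it is written on the previous slide. The sketch below is only an illustration: the function name `min_sample_size` is made up here, and the pairing of the lower bound with (r = 2, α = 0.001) and the upper bound with (r = 25, α = 0.0001) is an assumption about how the table was built.

```python
import math

def min_sample_size(p_min: float, r: int, alpha: float) -> float:
    """N_min = ln[1 - (1 - alpha)^(1/r)] / (4 ln(1 - p_min)), as given on the slide."""
    # log1p/expm1 keep precision when alpha and p_min are very small.
    one_minus_root = -math.expm1(math.log1p(-alpha) / r)  # 1 - (1 - alpha)^(1/r)
    return math.log(one_minus_root) / (4.0 * math.log1p(-p_min))

for p_min in (0.01, 0.005, 0.001):
    low = min_sample_size(p_min, r=2, alpha=0.001)      # assumed lower corner
    high = min_sample_size(p_min, r=25, alpha=0.0001)   # assumed upper corner
    print(f"P_min = {p_min}: N_min between {low:.0f} and {high:.0f}")
```

Under these assumptions the computed bounds agree with the tabulated ranges after rounding to two significant figures.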

Template for paternity testing
Mother    Child   Tested man    PI
JK        JK      JJ, JK        1/(pJ + pK)
JK        JK      JW            0.5/(pJ + pK)
JJ, JK    JJ      JJ            1/pJ
KK, KW    JK      JJ            1/pJ
JJ, JK    JJ      JK, JW, JZ    0.5/pJ
KK, KW    JK      JK, JW, JZ    0.5/pJ
Obligate paternal alleles were marked in red on the slide.
Genotyping errors are assumed to be excluded.
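The PI expressions in this template can be rederived from first principles as PI = X/Y, where X is the probability of the child's genotype if the tested man is the father and Y is the same probability for a random man. The sketch below is a minimal single-locus illustration that assumes Hardy-Weinberg proportions, no mutation and no genotyping error; the function name and the allele frequencies are hypothetical.

```python
import math
from itertools import product

def paternity_index(mother, child, tested_man, freq):
    """Single-locus PI = X / Y for a mother-child-tested man trio.

    Genotypes are 2-tuples of allele labels; freq maps allele -> frequency.
    Raises ZeroDivisionError if the child is incompatible with the mother.
    """
    child_sorted = tuple(sorted(child))
    # X: each (maternal allele, tested man's allele) pair has probability 1/4.
    x = sum(0.25 for m, f in product(mother, tested_man)
            if tuple(sorted((m, f))) == child_sorted)
    # Y: maternal allele with probability 1/2, paternal allele drawn from the population.
    y = sum(0.5 * p for m in mother for a, p in freq.items()
            if tuple(sorted((m, a))) == child_sorted)
    return x / y

# Hypothetical allele frequencies, for illustration only.
freq = {"J": 0.1, "K": 0.2, "W": 0.3, "Z": 0.4}
pJ, pK = freq["J"], freq["K"]

print(paternity_index(("J", "K"), ("J", "K"), ("J", "J"), freq), 1 / (pJ + pK))
print(paternity_index(("J", "K"), ("J", "K"), ("J", "W"), freq), 0.5 / (pJ + pK))
print(paternity_index(("K", "W"), ("J", "K"), ("J", "Z"), freq), 0.5 / pJ)
```

Each printed pair compares the enumerated value with the closed-form entry of the template.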

C. C. Li and A. Chakravarti
alternatives
Paternity probability based on Nonexclusion
$$W = \frac{P_0}{P_0 + (1 - P_0)(1 - E_1)\cdots(1 - E_t)}$$
E_i – probability of exclusion for the i-th test
P0 can be estimated from long-term records
Am. J. Hum. Genet. 43 (1) 197-205 (1988)
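As a small illustration of the nonexclusion formula, the sketch below evaluates W for a prior P0 and a list of per-test exclusion probabilities. The function name and the numerical inputs are hypothetical and are not taken from the cited paper.

```python
import math

def nonexclusion_paternity_probability(p0, exclusion_probs):
    """W = P0 / (P0 + (1 - P0) * prod(1 - E_i)) for a man not excluded by any test."""
    nonexclusion = math.prod(1.0 - e for e in exclusion_probs)
    return p0 / (p0 + (1.0 - p0) * nonexclusion)

# Hypothetical inputs: prior 0.5 and two tests, each excluding half of non-fathers.
print(nonexclusion_paternity_probability(0.5, [0.5, 0.5]))
```

With no tests at all the product is empty and W reduces to the prior P0, which is a quick sanity check on the implementation.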

Coincidental DNA matches
Match probability as a property of a locus:
$$M_0 = 2\left(\sum_i p_i^2\right)^2 - \sum_i p_i^4$$
First principles: no prior knowledge is required
Li C.C. Hum. Biol. 68 (1996) 167-184
Won the Gabriel W. Lasker Award as the best paper
of the year in Human Biology
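The closed form for M0 can be cross-checked by summing squared genotype frequencies under Hardy-Weinberg proportions, since a coincidental match requires two independent individuals to share the same genotype. The sketch below assumes Hardy-Weinberg proportions; the function names and the allele frequencies are hypothetical.

```python
import math
from itertools import combinations

def locus_match_probability(freqs):
    """M0 = 2 (sum p_i^2)^2 - sum p_i^4."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 2 * s2 ** 2 - s4

def match_probability_by_enumeration(freqs):
    """Sum of squared genotype frequencies under Hardy-Weinberg proportions."""
    homozygotes = sum((p * p) ** 2 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes

freqs = [0.5, 0.3, 0.2]  # hypothetical allele frequencies
print(locus_match_probability(freqs), match_probability_by_enumeration(freqs))
```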

Rare allele frequency
estimation
Commonly, from a reference sample: pi = ni/N
When, however, a stain and suspect are independent homozygotes AiAi, then
pi = (ni + 4)/(N + 4)
If they are independent heterozygotes AiAj, then
pi = (ni + 2)/(N + 2) and pj = (nj + 2)/(N + 2)
Let the size of a reference sample be N = 1000 and the count of a rare
allele Ai be ni = 1, so pi = ni/N = 0.001 and pii = pi^2 = 0.000001
If a stain and suspect are homozygotes AiAi, then pi becomes
pi = (1 + 4)/(1000 + 4) ≈ 0.005 and pii ≈ 0.000025
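The worked example on this slide can be reproduced with a few lines. This is a minimal sketch: the function name `adjusted_frequency` is invented here, and `added` stands for the number of copies of the allele contributed by the stain and the suspect (4 for two homozygotes, 2 for two heterozygotes).

```python
def adjusted_frequency(n_i: int, n_total: int, added: int) -> float:
    """p_i = (n_i + added) / (N + added)."""
    return (n_i + added) / (n_total + added)

n_i, n_total = 1, 1000                                # values from the slide
naive = n_i / n_total                                 # p_i = n_i / N
homozygous = adjusted_frequency(n_i, n_total, 4)      # stain and suspect both AiAi
heterozygous = adjusted_frequency(n_i, n_total, 2)    # stain and suspect both AiAj

print(f"naive p_i = {naive}, p_ii = {naive ** 2:.6f}")
print(f"adjusted p_i = {homozygous:.4f}, p_ii = {homozygous ** 2:.6f}")
print(f"heterozygote case p_i = {heterozygous:.4f}")
```

The adjustment raises the homozygote genotype frequency by a factor of roughly 25 in this example, which is why it matters for rare alleles.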

Forensic genetics software
Allelix http://www.allelix.net/
BDgen dbgen@yahoo.com.ar
DNAdacto, mDNAbase gavriley@krinc.ru
DNAmix, EasyDNA: EasyPA, EasyPAnt, EasyIN, EasyMISS/EasyKIN
http://www.hku.hk/statistics/staff/wingfung/countdown/dnamix.html
DNAmix2
ftp://statgen.ncsu.edu/pub/storey/DNAMIXv2/dos/dnamix2.exe
DNA-view http://dna-view.com/
EasyPat, Patern
http://www.uni-kiel.de/medinfo/mitarbeiter/krawczak/download/index.html
Familias
http://www.math.chalmers.se/~mostad/familias/familias.zip
FCalc bolon@caltech.edu www.its.caltech.edu/~bolon
GRAPE serge@star.net.

Identity, NewPat5 DadShare
http://www.zoo.cam.ac.uk/zoostaff/amos/
PARENTE http://www2.ujf-grenoble.fr/leca/download/PARENTE/PARENTE.zip
PATER2 spena@dcc.ufmg.br
PATRI
http://www.bscb.cornell.edu/Homepages/Rasmus_Nielsen/mdiv/mdiv.exe
PedCheck
http://watson.hgen.pitt.edu/register/docs/pedcheck.html
PowerMarker http://152.14.14.57/
PowerStats http://www.promega.com/geneticidtools/
ProbMax http://www.uoguelph.ca/~rdanzman/software/
Relative hhg2@columbia.edu
SPUR nina.fukshansky@gmx.de
STRLab http://strlab.co.za/

Population genetics software
Arlequin http://anthro/unige.ch/arlequin
CERVUS http://helios.bto.ed.ac.uk/evolgen
Con~Struct andy.overall@ed.ac.uk
FSTAT http://www.unil.ch/izea/softwares/fstat.html
FSTMET, HWMET http://www.reading.ac.uk/~snsbalng/
GDA http://lewis.eeb.uconn.edu/lewishome/software.html
GEN lazzeroni@stanford.edu
GENEPOP ftp://ftp.cefe.cnrs-mop.fr/genepop
GENEPOP on Web
http://wbiomed.curtin.edu.au/genepop/index.html
GENETIX http://www.univ-montp2.fr/~genetix/genetix.htm
GeneKonv http://www.rrz.uni-hamburg.de/OekoGenetik/software.htm
HWE
http://www.biology.ualberta.ca/old_site/jbrzusto/hwenj.html
PopGen32 http://www.ualberta.ca/~fyeh
Population http://www.cnrs-gif.fr/pge/bioinfo/populations/index.php?lang=fr
PowerMarker http://www.powermarker.net
PowerStats http://www.promega.com/geneticidtools/
TFPGA http://bioweb.usu.edu/mpmbio/tfpga.htm

Software online
Allelix http://www.allelix.net/
GENEPOP on Web
http://wbiomed.curtin.edu.au/genepop/index.html
ProfilerPlus Random Match Probability Calculator
http://www.csfs.ca/pplus/profiler.htm

Different tests (even exact) can lead to
different conclusions
Locus: vWA, Russian Caucasians, Hardy-Weinberg equilibrium test
Test                     P-value or CL     Software
χ2                       0.106             ChiHW, GDA, PowerMarker, etc.
Corrected χ2             0.092             GEN
Fisher's probability,    0.026             Arlequin, GDA, GENEPOP,
Guo-Thompson alg.                          HWE, TFPGA, etc.
G2 asympt.               0.163             POPGENE
Fis                      0.141             FSTAT, GENETIX
Fis, 95% cred. lim.      -0.044, -0.015    HWMET
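How an asymptotic and an exact test can disagree on the same data is easy to see on a toy case. The sketch below is not the analysis behind the table: it assumes a biallelic locus with both alleles present (vWA is multiallelic), uses hypothetical genotype counts, and implements the exact test by full enumeration of heterozygote counts rather than by the Guo-Thompson algorithm.

```python
import math

def chi_square_hwe(n_aa, n_ab, n_bb):
    """Asymptotic chi-square test of HWE (1 d.f.) for a biallelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 d.f.

def exact_hwe(n_aa, n_ab, n_bb):
    """Exact HWE test: total probability of heterozygote counts no more likely than observed."""
    n_a, n_b = 2 * n_aa + n_ab, 2 * n_bb + n_ab
    log_w = {}
    for h in range(n_a % 2, min(n_a, n_b) + 1, 2):  # h has the parity of n_a
        a, b = (n_a - h) // 2, (n_b - h) // 2
        # P(h | allele counts) is proportional to 2^h / (a! h! b!)
        log_w[h] = (h * math.log(2) - math.lgamma(a + 1)
                    - math.lgamma(h + 1) - math.lgamma(b + 1))
    top = max(log_w.values())
    weights = {h: math.exp(v - top) for h, v in log_w.items()}
    total = sum(weights.values())
    observed = weights[n_ab]
    return sum(w for w in weights.values() if w <= observed * (1 + 1e-9)) / total

counts = (30, 40, 30)  # hypothetical genotype counts AA, AB, BB
print(f"chi-square P = {chi_square_hwe(*counts):.4f}")
print(f"exact P      = {exact_hwe(*counts):.4f}")
```

Running both functions on the same counts and comparing the two P-values is a small-scale version of the consistency check that the table performs across programs.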

Algorithms
Modern software implements exact nonparametric
approaches and modern Bayesian ideology and
methodology.
Implementing them requires sophisticated
computational algorithms and facilities.
This raises some new problems, e.g.
the problem of convergence for procedures
based on Markov chain Monte Carlo (MCMC)
algorithms.

familiarize yourself with the method,
including convergence diagnostics
K. L. Ayres, D. J. Balding
P-values produced by an MCMC procedure
depend on the number of
randomization steps:
10^4 steps: P = 0.7815 ± 0.0008
10^5 steps: P = 0.2681 ± 0.0005
10^6 steps: P = 0.373 ± 0.012
10^7 steps: P = 0.424 ± 0.006
10^8 steps: P = 0.460 ± 0.003
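One crude diagnostic for results like these is to repeat the run with several random seeds at each step count and compare the spread of the estimates with the reported standard error. The sketch below is a toy under loud assumptions: it uses plain Monte Carlo with independent draws and a made-up true P-value of 0.3, not an MCMC chain. In a real chain successive draws are correlated, so the naive binomial standard error shown here would tend to understate the actual uncertainty.

```python
import math
import random

def mc_p_value(n_steps, true_p=0.3, seed=0):
    """Toy Monte Carlo P-value estimate with its naive binomial standard error."""
    rng = random.Random(seed)
    hits = sum(rng.random() < true_p for _ in range(n_steps))
    p_hat = hits / n_steps
    se = math.sqrt(p_hat * (1 - p_hat) / n_steps)
    return p_hat, se

for n_steps in (10**2, 10**3, 10**4, 10**5):
    runs = [mc_p_value(n_steps, seed=s) for s in range(5)]
    estimates = [p for p, _ in runs]
    spread = max(estimates) - min(estimates)
    print(f"{n_steps:>6} steps: P = {runs[0][0]:.4f} ± {runs[0][1]:.4f}, "
          f"spread over 5 seeds = {spread:.4f}")
```

If the spread across seeds is much larger than the quoted ± value, as it is between the step counts on the slide, the run has probably not converged.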

Conclusion
First principle of GSP
It should be good statistics practice
to analyze the data with different
statistical methods and investigate
their consistency.

Acknowledgements
We thank Drs. Karen L. Ayres and David J. Balding,
Laura C. Lazzeroni and Kenneth Lange, and John
Brzustowski
for kindly supplying the executables of their
programs (HWMET, GEN and HWE, respectively).
Many thanks to them and Drs. Angel Carracedo,
Laurent Excoffier, Jerome Goudet, Kejun Liu,
Tristan Marshall, Mark P. Miller, Eleanor Morgan,
Michel Raymond, Francois Rousset, Hans-Georg
Scheil, Bruce S. Weir and Dmitri Zaykin,
the authors of other programs and papers used in
this study, for helpful and fruitful discussion.

Sincere thanks
Drs.
Carsten Hohoff
Edwin Ehrlich
Kurt Trübner
for the invitation, help and financial
support
