Catalog of formulae for forensic genetics ppt
1. COMPREHENSIVE CATALOG OF
STATISTICAL FORMULAE,
ALGORITHMS AND SOFTWARE – A
STEP TOWARDS GOOD STATISTICS
PRACTICE IN FORENSIC GENETICS
Nikita N. Khromov-Borisov,
Andrew G. Smolyanitsky
Forensic Medicine Bureau of Leningrad District
Saint Petersburg, Russia
Nikita.KhromovBorisov@gmail.com
Andrew.Smolyanitsky@yandex.ru
2. Quotations of the day
If your experiment requires statistics,
then you ought to have done a better experiment
Ernest Rutherford
Statistical thinking will one day be as necessary
for efficient citizenship as the ability to read and write
Herbert George Wells
Those who ignore Statistics are condemned to reinvent it
Bradley Efron
If Experimentation is the Queen of the Sciences,
then Statistical Methods must be regarded
as Guardians of the Royal Virtue
Myron Tribus
3. GSP – Good Statistics
Practice is what we need
Obviously, in their turn, statistical methods must be blameless and
perfect
So there is an urgent need for a comprehensive catalog of
carefully checked and approved formulae,
as well as the corresponding algorithms and software.
Unfortunately, some formulae are initially published with errors, which are
then reproduced in subsequent sources.
Example of corrections:
Clayton T. M., Foreman L. A., Carracedo A.
FORENSIC SCIENCE INTERNATIONAL
Vol. 125: No. 2-3 p. 284-284, 2002
Motherless case in paternity testing by Lee et al.
4. Elements of the Statistical Design of
Experiment
Some formulae are rarely used or have been forgotten.
Example: Chakraborty’s formula for the sample size required
to reach the representativeness (reliability, “saturation”) of
the reference population samples. Human Biology 64 (1992) 141-159:
$$N_{\min} = \frac{\ln\left[1 - (1 - \alpha)^{1/r}\right]}{4 \ln(1 - P_{\min})}$$
Nmin - minimum number of independent individuals to be analyzed,
α - probability of error,
r - number of alleles revealed by the system,
Pmin - minimum allele frequency.
5. Minimum sample sizes required
for the reference population data
Minimum allele    No. of alleles,  Error,           Sample size (no. of
frequency, Pmin   r                α                individuals), Nmin
0.01              2 - 25           0.001 - 0.0001   190 - 310
0.005             2 - 25           0.001 - 0.0001   380 - 620
0.001             2 - 25           0.001 - 0.0001   1900 - 3100
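The ranges in this table can be checked against Chakraborty's formula from the previous slide. The following is a minimal Python sketch of that check; the function name `chakraborty_min_sample_size` is an illustrative label, not something taken from the original paper.

```python
import math


def chakraborty_min_sample_size(p_min, r, alpha):
    """Nmin = ln[1 - (1 - alpha)^(1/r)] / (4 ln(1 - Pmin)), as written on the slide."""
    numerator = math.log(1.0 - (1.0 - alpha) ** (1.0 / r))
    denominator = 4.0 * math.log(1.0 - p_min)
    return numerator / denominator


if __name__ == "__main__":
    # Corner cases of each table row: fewest alleles with the larger error,
    # and most alleles with the smaller error.
    for p_min in (0.01, 0.005, 0.001):
        low = chakraborty_min_sample_size(p_min, r=2, alpha=0.001)
        high = chakraborty_min_sample_size(p_min, r=25, alpha=0.0001)
        print(f"Pmin={p_min}: Nmin from {low:.0f} to {high:.0f}")
```

For Pmin = 0.01 the two corner cases evaluate to about 189 and 309, which agrees with the rounded range 190 - 310 given in the table.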
6. Template for paternity testing
Mother  Child  Tested man   PI
JK      JK     JJ, JK       1/(pJ + pK)
JK      JK     JW           0.5/(pJ + pK)
JJ      JJ     JJ           1/pJ
JJ      JJ     JK, JW, JZ   0.5/pJ
JK      JJ     JJ           1/pJ
JK      JJ     JK, JW, JZ   0.5/pJ
KK      JK     JJ           1/pJ
KK      JK     JK, JW, JZ   0.5/pJ
KW      JK     JJ           1/pJ
KW      JK     JK, JW, JZ   0.5/pJ
Obligate paternal alleles are shown in red
Genotyping errors are assumed to be excluded
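The PI entries of this template can be cross-checked from first principles. The sketch below assumes the standard trio definition PI = X/Y, where X is the probability of the child's genotype if the tested man is the father and Y is the same probability for a random man from the population; it also assumes no mutation and no genotyping error. The function name `paternity_index` and the allele frequencies in the example are hypothetical, chosen only for illustration.

```python
def paternity_index(mother, child, man, freq):
    """Single-locus paternity index for a mother-child-tested man trio.

    Genotypes are two-letter strings such as "JK"; freq maps each allele
    to its population frequency (illustrative values, not real data).
    """
    def transmit(genotype):
        # Each of the two alleles is passed on with probability 0.5.
        probs = {}
        for allele in genotype:
            probs[allele] = probs.get(allele, 0.0) + 0.5
        return probs

    target = sorted(child)

    def prob_child(paternal_probs):
        total = 0.0
        for m_allele, p_m in transmit(mother).items():
            for f_allele, p_f in paternal_probs.items():
                if sorted(m_allele + f_allele) == target:
                    total += p_m * p_f
        return total

    x = prob_child(transmit(man))  # tested man is the father
    y = prob_child(freq)           # random man is the father
    return x / y


if __name__ == "__main__":
    freq = {"J": 0.1, "K": 0.2, "W": 0.3, "Z": 0.4}  # hypothetical frequencies
    print(paternity_index("JK", "JK", "JJ", freq))  # compare with 1/(pJ + pK)
    print(paternity_index("JK", "JK", "JW", freq))  # compare with 0.5/(pJ + pK)
    print(paternity_index("KW", "JK", "JZ", freq))  # compare with 0.5/pJ
```

Enumerating the transmitted alleles, rather than hard-coding each formula, makes it easy to compare every row of a template against one common calculation.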
7. C. C. Li and A. Chakravarti
alternatives
Paternity probability based on Nonexclusion
$$W = \frac{P_0}{P_0 + (1 - P_0)(1 - E_1)\cdots(1 - E_t)}$$
Ei – probability of exclusion for i-th test
P0 can be estimated from long-term records
Am. J. Hum. Genet. 43 (1) 197-205 (1988)
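As a worked illustration of the nonexclusion formula, here is a minimal Python sketch. The function name and the input values are hypothetical; only the formula itself comes from the slide.

```python
def nonexclusion_paternity_probability(p0, exclusion_probs):
    """W = P0 / (P0 + (1 - P0)(1 - E1)...(1 - Et))."""
    not_excluded = 1.0
    for e_i in exclusion_probs:
        # Probability that a non-father passes test i without being excluded.
        not_excluded *= (1.0 - e_i)
    return p0 / (p0 + (1.0 - p0) * not_excluded)


if __name__ == "__main__":
    # Hypothetical inputs: prior P0 = 0.5 and two tests, each with E = 0.5.
    print(nonexclusion_paternity_probability(0.5, [0.5, 0.5]))  # 0.8
```

With no tests at all the product is empty and W reduces to the prior P0, which is a convenient sanity check.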
8. Coincidental DNA matches
Match probability as a property of a locus:
$$M_0 = 2\left(\sum_i p_i^2\right)^2 - \sum_i p_i^4$$
First principles: no prior knowledge is required
Li C.C. Hum. Biol. 68 (1996) 167-184
Won the Gabriel W. Lasker Award as the best paper
of the year in Human Biology
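The match probability formula can be verified numerically. The sketch below assumes Hardy-Weinberg genotype proportions and compares M0 with the direct sum of squared genotype frequencies; the function names and the allele frequencies are illustrative only.

```python
from itertools import combinations


def match_probability(allele_freqs):
    """M0 = 2 (sum p_i^2)^2 - sum p_i^4."""
    sum_sq = sum(p ** 2 for p in allele_freqs)
    sum_fourth = sum(p ** 4 for p in allele_freqs)
    return 2.0 * sum_sq ** 2 - sum_fourth


def match_probability_direct(allele_freqs):
    """Sum of squared genotype frequencies under Hardy-Weinberg proportions."""
    homozygotes = sum((p ** 2) ** 2 for p in allele_freqs)
    heterozygotes = sum((2.0 * p * q) ** 2
                        for p, q in combinations(allele_freqs, 2))
    return homozygotes + heterozygotes


if __name__ == "__main__":
    freqs = [0.5, 0.3, 0.2]  # hypothetical allele frequencies
    print(match_probability(freqs), match_probability_direct(freqs))
```

The two functions should agree up to floating-point rounding for any set of allele frequencies, since the closed form is an algebraic rearrangement of the direct sum.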
9. Rare allele frequency
estimation
Commonly, from a reference sample: pi = ni/N
When, however, a stain and a suspect are independent homozygotes AiAi, then
pi = (ni + 4)/(N + 4)
If they are independent heterozygotes AiAj, then
pi = (ni + 2)/(N + 2) and pj = (nj + 2)/(N + 2)
Let the size of a reference sample be N = 1000 and the count of a rare
allele Ai be ni = 1, so pi = ni/N = 0.001 and pii = pi^2 = 0.000001
If a stain and a suspect are homozygotes AiAi, then pi becomes
pi = (1 + 4)/(1000 + 4) ≈ 0.005 and pii ≈ 0.000025
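The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch of the adjustment as stated on the slide; the function name `adjusted_allele_frequency` is a hypothetical label.

```python
def adjusted_allele_frequency(count, sample_size, added):
    """p = (n + added) / (N + added).

    added = 4 when stain and suspect are both homozygous for the allele,
    added = 2 (per allele) when both are heterozygous, as on the slide.
    """
    return (count + added) / (sample_size + added)


if __name__ == "__main__":
    naive = 1 / 1000
    adjusted = adjusted_allele_frequency(1, 1000, added=4)
    print(f"naive:    p = {naive:.6f}, p^2 = {naive ** 2:.8f}")
    print(f"adjusted: p = {adjusted:.6f}, p^2 = {adjusted ** 2:.8f}")
```

In this example the adjustment raises the homozygote frequency estimate by roughly a factor of 25, which shows how strongly the correction matters for rare alleles.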
14. Different tests (even exact) can lead to
different conclusions
Locus: vWA, Russian Caucasians, Hardy-Weinberg equilibrium test
Test                     P-value or CL    Software
χ2                       0.106            ChiHW, GDA, PowerMarker, etc.
Corrected χ2             0.092            GEN
Fisher's probability,    0.026            Arlequin, GDA, GENEPOP,
Guo-Thompson alg.                         HWE, TFPGA, etc.
G2 asympt.               0.163            POPGENE
Fis                      0.141            FSTAT, GENETIX
Fis, 95% cred. lim.      -0.044, -0.015   HWMET
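To see how an asymptotic and an exact test can disagree on the same data, consider the simplest possible setting. The sketch below is not the multi-allelic vWA analysis from the table and does not reproduce any of the listed programs; it assumes a biallelic locus, uses hypothetical genotype counts, and implements an uncorrected chi-square test (1 degree of freedom) alongside an exact test that sums the probabilities of all heterozygote counts no more likely than the observed one.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """Uncorrected chi-square Hardy-Weinberg test, 1 degree of freedom."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact test conditional on the allele counts (biallelic locus)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a

    def log_prob(het):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * math.log(2.0)
                + math.lgamma(n + 1) - math.lgamma(hom_a + 1)
                - math.lgamma(het + 1) - math.lgamma(hom_b + 1)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    observed = log_prob(n_ab)
    # The heterozygote count always has the same parity as n_a.
    candidates = range(n_a % 2, min(n_a, n_b) + 1, 2)
    return sum(math.exp(log_prob(h)) for h in candidates
               if log_prob(h) <= observed + 1e-9)


if __name__ == "__main__":
    counts = (30, 40, 30)  # hypothetical genotype counts AA, AB, BB
    print("chi-square P:", hwe_chi_square_p(*counts))
    print("exact P:     ", hwe_exact_p(*counts))
```

Running both functions on the same counts gives two different P-values, which is the point of the slide: the choice of test is itself a source of variation in the conclusion.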
15. Algorithms
Modern software implements exact nonparametric
approaches and modern Bayesian ideology and
methodology.
Implementing them requires sophisticated
computational algorithms and facilities.
In this respect some new problems arise, e.g.
the problem of convergence for procedures
based on Markov chain Monte Carlo (MCMC)
algorithms.
16. familiarize yourself with the method,
including convergence diagnostics
K. L. Ayres, D. J. Balding
P-values produced by MCMC procedure
depend on the number of
randomization steps:
10^4 steps: P = 0.7815 ± 0.0008
10^5 steps: P = 0.2681 ± 0.0005
10^6 steps: P = 0.373 ± 0.012
10^7 steps: P = 0.424 ± 0.006
10^8 steps: P = 0.460 ± 0.003
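One simple way to make this lack of convergence visible is to compare successive runs against their own reported errors. The sketch below uses the five estimates listed on this slide, reading the step counts as 10^4 to 10^8 (the exponents appear to have been flattened in the text). The z-score threshold of 3 and the function name are assumptions made for illustration, not a diagnostic proposed by the authors.

```python
import math

# (number of steps, P-value estimate, reported standard error) from the slide.
# Step counts are read as powers of ten: an assumption about the lost superscripts.
RUNS = [
    (10 ** 4, 0.7815, 0.0008),
    (10 ** 5, 0.2681, 0.0005),
    (10 ** 6, 0.373, 0.012),
    (10 ** 7, 0.424, 0.006),
    (10 ** 8, 0.460, 0.003),
]


def inconsistent_consecutive_runs(runs, z_threshold=3.0):
    """Flag consecutive runs whose estimates differ by more than
    z_threshold combined standard errors (threshold is an assumption)."""
    flagged = []
    for (steps_a, p_a, se_a), (steps_b, p_b, se_b) in zip(runs, runs[1:]):
        z = abs(p_a - p_b) / math.hypot(se_a, se_b)
        if z > z_threshold:
            flagged.append((steps_a, steps_b, z))
    return flagged


if __name__ == "__main__":
    for steps_a, steps_b, z in inconsistent_consecutive_runs(RUNS):
        print(f"{steps_a} vs {steps_b} steps: z = {z:.1f}")
```

Every consecutive pair is flagged here, so the reported standard errors clearly understate the real uncertainty of the estimates, even at 10^8 steps.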
17. Conclusion
First principle of GSP
It should be good statistics practice
to analyze the data with different
statistical methods and investigate
their consistency.
18. Acknowledgements
We thank Drs. Karen L. Ayres and David J. Balding,
Laura C. Lazzeroni and Kenneth Lange, and John
Brzustowski
for kindly supplying the executables of their
programs (HWMET, GEN and HWE, respectively).
Many thanks to them and Drs. Angel Carracedo,
Laurent Excoffier, Jerome Goudet, Kejun Liu,
Tristan Marshall, Mark P. Miller, Eleanor Morgan,
Michel Raymond, Francois Rousset, Hans-Georg
Scheil, Bruce S. Weir and Dmitri Zaykin,
the authors of other programs and papers used in
this study, for helpful and fruitful discussion.