Y DNA Surname Projects - Some Fresh Ideas
1. 11th Annual International Conference on Genetic Genealogy
Houston, 13-15 November, 2015
Surname Projects – Some Fresh Ideas
James M Irvine
Member: GOONS, ISOGG, OFHS, SGS
2. D N A
31 patients Did Not Attend their appointments at this surgery last month.
3. Overview
(1) pre BigY:
- Background
- Penetration
- “Matching”, “Grouping” & “Genetic Families”
- False Positives & False Negatives
- TMRCAs
- “NPEs”
- Geographic origins
- SNPs
(2) BigY & BAM data: use & interpretation
using the Irwin project to illustrate principles & tools
that may be relevant to other surname projects
4. Surname DNA Projects: their context
DNA testing:
• Medical applications
• Paternity testing
• Genetic genealogy
• Criminal investigations
• Archeology ("Ancient DNA")
Genetic genealogy tests: mt-DNA, y-DNA, at-DNA and x-DNA tests.
Uses:
- Deep Ancestry
- Surname projects (closed or open projects; STR tests and SNP tests)
- "Chasing cousins"
y-DNA & surnames only descend through the male line.
5. Surname DNA Projects: Roles of volunteer Administrators
1. Agree & refine terms of reference & goals
- including “closed” or “open”.
2. Maintain genetic & genealogical database.
3. Define & identify genetic families.
4. “Add value” from genealogical data:
- identify cousins & geographic origins.
5. Publicise results.
6. Liaise with individual participants.
7. Recruit new participants.
Always respecting participants’ confidentiality.
6. Irwin Surname project: Background
• Scottish lowlands surname
• strong genealogical traditions, but few “old” pedigrees
• active clan association in America
• the DNA project:
- represents only 0.12% of Irwins etc. in the world today, BUT
- has grown steadily over 10 years
- has 392 y-DNA STR and 19 “BigY” test results
- is about the 50th largest of 8,000 surname projects
- includes the largest genetic family in any surname project
- shows that the surname typifies the Scotch-Irish-American diaspora
- has an associated but separate autosomal DNA project
10. Irwin project: Geographical “penetration”

                             Participants'   All Irwins etc.   Penetration
                             place of        in world today    of project
                             residence       *                 **
Project size / population    392             300,000           0.12%
USA                          77%             61%               0.13%
Canada                       6%              12%               0.05%
Australia, New Zealand       6%              9%                0.07%
England & Wales              5%              10%               0.05%
Scotland                     5%              4%                0.12%
Ireland (NI & Eire)          1%              3%                0.03%
Germany, Netherlands         -               1%                0.00%
Unknown, other               -               -                 -
*: Source: www.worldnames.publicprofiler.org/
**: Definition: www.jogg.info/62/files/Irvine.pdf
12. The Scotch-Irish
The term Scots-Irish, or Ulster Scots, refers to Scots who migrated to Ireland, typically in the 17th century from SW Scotland to Ulster.
• Many Scots took part in the Plantation of Ulster c.1610, either as a landowning Undertaker, or as a tenant. Each Undertaker undertook to keep 40 loyal tenants.
• Other settlers included Border Reivers who had been banished.
• Most Scots-Irish were Presbyterians.
• Very few Scots-Irish have pedigrees back to Scotland (unless their ancestors were Undertakers).
The American term Scotch-Irish refers to descendants of these Ulster settlers who in turn migrated to America, typically in the 18th century to the Appalachian piedmont (PA-GA).
• Few Scotch-Irish have pedigrees back to Ireland.
16. Matching & Grouping: Definitions
Large projects need rigorous definition of terms & procedures to determine:
(1) if two testees are a near match,
(2) how matching testees are grouped, &
(3) how groups should be named.
17. Genetic Distance: Example
Comparison of two 12-marker STR haplotypes

Haplotype (DYS)  393  390  394  391  385a  385b  426  388  439  389-i  392  389-ii
Testee A          13   24   14   11    11    15   12   12   12     13   13      29
Testee B          13   24   15   11    11    15   11   12   10     13   13      29
Difference         0    0    1    0     0     0    1    0    2      0    0       0

matching markers: 9/12
mismatching markers: 3/12
Genetic Distance: 4/12
Genetic Distances are useful for educational & illustrative purposes, BUT:
1. Special rules apply for multi-copy markers:
DYS385, 389, 395, 413, 459, 464, CDY & YCAII.
2. Four different models for calculating GDs:
Stepwise; Infinite alleles; FTDNA hybrid, old & new.
3. GDs take no account of differing average mutation rates for each marker:
e.g. av. rate of CDY is 400 times that of DYS494.
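To make the difference between GD models concrete, here is a minimal Python sketch of the two textbook models named above, applied to the 12-marker comparison on the previous slide. It is an illustration only: it applies none of the special rules for multi-copy markers, treats DYS389-i and DYS389-ii as plain values, does not implement either FTDNA hybrid model, and the function names are hypothetical.

```python
# The 12-marker comparison from the slide 17 example (FTDNA 12-marker panel order).
MARKERS = ["DYS393", "DYS390", "DYS394", "DYS391", "DYS385a", "DYS385b",
           "DYS426", "DYS388", "DYS439", "DYS389-i", "DYS392", "DYS389-ii"]
TESTEE_A = [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29]
TESTEE_B = [13, 24, 15, 11, 11, 15, 11, 12, 10, 13, 13, 29]


def gd_stepwise(a, b):
    """Stepwise model: every repeat of difference counts (12 vs 10 adds 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def gd_infinite_alleles(a, b):
    """Infinite alleles model: a differing marker counts once, however large."""
    return sum(1 for x, y in zip(a, b) if x != y)


n = len(MARKERS)
print(f"stepwise GD: {gd_stepwise(TESTEE_A, TESTEE_B)}/{n}")
print(f"infinite alleles GD: {gd_infinite_alleles(TESTEE_A, TESTEE_B)}/{n}")
```

The stepwise figure is the 4/12 quoted on the slide; the infinite-alleles figure equals the 3/12 mismatching markers, because the two-step difference at DYS439 is counted only once. Neither model weights markers by their mutation rates, which is point 3 above.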
18. TiP (Time Predictor)
TiPs - allow for different average mutation rates for each marker
- are FTDNA’s most sophisticated tool for matching;
BUT - appear complicated and slow;
- derivation is “opaque”, and liable to be updated;
- quoting 2 decimal places (e.g. 96.73%) is misleading;
- limited to FTDNA testees.
19. “TiP Score”
TiP Score: - simple, arbitrary tool for project management;
- 24-generation, no-paper-trail TiP at highest available resolution;
- best available indicator of the probability of two testees sharing a common ancestor within the surname era;
- avoids problems of Genetic Distances & matrices;
- nearest whole % (e.g. 97%) sufficient.
20. Matching
A “near match” is a rule-of-thumb, arbitrarily chosen,
to determine if two participants share a common ancestor
within the surname era, i.e. in the last millennium.
FTDNA lists near matches on its personal yDNA “Matches” pages.
It uses criteria of GD = 1/12, 2/25, 4/37 or 7/67,
sometimes known as the “1, 2, 4, 7 rule”, or the “10% rule”.
Some surname project administrators use other criteria, e.g.
• GD: “1, 2, 4, 6 rule”, or
• GD: “0, 2, 3, 5 rule”
Irwin project:
• TiP Score: “60% rule” (for Irwins);
“95% rule” (for non-Irwins)
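The GD-based rules above can be written as small threshold tables. The sketch below is hypothetical: it assumes that each rule's four numbers apply, like FTDNA's, to 12, 25, 37 and 67 markers respectively, and that a pair is a near match when its GD is at or below the threshold. The TiP Score rules cannot be sketched the same way, because FTDNA's TiP calculation is not published.

```python
# Hypothetical threshold tables: maximum GD still counted as a near match,
# assuming each rule's four numbers apply to 12, 25, 37 and 67 markers.
RULES = {
    "1, 2, 4, 7": {12: 1, 25: 2, 37: 4, 67: 7},   # FTDNA's "10% rule"
    "1, 2, 4, 6": {12: 1, 25: 2, 37: 4, 67: 6},
    "0, 2, 3, 5": {12: 0, 25: 2, 37: 3, 67: 5},
}


def is_near_match(gd: int, markers: int, rule: str = "1, 2, 4, 7") -> bool:
    """True if a Genetic Distance of gd/markers passes the chosen rule."""
    thresholds = RULES[rule]
    if markers not in thresholds:
        raise ValueError(f"rule {rule!r} has no threshold for {markers} markers")
    return gd <= thresholds[markers]


for gd in (4, 5, 6):
    verdicts = {rule: is_near_match(gd, 37, rule) for rule in RULES}
    print(f"{gd}/37:", verdicts)
```

Under every one of these rules, an Irwin at 5/37 or 6/37 from another family member fails to match, which is the kind of case the next slide counts as a “False Negative”.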
21. False Positives & False Negatives
• FTDNA’s “Matches” pages are useful for newbies, but are in fact an arbitrary compromise:
• for comparing similar surnames the “10% rule” is too stringent:
- 7% of Irwins show as “False Negatives” (e.g. 5/37 or 6/37);
- a 60% TiP Score gives better matching.
• for comparing dissimilar surnames the “10% rule” is too lax:
- most “Matches” are “False Positives”, i.e. coincidental;
- a 95% TiP Score gives better screening to identify NPEs, especially when confirmed by a terminal SNP test, e.g. L555.
22. Grouping
Assigning testees to clusters / groups / genetic families:
Subjective choice of project administrator:
• by haplogroup (default used in FTDNA public pages) or SNP
• by genealogical feature
e.g. surname spelling, or place of residence
• by near matches
e.g. GD matrix
GD from mode
TiP Score from modal participant
• other features e.g. rare / idiosyncratic markers,
TMRCAs, cladograms, triangulation
23. Genetic Distance Matrix: Example
Genetic Distance matrix of eight 37-marker STR haplotypes

Participant   A    B    C    D    E    F    G    H
A             -
B             0    -
C             1    4    -
D             0    1    3    -
E            13    9    8   16    -
F             7   11    4    9    1    -
G             3    8   10    8    0    2    -
H             6    2    9    7    6   10    9    -

Interpretation: two genetic families: A, B, C, D and E, F, G; one singleton: H.
Problems:
1-3. Problems inherent in Genetic Distance.
4. Separate matrices are necessary for comparing 12, 25, 37, 67 & 111 markers.
5. Matrices are very cumbersome for large projects.
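Problem 5 is simple arithmetic: a matrix of n participants has n(n-1)/2 distinct pairs. The sketch below uses this project's 392 participants and the five marker resolutions listed in problem 4. The total across five matrices is an upper bound, since not every participant has tested at every resolution.

```python
from math import comb

RESOLUTIONS = (12, 25, 37, 67, 111)  # one matrix per marker resolution (problem 4)


def matrix_cells(participants: int) -> int:
    """Distinct pairwise comparisons in one Genetic Distance matrix."""
    return comb(participants, 2)


for n in (8, 392):
    per_matrix = matrix_cells(n)
    print(f"{n} participants: {per_matrix} pairs per matrix, "
          f"at most {per_matrix * len(RESOLUTIONS)} across "
          f"{len(RESOLUTIONS)} matrices")
```

The eight-person example needs 28 comparisons; the full Irwin project needs 76,636 per matrix, which is why a single score measured from one modal participant is easier to manage.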
24. Irwin project – justification for use of 60% TiP Score
[Chart: frequency of TiP Scores (vertical axis, 0-70) against magnitude of TiP Scores from the project modal haplotype (horizontal axis)]
25. Irwin project: Definitions
• Genetic family: 2 or more participants with TiP Scores > 60% (> 95% for dissimilar surnames).
• Singleton: unassigned Irwin with TiP Score < 60%.
• TiP Score: 24-generation, no-paper-trail TiP, at highest available resolution, from modal participant: probability of sharing common ancestor with modal participant within the surname era, i.e. probability of being member of genetic family.
• Modal participant: participant whose genetic signature is the most typical of the members of a genetic family.
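These definitions reduce to a threshold test once FTDNA has supplied each testee's TiP Score from the modal participant. The minimal sketch below assumes exactly that; the set of "similar" spellings and the function names are hypothetical, since deciding which surnames count as similar is the administrator's judgement. Scores against the Borders modal participant are taken from the slide 36 results table.

```python
from dataclasses import dataclass

# Hypothetical list of spellings treated as "similar" (Irwin-type) surnames.
IRWIN_SPELLINGS = {"Irwin", "Irvine", "Irving", "Irvin", "Erwin", "Ervin", "Urwin"}


@dataclass
class Testee:
    kit: str
    surname: str
    tip_score: int  # TiP Score (%) from the genetic family's modal participant


def joins_family(t: Testee) -> bool:
    """Apply the project thresholds: >60% for Irwins, >95% for other surnames."""
    threshold = 60 if t.surname in IRWIN_SPELLINGS else 95
    return t.tip_score > threshold


examples = [
    Testee("106520", "Irvin", 91),   # member at 12 markers
    Testee("39927", "Elliot", 98),   # dissimilar surname, still above 95%
    Testee("29479", "Irvine", 18),   # belongs to an Orkney family instead
    Testee("84049", "Irwin", 2),     # listed as a singleton
]
for t in examples:
    status = "member" if joins_family(t) else "not a member"
    print(t.kit, t.surname, f"{t.tip_score}%", status)
```

A testee who fails the test for every family in the project would then be a singleton (if an Irwin) or outside the project's families (if not).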
27. TMRCA (Time to Most Recent Common Ancestor)
Popular tables/graphs can predict the number of generations/years back to the common ancestor of two participants.
BUT
• All TMRCAs are probabilities
• TMRCAs based on genetic distance:
- assume some single average mutation rate;
- even the chosen average mutation rate may be incorrect;
- ignore back mutations;
- can be very misleading.
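For illustration, here is the simplest form of the calculation being criticised: a sketch that assumes one average rate for every marker (0.0042 per marker per generation, my reading of the assumption on the next slide) and no back mutations. Mutations accumulate down both lines of descent, so after t generations about 2 × markers × rate × t differences are expected, and the estimate simply inverts that. Function names are hypothetical.

```python
MUTATION_RATE = 0.0042      # assumed average rate per marker per generation
YEARS_PER_GENERATION = 30   # assumed generation length


def naive_tmrca_generations(gd: int, markers: int = 37,
                            rate: float = MUTATION_RATE) -> float:
    """Point estimate t = GD / (2 * markers * rate).

    Ignores back mutations and marker-to-marker differences in rate, and
    gives no margin of error: exactly the weaknesses listed above."""
    return gd / (2 * markers * rate)


for gd in range(6):
    t = naive_tmrca_generations(gd)
    print(f"{gd}/37: {t:.1f} generations = {t * YEARS_PER_GENERATION:.0f} years")
```

The single number it prints hides how wide the plausible range is, which is the point of the next slide.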
28. TMRCAs: typical margins of error when predicted by Genetic Distance

Genetic    Most probable TMRCA           90% of TMRCAs within
Distance
0/37       1 generation = 30 years       0 - 290 years
1/37       3 generations = 90 years      0 - 450 years
2/37       6 generations = 180 years     65 - 580 years
3/37       9 generations = 270 years     110 - 710 years
4/37       12 generations = 360 years    165 - 825 years
5/37       15 generations = 450 years    220 - 930 years
Assumptions: average mutation rate = 0.0042 per generation; 1 generation = 30 years
Source: www.dna-project.clan-donald-usa.org/tmrca.htm
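The calculator cited above does not describe its method here, so the following is only a sketch of one common way to attach a range to a GD-based TMRCA: a flat prior over 0-200 generations and a Poisson likelihood for the observed GD, using the table's rate (read as per marker per generation) and 30-year generations. The grid limits and function names are assumptions. Its figures are of the same order as the table (for 0/37 the 95th percentile comes out at roughly 290 years), but it should not be expected to reproduce every row.

```python
import math

RATE = 0.0042              # assumed average mutation rate per marker per generation
YEARS_PER_GENERATION = 30  # the table's assumption


def posterior(gd, markers=37, rate=RATE, max_gen=200.0, step=0.1):
    """Flat prior over t (generations) and a Poisson likelihood for `gd`
    mutations when 2 * markers * rate * t are expected.
    Returns the grid of t values and the normalised probabilities."""
    per_gen = 2 * markers * rate
    grid = [i * step for i in range(int(max_gen / step) + 1)]
    weights = [(per_gen * t) ** gd * math.exp(-per_gen * t) for t in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarise(gd, lower=0.05, upper=0.95):
    """Most probable t and the central 90% range (5th to 95th percentiles)."""
    grid, probs = posterior(gd)
    mode = grid[max(range(len(probs)), key=probs.__getitem__)]
    cumulative, lo, hi = 0.0, None, None
    for t, p in zip(grid, probs):
        cumulative += p
        if lo is None and cumulative >= lower:
            lo = t
        if cumulative >= upper:
            hi = t
            break
    return mode, lo, hi


for gd in range(6):
    mode, lo, hi = summarise(gd)
    y = YEARS_PER_GENERATION
    print(f"{gd}/37: most probable {mode * y:.0f} years, "
          f"90% within {lo * y:.0f}-{hi * y:.0f} years")
```

Even this simple model shows the slide's point: at 2/37 the most probable figure is under 200 years, yet the 90% range spans several centuries.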
30. NPEs (non-paternity events): possible causes
Narrow definition (used in genetics):
• Surrogacy: not yet likely in context of genealogy
• Illegitimacy outside marriage: boy taking maiden name of mother
• Infidelity within marriage: boy taking surname of mother’s husband
Wider definition (when surname & DNA don’t match) also includes:
• Re-marriage: boy taking surname of step-father
• Adoption, incl. orphan, waif: boy taking surname of guardian
• Formal name-change: man taking maiden name of wife or mother
• Informal name-change, or alias: man taking name of farm, trade or mother
• Anglicisation of gaelic or foreign surname
• Error in genealogy
Similar symptoms, but not an NPE, if the father didn’t use a hereditary surname:
• By-name: man taking name of farm, trade or origin
• Tenant or vassal: man taking surname of landlord or chief
• Apprentice or slave: man taking surname of master
31. Manifestations of NPEs
• Egressions from a genetic family (“e-NPEs”):
same DNA, but different surname
e.g. Irwin DNA, but Elliot surname
(possibly an Elliot step-father)
• Introgressions into a genetic family (“i-NPEs”):
same surname, but different DNA
e.g. Elliot DNA, but Irwin surname
(possibly an Irwin step-father)
“One project’s e-NPE is another project’s i-NPE”.
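The two manifestations depend only on whether a testee shares the project's surname and whether he shares the genetic family's DNA. A minimal sketch follows; the genetic-family labels are hypothetical stand-ins for however a project decides DNA membership (e.g. TiP Score or terminal SNP).

```python
from typing import Optional


def npe_type(testee_surname: str, testee_family: str,
             project_surnames: set[str], project_family: str) -> Optional[str]:
    """Classify a testee relative to one project's genetic family.

    testee_family / project_family are hypothetical labels for the genetic
    family the DNA belongs to."""
    same_surname = testee_surname in project_surnames
    same_dna = testee_family == project_family
    if same_dna and not same_surname:
        return "e-NPE"   # egression: the family's DNA under a different surname
    if same_surname and not same_dna:
        return "i-NPE"   # introgression: the project's surname with different DNA
    return None          # ordinary member, or unrelated to this family


# An Elliot with Irwin DNA, seen from each project's side.
irwin_view = npe_type("Elliot", "Irwin", {"Irwin", "Irvine", "Irving"}, "Irwin")
elliot_view = npe_type("Elliot", "Irwin", {"Elliot"}, "Elliot")
print(irwin_view, elliot_view)
```

The same testee is an e-NPE in the Irwin project and an i-NPE in the Elliot project, as the quotation above says.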
34. Recognising & handling NPEs
e-NPEs:
testee finds near matches with another surname, & asks admin. to join this second surname project.
NB Need stringent matching criteria or evidence of NPE.
i-NPEs:
administrator finds near matches with another surname, & creates a new genetic family within his project.
NB i-NPEs are a sensitive subject which may disappoint testees, even if they accept the ‘event’ was not necessarily an illegitimacy or infidelity.
For all NPEs, if cause & date of the ‘event’ are not known, seek evidence that the two surnames were once neighbours.
36. Irwin project: Results examples (2)
Columns: ID; earliest confirmed paternal ancestor (surname, forename, born, died, residence(s)); haplogroup; no. of markers tested; Genetic Distance from mode at /12, /25, /37, /67 and /111 markers; TiP Score from modal participant; remarks.
SCOTTISH BORDERS ("B")
65875 U Irwin Henry E c1813 Lancaster Co, PA R1b1 67 - - - - - - Modal participant
112094 E Urwin William 1783 1851 Co. Durham R1b1 67 0/ 0/ 0/ 0/ - 100%
194922 U Ervin John 1715 N.Ireland SC R1b1 111 0/ 0/ 0/ 0/ 0/ 100%
102835 U Armstrong 1844 1902 Co.Tyrone OH R1b1 67 0/ 0/ 0/ 0/ - 100%
108028 U Irvine Andrew 1763 1797 Ireland PA R1b1 37 0/ 0/ 0/ - - 100%
85111 U Irwin Samuel 1736 1783 Lancaster Co, PA R1b1 67 0/ 0/ 1/ 1/ - 100% 5th cousin of 72683
72683 U Irwin Samuel 1736 1783 Lancaster Co, PA R1b1 111 0/ 0/ 2/ 2/ 5/ 99% 5th cousin of 85111
54774 U Irving William fl.1484x1506 Bonshaw, Dumfriesshire R1b1 67 0/ 0/ 2/ 3/ - 99%
87191 S Irving Francis c1568 1633 Dumfries, Dumfriesshire R1b1 67 0/ 0/ 1/ 2/ - 99% brother of 19864
19864 S Irving Francis c1568 1633 Dumfries, Dumfriesshire R1b1 67 1/ 2/ 3/ 4/ - 99% brother of 87191
169170 E Irvine John 1662 1732 Eskdale, Dumfriesshire R1b1 37 0/ 1/ 3/ - - 99% Mt. Everest line
84825 U Erwin Matthew c1695 Co.Antrim? NC R1b1 67 1/ 3/ 5/ 5/ 7/ 98% False negative
39927 C Elliot Simon 1897 1955 Co.Fermanagh R1b1 37 1/ 2/ 4/ - - 98% e-NPEs
106520 U Irvin Joe 1744 MD R1b1 12 0/ - - - - 91%
NPE Elliot (1) ("NE1")
161010 U Irwin Hiram 1815 Ireland? IL I1 67 13/ 28/ 39/ 55/ - 0% 100% with Elliots; i-NPEs
72309 U Irwin Andrew 1765 1824 Scotland TN I1 37 13/ 28/ 40/ - - 0% 100% with Elliots; i-NPEs
ORKNEY (1) ("O1")
51216 U Irving Christe fl. 1468 Shapinsay, Orkney Isles NY R1b1 37 2/ 6/ 11/ - - 16% Washington Irving
29479 E Irvine George c1705 1742 Sandwick, Orkney Isles R1b1 37 3/ 6/ 11/ - - 18% author of this paper
IRISH - Munster ("IM")
75606 U O'Ciarmhacain/Irwin Eoin 1785 1845 Limerick, Ireland NJ R1b1 67 2/ 8/ 16/ 19/ - 1% gaelic; catholic
22971 I Irwin William 1840 Limerick, Ireland R1b1 67 2/ 9/ 17/ 20/ - 1%
Singleton
84049 U Irwin William c1770 c1810 Leinster, Roscommon R1b1 37 5/ 9/ 16/ - - 2%
37. Irwin project: Genetic Families
And we thought Irwin was a single-origin surname!

Origin                 Genetic    % of 392        of which
                       families   participants    e-NPEs
Scotland: Borders*     1          67%             17%
i-NPEs                 15         10%             0
Aberdeenshire          1          1%              0
Forfarshire            1          0%              0
Perthshire             1          1%              0
Orkney                 2          2%              ?1%
Shetland               1          1%              0
Unknown                6          3%              ?0-3%
Ireland                4          4%              1%
Germany/Netherlands    1          2%              0
Africa                 1          0%              0
Singletons             -          9%              ?
Total                  34         100%            13-16%
*: With 262 members, this is apparently the largest genetic family in any surname project.
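A quick arithmetic check ties the table's totals to its rows and to the footnote. The rows are transcribed from the table, with the Singletons' "-" entered as 0 families; because each percentage is rounded to a whole number, a column of this kind could in principle sum to 99% or 101%.

```python
# (origin, genetic families, % of 392 participants), transcribed from the table.
ROWS = [
    ("Scotland: Borders", 1, 67), ("i-NPEs", 15, 10), ("Aberdeenshire", 1, 1),
    ("Forfarshire", 1, 0), ("Perthshire", 1, 1), ("Orkney", 2, 2),
    ("Shetland", 1, 1), ("Unknown", 6, 3), ("Ireland", 4, 4),
    ("Germany/Netherlands", 1, 2), ("Africa", 1, 0), ("Singletons", 0, 9),
]
PARTICIPANTS = 392
BORDERS_MEMBERS = 262  # from the footnote

families = sum(f for _, f, _ in ROWS)
percent = sum(p for _, _, p in ROWS)
borders_share = 100 * BORDERS_MEMBERS / PARTICIPANTS

print(f"families: {families}; percentages sum to {percent}%")
print(f"Borders: {BORDERS_MEMBERS}/{PARTICIPANTS} = {borders_share:.1f}%")
```

The rows add up to the stated 34 families and 100%, and 262 of 392 participants rounds to the Borders family's 67%.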
38. Example of triangulation: Irvings of Orkney
[Chart: pedigree of the Irvings of Orkney (first recorded in Orkney in 1369), showing the two lines of descent identified by DNA tests. It runs from Crystie Irwing (fl. 1468, -a1504), first of Sabay, and Magnus (Irving) (fl. 1470) of Clovigarth, through the families of Sabay, Yesnaby, Shapinsay, Clovigarth, Overgarson, Quholm, Lie, Skaebreck and Quoyloo, to tested descendants including the line of Washington Irving (1783-1859) and the author. Tested FTDNA kits, with test sequence: 174038 (4th=), 51216 (2nd), 29479 (1st), 169056 (3rd), 174074 (4th=), 199671 (6th); they form the genetic families "Orkney 1" and "Orkney 2".]
39. Irwin project: Geographic origins

                          Participant's   Residence of earliest   Historic origin
                          place of        confirmed paternal      of genetic
                          residence       ancestor                family
Project size              392             392                     392
USA                       77%             21%                     -
Canada                    6%              1%                      -
Australia, New Zealand    6%              -                       -
England & Wales           5%              3%                      -
Ireland (NI & Eire)       1%              40%                     5%
Scotland                  5%              23%                     84%
Germany, Netherlands      -               1%                      2%
Unknown, other            -               10%                     9%
40. Irwin project: Scottish ancestral lines as shown by DNA tests
[Chart: timeline (1200-1800) of the Scottish ancestral lines identified by DNA tests: Irvine, Ayrshire; Borders, including the Eskdale, Bonshaw and Dumfries & Castle Irvine lines and 11 other lines (sub-groups BE, BB, BD, BA, Bel, Ber, B9, B10, B14, B15, B16, B17, B23, B29); Drum, Aberdeenshire; Orkney1 and Orkney2; Perth; Shetland.]
43. The two types of y-DNA test
• STR tests
- metaphor: "individual leaves on a tree"
- used for: comparing genetic signatures
- sequencing: Sanger; quantification: analogue, expressed as counts of markers
- FTDNA y-tests: 12/25/37/67/111 markers
- use in surname projects: main tool
- secondary data: haplogroup prediction
• SNP ('snip') tests
- metaphor: "branches and twigs"
- used for: building phylogenetic tree
- Sanger sequencing: binary, e.g. L21+ or L21-; FTDNA tests: Single SNP, SNP Pack; use in surname projects: haplogroup confirmation
- Next Generation sequencing: probabilistic, expressed as quality of base pairs; FTDNA test: BigY; use in surname projects: advanced tool, support; secondary data: STR and mt data
44. Irwin project: Phylogenetic tree
[Figure: Clan Irwin phylogenetic tree as at 1 Nov. 2015, showing tested members of Irwin genetic families in green and FTDNA's predictions of Irwin genetic families in red, on a time scale running from the genetic "Adam" (200,000-300,000 years before present) through the Mesolithic and Neolithic (5,300 years before present) to the pre-surname era. Irwin genetic families fall under several haplogroups (including E1, I1, I2, J1, J2 and R1). The largest branch runs M269 > L23 > L51 > L151/P311 (Atlantic Modal Haplotype) > P312 > L21 > DF13 > Z251 > Z16943 > Z16944 > L555, the Borders family, with sub-groups BA, BB, BD, BE, Bel, Ber, B9, B10, B14, B15, B16, B17, B23 and B29. See the Borders Irwin phylogenetic tree for L555 BigY results.]
48. Irwin project: BigY goals
Initial goals
• manage and understand BigY results
• set up cloud account to share project data
Interim goals
• minimise dependence on 3rd-party analysis tools
• focus on our large L555 (“Borders”) genetic family
• facilitate 1 BigY test for each of 10 main sub-groups
• confirm/refine project phylogenetic tree and TMRCAs
Current goals
• facilitate FTDNA offering a low-cost L555 “SNP Pack” test
• use SNP Pack data to refine individual TMRCAs
NB I am giving low priority to “naming” novel variants and having them placed on the phylogenetic trees of FTDNA and ISOGG, at least until a robust understanding of the structure of L555 sub-branches has emerged.
49. Example of limitations of algorithm-based analyses of BigY test results: the private SNPs of FTDNA L555 kit no. 65048
Columns: name | position | bases (reference, variant) | FTDNA vcf | FTDNA csv** | FGC analysis*** | YFull analysis*** | Williamson: incl. in Big Tree?* | "DIY" BAM data: no. of reads | no. of indels | consistency of SNP reads | SNP status
FGC19532 | 8557914 | G A | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100 | yes | 75 | 0 | 100% | Probable
FGC19534 | 16642304 | G C | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100G | yes | 48 | 0 | 100% | Probable
FGC19535 | 16956346 | T G | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100 | yes | 81 | 0 | 100% | Probable
FGC19537 | 18668146 | C A | Pass, 1 variant | Known SNP, High conf. | Private >95% | C98 | yes | 47 | 0 | 98% | Probable
FGC19538 | 18775426 | C T | Pass, 1 variant | Known SNP, High conf. | Private >95% | B100 | yes | 64 | 0 | 100% | Probable
FGC19539 | 19436082 | G A | Pass, 1 variant | Known SNP, High conf. | Private >95% | C96 | yes | 40 | 0 | 98% | Probable
- | 18982587 | G A | - | Novel variant, High conf. | - | - | - | 34 | 0 | 94% | unstable
- | 18982595 | G A | - | Novel variant, High conf. | - | - | - | 32 | 0 | 97% | unstable
- | 13226006 | C A | - | - | Private >40% | - | - | 2 | 0 | 100% | possible
- | 13571571 | C T | - | - | Private >40% | - | - | 2 | 0 | 100% | possible
- | 10064260 | C T | - | - | Private >40% | - | - | 2 | 0 | 100% | possible
- | 16275572 | C A | - | - | - | M100 | - | 2 | 0 | 100% | possible
A608 | 7534406 | G T | - | Known SNP, High conf. | * | - | - | 94 | 55 | 67% | no
- | 16344316 | TC T | Pass, 1 variant | - | -/a | - | - | 73 | 0* | 100% | no
CTS10214 | 19328796 | G T | Rej'd*, 1 variant | - | - | 1 read | - | 1 | 0 | 100% | no
PF3499 | 14624254 | C T | - | - | - | >1 read | - | 29 | 0* | 100% | no
* (vcf): no BED coverage
**: FTDNA lists 73 other high conf. novel variants, of which 13 appear to be private to 65048
***: FGC and YFull's analyses have many more low confidence private markers
* (Big Tree): AW lists 20 other low conf. private markers
* (indels): indel in others' tests
50. Analysis options for BigY test results
FTDNA BAM file
Analysis options, from computerised algorithms ("science") to manual refinement ("art"):
• FTDNA vcf file; FTDNA csv file; FTDNA Matches
• FGC Analysis; YFull Analysis
• Haplogroup projects, e.g. "Big Tree"
• Surname project admins ("DIY")
Detecting & Filtering:
- High level SNPs - Terminal SNPs - Novel SNPs - Unique SNPs
- Old SNPs - Intermediate SNPs - Private SNPs
Quality:
- Regions - SNPs/Indels - No. of reads - Consistency of reads
- Compatibility within sub-clade - Stability across haplogroup
- Phylogenetic trees - TMRCAs
51. Process for “DIY” BigY analysis
1. Create project cloud account; upload VCF, BAM & BAM.BAI files.
2. Identify relevant variants from CSV & Matches data, Walsh & Williamson (& FGC/YFull Analyses, if used).
3. Use the IGV viewer on the BAM files to:
(1) filter relevant variants:
A: pre-L21 (shared by all L555 testees)
B: L21-L555 (shared by all L555 testees)
C: L555 block (shared only by L555 testees)
I: Intermediate (shared by some L555 testees)
Pn: Private (unique to each testee).
(2) determine SNP quality for each variant:
“Probable” if >10 reads AND consistency >85%
“possible” if 2-9 reads OR consistency 70-85%
“No” if 1 read, OR consistency <70%, OR Indel, OR unreliable region.
4. Consider stability of SNP quality vs. that for closely-related BigY testees.
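The quality rules in step 3(2) can be expressed as a small grading function. This is a minimal sketch under stated assumptions: "Indel" is read as a variant whose reference and alternative differ in length; "unreliable region" is read as the unstable region 22216800-22512940 (after T. Krahn) listed in the slide 55 key; and exactly 10 reads, which the slide's bands leave unassigned, is graded "possible". Names are hypothetical. The example calls are the Irvine - BX column of the slide 54 matrix.

```python
from dataclasses import dataclass

# Assumption: the only "unreliable region" is the unstable region from the
# slide 55 key (22216800-22512940, after T. Krahn).
UNRELIABLE_REGIONS = [(22_216_800, 22_512_940)]


@dataclass
class BamCall:
    position: int       # b37 position on the Y chromosome
    ref: str            # reference base(s)
    alt: str            # alternative base(s) seen in the BAM data
    reads: int          # number of reads at this position
    consistency: float  # % of reads showing the alternative


def snp_quality(call: BamCall) -> str:
    """Grade one variant call with the step 3(2) thresholds."""
    is_indel = len(call.ref) != len(call.alt)  # insertion or deletion
    unreliable = any(lo <= call.position <= hi for lo, hi in UNRELIABLE_REGIONS)
    if call.reads <= 1 or call.consistency < 70 or is_indel or unreliable:
        return "No"
    if call.reads > 10 and call.consistency > 85:
        return "Probable"
    # Left over: 2-10 reads, or consistency of 70-85%. Exactly 10 reads is
    # not placed by the slide; grading it "possible" is an assumption.
    return "possible"


# Irvine - BX calls transcribed from the slide 54 matrix (for Z16940 the
# alternative is taken as the base observed for this testee).
examples = {
    "21368012": BamCall(21368012, "G", "A", 85, 100),
    "PF6729": BamCall(10022033, "A", "G", 7, 86),
    "Z16940": BamCall(22470652, "T", "C", 53, 96),
    "Z16949": BamCall(7933047, "T", "TA", 46, 100),
}
for name, call in examples.items():
    print(name, snp_quality(call))
```

Step 4 is deliberately left out: checking a call's stability against closely-related testees needs the whole matrix, not one call at a time.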
53. BAM analysis example, step 2: construct matrix of relevant variants and closely-related BigY testees
Testees (columns): 1 - 22874 Irvine - BX; 2 - 311268 Cunningham - BX; 3 - 65048 Erwin - B10; 4 - 22642 Irving - B17; 5 - 23026 Irvin - B26; 6 - N126337 Irvin - BA. For each testee: alternative, no. of reads, no. of indels, Alt./reads %.
Variants (rows): named variant, position on genome, reference, alternative. Synonyms and positions of named variants (shown in red) are derived from ybrowse (www.ybrowse.isogg.org).
CTS11273 23045843 T A
DF13 2836431 A C
FGC19532 8557914 G A
FGC19534 16642304 G C
FGC19535 16956346 T G
FGC19537 18668146 C A
FGC19538 18775426 C T
FGC4341 8757882 A G
L21 15654428 C G
L555 7647335 G T
PF496 13297909 T G
PF6729 10022033 A G
PR1489 14543997 C C
Z16940 22470652 T T
Z16946 8014468 G A
Z16949 7933047 T TA
Z251 8736334 G A
8531427 C T
13226006 C A
13294119 T T
13801126 A G
15093112 G A
15218377 T A
16561158 A G
16630774 G A
17319595 G A
18982595 G A
21368012 G A (entries so far: A, G, G, A, A; testee 6: A, 32 reads, 0 indels, 94%)
21515424 T A
21782548 T G
21950915 G T
22487613 G T
23898645 T C
24479734 T C
54. BAM analysis example, step 3: enter BAM data, sort & filter
Columns: block; named variant; position on genome; reference; alternative; then, for each testee (1 - 22874 Irvine - BX; 2 - 311268 Cunningham - BX; 3 - 65048 Erwin - B10; 4 - 22642 Irving - B17; 5 - 23026 Irvin - B26; 6 - N126337 Irvin - BA): alternative, no. of reads, no. of indels, Alt./reads %; then SNP category and comments.
Block B L21 15654428 C G G 59 0 100 G 71 0 98 G 60 0 100 G 69 0 96 G 33 0 97 g 18 0 78
L21 to DF13 2836431 A C c 3 0 100 c? 1 0 100 C 11 0 91 c 6 0 100 c 6 0 100 c 2 0 100 Poor qualities -surprising
L555 Z251 8736334 G A a 4 - 100 - a 6 0 100 a 14 0 100 a 2 0 100 ?a 7 0 57 Poor qualities -surprising
Block C L555 7647335 G T T 51 0 100 T 54 0 98 T 76 0 100 T 91 2 100 T 36 0 100 t 9 0 78 Probable
L555 Z16946 8014468 G A A 50 0 94 A 125 0 100 A 49 0 100 A 73 0 100 A 22 0 100 A 25 0 88 Probable
Z16940 22470652 T T C 53 0 96 C 52 0 88 C 72 0 89 C 44 0 89 C 53 0 100 C 59 0 86 No Unreliable region
Z16949 7933047 T TA T 46 39 100 T 76 75 95 T 38 39 100 T 47 47 100 T 54 47 100 T 94 68 100 No Indel
Intermediate FGC34569 21368012 G A A 85 0 100 G 147 0 90 G 82 0 100 A 80 0 99 A 48 0 98 A 32 0 94 Probable
Block PF496 13297909 T G g 71 0 73 t? 21 0 67 T 15 0 100 T 15 0 93 T 21 0 100 g 85 0 65 No conflicts with FGC34569
Private 17319595 G A A 23 0 87 G 24 0 100 G 27 0 100 G 58 0 100 G 24 0 100 G 78 0 100 Probable
block for 21782548 T G G 79 0 100 T 174 0 100 T 93 0 100 T 97 0 100 T 35 0 97 T 27 0 100 Probable
1 -22874 PF6729 10022033 A G g 7 0 86 a? 8 0 85 a 4 0 100 a 11 0 64 ?a 6 0 83 ?a 5 0 60 possible
Private 8531427 C T C 63 0 100 T 47 0 98 C 44 0 100 C 47 0 100 C 69 0 100 C 72 0 100 Probable
block for 16561158 A G A 17 0 100 G 34 0 100 A 23 0 100 A 41 0 100 A 14 0 100 A 16 0 100 Probable
2 -311268 21515424 T A T 45 0 100 A 59 0 98 T 49 0 100 T 77 0 99 T 42 0 100 T 45 0 100 Probable
21950915 G T G 47 0 100 T 63 0 94 G 61 0 100 G 54 0 100 G 29 0 100 G 42 0 100 Probable
13801126 A G c 1748 10 81 G 2281 0 89 c 1144 1 76 c 1658 7 71 ?c 1083 28 57 ?c 1676 53 63 No Indel
Private FGC19532 8557914 G A G 59 0 100 G 99 0 98 A 75 0 100 G 93 0 100 G 31 0 100 G 101 0 100 Probable
block for FGC19534 16642304 G C G 58 0 100 G 77 0 100 C 48 0 100 G 67 0 100 G 45 0 100 G 21 0 100 Probable
3 -65048 FGC19535 16956346 T G T 90 0 100 T 139 0 95 G 81 0 100 T 53 0 100 T 87 0 100 T 102 0 100 Probable
FGC19537 18668146 C A C 29 0 100 C 53 0 100 A 47 0 98 C 64 0 100 C 21 0 100 C 44 0 100 Probable
FGC19538 18775426 C T C 59 0 100 C 128 0 100 T 64 0 100 C 58 0 100 C 48 0 100 C 18 0 100 No appears elsewhere in L21
13226006 C A c 4 0 100 c 4 0 100 a 2 0 100 c 6 0 100 c? 1 0 100 C 31 0 100 possible
Private 16630774 G A G 65 0 100 G 44 0 100 G 42 0 98 A 32 0 100 G 59 0 100 g 6 0 100 Probable
block for 22487613 G T G 119 0 98 G 127 0 93 G 101 0 99 T 67 0 88 G 205 0 99 G 184 0 100 Probable
4 -22642 PR1489 14543997 C C c 4 0 100 - c? 1 0 100 a 2 0 100 c 8 0 100 c 5 0 80 possible
Private 15218377 T A T 22 0 100 T 41 0 100 T 31 0 100 T 51 0 100 A 10 0 100 T 40 0 100 Probable
block for 24479734 T C T 91 0 100 T 143 0 100 T 80 0 100 T 51 0 100 C 58 0 98 T 72 0 100 Probable
5 -23026 FGC4341 8757882 A G A 24 0 100 A 45 0 98 A 35 0 100 A 51 0 100 g 9 0 100 a 4 0 100 possible note marginal no. of counts
Private 23898645 T C t 56 0 84 t 109 0 78 t 71 0 80 t 90 0 71 t 45 0 80 C 27 0 85 Probable
block for 15093112 G A G 98 0 100 G 74 0 99 G 76 0 100 G 34 0 100 G 104 0 100 a 137 0 84 possible note marginal consistency
6 -N126337 13294119 T T C 32 0 100 C 35 0 100 C 25 0 92 c 74 0 62 C 18 0 100 t 10 0 70 possible
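The sorting into blocks in step 3(1) depends only on which testees carry the derived base. The sketch below applies that idea to four rows transcribed from the matrix above. Two assumptions: lower-case (weaker) calls are counted as derived, and with only these six L555 testees, "private" means private among the six, while telling blocks A and B apart from block C would need testees outside L555. Names are hypothetical.

```python
# Testee columns of the slide 54 matrix, in order (name - sub-group).
TESTEES = ["Irvine-BX", "Cunningham-BX", "Erwin-B10",
           "Irving-B17", "Irvin-B26", "Irvin-BA"]

# (label, alternative base, base seen for each testee), transcribed from slide 54.
ROWS = [
    ("L555 7647335", "T", "TTTTTt"),
    ("21368012", "A", "AGGAAA"),
    ("17319595", "A", "AGGGGG"),
    ("FGC19532 8557914", "A", "GGAGGG"),
]


def categorise(alt: str, bases: str) -> tuple[str, list[str]]:
    """Sort a variant into shared / intermediate / private among these testees.

    Lower-case (weaker) calls are counted as derived: an assumption."""
    derived = [name for name, base in zip(TESTEES, bases) if base.upper() == alt]
    if len(derived) == len(TESTEES):
        return "shared by all (block A, B or C)", derived
    if len(derived) == 1:
        return "private", derived
    if not derived:
        return "not derived in these testees", derived
    return "intermediate", derived


for label, alt, bases in ROWS:
    category, who = categorise(alt, bases)
    print(f"{label}: {category} ({', '.join(who)})")
```

The results agree with the block labels on the slide: L555 is shared by all six, 21368012 is intermediate, and 17319595 and FGC19532 fall in the private blocks of testees 1 and 3.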
55. L555 BAM analysis Results
BigY - L555 data as of 21 Oct 2015, by James Irvine, based on initial work by Dennis Wright
Key:
James Irvine:
- Stage/Block: A: Adam - L21, shown at foot of table; B: L21 - L555; C: L555; intermediate: between C and P; P1, P2, P3 ...: private (unique to 1 test). Lower case if <50% are "good" BAM data.
- BAM data: capitals (A) if >85% AND no. of reads >10; lower case (a) if 70-84% OR no. of reads 2-9; a? if no. of reads 1.
- Italics in cols. G & H: shared SNPs which DW ignores (additional to DW).
Dennis Wright: capitals: tested; A: .bam, not seen in vcf, "Good"; a: .bam, not seen in vcf, "Weak"; ?: inconclusive (1 or 2 samples, multiple bases); a/-: no .bam test result; A: private to individual; T: inconclusive SNP.
FTDNA VCF(1): A if quality >500; a if quality <500; g: rejected, "1", qual. >500; -: rejected, "1", qual. <500; ?: inconclusive, "0/1".
FTDNA VCF(2): P: pass; R: rejected; 0/1: 1 & R.
FTDNA CSV: n: novel; k: known; H: high conf.; M: med. conf.; u: unknown conf.
Alex Williamson: y: included as per DW; p: private, not terminal; ?: private, "?".
Mike Walsh (1): 9: tree, official; 8: tree, draft; 7: public, consistent; 6: public, semi-consistent; 4: public, unsure; 3: multi family/surname; 2: single family/surname; 1: single individual; -1: unstable confirmed.
FGC: S: shared, 99, 95%; s: shared, 40%; P: private, 99, 95%; p: private, 40%; *: private, 10%.
YFull: 0: ancestral (2 entries for 1 SNP!); 1: derived; m: >1 read; s: 1 read.
All: -: no entry.
Unstable region: 22216800-22512940 (T Krahn).
Testees: 1 - 22874 Irvine - BX; 2 - 311268 Cunningham - BX; 3 - 65048 Erwin - B10; 4 - 22642 Irving - B17; 5 - 23026 Irvin - B26; 6 - N126337 Irvin - BA; 7 - 54774 Irving - BB; 8 - 364399 Ervin - BE; 9 - 280156 Ervin - B23; 10 - 87191 Irving - BD; 11 - 160045 Irwin - B9; 12 - 280599 Irvin - B14
Columns: SNP (variant/indel); remarks; stage/block; position (b37); reference; alternative. Then, for each testee: alternative, reads, indels and derived/reads (or derived/calls) %, followed by the calls recorded by other sources where available: testees 1 and 3: FTDNA vcf(1), vcf(2) and csv, A. Williamson, M. Walsh stage, FGC and YFull; testee 2: the same without YFull; testee 4: FTDNA vcf(1), vcf(2), csv and csv code (N: novel variant; H: high conf.), and M. Walsh stage; testee 5: FTDNA vcf(1), vcf(2), csv and M. Walsh stage; testees 6-12: M. Walsh stage only. A final column gives the M. Walsh total.
Block B: L21 to L555
L21/S145/M529 B 15654428 C G G 59 0 100 y 9 G 71 0 98 y 9 G 60 0 100 9 G 69 0 96 G 1P G kH 9 G 33 0 97 9 g 18 0 78 9 G 46 0 100 G 60 0 100 G 50 0 100 G 61 0 100 G 26 0 96 G 67 0 94
DF13/S521/CTS241 b 28364318 A C c 3 0 100 c 1P - y 9 c? 1 0 100 c 1R y - c 11 0 91 y 9 c 6 0 100 C kH 9 c 6 0 100 9 c 2 0 100 9 c 7 0 86 c 8 0 100 c 7 0 88 C 18 0 100 c 4 0 100 c 5 0 100
Z251/S470 b 8736334 G A a 4 - 100 a 1R ku y S m - - y ? a 6 0 100 y s m a 14 0 100 - 1R ? k?u a 2 0 100 ?a 7 0 57 a 7 0 100 a 6 0 83 a 8 0 100 a 9 0 100 a 1 0 100 A 14 0 100
Z18600 FGC only, not covered by BigY 25633952 G A
Z16943 B 6351101 T A A 46 0 100 A 1P nH y 7 - - A 62 0 97 A 1P nH y 7 - A 51 0 100 nH y 7 - - A 74 0 100 A nH 7 A 53 0 96 A 1P nH 7 A 71 0 90 7 A 69 0 100 A 66 0 87 A 77 0 100 A 107 0 97 A 75 0 100 A 80 0 100
Z16944 DW had as P1 B 7527372 G A a 37 0 84 - -! y;p? - - - A 24 0 100 A 1P nH y 7 P A 26 0 100 nH y 7 P - A 29 0 100 A 1P A kH 7 A 45 0 100 A 1P nH 7 A 80 0 90 7 A 48 0 100 A 40 0 98 A 66 0 95 A 61 0 98 A 67 0 A 34 0 100
CTS4157/S3741brother of Z16944 (AW); public block? B 15439136 G A G 15 0 100 G 0P kH - - - - G 18 0 100 g 0P kH - - - G 25 0 100 kH y - - - G 38 0 100 G 0P G kH - ?g 4 0 100 g 0P - - - g 6 0 100 G 14 0 100 G 10 0 100 G 10 0 100 g 5 0 100 G 17 0 100
FGC13746 public block withFGC7549? (Donatella) B 9375616 G T T 38 0 97 T 1P nH - 4 - - T 112 0 99 T 1P nH - 4 - T 45 0 100 nH - 4 - - T 64 0 100 T 1P T nH 4 T 38 0 100 T 1P nH 4 T 17 0 82 - T 40 0 98 T 48 0 98 T 36 0 92 T 53 0 100 T 45 0 100 T 59 0 100
FGC8673 public block withFGC7549? (Donatella) B 9852985 A G G 19 0 100 nH y 4 - - g 114 0 75 nH y 4 - G 52 0 100 nH y 4 - - G 38 0 97 G nH 4 G 14 0 100 nH 4 ?g 5 0 40 - g 12 0 83 G 10 0 100 g 7 0 100 G 20 0 100 G 59 0 100 G 10 0 100
-AW found 2015H1 B 22424486 A A A 88 0 86 A 98 0 100 A 61 0 95 A 67 0 97 A 123 0 86 a 92 0 85 a 62 0 84 A 78 0 90 A 88 0 85 A 83 0 85 A 218 0 94 A 61 0 90
Block C: L555
L555/S393 C 7647335 G T T 51 0 100 kH y 7 - m T 54 0 98 T 1P kH y 7 - T 76 0 100 T 1P kH y 7 - m T 91 2 100 T 1P T kH 7 T 36 0 100 T 1P 7 t 9 0 78 - T 35 0 100 T 52 0 100 T 61 0 100 T 43 0 100 T 25 0 94 T 52 0 100
L557/S394 DB omission? C 22513691 C G G 54 0 100 G 1P kH y 7 P m G 106 0 95 G 1P kH y 7 P G 68 0 100 G 1P kH y 7 P m G 80 0 100 G 1P G kH 7 G 41 0 98 G 1P 7 ?c 12 0 58 - G 76 0 99 G 73 0 100 G 75 0 99 G 88 0 100 G 55 0 93 G 61 0 100
Z16945 C 7536923 A G G 29 0 100 G 1P nH y 7 - - G 38 0 84 nH y 7 - G 28 0 100 nH y 7 - - G 34 0 100 G 1P nH 7 G 37 0 97 nH 7 g 10 0 76 - G 26 0 96 G 31 0 97 G 43 0 95 G 39 0 100 G 76 0 99 G 18 0 100
Z16946 C 8014468 G A A 50 0 94 nH y 7 - - A 125 0 100 nH y 7 - A 49 0 100 nH y 7 - - A 73 0 100 A nH 7 A 22 0 100 nH 7 A 25 0 88 7 A 33 0 100 A 54 0 95 A 51 0 96 A 75 0 100 A 62 0 98 A 49 0 100
Z16929 c 13493784 A G G 29 0 97 nH y 7 - - G 69 0 97 nH y 7 - G 35 0 100 nH y 7 - - G 45 0 100 G nH 7 g 4 0 100 - - - G 16 0 94 G 21 0 100 G 23 0 100 G 30 0 100 G 10 0 100 G 27 0 100
Z16930 C 15625978 A G G 51 0 100 G 1P nH y 7 - - G 52 0 92 G 1P nH y 7 - G 45 0 100 nH y 7 - - G 102 0 97 G 1P G nH 7 G 35 0 100 nH 7 g 4 0 100 - G 71 0 100 G 78 0 99 G 101 0 96 G 106 0 100 G 49 0 98 G 80 0 98
Z16931 C 16433477 T C C 52 0 100 nH y 7 - - C 80 0 99 nH y 7 - C 60 0 100 nH y 7 - - C 39 0 97 C nH 7 C 53 0 98 nH 7 C 76 0 86 7 C 39 0 92 C 43 0 91 C 78 0 99 C 89 0 100 C 49 0 98 C 51 0 100
Z16932 C 17236526 C T T 34 0 100 nH y 7 - - T 65 0 100 nH y 7 - T 60 0 95 nH y 7 - - T 39 0 100 T nH 7 T 24 0 100 nH 7 t 25 0 84 - T 32 0 97 T 42 0 98 T 46 0 98 T 50 0 100 T 23 0 94 T 27 0 100
Z16933 C 17438536 G C C 25 0 100 nH y 7 - - C 24 0 100 nH y 7 - C 26 0 100 nH y 7 - - C 26 0 100 C nH 7 C 15 0 100 nH 7 t? 1 0 - C 19 0 100 C 25 0 96 C 25 0 100 C 21 0 100 C 23 0 100 C 30 0 100
Z16934 C 17448751 G C C 16 0 100 nH y 7 - - C 19 0 100 nH y 7 - C 21 0 100 nH y 7 P - C 28 0 100 C nH 7 c 5 0 100 - C 15 0 87 - C 17 0 100 C 22 0 95 C 22 0 100 C 35 0 100 c 2 0 100 C 13 0 100
Z16935 C 17612482 C T T 46 0 95 nH y 7 - - T 145 0 97 nH y 7 - T 91 0 100 nH y 7 P - T 91 0 99 T nH 7 T 44 0 100 nH 7 T 46 0 89 7 T 31 0 97 T 61 0 97 T 77 0 100 T 64 0 98 T 103 0 99 T 60 0 100
S20749 C 18171989 C T T 40 0 95 nH y 7 - - T 30 0 97 nH y 7 - T 36 0 100 nH y 7 - - T 69 0 100 T nH 7 T 41 0 100 nH 7 t 35 0 74 - T 48 0 100 T 57 0 100 T 63 0 98 T 75 0 100 T 28 0 100 T 49 0 96
Z16936 C 19094859 T C C 26 0 100 nH y 7 - - C 61 0 97 nH y 7 - C 57 0 100 nH y 7 - - C 60 0 98 C nH 7 C 19 0 89 nH 7 C 22 0 91 7 C 37 0 97 C 51 0 100 C 35 0 100 C 53 0 92 C 15 0 100 C 38 0 100
Z16937 C 19200522 G T T 71 0 99 nH - 7 - - T 109 0 97 nH - 7 - T 97 0 98 nH y 7 P - T 83 0 100 T nH 7 T 50 0 100 nH 7 t 101 0 85 7 T 64 0 98 T 112 0 100 T 87 0 100 T 96 0 100 T 62 0 89 T 63 0 100
Z16938 C 19548026 G A A 38 0 97 nH - 7 - - A 103 0 97 nH - 7 - A 52 0 100 nH y 7 P - A 77 0 100 A nH 7 A 33 0 100 nH 7 a 50 0 84 7 A 50 0 100 A 71 0 100 A 58 0 95 A 69 0 97 A 36 0 100 A 59 0 98
Z16939 C 21810487 A G G 69 0 99 nH y 7 - - G 84 0 98 nH y 7 - G 75 0 100 nH y 7 - - G 63 0 98 G nH 7 G 70 0 100 nH 7 G 110 0 85 7 G 90 0 90 G 102 0 98 G 107 0 93 G 115 0 99 G 66 0 98 G 84 0 100
Z16942 C 23130578 T A A 50 0 96 nH y 7 - - A 38 0 100 nH y 7 - A 55 0 98 nH y 7 - - A 45 0 100 A nH 7 A 22 0 100 nH 7 a 53 0 75 - A 27 0 100 A 56 0 98 A 58 0 98 A 57 0 95 A 14 0 100 A 52 0 98
Z17660 C 8877028 G C C 13 0 100 nH y 3 p - C 12 0 100 c 1P nH y 3 p c 4 0 100 c 1P -! y? - p - C 16 0 100 c 1P C nH 3 c 6 0 100 - - - ?c 13 0 69 - c 12 0 83 C 14 0 100 c 23 0 83 C 20 0 100 c 7 0 100 c 8 0 100
FGC19531 csv had both Novel & Known!; AW had P3 c 6643803 C T t 8 0 100 - kH - - P - t 8 0 100 kH - - P t 9 0 100 nH Y 2 P - T 16 0 100 t 1P T nH 2 T 15 0 100 nH 2 t 5 0 80 - T 14 0 100 T 13 0 100 T 13 0 100 T 21 0 100 t 5 6 0 T 11 0 100
FGC19536 c 17576040 G C c 7 0 86 - - - c 2 0 100 - - - c 7 0 100 - - C 12 0 100 c 1P C nH 1 c 6 0 100 - c 7 0 57 - C 11 0 100 c 9 0 100 c 9 0 100 c 9 0 100 c 2 0 100 C 12 0 100
Z16940 n 22470652 T T C 53 0 96 C 1P nH y 7 - - C 52 0 88 nH y 7 - C 72 0 89 nH y 7 - - C 44 0 89 C nH 7 C 53 0 100 c 1P nH 7 C 59 0 86 7 C 36 0 97 C 35 0 86 C 39 0 87 C 55 0 93 C 119 0 96 C 26 0 88
Z16941 n 22470900 C G G 44 0 98 nH y 7 - - g 45 0 84 nH y 7 - G 18 0 100 nH y 7 - - G 31 0 97 G nH 7 G 62 0 98 G 1P nH 7 G 61 0 89 7 G 35 0 91 G 49 0 100 G 47 0 94 G 41 0 100 G 96 0 99 G 38 0 100
L561 AW has P3 FGC16164 is 2888667-672n 2888667-70 C C c 6 2 100 - c 2 4 100 - - 0 13 0 - - m c 2 14 100 - c 6 3 100 c 0P C 15 8 100 c 6 18 100 C 11 8 100 C 14 10 100 C 14 10 100 c 9 3 100 c 5 15 100
Z16947 Indel? n 18680368 T TA T 50 0 100 - - ? 3 - T 90 83 96 TA 1P 3 T 49 47 100 - T 85 0 100 - - T 31 0 100 - t 7 0 100 - T 54 0 100 T 84 0 98 T 60 0 100 T 72 0 100 T 31 0 100 T 59 0 100
Z16948 Indel? n 21613125 TA T T 49 0 100 - - ? 3 - - T 90 0 100 T 1P 3 T 90 47 100 - - - T 79 0 100 - - - T 37 0 97 - T 41 0 100 - T 65 0 100 T 79 0 100 T 86 0 100 T 88 0 100 T 39 0 100 T 56 0 100
Z16949 MW: long indel n 7933047 T TAACA T 46 39 100 ta 1P - y 7 - - T 76 75 95 TA 1P - y 7 - T 38 39 100 TA 1P - y 7 - - T 47 47 100 TA 1P - - 7 T 54 47 100 ta 1P - 7 T 94 68 100 7 T 76 68 100 T 113 100 100 T 124 ### 100 T 125 108 100 T 48 45 100 T 94 0 88
MW: short indel n 16344311 TT T T 39 0 100 t 1P - y 3 - - T 110 0 95 T 1P - y 3 - T 10 0 100 t 1P - y - - - T 77 0 100 t 1P - - 3 T 30 1 100 - - - T 23 0 100 - T 34 0 100 T 35 2 100 T 31 0 100 T 39 0 100 T 47 0 100 T 26 0 100
AW has P3 MW: short indel n 16344316 TCT T T 39 0 100 t 1P - y 3 -/a - T 106 0 93 T 1P - y 3 -/a T 73 0 100 t 1P - y;y? - -/a - T 77 0 100 t 1P - - 3 t 5 25 100 - - - t 7 15 100 - t 3 31 100 t 6 30 100 t 5 26 100 T 8 0 29 t 8 39 100 T 5 0 21
?covered by 18680368? n 18680369 A AA A 52 45 100 2 A 89 86 98 - A 48 47 100 - A 86 78 100 2 A 33 29 100 2 a 7 5 100 - A 55 38 100 A 65 53 100 A 64 56 100 A 79 67 100 A 31 28 100 A 61 55 100
Indel; AW had P1 MW: homopolymer n 21613126 AA A A 49 1 100 a 1P - Y 2 - A 10 69 100 - - - a 4 86 100 - - A 79 0 100 - 2 A 37 0 100 2 A 41 0 100 - A 65 0 100 A 79 1 100 A 86 0 100 A 89 1 100 A 39 2 100 A 56 0 100
AW had P2 MW: long indel n 14750280 ACCAGTGT A A 13 0 100 - A 16 0 100 a 1P Y - 2 - a 4 0 100 - - A 10 0 100 - - A 22 0 100 2 a 4 0 100 - A 12 0 100 A 17 0 100 A 15 0 100 A 15 0 100 a 9 0 100 A 13 0 100
FGC16164 Indel; AW had P3 MW: long indel n 2888666 CCTGG C c 8 0 100 - - - I -del c 7 0 100 - - -I -del C 13 0 100 Y 1 I -del C 16 0 100 - - c 9 0 100 - C 23 0 96 1 C 24 0 100 C 20 0 100 C 24 0 100 C 18 0 100 C 12 0 100 C 20 0 100
Indel? MW: homopolymer n 6347814 G GAGAA g? 16 0 63 - - G 115 89 93 GA 0/1R - G 78 75 95 1 - G 117 4 88 - - - - g 9 1 67 - g 2 0 100 - g 5 0 60 g 7 0 100 g 7 1 88 G 13 1 85 g 12 0 75 G 14 0 86
MW: long indel n 13550973 TTAG T T 72 0 100 - T 240 0 99 - T 150 0 100 - T 79 0 99 - T 24 0 100 - T 17 0 100 1 T 23 0 100 T 45 0 100 T 70 0 100 T 57 0 100 T 82 0 100 T 43 0 100
MW: homopolymer n 14101345 CCTT A c 6 0 83 - C 43 0 98 1 C 31 0 100 - C 36 0 97 - c 3 0 100 - c 2 0 100 - c 6 0 100 c 5 0 100 c 4 0 100 c 3 0 100 c 6 0 100 c 7 0 100
AW has P2 MW: homopolymer n 14379561 T TGATA T 21 0 100 - T 34 31 100 tg 1P n - 1 - T 26 0 100 - T 19 0 100 - - T 40 0 100 - t 8 0 100 - T 33 0 94 T 23 0 100 T 27 19 100 T 31 0 100 T 59 0 98 T 27 0 100
MW: homopolymer n 15305844 A AAT A 16 8 100 - A 35 29 89 - a 6 2 100 - A 16 9 100 - a 6 6 100 1 a 2 2 100 - A 13 11 100 A 16 15 100 A 27 19 100 A 28 17 100 A 32 24 100 a 5 5 100
Indel? c 16344315 TTCT T T 39 0 100 - T 106 0 91 - T 71 0 100 - T 77 0 100 - T 30 0 100 1 T 22 0 100 1 T 34 0 100 T 35 2 100 T 31 0 100 T 38 0 100 T 47 0 100 T 26 0 100
MW: homopolymer n 18585796 C CAA C 33 0 100 - C 147 138 100 1 C 78 0 100 - C 64 0 100 - C 11 0 100 - c 2 0 100 - C 38 0 100 C 37 0 89 C 45 0 100 C 50 0 100 C 15 0 100 C 38 0 100
MW: homopolymer n 2746565 AA A A 55 0 100 a 1P - - 2 - A 17 0 100 - a? 1 53 100 - A 67 0 100 - 2 A 25 0 100 - A 31 2 100 - A 32 0 100 A 37 0 100 A 70 0 100 A 49 0 100 A 30 0 100 A 45 0 100
Intermediate SNPs
FCG34569 2,3,8,10 1,4,5,6,7,9,11,12 I 21368012 G A A 85 0 100 A 1P nH Y 2 - G 147 0 90 - - - - G 82 0 100 A 1P - - - - A 80 0 99 A 1P A nH 2 A 48 0 98 A 1P nH 2 A 32 0 94 2 A 51 0 100 G 57 0 100 A 67 0 100 G 92 0 100 A 87 0 99 A 59 0 100
PF506 3,4,5,7,8,9 1,2,10 n 13323493 A C c 24 0 79 c 0/1R U - - m c 5 0 80 c 1R U - - a 4 0 100 a 0R kH - - - a 8 0 100 - 0R ? k?u a 4 0 75 a 0R - ?a 10 0 60 ?a 16 0 56 a 7 0 71 c 40 0 80 ?a 6 0 67 ?c 12 0 50
3,4,5,7,8,9,11,12 csv:P1 1 n 13302072 C T T 42 0 91 t 1P nH - - - t? 33 0 61 - - - - C 13 0 100 - - - 1 - C 21 0 100 - - - C 16 0 100 - - - ?t 36 0 56 - C 20 0 100 c 29 0 72 C 40 0 100 ?c 46 0 57 c 10 0 100 c 17 0 76
PF6812 1,2 3,4,5,10,11,12 n 10013029 T G T 14 0 57 t 0R - - - t 9 56 100 t 0R U - g 7 0 71 g 0/1R kU - - m g 35 0 51 ? k?u G 22 0 77 g 0/1R ?g 58 0 64 ?g 37 0 62 ?t 27 0 56 gt 63 0 51 g 48 0 63 g 7 0 71 g 36 0 69
4,11 csv:P1 1,9 n 13317375 A T T 26 0 92 t 1P H - 1 - t? 33 0 61 t 0/1R - t? 16 0 69 - * a? 26 0 54 - - - a? 4 0 100 - t? 3 0 100 - ?t 15 0 58 ?t 28 0 64 t 18 0 78 ?t 31 0 58 a 2 0 100 ?t 20 0 55
CTS11841 2,6,10,11 8,9,12 n 23311208 C T t? 3 0 67 c 5 0 96 c? 31 0 58 c? 36 0 53 t? 2 0 100 c 2 0 100 ct 4 0 50 t 1 0 - t 1 0 - c 6 0 83 c 1 0 100 t 5 0 80
PF682 1,6,7,9 2,3,4,5,10,11 n 14624294 C T c 6 0 83 - t 6 0 67 - t 2 0 100 - - s t 3 0 100 t P1 T k+m t 6 0 83 c 4 0 75 c 1 70 - - c 6 0 83 t 9 0 89 t 1 0 100 ?ct 2 0 50
PF496 3,4,5,7,9,11 1,6 n! 13297909 T G g 71 0 73 kU - - m t? 21 0 67 kU T 15 0 100 kU - T 15 0 93 ? k?u T 21 0 100 g 85 0 65 T 29 0 97 ?t 52 0 54 T 44 0 91 ?g 72 0 58 T 13 0 100 ?t 48 0 52
? Indel 6 n 13700173 C ? t 68 12 81 - - - - a 1118 34 81 - - - - A 364 9 89 T 1R nM - 1 - - t 127 31 83 T 1R - - T 63 3 88 T 1R - - C 44 8 91 - ?c 18 11 67 t 7 3 86 ?t 18 12 67 c 12 0 75 ?c 30 5 60 ?c 17 3 53
Block P1: Private SNPs for 22874
AW has P1 P1 17319595 G A A 23 0 87 a 1P nH Y 1 - G 24 0 100 - - - - G 27 0 100 - - - - G 58 0 100 - - - G 24 0 100 - - G 78 0 100 - G 43 0 100 G 63 0 100 G 63 0 100 G 109 0 100 G 23 0 100 G 53 0 100
AW has P1 P1 19263733 T A A 39 0 97 A 1P nH Y 1 - C96 t? 60 0 100 - - - - T 39 0 100 - - - - T 59 0 100 - - - T 40 0 100 - - T 28 0 100 - T 66 0 100 T 63 0 100 T 63 0 100 T 90 0 100 T 29 0 100 T 62 0 100
AW has P1 P1 21782548 T G G 79 0 100 G nH Y 1 - C91 T 174 0 100 - - - - T 93 0 100 - - - - T 97 0 100 - - - T 35 0 97 - - T 27 0 100 - T 38 0 100 T 68 0 100 T 60 0 100 T 68 0 100 C 74 0 100 T 61 0 100
PF6729 p1 10022033 A g g 7 0 86 kU - - m a? 8 0 85 kU a 4 0 100 kU - a 11 0 64 0 ? k?u ?a 6 0 83 ?a 5 0 60 A 12 0 100 a 6 0 100 A 10 0 80 ?a 8 0 50 a 8 0 50 a 7 0 86
PF6730 p1 10022039 A g g 7 0 86 kU - - m a? 6 0 67 kU a 4 0 100 kU - a 10 0 60 ? k?u ?a 6 0 83 ?a 5 0 60 A 12 0 100 a 5 0 80 a 9 0 89 ?a 8 0 50 a 8 0 50 a 7 0 86
p1 14769164 T g g 4 0 100 - - - - C100 t 6 0 100 t 3 0 100 - t 5 0 100 - - t? 1 0 - t 5 0 100 t 9 0 100 T 11 0 100 t 8 0 100 - T 11 0 100
CTS6916 AW has P1 p1 17193400 C a a 2 0 100 a - Y 1 - M100 c 3 0 100 (c) 0P - c 2 0 100 - - 0 - - C 15 0 100 - C 15 0 100 - C 59 0 100 C 48 0 100 C 78 0 100 C 88 0 100 c 4 0 100 C 35 0 100
S25968 p1 23900831 T c c 4 0 75 - - - m t 5 0 100 - t? 4 0 75 - t 8 0 89 - t? 1 0 - t 5 0 80 t 8 0 63 t 12 0 80 T 10 0 90 t 7 0 71 t 8 0 75
PF3498 Matches! p1 8094631 G a a 2 0 100 - - - g 3 0 100 - G 40 0 100 G 20 0 100 G 68 0 99 G 63 0 100 g 2 0 100 G 16 0 100
csv implies P11 p1 22257324 G t g 4 0 100 t 5 0 100 t 3 0 100 t 2 0 100 T 14 0 100 t 4 0 100 t 4 0 100 t 6 0 100 t 101 0 100 T 10 0 100 T 11 0 100 t 4 0 100
Block P2: Private SNPs for 311268
AW has P2 P2 8531427 C T C 63 0 100 - - - - T 47 0 98 T 1P nH Y 1 - C 44 0 100 - - - - C 47 0 100 - - - C 69 0 100 - - C 72 0 100 - C 70 0 100 C 65 0 100 C 90 0 100 C 111 0 98 C 70 0 100 C 55 0 100
AW has P2 P2 16561158 A G A 17 0 100 - - - G 34 0 100 G 1P nH Y 1 - A 23 0 100 - - - A 41 0 100 - - - A 14 0 100 - - A 16 0 100 - A 22 0 100 A 34 0 100 A 37 0 100 A 33 0 100 A 13 0 100 A 32 0 100
AW has P2 P2 21515424 T A T 45 0 100 - A 59 0 98 1 T 49 0 100 - T 77 0 99 - T 42 0 100 - T 45 0 100 - T 53 0 100 T 51 0 100 T 54 0 100 T 59 0 100 T 40 0 100 T 74 0 100
AW has P2 P2 21950915 G T G 47 0 100 - - - T 63 0 94 T 1P nH Y 1 - G 61 0 100 - - - G 54 0 100 - - - G 29 0 100 - - G 42 0 100 - G 79 0 100 G 54 0 100 G 54 0 100 G 69 0 100 G 42 0 100 G 54 0 100
DW had above L21 n 13833214 T A A 41 0 85 - - t? 80 0 45 - - - A 15 0 93 - 1 - A 100 0 91 A R1 - - A 49 0 88 - a 63 0 79 - a 37 0 73 a 44 0 84 ?a 46 0 67 A 121 0 92 A 28 0 96 A 74 0 92
n 17729336 C C? c 6 0 100 - a/c 2 0 50 a 0/1P n - 1 - c 6 0 100 - c 3 0 100 - 0P - - c 4 0 100 - C 20 0 95 - C 14 0 100 C 17 0 100 C 31 0 100 C 12 0 100 c 2 0 100 C 12 0 100
CTS12439 n 28587358 T G c 123 0 72 U - - m g? 151 0 57 c 0/1R - c? 100 0 65 kU - - c? 77 0 65 ? k?u ?c 67 0 55 c 43 0 77 C 112 0 100 C 165 0 100 C 157 0 64 ?c 162 0 68 c 102 0 75 c 145 0 74
not on ybrowse n 13801126 A G c 1748 10 81 C 0/1R U - - m G 2281 0 89 (A) 0R U - - c 1144 1 76 - - m c 1658 7 71 C 0/1R ? knu ?c 1083 28 57 ?c 1676 53 63 ?c 853 17 61 ?c 1118 36 63 ?c 1037 32 56 ?c 2406 56 64 ?c 517 19 60 ?c 1554 25 61
5 - 230264- 2264263 - 65048
57. Deriving TMRCAs from BigY tests
TMRCAs derived from SNPs are easy to calculate:
TMRCA in years = no. of SNPs x av. no. of years per SNP
BUT:
• all TMRCAs are probabilities
• TMRCAs from a single test have wide confidence limits;
confidence improved if several TMRCAs can be averaged
• difficulties specific to SNP-based TMRCAs:
- “av. years per SNP” depends on type of NGS test
(FTDNA use “av. 120 years per SNP”);
- no uniformity on what constitutes a relevant SNP, so I use:
TMRCA in years = ∑(probable SNPs + 0.5 possible SNPs)/n x 120, where n = no. of BigY testees
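The slide's averaging formula can be sketched as a short Python function. This is a minimal illustration, not the author's spreadsheet: it assumes each testee's tally is supplied as a hypothetical (probable, possible) pair, and the three-testee example uses made-up counts.

```python
# Sketch of: TMRCA (years) = sum(probable + 0.5 * possible) / n * years_per_snp
# The 120 years-per-SNP default is the FTDNA figure quoted on the slide.

def tmrca_years(counts, years_per_snp=120.0):
    """counts: list of (probable, possible) SNP tallies, one per testee."""
    if not counts:
        raise ValueError("need at least one testee")
    # Each "possible" SNP is weighted as half a "probable" SNP.
    weighted = [probable + 0.5 * possible for probable, possible in counts]
    # Multiply before dividing so simple cases stay exact in floating point.
    return sum(weighted) * years_per_snp / len(weighted)


if __name__ == "__main__":
    # Three hypothetical testees: weighted counts 5, 6 and 3.5 SNPs.
    print(tmrca_years([(4, 2), (6, 0), (3, 1)]))  # 580.0
```

Passing `years_per_snp=118` instead reproduces the YFull rate mentioned in the notes, which is one way to see how sensitive the result is to the rate chosen.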
60. (3): Age of L555 block by other SNP criteria
Segment | No. of SNPs | Duration @120 years per SNP | Age (approx.)
R-L21 > DF13 > L21 starburst (DF21, DF41, DF49, FGC5494, FGC11134, Z251, L1335, S1026, Z1026, ZZ10) > Z16943 > Z16944 | 5 | 600 years | R-L21 c. BC1700
L555 block/bottleneck: L555 + 19 other probable SNPs | 20 | 2400 years |
Pre-surname era / Surname era
Border Irwins starburst | av. 5.5 | 650 years | AD1300

Per-testee SNP counts for the 12 BigY testees (the first eight testees also share 1 further probable SNP, not shown in the Probable SNPs row but included in the criterion rows):
Testee - sub-group | 12-B14 | 1-BX | 6-BA | 5-B29 | 7-BB | 9-B23 | 4-B17 | 11-B9 | 3-B10 | 8-BE | 2-BX | 10-BD
Kit | 280599 | 22874 | N126337 | 23026 | 54774 | 280156 | 226426 | 160045 | 65048 | 364399 | 311268 | 87191
Surname | Irvin | Irvine | Irvin | Irvin | Irving | Ervin | Irving | Irvin | Erwin | Ervin | Cunningham | Irving
Probable SNPs | 10 | +3 | +2 | +4 | +4 | +4 | +2 | +1 | 6 | 5 | 4 | 3
Possible SNPs | - | +7 | +5 | +1 | - | - | +1 | - | +4 | +1 | - | -

Criterion | SNPs per testee (order as above) | Av. SNPs | Years | Age
TMRCA = (∑(probable SNPs + 0.5 possible SNPs)/12) x 120 | 11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4, 3 | 5.5 | 650 | AD1300
SNPs as per Williamson's Big Tree | 11, 5, 4.5, 6, 5, 5, 3, 2, 7, 5, 4, 4 | 5.1 | 615 | AD1335
TMRCA = (∑(probable SNPs)/12) x 120 | 11, 4, 3, 6, 5, 5, 3, 2, 6, 5, 4, 3 | 4.8 | 570 | AD1380
SNPs as per ISOGG Y Tree criteria | 11, 4, 4, 5, 1, 5, 2, 2, 6, 5, 4, 3 | 4.3 | 520 | AD1430
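The four criteria rows on this slide can be cross-checked with a few lines of Python. This sketch takes the per-testee counts as printed on the slide and converts the average to years at 120 years per SNP. The reference year of AD1950 is an assumption. It is not stated on the slide, but it is the reference year that reproduces the slide's AD figures. The slide rounds the first result (655 years) to 650 years.

```python
# Per-testee SNP counts for the 12 BigY testees under four criteria, as printed on the slide.
YEARS_PER_SNP = 120
REFERENCE_YEAR = 1950  # assumption: chosen because it reproduces the slide's AD dates

criteria = {
    "probable + 0.5 possible": [11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4, 3],
    "Williamson's Big Tree":   [11, 5, 4.5, 6, 5, 5, 3, 2, 7, 5, 4, 4],
    "probable only":           [11, 4, 3, 6, 5, 5, 3, 2, 6, 5, 4, 3],
    "ISOGG Y Tree criteria":   [11, 4, 4, 5, 1, 5, 2, 2, 6, 5, 4, 3],
}


def summarise(counts):
    """Return (mean SNPs, TMRCA in years, implied calendar year AD)."""
    total = sum(counts)
    mean = total / len(counts)
    years = total * YEARS_PER_SNP / len(counts)  # multiply first to keep results exact
    return mean, years, REFERENCE_YEAR - years


if __name__ == "__main__":
    for name, counts in criteria.items():
        mean, years, year_ad = summarise(counts)
        print(f"{name:25s} av. {mean:.1f} SNPs  {years:.0f} years  c. AD{year_ad:.0f}")
```

The spread between the first and last rows (about 135 years) comes entirely from the choice of SNP criteria. The testees and the rate are the same in every row.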
61. Criteria for BigY SNPs
Criterion FTDNA FGC Y Full Williamson D.Wright J.Irvine ISOGG
csv Analysis Analysis Big Tree "DIY" Y Tree
Min. no. of reads/calls 10 2 1-2* 10 10 4
Max. no. of reads none none 320?
Min. % consistent reads 99/95/40/10 85 85/70 100/95
Stability within Haplogroup ) "shared excluded no excl. if known
Stability within sub-clade ) SNPs" no important
22216800-22512940 unstable region excluded excluded included
Other "Unreliable" regions included excluded excluded
Indels? included excluded excluded excluded
Homopolymers, recLOHs, excluded N/A
Min. "Quality" (FTDNA) yes 500 N/A N/A
"Confidence" (FTDNA) yes N/A N/A
Max. locations on ISOGG tree N/A 3
Min. Mapping quality average (ISOGG) N/A 10%
Min. extent of base-pairs (ISOGG) N/A 20
Max. segment, repeated alleles (ISOGG) N/A 5 alleles
Av. years per SNP 120 118 - 120 120 -
*: depending on region
NB The criteria listed are as known to me 11 Nov. 2015; all are evolving and subject to change.
Clearly there are both substantive differences and confusion over terminology & definitions.
At least in theory it is clearly inappropriate:
(1) to seek TMRCAs without clear understanding of how relevant "SNP"s are defined, and
(2) to use the same "av. years per SNP" ratio for differing definitions of "SNP".
62. The Irwin Surname tree
[Figure: The Irwin Surname tree, showing the genetic and conventional genealogies of some of the project's 33 genetic families and of the Borders genetic family sub-groups (many details omitted). The genetic genealogy side runs from P311 via P312, L21, Z251, Z16943, Z16944 and L555 (plus 20 other SNPs) to the Borders sub-groups, on a time axis from BC2000 to today. The paper trails side covers the Irvings of Bonshaw, Irvings of Dumfries, Irvines of Eskdale, Irvines of Perthshire, Irvines of Drum, Irvines of Orkney (1) and (2), Irwins of Munster (1) and (2), and the Irving NPEs Bell (1) and Elliot (2). Bold indicates BigY test.]
63. Main findings relevant to Irwin project
• Steady growth over 10 years, now 392 STR test results (94% 37+ markers)
• Most participants reside in USA, & typify the Scotch-Irish-American
diaspora
• 40% claim Irish ancestry, but lack paper trails “across the pond”
• Tradition of single-origin Scottish surname refuted
• > 90% of all participants matched to a genetic family
• 34 genetic families identified, each unrelated to one another in surname era:
- 22 Scottish, 4 native Irish, 1 German, 1 African, 6 unknown (Scots ?)
• 13-26% of participants descend from NPEs
• Border Irwins genetic family is apparently the largest in any surname project:
- all 262 descend from a Dumfriesshire ancestor who flourished in the 14th century
- SNP L555 recognised by ISOGG, still unique to Border Irwins
- tentatively split into 15 sub-groups
- BigY is yielding further insights, but reliable TMRCAs remain elusive
64. Findings relevant to other surname projects
• Small surname projects can learn much from large projects
• Penetration ratios identify geographic bias
• Spelling of surname is often misleading
• FTDNA’s “Matches” pages give False Positives & False Negatives
• TMRCA tables using GDs are misleading
• TiP Scores avoid the many limitations of GDs
• NPEs should be included
• BigY: - a massive step forward
- handling of results is unnecessarily cumbersome
- comprehension of results is difficult & poorly explained
- BAM data essential for analysing SNP quality
- “starburst”/“bottleneck” phenomena need investigating
- need for improved understanding of SNP criteria
- individual TMRCAs unreliable: need SNP Pack back-up
65. Further reading
• www.dnastudy.clanirwin.org
• www.jogg.info/62/files/Irvine.pdf
• https://dl.dropboxusercontent.com/u/14028750/Testing%20and%20Analysing%20Big-Y.pdf
(use of BAM IGV Viewer)
• www.borderreivers.co.uk
• Irving, JB 1907 The Book of the Irvings
• Maxwell-Irving, AMT 1968 The Irvings of Bonshaw
• Mackintosh, D 1999 The Irvines of Drum and their Cadet Lines 1300-1750
• Tough, DLW 1928 The Last Years of a Frontier
• MacDonald Fraser, G 1971 The Steel Bonnets
• Perceval-Maxwell, M 1973 The Scottish Migration to Ulster in the Reign of James I
• Dickson, RJ 1976 Ulster Emigration to Colonial America, 1718-75
• Fischer, DH 1989 Albion’s Seed
• Fitzgerald, P 2008 Migration in Irish History, 1607-2007
66. Acknowledgements
• All our 392 participants;
• The many participants, most preferring anonymity,
who have donated to our General Fund, helped
with our website, and guided & encouraged me;
• Fellow admins. John Cleary, Maurice Gleeson,
Kent Irvin, Peter Irvine, Debbie Kennett, Ralph
Taylor, Dennis Wright;
• Catherine Borges, for ISOGG;
• Bennett Greenspan and his team at FTDNA;
• My patient wife.
Speaker notes
Background: Genealogist for over 50 years.
No knowledge of genetics, but 10 years of experience of administering Irwin DNA project, aka Clan Irwin Surname DNA Study.
Irwin project also known as Irwin Clan Surname DNA Study.
Irwin project is not necessarily typical of Scots clans, but many lessons apply to all surname projects.
at-, mt- and x-DNA also used for Deep Ancestry and “chasing cousins”.
Testing companies are very dependent on “admins” for the customer interface, e.g. manning the FTDNA stand at WDYTYA.
Important to recognise Administrators are volunteers whose interests, skills and time availability are, by definition, not limitless!
This lecture focuses on items 3 and 4.
A personal thought: as a surname project administrator, to date I have found understanding genetics to be less critical than having time, patience, a good support network, and skills in genealogy, data handling & communicating. I have also been lucky to inherit an interesting surname and to have trained as an engineer. However, to understand NGS SNP criteria I will need more knowledge of genetics.
Very lucky this DNA project brings out so many features.
0.1% ratio is typical of many DNA surname projects
“All Scottish Irwins, regardless of spelling, are descended from a common ancestor.”
Solid lines show confirmed paper trails.
436 “joins”, but this includes some results still pending and mt-DNA and at-DNA orders, and excludes non-FTDNA data; the corrected figure to end-October 2015 is 392 y-DNA test results.
Penetration is ratio of participants tested to world population.
Note the heavy US bias in project, but Scotland not under-represented.
Study suggests that penetration of about 0.06% necessary before project gets a fair perspective of diaspora.
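Penetration as defined in these notes is a simple ratio. The minimal Python sketch below also checks it against the 0.06% level suggested above. The surname population in the usage example is a hypothetical placeholder, not a figure from the project.

```python
# Penetration = participants tested / world population bearing the surname.
FAIR_PERSPECTIVE_THRESHOLD = 0.0006  # 0.06%, the level suggested in the notes, as a fraction


def penetration(participants, surname_population):
    if surname_population <= 0:
        raise ValueError("surname population must be positive")
    return participants / surname_population


def has_fair_perspective(participants, surname_population,
                         threshold=FAIR_PERSPECTIVE_THRESHOLD):
    """True if the project has reached the suggested penetration level."""
    return penetration(participants, surname_population) >= threshold


if __name__ == "__main__":
    # Hypothetical: 392 participants out of an assumed 500,000 surname bearers.
    p = penetration(392, 500_000)
    print(f"{p:.4%}", has_fair_perspective(392, 500_000))
```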
Distribution of all project participants who know the county in Britain or Ireland of their earliest confirmed paternal ancestor.
Good correlation with census/Griffiths Valuations.
Placenames in green appear in traditional genealogies.
For background see Reading List slide at end of lecture.
Spelling relevant in Scotland but not elsewhere.
NB All foregoing data is before considering significance of DNA test results!
- 111 marker panel more useful than 67 panel, but expensive
- 12 marker panel can be useful, especially with individual “private” SNP test
- “horses for courses”
Full Excel table of results (470 lines, 180 columns) at www.dnastudy.clanirwin.org.
This slide shows sample of 21 results (of 392), of first 25 markers (of 111), and of
4 (of the 34) genetic families identified by Administrator.
Colour key at bottom denotes “genetic distance” from modal value for each marker.
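The modal value per marker, and each participant's distance from it, can be sketched in Python as follows. The marker values are hypothetical. The distance is a simple step-wise count of repeat differences, which is an assumption: testing companies may score some markers differently.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most frequent repeat count at each marker.

    haplotypes: list of equal-length lists of STR repeat counts, one per participant.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    modal = []
    for values in zip(*haplotypes):  # iterate marker by marker
        # most_common(1) returns the most frequent value; on a tie the value
        # encountered first wins, so ties deserve a manual check.
        modal.append(Counter(values).most_common(1)[0][0])
    return modal


def distance_from_mode(haplotype, modal):
    """Per-marker differences from the mode (basis of a colour key) and their step-wise total."""
    diffs = [h - m for h, m in zip(haplotype, modal)]
    return diffs, sum(abs(d) for d in diffs)


if __name__ == "__main__":
    # Four hypothetical participants, five hypothetical markers each.
    family = [[13, 24, 14, 11, 11], [13, 24, 14, 11, 11],
              [13, 25, 14, 11, 12], [14, 24, 14, 11, 11]]
    mode = modal_haplotype(family)
    for person in family:
        print(person, distance_from_mode(person, mode))
```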
Some participants with only 12 markers can be categorised, some cannot.
Challenge for lecture: How are these genetic families best defined, identified and named.
Matching and grouping cause much confusion, and little reliable guidance available.
Moral: use FTDNA pages to determine GD
Average mutation rates of different markers vary by a factor of c.400.
TiPs didn’t “arrive” until 2005 and by then the trail-blazing admins had developed their own tools and rules. They are still not popular with admins.
TiP Score term conceived by myself and Ralph Taylor. The more I use it the more I realise its potential.
FTDNA’s terminology and “Matches” pages cause much confusion.
Time prevents discussion of latter; they are more useful for cousin chasing than for surname projects; not screened to remove dissimilar surnames.
Most near matches have TiP score > 95%. I used to use a cut-off of 80%, now use 60%, but this is not critical (for similar surnames).
Grouping is biggest challenge for admins.
Much inertia: most admins “set in their ways”.
Fine in theory.
Iterative process.
DNA signature of Modal participant may not be that of common ancestor because (a) small sample size, (b) sample bias (e.g. two branches of the family have procreated at the same rate, but one stayed in UK where DNA sampling is rare, another migrated to USA where DNA sampling common), (c) “Founder effect”, where two branches procreated at different rates, typically one with a relatively lower rate in UK, but another with a higher rate migrated to USA, and (d) “Genetic drift”, the consequence of random mutations irrespective of procreation rates or migration where some lines flourish over time and others dwindle or die out.
Some genetic families have only one participant if he has a very clear origin.
My “Total participants” is a little less than FTDNA’s “Project joins”, as the latter include tests still at the laboratory and mtDNA tests.
Singletons, initially 50%, now steady at just 10%.
Phases:
- establishment & initial growth: difficulty in identifying genetic families;
- recognition of most genetic families;
- “maturity”: few new genetic families being added, although the project continues to grow.
The 0.04% and 0.07% on the right are the project’s penetration levels when the second and third phases began: interesting to compare with other projects.
In theory the TMRCA of a genetic family may be estimated by averaging the TMRCAs of members of the family using Magee’s matrix, but I am unsure how to interpret the mathematical result.
Note: These wide probability ranges do not include further uncertainties attributable to individual marker mutation rates, back mutations and no. of years per generation.
Moral: Don’t use TMRCAs based on genetic distance!
Not all strict synonyms.
The term NPE is borrowed from genetics, where it has a narrow interpretation.
Some genetic genealogists feel this interpretation should be retained, and they and others feel very sensitive about its use in genetic genealogy.
For genetic genealogy I think a wide interpretation is necessary. I would prefer the term “SDEs”, but this novelty is not widely known.
Illegitimacy quite common (today technically 50%!!), but certainly not only cause of NPEs
Historically, adoption and formal name-change were rare
Step father probably most common
These terms conceived by Dr John Plant; they are not widely used, but they need to be.
This is an example of FTDNA’s “Matches” page.
Note this is an Elliot with several Irwin near matches.
Note this is an Irwin with several Elliot (& Fairburn!) near matches.
Touchy subject with many admins and participants. But with understanding, clear explanation and sensitivity I have handled over 50 NPEs without any complaints.
Reminder of challenge.
This is my spreadsheet analysis of the same data as in the previous slide. Many points arise. Note:
- half of these examples claim Irish ancestry;
- range of markers tested, from 12 to 111;
- 2 brothers with GD of 2/25 (anecdote), and one of 5/37, outside FTDNA “Matches” criterion;
- clarity of TiP Scores;
- few pairs of cousins found;
- e-NPEs and i-NPEs;
- ability to name all four genetic families (Munster anecdote).
Most important slide.
30 genetic families now identified – for what was thought to be a single source surname!
The Borders genetic family dominant, with 262 members; probably now the largest such cluster in any surname DNA project.
Most e-NPEs and all i-NPEs have or used to have other Borders surnames, implying these “events” probably occurred after Irwin settlement in Borders (1300s?) but before migrations to Ulster (1600s).
Only 5% of Irwin project STR tests via General Fund, but these provide several of the critical genealogies from which their geographical origins can be identified.
Example of use of triangulation. Note the sequence in which the tests were taken.
Note most participants reside in the New World, many can trace ancestry back to Ireland, but correlation of DNA and available genealogical evidence shows most have Scottish origins.
Most apparently migrated from the Borders to Ulster in the 17th century, and from Ulster to America in the 18th century.
Question arises: is project US biased?
Project has cast a completely new light on traditional understanding of this Scottish surname.
“X” indicates where traditional tree was wrong.
Discoveries had to be handled sensitively.
Pretty, but not convinced! Did help to identify sub-groups within Borders genetic family
The modal sub-group BA (12 members match 67/67, 30 match 37/37) is probably an example of convergence, with regression towards the mode.
Recent breakthroughs in “Next Generation Sequence” SNP tests (e.g. FGC Elite, Chromo2, Big Y) are very powerful, but expensive and difficult to analyse.
Deep Ancestry speculates on the geographical distribution of these SNPs.
L555 recognized by ISOGG mid 2012; still private to Irwin Borders genetic family.
NGS tests necessary to bring tree into surname era.
BigY is FTDNA’s Next Generation Sequencing (NGS) test.
BAM data is the raw test results, typically 30Gb, i.e. too much to send by e-mail unless compressed.
L21 members are very lucky to have Mike Walsh and Alex Williamson (www.ytree.net). This is an online, free-access phylogenetic tree of c.1800 P312/L21 NGS test results that have been copied to Williamson. He lists Private SNPs separately.
This example shows the 12 L555 testees: the 5th largest such surname group in Williamson’s tree.
BigTree data (only), as of 9 Nov. 2015, processed for Irwin project. I disagree with some minor details.
Shows L555 still unique to Irwins.
Note how “flat” this sub-clade is compared with, for example, the extensive bifurcation shown in the sub-clades of the phylogenetic trees of Maurice Gleeson.
Decision to minimise dependence on 3rd parties was prompted by Williamson’s threat to discontinue his Bigtree. This threat has now lapsed, but my resultant ability to read BAM data has improved my understanding of NGS data and enhanced reliability of TMRCA estimates, as well as avoiding dependence on FGC or FullY analyses.
All but the last priority achieved in 2 years; L555 Pack test will be FTDNA’s first surname SNP Pack test.
This example for kit 65048 may not be typical, and may be out of date, but the extent of the red and pink cells illustrates the principle that no computerised BigY analysis is necessarily as comprehensive as might be expected. I have even “found” probable SNPs listed in FTDNA’s Matches that were not in the relevant csv file, and “discovered”, by chance, probable SNPs that were not listed by FTDNA, Walsh or Williamson.
This is a flow diagram illustrating my appreciation of the various tools available to analyse BigY test results, and some of the parameters used in these analyses.
Thanks to Dennis Wright for pointing me in this direction. His webpage at https://dl.dropboxusercontent.com/u/14028750/Testing%20and%20Analysing%20Big-Y.pdf explains how to load and use the BAM IGV Viewer.
Step 1 is the most difficult!
Step 2 is tedious.
Step 3 is easy and most illuminating. Steps 3(1) and 3(2) are iterative. See following slides. L555 is described by some as our project’s “Terminal” SNP for our Borders sub-group.
Step 4 is most important. As the number of available genetically closely-related BigY test results increases, so does the likelihood of quality ratings that are incompatible. Judgement is thus called for, as no computer program could resolve these occasional conflicts (any more than a computer could describe an oil-painting).
Once set up, surprisingly easy to use.
This slide shows the 12 L555 results for variant 21368012-G-A on one screen!
Sources: FTDNA CSV Novel Variants and Known SNPs; FTDNA Matches; FGC/YFull Analyses; haplogroup web sites, e.g. Mike Walsh, Alex Williamson
NB 1. Capital A, C, G or T indicate “probable”, lower case a, c, g, t indicate “possible”.
2. Black boxes identify probable Intermediate and Private SNP blocks.
3. When identifying probable Intermediate and Private SNPs, compatibility of “possible” quality derived from a single BigY may need subjective revision.
4. Such revision cannot be undertaken by a computer program.
5. The more comparable BigY test results the better the insight into Intermediate and Private SNPs.
This example shows page 2 of pages 1-3 of my manual analysis of BAM data for the 12 BigY Border Irwin tests to date.
Raw BAM data is shown in red print (Read counts of SNPs & of Indels, % consistency of SNP Reads).
Alternate variants in capitals if Read count > 10 AND Read consistency > 85%.
This top page shows pre-L555 and L555 variants: boxed data is probable, unboxed data is possible. Note the Alternate variants for each base pair are the same for all Testees.
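The capitals rule above can be sketched as a small classifier over the red-print triplets (SNP read count, indel read count, % consistency). This is a minimal sketch, not the author's workbook. The `Call` type and function names are hypothetical, and the thresholds are parameters so that other criteria can be tried. The strict `>` boundary follows the wording of the note, but a few 10-read calls in the sheet are capitalised, so the boundary may be inclusive in practice. The subjective revision described in the notes is deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass
class Call:
    base: str          # variant base observed at the position
    reads: int         # read count supporting the SNP call
    indel_reads: int   # reads showing an indel at the position
    consistency: int   # % of SNP reads that agree


def parse_triplet(text):
    """Parse a sheet entry such as 'A 88 0 86' into a Call."""
    base, reads, indels, pct = text.split()
    return Call(base, int(reads), int(indels), int(pct))


def label(call, min_reads=10, min_consistency=85):
    """Capitals for 'probable', lower case for 'possible' (read count > 10 AND consistency > 85%)."""
    probable = call.reads > min_reads and call.consistency > min_consistency
    return call.base.upper() if probable else call.base.lower()


if __name__ == "__main__":
    for entry in ["T 54 0 98", "t 9 0 78", "g 12 0 83", "G 10 0 100"]:
        print(entry, "->", label(parse_triplet(entry)))
```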
FGC and YFull contributions shown in bright green.
This analysis differs slightly from that of Alex Williamson.
Worryingly, neither his version nor the above correlate with the STR data of these 12 BigY project members.
The more private SNPs, the older the bifurcation. Note the BA testee from our modal sub-group is apparently not the oldest – example of “founder effect”?
Average mutation rates (“years per SNP”) are derived from radiocarbon dating/ancient DNA/genealogies: YFull use 118 years per SNP (see Adamov D et al ‘Defining a New Rate Constant for Y-Chromosome SNPs based on Full Sequencing Data’ in Russian Journal of Genetic Genealogy 2015 7/1 p76, ex http://dna.cfsna.net/HAP/index.html).
Dennis Wright and FTDNA use 120 years per SNP.
For FGC’s NGS tests over a larger sample of the genome, a smaller “years per SNP” ratio is applicable.
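Why a test that covers more of the genome needs a smaller ratio can be shown with a short sketch. It assumes a constant per-base SNP rate (an assumption, not a claim from the notes). Under that assumption, years per SNP is inversely proportional to the number of reliably callable base pairs. The base-pair figures in the example are hypothetical placeholders, not the actual coverage of BigY or FGC tests.

```python
# Assuming a constant per-base SNP rate mu:
#   years_per_snp = 1 / (mu * callable_bp)
# so years_per_snp_B = years_per_snp_A * bp_A / bp_B.

def rescale_years_per_snp(years_per_snp_a, callable_bp_a, callable_bp_b):
    if callable_bp_a <= 0 or callable_bp_b <= 0:
        raise ValueError("callable base-pair counts must be positive")
    return years_per_snp_a * callable_bp_a / callable_bp_b


def implied_rate_per_bp_per_year(years_per_snp, callable_bp):
    """Per-base, per-year SNP rate implied by a years-per-SNP figure."""
    return 1.0 / (years_per_snp * callable_bp)


if __name__ == "__main__":
    # Hypothetical: 120 years per SNP over 10 Mbp, re-expressed for a 15 Mbp test.
    print(rescale_years_per_snp(120, 10_000_000, 15_000_000))  # 80.0
```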
TMRCAs based on av. mutation rate of 120 years per SNP.
Mean of AD1200 for L555 block seems credible.
Starburst/bottleneck/starburst phenomena – striking, no obvious explanation
Some individual TMRCAs seem credible, e.g. B9.
But others clearly not, e.g. B10: need for L555 SNP Pack to avoid reliance on single tests
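A single testee's SNP count is a small count, so its TMRCA has wide limits. The sketch below assumes that SNPs accumulate as a Poisson process at a constant rate, which is an assumption not stated in the notes. It also ignores uncertainty in the 120-year rate itself. It computes the exact (Garwood) 95% interval for the expected SNP count by bisection on the Poisson CDF, and the function names are hypothetical. For 8 SNPs (the count listed for B10), it gives roughly 415 to 1,890 years around the 960-year point estimate. Non-integer counts such as 5.5 would need rounding first.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def _solve_decreasing(f, target, lo, hi, iterations=200):
    """Bisection for f(x) == target where f decreases on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def snp_count_interval(k, confidence=0.95):
    """Exact Poisson interval for the expected SNP count, given k observed SNPs."""
    alpha = 1 - confidence
    search_hi = k + 10 * math.sqrt(k + 1) + 10
    upper = _solve_decreasing(lambda lam: poisson_cdf(k, lam), alpha / 2, 0.0, search_hi)
    if k == 0:
        lower = 0.0
    else:
        lower = _solve_decreasing(lambda lam: poisson_cdf(k - 1, lam),
                                  1 - alpha / 2, 0.0, search_hi)
    return lower, upper


def tmrca_interval_years(k, years_per_snp=120, confidence=0.95):
    """(point estimate, lower, upper) in years."""
    lower, upper = snp_count_interval(k, confidence)
    return k * years_per_snp, lower * years_per_snp, upper * years_per_snp


if __name__ == "__main__":
    point, low, high = tmrca_interval_years(8)
    print(f"{point} years (95% interval {low:.0f} to {high:.0f} years)")
```

Averaging several closely related testees narrows the interval, which is the statistical case for an SNP Pack rather than reliance on single tests.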
A difference of 1½ SNPs and 170 years seems a lot, and our genealogical evidence suggests that the ISOGG criteria for defining SNPs (as of 3 Nov. 2015) are too restrictive.
I have included my “DIY” criteria above simply to put them in context, not to suggest they have more merit than the other criteria.
Blanks indicate I haven’t got the relevant evidence.
Format courtesy of Maurice Gleeson.
We are making considerable progress at bridging the gap between paper trails and DNA test data.
The bad news is that the Borders, Drum, Orkney and Perthshire Irvine/gs are apparently unrelated to each other through male line
The good news is that :
- so many American Irwins can now be positively identified as descendants of the Border Irvings;
- surname is a plural origin name – not surprising, but upsets traditionalists;
further developments and revelations likely.
With 262 members (or 202 even if NPEs and results with <37 markers are excluded), our Border Irwin genetic family is apparently the largest such cluster in all of the 8,000+ surname projects.
And its 12 BigY test results are the 4th largest surname cluster in Alex Williamson’s Big Tree. These two features make it an excellent case study for statistical analyses by other project admins.
Most of this would not have been possible without FTDNA’s vision, stoicism and patience.