11th Annual International Conference on Genetic Genealogy
Houston, 13-15 November, 2015
Surname Projects –
Some Fresh Ideas
James M Irvine
Member: GOONS, ISOGG, OFHS, SGS
D N A
31 patients Did Not Attend their appointments at this surgery last month.
Overview
(1) pre BigY:
- Background
- Penetration
- “Matching”, “Grouping” & “Genetic Families”
- False Positives & False Negatives
- TMRCAs
- “NPEs”
- Geographic origins
- SNPs
(2) BigY & BAM data: use & interpretation
using the Irwin project to illustrate principles & tools
that may be relevant to other surname projects
Surname DNA Projects: their
context
4
DNA testing
Medical Paternity Genetic Criminal Archeology
applications testing genealogy investigations ("Ancient DNA")
mt-DNA y-DNA at-DNA x-DNA
tests tests tests tests
Deep Surname "chasing
Ancestry projects cousins"
- Closed projects - STR tests
- Open projects - SNP tests
y-DNA & surnames only descend through the male line
Surname DNA Projects:
Roles of volunteer
Administrators
1. Agree & refine terms of reference & goals
- including “closed” or “open”.
2. Maintain genetic & genealogical database.
3. Define & identify genetic families.
4. “Add value” from genealogical data:
- identify cousins & geographic origins.
5. Publicise results.
6. Liaise with individual participants.
7. Recruit new participants.
Always respecting participants’ confidentiality.
Irwin Surname project:
Background
• Scottish lowlands surname
• strong genealogical traditions, but few “old” pedigrees
• active clan association in America
• the DNA project:
- only represents 0.12% of Irwins etc. in world today, BUT
- has grown steadily over 10 years
- has 392 y-DNA STR and 19 “BigY” test results
- is about the 50th largest of 8,000 surname projects
- includes largest genetic family in any surname project
- shows surname typifies Scotch-Irish-America diaspora
- has associated but separate Autosomal DNA project
The traditional genealogy of the Irwins
7
8
Irvine, Ayrshire
Irwin project:
1200 Eskdale, Dumfriesshire traditionally a single-origin
Scottish surname
1300 Bonshaw, Dumfriesshire
Drum, Aberdeenshire
1400 Orkney
1500
1600 Dumfries Castle Irvine Perth Shetland
Co.Fermanagh
1700
1800
Irwin project: growth
9
Irwin project:
Geographical “penetration”
10
Place of residence          Participants   All Irwins etc.      Penetration
                                           in world today *     of project **
Project size / Population   392            300,000              0.12%
USA                         77%            61%                  0.13%
Canada                      6%             12%                  0.05%
Australia, New Zealand      6%             9%                   0.07%
England & Wales             5%             10%                  0.05%
Scotland                    5%             4%                   0.12%
Ireland (NI & Eire)         1%             3%                   0.03%
Germany, Netherlands        -              1%                   0.00%
Unknown, other              -              -                    -
*: Source: www.worldnames.publicprofiler.org/
**: Definition: www.jogg.info/62/files/Irvine.pdf
Irwin project: Origins
in UK counties, if known, of
participants’ earliest
confirmed paternal ancestors
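11
[Map: number of participants by county or province of earliest confirmed paternal ancestor. Places marked: Irvine, Dumfries, Bonshaw, Eskdale, Castle Irvine.]
Scotland: Dumfriesshire 14; Orkney 9; Aberdeenshire 7; Perthshire 4; Shetland 4; Ayrshire 1
England: Northumberland, Durham 7; Cumberland 4
Ireland: Antrim 18; Tyrone 15; Fermanagh 14; Derry 10; Leinster 5; Munster 5; Armagh 3; Connaught 3; Down 2; Donegal 2; Monaghan 1; Cavan 1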
The Scotch-Irish
12
The term Scots-Irish, or Ulster Scots, refers to Scots who migrated
to Ireland, typically in the 17th century from SW Scotland to Ulster.
•Many Scots took part in the Plantation of Ulster c.1610,
either as a landowning Undertaker, or as a tenant.
Each Undertaker undertook to keep 40 loyal tenants.
•Other settlers included Border Reivers who had been banished.
•Most Scots-Irish were Presbyterians.
•Very few Scots-Irish have pedigrees back to Scotland
(unless their ancestors were Undertakers).
The American term Scotch-Irish refers to descendants of these
Ulster settlers who in turn migrated to America, typically in the
18th century to the Appalachian piedmont (PA-GA).
•Few Scotch-Irish have pedigrees back to Ireland.
Irwin project:
Earliest confirmed paternal
ancestors
13
Spelling           Birth date
Irwin     32%      1900s     4%
Irvine    16%      1800s     29%
Erwin     13%      1700s     48%
Ervin     8%       1600s     3%
Irving    8%       1500s     1%
Irvin     8%       1400s     1%
Arnwine   1%       1300s     0%
Urwin     1%       1200s     1%
Other     13%      Unknown   13%
Irwin project :
Marker resolution
No. of markers   % participants
                 2010    2015
12               13%     5%
25               6%      1%
37               48%     54%
67               33%     26%
111              -       14%
37 or more       81%     94%
Irwin project: Results
examples (1)
15
Columns: ID; haplogroup; then DYS markers 1-12: 393 390 394 391 385a 385b 426 388 439 389-1 392 389-2; markers 13-25: 458 459a 459b 455 454 447 437 448 449 464a 464b 464c 464d
Cluster (1)
65875 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
112094 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
194922 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
102835 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
108028 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
85111 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
72683 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
54774 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
87191 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 16 17 17
19864 R1b1 13 24 14 11 11 15 12 12 12 12 13 28 18 9 10 11 11 25 15 20 30 15 16 17 17
169170 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 31 15 16 17 17
84825 R1b1 13 24 14 10 11 15 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 16 16 16 17
39927 R1b1 13 24 14 11 11 14 12 12 12 13 13 29 17 9 10 11 11 25 15 20 30 15 15 16 17
106520 R1b1 13 24 14 11 11 15 12 12 12 13 13 29 - - - - - - - - - - - - -
Cluster (2)
161010 I1 13 22 14 10 14 16 11 14 11 12 11 29 15 8 9 8 11 22 16 20 28 12 14 14 15
72309 I1 13 22 14 10 14 16 11 14 11 12 11 29 15 8 9 8 11 22 16 20 28 12 14 14 15
Cluster (3)
51216 R1b1 13 24 14 11 11 14 12 12 13 13 13 29 17 9 10 11 11 25 15 19 29 14 15 17 18
29479 R1b1 13 24 14 10 11 14 12 12 12 13 13 28 17 9 10 11 11 25 15 19 29 14 15 16 17
Cluster (4)
75606 R1b1 13 24 14 10 11 15 12 12 11 13 13 29 17 10 11 11 11 24 15 19 29 15 17 17 17
22971 R1b1 13 24 14 10 11 15 12 12 11 13 13 29 17 10 11 11 11 24 15 19 29 15 16 15 17
Singleton
84049 R1b1 13 25 14 10 11 14 12 12 12 12 14 28 17 9 10 11 11 25 15 18 30 16 16 16 17
Key: compared with modal value:
>2>; 2> ; 1> ; = ; <1 ; <2 ; >2< bold: fast moving markers small: GD rule differs
Matching & Grouping:
Definitions
Large projects need rigorous definition of
terms & procedures to determine:
(1) if two testees are a near match,
(2) how matching testees are grouped,
&
(3) how groups should be named
16
Genetic Distance: Example
Comparison of two 12-marker STR haplotypes
17
3 3 3 3 3 3 4 3 4 3 3 3
Haplotype 9 9 9 9 8 8 2 8 3 8 9 8
DYS 3 0 4 1 5 5 6 8 9 9 2 9
a b -i -ii
Testee A 13 24 14 11 11 15 12 12 12 13 13 29
Testee B 13 24 15 11 11 15 11 12 10 13 13 29
difference 0 0 1 0 0 0 1 0 2 0 0 0
matching markers: 9/12
mismatching markers: 3/12
Genetic Distance: 4/12
Genetic Distances are useful for educational & illustrative purposes, BUT:
1. Special rules apply for multi-copy markers:
DYS 385, 389, 395, 413, 459, 464, CDY & YCAII.
2. Four different models for calculating GDs:
Stepwise; Infinite alleles; FTDNA hybrid, old & new.
3. GDs take no account of differing average mutation rates for each marker:
e.g. av. rate of CDY is 400 times that of DYS494.
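The 12-marker comparison above can be reproduced with a minimal Python sketch. It assumes the plain stepwise model (the sum of absolute differences at each marker) and deliberately ignores the special rules for multi-copy markers and the differing mutation rates listed in the caveats. The function and variable names are illustrative only.

```python
# Haplotypes copied from the slide, in the order
# DYS 393, 390, 394, 391, 385a, 385b, 426, 388, 439, 389-i, 392, 389-ii.
TESTEE_A = [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29]
TESTEE_B = [13, 24, 15, 11, 11, 15, 11, 12, 10, 13, 13, 29]


def compare(hap_a, hap_b):
    """Return (per-marker differences, number of matching markers, stepwise GD)."""
    diffs = [abs(a - b) for a, b in zip(hap_a, hap_b)]
    matching = sum(1 for d in diffs if d == 0)
    # Stepwise model: each repeat of difference counts as one step.
    genetic_distance = sum(diffs)
    return diffs, matching, genetic_distance


diffs, matching, gd = compare(TESTEE_A, TESTEE_B)
total = len(TESTEE_A)
print("difference:", diffs)
print(f"matching markers: {matching}/{total}")
print(f"mismatching markers: {total - matching}/{total}")
print(f"Genetic Distance: {gd}/{total}")
```

Note that three markers mismatch but the Genetic Distance is four, because the two-step difference at DYS 439 counts twice under the stepwise model.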
TiP (Time Predictor)
18
TiPs - allow for different average mutation rates for each marker
- are FTDNA’s most sophisticated tool for matching;
BUT - appear complicated and slow;
- derivation is “opaque”, and liable to be updated;
- 2 decimal places (e.g. 96.73%) is misleading;
- limited to FTDNA testees.
“TiP Score”
TiP Score: - simple, arbitrary tool for project management;
- 24-generation, no-paper-trail TiP at highest available resolution;
- best available indicator of the probability of two testees
sharing a common ancestor within the surname era;
- avoids problems of Genetic Distances & matrices;
- nearest whole % (e.g. 97%) sufficient;
Matching
A “near match” is a rule-of-thumb, arbitrarily chosen,
to determine if two participants share a common ancestor
within the surname era, i.e. in the last millennium.
FTDNA list near matches on their personal yDNA “Matches” pages.
They use criteria of GD = 1/12, 2/25, 4/37 or 7/67,
sometimes known as “1, 2, 4, 7 rule”, or “10% rule”
Some Surname project administrators use other criteria, e.g.
• GD: “1, 2, 4, 6 rule”, or
• GD: “0, 2, 3, 5 rule”
Irwin project:
• TiP Score: “60% rule” (for Irwins);
“95% rule” (for non-Irwins)
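The Genetic Distance rules of thumb above amount to a lookup of a maximum GD per marker resolution. The sketch below is illustrative only: it assumes each figure is an inclusive upper limit, and the names `RULES` and `is_near_match` are hypothetical, not part of any FTDNA tool.

```python
# Maximum Genetic Distance allowed at each resolution, per rule of thumb.
RULES = {
    "1, 2, 4, 7": {12: 1, 25: 2, 37: 4, 67: 7},  # FTDNA "Matches" pages
    "1, 2, 4, 6": {12: 1, 25: 2, 37: 4, 67: 6},
    "0, 2, 3, 5": {12: 0, 25: 2, 37: 3, 67: 5},
}


def is_near_match(gd, markers, rule="1, 2, 4, 7"):
    """True if a GD of `gd` over `markers` markers is a near match under `rule`.

    Assumption: the quoted figure is an inclusive maximum.
    """
    return gd <= RULES[rule][markers]


print(is_near_match(4, 37))                 # within the 4/37 limit
print(is_near_match(5, 37))                 # outside the 4/37 limit
print(is_near_match(7, 67, "1, 2, 4, 6"))   # stricter rule at 67 markers
```

A pair at 5/37 fails the "1, 2, 4, 7" test even when the two testees share a surname, which is the kind of case the TiP Score rules are intended to catch.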
20
False Positives & False Negatives
• FTDNA’s “Matches” pages are useful for newbies,
but are in fact an arbitrary compromise:
• for comparing similar surnames the “10% rule” is too stringent:
- 7% of Irwins show as “False Negatives” (e.g. 5/37 or 6/37);
- 60% TiP Score gives better matching.
• for comparing dissimilar surnames the “10% rule” is too lax:
- most “Matches” are “False Positives” i.e. co-incidental;
- 95% TiP Score gives better screening to identify NPEs,
especially when confirmed by terminal SNP test, e.g. L555.
21
Grouping
Assigning testees to clusters / groups / genetic families:
Subjective choice of project administrator:
• by haplogroup (default used in FTDNA public pages) or SNP
• by genealogical feature
e.g. surname spelling, or place of residence
• by near matches
e.g. GD matrix
GD from mode
TiP Score from modal participant
• other features e.g. rare / idiosyncratic markers,
TMRCAs, cladograms, triangulation
22
Genetic Distance Matrix:
Example
23
Genetic Distance Matrix of eight 37-marker STR haplotypes
A -
B 0 -
C 1 4 -
D 0 1 3 -
E 13 9 8 16 -
F 7 11 4 9 1 -
G 3 8 10 8 0 2 -
H 6 2 9 7 6 10 9 -
Participant A B C D E F G H
Interpretation: Two genetic families: A, B, C, D and E, F, G
One Singleton: H
Problems:
1-3. Problems inherent in Genetic Distance.
4. Separate matrices necessary for comparing 12, 25, 37, 67 & 111 markers.
5. Matrices are very cumbersome for large projects.
Irwin project – justification for use of 60% TiP Score
24
[Histogram: frequency of TiP Scores against magnitude of TiP Scores from project modal haplotype; axis ticks 0 to 70]
Irwin project: Definitions
• Genetic family: 2 or more participants with TiP Scores > 60%
(> 95% for dissimilar surnames).
• Singleton: unassigned Irwin with TiP Score < 60%.
• TiP Score: 24-generation, no-paper-trail TiP, at highest available resolution, from modal participant:
probability of sharing common ancestor with modal participant within the surname era,
i.e. probability of being member of genetic family.
• Modal participant: participant whose genetic signature is the most typical of the members of a genetic family.
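The definitions above can be expressed as a small decision rule. This is a sketch only: the function names are hypothetical, the TiP Score itself must still be read from FTDNA (it is not computed here), and a score of exactly 60% or 95% is treated as failing the test because the definitions do not say which side it falls on.

```python
def tip_threshold(same_surname):
    """60% rule for Irwins etc., 95% rule for dissimilar surnames."""
    return 60 if same_surname else 95


def in_genetic_family(tip_score, same_surname=True):
    """True if a TiP Score (in %) from the family's modal participant
    exceeds the project threshold.

    Assumption: the comparison is strict, as in "> 60%" on the slide.
    """
    return tip_score > tip_threshold(same_surname)


# TiP Scores from the Borders modal participant, as quoted in the project's
# results table: kit 84825 (Erwin) 98%, kit 39927 (Elliot) 98%,
# kit 84049 (Irwin) 2%.
print(in_genetic_family(98))                      # Irwin spelling, 60% rule
print(in_genetic_family(98, same_surname=False))  # Elliot, 95% rule
print(in_genetic_family(2))                       # not in the Borders family
```

A participant who fails the test against every family's modal participant remains a singleton.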
Irwin project: Growth
26
[Chart: total participants, genetic families and singletons, half-yearly from Nov 2005 to Nov 2015; vertical scale 0 to 450; annotations 0.04%, 0.07% and 0.12%]
TMRCA (Time to Most Recent Common Ancestor)
27
Popular tables/graphs can predict no. of generations/years
back to the common ancestor of two participants.
BUT
• All TMRCAs are probabilities
• TMRCAs based on genetic distance:
- assume some single average mutation rate;
- even the chosen average mutation rate may be incorrect;
- ignore back mutations;
- can be very misleading.
TMRCAs: typical margins of error when predicted by Genetic Distance
28
Genetic Distance   Most probable TMRCA   90% of TMRCAs within
0/37 1 generation = 30 years 0 - 290 years
1/37 3 generations = 90 years 0 - 450 years
2/37 6 generations = 180 years 65 - 580 years
3/37 9 generations = 270 years 110 - 710 years
4/37 12 generations = 360 years 165 - 825 years
5/37 15 generations = 450 years 220 - 930 years
Assumptions: average mutation rate =0.0042 per generation
1 generation =30 years
Source: www.dna-project.clan-donald-usa.org/tmrca.htm
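As a rough plausibility check on the table above, the sketch below works through the arithmetic implied by the stated assumptions. It is a crude approximation, not the method used by the cited source: it assumes every marker mutates at the same average rate, that mutations accumulate independently down both lines of descent, and that there are no back mutations, which are exactly the simplifications the preceding slide warns about. All names are illustrative.

```python
MUTATION_RATE = 0.0042        # per marker per generation (from the slide)
YEARS_PER_GENERATION = 30     # from the slide
MARKERS = 37

# "Most probable TMRCA" in generations, copied from the table.
TABLE_GENERATIONS = {0: 1, 1: 3, 2: 6, 3: 9, 4: 12, 5: 15}


def expected_gd(generations, markers=MARKERS, rate=MUTATION_RATE):
    """Expected Genetic Distance between two testees whose common ancestor
    lived `generations` ago: two lines of descent, each accumulating
    markers * rate mutations per generation."""
    return 2 * generations * markers * rate


def generations_per_gd_step(markers=MARKERS, rate=MUTATION_RATE):
    """Generations needed, on average, to add one step of Genetic Distance."""
    return 1 / (2 * markers * rate)


def most_probable_years(gd):
    """Convert the tabulated generations into years."""
    return TABLE_GENERATIONS[gd] * YEARS_PER_GENERATION


print(round(generations_per_gd_step(), 2))   # roughly 3 generations per step
print(most_probable_years(4))
```

The result of roughly three generations per step of Genetic Distance is consistent with the spacing of the "most probable" column, while the wide 90% ranges show how little weight a single estimate can bear.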
NPEs: synonyms
• Non-paternal event (from genetics)
• Non-paternity event
• Extra paternity event
• False paternity event
• False paternity
• Misattributed paternity
• Non-patrilineal transmission
• Male introgression
• Ancestral introgression
• Undocumented Adoption
• Not the Parent Expected
• Surname discontinuity
• Surname Discontinuity Event (my preferred term)
29
NPEs: possible causes
Narrow definition (used in genetics):
• Surrogacy: not yet likely in context of genealogy
• Illegitimacy outside marriage: boy taking maiden name of mother
• Infidelity within marriage: boy taking surname of mother’s husband
Wider definition (when surname & DNA don’t match) also includes:
• Re-marriage: boy taking surname of step-father
• Adoption, incl. orphan, waif: boy taking surname of guardian
• Formal name-change: man taking maiden name of wife or mother
• Informal name-change, or alias: man taking name of farm, trade or mother
• Anglicisation of Gaelic or foreign surname
• Error in genealogy
Similar symptoms, but not an NPE if father didn’t use a hereditary surname:
• By-name: man taking name of farm, trade or origin
• Tenant or vassal: man taking surname of landlord or chief
• Apprentice or slave: man taking surname of master
Manifestations of NPEs
• Egressions from a genetic family (“e-NPEs”):
same DNA, but different surname
e.g. Irwin DNA, but Elliot surname
(possibly an Elliot step-father)
• Introgressions into a genetic family (“i-NPEs”):
same surname, but different DNA
e.g. Elliot DNA, but Irwin surname
(possibly an Irwin step-father)
“One project’s e-NPE is another project’s i-NPE”.
31
Examples of Irwin / Elliot e-NPEs
32
...........Elliott
...........Elliott
...........Elliott
...........Elliott
...........Irving
...........Erwin
...........Elliott
...........Erwin
...........Nipper
...........Irvine
...........McDonald
...........Armstrong
............Irwin
............Snowdon
Examples of Elliot / Irwin i-NPEs
33
.......... Elliott
............Fairbairn
............Fairbairn
............Elliott
............Elliott
............Elliott
............Elliott
............Farms
............Fairbairn
............Fairbairn
............Fairbairn
............Fairbairn
............Fairbairn
............Fairbairn
Recognising & handling NPEs
e-NPEs:
testee finds near matches with another surname,
& asks admin. to join this second surname project.
NB Need stringent matching criteria or evidence of NPE.
i-NPEs:
administrator finds near matches with another surname,
& creates a new genetic family within his project.
NB i-NPEs are a sensitive subject which may disappoint
testees, even if they accept the ‘event’ was not
necessarily an illegitimacy or infidelity.
For all NPEs, if cause & date of the ‘event’ are not known,
seek evidence that the two surnames were once neighbours.
34
Irwin project: Results
examples (2)
Columns: ID; earliest confirmed paternal ancestor (surname, forename, born, died, residence(s)); haplogroup; no. of markers tested; Genetic Distance from mode (/12, /25, /37, /67, /111); TiP Score from modal; remarks
SCOTTISH BORDERS ("B")
65875 U Irwin Henry E c1813 Lancaster Co, PA R1b1 67 - - - - - - Modal participant
112094 E Urwin William 1783 1851 Co. Durham R1b1 67 0/ 0/ 0/ 0/ - 100%
194922 U Ervin John 1715 N.Ireland SC R1b1 111 0/ 0/ 0/ 0/ 0/ 100%
102835 U Armstrong 1844 1902 Co.Tyrone OH R1b1 67 0/ 0/ 0/ 0/ - 100%
108028 U Irvine Andrew 1763 1797 Ireland PA R1b1 37 0/ 0/ 0/ - - 100%
85111 U Irwin Samuel 1736 1783 Lancaster Co, PA R1b1 67 0/ 0/ 1/ 1/ - 100% 5th cousin of 72683
72683 U Irwin Samuel 1736 1783 Lancaster Co, PA R1b1 111 0/ 0/ 2/ 2/ 5/ 99% 5th cousin of 85111
54774 U Irving William fl.1484x1506 Bonshaw, Dumfriesshire R1b1 67 0/ 0/ 2/ 3/ - 99%
87191 S Irving Francis c1568 1633 Dumfries, Dumfriesshire R1b1 67 0/ 0/ 1/ 2/ - 99% brother of 19864
19864 S Irving Francis c1568 1633 Dumfries, Dumfriesshire R1b1 67 1/ 2/ 3/ 4/ - 99% brother of 87191
169170 E Irvine John 1662 1732 Eskdale, Dumfriesshire R1b1 37 0/ 1/ 3/ - - 99% Mt. Everest line
84825 U Erwin Matthew c1695 Co.Antrim? NC R1b1 67 1/ 3/ 5/ 5/ 7/ 98% False negative
39927 C Elliot Simon 1897 1955 Co.Fermanagh R1b1 37 1/ 2/ 4/ - - 98% e-NPEs
106520 U Irvin Joe 1744 MD R1b1 12 0/ - - - - 91%
NPE Elliot (1) ("NE1")
161010 U Irwin Hiram 1815 Ireland? IL I1 67 13/ 28/ 39/ 55/ - 0% ) 100% with Elliots
72309 U Irwin Andrew 1765 1824 Scotland TN I1 37 13/ 28/ 40/ - - 0% ) i-NPEs
ORKNEY (1) ("O1")
51216 U Irving Christe fl. 1468 Shapinsay, Orkney Isles NY R1b1 37 2/ 6/ 11/ - - 16% Washington Irving
29479 E Irvine George c1705 1742 Sandwick, Orkney Isles R1b1 37 3/ 6/ 11/ - - 18% author of this paper
IRISH - Munster ("IM")
75606 U O'Ciarmhacain/Irwin Eoin 1785 1845 Limerick, Ireland NJ R1b1 67 2/ 8/ 16/ 19/ - 1% gaelic; catholic
22971 I Irwin William 1840 Limerick, Ireland R1b1 67 2/ 9/ 17/ 20/ - 1%
Singleton
84049 U Irwin William c1770 c1810 Leinster, Roscommon R1b1 37 5/ 9/ 16/ - - 2%
Irwin project: Genetic Families
And we thought Irwin was a single-origin surname!
*: with 262 members this is apparently the largest genetic family in any surname project.
37
Columns: origin; genetic families; % of 392 participants; of which e-NPEs
Scotland Borders* 1 67% 17%
i-NPEs 15 10% 0
Aberdeenshire 1 1% 0
Forfarshire 1 0% 0
Perthshire 1 1% 0
Orkney 2 2% ?1%
Shetland 1 1% 0
Unknown 6 3% ?0-3%
Ireland 4 4% 1%
Germany/ Netherlands 1 2% 0
Africa 1 0% 0
Singletons - 9% ?
Total 34 100% 13-16%
38
EXAMPLE OF TRIANGULATION Crystie Irwing Irvings were first Magnus (Irving)
fl. 1468, -a1504 recorded in Orkney fl. 1470
IRVINGS OF ORKNEY first of Sabay in 1369 Clovigarth
showing the two lines of descent John m ? ………….
identified by DNA tests fl.1483,-1519x22 heiress (Clovigarth)
Sabay of Yesnaby
James John m2 Katherine Kirkness m1 ........ Irving
fl.1534, -1567 fl.1534 , -1597/8 fl.1561 (Clovigarth)
Sabay; Law man of Orkney Overgarson heiress of Overgarson?
?
Magnus William William James Alexander
fl.1536, -1614 fl.1601 -1614 -1612 fl.1601
Shapinsay Sabay Clovigarth Overgarson Yesnaby
Thomas Patrick Magnus Alexander Alexander
c1570-p1646 fl. 1582, -a1614 fl.1583, -1649 -1629 c1600-1642
Quholm Overgarson Lie Yesnaby
? William Magnus Patrick George
c1610- c1601-1626 -1657 fl. 1635x78 c1628-c1700
last of Sebay Overgarson Lie Yesnaby
George David James
fl.1650, -1702x11 fl. 1673x1701 c1660-c1705
Overgarson Lie Yesnaby
Magnus Patrick
1650- fl.1711x29
John Magnus Hary (2) Duncan (1) Edward Edward
1682-a1746 1685-p1731 c1705-p1768 c1700-1749 1704-1756x64 1707-1796
Quholm Skaebreck Overgarson Lie Quoyloo
James William John Edward George
c1734-1797 1731-1807 ? c1736-p1792 c1735-c1791 c1750-1800
Quholm; NY Skaebreck Overgarson Quoyloo
James Ebenezer John m Jannet Edward Peter George
1759-1835 1776-1868 -1808x21 Irvine 1774-1833x41 1741-p1772 c1750-1800
New York Washington Huan 1754-1832x41 Overgarson Lie Quoyloo
1783-1859
author
FTDNA Kit. No. 174038 51216 29479 169056 174074 199671
Test sequence 4th= 2nd 1st 3rd 4th= 6th
Genetic family "Orkney 1" "Orkney 2"
Irwin project:
Geographic origins
39
Columns: place; participant's place of residence; residence of earliest confirmed paternal ancestor; historic origin of genetic family
Project size 392 392 392
USA 77% 21% -
Canada 6% 1% -
Australia, New Zealand 6% - -
England & Wales 5% 3% -
Ireland (NI & Eire) 1% 40% 5%
Scotland 5% 23% 84%
Germany, Netherlands - 1% 2%
Unknown, other - 10% 9%
40
Irvine, Ayrshire
Irwin project:
1200 Scottish ancestral lines
as shown by DNA tests
1300
Borders X Drum, Aberdeenshire X
1400 Orkney1 Orkney2
1500 Eskdale Bonshaw Dumfries 11 other
& Castle lines
Irvine
1600 X Perth X Shetland
1700
1800
BE BB BD BA, Bel, Ber,
B9, B10, B14, B15,
B16, B17, B23, B29
Eskdale
Irwin project : Borders Family
Cladogram
41
Irwin project:
The 15 sub-groups of the Borders
family
(pre-BigY)
- SNP L555 recognised by ISOGG in mid-2012
- 50 tests to date, nil “L555-” results by Irwins or NPEs
42
L21 Totals
Z251
L555
mode DYS DYS DYS DYS DYS DYS DYS DYS DYS DYS DYS YCA DYS un-
617 576 449 442 447 459b 391 570 534 438 570 11b 449 assigned
=11 =17 = 31 = 13 =27 = 9 = 10 =14 = 15 = 16 = 17 = 23 = 29
No. of members 34 16 15 19 6 3 11 32 7 4 5 18 7 16 67 262
excl. NPEs & <37 markers 202
US descendants? Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Irish ancestors? Yes Yes Yes Yes Yes ? Yes Yes Yes ? Yes Yes Yes Yes Yes
Scottish origin ? Bonshaw Dumfries Eskdale ? ? ? ? ? ? ? ? ? ? -
NPE surname - - - - Elliot Errand - - - - - - - - -
Code BA BB BD BE Bel Ber B9 B10 B14 B15 B16 B17 B23 B29 BX
TMRCA ( by STRs) 1800 1750 1050 850 1700 1300 750 1200 BC200 1700
Earliest genealogy 1700 1500 1565 1600 1800 1850 1700 1700 1750 1700 1800 1650 1650 1650 var.
L555 Yes Yes Yes Yes Yes ? Yes Yes Yes Yes ? Yes Yes Yes (Yes)
The two types of y-DNA test
43
STR tests
- metaphor: "individual leaves on a tree"
- used for: comparing genetic signatures
- sequencing: Sanger
- quantification: analogue
- expressed as: counts of markers
- FTDNA y-tests: 12/25/37/67/111 markers
- use in surname projects: main tool
- secondary data: haplogroup prediction
SNP ('snip') tests
- metaphor: "branches and twigs"
- used for: building phylogenetic tree
- sequencing: Sanger (Single SNP, SNP Pack); Next Generation (BigY)
- quantification: binary (Sanger); probabilistic (Next Generation)
- expressed as: e.g. L21+ or L21- (Sanger); quality of base pairs (Next Generation)
- FTDNA y-tests: Single SNP; SNP Pack; BigY
- use in surname projects: haplogroup confirmation (Single SNP); BigY support (SNP Pack); advanced tool (BigY)
- secondary data: STR and mt data
Irwin project: Phylogenetic tree
The genetic "Adam" 200,000-300,000bp
M42
M168 70,000bp
M89
M9
M45
M96 M170 M304 M207 30,000bp
E I s1I J R (years before present)
P147 L68 M253 NE1 NKr M267 M172 M173 25,000bp
E1 I2 I1 NC ND J1 NG J2 R1
P177 L46 M410 M513 M343 16,000bp
P2 L135 CLAN IRWIN PHYLOGENETIC TREE L26 M439 UD P25
M2 AF M223 IL as at 1 Nov. 2015 M67 UJ P297 12,000bp
showing tested members of Irwin genetic families in green, M269 NBt NJ NKd NL
UN U3 U4 U5
and FTDNA's predictions of Irwin genetic families in red. L23 Mesolithic
See Borders Irwin phylogenetic tree for L555 BigY results L51
PF7589 G L151, P311 Atlantic Modal Haplotype
U106O2 P312 SF 5,300bp-Neolithic
S263 DF27 Z195 M269+, L21- DA L21 NR 4,000bp
S264 L176.2 Z274 DF63 DF13
DF96 Z262 Z209 NN CTS6919 DF49 - b DF21 - h CTS4466 Z251
R1b12a1a2c1a - c R1b12a1a2c1g - i R1b12a1a2c1l R1b12a1a2c1j
- d - k
- e - m
L1 NBl M167 O1 A92 DF23 - f Y11277 - n Z21065 - S1156 Z16943
- FGC13899
Z16506 Z2961 Z16294 A541 CTS4157 Z16944 Pre-surname era
BA BB BD BE Bel Ber
BY674 NM M222 PFNF Z16281 NE2 A195 IM1 FGC7549 L555 B9 B10 B14 B15 B16
PF IM2 B17 B23 B29
Part 2:
BigY and BAM data –
use and interpretation
45
Example of Williamson’s
“BigTree”
www.ytree.net
46
Irwin project, ex
“BigTree”
47
R-P312
ZZ37 L21
Z29644 DF63 DF13
Z29645 A91 DF21 FGC11134 Z251
Z29646 A92 S5488 Z16250 Z16943 S11556 FGC13899
Z29647 Z16506 Z16294 CTS4466 Z16944 CTS4157 A6077
BY674 Z16281 Z21065 S1115 L555 Z16929 Z16932 Z16935 Z16937 Z16940 Z16945 Z16949 FGC19531 14750280AA A2201 V38 A4257
L557 Z16930 Z16933 S20749 Z16938 Z16941 Z16946 Z17660 FGC19533 16344314TT
Z16282 A195 Z21065 L561 Z16931 Z16934 Z16936 Z16939 Z16942 Z16947 Y5816 FGC19536
FCG34569 -21368012GA
6966393AG 17193400CA CTS11273 FGC4341 9166578AC 7583420GA 16630774GA 7581395GT FGC19532 14209909CT 8531427CT 8356286CT
7244870AG 17319595GA 15093112GA 15218377TA 10007460CT 14268577CT 22487613GT FGC19534 16967721CA 16561158AG 17417800AC
7940600GA 19263733TA 1554 21519299GA 19166468GA 19048311TC FGC19535 17371426CT 21515424TA 20809987AC
8311955CA 21782548TG 22479673GC 23804663GA 19201889CG FGC19537 21030091GA 21950915GT 23427085GA
16737596AT 24479734TC FGC19538 22164909TC
17357906TA FGC19539
17851993CG 16344316TT
19262306GA 18982587GA
21306828GA 18982595GA
22461683GT
218190 75606 Withers
Irvine Irwin Breen Burgin
NM IM Broadley Reams
328617 3722 A3093 280599 22874 N126337 23026 54774 280156 226426 160045 65048 364399 311268 87191 Bradley Hardage
Irvine Flanagan Whitaker Irvin Irvine Irvin Irvin Irving Ervin Irving Irwin Erwin Ervin Cunningham Irving Clarke Fortner
singleton NE2 (NPE) IU (NPE) B14 BX BA B29 BB B23 B17 B9 B10 BE BX (NPE) BD Donatella Desmond
Irwin project: BigY goals
Initial goals
• manage and understand BigY results
• set up cloud account to share project data
Interim goals
• minimise dependence on 3rd party analysis tools
• focus on our large L555 (“Borders”) genetic family
• facilitate 1 BigY test for each of 10 main sub-groups
• confirm/refine project phylogenetic tree and TMRCAs
Current goals
• facilitate FTDNA offering a low-cost L555 “SNP Pack” test
• use SNP Pack data to refine individual TMRCAs
NB I am giving low priority to “naming” novel variants and having them placed
on the phylogenetic trees of FTDNA and ISOGG, at least until a robust
understanding of the structure of L555 sub-branches has emerged.
48
Example of limitations of algorithm-based analyses of BigY test results:
the Private SNPs of FTDNA L555 Kit no. 65048
49
FGC YFull Williamson "DIY"
Name Position vcf csv** Analysis *** incl. In No. of No. of Consistency SNP
Big Tree?* reads Indels of SNP reads status
FGC19532 8557914 G A Pass, I variant Known SNP, High conf. Private >95% B100 yes 75 0 100% Probable
FGC19534 16642304 G C Pass, I variant Known SNP, High conf. Private >95% B100G yes 48 0 100% Probable
FGC19535 16956346 T G Pass, I variant Known SNP, High conf. Private >95% B100 yes 81 0 100% Probable
FGC19537 18668146 C A Pass, I variant Known SNP, High conf. Private >95% C 98 yes 47 0 98% Probable
FGC19538 18775426 C T Pass, I variant Known SNP, High conf. Private >95% B100 yes 64 0 100% Probable
FGC19539 19436082 G A Pass, I variant Known SNP, High conf. Private >95% C 96 yes 40 0 98% Probable
- 18982587 G A - Novel variant, High conf. - - - 34 0 94% unstable
- 18982595 G A - Novel variant, High conf. - - - 32 0 97% unstable
- 13226006 C A - - Private >40% - - 2 0 100% possible
- 13571571 C T - - Private >40% - - 2 0 100% possible
- 10064260 C T - - Private >40% - - 2 0 100% possible
- 16275572 C A - - - M100 - 2 0 100% possible
A608 7534406 G T - Known SNP, High conf. * - - 94 55 67% no
- 16344316 TC T Pass, I variant - -/a - - 73 0* 100% no
CTS10214 19328796 G T Rej'd*, 1 variant - - 1 read - 1 0 100% no
PF3499 14624254 C T - - - >1 read - 29 0* 100% no
*: no BED coverage
**: FTDNA list 73 other high conf. Novel variants, of which 13 appear to be private to 65048
***: FGC and YFull's analyses have many more low confidence private markers
*: AW lists 20 other low conf. Private markers
*: Indel in other tests
BAM dataFTDNA
Bases
Variant
Analysis options for BigY test
results
50
FTDNA BAM file
Computerised algorithms ("science") Manual refinement ("art")
FGC YFull FTDNA vcf file
Analysis Analysis
FTDNA csv file Haplogroup projects
e.g. "Big Tree"
FTDNA Matches Surname project admins "DIY"
Detecting & Filtering Quality
- High level SNPs - Old SNPs - Regions
- Terminal SNPs - Intermediate SNPs - SNPs/Indels
- Novel SNPs - Private SNPs - No.of Reads
- Unique SNPs - Consistency of Reads
- Compatibility within sub-clade
- Stability across haplogroup
- Phylogenetic trees
-TMRCAs
Process for “DIY” BigY analysis
1. Create project cloud account; upload VCF, BAM & BAM.BAI files.
2. Identify relevant variants from CSV & Matches data, Walsh & Williamson
(& FGC/YFull Analyses, if used).
3. Use BAM IGV viewer to:
(1) filter relevant variants:
A: pre-L21 (shared by all L555 testees)
B: L21-L555 (shared by all L555 testees)
C: L555 block (shared only by L555 testees)
I: Intermediate (shared by some L555 testees)
Pn: Private (unique to each testee).
(2) determine SNP quality for each variant:
“Probable” if >10 reads AND consistency >85%
“possible” if 2-9 reads OR consistency 70-85%
“No” if 1 read, OR consistency <70%, OR Indel, OR unreliable region.
4. Consider stability of SNP quality vs. that for closely-related BigY testees.
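The SNP quality rule in step 3(2) can be sketched as a small function. This is an illustration under stated assumptions, not the author's own tool: the "No" conditions are checked first, anything that then fails the "Probable" test is treated as "possible", and a variant with exactly 10 reads (which the rule leaves undefined) therefore falls into "possible". The read counts and consistency figures are read manually from the BAM viewer; the function name and parameters are hypothetical.

```python
def snp_quality(reads, consistency_pct, is_indel=False, unreliable_region=False):
    """Classify a variant for one testee as "Probable", "possible" or "No".

    reads            -- number of reads covering the position
    consistency_pct  -- % of reads showing the alternative base
    is_indel         -- True if the variant is an indel rather than a SNP
    unreliable_region -- True if the position lies in an unreliable region
    """
    # "No" if 1 read (or none), consistency <70%, Indel, or unreliable region.
    if reads <= 1 or consistency_pct < 70 or is_indel or unreliable_region:
        return "No"
    # "Probable" needs both conditions: >10 reads AND consistency >85%.
    if reads > 10 and consistency_pct > 85:
        return "Probable"
    # Assumption: everything else (2-9 reads, exactly 10 reads,
    # or consistency 70-85%) is "possible".
    return "possible"


# Figures for kit 65048 taken from the private-SNP slide.
print(snp_quality(75, 100))   # FGC19532
print(snp_quality(2, 100))    # position 13226006
print(snp_quality(1, 100))    # CTS10214
print(snp_quality(94, 67))    # A608
```

Step 4 is a separate check: a variant rated "Probable" for one testee may still be rejected if its quality is unstable across closely-related testees.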
BAM analysis Example:
1: Use of BAM IGV Viewer
www.broadinstitute.org
52
BAM analysis Example:
2: Construct matrix of relevant variables and closely-related BigY testees
53
Columns: Named Variant; Position on Genome; Reference; Alternative; then for each testee: Alternative, No. of reads, No. of indels, Alt./reads %
Testees: 1 - 22874 (Irvine - BX); 2 - 311268 (Cunningham - BX); Erwin - B10; Irving - B17; Irvin - B26; 6 - N126337 (Irvin - BA)
Synonyms and positions of named variants (shown in red) are derived from ybrowse (www.ybrowse.isogg.org).
CTS11273 23045843 T A
DF13 2836431 A C
FGC19532 8557914 G A
FGC19534 16642304 G C
FGC19535 16956346 T G
FGC19537 18668146 C A
FGC19538 18775426 C T
FGC4341 8757882 A G
L21 15654428 C G
L555 7647335 G T
PF496 13297909 T G
PF6729 10022033 A G
PR1489 14543997 C C
Z16940 22470652 T T
Z16946 8014468 G A
Z16949 7933047 T TAA
Z251 8736334 G A
8531427 C T
13226006 C A
13294119 T T
13801126 A G
15093112 G A
15218377 T A
16561158 A G
16630774 G A
17319595 G A
18982595 G A
21368012 G A A G G A A A 32 0 94
21515424 T A
21782548 T G
21950915 G T
22487613 G T
23898645 T C
24479734 T C
Testee labels (overprinted in the original): 3 - 65048; 4 - 226426; 5 - 23026
BAM analysis Example:
3: Enter BAM data, sort & filter
Columns: Block; Named Variant; Position on Genome; Reference; Alternative; then for each testee: Alternative, No. of reads, No. of indels, Alt./reads %; SNP Category; Comments
Testees: 1 - 22874 (Irvine - BX); 2 - 311268 (Cunningham - BX); Erwin - B10; Irving - B17; Irvin - B26; 6 - N126337 (Irvin - BA)
Block B L21 15654428 C G G 59 0 100 G 71 0 98 G 60 0 100 G 69 0 96 G 33 0 97 g 18 0 78
L21 to DF13 2836431 A C c 3 0 100 c? 1 0 100 C 11 0 91 c 6 0 100 c 6 0 100 c 2 0 100 Poor qualities -surprising
L555 Z251 8736334 G A a 4 - 100 - a 6 0 100 a 14 0 100 a 2 0 100 ?a 7 0 57 Poor qualities -surprising
Block C L555 7647335 G T T 51 0 100 T 54 0 98 T 76 0 100 T 91 2 100 T 36 0 100 t 9 0 78 Probable
L555 Z16946 8014468 G A A 50 0 94 A 125 0 100 A 49 0 100 A 73 0 100 A 22 0 100 A 25 0 88 Probable
Z16940 22470652 T T C 53 0 96 C 52 0 88 C 72 0 89 C 44 0 89 C 53 0 100 C 59 0 86 No Unreliable region
Z16949 7933047 T TA T 46 39 100 T 76 75 95 T 38 39 100 T 47 47 100 T 54 47 100 T 94 68 100 No Indel
Intermediate FCG34569 21368012 G A A 85 0 100 G 147 0 90 G 82 0 100 A 80 0 99 A 48 0 98 A 32 0 94 Probable
Block PF496 13297909 T G g 71 0 73 t? 21 0 67 T 15 0 100 T 15 0 93 T 21 0 100 g 85 0 65 No conflicts with FCG34569
Private 17319595 G A A 23 0 87 G 24 0 100 G 27 0 100 G 58 0 100 G 24 0 100 G 78 0 100 Probable
block for 21782548 T G G 79 0 100 T 174 0 100 T 93 0 100 T 97 0 100 T 35 0 97 T 27 0 100 Probable
1 -22874 PF6729 10022033 A G g 7 0 86 a? 8 0 85 a 4 0 100 a 11 0 64 ?a 6 0 83 ?a 5 0 60 possible
Private 8531427 C T C 63 0 100 T 47 0 98 C 44 0 100 C 47 0 100 C 69 0 100 C 72 0 100 Probable
block for 16561158 A G A 17 0 100 G 34 0 100 A 23 0 100 A 41 0 100 A 14 0 100 A 16 0 100 Probable
2 -311268 21515424 T A T 45 0 100 A 59 0 98 T 49 0 100 T 77 0 99 T 42 0 100 T 45 0 100 Probable
21950915 G T G 47 0 100 T 63 0 94 G 61 0 100 G 54 0 100 G 29 0 100 G 42 0 100 Probable
13801126 A G c 1748 10 81 G 2281 0 89 c 1144 1 76 c 1658 7 71 ?c 1083 28 57 ?c 1676 53 63 No Indel
Private FGC19532 8557914 G A G 59 0 100 G 99 0 98 A 75 0 100 G 93 0 100 G 31 0 100 G 101 0 100 Probable
block for FGC19534 16642304 G C G 58 0 100 G 77 0 100 C 48 0 100 G 67 0 100 G 45 0 100 G 21 0 100 Probable
3 -65048 FGC19535 16956346 T G T 90 0 100 T 139 0 95 G 81 0 100 T 53 0 100 T 87 0 100 T 102 0 100 Probable
FGC19537 18668146 C A C 29 0 100 C 53 0 100 A 47 0 98 C 64 0 100 C 21 0 100 C 44 0 100 Probable
FGC19538 18775426 C T C 59 0 100 C 128 0 100 T 64 0 100 C 58 0 100 C 48 0 100 C 18 0 100 No appears elsewhere in L21
13226006 C A c 4 0 100 c 4 0 100 a 2 0 100 c 6 0 100 c? 1 0 100 C 31 0 100 possible
Private 16630774 G A G 65 0 100 G 44 0 100 G 42 0 98 A 32 0 100 G 59 0 100 g 6 0 100 Probable
block for 22487613 G T G 119 0 98 G 127 0 93 G 101 0 99 T 67 0 88 G 205 0 99 G 184 0 100 Probable
4 -22642 PR1489 14543997 C C c 4 0 100 - c? 1 0 100 a 2 0 100 c 8 0 100 c 5 0 80 possible
Private 15218377 T A T 22 0 100 T 41 0 100 T 31 0 100 T 51 0 100 A 10 0 100 T 40 0 100 Probable
block for 24479734 T C T 91 0 100 T 143 0 100 T 80 0 100 T 51 0 100 C 58 0 98 T 72 0 100 Probable
5 -23026 FGC4341 8757882 A G A 24 0 100 A 45 0 98 A 35 0 100 A 51 0 100 g 9 0 100 a 4 0 100 possible note marginal no. of counts
Private 23898645 T C t 56 0 84 t 109 0 78 t 71 0 80 t 90 0 71 t 45 0 80 C 27 0 85 Probable
block for 15093112 G A G 98 0 100 G 74 0 99 G 76 0 100 G 34 0 100 G 104 0 100 a 137 0 84 possible note marginal consistency
6 -N126337 13294119 T T C 32 0 100 C 35 0 100 C 25 0 92 c 74 0 62 C 18 0 100 t 10 0 70 possible
L555 BAM analysis Results
BigY - L555 data as of 21 Oct 2015, by James Irvine, based on initial work by Dennis Wright

Key to table entries:
Stage/Block: A: Adam - L21, shown at foot of table; B: L21 - L555; C: L555; Intermediate: between C and P; P1, P2, P3 ...: Private, unique to 1 test
James Irvine (.BAM data): capitals (A) if >85% AND no. of reads >10; lower case (a) if 70-84% OR no. of reads 2-9; a? if no. of reads 1; lower case if <50% are "good" BAM data; italics in cols. G & H are additional to DW
Dennis Wright: capitals: tested; g: rejected, "1", qual. >500; -: rejected, "1", qual. <500; ?: inconclusive, "0/1"; A: private to individual; T: inconclusive SNP
FTDNA: A: .bam, not seen in vcf, "Good"; a: .bam, not seen in vcf, "Weak"; ?: inconclusive (1 or 2 samples, multiple bases); a/-: no .bam test result
VCF(1): A if Quality >500; a if Quality <500; 0 ancestral; 1 derived; 0/1 1 & R
VCF(2): P pass; R rejected
CSV: n Novel; k Known; H High conf.; M Med. conf.; u Unknown conf.
Alex Williamson: y included as per DW; p private, not terminal; ? private, "?"; ";" 2 entries for 1 SNP
Mike Walsh (1): 9 Tree, official; 8 Tree, draft; 7 Public, consistent; 6 Public, semi-consistent; 4 Public, unsure; 3 Multi family/surname; 2 Single family/surname; 1 Single individual; -1 Unstable, confirmed
FGC: S shared, 99, 95%; s shared, 40%; P private, 99, 95%; p private, 40%; * private, 10%
YFull: m >1 read; s 1 read
All: - no entry
Other notes: Shared SNPs which DW ignores; Unstable region 22216800-22512940 (T Krahn)
Kits (left to right): 1 - 22874 (Irvine - BX); 2 - 311268 (Cunningham - BX); 3 - 65048 (Erwin - B10); 4 - 226426 (Irving - B17); 5 - 23026 (Irvin - B26); 6 - N126337 (Irvin - BA); 7 - 54774 (Irving - BB); 8 - 364399 (Ervin - BE); 9 - 280156 (Ervin - B23); 10 - 87191 (Irving - BD); 11 - 160045 (Irwin - B9); 12 - 280599 (Irvin - B14)
Leading columns: SNP (Variant/Indel); Remarks; Stage/Block; Position b37; Reference; Alternative
Per-kit columns: Alternative; reads; Indels; Derived/reads (calls) %; then, where available: vcf(1) FTDNA; vcf(2) FTDNA; csv FTDNA (N: Novel V.; H: High Conf.); A Williamson; M Walsh Stage; FGC; YFull
Final column: M Walsh - Total
Block B: L21 to L555
L21/S145/M529 B 15654428 C G G 59 0 100 y 9 G 71 0 98 y 9 G 60 0 100 9 G 69 0 96 G 1P G kH 9 G 33 0 97 9 g 18 0 78 9 G 46 0 100 G 60 0 100 G 50 0 100 G 61 0 100 G 26 0 96 G 67 0 94
DF13/S521/CTS241 b 28364318 A C c 3 0 100 c 1P - y 9 c? 1 0 100 c 1R y - c 11 0 91 y 9 c 6 0 100 C kH 9 c 6 0 100 9 c 2 0 100 9 c 7 0 86 c 8 0 100 c 7 0 88 C 18 0 100 c 4 0 100 c 5 0 100
Z251/S470 b 8736334 G A a 4 - 100 a 1R ku y S m - - y ? a 6 0 100 y s m a 14 0 100 - 1R ? k?u a 2 0 100 ?a 7 0 57 a 7 0 100 a 6 0 83 a 8 0 100 a 9 0 100 a 1 0 100 A 14 0 100
Z18600 FGC only, not covered by BigY 25633952 G A
Z16943 B 6351101 T A A 46 0 100 A 1P nH y 7 - - A 62 0 97 A 1P nH y 7 - A 51 0 100 nH y 7 - - A 74 0 100 A nH 7 A 53 0 96 A 1P nH 7 A 71 0 90 7 A 69 0 100 A 66 0 87 A 77 0 100 A 107 0 97 A 75 0 100 A 80 0 100
Z16944 DW had as P1 B 7527372 G A a 37 0 84 - -! y;p? - - - A 24 0 100 A 1P nH y 7 P A 26 0 100 nH y 7 P - A 29 0 100 A 1P A kH 7 A 45 0 100 A 1P nH 7 A 80 0 90 7 A 48 0 100 A 40 0 98 A 66 0 95 A 61 0 98 A 67 0 A 34 0 100
CTS4157/S3741brother of Z16944 (AW); public block? B 15439136 G A G 15 0 100 G 0P kH - - - - G 18 0 100 g 0P kH - - - G 25 0 100 kH y - - - G 38 0 100 G 0P G kH - ?g 4 0 100 g 0P - - - g 6 0 100 G 14 0 100 G 10 0 100 G 10 0 100 g 5 0 100 G 17 0 100
FGC13746 public block withFGC7549? (Donatella) B 9375616 G T T 38 0 97 T 1P nH - 4 - - T 112 0 99 T 1P nH - 4 - T 45 0 100 nH - 4 - - T 64 0 100 T 1P T nH 4 T 38 0 100 T 1P nH 4 T 17 0 82 - T 40 0 98 T 48 0 98 T 36 0 92 T 53 0 100 T 45 0 100 T 59 0 100
FGC8673 public block withFGC7549? (Donatella) B 9852985 A G G 19 0 100 nH y 4 - - g 114 0 75 nH y 4 - G 52 0 100 nH y 4 - - G 38 0 97 G nH 4 G 14 0 100 nH 4 ?g 5 0 40 - g 12 0 83 G 10 0 100 g 7 0 100 G 20 0 100 G 59 0 100 G 10 0 100
-AW found 2015H1 B 22424486 A A A 88 0 86 A 98 0 100 A 61 0 95 A 67 0 97 A 123 0 86 a 92 0 85 a 62 0 84 A 78 0 90 A 88 0 85 A 83 0 85 A 218 0 94 A 61 0 90
Block C: L555
L555/S393 C 7647335 G T T 51 0 100 kH y 7 - m T 54 0 98 T 1P kH y 7 - T 76 0 100 T 1P kH y 7 - m T 91 2 100 T 1P T kH 7 T 36 0 100 T 1P 7 t 9 0 78 - T 35 0 100 T 52 0 100 T 61 0 100 T 43 0 100 T 25 0 94 T 52 0 100
L557/S394 DB omission? C 22513691 C G G 54 0 100 G 1P kH y 7 P m G 106 0 95 G 1P kH y 7 P G 68 0 100 G 1P kH y 7 P m G 80 0 100 G 1P G kH 7 G 41 0 98 G 1P 7 ?c 12 0 58 - G 76 0 99 G 73 0 100 G 75 0 99 G 88 0 100 G 55 0 93 G 61 0 100
Z16945 C 7536923 A G G 29 0 100 G 1P nH y 7 - - G 38 0 84 nH y 7 - G 28 0 100 nH y 7 - - G 34 0 100 G 1P nH 7 G 37 0 97 nH 7 g 10 0 76 - G 26 0 96 G 31 0 97 G 43 0 95 G 39 0 100 G 76 0 99 G 18 0 100
Z16946 C 8014468 G A A 50 0 94 nH y 7 - - A 125 0 100 nH y 7 - A 49 0 100 nH y 7 - - A 73 0 100 A nH 7 A 22 0 100 nH 7 A 25 0 88 7 A 33 0 100 A 54 0 95 A 51 0 96 A 75 0 100 A 62 0 98 A 49 0 100
Z16929 c 13493784 A G G 29 0 97 nH y 7 - - G 69 0 97 nH y 7 - G 35 0 100 nH y 7 - - G 45 0 100 G nH 7 g 4 0 100 - - - G 16 0 94 G 21 0 100 G 23 0 100 G 30 0 100 G 10 0 100 G 27 0 100
Z16930 C 15625978 A G G 51 0 100 G 1P nH y 7 - - G 52 0 92 G 1P nH y 7 - G 45 0 100 nH y 7 - - G 102 0 97 G 1P G nH 7 G 35 0 100 nH 7 g 4 0 100 - G 71 0 100 G 78 0 99 G 101 0 96 G 106 0 100 G 49 0 98 G 80 0 98
Z16931 C 16433477 T C C 52 0 100 nH y 7 - - C 80 0 99 nH y 7 - C 60 0 100 nH y 7 - - C 39 0 97 C nH 7 C 53 0 98 nH 7 C 76 0 86 7 C 39 0 92 C 43 0 91 C 78 0 99 C 89 0 100 C 49 0 98 C 51 0 100
Z16932 C 17236526 C T T 34 0 100 nH y 7 - - T 65 0 100 nH y 7 - T 60 0 95 nH y 7 - - T 39 0 100 T nH 7 T 24 0 100 nH 7 t 25 0 84 - T 32 0 97 T 42 0 98 T 46 0 98 T 50 0 100 T 23 0 94 T 27 0 100
Z16933 C 17438536 G C C 25 0 100 nH y 7 - - C 24 0 100 nH y 7 - C 26 0 100 nH y 7 - - C 26 0 100 C nH 7 C 15 0 100 nH 7 t? 1 0 - C 19 0 100 C 25 0 96 C 25 0 100 C 21 0 100 C 23 0 100 C 30 0 100
Z16934 C 17448751 G C C 16 0 100 nH y 7 - - C 19 0 100 nH y 7 - C 21 0 100 nH y 7 P - C 28 0 100 C nH 7 c 5 0 100 - C 15 0 87 - C 17 0 100 C 22 0 95 C 22 0 100 C 35 0 100 c 2 0 100 C 13 0 100
Z16935 C 17612482 C T T 46 0 95 nH y 7 - - T 145 0 97 nH y 7 - T 91 0 100 nH y 7 P - T 91 0 99 T nH 7 T 44 0 100 nH 7 T 46 0 89 7 T 31 0 97 T 61 0 97 T 77 0 100 T 64 0 98 T 103 0 99 T 60 0 100
S20749 C 18171989 C T T 40 0 95 nH y 7 - - T 30 0 97 nH y 7 - T 36 0 100 nH y 7 - - T 69 0 100 T nH 7 T 41 0 100 nH 7 t 35 0 74 - T 48 0 100 T 57 0 100 T 63 0 98 T 75 0 100 T 28 0 100 T 49 0 96
Z16936 C 19094859 T C C 26 0 100 nH y 7 - - C 61 0 97 nH y 7 - C 57 0 100 nH y 7 - - C 60 0 98 C nH 7 C 19 0 89 nH 7 C 22 0 91 7 C 37 0 97 C 51 0 100 C 35 0 100 C 53 0 92 C 15 0 100 C 38 0 100
Z16937 C 19200522 G T T 71 0 99 nH - 7 - - T 109 0 97 nH - 7 - T 97 0 98 nH y 7 P - T 83 0 100 T nH 7 T 50 0 100 nH 7 t 101 0 85 7 T 64 0 98 T 112 0 100 T 87 0 100 T 96 0 100 T 62 0 89 T 63 0 100
Z16938 C 19548026 G A A 38 0 97 nH - 7 - - A 103 0 97 nH - 7 - A 52 0 100 nH y 7 P - A 77 0 100 A nH 7 A 33 0 100 nH 7 a 50 0 84 7 A 50 0 100 A 71 0 100 A 58 0 95 A 69 0 97 A 36 0 100 A 59 0 98
Z16939 C 21810487 A G G 69 0 99 nH y 7 - - G 84 0 98 nH y 7 - G 75 0 100 nH y 7 - - G 63 0 98 G nH 7 G 70 0 100 nH 7 G 110 0 85 7 G 90 0 90 G 102 0 98 G 107 0 93 G 115 0 99 G 66 0 98 G 84 0 100
Z16942 C 23130578 T A A 50 0 96 nH y 7 - - A 38 0 100 nH y 7 - A 55 0 98 nH y 7 - - A 45 0 100 A nH 7 A 22 0 100 nH 7 a 53 0 75 - A 27 0 100 A 56 0 98 A 58 0 98 A 57 0 95 A 14 0 100 A 52 0 98
Z17660 C 8877028 G C C 13 0 100 nH y 3 p - C 12 0 100 c 1P nH y 3 p c 4 0 100 c 1P -! y? - p - C 16 0 100 c 1P C nH 3 c 6 0 100 - - - ?c 13 0 69 - c 12 0 83 C 14 0 100 c 23 0 83 C 20 0 100 c 7 0 100 c 8 0 100
FGC19531 csv had both Novel & Known!; AW had P3c 6643803 C T t 8 0 100 - kH - - P - t 8 0 100 kH - - P t 9 0 100 nH Y 2 P - T 16 0 100 t 1P T nH 2 T 15 0 100 nH 2 t 5 0 80 - T 14 0 100 T 13 0 100 T 13 0 100 T 21 0 100 t 5 6 0 T 11 0 100
FGC19536 c 17576040 G C c 7 0 86 - - - c 2 0 100 - - - c 7 0 100 - - C 12 0 100 c 1P C nH 1 c 6 0 100 - c 7 0 57 - C 11 0 100 c 9 0 100 c 9 0 100 c 9 0 100 c 2 0 100 C 12 0 100
Z16940 n 22470652 T T C 53 0 96 C 1P nH y 7 - - C 52 0 88 nH y 7 - C 72 0 89 nH y 7 - - C 44 0 89 C nH 7 C 53 0 100 c 1P nH 7 C 59 0 86 7 C 36 0 97 C 35 0 86 C 39 0 87 C 55 0 93 C 119 0 96 C 26 0 88
Z16941 n 22470900 C G G 44 0 98 nH y 7 - - g 45 0 84 nH y 7 - G 18 0 100 nH y 7 - - G 31 0 97 G nH 7 G 62 0 98 G 1P nH 7 G 61 0 89 7 G 35 0 91 G 49 0 100 G 47 0 94 G 41 0 100 G 96 0 99 G 38 0 100
L561 AW has P3 FGC16164 is 2888667-672n 2888667-70 C C c 6 2 100 - c 2 4 100 - - 0 13 0 - - m c 2 14 100 - c 6 3 100 c 0P C 15 8 100 c 6 18 100 C 11 8 100 C 14 10 100 C 14 10 100 c 9 3 100 c 5 15 100
Z16947 Indel? n 18680368 T TA T 50 0 100 - - ? 3 - T 90 83 96 TA 1P 3 T 49 47 100 - T 85 0 100 - - T 31 0 100 - t 7 0 100 - T 54 0 100 T 84 0 98 T 60 0 100 T 72 0 100 T 31 0 100 T 59 0 100
Z16948 Indel? n 21613125 TA T T 49 0 100 - - ? 3 - - T 90 0 100 T 1P 3 T 90 47 100 - - - T 79 0 100 - - - T 37 0 97 - T 41 0 100 - T 65 0 100 T 79 0 100 T 86 0 100 T 88 0 100 T 39 0 100 T 56 0 100
Z16949 MW: long indel n 7933047 T TAA
CA
T 46 39 100 ta 1P - y 7 - - T 76 75 95 TA 1P - y 7 - T 38 39 100 TA 1P - y 7 - - T 47 47 100 TA 1P - - 7 T 54 47 100 ta 1P - 7 T 94 68 100 7 T 76 68 100 T 113 100 100 T 124 ### 100 T 125 108 100 T 48 45 100 T 94 0 88
MW: short indel n 16344311 TT T T 39 0 100 t 1P - y 3 - - T 110 0 95 T 1P - y 3 - T 10 0 100 t 1P - y - - - T 77 0 100 t 1P - - 3 T 30 1 100 - - - T 23 0 100 - T 34 0 100 T 35 2 100 T 31 0 100 T 39 0 100 T 47 0 100 T 26 0 100
AW has P3 MW: short indel n 16344316 TCT T T 39 0 100 t 1P - y 3 -/a - T 106 0 93 T 1P - y 3 -/a T 73 0 100 t 1P - y;y? - -/a - T 77 0 100 t 1P - - 3 t 5 25 100 - - - t 7 15 100 - t 3 31 100 t 6 30 100 t 5 26 100 T 8 0 29 t 8 39 100 T 5 0 21
?covered by 18680368? n 18680369 A AA A 52 45 100 2 A 89 86 98 - A 48 47 100 - A 86 78 100 2 A 33 29 100 2 a 7 5 100 - A 55 38 100 A 65 53 100 A 64 56 100 A 79 67 100 A 31 28 100 A 61 55 100
Indel; AW had P1MW: homopolymer n 21613126 AA A A 49 1 100 a 1P - Y 2 - A 10 69 100 - - - a 4 86 100 - - A 79 0 100 - 2 A 37 0 100 2 A 41 0 100 - A 65 0 100 A 79 1 100 A 86 0 100 A 89 1 100 A 39 2 100 A 56 0 100
AW had P2 MW: long indel n 14750280 ACCA
GTGT
A A 13 0 100 - A 16 0 100 a 1P Y - 2 - a 4 0 100 - - A 10 0 100 - - A 22 0 100 2 a 4 0 100 - A 12 0 100 A 17 0 100 A 15 0 100 A 15 0 100 a 9 0 100 A 13 0 100
FGC16164 Indel; AW had P3MW: long indel n 2888666 CCTG
G
C c 8 0 100 - - - I -del c 7 0 100 - - -I -del C 13 0 100 Y 1 I -del C 16 0 100 - - c 9 0 100 - C 23 0 96 1 C 24 0 100 C 20 0 100 C 24 0 100 C 18 0 100 C 12 0 100 C 20 0 100
Indel? MW: homopolymer n 6347814 G GAG
AA
g? 16 0 63 - - G 115 89 93 GA 0/1R - G 78 75 95 1 - G 117 4 88 - - - - g 9 1 67 - g 2 0 100 - g 5 0 60 g 7 0 100 g 7 1 88 G 13 1 85 g 12 0 75 G 14 0 86
MW: long indel n 13550973 TTAG T T 72 0 100 - T 240 0 99 - T 150 0 100 - T 79 0 99 - T 24 0 100 - T 17 0 100 1 T 23 0 100 T 45 0 100 T 70 0 100 T 57 0 100 T 82 0 100 T 43 0 100
MW: homopolymer n 14101345 CCTT A c 6 0 83 - C 43 0 98 1 C 31 0 100 - C 36 0 97 - c 3 0 100 - c 2 0 100 - c 6 0 100 c 5 0 100 c 4 0 100 c 3 0 100 c 6 0 100 c 7 0 100
AW has P2 MW: homopolymer n 14379561 T TGA
TA
T 21 0 100 - T 34 31 100 tg 1P n - 1 - T 26 0 100 - T 19 0 100 - - T 40 0 100 - t 8 0 100 - T 33 0 94 T 23 0 100 T 27 19 100 T 31 0 100 T 59 0 98 T 27 0 100
MW: homopolymer n 15305844 A AAT A 16 8 100 - A 35 29 89 - a 6 2 100 - A 16 9 100 - a 6 6 100 1 a 2 2 100 - A 13 11 100 A 16 15 100 A 27 19 100 A 28 17 100 A 32 24 100 a 5 5 100
Indel? c 16344315 TTCT T T 39 0 100 - T 106 0 91 - T 71 0 100 - T 77 0 100 - T 30 0 100 1 T 22 0 100 1 T 34 0 100 T 35 2 100 T 31 0 100 T 38 0 100 T 47 0 100 T 26 0 100
MW: homopolymer n 18585796 C CAA C 33 0 100 - C 147 138 100 1 C 78 0 100 - C 64 0 100 - C 11 0 100 - c 2 0 100 - C 38 0 100 C 37 0 89 C 45 0 100 C 50 0 100 C 15 0 100 C 38 0 100
MW: homopolymer n 2746565 AA A A 55 0 100 a 1P - - 2 - A 17 0 100 - a? 1 53 100 - A 67 0 100 - 2 A 25 0 100 - A 31 2 100 - A 32 0 100 A 37 0 100 A 70 0 100 A 49 0 100 A 30 0 100 A 45 0 100
Intermediate SNPs
FCG34569 2,3,8,10 1,4,5,6,7,9,11,12 I 21368012 G A A 85 0 100 A 1P nH Y 2 - G 147 0 90 - - - - G 82 0 100 A 1P - - - - A 80 0 99 A 1P A nH 2 A 48 0 98 A 1P nH 2 A 32 0 94 2 A 51 0 100 G 57 0 100 A 67 0 100 G 92 0 100 A 87 0 99 A 59 0 100
PF506 3,4,5,7,8,9 1,2,10 n 13323493 A C c 24 0 79 c 0/1R U - - m c 5 0 80 c 1R U - - a 4 0 100 a 0R kH - - - a 8 0 100 - 0R ? k?u a 4 0 75 a 0R - ?a 10 0 60 ?a 16 0 56 a 7 0 71 c 40 0 80 ?a 6 0 67 ?c 12 0 50
3,4,5,7,8,9,11,12 csv:P1 1 n 13302072 C T T 42 0 91 t 1P nH - - - t? 33 0 61 - - - - C 13 0 100 - - - 1 - C 21 0 100 - - - C 16 0 100 - - - ?t 36 0 56 - C 20 0 100 c 29 0 72 C 40 0 100 ?c 46 0 57 c 10 0 100 c 17 0 76
PF6812 1,2 3,4,5,10,11,1
2
n 10013029 T G T 14 0 57 t 0R - - - t 9 56 100 t 0R U - g 7 0 71 g 0/1R kU - - m g 35 0 51 ? k?u G 22 0 77 g 0/1R ?g 58 0 64 ?g 37 0 62 ?t 27 0 56 gt 63 0 51 g 48 0 63 g 7 0 71 g 36 0 69
4,11 csv:P1 1,9 n 13317375 A T T 26 0 92 t 1P H - 1 - t? 33 0 61 t 0/1R - t? 16 0 69 - * a? 26 0 54 - - - a? 4 0 100 - t? 3 0 100 - ?t 15 0 58 ?t 28 0 64 t 18 0 78 ?t 31 0 58 a 2 0 100 ?t 20 0 55
CTS11841 2,6,10,11 8,9,12 n 23311208 C T t? 3 0 67 c 5 0 96 c? 31 0 58 c? 36 0 53 t? 2 0 100 c 2 0 100 ct 4 0 50 t 1 0 - t 1 0 - c 6 0 83 c 1 0 100 t 5 0 80
PF682 1,6,7,9 2,3,4,5,10,11 n 14624294 C T c 6 0 83 - t 6 0 67 - t 2 0 100 - - s t 3 0 100 t P1 T k+m t 6 0 83 c 4 0 75 c 1 70 - - c 6 0 83 t 9 0 89 t 1 0 100 ?ct 2 0 50
PF496 3,4,5,7,9,11 1,6 n! 13297909 T G g 71 0 73 kU - - m t? 21 0 67 kU T 15 0 100 kU - T 15 0 93 ? k?u T 21 0 100 g 85 0 65 T 29 0 97 ?t 52 0 54 T 44 0 91 ?g 72 0 58 T 13 0 100 ?t 48 0 52
? Indel 6 n 13700173 C ? t 68 12 81 - - - - a 1118 34 81 - - - - A 364 9 89 T 1R nM - 1 - - t 127 31 83 T 1R - - T 63 3 88 T 1R - - C 44 8 91 - ?c 18 11 67 t 7 3 86 ?t 18 12 67 c 12 0 75 ?c 30 5 60 ?c 17 3 53
Block P1: Private SNPsfor 22874
AW has P1 P1 17319595 G A A 23 0 87 a 1P nH Y 1 - G 24 0 100 - - - - G 27 0 100 - - - - G 58 0 100 - - - G 24 0 100 - - G 78 0 100 - G 43 0 100 G 63 0 100 G 63 0 100 G 109 0 100 G 23 0 100 G 53 0 100
AW has P1 P1 19263733 T A A 39 0 97 A 1P nH Y 1 - C96 t? 60 0 100 - - - - T 39 0 100 - - - - T 59 0 100 - - - T 40 0 100 - - T 28 0 100 - T 66 0 100 T 63 0 100 T 63 0 100 T 90 0 100 T 29 0 100 T 62 0 100
AW has P1 P1 21782548 T G G 79 0 100 G nH Y 1 - C91 T 174 0 100 - - - - T 93 0 100 - - - - T 97 0 100 - - - T 35 0 97 - - T 27 0 100 - T 38 0 100 T 68 0 100 T 60 0 100 T 68 0 100 C 74 0 100 T 61 0 100
PF6729 p1 10022033 A g g 7 0 86 kU - - m a? 8 0 85 kU a 4 0 100 kU - a 11 0 64 0 ? k?u ?a 6 0 83 ?a 5 0 60 A 12 0 100 a 6 0 100 A 10 0 80 ?a 8 0 50 a 8 0 50 a 7 0 86
PF6730 p1 10022039 A g g 7 0 86 kU - - m a? 6 0 67 kU a 4 0 100 kU - a 10 0 60 ? k?u ?a 6 0 83 ?a 5 0 60 A 12 0 100 a 5 0 80 a 9 0 89 ?a 8 0 50 a 8 0 50 a 7 0 86
p1 14769164 T g g 4 0 100 - - - - C100 t 6 0 100 t 3 0 100 - t 5 0 100 - - t? 1 0 - t 5 0 100 t 9 0 100 T 11 0 100 t 8 0 100 - T 11 0 100
CTS6916 AW has P1 p1 17193400 C a a 2 0 100 a - Y 1 - M100 c 3 0 100 (c) 0P - c 2 0 100 - - 0 - - C 15 0 100 - C 15 0 100 - C 59 0 100 C 48 0 100 C 78 0 100 C 88 0 100 c 4 0 100 C 35 0 100
S25968 p1 23900831 T c c 4 0 75 - - - m t 5 0 100 - t? 4 0 75 - t 8 0 89 - t? 1 0 - t 5 0 80 t 8 0 63 t 12 0 80 T 10 0 90 t 7 0 71 t 8 0 75
PF3498 Matches! p1 8094631 G a a 2 0 100 - - - g 3 0 100 - G 40 0 100 G 20 0 100 G 68 0 99 G 63 0 100 g 2 0 100 G 16 0 100
csv implies P11 p1 22257324 G t g 4 0 100 t 5 0 100 t 3 0 100 t 2 0 100 T 14 0 100 t 4 0 100 t 4 0 100 t 6 0 100 t 101 0 100 T 10 0 100 T 11 0 100 t 4 0 100
Block P2: Private SNPsfor 311268
AW has P2 P2 8531427 C T C 63 0 100 - - - - T 47 0 98 T 1P nH Y 1 - C 44 0 100 - - - - C 47 0 100 - - - C 69 0 100 - - C 72 0 100 - C 70 0 100 C 65 0 100 C 90 0 100 C 111 0 98 C 70 0 100 C 55 0 100
AW has P2 P2 16561158 A G A 17 0 100 - - - G 34 0 100 G 1P nH Y 1 - A 23 0 100 - - - A 41 0 100 - - - A 14 0 100 - - A 16 0 100 - A 22 0 100 A 34 0 100 A 37 0 100 A 33 0 100 A 13 0 100 A 32 0 100
AW has P2 P2 21515424 T A T 45 0 100 - A 59 0 98 1 T 49 0 100 - T 77 0 99 - T 42 0 100 - T 45 0 100 - T 53 0 100 T 51 0 100 T 54 0 100 T 59 0 100 T 40 0 100 T 74 0 100
AW has P2 P2 21950915 G T G 47 0 100 - - - T 63 0 94 T 1P nH Y 1 - G 61 0 100 - - - G 54 0 100 - - - G 29 0 100 - - G 42 0 100 - G 79 0 100 G 54 0 100 G 54 0 100 G 69 0 100 G 42 0 100 G 54 0 100
DW had above L21 n 13833214 T A A 41 0 85 - - t? 80 0 45 - - - A 15 0 93 - 1 - A 100 0 91 A R1 - - A 49 0 88 - a 63 0 79 - a 37 0 73 a 44 0 84 ?a 46 0 67 A 121 0 92 A 28 0 96 A 74 0 92
n 17729336 C C? c 6 0 100 - a/c 2 0 50 a 0/1P n - 1 - c 6 0 100 - c 3 0 100 - 0P - - c 4 0 100 - C 20 0 95 - C 14 0 100 C 17 0 100 C 31 0 100 C 12 0 100 c 2 0 100 C 12 0 100
CTS12439 n 28587358 T G c 123 0 72 U - - m g? 151 0 57 c 0/1R - c? 100 0 65 kU - - c? 77 0 65 ? k?u ?c 67 0 55 c 43 0 77 C 112 0 100 C 165 0 100 C 157 0 64 ?c 162 0 68 c 102 0 75 c 145 0 74
not on ybrowse n 13801126 A G c 1748 10 81 C 0/1R U - - m G 2281 0 89 (A) 0R U - - c 1144 1 76 - - m c 1658 7 71 C 0/1R ? knu ?c 1083 28 57 ?c 1676 53 63 ?c 853 17 61 ?c 1118 36 63 ?c 1037 32 56 ?c 2406 56 64 ?c 517 19 60 ?c 1554 25 61
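The call notation used in the table above (capital, lower case, "?") can be sketched in code. This is a minimal illustration, not the author's spreadsheet logic: the function name is hypothetical, the thresholds are taken from the key to the table, and the treatment of boundary values (exactly 10 reads, exactly 85%) and of calls below 70% is an assumption based on how a few rows appear to be marked.

```python
def classify_call(base: str, reads: int, pct_alt: float) -> str:
    """Return the call notation for one kit at one position.

    Thresholds follow the key to the table. Boundary handling and the
    "?x" marker for calls below 70% are assumptions made for this sketch.
    """
    if reads < 1:
        return "-"                    # no reads, so no call
    if reads == 1:
        return base.lower() + "?"     # a single read is flagged with "?"
    if pct_alt < 70:
        return "?" + base.lower()     # assumed marker for weak consistency
    if reads >= 10 and pct_alt >= 85:
        return base.upper()           # "good" call, shown as a capital
    return base.lower()               # 70-84% consistent, or only 2-9 reads


# Values taken from rows of the table (base, no. of reads, alt./reads %)
for args in [("G", 59, 100), ("c", 3, 100), ("c", 1, 100), ("a", 7, 57)]:
    print(args, "->", classify_call(*args))
```

Some rows in the table do not follow these rules exactly (for example, capitals at 84%), so the sketch should be read as an approximation of the notation rather than a reproduction of it.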
L555 Phylogenetic Tree
based on “DIY” BAM analysis
SNPs Indels & homopolymers
L555 S393 Z16931 Z16935 Z16938 Z16946 FGC16164 Z16949 14101345 16344311 18680369
L557 S394 Z16932 S20749 Z16939 Z17660 L561 2746565 14379561 16344315 21613126
Z16929 Z16933 Z16936 Z16942 FGC19531 Z16947 6347814 14750280 16344316
Z16930 Z16934 Z16937 Z16945 FGC19536 Z16948 13550973 15305844 18585796
FCG34569 - 21368012GA
6966393AG 17319595GA CTS11273 FGC4341 9166468GA 7583420GA 16630774GA 7581395GT FGC19532 14209909CT 8531427CT 17417800AC
7244870AG 19263733TA 15093112GA 15218377TA 10007460CT 14768577CT 22487613GT FGC19534 16967721CA 16561158AG 20809987AC
7940600GA 21782548TG 21519299GA 19166468GA 19048311TC ? PR1489 FGC19535 17371426CT 21515424TA 23427058GA
8311955CA ?CTS6916 22479673GC 23804663GA 19201889CG FGC19537 21030091GA 21950915GT
16737596AT ?PF3498 ?13294119GA 24479734TC FGC19538 22164909TC
17357906TA ?PF6729 ?3715806TG FGC19539 ?16505988CT
17851999CG ?PF6730 ?13550958TG ?10064260CT
19262306GC ?S25968 ?13571571CT
21306828GA ?14769164TG ?13726006CA
22461683GT ?22257324GG ?16275572CA
280599 22874 N126337 23026 54774 280156 226426 160045 65048 364399 311268 87191
Irvin Irvine Irvin Irvin Irving Ervin Irving Irwin Erwin Ervin Cunningham Irving
12 - B14 1 - BX(I) 6 - BA 5 - B29 7 - BB 9 - B23 4 - B17 11 - B9 3 - B10 8 - BE 2 - BX(C) 10 - BD
?23898645TC
15542414CT
Deriving TMRCAs from BigY tests
TMRCAs derived from SNPs are easy to calculate:
TMRCA in years = no. of SNPs x av. no. of years per SNP
BUT:
• all TMRCAs are probabilities
• TMRCAs from a single test have wide confidence limits;
confidence improved if several TMRCAs can be averaged
• difficulties specific to SNP-based TMRCAs:
- “av. years per SNP” depends on type of NGS test
(FTDNA use “av. 120 years per SNP”);
- no uniformity on what constitutes a relevant SNP, so I use:
TMRCA in years = (∑(probable SNPs + 0.5 × possible SNPs) / n) × 120, where n is the number of tests averaged
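The formula above can be sketched as a short function. The function names and the example SNP counts are illustrative assumptions; only the 0.5 weighting for possible SNPs and the figure of 120 years per SNP come from the slide.

```python
YEARS_PER_SNP = 120  # average years per SNP quoted on the slide (FTDNA)


def weighted_snps(probable: int, possible: int) -> float:
    """Count each probable SNP as 1 and each possible SNP as 0.5."""
    return probable + 0.5 * possible


def tmrca_years(tests, years_per_snp=YEARS_PER_SNP):
    """Average the weighted SNP counts of n tests, then convert to years.

    tests: list of (probable, possible) pairs, one pair per BigY test.
    """
    if not tests:
        raise ValueError("at least one test is needed")
    total = sum(weighted_snps(p, q) for p, q in tests)
    return total / len(tests) * years_per_snp


# Illustrative counts only: one test with 4 probable and 7 possible SNPs,
# then the same test averaged with a second test of 3 probable SNPs.
single = tmrca_years([(4, 7)])
pair = tmrca_years([(4, 7), (3, 0)])
print(single, pair)
```

Averaging over several tests is the step the slide recommends for narrowing the confidence limits; the function returns a duration in years, not a calendar date.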
Irwin project: L555 TMRCAs
(1): Age of L555 block

Stage                                                  No. of SNPs   Duration @120 years per SNP   Age (approx.)
R-L21                                                                                              BC1700
R-L21 to L555 block (DF13, Z251, Z16943, Z16944)       5             600 years
L555 block/bottleneck (L555 + 19 other probable SNPs)  20            2400 years
Border Irwins starburst                                av. 5.5       650 years                     AD1300

L21 starburst: DF21, DF41, DF49, FCG5494, FCG11134, Z251, L1335, S1026, Z1026, ZZ10
The L555 block falls in the pre-surname era; the Border Irwins starburst falls in the surname era.

Kit       Surname      Sub-group   Shared probable   Further probables   Possibles   Say, SNPs
280599    Irvin        12 - B14    1                 10                  -           11
22874     Irvine       1 - BX      1                 3                   7           7.5
N126337   Irvin        6 - BA      1                 2                   5           5.5
23026     Irvin        5 - B29     1                 4                   1           5.5
54774     Irving       7 - BB      1                 4                   -           5
280156    Ervin        9 - B23     1                 4                   -           5
226426    Irving       4 - B17     1                 2                   1           3.5
160045    Irvin        11 - B9     1                 1                   -           2
65048     Erwin        3 - B10     -                 6                   4           8
364399    Ervin        8 - BE      -                 5                   1           5.5
311268    Cunningham   2 - BX      -                 4                   -           4
87191     Irving       10 - BD     -                 3                   -           3
Irwin project: L555 TMRCAs
(2): Ages of individual members

Kit       Surname      Sub-group   Say, SNPs   "DIY" BigY TMRCA   Earliest genealogy
280599    Irvin        12 - B14    11          c.630              c.1750
22874     Irvine       1 - BX      7.5         c.1050             c.1780
N126337   Irvin        6 - BA      5.5         c.1230             c.1700
23026     Irvin        5 - B29     5.5         c.1230             c.1650
54774     Irving       7 - BB      5           c.1350             c.1500
280156    Ervin        9 - B23     5           c.1350             c.1650
226426    Irving       4 - B17     3.5         c.1530             c.1750
160045    Irvin        11 - B9     2           c.1700             c.1700
65048     Erwin        3 - B10     8           c.750              c.1700
364399    Ervin        8 - BE      5.5         c.1300             c.1600
311268    Cunningham   2 - BX      4           c.1350             c.1850
87191     Irving       10 - BD     3           c.1600             c.1565

STR TMRCAs, 2011 (values for 10 of the 12 kits, in slide order): c.750 c.1800 c.1700 BC200 c.1200 c.1700 c.1000 c.1050 c.1450 c.1750
Irwin project: L555 TMRCAs
(3): Age of L555 block by other SNP criteria

Criteria compared (all at 120 years per SNP):
A: TMRCA = (∑(probable SNPs + 0.5 × possible SNPs) / 12) × 120
B: TMRCAs with SNPs as per Williamson's Big Tree
C: TMRCA = (∑(probable SNPs) / 12) × 120
D: TMRCAs with SNPs as per ISOGG Y Tree criteria

Kit       Surname      Sub-group   A           B           C           D
280599    Irvin        12 - B14    11          11          11          11
22874     Irvine       1 - BX      7.5         5           4           4
N126337   Irvin        6 - BA      5.5         4.5         3           4
23026     Irvin        5 - B29     5.5         6           6           5
54774     Irving       7 - BB      5           5           5           1
280156    Ervin        9 - B23     5           5           5           5
226426    Irving       4 - B17     3.5         3           3           2
160045    Irvin        11 - B9     2           2           2           2
65048     Erwin        3 - B10     8           7           6           6
364399    Ervin        8 - BE      5.5         5           5           5
311268    Cunningham   2 - BX      4           4           4           4
87191     Irving       10 - BD     3           4           3           3
Average (SNPs)                     5.5         5.1         4.8         4.3
Duration                           650 years   615 years   570 years   520 years
Age (approx.)                      AD 1300     AD 1335     AD 1380     AD 1430
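The four averages on this slide can be checked with a few lines of code. This is a sketch under stated assumptions: the per-kit SNP counts are copied from the slide, 120 years per SNP is the slide's figure, and the reference year of 1950 is not stated anywhere in the deck but is inferred here from the pairing of "650 years" with "AD 1300".

```python
YEARS_PER_SNP = 120
REFERENCE_YEAR = 1950  # assumption: inferred from "650 years" ~ "AD 1300"

# Per-kit SNP counts for the 12 BigY kits, in slide order
snp_counts = {
    "Probable + 0.5 possible": [11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4, 3],
    "Williamson's Big Tree": [11, 5, 4.5, 6, 5, 5, 3, 2, 7, 5, 4, 4],
    "Probable only": [11, 4, 3, 6, 5, 5, 3, 2, 6, 5, 4, 3],
    "ISOGG Y Tree": [11, 4, 4, 5, 1, 5, 2, 2, 6, 5, 4, 3],
}


def summarise(counts):
    """Return (average SNPs, duration in years, approximate calendar year)."""
    average = sum(counts) / len(counts)
    years = average * YEARS_PER_SNP
    return average, years, REFERENCE_YEAR - years


results = {name: summarise(counts) for name, counts in snp_counts.items()}
for name, (average, years, year_ad) in results.items():
    print(f"{name}: av. {average:.2f} SNPs, {years:.0f} years, c. AD {year_ad:.0f}")
```

The spread between the four results comes entirely from which SNPs each criterion counts, since the years-per-SNP ratio is held fixed.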
Criteria for BigY SNPs
Columns: Criterion; FTDNA (csv); FGC (Analysis); YFull (Analysis); Williamson (Big Tree); D. Wright; J. Irvine ("DIY"); ISOGG (Y Tree)
Min. no. of reads/calls 10 2 1-2* 10 10 4
Max. no. of reads none none 320?
Min. % consistent reads 99/95/40/10 85 85/70 100/95
Stability within Haplogroup ) "shared excluded no excl. if known
Stability within sub-clade ) SNPs" no important
22216800-22512940 unstable region excluded excluded included
Other "Unreliable" regions included excluded excluded
Indels? included excluded excluded excluded
Homopolymers, recLOHs, excluded N/A
Min. "Quality" (FTDNA) yes 500 N/A N/A
"Confidence" (FTDNA) yes N/A N/A
Max. locations on ISOGG tree N/A 3
Min. Mapping quality average (ISOGG) N/A 10%
Min. extent of base-pairs (ISOGG) N/A 20
Max. segment, repeated alleles (ISOGG) N/A 5 alleles
Av. years per SNP 120 118 - 120 120 -
*: depending on region
NB The criteria listed are as known to me 11 Nov. 2015; all are evolving and subject to change.
Clearly there are both substantive differences and confusion over terminology & definitions.
At least in theory it is clearly inappropriate:
(1) to seek TMRCAs without clear understanding of how relevant "SNP"s are defined, and
(2) to use the same "av. years per SNP" ratio for differing definitions of "SNP".
The Irwin Surname tree

The Irwin Surname, showing the genetic and conventional genealogies of some of the project's 33 genetic families and of the Borders genetic family sub-groups (many details omitted). Bold indicates BigY test; a separate symbol indicates "Brick wall".

[Tree diagram: the branch links are not recoverable from the extracted text. The labels shown on the diagram are listed below.]

Time axis: BC2000; AD1200; 1300s / 1400s; 1500s; 1600s; 1700s / 1800s; Today
Diagram halves: Genetic genealogy; Paper trails
Haplogroup nodes, by level: P311; P312, U106; L21, DF27, S263; Z251, CTS4466, DF21, DF49, L176.2, S264; Z16943, Z21065, Y11277, DF23, Z262, DF96; Z16944, A541, Z16294, Z2961, SRY2627; L555 plus 20 other SNPs, A195, Z16281, M222; FCG34569, A88, A2427, A3955; A89, A2432, M7964
SNP counts below L555: 5 SNPs; 4 SNPs; 8 SNPs; 4 SNPs; 4 SNPs; 4 SNPs; 1-10 SNPs
Kits and sub-groups at that level: 364399; 87191; 65048; 22874; N126337; 54774; B9; B14; B17
Genetic families and sub-groups: BE, BD, B10, BX, BA, BB, B23, B29, IM1, IM2, NE2, PF, DA, O1, O2, NB1
Kit groups: 169056 + 4 others; 122282 + 7 others; 226426 + 48 others; 116495 + 2 others; 51216 + 3 others; 193093 + 9 others; 129415 + 3 others; 122282; 163590 + 3 others; 129415; 15606 A3093 3722 116495 1690651216; 75606 + 2 others; 65048 + 32 others; 22874 + 65 others; N126337 + 33 others; 87191 + 2 others; 13 others; 54774 + 4 others; 11 others; 169170; 364399 + 16 others
Named ancestors: William 1754-1830; James 1730-1799; Isaac 1781-1851; Washington 1783-1859; James fl.1534-67; Magnus 1655-170?; Criste fl.1460; Magnus fl.1470; Alexander 1754-1844; Charles 1738-; Alexander fl.1601; Edward 1707-1798; Eoin 1785-1841; Edward 1668-1708; Matthew 1697-; Edward 1669-; William fl.1506; James 1776-1833; James 1750-1810; Francis fl.1596; Thomas 1650-1722; John 1734-; John 1733-; William 1710-1763; William 1698-; David fl.1721; John fl.1662; William fl.1323; Alexander 1456-1527; Alexander 1527-1602; Edward 1590-
Family lines: Irvings of ? (several); Irvings of Dumfries; Irvines of Eskdale; Irvings of Bonshaw; Irving - NPE Bell (1); Irvines of Perthshire; Irwins of Munster (1); Irwins of Munster (2); Irving - NPE Elliot (2); Irvines of Drum; Irvines of Orkney (1); Irvines of Orkney (2)
Main findings relevant to Irwin project
• Steady growth over 10 years, now 392 STR test results (94% 37+ markers)
• Most participants reside in USA, & typify the Scotch-Irish-American
diaspora
• 40% claim Irish ancestry, but lack paper trails “across the pond”
• Tradition of single-origin Scottish surname refuted
• > 90% of all participants matched to a genetic family
• 34 genetic families identified, each unrelated to one another in surname era:
- 22 Scottish, 4 native Irish, 1 German, 1 African, 6 unknown (Scots ?)
• 13-26% of participants from NPEs
• Border Irwins genetic family is apparently the largest in any surname project:
- all 262 descended from a Dumfriesshire ancestor who fl. C14
- SNP L555 recognised by ISOGG, still unique to Border Irwins
- tentatively split into 15 sub-groups
- BigY is yielding further insights, but reliable TMRCAs elusive
Findings relevant to other surname projects
• Small surname projects can learn much from large projects
• Penetration ratios identify geographic bias
• Spelling of surname is often misleading
• FTDNA’s “Matches” pages give False Positives & False Negatives
• TMRCA tables using GDs are misleading
• TiP Scores avoid the many limitations of GDs
• NPEs should be included
• BigY: - a massive step forward
- handling of results is unnecessarily cumbersome
- comprehension of results is difficult & poorly explained
- BAM data essential for analysing SNP quality
- “starburst”/“bottleneck” phenomena need investigating
- need for improved understanding of SNP criteria
- individual TMRCAs unreliable: need SNP Pack back-up
Further reading
• www.dnastudy.clanirwin.org
• www.jogg.info/62/files/Irvine.pdf
• https://dl.dropboxusercontent.com/u/14028750/Testing%20and%20Analysing%20Big-Y.pdf (use of BAM IGV Viewer)
• www.borderreivers.co.uk
• Irving, JB 1907 The Book of the Irvings
• Maxwell-Irving, AMT 1968 The Irvings of Bonshaw
• Mackintosh, D 1999 The Irvines of Drum and their Cadet Lines 1300-1750
• Tough, DLW 1928 The Last Years of a Frontier
• MacDonald Fraser, G 1971 The Steel Bonnets
• Perceval-Maxwell, M 1973 The Scottish Migration to Ulster in the Reign of James I
• Dickson, RJ 1976 Ulster Emigration to Colonial America, 1718-75
• Fischer, DH 1989 Albion’s Seed
• Fitzgerald, P 2008 Migration in Irish History, 1607-2007
Acknowledgements
• All our 392 participants;
• The many participants, most preferring anonymity,
who have donated to our General Fund, helped
with our website, and guided & encouraged me;
• Fellow admins. John Cleary, Maurice Gleeson,
Kent Irvin, Peter Irvine, Debbie Kennett, Ralph
Taylor, Dennis Wright ;
• Catherine Borges, for ISOGG;
• Bennett Greenspan and his team at FTDNA;
• My patient wife.

Y DNA Surname Projects - Some Fresh Ideas

  • 7. The traditional genealogy of the Irwins
  • 8. Irwin project: traditionally a single-origin Scottish surname
      [Timeline figure, 1200-1800. Places shown: Irvine, Ayrshire; Eskdale, Dumfriesshire; Bonshaw, Dumfriesshire; Drum, Aberdeenshire; Orkney; Dumfries; Castle Irvine, Co. Fermanagh; Perth; Shetland]
  • 10. Irwin project: Geographical “penetration”
                                Participant's    All Irwins etc.    Penetration
                                place of         in world today*    of project**
                                residence
      Project size/Population   392              300,000            0.12%
      USA                       77%              61%                0.13%
      Canada                    6%               12%                0.05%
      Australia, New Zealand    6%               9%                 0.07%
      England & Wales           5%               10%                0.05%
      Scotland                  5%               4%                 0.12%
      Ireland (NI & Eire)       1%               3%                 0.03%
      Germany, Netherlands      -                1%                 0.00%
      Unknown, other            -                -                  -
      *  Source: www.worldnames.publicprofiler.org/
      ** Definition: www.jogg.info/62/files/Irvine.pdf
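As a rough illustration of how the residence shares in the penetration table can flag geographic bias, the sketch below divides each region's share of project participants by its share of all Irwins worldwide. This relative ratio is an assumption made purely for illustration; it is not the penetration definition given in the JoGG paper cited on the slide, and the names `shares` and `representation_ratio` are hypothetical.

```python
# Percentages copied from the penetration slide:
# (share of project participants, share of all Irwins etc. in the world today)
shares = {
    "USA": (77, 61),
    "Canada": (6, 12),
    "Australia, New Zealand": (6, 9),
    "England & Wales": (5, 10),
    "Scotland": (5, 4),
    "Ireland (NI & Eire)": (1, 3),
}


def representation_ratio(participant_pct, population_pct):
    """Hypothetical helper: a ratio above 1 means the region is
    over-represented in the project, below 1 under-represented."""
    return participant_pct / population_pct


ratios = {region: representation_ratio(*pcts) for region, pcts in shares.items()}

# List regions from most over-represented to most under-represented.
for region, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = "over" if ratio > 1 else "under"
    print(f"{region:24s} {ratio:4.2f}  ({label}-represented)")
```

On these figures the USA and Scotland come out above 1 and Ireland lowest, which is consistent with the slide's point that penetration varies markedly by country of residence.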
  • 11. Irwin project: Origins in UK counties, if known, of participants’ earliest confirmed paternal ancestors
      [Map figure. Number of participants by county or province:]
      - Scotland: Dumfriesshire 14; Ayrshire 1; Aberdeenshire 7; Perthshire 4; Orkney 9; Shetland 4
      - England: Cumberland 4; Northumberland, Durham 7
      - Ireland: Antrim 18; Derry 10; Tyrone 15; Down 2; Armagh 3; Fermanagh 14; Monaghan 1; Cavan 1; Donegal 2; Connaught 3; Leinster 5; Munster 5
      [Places marked on map: Irvine; Dumfries; Bonshaw; Eskdale; Castle Irvine]
  • 12. The Scotch-Irish
      The term Scots-Irish, or Ulster Scots, refers to Scots who migrated to Ireland, typically in the 17th century from SW Scotland to Ulster.
      • Many Scots took part in the Plantation of Ulster c.1610, either as a landowning Undertaker, or as a tenant. Each Undertaker undertook to keep 40 loyal tenants.
      • Other settlers included Border Reivers who had been banished.
      • Most Scots-Irish were Presbyterians.
      • Very few Scots-Irish have pedigrees back to Scotland (unless their ancestors were Undertakers).
      The American term Scotch-Irish refers to descendants of these Ulster settlers who in turn migrated to America, typically in the 18th century to the Appalachian piedmont (PA-GA).
      • Few Scotch-Irish have pedigrees back to Ireland.
  • 13. Irwin project: Earliest confirmed paternal ancestors
      Spelling              Birth date
      Irwin      32%        1900s      4%
      Irvine     16%        1800s      29%
      Erwin      13%        1700s      48%
      Ervin      8%         1600s      3%
      Irving     8%         1500s      1%
      Irvin      8%         1400s      1%
      Arnwine    1%         1300s      0%
      Urwin      1%         1200s      1%
      Other      13%        Unknown    13%
  • 14. Irwin project: Marker resolution (% participants)
      No. of markers    2010    2015
      12                13%     5%
      25                6%      1%
      37                48%     54%
      67                33%     26%
      111               -       14%
      37 or more        81%     94%
  • 15. Irwin project: Results examples (1)
      Panels: markers 1-12 | markers 13-25
      ID     Haplo 393 390 394 391 385 385 426 388 439 389 392 389 | 458 459 459 455 454 447 437 448 449 464 464 464 464
                                   a   b               -1      -2  |     a   b                           a   b   c   d
      Cluster (1)
      65875  R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      112094 R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      194922 R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      102835 R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      108028 R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      85111  R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      72683  R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      54774  R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      87191  R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  16  17  17
      19864  R1b1  13  24  14  11  11  15  12  12  12  12  13  28  | 18  9   10  11  11  25  15  20  30  15  16  17  17
      169170 R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  31  15  16  17  17
      84825  R1b1  13  24  14  10  11  15  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  16  16  16  17
      39927  R1b1  13  24  14  11  11  14  12  12  12  13  13  29  | 17  9   10  11  11  25  15  20  30  15  15  16  17
      106520 R1b1  13  24  14  11  11  15  12  12  12  13  13  29  | -   -   -   -   -   -   -   -   -   -   -   -   -
      Cluster (2)
      161010 I1    13  22  14  10  14  16  11  14  11  12  11  29  | 15  8   9   8   11  22  16  20  28  12  14  14  15
      72309  I1    13  22  14  10  14  16  11  14  11  12  11  29  | 15  8   9   8   11  22  16  20  28  12  14  14  15
      Cluster (3)
      51216  R1b1  13  24  14  11  11  14  12  12  13  13  13  29  | 17  9   10  11  11  25  15  19  29  14  15  17  18
      29479  R1b1  13  24  14  10  11  14  12  12  12  13  13  28  | 17  9   10  11  11  25  15  19  29  14  15  16  17
      Cluster (4)
      75606  R1b1  13  24  14  10  11  15  12  12  11  13  13  29  | 17  10  11  11  11  24  15  19  29  15  17  17  17
      22971  R1b1  13  24  14  10  11  15  12  12  11  13  13  29  | 17  10  11  11  11  24  15  19  29  15  16  15  17
      Singleton
      84049  R1b1  13  25  14  10  11  14  12  12  12  12  14  28  | 17  9   10  11  11  25  15  18  30  16  16  16  17
      Key (formatting not preserved in this transcript): compared with modal value: >2>; 2>; 1>; =; <1; <2; >2<. Bold: fast-moving markers. Small: GD rule differs.
  • 16. Matching & Grouping: Definitions. Large projects need rigorous definitions of terms & procedures to determine: (1) whether two testees are a near match, (2) how matching testees are grouped, & (3) how groups should be named.
  • 17. Genetic Distance: Example. Comparison of two 12-marker STR haplotypes
      DYS          393  390  394  391  385a  385b  426  388  439  389-i  392  389-ii
      Testee A      13   24   14   11    11    15   12   12   12     13   13      29
      Testee B      13   24   15   11    11    15   11   12   10     13   13      29
      difference     0    0    1    0     0     0    1    0    2      0    0       0
    matching markers: 9/12; mismatching markers: 3/12; Genetic Distance: 4/12
    Genetic Distances are useful for educational & illustrative purposes, BUT:
    1. Special rules apply for multi-copy markers: DYS 385, 389, 395, 413, 459, 464, CDY & YCAII.
    2. There are four different models for calculating GDs: Stepwise; Infinite alleles; FTDNA hybrid, old & new.
    3. GDs take no account of the differing average mutation rates of each marker: e.g. the av. rate of CDY is 400 times that of DYS494.
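    The arithmetic of the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the two model labels are hypothetical, every marker is treated independently (so the special rules for multi-copy markers are not applied), and the FTDNA hybrid models are not attempted.

```python
# Testee A and Testee B from the 12-marker example on this slide.
TESTEE_A = [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29]
TESTEE_B = [13, 24, 15, 11, 11, 15, 11, 12, 10, 13, 13, 29]


def genetic_distance(a, b, model="stepwise"):
    """Genetic distance between two haplotypes tested on the same markers.

    Assumption: markers are independent; multi-copy rules are ignored.
    """
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    if model == "stepwise":
        # every repeat of difference at a marker counts as one step
        return sum(abs(x - y) for x, y in zip(a, b))
    if model == "infinite":
        # any difference at a marker counts once, whatever its size
        return sum(1 for x, y in zip(a, b) if x != y)
    raise ValueError(f"unknown model: {model}")


print(genetic_distance(TESTEE_A, TESTEE_B))              # 4
print(genetic_distance(TESTEE_A, TESTEE_B, "infinite"))  # 3
```

    The stepwise result reproduces the slide's Genetic Distance of 4/12, and the infinite-alleles count equals its 3 mismatching markers, which shows how the choice of model alone changes the answer.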
  • 18. TiP (Time Predictor)
    TiPs
    - allow for different average mutation rates for each marker;
    - are FTDNA’s most sophisticated tool for matching;
    BUT
    - appear complicated and slow;
    - derivation is “opaque”, and liable to be updated;
    - 2 decimal places (e.g. 96.73%) is misleading;
    - limited to FTDNA testees.
  • 19. “TiP Score”
    TiP Score:
    - simple, arbitrary tool for project management;
    - 24-generation, no-paper-trail TiP at highest available resolution;
    - best available indicator of the probability of two testees sharing a common ancestor within the surname era;
    - avoids problems of Genetic Distances & matrices;
    - nearest whole % (e.g. 97%) sufficient.
  • 20. Matching. A “near match” is an arbitrarily chosen rule of thumb for deciding whether two participants share a common ancestor within the surname era, i.e. within the last millennium. FTDNA lists near matches on each testee's personal yDNA “Matches” page, using criteria of a GD of up to 1/12, 2/25, 4/37 or 7/67, sometimes known as the “1, 2, 4, 7 rule” or “10% rule”. Some surname project administrators use other criteria, e.g. • GD: “1, 2, 4, 6 rule”, or • GD: “0, 2, 3, 5 rule”. Irwin project: • TiP Score: “60% rule” (for Irwins); “95% rule” (for non-Irwins).
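    A GD-based near-match rule of this kind reduces to a threshold lookup per test resolution. The sketch below assumes the thresholds are inclusive maxima; the names `FTDNA_RULE`, `STRICTER_RULE` and `is_near_match` are hypothetical and are not taken from any FTDNA tool.

```python
# Maximum genetic distance accepted at each test resolution (markers -> GD).
FTDNA_RULE = {12: 1, 25: 2, 37: 4, 67: 7}      # the "1, 2, 4, 7 rule"
STRICTER_RULE = {12: 0, 25: 2, 37: 3, 67: 5}   # the "0, 2, 3, 5 rule"


def is_near_match(gd, markers, rule=FTDNA_RULE):
    """Return True if a genetic distance counts as a near match under `rule`."""
    if markers not in rule:
        raise ValueError(f"no threshold defined for {markers} markers")
    return gd <= rule[markers]


print(is_near_match(4, 37))                 # True
print(is_near_match(5, 37))                 # False
print(is_near_match(4, 37, STRICTER_RULE))  # False
```

    A GD of 5/37 fails the “1, 2, 4, 7 rule”, which is the kind of result the Irwin project treats as a possible false negative when the two testees share a surname.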
  • 21. False Positives & False Negatives
    • FTDNA’s “Matches” pages are useful for newbies, but are in fact an arbitrary compromise:
    • for comparing similar surnames the “10% rule” is too stringent:
      - 7% of Irwins show as “False Negatives” (e.g. 5/37 or 6/37);
      - a 60% TiP Score gives better matching.
    • for comparing dissimilar surnames the “10% rule” is too lax:
      - most “Matches” are “False Positives”, i.e. coincidental;
      - a 95% TiP Score gives better screening to identify NPEs, especially when confirmed by a terminal SNP test, e.g. L555.
  • 22. Grouping. Assigning testees to clusters / groups / genetic families is a subjective choice of the project administrator:
    • by haplogroup (default used in FTDNA public pages) or SNP
    • by genealogical feature, e.g. surname spelling, or place of residence
    • by near matches, e.g.
      - GD matrix
      - GD from mode
      - TiP Score from modal participant
    • other features, e.g. rare / idiosyncratic markers, TMRCAs, cladograms, triangulation
  • 23. Genetic Distance Matrix: Example. Genetic Distance Matrix of eight 37-marker STR haplotypes
      A    -
      B    0   -
      C    1   4   -
      D    0   1   3   -
      E   13   9   8  16   -
      F    7  11   4   9   1   -
      G    3   8  10   8   0   2   -
      H    6   2   9   7   6  10   9   -
           A   B   C   D   E   F   G   H   (Participant)
    Interpretation: Two genetic families: A, B, C, D and E, F, G. One Singleton: H.
    Problems:
    1-3. Problems inherent in Genetic Distance.
    4. Separate matrices necessary for comparing 12, 25, 37, 67 & 111 markers.
    5. Matrices are very cumbersome for large projects.
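    How such a matrix is built can be sketched with real 12-marker values for four kits taken from the “Results examples (1)” slide. Two assumptions are made here and are not stated on the slides: the stepwise model is used, and DYS389 is handled by one common convention in which the reported 389-2 value includes 389-1, so only the remainder (389-2 minus 389-1) is compared. All function names are hypothetical.

```python
from itertools import combinations

MARKERS = ["393", "390", "394", "391", "385a", "385b",
           "426", "388", "439", "389-1", "392", "389-2"]

# First 12 markers of four Irwin project kits (slide 15).
HAPLOTYPES = {
    "65875": [13, 24, 14, 11, 11, 15, 12, 12, 12, 13, 13, 29],  # modal
    "19864": [13, 24, 14, 11, 11, 15, 12, 12, 12, 12, 13, 28],
    "84825": [13, 24, 14, 10, 11, 15, 12, 12, 12, 13, 13, 29],
    "84049": [13, 25, 14, 10, 11, 14, 12, 12, 12, 12, 14, 28],
}


def adjusted(values):
    """Map marker name -> value, with 389-2 reduced by 389-1 (assumed rule)."""
    row = dict(zip(MARKERS, values))
    row["389-2"] -= row["389-1"]
    return row


def genetic_distance(a, b):
    """Stepwise genetic distance after the DYS389 adjustment."""
    ra, rb = adjusted(a), adjusted(b)
    return sum(abs(ra[m] - rb[m]) for m in MARKERS)


def distance_matrix(haplotypes):
    """Pairwise distances, keyed by (kit, kit) in insertion order."""
    return {
        (p, q): genetic_distance(haplotypes[p], haplotypes[q])
        for p, q in combinations(haplotypes, 2)
    }


matrix = distance_matrix(HAPLOTYPES)
for (p, q), gd in matrix.items():
    print(f"{p} v {q}: {gd}/12")
```

    Under these assumptions the distances from the modal kit 65875 come out as 1, 1 and 5, which agree with the “/12” values listed for kits 19864, 84825 and 84049 on the “Results examples (2)” slide. Without the DYS389 adjustment, kit 19864 would score 2 rather than 1.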
  • 24. Irwin project – justification for use of 60% TiP Score [Chart: frequency of TiP Scores plotted against magnitude of TiP Scores from project modal haplotype]
  • 25. Irwin project: Definitions
    • Genetic family: 2 or more participants with TiP Scores > 60% (> 95% for dissimilar surnames).
    • Singleton: unassigned Irwin with TiP Score < 60%.
    • TiP Score: 24-generation, no-paper-trail TiP, at highest available resolution, from modal participant: the probability of sharing a common ancestor with the modal participant within the surname era, i.e. the probability of being a member of the genetic family.
    • Modal participant: participant whose genetic signature is the most typical of the members of a genetic family.
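    These definitions amount to a simple assignment rule, sketched below. The TiP Scores themselves come from FTDNA and are taken as inputs; the function name, the dictionary layout and the choice of the highest-scoring family when several qualify are assumptions made for illustration only.

```python
IRWIN_THRESHOLD = 60      # % TiP Score for participants with an Irwin surname
NON_IRWIN_THRESHOLD = 95  # % TiP Score for dissimilar surnames


def assign_family(tip_scores, same_surname=True):
    """Assign a participant to a genetic family from TiP Scores.

    tip_scores maps a family code to the participant's TiP Score (%)
    from that family's modal participant. Returns the family code,
    "Singleton" for an unassigned Irwin, or None for an unrelated
    participant with a dissimilar surname.
    """
    threshold = IRWIN_THRESHOLD if same_surname else NON_IRWIN_THRESHOLD
    qualifying = {fam: s for fam, s in tip_scores.items() if s > threshold}
    if not qualifying:
        return "Singleton" if same_surname else None
    # Assumption: if several families qualify, take the highest score.
    return max(qualifying, key=qualifying.get)


# Scores from the Borders ("B") modal participant, as listed on slide 36.
print(assign_family({"B": 98}))                      # kit 84825 -> B
print(assign_family({"B": 2}))                       # kit 84049 -> Singleton
print(assign_family({"B": 98}, same_surname=False))  # kit 39927 (Elliot) -> B
```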
  • 27. TMRCA (Time to Most Recent Common Ancestor). Popular tables/graphs can predict the no. of generations/years back to the common ancestor of two participants. BUT
    • All TMRCAs are probabilities
    • TMRCAs based on genetic distance:
      - assume some single average mutation rate;
      - even the chosen average mutation rate may be incorrect;
      - ignore back mutations;
      - can be very misleading.
  • 28. TMRCAs: typical margins of error when predicted by Genetic Distance
      Genetic Distance   Most probable TMRCA            90% of TMRCAs within
      0/37                1 generation  =  30 years       0 - 290 years
      1/37                3 generations =  90 years       0 - 450 years
      2/37                6 generations = 180 years      65 - 580 years
      3/37                9 generations = 270 years     110 - 710 years
      4/37               12 generations = 360 years     165 - 825 years
      5/37               15 generations = 450 years     220 - 930 years
    Assumptions: average mutation rate = 0.0042 per generation; 1 generation = 30 years.
    Source: www.dna-project.clan-donald-usa.org/tmrca.htm
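    For project spreadsheets the table can be held as a lookup. The sketch below simply restates the slide's figures; the names `TMRCA_37` and `tmrca_estimate` are hypothetical, and no attempt is made to derive the values from the mutation rate.

```python
YEARS_PER_GENERATION = 30  # assumption stated on the slide

# GD at 37 markers -> (most probable generations, 90% range in years),
# copied from the table on this slide.
TMRCA_37 = {
    0: (1, (0, 290)),
    1: (3, (0, 450)),
    2: (6, (65, 580)),
    3: (9, (110, 710)),
    4: (12, (165, 825)),
    5: (15, (220, 930)),
}


def tmrca_estimate(gd):
    """Return the slide's TMRCA figures for a genetic distance at 37 markers."""
    generations, range_years = TMRCA_37[gd]
    return {
        "generations": generations,
        "years": generations * YEARS_PER_GENERATION,
        "range_years": range_years,
    }


est = tmrca_estimate(4)
print(est["years"], est["range_years"])  # 360 (165, 825)
```

    Setting the point estimate beside its 90% range makes the slide's warning concrete: at 4/37 the range spans 660 years around a most probable value of 360 years.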
  • 29. NPEs: synonyms • Non-paternal event (from genetics) • Non-paternity event • Extra paternity event • False paternity event • False paternity • Misattributed paternity • Non-patrilineal transmission • Male introgression • Ancestral introgression • Undocumented Adoption • Not the Parent Expected • Surname discontinuity • Surname Discontinuity Event (my preferred term) 29
  • 30. NPEs: possible causes
    Narrow definition (used in genetics):
    • Surrogacy: not yet likely in context of genealogy
    • Illegitimacy outside marriage: boy taking maiden name of mother
    • Infidelity within marriage: boy taking surname of mother’s husband
    Wider definition (when surname & DNA don’t match) also includes:
    • Re-marriage: boy taking surname of step-father
    • Adoption, incl. orphan, waif: boy taking surname of guardian
    • Formal name-change: man taking maiden name of wife or mother
    • Informal name-change, or alias: man taking name of farm, trade or mother
    • Anglicisation of Gaelic or foreign surname
    • Error in genealogy
    Similar symptoms, but not an NPE if father didn’t use a hereditary surname:
    • By-name: man taking name of farm, trade or origin
    • Tenant or vassal: man taking surname of landlord or chief
    • Apprentice or slave: man taking surname of master
  • 31. Manifestations of NPEs • Egressions from a genetic family (“e-NPEs”): same DNA, but different surname e.g. Irwin DNA, but Elliot surname (possibly an Elliot step-father) • Introgressions into a genetic family (“i-NPEs”): same surname, but different DNA e.g. Elliot DNA, but Irwin surname (possibly an Irwin step-father) “One project’s e-NPE is another project’s i-NPE”. 31
  • 32. Examples of Irwin / Elliot e-NPEs [Chart; surnames shown, in order: Elliott, Elliott, Elliott, Elliott, Irving, Erwin, Elliott, Erwin, Nipper, Irvine, McDonald, Armstrong, Irwin, Snowdon]
  • 33. Examples of Elliot / Irwin i-NPEs [Chart; surnames shown, in order: Elliott, Fairbairn, Fairbairn, Elliott, Elliott, Elliott, Elliott, Farms, Fairbairn, Fairbairn, Fairbairn, Fairbairn, Fairbairn, Fairbairn]
  • 34. Recognising & handling NPEs
    e-NPEs: testee finds near matches with another surname, & asks admin. to join this second surname project. NB Need stringent matching criteria or evidence of NPE.
    i-NPEs: administrator finds near matches with another surname, & creates a new genetic family within his project. NB i-NPEs are a sensitive subject which may disappoint testees, even if they accept the ‘event’ was not necessarily an illegitimacy or infidelity.
    For all NPEs, if the cause & date of the ‘event’ are not known, seek evidence that the two surnames were once neighbours.
  • 35. Irwin project: Results examples (1) [repeat of the 25-marker results table shown on slide 15]
  • 36. Irwin project: Results examples (2)
      Columns: ID; earliest confirmed paternal ancestor (surname, forename, born, died, residence(s)); haplogroup; no. of markers tested; Genetic Distance from mode (/12, /25, /37, /67, /111); TiP Score from modal; remarks.
      ID         Surname              Forename Born, died    Residence(s)                Haplo Mkrs /12  /25  /37  /67  /111 TiP   Remarks
      SCOTTISH BORDERS ("B")
      65875   U  Irwin                Henry E  c1813         Lancaster Co, PA            R1b1  67   -    -    -    -    -    -     Modal participant
      112094  E  Urwin                William  1783 1851     Co. Durham                  R1b1  67   0/   0/   0/   0/   -    100%
      194922  U  Ervin                John     1715          N.Ireland SC                R1b1  111  0/   0/   0/   0/   0/   100%
      102835  U  Armstrong                     1844 1902     Co.Tyrone OH                R1b1  67   0/   0/   0/   0/   -    100%
      108028  U  Irvine               Andrew   1763 1797     Ireland PA                  R1b1  37   0/   0/   0/   -    -    100%
      85111   U  Irwin                Samuel   1736 1783     Lancaster Co, PA            R1b1  67   0/   0/   1/   1/   -    100%  5th cousin of 72683
      72683   U  Irwin                Samuel   1736 1783     Lancaster Co, PA            R1b1  111  0/   0/   2/   2/   5/   99%   5th cousin of 85111
      54774   U  Irving               William  fl.1484x1506  Bonshaw, Dumfriesshire      R1b1  67   0/   0/   2/   3/   -    99%
      87191   S  Irving               Francis  c1568 1633    Dumfries, Dumfriesshire     R1b1  67   0/   0/   1/   2/   -    99%   brother of 19864
      19864   S  Irving               Francis  c1568 1633    Dumfries, Dumfriesshire     R1b1  67   1/   2/   3/   4/   -    99%   brother of 87191
      169170  E  Irvine               John     1662 1732     Eskdale, Dumfriesshire      R1b1  37   0/   1/   3/   -    -    99%   Mt. Everest line
      84825   U  Erwin                Matthew  c1695         Co.Antrim? NC               R1b1  67   1/   3/   5/   5/   7/   98%   False negative
      39927   C  Elliot               Simon    1897 1955     Co.Fermanagh                R1b1  37   1/   2/   4/   -    -    98%   e-NPEs
      106520  U  Irvin                Joe      1744          MD                          R1b1  12   0/   -    -    -    -    91%
      NPE Elliot (1) ("NE1")
      161010  U  Irwin                Hiram    1815          Ireland? IL                 I1    67   13/  28/  39/  55/  -    0%    ) 100% with Elliots
      72309   U  Irwin                Andrew   1765 1824     Scotland TN                 I1    37   13/  28/  40/  -    -    0%    ) i-NPEs
      ORKNEY (1) ("O1")
      51216   U  Irving               Christe  fl. 1468      Shapinsay, Orkney Isles NY  R1b1  37   2/   6/   11/  -    -    16%   Washington Irving
      29479   E  Irvine               George   c1705 1742    Sandwick, Orkney Isles      R1b1  37   3/   6/   11/  -    -    18%   author of this paper
      IRISH - Munster ("IM")
      75606   U  O'Ciarmhacain/Irwin  Eoin     1785 1845     Limerick, Ireland NJ        R1b1  67   2/   8/   16/  19/  -    1%    gaelic; catholic
      22971   I  Irwin                William  1840          Limerick, Ireland           R1b1  67   2/   9/   17/  20/  -    1%
      Singleton
      84049   U  Irwin                William  c1770 c1810   Leinster, Roscommon         R1b1  37   5/   9/   16/  -    -    2%
  • 37. Irwin project: Genetic Families. And we thought Irwin was a single-origin surname!
      Origin                    Genetic     % of 392        of which
                                Families    participants    e-NPEs
      Scotland
        Borders*                    1           67%           17%
        i-NPEs                     15           10%           0
        Aberdeenshire               1            1%           0
        Forfarshire                 1            0%           0
        Perthshire                  1            1%           0
        Orkney                      2            2%           ?1%
        Shetland                    1            1%           0
        Unknown                     6            3%           ?0-3%
      Ireland                       4            4%           1%
      Germany/Netherlands           1            2%           0
      Africa                        1            0%           0
      Singletons                    -            9%           ?
      Total                        34          100%           13-16%
    *: with 262 members this is apparently the largest genetic family in any surname project.
  • 38. 38 EXAMPLE OF TRIANGULATION Crystie Irwing Irvings were first Magnus (Irving) fl. 1468, -a1504 recorded in Orkney fl. 1470 IRVINGS OF ORKNEY first of Sabay in 1369 Clovigarth showing the two lines of descent John m ? …………. identified by DNA tests fl.1483,-1519x22 heiress (Clovigarth) Sabay of Yesnaby James John m2 Katherine Kirkness m1 ........ Irving fl.1534, -1567 fl.1534 , -1597/8 fl.1561 (Clovigarth) Sabay; Law man of Orkney Overgarson heiress of Overgarson? ? Magnus William William James Alexander fl.1536, -1614 fl.1601 -1614 -1612 fl.1601 Shapinsay Sabay Clovigarth Overgarson Yesnaby Thomas Patrick Magnus Alexander Alexander c1570-p1646 fl. 1582, -a1614 fl.1583, -1649 -1629 c1600-1642 Quholm Overgarson Lie Yesnaby ? William Magnus Patrick George c1610- c1601-1626 -1657 fl. 1635x78 c1628-c1700 last of Sebay Overgarson Lie Yesnaby George David James fl.1650, -1702x11 fl. 1673x1701 c1660-c1705 Overgarson Lie Yesnaby Magnus Patrick 1650- fl.1711x29 John Magnus Hary (2) Duncan (1) Edward Edward 1682-a1746 1685-p1731 c1705-p1768 c1700-1749 1704-1756x64 1707-1796 Quholm Skaebreck Overgarson Lie Quoyloo James William John Edward George c1734-1797 1731-1807 ? c1736-p1792 c1735-c1791 c1750-1800 Quholm; NY Skaebreck Overgarson Quoyloo James Ebenezer John m Jannet Edward Peter George 1759-1835 1776-1868 -1808x21 Irvine 1774-1833x41 1741-p1772 c1750-1800 New York Washington Huan 1754-1832x41 Overgarson Lie Quoyloo 1783-1859 author FTDNA Kit. No. 174038 51216 29479 169056 174074 199671 Test sequence 4th= 2nd 1st 3rd 4th= 6th Genetic family "Orkney 1" "Orkney 2"
  • 39. Irwin project: Geographic origins
                                Participant's    Residence of earliest    Historic origin
                                place of         confirmed paternal       of genetic
                                residence        ancestor                 family
      Project size                  392               392                     392
      USA                           77%               21%                      -
      Canada                         6%                1%                      -
      Australia, New Zealand         6%                 -                      -
      England & Wales                5%                3%                      -
      Ireland (NI & Eire)            1%               40%                      5%
      Scotland                       5%               23%                     84%
      Germany, Netherlands            -                1%                      2%
      Unknown, other                  -               10%                      9%
  • 40. Irwin project: Scottish ancestral lines as shown by DNA tests [Timeline chart, 1200 to 1800. Labels, in source order: Irvine, Ayrshire; Borders; Drum, Aberdeenshire; Orkney1; Orkney2; Eskdale; Bonshaw; Dumfries; 11 other lines; & Castle Irvine; Perth; Shetland. Sub-group codes along the foot: BE, BB, BD, BA, Bel, Ber, B9, B10, B14, B15, B16, B17, B23, B29.]
  • 41. Irwin project : Borders Family Cladogram 41
  • 42. Irwin project: The 15 sub-groups of the Borders family (pre-BigY) - SNP L555 recognised by ISOGG in mid-2012 - 50 tests to date, nil “L555-” results by Irwins or NPEs 42 L21 Totals Z251 L555 mode DYS DYS DYS DYS DYS DYS DYS DYS DYS DYS DYS YCA DYS un- 617 576 449 442 447 459b 391 570 534 438 570 11b 449 assigned =11 =17 = 31 = 13 =27 = 9 = 10 =14 = 15 = 16 = 17 = 23 = 29 No. of members 34 16 15 19 6 3 11 32 7 4 5 18 7 16 67 262 excl. NPEs & <37 markers 202 US descendants? Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Irish ancestors? Yes Yes Yes Yes Yes ? Yes Yes Yes ? Yes Yes Yes Yes Yes Scotish origin ? Bonshaw Dumfries Eskdale ? ? ? ? ? ? ? ? ? ? - NPE surname - - - - Elliot Errand - - - - - - - - - Code BA BB BD BE Bel Ber B9 B10 B14 B15 B16 B17 B23 B29 BX TMRCA ( by STRs) 1800 1750 1050 850 1700 1300 750 1200 BC200 1700 Earliest genealogy 1700 1500 1565 1600 1800 1850 1700 1700 1750 1700 1800 1650 1650 1650 var. L555 Yes Yes Yes Yes Yes ? Yes Yes Yes Yes ? Yes Yes Yes (Yes)
  • 43. The two types of y-DNA test 43 STR tests metaphor: "individual leaves on a tree" used for: comparing genetic signatures Sequencing Sanger Next Generation quantification analogue probabilistic expressed as counts of markers quality of base pairs FTDNA y- tests 12/25/37/67/111 markers Single SNP SNP Pack BigY use in Surname projects main tool haplogroup BigY advanced tool projects: confirmation support secondary data haplogroup prediction STR and mt data SNP ('snip') tests "branches and twigs" building phylogenetic tree Sanger binary e.g. L21+ or L21-
  • 44. Irwin project: Phylogenetic treeThe genetic "Adam" 200,000-300,000bp M42 M168 70,000bp M89 M9 M45 M96 M170 M304 M207 30,000bp E I s1I J R (years before present) P147 L68 M253 NE1 NKr M267 M172 M173 25,000bp E1 I2 I1 NC ND J1 NG J2 R1 P177 L46 M410 M513 M343 16,000bp P2 L135 CLAN IRWIN PHYLOGENETIC TREE L26 M439 UD P25 M2 AF M223 IL as at 1 Nov. 2015 M67 UJ P297 12,000bp showing tested members of Irwin genetic families in green, M269 NBt NJ NKd NL UN U3 U4 U5 and FTDNA's predictions of Irwin genetic families in red. L23 Mesolithic See Borders Irwin phylogenetic tree for L555 BigY results L51 PF7589 G L151, P311 Atlantic Modal Haplotype U106O2 P312 SF 5,300bp-Neolithic S263 DF27 Z195 M269+, L21- DA L21 NR 4,000bp S264 L176.2 Z274 DF63 DF13 DF96 Z262 Z209 NN CTS6919 DF49 - b DF21 - h CTS4466 Z251 R1b12a1a2c1a - c R1b12a1a2c1g - i R1b12a1a2c1l R1b12a1a2c1j - d - k - e - m L1 NBl M167 O1 A92 DF23 - f Y11277 - n Z21065 - S1156 Z16943 - FGC13899 Z16506 Z2961 Z16294 A541 CTS4157 Z16944 Pre-surname era BA BB BD BE Bel Ber BY674 NM M222 PFNF Z16281 NE2 A195 IM1 FGC7549 L555 B9 B10 B14 B15 B16 PF IM2 B17 B23 B29
  • 45. Part 2: BigY and BAM data – use and interpretation 45
  • 47. Irwin project, ex “BigTree” 47 R-P312 ZZ37 L21 Z29644 DF63 DF13 Z29645 A91 DF21 FGC11134 Z251 Z29646 A92 S5488 Z16250 Z16943 S11556 FGC13899 Z29647 Z16506 Z16294 CTS4466 Z16944 CTS4157 A6077 BY674 Z16281 Z21065 S1115 L555 Z16929 Z16932 Z16935 Z16937 Z16940 Z16945 Z16949 FGC19531 14750280AA A2201 V38 A4257 L557 Z16930 Z16933 S20749 Z16938 Z16941 Z16946 Z17660 FGC19533 16344314TT Z16282 A195 Z21065 L561 Z16931 Z16934 Z16936 Z16939 Z16942 Z16947 Y5816 FGC19536 FCG34569 -21368012GA 6966393AG 17193400CA CTS11273 FGC4341 9166578AC 7583420GA 16630774GA 7581395GT FGC19532 14209909CT 8531427CT 8356286CT 7244870AG 17319595GA 15093112GA 15218377TA 10007460CT 14268577CT 22487613GT FGC19534 16967721CA 16561158AG 17417800AC 7940600GA 19263733TA 1554 21519299GA 19166468GA 19048311TC FGC19535 17371426CT 21515424TA 20809987AC 8311955CA 21782548TG 22479673GC 23804663GA 19201889CG FGC19537 21030091GA 21950915GT 23427085GA 16737596AT 24479734TC FGC19538 22164909TC 17357906TA FGC19539 17851993CG 16344316TT 19262306GA 18982587GA 21306828GA 18982595GA 22461683GT 218190 75606 Withers Irvine Irwin Breen Burgin NM IM Broadley Reams 328617 3722 A3093 280599 22874 N126337 23026 54774 280156 226426 160045 65048 364399 311268 87191 Bradley Hardage Irvine Flanagan Whitaker Irvin Irvine Irvin Irvin Irving Ervin Irving Irwin Erwin Ervin Cunningham Irving Clarke Fortner singleton NE2 (NPE) IU (NPE) B14 BX BA B29 BB B23 B17 B9 B10 BE BX (NPE) BD Donatella Desmond
  • 48. Irwin project: BigY goals
    Initial goals
    • manage and understand BigY results
    • set up cloud account to share project data
    Interim goals
    • minimise dependence on 3rd party analysis tools
    • focus on our large L555 (“Borders”) genetic family
    • facilitate 1 BigY test for each of 10 main sub-groups
    • confirm/refine project phylogenetic tree and TMRCAs
    Current goals
    • facilitate FTDNA offering a low-cost L555 “SNP Pack” test
    • use SNP Pack data to refine individual TMRCAs
    NB I am giving low priority to “naming” novel variants and having them placed on the phylogenetic trees of FTDNA and ISOGG, at least until a robust understanding of the structure of L555 sub-branches has emerged.
  • 49. Example of limitations of algorithm-based analyses of BigY test results: the Private SNPs of FTDNA L555 Kit no. 65048
      Columns: name; position; bases (reference, variant); FTDNA vcf; FTDNA csv**; FGC Analysis***; YFull Analysis***; Williamson (incl. in Big Tree?*); "DIY" BAM data (no. of reads, no. of indels, consistency of reads, SNP status).
      Name      Position  Ref Var  vcf               csv**                      FGC***        YFull***  AW*   Reads  Indels  Consist.  SNP status
      FGC19532  8557914   G   A    Pass, 1 variant   Known SNP, High conf.      Private >95%  B100      yes   75     0       100%      Probable
      FGC19534  16642304  G   C    Pass, 1 variant   Known SNP, High conf.      Private >95%  B100G     yes   48     0       100%      Probable
      FGC19535  16956346  T   G    Pass, 1 variant   Known SNP, High conf.      Private >95%  B100      yes   81     0       100%      Probable
      FGC19537  18668146  C   A    Pass, 1 variant   Known SNP, High conf.      Private >95%  C 98      yes   47     0       98%       Probable
      FGC19538  18775426  C   T    Pass, 1 variant   Known SNP, High conf.      Private >95%  B100      yes   64     0       100%      Probable
      FGC19539  19436082  G   A    Pass, 1 variant   Known SNP, High conf.      Private >95%  C 96      yes   40     0       98%       Probable
      -         18982587  G   A    -                 Novel variant, High conf.  -             -         -     34     0       94%       unstable
      -         18982595  G   A    -                 Novel variant, High conf.  -             -         -     32     0       97%       unstable
      -         13226006  C   A    -                 -                          Private >40%  -         -     2      0       100%      possible
      -         13571571  C   T    -                 -                          Private >40%  -         -     2      0       100%      possible
      -         10064260  C   T    -                 -                          Private >40%  -         -     2      0       100%      possible
      -         16275572  C   A    -                 -                          -             M100      -     2      0       100%      possible
      A608      7534406   G   T    -                 Known SNP, High conf.      *             -         -     94     55      67%       no
      -         16344316  TC  T    Pass, 1 variant   -                          -/a           -         -     73     0*      100%      no
      CTS10214  19328796  G   T    Rej'd*, 1 variant -                          -             1 read    -     1      0       100%      no
      PF3499    14624254  C   T    -                 -                          -             >1 read   -     29     0*      100%      no
    Notes:
    *: (vcf) no BED coverage.
    **: FTDNA list 73 other high conf. Novel variants, of which 13 appear to be private to 65048.
    ***: FGC and YFull's analyses have many more low confidence private markers.
    *: (Williamson) AW lists 20 other low conf. Private markers.
    *: (Indels) Indel in other tests.
  • 50. Analysis options for BigY test results 50 FTDNA BAM file Computerised algorithms ("science") Manual refinement ("art") FGC YFull FTDNA vcf file Analysis Analysis FTDNA csv file Haplogroup projects e.g. "Big Tree" FTDNA Matches Surname project admins "DIY" Detecting & Filtering Quality - High level SNPs - Old SNPs - Regions - Terminal SNPs - Intermediate SNPs - SNPs/Indels - Novel SNPs - Private SNPs - No.of Reads - Unique SNPs - Consistency of Reads - Compatibility within sub-clade - Stability across haplogroup - Phylogenetic trees -TMRCAs
  • 51. Process for “DIY” BigY analysis
    1. Create project cloud account; upload VCF, BAM & BAM.BAI files.
    2. Identify relevant variants from CSV & Matches data, Walsh & Williamson (& FGC/YFull Analyses, if used).
    3. Use BAM IGV viewer to:
       (1) filter relevant variants:
           A: pre-L21 (shared by all L555 testees)
           B: L21-L555 ( ” )
           C: L555 block (shared only by L555 testees)
           I: Intermediate (shared by some L555 testees)
           Pn: Private (unique to each testee).
       (2) determine SNP quality for each variant:
           “Probable” if >10 reads AND consistency >85%
           “possible” if 2-9 reads OR consistency 70-85%
           “No” if 1 read, OR consistency <70%, OR Indel, OR unreliable region.
    4. Consider stability of SNP quality vs. that for closely-related BigY testees.
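    The quality rule in step 3(2) can be written as a small function for use in a project spreadsheet or script. This is a sketch, not the author's tool: the function name and parameters are hypothetical, and because the slide leaves the boundaries open (exactly 10 reads, exactly 85%), the code assumes both count as “Probable”. Step 4 (stability across related testees) is not modelled.

```python
def snp_quality(reads, consistency, indel=False, unreliable_region=False):
    """Classify a variant seen in a BAM file as "Probable", "possible" or "No".

    reads             -- number of reads covering the position
    consistency       -- % of reads showing the variant (0-100)
    indel             -- True if the variant is, or sits in, an indel
    unreliable_region -- True if the position lies in a known unreliable region
    """
    # Any disqualifying condition gives "No".
    if reads <= 1 or consistency < 70 or indel or unreliable_region:
        return "No"
    # Assumption: the 10-read and 85% boundaries are treated as inclusive.
    if reads >= 10 and consistency >= 85:
        return "Probable"
    # Remaining cases: 2-9 reads, or consistency of 70-85%.
    return "possible"


# Values for kit 65048 taken from slide 49.
print(snp_quality(75, 100))             # FGC19532 -> Probable
print(snp_quality(2, 100))              # 13226006 -> possible
print(snp_quality(94, 67, indel=True))  # A608     -> No
print(snp_quality(1, 100))              # CTS10214 -> No
```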
  • 52. BAM analysis Example: 1: Use of BAM IGV Viewer www.broadinstitute.org 52
  • 53. BAM analysis Example: 2: Construct matrix of relevant variables and closely-related BigY testees 53 Named Position 1 - 22874 2 - 311268 6 - N126337 Variant on Genome Irvine - BX C'ningam-BX Erwin - B10 Irving - B17 Irvin - B26 Irvin - BA Reference Alternative Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% CTS11273 23045843 T A DF13 2836431 A C FGC19532 8557914 G A FGC19534 16642304 G C Synonyms and positions of FGC19535 16956346 T G named variants FGC19537 18668146 C A (shown in red) FGC19538 18775426 C T are derived from FGC4341 8757882 A G ybrowse L21 15654428 C G (www.ybrowse.isogg.org) L555 7647335 G T PF496 13297909 T G PF6729 10022033 A G PR1489 14543997 C C Z16940 22470652 T T Z16946 8014468 G A Z16949 7933047 T TAA CAZ251 8736334 G A 8531427 C T 13226006 C A 13294119 T T 13801126 A G 15093112 G A 15218377 T A 16561158 A G 16630774 G A 17319595 G A 18982595 G A 21368012 G A A G G A A A 32 0 94 21515424 T A 21782548 T G 21950915 G T 22487613 G T 23898645 T C 24479734 T C Base 5 - 230264- 2264263 - 65048
  • 54. BAM analysis Example: 3: Enter BAM data, sort & filterBlock Named Position 1 - 22874 2 - 311268 6 - N126337 SNP Comments Variant on Genome Irvine - BX Cunningam-BX Erwin - B10 Irving - B17 Irvin - B26 Irvin - BA Category Reference Alternative Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Alternative No.ofreads No.ofindels Alt./reads% Block B L21 15654428 C G G 59 0 100 G 71 0 98 G 60 0 100 G 69 0 96 G 33 0 97 g 18 0 78 L21 to DF13 2836431 A C c 3 0 100 c? 1 0 100 C 11 0 91 c 6 0 100 c 6 0 100 c 2 0 100 Poor qualities -surprising L555 Z251 8736334 G A a 4 - 100 - a 6 0 100 a 14 0 100 a 2 0 100 ?a 7 0 57 Poor qualities -surprising Block C L555 7647335 G T T 51 0 100 T 54 0 98 T 76 0 100 T 91 2 100 T 36 0 100 t 9 0 78 Probable L555 Z16946 8014468 G A A 50 0 94 A 125 0 100 A 49 0 100 A 73 0 100 A 22 0 100 A 25 0 88 Probable Z16940 22470652 T T C 53 0 96 C 52 0 88 C 72 0 89 C 44 0 89 C 53 0 100 C 59 0 86 No Unreliable region Z16949 7933047 T TA T 46 39 100 T 76 75 95 T 38 39 100 T 47 47 100 T 54 47 100 T 94 68 100 No Indel Intermediate FCG34569 21368012 G A A 85 0 100 G 147 0 90 G 82 0 100 A 80 0 99 A 48 0 98 A 32 0 94 Probable Block PF496 13297909 T G g 71 0 73 t? 21 0 67 T 15 0 100 T 15 0 93 T 21 0 100 g 85 0 65 No conflicts with FCG34569 Private 17319595 G A A 23 0 87 G 24 0 100 G 27 0 100 G 58 0 100 G 24 0 100 G 78 0 100 Probable block for 21782548 T G G 79 0 100 T 174 0 100 T 93 0 100 T 97 0 100 T 35 0 97 T 27 0 100 Probable 1 -22874 PF6729 10022033 A G g 7 0 86 a? 
8 0 85 a 4 0 100 a 11 0 64 ?a 6 0 83 ?a 5 0 60 possible Private 8531427 C T C 63 0 100 T 47 0 98 C 44 0 100 C 47 0 100 C 69 0 100 C 72 0 100 Probable block for 16561158 A G A 17 0 100 G 34 0 100 A 23 0 100 A 41 0 100 A 14 0 100 A 16 0 100 Probable 2 -311268 21515424 T A T 45 0 100 A 59 0 98 T 49 0 100 T 77 0 99 T 42 0 100 T 45 0 100 Probable 21950915 G T G 47 0 100 T 63 0 94 G 61 0 100 G 54 0 100 G 29 0 100 G 42 0 100 Probable 13801126 A G c 1748 10 81 G 2281 0 89 c 1144 1 76 c 1658 7 71 ?c 1083 28 57 ?c 1676 53 63 No Indel Private FGC19532 8557914 G A G 59 0 100 G 99 0 98 A 75 0 100 G 93 0 100 G 31 0 100 G 101 0 100 Probable block for FGC19534 16642304 G C G 58 0 100 G 77 0 100 C 48 0 100 G 67 0 100 G 45 0 100 G 21 0 100 Probable 3 -65048 FGC19535 16956346 T G T 90 0 100 T 139 0 95 G 81 0 100 T 53 0 100 T 87 0 100 T 102 0 100 Probable FGC19537 18668146 C A C 29 0 100 C 53 0 100 A 47 0 98 C 64 0 100 C 21 0 100 C 44 0 100 Probable FGC19538 18775426 C T C 59 0 100 C 128 0 100 T 64 0 100 C 58 0 100 C 48 0 100 C 18 0 100 No appears elsewhere in L21 13226006 C A c 4 0 100 c 4 0 100 a 2 0 100 c 6 0 100 c? 1 0 100 C 31 0 100 possible Private 16630774 G A G 65 0 100 G 44 0 100 G 42 0 98 A 32 0 100 G 59 0 100 g 6 0 100 Probable block for 22487613 G T G 119 0 98 G 127 0 93 G 101 0 99 T 67 0 88 G 205 0 99 G 184 0 100 Probable 4 -22642 PR1489 14543997 C C c 4 0 100 - c? 1 0 100 a 2 0 100 c 8 0 100 c 5 0 80 possible Private 15218377 T A T 22 0 100 T 41 0 100 T 31 0 100 T 51 0 100 A 10 0 100 T 40 0 100 Probable block for 24479734 T C T 91 0 100 T 143 0 100 T 80 0 100 T 51 0 100 C 58 0 98 T 72 0 100 Probable 5 -23026 FGC4341 8757882 A G A 24 0 100 A 45 0 98 A 35 0 100 A 51 0 100 g 9 0 100 a 4 0 100 possible note marginal no. 
of counts Private 23898645 T C t 56 0 84 t 109 0 78 t 71 0 80 t 90 0 71 t 45 0 80 C 27 0 85 Probable block for 15093112 G A G 98 0 100 G 74 0 99 G 76 0 100 G 34 0 100 G 104 0 100 a 137 0 84 possible note marginal consistency 6 -N126337 13294119 T T C 32 0 100 C 35 0 100 C 25 0 92 c 74 0 62 C 18 0 100 t 10 0 70 possible 5 - 230264- 2264263 - 65048Base
  • 55. L555 BAM analysis Results 55 BigY - L555 data as of 21 Oct 2015, by James Irvine, based on initial work by Dennis Wright JamesIrvine: DennisWright: FTDNA: VCF(1): A if Quality >500 Alex Williamson: Mike Walsh (1): FGC: YFull: All: Stage/Block: ) Lower case .BAM data: A Capitals: tested A .bam: not seen in,vcf -"Good" CSV: a if Quality <500 y included as per DW 9 Tree, official S shared, 99, 95% - no entry A: Adam - L21, shown at foot of table ) IF <50% are A IF >85% AND no. ofreads >10 g Rejected, "1"qual. >500 a .bam: not seen in,vcf -"Weak" n Novel VCF(2): P pass p privste, not terminal 8 Tree, draft 3 Multi family/surname s shared, 40% m >1 read B: L21 - L555 ) "good" BAM a IF 70-84%OR no. ofreads 2-9 - Rejected, "1"qual. <500 ? inconclusive: 1 or 2 samples, multiple bases k Known- R rejected ? private, "?" 7 Public, consistent 2 Singe family/surname P private, 99, 95% s 1 read C: L555 ) data a? IF no.of reads 1 ? Inconclusive, "0/1" a/- no .bam test result H High conf. 0 ancestral ; 2 entries for 1 SNP! 6 Public, semi-cnstnt 1 Single individual p private, 40% intermediate Between C and P ) Italicsin cols. G & H A Private to individual Shared SNPs which DW ignores M Med. Conf. 1 derived 4 Public, unsure -1 Unstable confirmed * private, 10% P1, P2, P3 .... Private: unique to 1 test) additional to DW T Inconclusive SNP Unstable region - 22216800-22512940 (T Krahn) u Unknown conf. 
[Table: per-testee BAM read analysis for the 12 L555 BigY tests (1 - 22874, 2 - 311268, 3 - 65048, 4 - 226426, 5 - 23026, 6 - N126337, 7 - 54774, 8 - 364399, 9 - 280156, 10 - 87191, 11 - 160045, 12 - 280599). For each SNP, variant or indel the table gives remarks, stage/block, position (b37), reference and alternative alleles and, per testee, the allele read, number of reads, indels and derived/calls %, together with the calls made by FTDNA (vcf and csv), A. Williamson, M. Walsh, FGC and YFull. Rows are grouped as: Block B: L21 to L555 (L21/S145/M529, DF13/S521/CTS241, Z251/S470, Z18600, Z16943, Z16944, CTS4157/S3741, FGC13746, FGC8673); Block C: L555 (L555/S393, L557/S394, Z16929 to Z16942, Z16945, Z16946, S20749, Z17660, FGC19531, FGC19536, L561, Z16947 to Z16949, FGC16164 and unnamed indels and homopolymers); Intermediate SNPs (FCG34569, PF506, PF6812, CTS11841, PF682, PF496 and unnamed positions); Block P1: private SNPs for 22874; Block P2: private SNPs for 311268.]
  • 56. L555 Phylogenetic Tree based on “DIY” BAM analysis [Figure: phylogenetic tree of SNPs, indels & homopolymers.]
    L555 block, SNPs: L555/S393, L557/S394, Z16929, Z16930, Z16931, Z16932, Z16933, Z16934, Z16935, Z16936, Z16937, Z16938, Z16939, Z16942, Z16945, Z16946, S20749, Z17660, FGC19531, FGC19536.
    L555 block, indels & homopolymers: FGC16164, L561, Z16947, Z16948, Z16949, 2746565, 6347814, 13550973, 14101345, 14379561, 14750280, 15305844, 16344311, 16344315, 16344316, 18585796, 18680369, 21613126.
    Intermediate SNP: FCG34569 (21368012 G>A).
    Private SNPs shown on the individual branches (“?” marks a doubtful call): 6966393AG 17319595GA CTS11273 FGC4341 9166468GA 7583420GA 16630774GA 7581395GT FGC19532 14209909CT 8531427CT 17417800AC 7244870AG 19263733TA 15093112GA 15218377TA 10007460CT 14768577CT 22487613GT FGC19534 16967721CA 16561158AG 20809987AC 7940600GA 21782548TG 21519299GA 19166468GA 19048311TC ?PR1489 FGC19535 17371426CT 21515424TA 23427058GA 8311955CA ?CTS6916 22479673GC 23804663GA 19201889CG FGC19537 21030091GA 21950915GT 16737596AT ?PF3498 ?13294119GA 24479734TC FGC19538 22164909TC 17357906TA ?PF6729 ?3715806TG FGC19539 ?16505988CT 17851999CG ?PF6730 ?13550958TG ?10064260CT 19262306GC ?S25968 ?13571571CT 21306828GA ?14769164TG ?13726006CA 22461683GT ?22257324GG ?16275572CA ?23898645TC 15542414CT
    Testees, left to right (no. - kit - surname - sub-group): 12 - 280599 - Irvin - B14; 1 - 22874 - Irvine - BX(I); 6 - N126337 - Irvin - BA; 5 - 23026 - Irvin - B29; 7 - 54774 - Irving - BB; 9 - 280156 - Ervin - B23; 4 - 226426 - Irving - B17; 11 - 160045 - Irwin - B9; 3 - 65048 - Erwin - B10; 8 - 364399 - Ervin - BE; 2 - 311268 - Cunningham - BX(C); 10 - 87191 - Irving - BD.
  • 57. Deriving TMRCAs from BigY tests TMRCAs derived from SNPs are easy to calculate: TMRCA in years = no. of SNPs x av. no. of years per SNP. BUT: • all TMRCAs are probabilities • TMRCAs from a single test have wide confidence limits; confidence improves if several TMRCAs can be averaged • difficulties specific to SNP-based TMRCAs: - “av. years per SNP” depends on the type of NGS test (FTDNA use “av. 120 years per SNP”); - there is no uniformity on what constitutes a relevant SNP, so I use: TMRCA in years = [∑(probable SNPs + 0.5 x possible SNPs) / n] x 120, where n is the number of tests averaged
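The averaging rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four (probable, possible) pairs are hypothetical, and the only figures taken from the slide are the 0.5 weighting for possible SNPs and the 120 years per SNP.

```python
YEARS_PER_SNP = 120  # FTDNA's average for BigY, as quoted on the slide


def weighted_snp_count(probable: int, possible: int) -> float:
    """Count each probable SNP as 1 and each possible SNP as 0.5."""
    return probable + 0.5 * possible


def tmrca_years(tests, years_per_snp=YEARS_PER_SNP):
    """Average the weighted SNP counts of n tests, then convert to years.

    `tests` is a list of (probable, possible) pairs, one per BigY test.
    """
    counts = [weighted_snp_count(p, q) for p, q in tests]
    return sum(counts) / len(counts) * years_per_snp


# Hypothetical example: four tests below a shared ancestor.
example_tests = [(5, 1), (4, 0), (3, 1), (2, 1)]
print(tmrca_years(example_tests))
```

Averaging several tests is what narrows the confidence limits; a single (probable, possible) pair passed on its own gives the wide-margin figure the slide warns about.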
  • 58. Irwin project: L555 TMRCAs (1): Age of L555 block [Figure: timeline of the L555 tree at 120 years per SNP.]
    Pre-surname era:
    - R-L21, DF13, Z251, Z16943, Z16944: 5 SNPs, 600 years, from c.1700 BC. The L21 starburst below DF13 includes DF21, DF41, DF49, FGC5494, FGC11134, Z251, L1335, S1026, Z1026 and ZZ10.
    - L555 block/bottleneck: L555 + 19 other probable SNPs = 20 SNPs, 2400 years.
    Surname era:
    - Border Irwins starburst: av. 5.5 SNPs, 650 years, from c.AD 1300.
    SNPs below the L555 block for each testee, counted as probables + 0.5 x possibles (no. - kit - surname - sub-group: SNPs):
    12 - 280599 - Irvin - B14: say 11
    1 - 22874 - Irvine - BX: say 7.5
    6 - N126337 - Irvin - BA: say 5.5
    5 - 23026 - Irvin - B29: say 5.5
    7 - 54774 - Irving - BB: say 5
    9 - 280156 - Ervin - B23: say 5
    4 - 226426 - Irving - B17: say 3.5
    11 - 160045 - Irvin - B9: say 2
    3 - 65048 - Erwin - B10: say 8
    8 - 364399 - Ervin - BE: say 5.5
    2 - 311268 - Cunningham - BX: say 4
    10 - 87191 - Irving - BD: say 3
  • 59. Irwin project: L555 TMRCAs (2): Ages of individual members [Figure: the tree and SNP counts of the previous slide (L21 to L555 block: 5 SNPs, 600 years, from c.1700 BC; L555 block: 20 SNPs, 2400 years; Border Irwins starburst: av. 5.5 SNPs, 650 years, from c.AD 1300), with rows of dates added beneath each testee.]
    No. - kit - surname - sub-group: SNPs; “DIY” BigY TMRCA; earliest genealogy
    12 - 280599 - Irvin - B14: say 11; c.630; c.1750
    1 - 22874 - Irvine - BX: say 7.5; c.1050; c.1780
    6 - N126337 - Irvin - BA: say 5.5; c.1230; c.1700
    5 - 23026 - Irvin - B29: say 5.5; c.1230; c.1650
    7 - 54774 - Irving - BB: say 5; c.1350; c.1500
    9 - 280156 - Ervin - B23: say 5; c.1350; c.1650
    4 - 226426 - Irving - B17: say 3.5; c.1530; c.1750
    11 - 160045 - Irvin - B9: say 2; c.1700; c.1700
    3 - 65048 - Erwin - B10: say 8; c.750; c.1700
    8 - 364399 - Ervin - BE: say 5.5; c.1300; c.1600
    2 - 311268 - Cunningham - BX: say 4; c.1350; c.1850
    10 - 87191 - Irving - BD: say 3; c.1600; c.1565
    STR TMRCAs, 2011 (ten values, in the order shown on the slide): c.750, c.1800, c.1700, BC200, c.1200, c.1700, c.1000, c.1050, c.1450, c.1750
  • 60. (3): Age of L555 block by other SNP criteria [Figure: the tree of the previous two slides (L21 to L555 block: 5 SNPs, 600 years, from c.1700 BC; L555 block: 20 SNPs, 2400 years), with the age of the Border Irwins starburst recalculated under four SNP criteria.]
    Testee order: 280599, 22874, N126337, 23026, 54774, 280156, 226426, 160045, 65048, 364399, 311268, 87191.
    - TMRCA = (∑(probable SNPs + 0.5 possible SNPs)/12) x 120: 11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4, 3; av. 5.5; 650 years; AD 1300
    - TMRCAs with SNPs as per Williamson's Big Tree: 11, 5, 4.5, 6, 5, 5, 3, 2, 7, 5, 4, 4; av. 5.1; 615 years; AD 1335
    - TMRCA = (∑(probable SNPs)/12) x 120: 11, 4, 3, 6, 5, 5, 3, 2, 6, 5, 4, 3; av. 4.8; 570 years; AD 1380
    - TMRCAs with SNPs as per ISOGG Y Tree criteria: 11, 4, 4, 5, 1, 5, 2, 2, 6, 5, 4, 3; av. 4.3; 520 years; AD 1430
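The four averages on this slide can be re-derived from the per-testee SNP counts. The sketch below is illustrative rather than the author's own spreadsheet. The SNP counts and the 120 years per SNP come from the slide; the reference year of 1950 is an assumption, inferred only because it matches the slide's AD dates, and the variable and function names are hypothetical.

```python
YEARS_PER_SNP = 120
REFERENCE_YEAR = 1950  # assumption: inferred from the slide's AD dates, not stated there

# SNP counts below the L555 block for the 12 BigY testees, per criterion.
snp_counts = {
    "probable + 0.5 x possible": [11, 7.5, 5.5, 5.5, 5, 5, 3.5, 2, 8, 5.5, 4, 3],
    "Williamson's Big Tree": [11, 5, 4.5, 6, 5, 5, 3, 2, 7, 5, 4, 4],
    "probable only": [11, 4, 3, 6, 5, 5, 3, 2, 6, 5, 4, 3],
    "ISOGG Y Tree criteria": [11, 4, 4, 5, 1, 5, 2, 2, 6, 5, 4, 3],
}


def summarise(counts):
    """Return (average SNPs, TMRCA in years, approximate calendar year AD)."""
    average = sum(counts) / len(counts)
    years = sum(counts) * YEARS_PER_SNP / len(counts)
    return average, years, REFERENCE_YEAR - years


results = {name: summarise(counts) for name, counts in snp_counts.items()}

for name, (average, years, year_ad) in results.items():
    print(f"{name}: av. {average:.1f} SNPs, {years:.0f} years, c.AD {year_ad:.0f}")
```

Under these assumptions the first criterion comes out at 655 years, which the slide rounds to 650 years and AD 1300; the other three agree with the slide's 615, 570 and 520 years. The spread of over a century between criteria is the point of the slide: the definition of a "relevant SNP" moves the date as much as the sampling error does.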
  • 61. Criteria for BigY SNPs [Table: columns are FTDNA csv, FGC Analysis, YFull Analysis, Williamson Big Tree, D. Wright, J. Irvine “DIY”, ISOGG Y Tree. Values below are listed left to right as they appear; empty cells are not marked.]
    - Min. no. of reads/calls: 10; 2; 1-2*; 10; 10; 4
    - Max. no. of reads: none; none; 320?
    - Min. % consistent reads: 99/95/40/10; 85; 85/70; 100/95
    - Stability within Haplogroup: “shared SNPs”; excluded; no; excl. if known
    - Stability within sub-clade: “shared SNPs”; no; important
    - 22216800-22512940 unstable region: excluded; excluded; included
    - Other “Unreliable” regions: included; excluded; excluded
    - Indels?: included; excluded; excluded; excluded
    - Homopolymers, recLOHs: excluded; N/A
    - Min. “Quality” (FTDNA): yes; 500; N/A; N/A
    - “Confidence” (FTDNA): yes; N/A; N/A
    - Max. locations on ISOGG tree: N/A; 3
    - Min. Mapping quality average (ISOGG): N/A; 10%
    - Min. extent of base-pairs (ISOGG): N/A; 20
    - Max. segment, repeated alleles (ISOGG): N/A; 5 alleles
    - Av. years per SNP: 120; 118; -; 120; 120; -
    *: depending on region
    NB The criteria listed are as known to me 11 Nov. 2015; all are evolving and subject to change. Clearly there are both substantive differences and confusion over terminology & definitions. At least in theory it is clearly inappropriate: (1) to seek TMRCAs without clear understanding of how relevant “SNP”s are defined, and (2) to use the same “av. years per SNP” ratio for differing definitions of “SNP”.
  • 62. The Irwin Surname tree [Figure: the Irwin surname tree, showing the genetic and conventional genealogies of some of the project's 33 genetic families and of the Borders genetic family sub-groups (many details omitted). Bold indicates a BigY test; a separate symbol indicates a “brick wall”.]
    Genetic genealogy: P311 (c.2000 BC), then P312 and U106, then L21, DF27 and S263 with their sub-clades (Z251, CTS4466, DF21, DF49, L176.2, S264, Z16943, Z21065, Y11277, DF23, Z262, DF96, Z16944, A541, Z16294, Z2961, SRY2627, A195, Z16281, M222, A88, A2427, A3955, A89, A2432, M7964). The Borders line runs L21, Z251, Z16943, Z16944, then L555 plus 20 other SNPs (c.AD 1200), then FCG34569 and the branches of BigY testees 364399, 87191, 65048, 22874, N126337 and 54774 (between 1 and 10 SNPs each).
    Paper trails: timeline from the 1300s/1400s through the 1500s, 1600s and 1700s/1800s to today, for sub-groups B9, B14, B17, BE, BD, B10, BX, BA, BB, B23, B29, IM1, IM2, NE2, PF, DA, O1, O2 and NB1, each shown with its earliest known ancestors and its number of participants (e.g. 22874 + 65 others, 226426 + 48 others, N126337 + 33 others, 65048 + 32 others, 364399 + 16 others). Ancestors named include William fl.1323, Criste fl.1460, Magnus fl.1470, William fl.1506, Alexander 1456-1527, Alexander 1527-1602, James fl.1534-67, Francis fl.1596, Alexander fl.1601 and John fl.1662.
    Named families: Irvines of Eskdale, Irvings of Bonshaw, Irvings of Dumfries, Irvines of Drum, Irvines of Orkney (1) and (2), Irvines of Perthshire, Irwins of Munster (1) and (2), Irving - NPE Bell (1), Irving - NPE Elliot (2).
  • 63. Main findings relevant to Irwin project • Steady growth over 10 years, now 392 STR test results (94% 37+ markers) • Most participants reside in USA, & typify the Scotch-Irish-American diaspora • 40% claim Irish ancestry, but lack paper trails “across the pond” • Tradition of single-origin Scottish surname refuted • > 90% of all participants matched to a genetic family • 34 genetic families identified, each unrelated to one another in surname era: - 22 Scottish, 4 native Irish, 1 German, 1 African, 6 unknown (Scots ?) • 13-26% of participants from NPEs • Border Irwins genetic family is apparently the largest in any surname project: - all 262 descended from a Dumfriesshire ancestor who fl. C14 - SNP L555 recognised by ISOGG, still unique to Border Irwins - tentatively split into 15 sub-groups - BigY is yielding further insights, but reliable TMRCAs elusive
  • 64. Findings relevant to other surname projects • Small surname projects can learn much from large projects • Penetration ratios identify geographic bias • Spelling of surname is often misleading • FTDNA’s “Matches” pages give False Positives & False Negatives • TMRCA tables using GDs are misleading • TiP Scores avoid the many limitations of GDs • NPEs should be included • BigY: - a massive step forward - handling of results is unnecessarily cumbersome - comprehension of results is difficult & poorly explained - BAM data essential for analysing SNP quality - “starburst”/“bottleneck” phenomena need investigating - need for improved understanding of SNP criteria - individual TMRCAs unreliable: need SNP Pack back-up
  • 65. Further reading • www.dnastudy.clanirwin.org • www.jogg.info/62/files/Irvine.pdf • https://dl.dropboxusercontent.com/u/14028750/Testing%20and%20Analysing%20Big-Y.pdf (use of BAM IGV Viewer) • www.borderreivers.co.uk • Irving, JB 1907 The Book of the Irvings • Maxwell-Irving, AMT 1968 The Irvings of Bonshaw • Mackintosh, D 1999 The Irvines of Drum and their Cadet Lines 1300-1750 • Tough, DLW 1928 The Last Years of a Frontier • MacDonald Fraser, G 1971 The Steel Bonnets • Perceval-Maxwell, M 1973 The Scottish Migration to Ulster in the Reign of James I • Dickson, RJ 1976 Ulster Emigration to Colonial America, 1718-75 • Fischer, DH 1989 Albion’s Seed • Fitzgerald, P 2008 Migration in Irish History, 1607-2007
  • 66. Acknowledgements • All our 392 participants; • The many participants, most preferring anonymity, who have donated to our General Fund, helped with our website, and guided & encouraged me; • Fellow admins. John Cleary, Maurice Gleeson, Kent Irvin, Peter Irvine, Debbie Kennett, Ralph Taylor, Dennis Wright; • Katherine Borges, for ISOGG; • Bennett Greenspan and his team at FTDNA; • My patient wife.

Editor's Notes

  1. Background: Genealogist for over 50 years. No knowledge of genetics, but 10 years of experience of administering Irwin DNA project, aka Clan Irwin Surname DNA Study.
  2. Irwin project also known as Irwin Clan Surname DNA Study. Irwin project is not necessarily typical of Scots clans, but many lessons apply to all surname projects.
  3. at-, mt- and x-DNA also used for Deep Ancestry and “chasing cousins”.
  4. Testing companies very dependent on “admins” for customer interface – viz. manning of FTDNA stand at WDYTYA. Important to recognise Administrators are volunteers whose interests, skills and time availability are, by definition, not limitless! This lecture focuses on items 3 and 4. A personal thought: as a surname project administrator, to date I have found understanding genetics to be less critical than having time, patience, a good support network, and skills in genealogy, data handling & communicating. I have also been lucky to inherit an interesting surname and to have trained as an engineer. However, to understand NGS SNP criteria I will need more knowledge of genetics.
  5. Very lucky this DNA project brings out so many features. A ratio of about 0.1% is typical of many DNA surname projects.
  6. “All Scottish Irwins, regardless of spelling, are descended from a common ancestor.” Solid lines show confirmed paper trails.
  7. 436 “joins”, but this includes some “results pending” and mt-DNA and at-DNA orders, and excludes non-FTDNA data; the corrected figure to end-October 2015 is 392 y-DNA test results.
  8. Penetration is ratio of participants tested to world population. Note the heavy US bias in project, but Scotland not under-represented. Study suggests that penetration of about 0.06% necessary before project gets a fair perspective of diaspora.
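A minimal sketch of the penetration ratio defined in this note, in Python. The 392 test results and the 0.12% figure are taken from the Background slide and the 0.06% threshold from this note; the implied world population is derived here by division rather than quoted from the lecture, and the function names are hypothetical.

```python
def penetration(participants: int, population: float) -> float:
    """Ratio of participants tested to the world population of the surname."""
    return participants / population


def implied_population(participants: int, ratio: float) -> float:
    """World population implied by a participant count and a penetration ratio."""
    return participants / ratio


FAIR_PERSPECTIVE_THRESHOLD = 0.0006  # about 0.06%, as suggested in this note

irwin_results = 392        # y-DNA STR test results
irwin_penetration = 0.0012  # 0.12% of Irwins etc. in the world today

population = implied_population(irwin_results, irwin_penetration)
print(round(population))
print(irwin_penetration >= FAIR_PERSPECTIVE_THRESHOLD)
```

The same two functions can be applied country by country to compare, say, US and Scottish penetration, which is how the geographic bias mentioned in this note would show up.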
  9. Distribution of all project participants who know the county in Britain or Ireland of their earliest confirmed paternal ancestor. Good correlation with census/Griffith's Valuation. Placenames in green appear in traditional genealogies.
  10. For background see Reading List slide at end of lecture.
  11. Spelling relevant in Scotland but not elsewhere. NB All foregoing data is before considering significance of DNA test results!
  12. - 111 marker panel more useful than 67 panel, but expensive - 12 marker panel can be useful, especially with individual “private” SNP test - “horses for courses”
  13. Full Excel table of results (470 lines, 180 columns) at www.dnastudy.clanirwin.org. This slide shows a sample of 21 results (of 392), of the first 25 markers (of 111), and of 4 (of the 34) genetic families identified by the Administrator. Colour key at bottom denotes “genetic distance” from modal value for each marker. Some participants with only 12 markers can be categorised, some cannot. Challenge for lecture: how are these genetic families best defined, identified and named?
  14. Matching and grouping cause much confusion, and little reliable guidance available.
  15. Moral: use FTDNA pages to determine GD. Average mutation rates of different markers vary by a factor of c.400.
  16. TiPs didn’t “arrive” until 2005 and by then the trail-blazing admins had developed their own tools and rules. They are still not popular with admins.
  17. TiP Score term conceived by myself and Ralph Taylor. The more I use it the more I realise its potential.
  18. FTDNA’s terminology and “Matches” pages cause much confusion. Time prevents discussion of latter; they are more useful for cousin chasing than for surname projects; not screened to remove dissimilar surnames. Most near matches have TiP score > 95%. I used to use cut-off of 80%, now use 60%, but not critical (for similar surnames).
  19. Grouping is biggest challenge for admins. Much inertia: most admins “set in their ways”.
  20. Fine in theory.
  21. Iterative process. DNA signature of Modal participant may not be that of common ancestor because of (a) small sample size, (b) sample bias (e.g. two branches of the family have procreated at the same rate, but one stayed in UK where DNA sampling is rare, another migrated to USA where DNA sampling is common), (c) “Founder effect”, where two branches procreated at different rates, typically one with a relatively lower rate in UK, but another with a higher rate migrated to USA, and (d) “Genetic drift”, the consequence of random mutations irrespective of procreation rates or migration, where some lines flourish over time and others dwindle or die out. Some genetic families have only one participant if he has a very clear origin.
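To make the idea of a modal signature concrete, here is a small Python sketch. It assumes the modal value of each STR marker is simply the most common value among the family's participants, and it measures genetic distance as a plain sum of step differences; testing companies may count some markers differently. The function names and the five four-marker haplotypes are hypothetical, not project data.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common value at each marker across all participants."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def genetic_distance(a, b):
    """Simple step-count distance: sum of absolute differences per marker."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical genetic family: five participants, four markers each.
family = [
    [13, 24, 14, 11],
    [13, 24, 14, 11],
    [13, 25, 14, 11],
    [13, 24, 15, 11],
    [12, 24, 14, 10],
]

modal = modal_haplotype(family)
distances = [genetic_distance(member, modal) for member in family]
print(modal)
print(distances)
```

Because the modal is computed from whoever has tested, adding or removing participants can change it, which is why the process is iterative and why sample bias, founder effect and drift can pull the modal away from the true ancestral signature.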
  22. My “Total participants” is a little less than FTDNA’s “Project joins”, as the latter include tests still at the laboratory and mtDNAs. Singletons, initially 50%, are now steady at just 10%. Phases: - establishment & initial growth – difficulty in identifying genetic families; - recognition of most genetic families; - “maturity” – few new genetic families being added, although the project continues to grow. The 0.04% and 0.07% on the right are the project’s penetration levels when the second and third phases began: interesting to compare with other projects.
  23. In theory the TMRCA of a genetic family may be estimated by averaging the TMRCAs of members of the family using Magee’s matrix, but I am unsure how to interpret the mathematical result.
  24. Note: These wide probability ranges do not include further uncertainties attributable to individual marker mutation rates, back mutations and no. of years per generation. Moral: Don’t use TMRCAs based on genetic distance!
  25. Not all strict synonyms. The term NPE is borrowed from genetics, where it has a narrow interpretation. Some genetic genealogists feel this interpretation should be retained, and they and others feel very sensitive about its use in genetic genealogy. For genetic genealogy I think a wide interpretation is necessary. I would prefer the term “SDEs”, but this novelty is not widely known.
  26. Illegitimacy quite common (today technically 50%!!), but certainly not the only cause of NPEs. Historically, adoption and formal name-change were rare. Step-father probably most common.
  27. These terms conceived by Dr John Plant; they are not widely used, but they need to be.
  28. This is an example of FTDNA’s “Matches” page. Note this is an Elliot with several Irwin near matches.
  29. Note this is an Irwin with several Elliot (& Fairburn!) near matches.
  30. Touchy subject with many admins and participants. But with understanding, clear explanation and sensitivity I have handled over 50 NPEs without any complaints.
  31. Reminder of challenge.
  32. This is my spreadsheet analysis of the same data as in the previous slide. Many points arise. Note: half of these examples claim Irish ancestry; the range of markers tested, from 12 to 111; 2 brothers with a GD of 2/25 (anecdote), and one of 5/37, outside FTDNA's “Matches” criterion; the clarity of TiP Scores; few pairs of cousins found; e-NPEs and i-NPEs; the ability to name all four genetic families (Munster anecdote).
  33. Most important slide. 30 genetic families now identified – for what was thought to be a single source surname! The Borders genetic family dominant, with 262 members; probably now the largest such cluster in any surname DNA project. Most e-NPEs and all i-NPEs have or used to have other Borders surnames, implying these “events” probably occurred after Irwin settlement in Borders (1300s?) but before migrations to Ulster (1600s). Only 5% of Irwin project STR tests via General Fund, but these provide several of the critical genealogies from which their geographical origins can be identified.
  34. Example of use of triangulation. Note the sequence in which the tests were taken.
  35. Note most participants reside in the New World, many can trace ancestry back to Ireland, but correlation of DNA and available genealogical evidence shows most have Scottish origins. Most apparently migrated from the Borders to Ulster in the 17th century, and from Ulster to America in the 18th century. Question arises: is project US biased?
  36. Project has cast a completely new light on traditional understanding of this Scottish surname. “X” indicates where traditional tree was wrong. Discoveries had to be handled sensitively.
  37. Pretty, but not convinced! Did help to identify sub-groups within Borders genetic family.
  38. The modal sub-group BA (12 members match 67/67, 30 match 37/37) is probably an example of convergence, with regression towards the mode.
  39. Recent breakthroughs in “Next Generation Sequence” SNP tests (e.g. FGC Elite, Chromo2, Big Y) are very powerful, but expensive and difficult to analyse.
  40. Deep Ancestry speculates on the geographical distribution of these SNPs. L555 recognized by ISOGG mid 2012; still private to Irwin Borders genetic family. NGS tests necessary to bring tree into surname era.
  41. BigY is FTDNA’s Next Generation Sequencing (NGS) test. BAM data is the raw test results, typically 30Gb, i.e. too much to send by e-mail unless compressed.
  42. L21 members are very lucky to have Mike Walsh and Alex Williamson – www.ytree.net. This is an on-line, free access phylogenetic tree of c.1800 P312/L21 NGS test results that have been copied to Williamson. He lists Private SNPs separately. This example shows the 12 L555 testees: the 5th largest such surname group in Williamson’s tree.
  43. BigTree data (only), as of 9 Nov. 2015, processed for Irwin project. I disagree with some minor details. Shows L555 still unique to Irwins. Note how “flat” this sub-clade is compared with, for example, the extensive bifurcation shown in the sub-clades of the phylogenetic trees of Maurice Gleeson.
  44. Decision to minimise dependence on 3rd parties was prompted by Williamson’s threat to discontinue his Big Tree. This threat has now lapsed, but my resultant ability to read BAM data has improved my understanding of NGS data and enhanced reliability of TMRCA estimates, as well as avoiding dependence on FGC or FullY analyses. All but the last priority achieved in 2 years; L555 Pack test will be FTDNA’s first surname SNP Pack test.
  45. This example for kit 65048 may not be typical, and may be out of date, but the extent of the red and pink cells illustrates the principle that no computerised BigY analysis is necessarily as comprehensive as might be expected. I have even “found” probable SNPs listed in FTDNA’s Matches that were not in the relevant csv file, and “discovered”, by chance, probable SNPs that were not listed by FTDNA, Walsh or Williamson.
  46. This is a flow diagram illustrating my appreciation of the various tools available to analyse BigY test results, and some of the parameters used in these analyses.
  47. Thanks to Dennis Wright for pointing me in this direction. His webpage at https://dl.dropboxusercontent.com/u/14028750/Testing%20and%20Analysing%20Big-Y.pdf explains how to load and use the BAM IGV Viewer. Step 1 is the most difficult! Step 2 is tedious. Step 3 is easy and most illuminating. Steps 3(1) and 3(2) are iterative. See following slides. L555 is described by some as our project’s “Terminal” SNP for our Borders sub-group. Step 4 is most important. As the number of available genetically closely-related BigY test results increases, so does the likelihood of quality ratings that are incompatible. Judgement is thus called for, as no computer program could resolve these occasional conflicts (any more than a computer could describe an oil-painting).
  48. Once set up, surprisingly easy to use. This slide shows the 12 L555 results for variant 21368012-G-A on one screen!
  49. Sources: FTDNA CSV Novel Variants and Known SNPs; FTDNA Matches; FGC/YFull Analyses; haplogroup web sites, e.g. Mike Walsh, Alex Williamson.
  50. NB 1. Capital A, C, G or T indicate “probable”, lower case a, c, g, t indicate “possible”. 2. Black boxes identify probable Intermediate and Private SNP blocks. 3. When identifying probable Intermediate and Private SNPs, compatibility of “possible” quality derived from a single BigY may need subjective revision. 4. Such revision cannot be undertaken by a computer program. 5. The more comparable BigY test results the better the insight into Intermediate and Private SNPs.
  51. This example shows page 2 of pages 1-3 of my manual analysis of BAM data for the 12 BigY Border Irwin tests to date. Raw BAM data is shown in red print (Read counts of SNPs & of Indels, % consistency of SNP Reads). Alternate variants in capitals if Read count > 10 AND Read consistency > 85%. This top page shows pre-L555 and L555 variants: boxed data is probable, unboxed data is possible. Note the Alternate variants for each base pair are the same for all Testees. FGC and YFull contributions shown in bright green.
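The capital/lower-case convention in the note above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (Read count > 10 AND Read consistency > 85%), not the spreadsheet actually used for the Irwin project; the function name `call_variant` and its parameters are hypothetical.

```python
def call_variant(alt_base, read_count, consistency_pct,
                 min_reads=10, min_consistency=85.0):
    """Format an alternate variant using the slide's convention.

    Upper case = "probable", lower case = "possible".
    The thresholds are the ones quoted in the note; both comparisons
    are strict ("greater than"), as written there.
    """
    probable = read_count > min_reads and consistency_pct > min_consistency
    return alt_base.upper() if probable else alt_base.lower()


# Hypothetical read data, for illustration only.
print(call_variant("a", 25, 96.0))   # enough reads, consistent -> probable
print(call_variant("A", 10, 99.0))   # read count not above 10 -> possible
print(call_variant("G", 40, 85.0))   # consistency not above 85% -> possible
```

A rule like this only handles the mechanical first pass. Notes 47 and 50 stress that conflicting quality ratings across related BigY results still need subjective revision.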
  52. This analysis differs slightly from that of Alex Williamson. Worryingly, neither his version nor the above correlate with the STR data of these 12 BigY project members. The more private SNPs, the older the bifurcation. Note the BA testee from our modal sub-group is apparently not the oldest – example of “founder effect”?
  53. Average mutation rates (“years per SNP”) are derived from radiocarbon dating/ancient DNA/genealogies: YFull use 118 years per SNP (see Adamov D et al, ‘Defining a New Rate Constant for Y-Chromosome SNPs based on Full Sequencing Data’, in Russian Journal of Genetic Genealogy 2015 7/1 p76, ex http://dna.cfsna.net/HAP/index.html). Dennis Wright and FTDNA use 120 years per SNP. For FGC’s NGS tests over a larger sample of the genome, a smaller “years per SNP” ratio is applicable.
  54. TMRCAs based on av. mutation rate of 120 years per SNP. Mean of AD1200 for L555 block seems credible. Starburst/bottleneck/starburst phenomena – striking, no obvious explanation.
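The arithmetic behind a “years per SNP” TMRCA estimate can be sketched as below. This is a minimal illustration, not the calculation used for the slide: the SNP count of 7 is hypothetical, the function names are invented for this example, and counting back from the test year (2015) rather than from testees’ birth years is an assumption.

```python
def tmrca_years(snp_count, years_per_snp=120.0):
    """Years back to the most recent common ancestor, given an
    (average) number of SNPs accumulated since that ancestor."""
    return snp_count * years_per_snp


def estimated_year(snp_count, reference_year=2015, years_per_snp=120.0):
    """Approximate calendar year (AD) of the common ancestor."""
    return reference_year - tmrca_years(snp_count, years_per_snp)


# Hypothetical average of 7 SNPs, compared under the two quoted rates.
for rate in (118.0, 120.0):
    print(rate, tmrca_years(7, rate), estimated_year(7, years_per_snp=rate))
```

Because the estimate is a simple product, each extra or missing SNP shifts the date by a full 118 to 120 years. That is why an estimate resting on a single test can look implausible.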
  55. Some individual TMRCAs seem credible, e.g. B9. But others clearly not, e.g. B10: need for L555 SNP Pack to avoid reliance on single tests
  56. A difference of 1½ SNPs and 170 years seems a lot, and our genealogical evidence suggests that the ISOGG criteria for defining SNPs (as of 3 Nov. 2015) are too restrictive.
  57. I have included my “DIY” criteria above simply to put them in context, not to suggest they have more merit than the other criteria. Blanks indicate I haven’t got the relevant evidence.
  58. Format courtesy of Maurice Gleeson. We are making considerable progress at bridging the gap between paper trails and DNA test data.
  59. The bad news is that the Borders, Drum, Orkney and Perthshire Irvine/gs are apparently unrelated to each other through the male line. The good news is that: so many American Irwins can now be positively identified as descendants of the Border Irvings; surname is a plural origin name – not surprising, but upsets traditionalists; further developments and revelations likely. With 262 members (or 202 even if NPEs and <37 markers excluded), our Border Irwin genetic family is apparently the largest such cluster in all of the 8,000+ surname projects. And its 12 BigY test results are the 4th largest surname cluster in Alex Williamson’s Big Tree. These two features make it an excellent case study for statistical analyses by other project admins.
  60. Most of this would not have been possible without FTDNA’s vision, stoicism and patience.