This document summarizes a study that analyzed quantitative craniometric data from over 800 individuals from 13 populations across Asia, Europe, Africa, and North America to investigate population history and structure in Mongolia and Central Asia. Discriminant function analysis identified significant differences between some Mongolian groups that were excluded from further analysis. R-matrix analysis was used to calculate genetic distances between groups, phenotypic variation within and between groups, and estimates of gene flow. The results suggest the Mongolian sample has high within-group variation compared to other groups, indicating greater external gene flow into Mongolia.
The Central Asian Landscape: Possible Inquiries into the Population History and
Structure of Mongolia through Quantitative Genetic Analyses
R.W. Schmidt
INTRODUCTION
Mongolia, located in central Asia (see Figure 1), has been the subject of varied and
extensive genetic analyses, including studies of the possible founding populations of North America
(Kolman et al., 1996; Merriweather et al., 1996), modern ethnogenetic hypotheses for
groups currently inhabiting the country and surrounding areas (Nasidze et al., 2005;
Keyser-Tracqui et al., 2006; Fu et al., 2007), the likely Y-chromosomal lineage of
Genghis Khan and his male-line descendants and the extensive geographic expansion in
which it is found (Zerjal et al., 2003), and lastly, the complex processes of unraveling the
underlying genetic variation seen in the larger regional context of central Asia (Comas et
al., 1998; Yao et al., 2000; Wells et al., 2001; Oota et al., 2002; Zerjal et al., 2002; Comas
et al., 2004; Quintana-Murci et al., 2004; Yao et al., 2004; Bennett and Kaestle 2006;
Derenko et al., 2007). The majority of these studies utilize common genetic markers,
such as mitochondrial DNA (mtDNA) and the Y chromosome, which have yielded
significant findings in anthropological genetic research (for a review see Crawford,
2007).
This paper will make use of existing research on the genetics of Mongolia and
central Asia to explore population history and structure of a region that has been
inhabited by a diverse mixture of individuals and groups, who have occupied favorable
and unfavorable environments, and who now define clearly demarcated boundaries in the
form of nation-states. Central Asia is a vast territory located at the confluence of
historical empires and trade, crossed by the famous Silk Road with contacts to the south
in India and open to the steppes of the north. This region is essential to understanding
complex cultural phenomena such as acculturation, assimilation, languages, overlapping
economies, and ways of life that include migrations, expansions and conquests.
These topics will be investigated through current research on genetic markers
(including ancient DNA) and migration studies in Mongolian and central Asian populations
(Perez-Lezaun et al., 1999). Also, biological variation will be investigated through the
use of quantitative trait variation, which may or may not correlate with historical and
genetic findings. Few studies in Mongolian population history and structure have given
primacy to quantitative analysis. This paper will utilize quantitative trait variation in the
form of craniometric measurements as a tool to potentially understand the complex
history of Mongolia and other nomadic groups now inhabiting the central Asian
landscape.
FIGURE 1. Map of Mongolia
MATERIALS AND METHODS
For comparative purposes in evaluating quantitative craniometric data, groups
were aggregated into major geographic regions with some partitioning: China, Japan,
Mongolia, Siberia, Southeast Asia, Europe, India, West Africa, North Africa, Mideast
(includes Israel, Iran, and Iraq), Russia, and North America (see Table 1). Group
differences were assessed with Wilks’ Lambda and discriminant function classification,
and all groups differed significantly (p ≤ .001). The Mongolian groups were
aggregated because of small sample sizes. Groups were further combined by time period:
Bronze Age, Mongolian period, Hunnu and modern. In addition, one group was labeled
“test”. A discriminant function analysis was conducted to ascertain possible group
differences (n = 14) that may skew statistical interpretation. The Mongolian “Iron Age”
and Mongolian “Bronze Age” did show significant statistical differences (p < .05) and
were therefore excluded from additional analysis (see Figure 2 and Table 2).
TABLE 1. Samples used in current study
Sample N
China 105
Japan 144
North China 54
Mongolia 109
Siberia 10
Southeast Asia 69
Europe 90
India 39
West Africa 36
North Africa 45
Middle East 40
Russia 59
North America 76
Total 876
FIGURE 2. Mongolian Classification and Group Differences (canonical discriminant functions: Function 1 versus Function 2, with group centroids for S China, N China, Mong Iron Age?, Mong Hunnu, Mong Period, Mong Bronze, Mong Modern, and Mong "Test")
TABLE 2. R matrix values for Chinese and Mongolian Samples
Population             S China     N China       Iron?       Hunnu  Mongperiod      Bronze      Modern      "Test"
S China                 0.0000
N China                 0.1028      0.0000
Mongolia Iron?          0.0854      0.0936      0.0000
Mongolia Hunnu         -0.0672     -0.0664     -0.0769      0.0000
Mongolian period       -0.0609     -0.0584     -0.0583      0.0357      0.0000
Mongolia Bronze        -0.0401     -0.0669     -0.0673      0.0300      0.0144      0.0000
Mongolia Modern        -0.0859     -0.0820     -0.0707      0.0519      0.0372      0.0045      0.0000
Mongolia "Test"        -0.0572     -0.0720     -0.0579      0.0347      0.0296      0.0331      0.0458      0.0000
All samples were taken from the University of Michigan’s Museum of
Anthropology database kindly provided by Dr. Noriko Seguchi. Only males were used in
the analysis to facilitate statistical competence. Seventeen craniofacial measurements
were taken on all samples, with no missing data. See Table 3 for traits used in this
analysis. For definitions of measurements, see Brace and Tracer (1992). Metric variables
record inherited differences in cranial and facial form; further, configurations in facial
form remain stable over considerable periods of time, making them excellent indicators
of group similarities and differences (Brace et al., 2001).
TABLE 3. Traits used in this analysis with corresponding abbreviations
Quantitative Trait Abbreviation
Nasal Height nasoht
Nasal bone height nasbnht
Nasion prosthion length naprlng
Nasion basion nasbas
Basion prosthion baspros
Superior nasal bone width supnasbn
Inferior nasal bone width infnasbn
Nasal breadth nasbrdt
Frontoorbital width subtense at nasion fowsubna
Mid orbital width subtense at rhinion mowsubri
Bizygomatic breadth bizygoma
Glabella opisthocranion glabopis
Maximum cranial breadth maxbredt
Basion bregma basibreg
Basion rhinion basirhin
Width at 13 (fronto malar temporalis) fmtfmt
Mid orbital width (width at 14) mowidth
An analytical model has been used for this study. Quantitative variation will be
explored through the use of an R matrix analysis. R matrix analysis has become a
standard method for investigating population structure and history in both modern and
prehistoric contexts using quantitative traits due in large part to the interpretive quality of
the results (e.g. Relethford and Blangero, 1990; Relethford et al., 1997; Steadman, 2001;
Stojanowski, 2005). The R matrix (Relethford-Blangero) analysis has a number of
interpretive qualities that are useful for microevolutionary studies. Genetic distances
between pairs of populations can be estimated directly from the R matrix (Harpending
and Jenkins, 1973; Williams-Blangero and Blangero, 1989), as can estimates of
phenotypic Fst. Genetic distances represent morphological similarity and difference
between samples, and serve as an indication of the rate of migration and mate exchange,
assuming the effects of random genetic drift are minimal (Relethford, 1996). Fst is a
measure of regional estimates of microdifferentiation (heterogeneity) based on the
contemporary array of allele frequencies (or quantitative traits). Large estimates of Fst are
the result of less gene flow or smaller population sizes, and smaller estimates of Fst are
the result of extensive gene flow between subpopulations. Significance tests for Fst are
calculated from standard errors, following Relethford et al. (1997).
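The relationships described above can be illustrated with a short Python sketch. It assumes the standard R matrix identities, d2(i,j) = r(ii) + r(jj) - 2r(ij) for the distance between two populations and Fst as the weighted mean of the r(ii) values, and it assumes equal population weights. The function names are hypothetical and are not part of RMET; the numbers are copied from Tables 5 and 7.

```python
# Sketch only: function names are hypothetical, not RMET routines.
# Unbiased distances to the centroid, r(ii), copied from Table 5 (h2 = 1.0).
UNBIASED_RII = {
    "Chinese": 0.074761, "Japanese": 0.064670, "North China": 0.119473,
    "Mongolia": 0.143851, "Siberia": 0.178754, "SE Asia": 0.095308,
    "Europe": 0.077098, "India": 0.170832, "West Africa": 0.269766,
    "Mideast": 0.071217, "North Africa": 0.077273, "Russia": 0.102563,
    "North America": 0.095168,
}


def genetic_distance(r_ii, r_jj, r_ij):
    """Squared genetic distance between populations i and j from R matrix elements."""
    return r_ii + r_jj - 2.0 * r_ij


def fst(r_diagonal, weights=None):
    """Fst as the weighted mean of r(ii); equal weights are an assumption here."""
    values = list(r_diagonal)
    if weights is None:
        weights = [1.0 / len(values)] * len(values)
    return sum(w * r for w, r in zip(weights, values))


# r(ij) for China and Mongolia is the above-diagonal entry in Table 7 (0.038).
china_mongolia = genetic_distance(
    UNBIASED_RII["Chinese"], UNBIASED_RII["Mongolia"], 0.038
)
fst_unbiased = fst(UNBIASED_RII.values())

print(f"d2(China, Mongolia) = {china_mongolia:.3f}")
print(f"Fst = {fst_unbiased:.6f}")
```

With these inputs the sketch gives a China-Mongolia distance of about 0.143, close to the 0.142 in Table 7 (the R matrix element is rounded to three decimals), and an Fst of about 0.1185, matching the unbiased Fst reported with Table 5.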
The R matrix also has another important interpretive function: it is used to generate
estimates of differential extralocal gene flow by comparing observed and expected levels
of within-sample variability (Relethford and Blangero, 1990). The residual value (the
difference between the observed and expected values) indicates the rate at which external alleles
are being introduced into a subpopulation from outside the mating network. Positive
residuals indicate greater than average external gene flow, and negative residuals
indicate the opposite (Reddy 2001). Taken together, these analyses provide a robust
interpretation of the patterns of group affinity and phenotypic
variation among the selected populations.
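The comparison of observed and expected variance can be sketched in Python as follows. This assumes the Relethford-Blangero expectation E[v(i)] = v(w)(1 - r(ii))/(1 - Fst), where v(w) is taken here as the unweighted mean of the observed within-group variances and Fst as the unweighted mean of r(ii); both weighting choices are assumptions, as is the function name. The inputs are the r(ii) and observed variance columns of Table 6.

```python
# Sketch only: rb_residuals is a hypothetical helper, not an RMET routine.
# (population, unbiased r(ii), observed within-group variance) from Table 6.
TABLE6 = [
    ("Chinese", 0.074761, 0.694),
    ("Japanese", 0.064670, 0.688),
    ("North China", 0.119473, 0.703),
    ("Mongolia", 0.143851, 1.190),
    ("Siberia", 0.178754, 0.784),
    ("SE Asia", 0.095308, 0.692),
    ("Europe", 0.077098, 0.809),
    ("India", 0.170832, 0.629),
    ("West Africa", 0.269766, 0.663),
    ("Mideast", 0.071217, 0.695),
    ("North Africa", 0.077273, 0.764),
    ("Russia", 0.102563, 0.806),
    ("North America", 0.095168, 0.639),
]


def rb_residuals(rows):
    """Return {population: (expected variance, residual)} for each row."""
    n = len(rows)
    r0 = sum(r_ii for _, r_ii, _ in rows) / n         # Fst, assuming equal weights
    v_w = sum(obs for _, _, obs in rows) / n          # mean observed within-group variance
    results = {}
    for name, r_ii, observed in rows:
        expected = v_w * (1.0 - r_ii) / (1.0 - r0)
        results[name] = (expected, observed - expected)
    return results


residuals = rb_residuals(TABLE6)
for name, (expected, residual) in residuals.items():
    print(f"{name:<14} expected={expected:.3f} residual={residual:+.3f}")
```

Under these assumptions the sketch gives an expected variance of about 0.729 and a residual of about 0.461 for Mongolia, in line with Table 6. A positive residual means the observed variance exceeds what the distance from the centroid alone would predict.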
Raw data sets were analyzed using the quantitative genetics software RMET 5.0,
provided by John Relethford (Relethford et al., 1997). RMET allows an estimate of trait heritability
to be specified. A heritability of 1.0 produces both minimum genetic distances and
estimates of minimum Fst that are comparable to other phenotypic studies (Hemphill,
1998; Steadman, 2001); however, because a heritability of 1.0 for craniometric variation
(which includes environmental variance) is not realistic, an estimate of 0.55 was also used,
following Relethford and Blangero (1990). They found that using an average of 0.55
for craniometric trait heritability did not significantly alter the results. That is, the average
heritability is a fairly robust one (although see Carson, 2006). This study used
heritabilities of 1.0 and 0.55 for comparison. All tables shown use minimum Fst and
genetic distances (h2 = 1.0). Unless otherwise noted, the results using differential trait
heritability were similar.
RESULTS
Means and standard deviations for the Mongolian sample are shown in Table 4.
The results from the R matrix analyses are shown in Tables 5 through 7. Table 5 gives
the distance to the centroid (rii) and unbiased Fst values for all 13 populations. Table 6
displays the Relethford-Blangero residuals, and Table 7 gives
the genetic (d2) distances among all sampled populations.
TABLE 4. Means and standard deviations for 17 craniometric measurements for the Mongolian
sample
Trait Mean SD
nasal height 53.84 3.44
nasal bone height 27.42 3.12
nasion prosthion length 74.78 5.25
nasion basion 100.76 4.58
basion prosthion 98.08 5.5
superior nasal bone width 11.20 2.35
inferior nasal bone width 19.0 12.50
nasal breadth 26.66 2.24
frontoorbital width subtense at nasion 18.85 3.14
mid orbital width subtense at rhinion 17.71 3.99
bizygomatic breadth 139.94 6.63
glabella opisthocranion 183.81 6.77
maximum cranial breadth 147.84 6.78
basion bregma 130.76 5.35
basion rhinion 103.31 5.60
width at 13 (fronto malar temporalis) 107.58 4.45
mid orbital width (width at 14) 57.32 4.93
TABLE 5. R matrix results: Genetic distance (biased and unbiased) to the centroid for all 13
populations (h2 = 1.0)
Population Biased r(ii) Unbiased r(ii) se
Chinese 0.079523 0.074761 0.008804
Japanese 0.068142 0.064670 0.006959
North China 0.128733 0.119473 0.015621
Mongolia 0.147474 0.143851 0.010458
Siberia 0.228754 0.178754 0.048388
SE Asia 0.102555 0.095308 0.012334
Europe 0.082653 0.077098 0.009695
India 0.183652 0.170832 0.021954
West Africa 0.283655 0.269766 0.028398
Mideast 0.083717 0.071217 0.014636
North Africa 0.088384 0.077273 0.014179
Russia 0.111038 0.102563 0.013897
North America 0.101747 0.095168 0.011706
Fst = 0.13002
Unbiased Fst = 0.118518
se = 0.004779
TABLE 6. R matrix results: Relethford-Blangero residuals (h2 = 1.0)
                             Within-group phenotypic variance
Population          r(ii)    Observed   Expected   Residual
Chinese          0.074761       0.694      0.788     -0.094
Japanese         0.064670       0.688      0.796     -0.108
North China      0.119473       0.703      0.750     -0.047
Mongolia         0.143851       1.190      0.729      0.461
Siberia          0.178754       0.784      0.699      0.085
SE Asia          0.095308       0.692      0.770     -0.078
Europe           0.077098       0.809      0.786      0.023
India            0.170832       0.629      0.706     -0.077
West Africa      0.269766       0.663      0.622      0.041
Mideast          0.071217       0.695      0.791     -0.096
North Africa     0.077273       0.764      0.786     -0.021
Russia           0.102563       0.806      0.764      0.042
North America    0.095168       0.639      0.770     -0.131
TABLE 7. Genetic distances among 13 populations used in analysis (h2 = 1.0)
Pop          China    Japan   NChina     Mong  Siberia   SEAsia   Europe    India  WAfrica  Mideast  NAfrica   Russia NAmerica
China        0.000    0.045    0.079    0.038    0.027    0.051   -0.040   -0.052   -0.021   -0.059   -0.064   -0.056   -0.028
Japan        0.049    0.000    0.056   -0.002    0.023    0.023   -0.034   -0.043    0.007   -0.035   -0.028   -0.035   -0.045
N China      0.036    0.073    0.000    0.011    0.025    0.037   -0.055   -0.068   -0.060   -0.046   -0.038   -0.040   -0.029
Mong         0.142    0.213    0.241    0.000    0.066    0.007    0.027   -0.104   -0.065   -0.064   -0.068   -0.025    0.030
Siberia      0.199    0.196    0.249    0.190    0.000   -0.041   -0.032   -0.130   -0.024   -0.086   -0.069   -0.051    0.062
SE Asia      0.068    0.114    0.141    0.226    0.356    0.000   -0.045    0.006    0.028   -0.035   -0.042   -0.042   -0.047
Europe       0.231    0.210    0.307    0.168    0.319    0.263    0.000    0.000   -0.066    0.043    0.037    0.062    0.019
India        0.350    0.322    0.426    0.522    0.610    0.255    0.247    0.000    0.079    0.073    0.059    0.016   -0.020
WAfrica      0.387    0.320    0.509    0.543    0.497    0.310    0.479    0.283    0.000    0.001   -0.019   -0.091   -0.053
Mideast      0.265    0.207    0.284    0.342    0.421    0.236    0.062    0.095    0.339    0.000    0.072    0.059    0.073
N Africa     0.279    0.197    0.273    0.357    0.393    0.257    0.080    0.130    0.385    0.004    0.000    0.073   -0.003
Russia       0.289    0.238    0.302    0.296    0.383    0.282    0.055    0.242    0.555    0.056    0.034    0.000    0.019
NAmerica     0.225    0.250    0.272    0.179    0.150    0.285    0.134    0.305    0.470    0.182    0.179    0.160    0.000
Note: Values above the diagonal are R matrix elements. Values below the diagonal are
d2 distances.
Group affinity is represented visually in Figures 3, 4 and 5. Figure 3 is
the genetic distance map (principal coordinates scaled by the square root of their eigenvalues) produced from
the Relethford-Blangero analysis. The first two principal coordinates account for 64.6%
of the variation. Figure 4 plots group centroids on the first two canonical variates and
Figure 5 plots group centroids on the first three canonical variates resulting from
discriminant function analysis. The first three canonical variates account for 76.8% of the
variation.
FIGURE 3. Genetic Distance Map (principal coordinates plot of the 13 populations; axes: PC1 (37.4%) and PC2)
FIGURE 4. Plot of the first two canonical variates resulting from discriminant function
analysis for 13 groups, 17 variables (axes: Function 1 and Function 2). Group positions on Function 2, from highest to lowest: Mongolia (1.727), Siberia (0.996), North America (0.746), Europe (0.723), Russia (0.189), Chinese (-0.196), N China (-0.370), Japanese (-0.599), N Africa (-0.640), Mideast (-0.657), SE Asia (-0.829), India (-1.693), W Africa (-2.112).
FIGURE 5. Plot of the first three canonical variates resulting from discriminant
function analysis for 13 groups, 17 variables
DISCUSSION
Little is known about the people of Mongolia prior to the rise of Genghis Khan
(Keyser-Tracqui et al., 2006). Early in Mongolia’s history, many warlike tribes
inhabited the region; most were nomadic, like other peoples of the central Asian
steppe. These nomadic tribes sometimes united with other peoples of the steppe, forming
large confederations that routinely threatened places like China, Europe, and the Middle
East. These confederacies rarely lasted; however, the conflicts redistributed people
and left particular genetic impressions.
Central Asia is a vast territory that has been central to the development of human
history because of its strategic location. The territory has been a complex assembly of
peoples, cultures, and habitats. The area has been occupied since Lower Paleolithic times,
and there is evidence of Neanderthal skeletal material in Uzbekistan (Comas et al., 2004).
The genetic legacy of the Mongols was expanded with the rise of Temujin (c.
1162-1227), otherwise known as Genghis Khan (Chinggis Khaan) and later the formation
of the Yuan Dynasty (1271-1368) (Mote 1999). By 1206 all tribes had come under the
rule of Temujin, who then began establishing the Mongol Empire. Genghis
Khan and his immediate successors conquered nearly all of Asia and European Russia, and
sent armies as far west as the Middle East and south into Southeast Asia. This
was the largest land empire known in history (Figure 6).
FIGURE 6. Map showing the extent of the Mongol Empire circa 1294
Genghis Khan and his male-line descendants left a large genetic imprint across
the Old World by ruling large areas of Asia for many generations. Genghis Khan and his
descendants would often slaughter large segments of the population under their control,
which allowed a new genetic signature to thrive (Mote, 1999; Zerjal et al., 2003). Zerjal
et al. (2003) suggest the Mongol ruler and his male lineage may be responsible for a
“star-cluster” Y-chromosomal pattern found throughout a large geographical area
extending from Central Asia to the Pacific. This “star-cluster” formation (closely related
lineages) is found in 16 populations extending from the Pacific to the Caspian Sea and
occurs at high frequency (~8%), suggesting it does not result from an event specific to
any single population (Zerjal et al., 2003). It is possible that a form of social selection is
responsible for the observed pattern. That is, on the basis of social prestige (descent from
Genghis Khan), a novel form of selection acted on various human populations.
Central Asia is a major contact point for many diverse peoples. As such, the
history and development of the Mongolian population was a complex process affected by
the mixture of ethnically diverse groups (Keyser-Tracqui et al., 2006). Importantly, little is
known genetically of this region, which has played a crucial role in the history of
humankind (the Silk Road), where contacts and trade occurred between the steppe
peoples of the north and peoples of India in the south. These contacts should have
resulted in the generation of complex cultural phenomena, such as acculturation,
assimilation, language acquisition, overlapping economies, all acting upon the genetic
makeup of diverse groups found throughout central Asia.
Comas et al. (1998; 2004) found the central Asian genetic landscape to present
features (such as frequencies of certain nucleotides, levels of nucleotide diversity, mean
pairwise differences, and genetic distances) intermediate between Europe and eastern
Asia, possibly suggesting significant gene flow enhanced by the trade routes
along the Silk Road. Further, these researchers point to mtDNA eastern Asian sequences
in central Asia originating in the Mongols and/or Chinese (Comas et al., 1998). Yao et
al. (2000) examined mtDNA control region segment I and melanocortin 1 receptor
(MC1R) gene polymorphisms along the Silk Road region of China. In agreement with
the findings of Comas et al. (1998) for the larger region of central Asia, both the frequencies of the
MC1R variant and the mtDNA presented intermediate values between those of Europe
and East and Southeast Asia, suggestive of extensive admixture in this area of increased
contact and interaction.
This study makes use of quantitative trait variation, and the results
are similar to those of the genetic analyses described above. Table 6 shows the results from the
Relethford-Blangero analysis. Within-group phenotypic variance is greatest in Mongolia
(1.190), indicating greater than expected extralocal gene flow (residual = 0.461). In fact, Mongolia
has the largest positive residual. This finding would suggest that significant
admixture has been occurring in Mongolia despite the relatively nomadic lifestyle of many
groups. Figures 3, 4, and 5 all suggest an intermediate position for Mongolia between
European and East Asian populations. Interestingly, although contacts have been
persistent between central Asia and India, there is little indication that gene flow has been
occurring between Mongolians and people of the Indian subcontinent. India is seen as a
consistent outlier in all three analyses, clustering closer to the Middle East and North
Africa.
Genetic distances resulting from the quantitative analyses are also informative.
The lowest d2 values for Mongolia are with China (0.142), Europe (0.168), and North America
(0.179). The R matrix values give similar results for Mongolia, indicating a closer
genetic relationship to the Chinese groups, Southeast Asia, North America, Siberia, and
Europe. Kolman et al. (1996) suggest that central Asian groups (including Mongolia)
represent the closest link between the Old World and the New World using mtDNA
diversity. They propose that the narrow geographic corridor of east Central Asia, extending
from Mongolia to the Pacific coast, may have served as a starting point for the human
migration that led to the colonization of the New World. Although this study does not
capture the more nuanced underlying variation that could support this hypothesis, the
data do suggest an affinity between Mongolians and North American Indian groups.
CONCLUSION
The analyses conducted in the present study indicate the utility of quantitative
genetic variation. Although the R matrix analysis does not resolve the finer underlying
variation within the Mongolian population, it does show agreement with recent
genetic studies using mtDNA, Y chromosome and ancient DNA analysis.