Proceedings of the National Academy of Sciences of the United States of America. 2006 May 15;103(21):8012–8017. doi: 10.1073/pnas.0509718103

Serial coalescent simulations suggest a weak genealogical relationship between Etruscans and modern Tuscans

Elise M S Belle, Uma Ramakrishnan, Joanna L Mountain, Guido Barbujani
PMCID: PMC1472421  PMID: 16702560

Abstract

The Etruscans, the only preclassical European population that has been genetically characterized so far, share only two haplotypes with their modern geographic counterparts, the Tuscans, who, nonetheless, appear to be their closest relatives. We modeled 10 demographic scenarios spanning the last 2,500 years and tested by serial coalescent simulation whether any are consistent with the patterns of genetic diversity observed within and between the Etruscan and the modern Tuscan populations. Models in which the Etruscans are the direct ancestors of modern Tuscans appear compatible with the observed data only when they also include a very high mutation rate and an ancient founder effect. A better fit was obtained when the ancient and the modern samples were extracted from two independently evolving populations, connected by little migration. Simulated and observed parameters were also similar for a scenario in which the ancient samples came from a subset, e.g., a social elite, genetically differentiated from the bulk of the Etruscan population. In principle, these results may be biased by factors such as gross and systematic errors in the ancient DNA sequences and failure to sample suitable modern individuals. If neither proves to be the case, this study strongly suggests that either the mitochondrial mutation rate is much higher than currently believed or the Etruscans left very few modern mitochondrial descendants.

Keywords: ancient DNA, mtDNA, population genetics


The Etruscan culture is documented in central Italy between the eighth and second centuries before Christ (B.C.), and its origin is still controversial. Some ancient historians, including Herodotus, suggested that the Etruscans came to Italy from Asia Minor, and, although it is difficult to imagine a mass migration from there, contacts of various types are documented (1) and may have entailed genetic exchanges. However, most modern archaeologists, along with Dionysius of Halicarnassos, believe that the Etruscan civilization developed locally from the 10th century B.C. Villanovan culture. In the second century B.C. the Etruscans were given Roman citizenship, and soon afterward their language disappeared from records (2).

The Etruscans, the first population of preclassical Europe to be characterized genetically, harbor levels of mtDNA diversity comparable to those of modern European populations (3). Their closest modern genetic relatives appear to be the inhabitants of the same region, the Tuscans, with an FST genetic distance of 0.024. However, the population of Anatolia, although geographically distant, appears more similar to the Etruscans than any other Italian or eastern Mediterranean population. In addition, Etruscans and Tuscans share only two mitochondrial haplotypes, both of which are observed throughout Europe, which raises a series of questions about the historical relationships of these populations.

Taken at face value, this finding suggests that modern and ancient inhabitants of Tuscany are largely unrelated or that they are related, but some process occurring in the last 2,500 years had such a strong impact that their genealogical continuity is now hard to recognize. Among these processes is social stratification, namely the existence of genetic differences between a small social elite and the bulk of the population, a well known phenomenon in human history (4). Indeed, in an area rich in archaeological material such as Tuscany, only decorated tombs, or tombs containing broad collections of artifacts, can be attributed with confidence to the Etruscan civilization, but these are also the burials of the wealthy. If so, the study by Vernesi et al. (3) would provide little information on the other layers of the Etruscan society. To understand which historical scenarios are consistent with the genetic data, we have investigated, by computer simulation, the effects of past demographic events and social structure on genetic diversity.

In this study we used recently developed, coalescent theory-based software, serial simcoal (5), to analyze DNA sequences sampled at different moments in time. We simulated a number of demographic processes affecting the population(s) of Tuscany over 2,500 years, or ≈100 generations, and tested their consequences at the genetic level. Models included parameters such as the rate of mutation and the rate of migration between independently evolving populations, as well as changes in population size and reduced reproductive success. At the end of each simulation, measures of genetic diversity were calculated, and the goodness of fit of each model was evaluated. We thereby rejected several hypotheses regarding the evolutionary relationships between past and present inhabitants of Tuscany, and we showed that either the mtDNA mutation rate is higher than currently believed or Etruscans and Tuscans have only a weak genealogical relationship.

Results

For each model, we report the median of the chosen summary statistics (see Materials and Methods) over 1,000 simulations (Table 1), the empirical likelihood values, and the χ2 values, estimated by Fisher's test (Table 2).

Table 1.

Observed and simulated diversity statistics for the Tuscan (T) and Etruscan (E) samples based on mtDNA sequences

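| Data set | Haplotype no. (T) | Haplotype no. (E) | Haplotype diversity (T) | Haplotype diversity (E) | Nucleotide diversity (T) | Nucleotide diversity (E) | Average pairwise difference (T) | Average pairwise difference (E) | Haplotype sharing (T) | Haplotype sharing (E) | FST (E-T) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed | 40 | 22 | 0.938 ± 0.015 | 0.944 ± 0.018 | 0.014 ± 0.008 | 0.011 ± 0.006 | 5.0 ± 2.5 | 3.9 ± 2.0 | 0.050 | 0.091 | 0.024 |
| Model 1 | 49 | 27 | 0.980 | 0.963 | 0.353 | 0.353 | 127.1 | 127.2 | 0.000 | 0.000 | 0.015 |
| Model 2 | 43 | 26 | 0.975 | 0.960 | 0.160 | 0.159 | 57.5 | 57.4 | 0.067 | 0.145 | 0.016 |
| Model 3 | 46 | 26 | 0.977 | 0.960 | 0.160 | 0.160 | 57.7 | 57.6 | 0.065 | 0.120 | 0.015 |
| Model 4 | 36 | 17 | 0.955 | 0.908 | 0.019 | 0.016 | 6.7 | 5.9 | 0.194 | 0.412 | 0.016 |
| Model 4′ | 7 | 4 | 0.705 | 0.466 | 0.002 | 0.002 | 0.7 | 0.6 | 0.571 | 0.250 | 0.012 |
| Model 5 | 33 | 15 | 0.946 | 0.889 | 0.015 | 0.013 | 5.5 | 4.8 | 0.200 | 0.421 | 0.017 |
| Model 6 | 36 | 18 | 0.957 | 0.916 | 0.019 | 0.017 | 6.8 | 6.1 | 0.054 | 0.111 | 0.027 |
| Model 7 | 37 | 18 | 0.960 | 0.919 | 0.019 | 0.017 | 6.9 | 6.2 | 0.092 | 0.188 | 0.023 |
| Model 8 | 36 | 17 | 0.956 | 0.914 | 0.021 | 0.019 | 7.7 | 6.8 | 0.056 | 0.118 | 0.027 |
| Model 9 | 36 | 18 | 0.957 | 0.916 | 0.018 | 0.017 | 6.6 | 6.0 | 0.057 | 0.111 | 0.027 |
| Model 10 | 36 | 17 | 0.956 | 0.916 | 0.022 | 0.020 | 7.9 | 7.3 | 0.059 | 0.125 | 0.027 |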

Haplotype no. is the number of distinct haplotypes; haplotype diversity is a measure of expected heterozygosity; haplotype sharing represents the fraction of haplotypes in one population that are also found in the other; FST represents between group diversity. Data are medians of the 1,000 simulations performed for each evolutionary scenario. The simulated parameters that differed significantly from the observed values (see Table 2) are in italics.
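
The nucleotide diversity and average pairwise difference columns of Table 1 are two views of the same quantity: nucleotide diversity is the average pairwise difference divided by the number of sites compared. The short Python check below illustrates this on three simulated Tuscan rows. The sequence length is not stated in this section, so the value recovered here (roughly 360 sites) is an inference from the table, not a figure given by the authors, and the function name is purely illustrative.

```python
# Simulated medians for the Tuscan (T) sample, copied from Table 1:
# (average pairwise difference, nucleotide diversity).
rows = {
    "Model 1": (127.1, 0.353),
    "Model 2": (57.5, 0.160),
    "Model 4": (6.7, 0.019),
}

def implied_length(avg_pairwise_diff, nucleotide_diversity):
    """Number of sites implied by pi = (average pairwise difference) / length."""
    return avg_pairwise_diff / nucleotide_diversity

# Each row should point to about the same number of sites; small deviations
# come from the rounding of the published values.
implied = {name: implied_length(apd, pi) for name, (apd, pi) in rows.items()}
for name, length in implied.items():
    print(f"{name}: about {length:.0f} sites")
```

Rows with very small values (such as Model 4) give a noisier estimate because the published figures are rounded to two or three digits.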

Table 2.

Empirical likelihood values (P) for each summary statistic derived from 1,000 simulations, and χ2 values for Fisher's test where χ2 = −2 Σ Ln(P)

| Simulated data set | Haplotype no. (T) | Haplotype no. (E) | Haplotype diversity (T) | Haplotype diversity (E) | Nucleotide diversity (T) | Nucleotide diversity (E) | Average pairwise difference (T) | Average pairwise difference (E) | Haplotype sharing (T) | Haplotype sharing (E) | FST (E-T) | χ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model 1 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0060 | 0.0060 | 0.0005 | 106.4 |
| Model 2 | 0.1150 | 0.0210 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.8250 | 0.8580 | 0.0005 | 88.1 |
| Model 3 | 0.0010 | 0.0210 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.7010 | 0.7000 | 0.0005 | 97.5 |
| Model 4 | 0.2030 | 0.0460 | 0.4960 | 0.1660 | 0.4240 | 0.3460 | 0.4060 | 0.3280 | 0.0005 | 0.0005 | 0.1860 | 21.7 |
| Model 4′ | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0040 | 0.0005 | 0.0005 | 0.3880 | 88.9 |
| Model 5 | 0.0430 | 0.0160 | 0.9440 | 0.0900 | 0.7660 | 0.6520 | 0.7450 | 0.6300 | 0.0005 | 0.0005 | 0.3300 | 23.2 |
| Model 6 | 0.2300 | 0.1220 | 0.3740 | 0.3140 | 0.3960 | 0.2820 | 0.3880 | 0.2620 | 0.8240 | 0.8100 | 0.6920 | 16.7 |
| Model 7 | 0.3380 | 0.1260 | 0.3180 | 0.3840 | 0.3220 | 0.2680 | 0.3180 | 0.2480 | 0.1750 | 0.1520 | 0.8420 | 15.9 |
| Model 8 | 0.2290 | 0.0630 | 0.4620 | 0.2240 | 0.3340 | 0.3040 | 0.3240 | 0.2900 | 0.7180 | 0.6580 | 0.6680 | 18.5 |
| Model 9 | 0.2590 | 0.0750 | 0.4180 | 0.2920 | 0.4340 | 0.2840 | 0.4220 | 0.2640 | 0.7400 | 0.7020 | 0.6880 | 17.2 |
| Model 10 | 0.2370 | 0.0820 | 0.4520 | 0.2920 | 0.3400 | 0.2440 | 0.3200 | 0.2200 | 0.6480 | 0.6080 | 0.7320 | 17.9 |

In our case, the χ2 was calculated based on seven statistics (haplotype number, haplotype diversity, and average pairwise difference for each population and FST between the two populations). The χ2 values that are not significant are given in italics. E, Etruscans; T, Tuscans.
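
The χ2 column of Table 2 can be recomputed from the seven P values named above. The following Python sketch applies χ2 = −2 Σ ln(P) to the rows for models 1 and 6 and then converts the statistic to a tail probability. Two points are assumptions rather than statements from the text: the statistic is referred to a χ2 distribution with 2 × 7 = 14 degrees of freedom (the standard form of Fisher's method), and the function names are illustrative.

```python
import math

def fisher_chi2(p_values):
    """Fisher's combined statistic: chi2 = -2 * sum(ln P)."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0          # i = 0 term
    for i in range(1, df // 2):     # i = 1 .. k-1
        term *= half / i
        total += term
    return math.exp(-half) * total

# P values from Table 2, in the order: haplotype no. (T, E),
# haplotype diversity (T, E), average pairwise difference (T, E), FST.
model1 = [0.0005] * 7
model6 = [0.2300, 0.1220, 0.3740, 0.3140, 0.3880, 0.2620, 0.6920]

chi2_model1 = fisher_chi2(model1)
chi2_model6 = fisher_chi2(model6)
p_model1 = chi2_sf_even_df(chi2_model1, 2 * len(model1))
p_model6 = chi2_sf_even_df(chi2_model6, 2 * len(model6))

print(f"Model 1: chi2 = {chi2_model1:.1f}")
print(f"Model 6: chi2 = {chi2_model6:.1f}")
```

Under these assumptions the recomputed statistics agree with the tabulated 106.4 and 16.7, and only model 1 departs significantly from the observed data.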

Single-Population Models.

Model 1: A large population of constant size.

This oversimplified model corresponds to a population that remained at a constant female population size of Nf = 300,000 (the modern population size) for 100 generations. Under this scenario, all P values are very low, thereby suggesting that the simulated values are all significantly different from the observed ones.

Model 2: A small population of constant size.

Consistent haplotype-sharing values are obtained by simulating a stationary population with a small effective size of Nf = 25,000. In this case, as expected, genetic diversity within populations is lower than for model 1. However, almost all simulated within-population genetic diversity values remain significantly higher than the observed values.

Model 3: A growing population.

Here we considered a population that expanded exponentially from Nf = 25,000 to Nf = 300,000. Again, the level of haplotype sharing is compatible with the observed data, but the simulated FST remains significantly lower than the observed value. The results are similar to those obtained for model 2 with a population of small and constant size, suggesting that a population expansion during the last 100 generations did not have a great impact on the measures of genetic diversity considered.

Model 4: A founder effect.

We next added a founder effect before the population expansion, maintaining the growth rate of the previous model (0.0248), and ran preliminary simulations in which the event was placed between 200 and 250 generations ago. By trial and error, the best fit was obtained for a founder effect 240 generations ago, at which time the population size was Nf = 770. Under this model the simulated FST value no longer differs significantly from the observed one, but the simulated level of haplotype sharing rises to values significantly higher than observed. The founder effect also dramatically decreases the internal genetic diversity, so that all indices of within-population diversity, except haplotype number for the Etruscans, fit the observed values. Fisher's test indicated no significant difference between the simulated and observed data.
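
The growth rate and founder size quoted for models 3 and 4 can be cross-checked with a few lines of Python. The sketch assumes continuous exponential growth, N(t) = N0 · exp(r · t), which is one plausible reading of "expanded exponentially"; the function names are illustrative and are not part of serial simcoal.

```python
import math

def growth_rate(n_start, n_end, generations):
    """Per-generation exponential rate taking n_start to n_end."""
    return math.log(n_end / n_start) / generations

def size_after(n_start, rate, generations):
    """Population size after growing at `rate` for `generations`."""
    return n_start * math.exp(rate * generations)

# Model 3: Nf grows from 25,000 to 300,000 over the last 100 generations.
r = growth_rate(25_000, 300_000, 100)

# Model 4: a founder population of Nf = 770, 240 generations ago,
# growing at the same rate until the present.
founder_now = size_after(770, r, 240)
# The same trajectory 100 generations ago, i.e. after 140 generations of growth.
founder_at_100 = size_after(770, r, 140)

print(f"r = {r:.4f}")
print(f"Nf today = {founder_now:,.0f}")
print(f"Nf 100 generations ago = {founder_at_100:,.0f}")
```

With these inputs the founder trajectory passes close to Nf = 25,000 at 100 generations ago and Nf = 300,000 today, so model 4 extends the growth curve of model 3 backwards rather than replacing it.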

Model 4′: A founder effect and a lower mutation rate.

Using the parameter values of model 4 with the exception of a mutation rate of 0.05 mutations per million years per nucleotide (6), we found that most simulated diversity statistic values were very different from the observed values, leading to a highly significant χ2 value.

Model 5: An episode of selection after the expansion.

Here we tested for the possible effects of a reproductive disadvantage associated with poor living conditions for the Etruscan population after their political assimilation into the Roman state. We simulated selection by introducing a bottleneck to Nf = 1,000 between 80 and 85 generations ago (i.e., around the second century B.C.), followed by a reexpansion to Nf = 300,000 in the present. This scenario is also consistent with the data, but it does not improve the fit with respect to model 4.

The comparison of models 4, 4′, and 5 suggests that the observed genetic diversity measures, within and between populations, are compatible with direct genealogical relationships between Etruscans and Tuscans, but only if a very high mutation rate is assumed. Next, we explored whether and under which conditions one can improve the overall fit, assuming that Etruscans and Tuscans belong to different genealogies. We therefore simulated independent evolution of two populations, one ancestral to current Tuscans and one representing the Etruscans and their descendants, and estimated the divergence time and the migration rate compatible with the data. For the Tuscan population we chose model 4; for the Etruscans, based on model 5, we modeled a population of constant size after the bottleneck.

Two Population Models.

Model 6: Two independent populations with no migration.

Assuming no migration between the two populations, we obtained compatible values for all genetic diversity statistics for a divergence time of the two populations between 250 and 500 generations (or 6,250 and 12,500 years) ago. Under this extreme model, where the Tuscans are simply unrelated to the Etruscans, we found that the best fit to the observed data was obtained for a divergence time 300 generations, or 7,500 years, ago. This divergence time was then used in all following simulations. The χ2 evaluated under this model is clearly lower than under any of the one-population models.

Model 7: Two populations with ancient migration.

We next considered the possibility that individuals migrated between the two populations only from the beginning of the expansion of the Tuscan population, 240 generations ago, until 100 generations ago. Under this scenario, the maximum migration rate that was not rejected was 0.001%, and the fit of this model was the best observed through all simulations, although the χ2 was not much less than under model 6.

Model 8: Two populations with recent migration.

The only testable model including migration between present time and 100 generations ago involves unidirectional migration from the Etruscan population to the present-day Tuscan population. Migration in the other direction was impossible to model, because we have no modern group that can be safely regarded as descended from the Etruscans. In order not to obtain a significant departure of simulated from observed statistics, the migration rate from Etruscans to Tuscans could not exceed 0.0001%, and the overall fit was marginally worse than under previously simulated models.

Model 9: Two populations and social stratification.

As previously mentioned, most individuals identified as Etruscans in ref. 3 probably belonged to the social elite. If this social elite came from abroad and imposed its language on the local population, following the well established model of elite dominance (4), it might have differed genetically from the rest of the population. This was apparently the case for the Seljuk and Ottoman elites immigrating to Anatolia at the turn of the second millennium anno Domini (A.D.) (7). For this model, we represented the migration of a small number of individuals from abroad by simulating a source (Etruscan) population with a strong bottleneck 100 generations ago. A model with an ancestral population of Nf = 25,000 followed by a bottleneck to Nf = 1,000 over four generations led to summary statistics, none of which differed significantly from the observed ones, although the observed χ2 was slightly higher than for models 6 and 7.

Model 10: Two populations, social stratification, and migration.

We next considered the possibility that some individuals migrated from the Etruscan social elite to the Tuscan population between present time and 100 generations ago. Under this model, the maximum migration rate did not exceed 0.0001%, and the fit of the model did not improve with respect to model 9.

Additional Simulations.

To explore the sensitivity of our initial results, we ran four supplementary sets of simulations. (i) We assigned to each Etruscan sequence its estimated age, based on archaeological evidence, which corresponds to a time period spanning from the second to the seventh century B.C. (i.e., between 84 and 106 generations ago). All of the results previously obtained remained unchanged after this modification. (ii) We repeated the simulations by using 96 sequences from Turkey (8–10), rather than the Tuscans, as the modern population. We found that Turks and Etruscans could not be regarded as a single population studied in two time periods under any demographic scenario, including model 4. (iii) To test whether a different modern Tuscan sample could resemble the Etruscans more closely, we used another, so far unpublished, data set comprising 86 mtDNA sequences from the village of Murlo (A. Piazza and A. Torroni, personal communication), located in the heart of what once was Etruria. The observed diversity values for Murlo were as follows: 60 haplotypes; haplotype diversity, 0.960 ± 0.010; nucleotide diversity, 0.012 ± 0.007; average pairwise difference, 4.5 ± 2.2. The Etruscan sample shared with them only haplotype 5AM, carried by 13.9% of the Murlo individuals. In the simulations in which Etruscans and Murlo were part of the same genealogy, the overall fit was worse than in the previous simulations, and the departure from the observed data was significant [χ2 = 30.9, P < 0.01; χ2 = 32.1, P < 0.005 (for models 4 and 5, respectively)]. (iv) Because models 4 and 5 were the one-population models showing the best fit, we modified them to test whether stronger effects of genetic drift might improve the correspondence between observed and simulated data. For that purpose, we simulated a stronger bottleneck (Nf = 100 instead of Nf = 5,000) at generation 80 and assumed that the effective population sizes of both Etruscans and Tuscans were 1/40th (instead of 1/12th) the census population sizes under model 4, i.e., Nf = 7,500 and 87,500, respectively. In both cases, we observed a significant decrease of the fit (χ2 = 31.6, P < 0.005; χ2 = 40.6, P < 0.001, respectively).

Discussion

The models tested are not exhaustive of all possibilities. However, by modeling 10 combinations of demographic events during the last 100 generations and two mutation rates, we could rule out several possible models representing the evolution of the population, or populations, of Tuscany. Simulating a population of constant size, be it large (model 1) or small (model 2), yields statistics in sharp contrast with the observed ones. This finding was expected given the oversimplified nature of those scenarios. However, a more realistic scenario representing a simple expanding population, with the modern Tuscans being the direct descendants of the Etruscans (model 3), proved also incompatible with the observed data.

A good correspondence between simulated and observed data was obtained only by incorporating a founder effect in the model. Both model 4, starting with a population of 9,240 individuals (Nf = 770) 6,000 years ago, and model 5, in which we added an episode of selection against the Etruscans starting at the Roman assimilation, give compatible results for all measures of internal diversity and for FST. The additional tests we ran show that neither a more exact placing in time of the ancient samples nor the choice of a different modern Tuscan sample changes these results in their essence.

A number of factors enhancing the evolutionary impact of genetic drift (including population subdivision, fertility correlations, and age structure) may have caused the Etruscan and Tuscan population sizes to be lower than we assumed. Fertility correlation refers to the finding that daughters of highly fertile women also tend to have above-average fertility (11), which may lead to a similar effect as a bottleneck in the coalescence. Age structure may imply an excess of individuals of nonreproductive age in a community, so that the effective population size may be as low as 10% of the census population size (12). However, incorporating a stronger genetic drift in models 4 and 5 did not improve the correspondence between observed and simulated data.

We next modeled two independent populations and various levels of migration. Model 6 represents an extreme scenario with no migration. The best correspondence to the data was obtained by placing the divergence between the ancestors of the Etruscans and the ancestors of modern Tuscans 7,500 years ago. This period roughly corresponds to the beginning of the Neolithic spread of farming in Europe, which has entailed immigration from the Near East and demographic growth (13). According to our simulations, it would be during that period that the ancestors of the modern Tuscans and the ancestors of the Etruscans separated. Although this model is not the only one consistent with the data, it fits well with the prediction of archaeological models (14) suggesting that the incoming Neolithic farmers (who in this case would be largely ancestral to modern Tuscans) spoke an Indo-European language, whereas previous European settlers (who would be largely ancestral to the Etruscans) spoke a non-Indo-European language.

In models 7 and 8 the maximum rate of migration compatible with the observed data was between 0.0001% and 0.001%. If migration was ancient, it had to be at most between 9 and 300 individuals per generation, respectively, at the beginning and at the end of the expansion. Alternatively, if migration occurred in the last 100 generations, it involved at most 30 individuals per generation. There are no direct estimates of migration in populations of the same time period to compare these figures with, but the Etruscan contribution to the modern gene pool of Tuscany seems very limited indeed.

Finally, the measures of genetic diversity generated under models 9 and 10, in which the Etruscans of our sample were regarded as members of an elite social class, corresponded to the observed values roughly as well as those generated under models 6 and 8, respectively. Although the incorporation of elite dominance does not lead to a better fit, models with elite dominance are compatible with the data.

Potential Methodological Problems.

Vernesi et al. (3) found no evidence of subdivision in their analysis of the Etruscan data, and so we considered the Etruscan population as essentially panmictic. Simplifying assumptions of this kind are necessary for studying complex evolutionary processes (see, e.g., ref. 15). Modeling a subdivided Etruscan population would have been problematic in the absence of reliable information on the number of population units and on rates of migration among them. Had the Etruscan population been subdivided, one would expect an increased probability of haplotype loss by genetic drift (16). Three additional factors, namely systematic typing errors, underestimation of mutation rate, and inappropriate choice of the modern sample, might have been a source of bias in our analyses.

Systematic Typing Errors.

Errors may and do occur in both ancient and modern DNA typing (17). Although Vernesi et al.'s (3) Etruscan sequences were obtained by using the strictest available standards for ancient DNA (18), including independent replicated extractions, amplifications, cloning, and sequencing (3, 19, 20), Bandelt (20) questioned the occurrence of substitutions at positions 16069 and 16294 of the mtDNA sequence and the joint occurrence of substitutions at sites 16193 and 16219 in three sequences. We evaluated the impact of these possible typing errors on our results and found that, even if they occurred, they would not change the haplotype sharing, nor would they substantially affect the other measures of genetic diversity. Only errors at specific other positions (such as 16129 and 16261), for which no suspicions of mistyping have been raised, might increase the haplotype sharing between Etruscans and modern Tuscans.

Impact of Mutation Rate.

The hypothesis of genealogical continuity between Etruscans and Tuscans is consistent with the data only when the mtDNA mutation rate is set very high (14). In this way, the Etruscans could have transmitted even a large fraction of their mtDNAs to the modern Tuscans, but in the meantime many sequences would have mutated, remaining similar but no longer identical to the ancestral sequences, a pattern corresponding well to the observed one. The problem with this scenario is that the mutation rate we used was the highest estimated from pedigree studies (21). Most authors assume a 10-fold lower mutation rate, similar to estimates based on phylogenies (see, e.g., ref. 6). However, with this low mutation rate (model 4′) all simulated values except FST differed significantly from the observed values. Therefore, mutation rates higher than we used are implausible, and lower mutation rates further reduce the concordance between observed and simulated data.

Inappropriate Choice of the Modern Population.

Perhaps the Etruscans did leave mitochondrial descendants in modern Tuscany, but gene flow from other areas was so extensive that these descendants are underrepresented in the modern sample we studied. Collecting more modern data is the only way to know. Meanwhile, this study demonstrates that neither available modern data set (ref. 22 and A. Piazza and A. Torroni, personal communication) supports a close genealogical relationship between ancient and modern inhabitants of Tuscany.

In short, we cannot guarantee that the ancient data set is absolutely error-free, that the mutation rate we chose is accurate, or that no other modern Tuscan population could be genetically closer to the Etruscans, but none of these factors is sufficient, by itself, to account for our difficulty in fitting one-population models to the available data.

Relationships with Anatolian Populations.

Could the Etruscans be related to modern populations other than the Tuscans? An Eastern origin of the Etruscans was suggested by comparisons of artifacts (1, 23) and is consistent with the observation that modern Turks appear genetically closer to the Etruscans than any Italian population except the Tuscans (3). However, our simulations gave no evidence of a genealogical continuity between Etruscans and modern people from Anatolia. As a consequence, it seems simpler to interpret the cultural and genetic similarities between Etruscans and Turks as a consequence of contacts entailing genetic exchanges (as opposed to common origins).

An Etruscan Social Elite?

If during the development of the Etruscan civilization a small group imposed its rule, and possibly its language, on Tuscany, the Etruscan upper class may have differed genetically from the rest of the population. However, even when we considered the Etruscans of our sample as part of a small group that evolved independently from the ancestors of the Tuscans and came in contact with them 2,500 years ago (models 9 and 10), the rate of migration from Etruscans to modern Tuscans (or their ancestors) was extremely low. Therefore, these results do not reveal whether the sampled Etruscans were representative of the entire population or of the upper class only, although they show that simulated and observed data can be reconciled with a model of social stratification. At any rate, the Etruscans that we studied seem to have contributed very little to the mitochondrial genomes of modern Tuscans.

Final Remarks.

To summarize, although most simulations reproduced some aspects of genetic diversity in modern and ancient Tuscany, the Etruscans can be considered directly ancestral to modern Tuscans only if the very high mtDNA mutation rates estimated from pedigree studies are accurate (models 4 and 5). Two-population models fitted the observed data better than did any of the one-population models, suggesting that ancient and modern inhabitants of Tuscany can be regarded as independent, with the latter being largely descended from non-Etruscan ancestors. This conclusion raises two questions, namely (i) what happened to the Etruscans after the Roman assimilation and (ii) how far back in time can one go in reconstructing history from current mtDNA diversity.

Regarding the fate of the Etruscans, it is unlikely that they were assimilated into other Italic populations; if they had been, the Etruscan mtDNA sequences should be frequent in some modern samples (see simulations in ref. 24), but that is not the case (3). The only other ancient European population typed to date at the genetic level, the seventh-to-second-century Iberians, shows a closer relationship with modern populations (25), suggesting that the Etruscans' case is somewhat peculiar. This observation raises the possibility that the Etruscans, or at least their maternal lineages, went extinct (in the absence of a reliable way to type ancient Y chromosomes, there is no way to test what happened to the Etruscan paternal lineages). Historical demography data on native Americans in the 16th century show that a dramatic population decline can indeed occur within a century or so during a colonization process (26). It is unclear to what extent a parallel with European populations of 2,000 years ago is justified, but certainly population sizes were much smaller in first-century-B.C. Italy than in many areas of Central America and South America in the 16th century A.D. However, if the people in our Etruscan sample did not leave any modern descendants, it is easier to imagine that what went extinct was a social class rather than the entire population. It is also conceivable that, although elite female Etruscan lineages did not survive, elite male lineages did.

Regarding time depth, essentially all studies of mtDNA variation in Europe have drawn conclusions regarding demographic phenomena occurring in a rather remote past. Neolithic or even Paleolithic demographic processes have been inferred from patterns in modern mtDNA diversity in the absence of genetic information on past populations (see, e.g., refs. 10 and 27–29) under the implicit assumption of genetic continuity among people dwelling in the same region at different time periods. The results of this study imply that this assumption is not always correct and that the mitochondrial gene pool can undergo a drastic turnover in as few as 100 generations.

Along with the Iberians, the Etruscans are the only ancient European population typed at the DNA level so far, and hence it is unclear whether our results should be regarded as an exception or as the rule. Only the genetic characterization of other ancient individuals, followed by detailed comparative analyses based on explicit demographic models, will help clarify in detail the evolutionary relationships between ancient and contemporary people of Europe.

Materials and Methods

mtDNA Sequences.

Forty-nine sequences of hypervariable region I of mtDNA represent modern individuals sampled in 22 localities of southern Tuscany selected to represent what once was the Etruscan territory (22). Twenty-seven sequences represent Etruscans (3) found in six necropoleis and dated between the second and seventh centuries B.C. Vernesi et al. (3) identified 22 different haplotypes and found no significant difference between sites or time periods. Therefore, we treated sequences of different provenance as if they belonged to a single population.

Serial Coalescent Simulations.

For nonrecombining DNA regions, patterns in the data depend on the joint effect of the population's genealogy and mutation. In turn, the shape of the genealogy depends on the population size through time and on the sample size and is affected by factors such as selection and (in subdivided populations) gene flow (30). Serial coalescence, in particular, allows one to consider both ancient and modern samples within the same genealogy (31) and test hypotheses regarding their demographic history. Here we used a serial coalescent program, serial simcoal (5), an extension of simcoal (32), to simulate the evolution of the population of Tuscany. The method is based on a two-step modeling. If n1 and n2 are the sample sizes for the modern and the ancient sample, respectively, and t1 is the age (in generations) of the ancient sample, coalescence of n1 sequences, or haplotypes, is first modeled backwards in time from the present, with n2 sequences added to the genealogy after t1 generations. After the reconstruction of the genealogy, mutations are then randomly distributed onto the tree by using a user-specified mutation model, in our case a finite-site model with two potential allelic states for each site.

In this study, n1 = 49 (Tuscans) and n2 = 27 (Etruscans); the Etruscans are considered to have lived 100 generations ago (or 2,500 years in the past, assuming a generation time of 25 years). For each demographic scenario considered we simulated 1,000 genealogies, thus obtaining sets of haplotypes at two moments in time.
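The first of the two steps described above can be illustrated with a minimal Python sketch. It is not the serial simcoal implementation: it assumes a single population of constant effective size (the `n_eff` argument is a hypothetical simplification), uses the standard continuous-time coalescent approximation, and omits the mutation step entirely. The function name and its arguments are illustrative only.

```python
import random


def serial_coalescent_times(n_modern, n_ancient, t_ancient, n_eff, rng):
    """Coalescence times (generations before present) for one genealogy.

    Lineages from the modern sample coalesce backwards from t = 0; the
    ancient lineages join the genealogy at t = t_ancient. A constant
    haploid effective size n_eff is assumed throughout.
    """
    t = 0.0
    k = n_modern              # lineages currently in the genealogy
    ancient_added = False
    times = []
    while k > 1 or not ancient_added:
        if k > 1:
            # Any of the k*(k-1)/2 pairs coalesces at rate 1/n_eff.
            wait = rng.expovariate(k * (k - 1) / (2.0 * n_eff))
        else:
            wait = float("inf")   # a single lineage cannot coalesce
        if not ancient_added and t + wait >= t_ancient:
            # The ancient sampling time is reached first: add the ancient
            # lineages and redraw the waiting time (memoryless property).
            t = t_ancient
            k += n_ancient
            ancient_added = True
            continue
        t += wait
        k -= 1
        times.append(t)
    return times


# Sample sizes and sample age taken from the text; n_eff is illustrative.
times = serial_coalescent_times(49, 27, 100, 25_000, random.Random(1))
print(len(times), "coalescence events; last one at generation", round(times[-1]))
```

With 49 + 27 sampled lineages, every genealogy contains 75 coalescence events, and all events involving ancient lineages necessarily occur at least 100 generations before the present.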

Genetic Diversity Statistics.

From each simulated data set we calculated 11 statistics, namely four measures of genetic diversity within each population (total number of haplotypes, haplotype diversity, nucleotide diversity, and average pairwise differences) and three measures of diversity between populations (haplotype sharing relative to the total Tuscan haplotypes, haplotype sharing relative to the total Etruscan haplotypes, and FST distance). Haplotype sharing was expressed, for each sample, as the fraction of haplotypes also present in the other sample. This measure is highly dependent on the accuracy of the sequences, and when dealing with ancient DNA there is a higher-than-average chance that single-nucleotide sites might have been mistyped. Although all possible precautions were taken in the original study to avoid sequencing errors, leading to elimination of 53 of the 80 samples initially available (3), we also compared populations by using FST, an index that is less sensitive to possible sequence errors. We shall refer to these statistics as simulated values; we compared them with the observed values, calculated from the Tuscan and Etruscan samples by using the program arlequin v.2.000 (33).
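Some of the statistics listed above can be illustrated with a short Python sketch. The function names and the toy four-base sequences are hypothetical, haplotype diversity uses the usual unbiased estimator, and FST is not computed here. This is only an illustration of the definitions, not the arlequin implementation.

```python
from collections import Counter


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    n = len(seqs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a != b for a, b in zip(seqs[i], seqs[j]))
    return total / (n * (n - 1) / 2)


def haplotype_sharing(sample, other):
    """Fraction of the distinct haplotypes in `sample` also found in `other`."""
    own = set(sample)
    return len(own & set(other)) / len(own)


# Toy data, for illustration only.
ancient = ["ACGT", "ACGT", "ACGA", "TCGA"]
modern = ["ACGT", "ACGG", "ACGG", "CCGT", "CCGT"]

print(len(set(ancient)), "ancient haplotypes")
print(haplotype_diversity(ancient))
print(mean_pairwise_differences(ancient))
print(haplotype_sharing(ancient, modern), haplotype_sharing(modern, ancient))
```

Nucleotide diversity follows from the mean number of pairwise differences divided by the sequence length.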

Simulation Parameters.

The models considered differ in terms of population size, migration, and selection. Parameters common to all simulations are described here, and individual models (outlined in Fig. 1) are described in detail in Results.

Fig. 1.

Outline of the demographic models simulated. For explanation of models, see Results. E, Etruscans; T, Tuscans. The figures on the left refer to the number of generations from the present; Nf is the effective female population size, and r is the rate of demographic change. Because the coalescence process is simulated backwards, an increase in population size is obtained by using a negative value.

Population Sizes and Growth Rates.

Accurate census figures are impossible to obtain for the Etruscans, but, based on the survey of Albegna, an area including both urban settings and rural dwellings (34), a plausible estimate of the population of classic Etruria in the sixth century B.C. is between 300,000 and 500,000 individuals. According to the 2001 census there were 3,500,000 people living in Tuscany. Because the effective population size for mitochondria is approximately one-fourth of the autosomal population size, and taking effective size to be approximately one-third of the census size, the effective population sizes Nf are approximated as N/12, where N is the census size. They thus range from 25,000 to 41,000 for the Etruscans and are around 300,000 for the modern Tuscans. Whereas in most models Nf was set to 25,000 and 300,000 for the ancient and modern populations, respectively, in some cases Nf was set to 7,500 and 90,000 (taking effective size to be 1/10th the census size, i.e., Nf ≈ N/40; ref. 12), thereby enhancing the impact of drift. When applicable, population growth or decline was modeled as exponential, and the growth rate was calculated based on the effective population sizes.
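The arithmetic behind these effective sizes, and one way an exponential rate could be derived from them, can be sketched as follows. The function names are hypothetical. The assumption that growth is spread evenly over the 100 generations separating the two samples is ours and applies only to the simplest case. The sign convention follows the Fig. 1 legend, where growth is entered as a negative rate because time runs backwards.

```python
import math


def mtdna_effective_size(census, effective_fraction, mtdna_fraction=0.25):
    """Effective female size: census size scaled by the effective-to-census
    ratio and by the one-fourth factor for mitochondria."""
    return census * effective_fraction * mtdna_fraction


def exponential_rate(n_start, n_end, generations):
    """Per-generation rate r such that n_end = n_start * exp(r * generations)."""
    return math.log(n_end / n_start) / generations


# N/12: effective size taken as one-third of the census size.
print(mtdna_effective_size(300_000, 1 / 3))     # lower Etruscan bound, about 25,000
print(mtdna_effective_size(3_500_000, 1 / 3))   # modern Tuscany, about 292,000

# N/40: effective size taken as one-tenth of the census size.
print(mtdna_effective_size(300_000, 1 / 10))    # about 7,500
print(mtdna_effective_size(3_500_000, 1 / 10))  # about 87,500

# Growth from Nf = 25,000 to Nf = 300,000 over 100 generations.
r_forward = exponential_rate(25_000, 300_000, 100)
r_backward = -r_forward   # value entered when time is simulated backwards
print(round(r_forward, 4), round(r_backward, 4))
```

The modern values are rounded in the text to 300,000 and 90,000, respectively.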

Mutation Parameters.

We simulated a sequence of 360 base pairs, the length of the mtDNA region sequenced in both populations (hypervariable region I). The mutation rate was set to 0.5 mutations per million years per nucleotide, the highest rate estimated in pedigree studies (21). A much lower rate, ≈0.05 mutations per million years per nucleotide (6) is commonly accepted in mtDNA studies, and we modeled this rate in some cases (data given only for model 4′). Based on recent estimates for hypervariable region I (35), we used a transition bias of 0.9375 and a rate-heterogeneity parameter of 0.26, allowing variation at each of the 360 sites.
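For orientation, the two rates can be converted to a per-sequence, per-generation scale, assuming the 25-year generation time used for the age of the ancient sample. The symbols below are introduced here for illustration only, and whether serial simcoal expects the rate on this scale is not stated in the text.

```latex
\begin{align*}
\mu_{\text{high}} &= \frac{0.5}{10^{6}\ \text{yr}} \times 25\ \frac{\text{yr}}{\text{generation}} \times 360\ \text{sites}
  = 4.5 \times 10^{-3}\ \text{mutations per sequence per generation},\\
\mu_{\text{low}} &= \frac{0.05}{10^{6}\ \text{yr}} \times 25\ \frac{\text{yr}}{\text{generation}} \times 360\ \text{sites}
  = 4.5 \times 10^{-4}\ \text{mutations per sequence per generation}.
\end{align*}
```

Over the 100 generations separating the two samples, a single lineage would therefore be expected to accumulate about 0.45 mutations under the high rate and about 0.045 under the low rate.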

Overall Test of Significance of the Models.

For each observed measure of genetic diversity, we estimated its empirical likelihood, P, given the parameters of the simulation, as follows. Suppose the observed statistic is x, which ranks as the kth among 1,000 simulated values whose mean is m (in what follows, we assume x > m; the reasoning is symmetrical for x < m). The empirical likelihood of that value is represented by the frequency of simulated values greater than x. Thus, we counted the values from (k + 1)th to 1,000th in the right tail of the distribution, and, to obtain a two-tailed test, we doubled that number. When the observed statistic fell outside the range of the simulated values we set P = 0.0005 as a conservative estimate. We then used Fisher's method to combine probabilities, thus obtaining an overall test of significance for each model (36). Fisher's method assumes that probabilities are independent, which was not strictly true in our case. To approximate independence, we excluded nucleotide diversity (which is related to the pairwise sequence difference) and haplotype sharing (which is related to FST and more sensitive to the presence of errors in the sequences). In this way, the test statistic, which is distributed as a χ2, was estimated from the remaining seven statistics and has 14 degrees of freedom. Because we tested 22 independent models (the 11 models of Table 1; 2 models considering the exact age of each ancient sample; 7 models using either Turks or other modern Tuscans as modern populations; and 2 models enhancing the effects of drift) we introduced a Bonferroni correction (36) of the critical value of the χ2. For 14 degrees of freedom, the significance threshold corresponding to P = 0.05/22 = 0.0023 was then between 31.3 and 36.1.
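The testing procedure described above can be sketched in Python using only the standard library. The function names are hypothetical. Applying the 0.0005 floor whenever the tail count is zero is our assumption: it covers the out-of-range case described in the text and also avoids taking the logarithm of zero. The χ2 tail probability uses the closed form that holds for an even number of degrees of freedom.

```python
import math


def empirical_two_tailed_p(observed, simulated, floor=0.0005):
    """Twice the fraction of simulated values beyond the observed one,
    counted in the tail on the observed value's side of the mean."""
    n = len(simulated)
    mean = sum(simulated) / n
    if observed >= mean:
        tail = sum(1 for v in simulated if v > observed)
    else:
        tail = sum(1 for v in simulated if v < observed)
    # tail == 0 includes the case of an observed value outside the range.
    return min(1.0, max(floor, 2.0 * tail / n))


def fisher_combined(p_values):
    """Fisher's statistic -2 * sum(ln P) and its degrees of freedom (2k)."""
    return -2.0 * sum(math.log(p) for p in p_values), 2 * len(p_values)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid for even df only."""
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Seven P values give 14 degrees of freedom; the values are illustrative.
stat, df = fisher_combined([0.5] * 7)
alpha = 0.05 / 22   # Bonferroni-corrected level for 22 models
print(df, round(stat, 2), chi2_sf_even_df(stat, df) < alpha)

# The corrected level lies between the tail probabilities at 31.3 and 36.1.
print(chi2_sf_even_df(31.3, 14) > alpha > chi2_sf_even_df(36.1, 14))
```

A model would be rejected when the combined statistic exceeds the critical value for the corrected level, which the last line brackets between 31.3 and 36.1.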

Acknowledgments

We thank Alberto Piazza and Antonio Torroni for giving us access to unpublished DNA sequences; Christian Anderson for technical advice; and Tom Rasmussen, Graeme Barker, Rob Tykot, and Giorgio Bertorelle for very useful comments and suggestions. This study was supported by grants from the Fondazione Cassa di Risparmio di Ferrara and the Italian Ministry for Education, University, and Research (PRIN 2003) (to G.B.), National Science Foundation Grant DEB#0108541 (to E. Hadly and J.L.M.), and National Institutes of Health Grant GM028428 (to J.L.M.).

Abbreviations

B.C., before Christ; Nf, female population size; FST, genetic distance.

Footnotes

Conflict of interest statement: No conflicts declared.

This paper was submitted directly (Track II) to the PNAS office.

References

  • 1. Tykot R. H. Etruscan Stud. 1994;1:59–83.
  • 2. Barker G., Rasmussen T. The Etruscans. Oxford: Blackwell; 1998.
  • 3. Vernesi C., Caramelli D., Dupanloup I., Bertorelle G., Lari M., Cappellini E., Moggi-Cecchi J., Chiarelli B., Castri L., Casoli A., et al. Am. J. Hum. Genet. 2004;74:694–704. doi: 10.1086/383284.
  • 4. Renfrew C. Trans. Philol. Soc. London. 1989;87:103–155.
  • 5. Anderson C. N. K., Ramakrishnan U., Chan Y. L., Hadly E. A. Bioinformatics. 2005;21:1733–1734. doi: 10.1093/bioinformatics/bti154.
  • 6. Pakendorf B., Stoneking M. Annu. Rev. Genomics Hum. Genet. 2005;6:165–183. doi: 10.1146/annurev.genom.6.080604.162249.
  • 7. Di Benedetto G., Ergüven A., Stenico M., Castrì L., Bertorelle G., Togan I., Barbujani G. Am. J. Phys. Anthropol. 2001;124:144–156. doi: 10.1002/ajpa.1064.
  • 8. Calafell F., Underhill P., Tolun A., Angelicheva D., Kalaydjieva L. Ann. Hum. Genet. 1996;60:35–49. doi: 10.1111/j.1469-1809.1996.tb01170.x.
  • 9. Comas D., Calafell F., Mateu E., Perez-Lezaun A., Bertranpetit J. Mol. Biol. Evol. 1996;13:1067–1077. doi: 10.1093/oxfordjournals.molbev.a025669.
  • 10. Richards M., Corte-Real H., Forster P., Macaulay V., Wilkinson-Herbots H., Demaine A., Papiha S., Hedges R., Bandelt H. J., Sykes B. Am. J. Hum. Genet. 1996;59:185–203.
  • 11. Sibert A., Austerlitz F., Heyer E. Theor. Popul. Biol. 2002;62:181–197. doi: 10.1006/tpbi.2002.1609.
  • 12. Frankham R., Ballou J. D., Briscoe D. A. Introduction to Conservation Genetics. Cambridge, U.K.: Cambridge Univ. Press; 2002.
  • 13. Ammerman A. J., Cavalli-Sforza L. L. The Neolithic Transition and the Genetics of Populations in Europe. Princeton: Princeton Univ. Press; 1984.
  • 14. Renfrew C. Archaeology and Language: The Puzzle of Indo-European Origins. London: Jonathan Cape; 1987.
  • 15. Kuo C.-H., Janzen F. J. Conserv. Genet. 2004;5:425–437.
  • 16. Wright S. Evolution. 1965;19:395–420.
  • 17. Helgason A., Stefánsson K. Am. J. Hum. Genet. 2003;73:974–975. doi: 10.1086/378780.
  • 18. Cooper A. R., Poinar H. N. Science. 2000;289:1139. doi: 10.1126/science.289.5482.1139b.
  • 19. Barbujani G., Vernesi C., Caramelli D., Castrì L., Lalueza-Fox C., Bertorelle G. Am. J. Hum. Genet. 2004;75:923–927.
  • 20. Bandelt H.-J. Am. J. Hum. Genet. 2004;75:919–920. doi: 10.1086/425180.
  • 21. Howell N., Smejkal C. B., Mackey D. A., Chinnery P. F., Turnbull D. M., Herrnstadt C. Am. J. Hum. Genet. 2003;72:659–670. doi: 10.1086/368264.
  • 22. Francalacci P., Bertranpetit J., Calafell F., Underhill P.A. Am. J. Phys. Anthropol. 1996;100:443–460. doi: 10.1002/(SICI)1096-8644(199608)100:4<443::AID-AJPA1>3.0.CO;2-S.
  • 23. Rathje A. In: Italy before the Romans. Ridgway D., Ridgway F., editors. London: Academic; 1979. pp. 145–183.
  • 24. Currat M., Excoffier L. Proc. Biol. Sci.; 2005. pp. 679–688.
  • 25. Sampietro M. L., Caramelli D., Lao O., Calafell F., Comas D., Lari M., Agusti B., Bertranpetit J., Lalueza-Fox C. Ann. Hum. Genet. 2005;69:535–548. doi: 10.1111/j.1529-8817.2005.00194.x.
  • 26. Sanchez-Albornoz N. The Population of Latin America: A History. Berkeley: Univ. of California Press; 1974.
  • 27. Richards M., Macaulay V., Torroni A., Bandelt H. J. Am. J. Hum. Genet. 2002;71:1168–1174. doi: 10.1086/342930.
  • 28. Barbujani G., Bertorelle G. Proc. Natl. Acad. Sci. USA. 2001;98:22–25. doi: 10.1073/pnas.98.1.22.
  • 29. Achilli A., Rengo C., Battaglia V., Pala M., Olivieri A., Fornarino S., Magri C., Scozzari R., Babudri N., Santachiara-Benerecetti A. S., et al. Am. J. Hum. Genet. 2005;76:883–886. doi: 10.1086/430073.
  • 30. Donnelly P. In: Variation in the Human Genome: Ciba Foundation Symposium 197. Chadwick D., Cardew G., editors. Chichester, U.K.: Wiley; 1996. pp. 25–50.
  • 31. Drummond A. J., Nicholls G. K., Rodrigo A. G., Solomon W. Genetics. 2002;161:1307–1320. doi: 10.1093/genetics/161.3.1307.
  • 32. Excoffier L., Novembre J., Schneider S. J. Hered. 2000;91:506–509. doi: 10.1093/jhered/91.6.506.
  • 33. Schneider S., Roessli D., Excoffier L. arlequin: A Software for Population Genetics Data Analysis. Geneva: Univ. of Geneva; 2000. Version 2.0.
  • 34. Rasmussen T. In: Urbanisation of Etruria. Osborne R., editor. Oxford: Oxford Univ. Press; 2004.
  • 35. Meyer S., Weiss G., Haeseler A. Genetics. 1999;152:1103–1110. doi: 10.1093/genetics/152.3.1103.
  • 36. Sokal R. R., Rohlf F. J. Biometry. 5th Ed. New York: Freeman; 1995.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
