Ancient DNA from European Early Neolithic Farmers Reveals Their Near Eastern Affinities

Wolfgang Haak; Oleg Balanovsky; Juan J Sanchez; Sergey Koshel; Valery Zaporozhchenko; Christina J Adler; Clio S I Der Sarkissian; Guido Brandt; Carolin Schwarz; Nicole Nicklisch; Veit Dresely; Barbara Fritsch; Elena Balanovska; Richard Villems; Harald Meller; Kurt W Alt; Alan Cooper; the Genographic Consortium

PLoS Biol. 2010 Nov 9;8(11):e1000536. doi:10.1371/journal.pbio.1000536

Editor: David Penny

PMCID: PMC2976717 PMID: 21085689

The first farmers from Central Europe reveal a genetic affinity to modern-day populations from the Near East and Anatolia, which suggests a significant demographic input from this area during the early Neolithic.

Abstract

In Europe, the Neolithic transition (8,000–4,000 b.c.) from hunting and gathering to agricultural communities was one of the most important demographic events since the initial peopling of Europe by anatomically modern humans in the Upper Paleolithic (40,000 b.c.). However, the nature and speed of this transition remain a matter of continuing scientific debate in archaeology, anthropology, and human population genetics. To date, inferences about the genetic makeup of past populations have mostly been drawn from studies of modern-day Eurasian populations, but ancient DNA studies increasingly offer a direct view of the genetic past. We genetically characterized a population of the earliest farming culture in Central Europe, the Linear Pottery Culture (LBK; 5,500–4,900 calibrated b.c.), and used comprehensive phylogeographic and population genetic analyses to locate its origins within the broader Eurasian region and to trace potential dispersal routes into Europe. We cloned and sequenced the mitochondrial hypervariable segment I and designed two powerful SNP multiplex PCR systems to generate new mitochondrial and Y-chromosomal data from 21 individuals from a complete LBK graveyard at Derenburg Meerenstieg II in Germany. These results considerably extend the available genetic dataset for the LBK (n = 42) and permit the first detailed genetic analysis of the earliest Neolithic culture in Central Europe. We characterized the Neolithic mitochondrial DNA sequence diversity and geographical affinities of the early farmers using a large database of extant Western Eurasian populations (n = 23,394) and a wide range of population genetic analyses, including shared haplotype analyses, principal component analyses, multidimensional scaling, geographic mapping of genetic distances, and Bayesian Serial Simcoal analyses.
The results reveal that the LBK population shared an affinity with the modern-day Near East and Anatolia, supporting a major genetic input from this area during the advent of farming in Europe. However, the LBK population also showed unique genetic features including a clearly distinct distribution of mitochondrial haplogroup frequencies, confirming that major demographic events continued to take place in Europe after the early Neolithic.

Author Summary

The transition from a hunter–gatherer existence to a sedentary farming-based lifestyle has had key consequences for human groups around the world and has profoundly shaped human societies. Originating in the Near East around 11,000 y ago, an agricultural lifestyle subsequently spread across Europe during the New Stone Age (Neolithic). Whether it was mediated by incoming farmers or driven by the transmission of innovative ideas and techniques remains a subject of continuing debate in archaeology, anthropology, and human population genetics. Ancient DNA from the earliest farmers can provide a direct view of the genetic diversity of these populations in the earliest Neolithic. Here, we compare Neolithic haplogroups and their diversity to a large database of extant European and Eurasian populations. We identified Neolithic haplotypes that left clear traces in modern populations, and the data suggest a route for the migrating farmers that extends from the Near East and Anatolia into Central Europe. When compared to indigenous hunter–gatherer populations, the unique and characteristic genetic signature of the early farmers suggests a significant demographic input from the Near East during the onset of farming in Europe.

Introduction

The transition from a hunter–gatherer existence to a “Neolithic lifestyle,” which was characterized by increasing sedentarism and the domestication of animals and plants, has profoundly altered human societies around the world [1],[2]. In Europe, archaeological and population genetic views of the spread of this event from the Near East have traditionally been divided into two contrasting positions. Most researchers have interpreted the Neolithic transition as a period of substantial demographic flux (demic diffusion) potentially involving large-scale expansions of farming populations from the Near East, which are expected to have left a detectable genetic footprint [3],[4]. The alternative view (cultural diffusion model; e.g., [5]) suggests that indigenous Mesolithic hunter–gatherer groups instead adopted new subsistence strategies with relatively little, or no, genetic influence from groups originating in the Near East.

Genetic studies using mitochondrial DNA (mtDNA) and Y-chromosomal data from modern populations have generated contradictory results, and as a consequence, the extent of the Neolithic contribution to the gene pool of modern-day Europeans is still actively debated [6]–[8]. Studies that suggest that the genetic variation in modern-day Europe largely reflects farming communities of the Early Neolithic period [9]–[11] contrast strongly with others that consider the input from the Near East an event of minor importance and ascribe the European genetic variation and its distribution patterns to the initial peopling of Europe by anatomically modern humans in the Upper Paleolithic [12]–[15]. These patterns are also likely to have been significantly impacted by the early Holocene re-expansions of populations out of southerly refugia formed during the Last Glacial Maximum (∼25,000 y ago) and by the numerous demographic events that have taken place in post-Neolithic Europe.

The genetics of prehistoric populations in Europe remain poorly understood, restricting direct insights into the process of the Neolithic transition [16]–[21]. As a result, most attempts to reconstruct history have been limited to extrapolation from allele frequencies and/or coalescent ages of mitochondrial and Y chromosome haplogroups (hgs) in modern populations. Ancient DNA (aDNA) analyses now provide a powerful new means to directly investigate the genetic patterns of the early Neolithic period, although contamination of specimens with modern DNA remains a major methodological challenge [22].

A previous genetic study of 24 individuals from the early Neolithic Linear Pottery Culture (LBK; 5,500–4,900 calibrated b.c. [cal b.c.]) in Central Europe detected a high frequency of the currently rare mtDNA hg N1a, and proposed this as a characteristic genetic signature of the Early Neolithic farming population [19]. This idea was recently supported by the absence of this particular lineage (and other now more common European hgs) among sequences retrieved from neighboring Mesolithic populations [20],[21]. However, a study of 11 individuals from a Middle/Late Neolithic site on the Iberian Peninsula (3,500–3,000 cal b.c.) did not find significant differences from modern populations, supporting a quite different population genetic model for the Neolithic transition in Iberia [18].

To gain direct insight into the genetic structure of a population at the advent of farming in Central Europe we analyzed a complete graveyard from the Early Neolithic LBK site at Derenburg Meerenstieg II (Harzkreis, Saxony-Anhalt) in Germany. The archaeological culture of the LBK had its roots in the Transdanubian part of the Carpathian Basin in modern-day Hungary approximately 7,500–8,000 y ago and spread during the subsequent five centuries across a vast area ranging from the Paris Basin to the Ukraine [23],[24]. The graveyard samples provide a unique view of a local, closed population and permit comparisons with other specimens of the LBK archaeological culture (the contemporaneous meta-population) and with modern populations from the same geographical area (covering the former range of the LBK), as well as groups across the wider context of Western Eurasia. Our primary aim was to genetically characterize the LBK early farming population: by applying comprehensive phylogeographic and population genetic analyses we were able to locate its origins within the broader Eurasian region, and to trace its potential dispersal routes into Europe.

Results/Discussion

We used standard approaches to clone and sequence the mitochondrial hypervariable segment I (HVS-I) and applied quantitative real-time PCR (qPCR) as an additional quality control. We also developed two new multiplex typing assays to simultaneously analyze important single nucleotide polymorphisms (SNPs) within the mtDNA coding region (22 SNPs: GenoCoRe22) and on the Y chromosome (25 SNPs: GenoY25). Besides minimizing the risk of contamination, the very short DNA fragments (60–80 bp on average) required by this approach maximize the number of specimens that can be genetically typed.

We successfully typed 17 individuals for mtDNA, which together with a previous study [19] provided data for 22 individuals from the Derenburg graveyard (71% of all samples collected for genetic analysis; Tables 1 and S1), and substantially extended the genetic dataset of the LBK (n = 42), to our knowledge the largest Neolithic database available. Sequences have been deposited in GenBank (http://www.ncbi.nlm.nih.gov/genbank/; accession numbers HM009339–HM009341, HM009343–HM009355, and HM009358), and detailed alignments of all HVS-I clone sequences from Derenburg are shown in Dataset S1.

Table 1. Summary of archaeological, genetic, and radiocarbon data.

Sample	Feature	Grave	Age, Sex^a	Radiocarbon Date (Laboratory Code; Uncalibrated BP, Cal b.c.) [73]	HVS-I Sequence (np 15997–16409; Positions Shown Minus 16000, e.g., 093C = 16093C)	Hg HVS-I	Hg GenoCoRe22	Hg GenoY25
deb09	420	9	Adult, f		rCRS	H	H
deb06	421	10	Adult/mature, n.d.		Ambiguous	n.d.	H	—
deb11	569	16	Adult, f?		n.d.	n.d.	T
deb10	566	17	Adult, m		093C, 224C, 311C	K	K	—
deb23	565	18	Infans I, m?		093C, 223T, 292T	W	W	—
deb12I	568	20	Infans I, m?	6,015±35 BP (KIA30400), 4,910±50 cal b.c.	298C	V	V	—
deb03	591	21	Adult, f	6,147±32 BP (KIA30401), 5,117±69 cal b.c.	147A, 172C, 223T, 248T, 320T, 355T	N1a	n.d.
deb15	593	23	Infans I, f?		126C, 294T, 296T, 304C	T2	T	—
deb05	604/2	29	Infans II, f??		311C	HV	HV^b
deb22	604/3	30	Adult/mature, f		092C, 129A, 147A, 154C, 172C, 223T, 248T, 320T, 355T	N1a	N1	—
deb20	599	31	Adult, m	6,257±40 BP (KIA30403), 5,247±45 cal b.c.	311C	HV	HV	F*(xG,H,I,J,K)
deb21	600	32	Mature, f	6,151±27 BP (KIA30404), 5,122±65 cal b.c.	rCRS	H	H
deb01	598	33	Infans II/Juvenile, f??		147A, 172C, 223T, 248T, 355T	N1a	N1
deb04	596	34	Adult, m	6,141±33 BP (KIA30402), 5,112±73 cal b.c.	311C	HV	HV^b
deb26	606	37	Juvenile, m??		069T, 126C	J	J	—
deb32	640	38	Adult/mature, f	6,142±34 BP (KIA30405), 5,112±73 cal b.c.	n.d.	n.d.	T
deb30	592	40	Adult, f?		069T, 126C	J	J	—
deb29II	649	41	Adult, f?	6,068±31 BP (KIA30406), 4,982±38 cal b.c.	n.d.	n.d.	K
deb34II	484	42	Adult/mature, m		093C, 223T, 292T	W	W	G2a3
deb33	483	43	Juvenile II, f??		126C, 147T, 293G, 294T, 296T, 297C, 304C	T2	T	—
deb02	644	44	Mature, f		224C, 311C	K	K	—
deb36	645	45	Mature, f		093C, 256T, 270T, 399G	U5a1a	U
deb38	665	46	Adult/mature, m		093C, 224C, 311C	K	K	F*(xG,H,I,J,K)
deb35II	662	47	Adult, f?		126C, 189C, 294T, 296T	T	T
deb37I	643	48	Adult/mature, f		069T, 126C	J	J
deb39	708	49	Adult/mature, f	6,148±33 BP (KIA30407), 5,117±69 cal b.c.	126C, 294T, 296T, 304C	T2	T	—


Italicized samples had been described previously [19].

^a One versus two question marks after the sex indicate two levels of uncertainty in sex determination.

^b Previously analyzed diagnostic SNP sites at np 7028 AluI (hg H) and np 12308 HinfI (hg U), typed by restriction fragment length polymorphism (RFLP) analysis.

BP, before present; f, female; m, male; n.d., not determined.

Multiplex SNP Typing Assays

All of the mtDNA SNP typing results were concordant with the hg assignments based on HVS-I sequence information (Tables 1 and S1) and with the known phylogenetic framework for the SNPs determined from modern populations [25]. The tight hierarchical structure of this framework provides a powerful internal control for contamination or erroneous results. Overall, both multiplex systems proved to be extremely time- and cost-efficient compared to the standard approach of numerous individual PCRs, requiring 22–25 times less aDNA template while dramatically reducing the chances of contamination. Both multiplex assays also proved to be powerful tools for analyzing highly degraded aDNA: the GenoCoRe22 assay unambiguously typed four additional specimens that, in two independent extractions, had failed to yield amplicons longer than 100 bp (Table 1). However, for reasons of overall data comparability, we could not include these specimens in downstream population genetic analyses, which required HVS-I sequence data. The only artifacts detected were occasional peaks in the electropherograms of the SNaPshot reactions outside the bin range of expected signals. These were probably primer-derived and were mainly present in reactions from extracts with very little or no DNA template; they were not observed with better preserved samples or modern controls.

In contrast, Y chromosome SNPs could be typed for only three of the eight male individuals (37.5%; Table S2) identified through physical anthropological examination, reflecting the much lower copy number of nuclear loci [22]. Typing with the GenoY25 assay assigned individual deb34 to hg G (M201), whereas individuals deb20 and deb38 both fell basally on the F branch (derived for M89 but ancestral for markers M201, M170, M304, and M9), i.e., they could be either F or H (Table 1). To resolve the hg status beyond the standard GenoY25 assay, we amplified short fragments around SNP sites M285, P287, and S126 to distinguish among G1, G2*, and G2a3 for deb34, and around SNP site M69 to distinguish between F and H [26]. deb34 proved to be ancestral for G1-M285 but derived for G2*-P287 and the additional downstream SNP S126 (L30), placing it into G2a3. deb20 and deb38 were ancestral at M69 and hence basal F (M89); they remain in this position because we did not carry out further subtyping within the F clade.
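The hierarchical reasoning used here, which also underlies the internal consistency check for the mtDNA SNPs, can be sketched as a walk down a marker tree: a sample's haplogroup is the deepest node whose defining SNP is derived, provided all derived calls lie on one root-to-tip path and no SNP on that path is called ancestral; a “*” (paragroup) label is given only when every daughter branch was typed and found ancestral. The tree below is a deliberately simplified, partial excerpt built only from markers named in this section (intermediate nodes such as G2a are omitted), and `TREE` and `assign_haplogroup` are hypothetical names, not part of the GenoY25 workflow.

```python
# Hypothetical, simplified Y-chromosome tree: haplogroup -> (parent, defining SNP).
# Only markers named in the text; intermediate nodes (e.g., G2a) are omitted.
TREE = {
    "F": (None, "M89"),
    "G": ("F", "M201"),
    "G1": ("G", "M285"),
    "G2": ("G", "P287"),
    "G2a3": ("G2", "S126"),
    "H": ("F", "M69"),
    "I": ("F", "M170"),
    "J": ("F", "M304"),
    "K": ("F", "M9"),
}


def ancestors(hg):
    """Return the path from hg up to the root, starting with hg itself."""
    path = []
    while hg is not None:
        path.append(hg)
        hg = TREE[hg][0]
    return path


def assign_haplogroup(calls):
    """calls maps SNP name -> 'D' (derived) or 'A' (ancestral); untyped SNPs are absent."""
    derived = [hg for hg, (_, snp) in TREE.items() if calls.get(snp) == "D"]
    if not derived:
        return None
    deepest = max(derived, key=lambda hg: len(ancestors(hg)))
    path = ancestors(deepest)
    # All derived calls must lie on a single root-to-tip path...
    if any(hg not in path for hg in derived):
        raise ValueError("derived calls on conflicting branches (possible contamination)")
    # ...and no node on that path may be called ancestral.
    if any(calls.get(TREE[hg][1]) == "A" for hg in path):
        raise ValueError("derived SNP below an ancestral SNP (possible typing error)")
    children = [hg for hg, (parent, _) in TREE.items() if parent == deepest]
    # Paragroup '*' only if every daughter branch was typed and is ancestral.
    if children and all(calls.get(TREE[c][1]) == "A" for c in children):
        return deepest + "*"
    return deepest


# Usage with calls patterned on the text (deb34-like, then deb20-like before/after M69).
print(assign_haplogroup({"M89": "D", "M201": "D", "M285": "A", "P287": "D", "S126": "D"}))
print(assign_haplogroup({"M89": "D", "M201": "A", "M170": "A", "M304": "A", "M9": "A"}))
print(assign_haplogroup({"M89": "D", "M201": "A", "M170": "A", "M304": "A", "M9": "A", "M69": "A"}))
```

With M69 untyped, the second call returns “F” rather than “F*”, mirroring the F-or-H ambiguity described above; adding an ancestral M69 call yields “F*”.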

The multiplexed single base extension (SBE) approach with its shortened flanking regions around targeted SNPs significantly increases the chance of successful Y-chromosomal amplifications, which have remained problematic for aDNA studies, as have nuclear loci in general, because of the much lower cellular copy number compared to mitochondrial loci. The multiplexed SBE approach promises to open the way to studying the paternal history of past populations, which is of paramount importance in determining how the social organization of prehistoric societies impacted the population dynamics of the past.

Quantitative Real-Time PCR

Results of the qPCR revealed significantly (p = 0.012, Wilcoxon signed-rank test) more mtDNA copies per microliter of each extract for the shorter fragment (141 bp) than for the longer one (179 bp), with an average 3.7×10⁴-fold increase (detailed results are shown in Table S3). This finding is consistent with previous observations demonstrating a biased size distribution for authentic aDNA molecules [22],[27],[28] and suggests that any contaminating molecules, which would also result in higher copy numbers in the larger size class, did not significantly contribute to our amplifications.
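The paired comparison between the two amplicon sizes can be sketched as an exact Wilcoxon signed-rank test, pairing the 141-bp and 179-bp copy numbers measured in each extract. This is a generic pure-Python illustration with a hypothetical function name, not the software used in the study, and the example values are invented rather than taken from Table S3.

```python
from collections import Counter


def signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y.

    Returns (W+, p). Zero differences are dropped and tied |differences|
    receive average ranks. Ranks are doubled internally so that averaged
    ranks stay integers while the null distribution is enumerated.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):  # ranks i+1..j+1 share their average
            ranks2[order[k]] = i + j + 2  # = 2 * average rank
        i = j + 1
    w2 = sum(r for r, di in zip(ranks2, d) if di > 0)
    # Under H0 each rank is independently positive or negative with probability 1/2.
    dist = Counter({0: 1})
    for r in ranks2:
        nxt = Counter()
        for s, c in dist.items():
            nxt[s] += c
            nxt[s + r] += c
        dist = nxt
    total = 2 ** n
    lower = sum(c for s, c in dist.items() if s <= w2) / total
    upper = sum(c for s, c in dist.items() if s >= w2) / total
    return w2 / 2, min(1.0, 2 * min(lower, upper))


# Hypothetical copies per microliter for eight extracts (NOT the study's data).
short_141 = [5200, 880, 15000, 430, 2600, 7100, 950, 3300]
long_179 = [310, 95, 1200, 40, 260, 900, 120, 150]
w, p = signed_rank_exact(short_141, long_179)  # w == 36.0, p == 0.0078125 (all 8 differences positive)
```

For the small numbers of extracts typical of aDNA qPCR, exact enumeration is cheap; larger samples are usually handled with a normal approximation instead.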

Population Genetic Analyses

To analyze the Neolithic mtDNA sequence diversity and characterize modern geographical affinities, we applied a range of population genetic analyses including shared haplotype analyses, principal component analyses (PCAs), multidimensional scaling (MDS), geographic mapping of genetic distances, and demographic modeling via Bayesian Serial Simcoal (BayeSSC) analyses (Table 2).

Table 2. Summary statistics, overview of population genetic analyses, and summary of haplogroup frequencies used for comparison with PCA vector loadings.

Category	Variable, Simulation, or Hg	Modern Datasets				Ancient Datasets^a
		Total Dataset	Pooled Geographic Sets of Equal Size (n = ∼500)	Pooled European Dataset	Pooled Near East Populations	DEB22	LBK20	LBK42	LBK34	Hunter–Gatherers
Summary statistics	Populations	55	37	41	14	1	1	1	1	1
	Samples	23,394	18,039			22	20	42	34	20
Population genetic analysis & simulations	Shared haplotypes		X					X
	PCA	X				X	X	X	X	X
	Relative hg frequencies			X	X	X	X	X	X	X
	MDS	X				X	X	X	X
	Genetic distance maps	X				X		X
	BayeSSC			X^b	X^b			X		X
	Haplotype diversity h					0.957	0.989	0.969	0.982	0.932
	Tajima's D					−0.91645	−0.90573	−0.91374	−0.88555	−1.05761
Relative hg frequencies	Asian hgs			1.62	2.09	0.00	0.00	0.00	0.00	0.00
	African hgs			0.65	6.43	0.00	0.00	0.00	0.00	0.00
	R0/preHV			0.37	3.26	0.00	0.00	0.00	0.00	0.00
	H			43.35	23.74	13.64	25.00	19.05	17.65	0.00
	HV			1.40	5.80	13.64	0.00	7.14	2.94	0.00
	J			8.49	10.59	13.64	5.00	9.52	5.88	4.76
	T			9.26	8.91	13.64	25.00	19.05	23.53	9.52
	I			2.23	1.97	0.00	0.00	0.00	0.00	0.00
	N1a			0.30	0.32	13.64	15.00	14.29	17.65	0.00
	K			5.39	6.67	13.64	15.00	14.29	14.71	4.76
	V			4.35	0.77	4.55	5.00	4.76	5.88	0.00
	W			2.03	2.25	9.09	5.00	7.14	5.88	0.00
	X			1.22	2.52	0.00	0.00	0.00	0.00	0.00
	U2			1.04	1.52	0.00	0.00	0.00	0.00	0.00
	U3			1.26	4.43	0.00	5.00	2.38	2.94	0.00
	U4			4.04	2.10	0.00	0.00	0.00	0.00	9.52
	U5a			5.46	2.53	4.55	0.00	2.38	2.94	23.80
	U5b			3.89	0.64	0.00	0.00	0.00	0.00	28.57
	Other rare hgs			3.67	13.45	0.00	0.00	0.00	0.00	19.05


X's indicate which datasets were used in the genetic analyses.

^a For explanation of datasets, see Materials and Methods.

^b For BayeSSC analyses, representative samples of the key areas were randomly drawn from the larger meta-population pool (Table S6).
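Table 2 reports haplotype diversity h and Tajima's D for each ancient dataset. The sketch below applies the standard formulas (Nei's unbiased h = n/(n−1)(1 − Σp_i²), and Tajima's D from the number of segregating sites and the mean number of pairwise differences) to aligned sequences. It is not the software used for Table 2; the function names and toy sequences are hypothetical, and it treats gaps or ambiguous bases as ordinary character states.

```python
import math
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (every character counts as a state)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    length = len(seqs[0])
    # Segregating sites: alignment columns with more than one character state.
    s = sum(1 for i in range(length) if len({sq[i] for sq in seqs}) > 1)
    if s == 0:
        return 0.0  # D is undefined without polymorphism; 0.0 is a convention here
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy star-like sample: every mutation is a singleton, which pushes D below zero.
toy = ["AAAA", "CAAA", "ACAA", "AACA"]
h, d = haplotype_diversity(toy), tajimas_d(toy)
```

Negative D values, as reported for all datasets in Table 2, indicate an excess of rare variants relative to neutral expectations, a pattern often associated with population expansion.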

Shared Haplotype Analyses

We prepared standardized modern population datasets of equal size (n = ∼500) from 36 geographical regions in Eurasia (n = 18,039; Table S4) to search for identical matches with each LBK haplotype. Of the 25 different haplotypes present in the 42 LBK samples, 11 are found at high frequency in nearly all present-day populations under study, a further ten have a limited geographic distribution, and the remaining four are unique to Neolithic LBK populations (Table S4). The 11 widespread haplotypes are mainly basal (i.e., they constitute a basal node within the corresponding hg) for the Western Eurasian mitochondrial hgs H, HV, V, K, T, and W. While these haplotypes are relatively uninformative for identifying genetic affiliations to extant populations, this finding is consistent with an ancient population (5,500–4,900 cal b.c., i.e., prior to recent population expansions), in which basal haplotypes might be expected to be more frequent than derived haplotypes (e.g., terminal tips of branches within hgs). The next ten LBK haplotypes are unequally spread among present-day populations and therefore potentially contain information about geographical affinities. We found nine modern-day population pools in which the percentage of these haplotypes is significantly higher than in other population pools (p<0.01, two-tailed z test; Figure 1; Table S4): (a) North and Central English, (b) Croatians and Slovenians, (c) Czechs and Slovaks, (d) Hungarians and Romanians, (e) Turkish, Kurds, and Armenians, (f) Iraqis, Syrians, Palestinians, and Cypriotes, (g) Caucasus (Ossetians and Georgians), (h) Southern Russians, and (i) Iranians. Three of these pools (b–d) originate near the proposed geographic center of the earliest LBK in Central Europe and presumably represent a genetic legacy from the Neolithic.
However, the other matching population pools are from Near Eastern regions (except [a] and [h]), which is consistent with this area representing the origin of the European Neolithic, an idea further supported by Iranians sharing the highest proportion of informative haplotypes with the LBK (7.2%; Table S4). The remaining pool (a) from North and Central England shares an elevated frequency of mtDNA T2 haplotypes with the LBK but otherwise appears inconsistent with the proposed Near Eastern origin of the Neolithic. One possible explanation is allele surfing: certain alleles (here hgs) can increase in frequency while “surfing” on the wave of a range expansion, eventually reaching higher frequencies than at the proposed origin [29],[30]. Several of the other population pools also show a low but nonsignificant level of matches, which may relate to pre-Neolithic distributions or subsequent demographic movements (Figure 1).
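The pool-versus-rest comparison behind the shared haplotype test can be sketched as a standard two-proportion z test: the share of informative matches in one pool is compared with the share in all other pools combined. The text specifies only a two-tailed z test, so the pooled-variance form used here is an assumption; the function name and the counts are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-tailed z test for a difference between proportions x1/n1 and x2/n2.

    Uses the pooled proportion under H0 (both pools share one rate).
    Returns (z, two-tailed p).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to test
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: informative matches in one pool of ~500 vs all remaining pools.
z, p = two_proportion_z(30, 500, 400, 17500)
```

A pool would be flagged when z is positive (more matches than elsewhere) and p falls below the chosen threshold.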

Figure 1. Populations are plotted on a northwest–southeast axis. Note that the percentage of non-informative matches (orange) is nearly identical to the percentage of all shared haplotypes (red) in most populations, whereas we observe elevated frequencies of informative matches (blue) in Southeast European and Near Eastern population pools, culminating in Iranians.

Of the four unique mtDNA haplotypes, two were from an earlier study of the LBK (16286-16304 and 16319-16343; Table S5 and [19]). The haplotype 16286-16304 has many one- or two-step derivatives in all parts of Europe and is therefore rather uninformative for inferring further geographical affinities. The only relatively close neighbor of haplotype 16319-16343 is found in Iraq (16129-16189-16319-16343), in agreement with the Near Eastern affinities of the informative LBK haplotypes. The other two unique LBK haplotypes belong to N1a, the characteristic LBK hg. The frequency of N1a was 13.6% for the Derenburg samples (3/22) and 14.3% for all LBK samples published to date (6/42). Notably, N1a has not yet been observed in the neighboring hunter–gatherer populations of Central Europe before, during, or after the Early Neolithic [20], nor in the early Neolithic Cardial Ware Culture from Spain [18].

The Y chromosome hgs obtained from the three Derenburg early Neolithic individuals are generally concordant with the mtDNA data (Table 1). Interestingly, we do not find the most common Y chromosome hgs in modern Europe (e.g., R1b, R1a, I, and E1b1), which parallels the low frequency of the very common modern European mtDNA hg H (now at 20%–50% across Western Eurasia) in the Neolithic samples. Also, while both Neolithic Y chromosome hgs G2a3 and F* are rather rare in modern-day Europe, they have slightly higher frequencies in populations of the Near East, and the highest frequency of hg G2a is seen in the Caucasus today [15]. The few published ancient Y chromosome results from Central Europe come from late Neolithic sites and were exclusively hg R1a [31]. While speculative, we suggest this supports the idea that R1a may have spread with late Neolithic cultures from the east [31].

Principal Component Analysis and Multidimensional Scaling

Four Neolithic datasets were constructed (Table 2) and compared with 55 present-day European and Near Eastern populations and one Mesolithic hunter–gatherer population [20] in a PCA (Figure 2). The PCA accounted for 39% of the total genetic variation, with the first principal component (PC) separating Near Eastern populations from Europeans (24.9%), and with LBK populations falling closer to Near Eastern ones. However, the second PC (17.4%) clearly distinguished the four Neolithic datasets from both Near East and European populations. An MDS plot (Figure S1) showed similar results, with the Near Eastern affinities of the LBK populations even more apparent.

Figure 2. The two dimensions display 39% of the total variance. The contribution of each hg is superimposed as grey component loading vectors. Notably, the Derenburg dataset (DEB22) groups well with its meta-population (LBK20), supporting the unique status and characteristic composition of the LBK sample. Populations are abbreviated as follows (Table S6): ALB, Albanians; ARM, Armenians; ARO, Aromuns; AUT, Austrians; AZE, Azeris; BAS, Basques; BLR, Byelorussians; BOS, Bosnians; BUL, Bulgarians; CHE, Swiss; CHM, Mari; CHV, Chuvash; CRO, Croats; CZE, Czechs; DEB22, Derenburg; DEU, Germans; ENG, English; ESP, Spanish; EST, Estonians; FIN, Finns; FRA, French; GEO, Georgians; GRC, Greeks; HG, European Mesolithic hunter–gatherers; HUN, Hungarians; IRL, Irish; IRN, Iranians; IRQ, Iraqis; ISL, Icelanders; ITA, Italians; JOR, Jordanians; KAB, Kabardinians; KAR, Karelians; KOM, Komis (Permyaks and Zyrian); KUR, Kurds; LBK20, LBK without Derenburg; LBK34, all LBK samples excluding potential relatives; LBK42, all LBK; LTU, Lithuanians; LVA, Latvians; MAR, Moroccans; MOR, Mordvinians; NOG, Nogais; NOR, Norwegians; OSS, Ossetians; POL, Poles; PRT, Portuguese; PSE, Palestinians; ROU, Romanians; RUS, Russians; SAR, Sardinians; SAU, Saudi Arabians; SCO, Scots; SIC, Sicilians; SVK, Slovaks; SVN, Slovenians; SWE, Swedes; SYR, Syrians; TAT, Tatars; TUR, Turkish; UKR, Ukrainians.

To better understand which particular hgs made the Neolithic populations appear either Near Eastern or (West) European, we compared average hg frequencies of the total LBK (LBK42) and Derenburg (DEB22) datasets to two geographically pooled meta-population sets from Europe and the Near East (Tables 2 and S6; 41 and 14 populations, respectively). PC correlates and component loadings (Figure 2) showed a pattern similar to average hg frequencies (Table 2) in both large meta-population sets, with the LBK dataset grouping with Europeans because of a lack of mitochondrial African hgs (L and M1) and preHV, and elevated frequencies of hg V. In contrast, low frequencies of hg H and higher frequencies for HV, J, and U3 promoted Near Eastern resemblances. Removal of individuals with shared haplotypes within the Derenburg dataset (yielding dataset LBK34) did not noticeably decrease the elevated frequencies of J and especially HV in the Neolithic data.

Most importantly, PC correlates of the second component showed that elevated or high frequencies of hgs T, N1a, K, and W were unique to LBK populations, making them appear different from both Europe and Near East. The considerable within-hg diversity of all four of these hgs (especially T and N1a; Table 1) suggests that this observation is unlikely to be an artifact of random genetic drift leading to elevated frequencies in small, isolated populations.
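A PCA on a populations-by-haplogroups frequency table, as used here, can be sketched without external libraries: center each haplogroup column, form the covariance matrix, and extract the leading eigenvectors (the haplogroup loadings) by power iteration with deflation. This is a minimal illustration under the assumption of an unscaled covariance PCA; the study's software, scaling, and rotation choices are not specified here, and all names are hypothetical.

```python
import random


def covariance(rows):
    """Center the columns of rows (populations x haplogroups); return (centered, covariance)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    return centered, cov


def power_iteration(mat, iters=500, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric positive semidefinite matrix."""
    rng = random.Random(seed)
    m = len(mat)
    v = [rng.random() + 0.1 for _ in range(m)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm < 1e-15:  # no variance left in this matrix
            return 0.0, [0.0] * m
        v = [x / norm for x in w]
        lam = norm
    return lam, v


def pca(rows, k=2):
    """Return (scores, loadings, fraction of total variance) for the first k PCs."""
    centered, cov = covariance(rows)
    total = sum(cov[i][i] for i in range(len(cov)))  # total variance = trace
    mat = [row[:] for row in cov]
    loadings, fracs = [], []
    for _ in range(k):
        lam, v = power_iteration(mat)
        loadings.append(v)
        fracs.append(lam / total if total else 0.0)
        # Deflate: remove the component just found before searching for the next one.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(c[j] * v[j] for j in range(len(v))) for v in loadings]
              for c in centered]
    return scores, loadings, fracs
```

Each entry of a loading vector plays the role of one grey haplogroup arrow in the biplot: the larger its magnitude, the more that haplogroup drives separation along the component.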

The pooled European and Near Eastern meta-populations are necessarily overgeneralizations, and there are likely to be subsets of Near Eastern populations that are more similar to the Neolithic population. Interestingly, both the PCA and the MDS plots identified Georgians, Ossetians, and Armenians as candidate populations (Figures 2 and S1).

Mapping Genetic Distances

We generated genetic distance maps to visualize the similarity/distance of the LBK and Derenburg populations (datasets LBK42 and DEB22) to all modern populations in the large Western Eurasian dataset (Figure 3). In agreement with the PCA and MDS analyses, populations from the area bounding modern-day Turkey, Armenia, Iraq, and Iran demonstrated a clear genetic similarity with the LBK population (Figure 3A). This relationship was even stronger in a second map generated with just the Neolithic Derenburg individuals (Figure 3B). Interestingly, the map of the combined LBK data also suggested a possible geographic route for the dispersal of Neolithic lineages into Central Europe: genetic distances gradually increase from eastern Anatolia westward across the Balkans, and then northwards into Central Europe. The area with lower genetic distances follows the course of the rivers Danube and Dniester, and this natural corridor has been widely accepted as the most likely inland route towards the Carpathian basin as well as the fertile Loess plains further northwest [23],[32],[33].

Figure 3. Mapped genetic distances are illustrated between 55 modern Western Eurasian populations and the total of 42 Neolithic LBK samples (A) or the single graveyard of Derenburg (B). Black dots denote the locations of the modern-day populations used in the analysis. The coloring indicates the degree of similarity of the modern local population(s) with the Neolithic sample set: short distances (greatest similarity) are marked by dark green and long distances (greatest dissimilarity) by orange, with fainter colors in between the extremes. Note that the green intervals are spaced at genetic distance steps of 0.02, with increasingly larger intervals towards the “orange” end of the scale.
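The text does not specify how genetic distances were interpolated between sampled locations to produce a continuous surface. As an assumption, the sketch below uses inverse distance weighting (IDW), a common choice for such maps, which may differ from the method actually used. The values would be genetic distances between the Neolithic sample and each modern population; the coordinates, values, and function names are hypothetical, and plain planar distance stands in for proper map projections.

```python
def idw(samples, target, power=2.0):
    """Inverse-distance-weighted estimate at target from ((x, y), value) samples.

    Uses planar distance on the given coordinates; a real map would use
    projected coordinates or great-circle distances.
    """
    num = den = 0.0
    for (x, y), value in samples:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            return value  # target coincides with a sampled population
        weight = 1.0 / d2 ** (power / 2)
        num += weight * value
        den += weight
    return num / den


def grid(samples, xs, ys, power=2.0):
    """Evaluate idw() on a rectangular grid (rows follow ys, columns follow xs)."""
    return [[idw(samples, (x, y), power) for x in xs] for y in ys]


# Hypothetical locations and genetic distances to the Neolithic sample.
samples = [((0.0, 0.0), 0.02), ((2.0, 0.0), 0.06), ((1.0, 2.0), 0.10)]
surface = grid(samples, xs=[0.0, 0.5, 1.0, 1.5, 2.0], ys=[0.0, 1.0, 2.0])
```

Each grid cell can then be binned into color classes (for example, steps of 0.02 at the low-distance end) to draw a map like the one described in the caption.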

Bayesian Serial Simcoal Analysis

While the shared haplotype analyses, PCA, MDS, and genetic distance maps reveal an apparent affinity of Neolithic farmers to modern-day Near Eastern populations, the pairwise F_ST values between the ancient populations (hunter–gatherers and LBK) and the modern population pools tested (Central Europe and Near East) were all significant (p<0.05; Table 3), suggesting a degree of genetic discontinuity between ancient and modern-day populations. The early farmers were closer to the modern Near Eastern pool (F_ST = 0.03019) than the hunter–gatherers were (F_ST = 0.04192), while both ancient populations showed similar differences to modern Central Europe, with the hunter–gatherers slightly closer (F_ST = 0.03445) than the early farmers (F_ST = 0.03958). The most striking difference was seen between the Mesolithic hunter–gatherers and the LBK population itself (F_ST = 0.09298), as previously shown [20]. We used BayeSSC analyses to test whether the observed F_ST values can be explained by the effects of drift or migration under different demographic scenarios (Figure S2). This involved comparing F_ST values derived from coalescent simulations under a series of demographic models with the observed F_ST values to determine which model was the most likely, given the data. Using an approximate Bayesian computation (ABC) framework, we explored priors for initial starting deme sizes and dependent growth rates to maximize the credibility of the final results. The Akaike information criterion (AIC) was used to evaluate the goodness of fit of each model in light of the observed F_ST values. In addition, a relative likelihood for each of the six models given the data was calculated via Akaike weights (ω). The highest AIC values, and therefore the poorest fit, were obtained for models representing population continuity in one large Eurasian meta-population through time (models H₀a and H₀b; Table 4).
Of note, the goodness of fit was better with a more recent population expansion (modeled at the onset of the Neolithic in Central Europe) and hence a higher exponential growth rate (H₀a). The model of cultural transmission (H₁), in which a Central European deme including Neolithic farmers and hunter–gatherers coalesced with a Near Eastern deme in the Early Upper Paleolithic (1,500 generations, or ∼37,500 y ago), resulted in intermediate goodness-of-fit values (H₁a and H₁b; Table 4; Figure S2). The best goodness-of-fit values were obtained for models of demic diffusion (model H₂; Table 4) with differing proportions of migrants (25%, 50%, and 75% were tested) from the Near Eastern deme into the Central European deme around the time of the LBK (290 generations, ∼7,250 y ago; Table 4). Notably, the models testing 50% and 75% migrants returned the highest relative likelihood values (42% and 52%, respectively) and therefore warrant further investigation. However, while the demic diffusion model H₂ produced values that approximated the observed F_ST between Neolithic farmers and the Near Eastern population pool, none of the models could account for the high F_ST between hunter–gatherers and early farmers, or between early farmers and modern-day Central Europeans.
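The ABC idea can be sketched in its simplest (rejection) form: draw parameters from the prior, simulate a summary statistic, and keep the draws whose simulated value falls within a tolerance of the observed one. BayeSSC simulates full coalescent genealogies under each demographic model; purely for illustration, this sketch replaces that simulator with a toy textbook pure-drift approximation for F_ST between two isolated haploid demes. All function names, priors, tolerances, and the observed value in the usage lines are hypothetical, not the study's settings.

```python
import random


def drift_fst(ne, generations):
    """Toy pure-drift expectation of F_ST between two isolated haploid demes of size ne.

    F = 1 - (1 - 1/ne)**t; ignores mutation, migration, and growth (unlike BayeSSC models).
    """
    return 1.0 - (1.0 - 1.0 / ne) ** generations


def abc_rejection(observed, simulate, sample_prior, n_draws=20000, tol=0.005, seed=1):
    """Rejection ABC: keep prior draws whose simulated statistic lies within tol of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed) <= tol:
            accepted.append(theta)
    return accepted


# Hypothetical usage: uniform prior on ne, noisy toy simulator, 290 generations of drift.
def prior(rng):
    return rng.uniform(1_000, 20_000)


def simulator(ne, rng):
    return drift_fst(ne, 290) + rng.gauss(0.0, 0.005)


posterior = abc_rejection(observed=0.05, simulate=simulator, sample_prior=prior)
```

The accepted draws approximate the posterior for the parameter; comparing how well competing models reproduce the observed statistics is the step that the AIC and Akaike weights address in this study.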

Table 3. Pairwise F_ST values between ancient and modern-day population pools, as used for goodness-of-fit estimates in BayeSSC analyses.

	Hunter–Gatherers	Near East	LBK	Central Europe
Hunter–Gatherers	0	—	—	—
Near East	0.04192	0	—	—
LBK	0.09298	0.03019	0	—
Central Europe	0.03445	0.00939	0.03958	0


Table 4. Details of the demographic models analyzed with BayeSSC and AIC goodness-of-fit estimates, and resulting model probabilities via Akaike weights.

Model	H₀a	H₀b	H₁	H₂ (25%)	H₂ (50%)	H₂ (75%)
Prior N_e, time 0, deme 0	U: 100,000–30,000,000	U: 100,000–30,000,000	U: 100,000–12,000,000	U: 100,000–12,000,000	U: 100,000–12,000,000	U: 100,000–12,000,000
Prior N_e, time 0, deme 1			U: 100,000–12,000,000	U: 100,000–12,000,000	U: 100,000–12,000,000	U: 100,000–12,000,000
Percent migrants from deme 0 to deme 1				25%	50%	75%
AIC	97.78	120.37	89.19	82.56	78.52	78.07
Akaike weight ω	2.76164 × 10⁻⁵	3.42478 × 10⁻¹⁰	0.002018032	0.055596369	0.418527622	0.52383036


Note: the smaller the AIC value, the better the model fits. No threshold AIC value exists at which a model can be rejected, but the Akaike weights estimate each model's probability given the six models tested.

U, uniform distribution of given range.

N_e, effective population size.
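The Akaike weights in Table 4 can be computed from the AIC values alone. Each model's difference Δᵢ from the lowest AIC becomes a relative likelihood exp(−Δᵢ/2), and these values are then normalized to sum to 1. The Python sketch below illustrates that calculation and is not the authors' analysis code. It starts from the rounded AIC values in the table, so the recomputed weights match the reported ones only to about three decimal places.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: w_i = exp(-d_i / 2) / sum_j exp(-d_j / 2), with d_i = AIC_i - min(AIC)."""
    best = min(aic_values)
    relative = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Rounded AIC values as reported in Table 4
models = ["H0a", "H0b", "H1", "H2 (25%)", "H2 (50%)", "H2 (75%)"]
aic = [97.78, 120.37, 89.19, 82.56, 78.52, 78.07]
weights = akaike_weights(aic)

for name, w in zip(models, weights):
    print(f"{name:9s} {w:.4g}")
```

With these inputs, the H₂ models with 75% and 50% migrants receive weights of roughly 0.524 and 0.418. These correspond to the 52% and 42% relative likelihoods quoted in the text.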

The models we tested are major oversimplifications, and modeling human demographic history is notoriously difficult, especially given the complex history of Europe and the Near East over this time scale. No model explained the observed F_ST between ancient and modern-day populations particularly well. This suggests that the correct scenario has not yet been identified and that material from younger epochs clearly needs to be sampled. Sampling bias also remains an issue in aDNA studies, particularly for the chronologically and geographically diverse hunter–gatherer dataset. In light of the models tested (see also [19],[20]), we suggest that two sources formed the basis of modern European mtDNA diversity. The first is the postglacial re-peopling of Europe (represented here by the Mesolithic hunter–gatherers), and the second is genetic input from the Near East during the Neolithic. Demographic processes after the early Neolithic, however, have also contributed substantially to shaping Europe's contemporary genetic makeup.

Synthesis of Population Genetic Analyses

The aDNA data from a range of Mesolithic hunter–gatherer samples from regions neighboring the LBK area have been shown to be surprisingly homogeneous across space and time. Their mtDNA composition consists almost exclusively of hg U (∼80%), particularly hgs U4 and U5, and clearly differs from both the LBK dataset and modern European diversity (Table 2) [20]. It is striking that hgs U4 and U5 are virtually absent in the LBK population (1/42 samples; Table 2). Given this clear difference in mtDNA hg composition, it is not surprising that the pairwise F_ST between hunter–gatherers and the LBK population is the highest observed (0.09298) among our comparisons of ancient populations with representative population pools from Central Europe and the Near East (Table 3; see also [20]). If the Mesolithic data are a genuine proxy for populations in Central Europe at the onset of the LBK, then the Mesolithic and LBK groups had clearly different origins. The Mesolithic groups may represent the pre-Neolithic indigenous populations who survived the Last Glacial Maximum in southern European refugia. In contrast, our population genetic analyses confirm that the LBK shares an affinity with modern-day populations of the Near East and Anatolia. Three features of the LBK data indicate a recent expansion: the large number of basal lineages, a reasonably high hg and haplotype diversity generated through one- or two-step derivative lineages, and the negative Tajima's D values (Tables 1 and 2). Taken together, these data are compatible with a model in which early Neolithic Central Europe was populated by indigenous groups plus significant inputs from expanding Near Eastern populations [4],[12],[34].
Overall, the mtDNA hg composition of the LBK would suggest that the input of Neolithic farming cultures (LBK) to modern European genetic variation was much higher than that of Mesolithic populations, although it is important to note that the unique characteristics of the LBK sample imply that further significant genetic changes took place in Europe after the early Neolithic.
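A negative Tajima's D reflects an excess of rare variants relative to the average pairwise difference, which is the pattern expected after a population expansion. The minimal Python sketch below shows the standard Tajima's D statistic computed from aligned sequences. It assumes equal-length sequences with no gaps or missing data. The text does not state which software or sequence set produced the values in Tables 1 and 2, so this is an illustration only. The toy sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (gaps and missing data not handled)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Standard constants of Tajima's statistic
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy data: one sequence carries four private (singleton) mutations
star_like = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTGCA"]
# Hypothetical toy data: two variants, each shared by half of the sample
balanced = ["AA", "AA", "TT", "TT"]
print(tajimas_d(star_like), tajimas_d(balanced))
```

The star-like sample has only singletons and yields a negative D. The balanced sample has variants at intermediate frequency and yields a positive D.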

aDNA data offer a powerful new means of testing evolutionary models and assumptions. The European lineage with the oldest coalescence age, U5, has indeed been found to prevail in the indigenous hunter–gatherers [12],[35]. However, mtDNA hgs J2a1a and T1 are so far absent from the samples of early farmers in Central Europe, even though their younger coalescence ages led to their suggestion as Neolithic immigrant lineages [8],[12]. Similarly, older coalescence ages were used to classify hgs K, T2, H, and V as "postglacial/Mesolithic lineages," yet these have turned out to be common only in Neolithic samples. The recent use of whole mitochondrial genomes and refined mutation rate estimates has generally reduced coalescence ages [8], which would improve the fit with the aDNA data. However, we advise caution in directly relating coalescence ages of specific hgs to evolutionary or prehistoric demographic events [36]. Two types of bias can cause significant temporal offsets. Observational bias is the delay between the actual split of a lineage and the eventual fixation and dissemination of that lineage. Calculation bias is incorrect estimation of the coalescence age. aDNA can directly show whether lineages were present or absent at points in the past. It is also valuable for refining mutation rate estimates, because it provides internal calibration points [37].
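A ρ-statistic estimate shows how sensitive a coalescence age is to the assumed mutation rate, which is one source of the calculation bias described above. In this estimate, the age of a clade is the mean number of mutations separating sampled sequences from the clade's root, multiplied by an assumed time per mutation. The Python sketch below is illustrative only. The ρ estimator is a swapped-in example of an age calculation and not a method reported in this study, and the mutational distances and both calibrations are hypothetical.

```python
def rho_statistic(mutations_from_root):
    """Mean number of mutations separating each sampled sequence from the inferred root haplotype."""
    if not mutations_from_root:
        raise ValueError("need at least one sequence")
    return sum(mutations_from_root) / len(mutations_from_root)


def coalescence_age(mutations_from_root, years_per_mutation):
    """rho-based age estimate: rho multiplied by an assumed time per mutation (in years)."""
    return rho_statistic(mutations_from_root) * years_per_mutation


# Hypothetical clade: mutational distances of six sequences from the clade's root haplotype
distances = [0, 1, 1, 2, 0, 2]

# Two hypothetical calibrations: a faster assumed rate (fewer years per mutation) gives a younger age
for years in (20_000, 15_000):
    print(years, coalescence_age(distances, years))
```

The same phylogeny yields ages that differ by a quarter simply because the calibration changed. Internal calibration points from aDNA can help constrain exactly this parameter.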

Archaeological and anthropological research has produced a variety of models for the dispersal of the Neolithic agricultural system ("process of Neolithization") into and throughout Europe (e.g., [1],[2],[38]). Some of these models argue that the LBK has a cultural connection to its proposed origin in modern-day Hungary and beyond the Carpathian Basin [23],[32],[38],[39], and that this connection should also be reflected in a genetic relationship. Our findings are consistent with these models (e.g., shared haplotype analyses; Table S4). Therefore, at a large scale, a demic diffusion model of genetic input from the Near East into Central Europe is the best match for our observations. Notably, recent anthropological research has reached similar conclusions [40],[41]. On a regional scale, "leap-frog" or "individual pioneer" colonization models, in which early farmers initially targeted the economically favorable loess plains of Central Europe [33],[42], would explain two observations: the relative speed of the LBK expansion and the clear Near Eastern genetic connections still seen in these pioneer settlements. However, the current genetic data cannot yet resolve the subtleties of these models.

In conclusion, the new LBK dataset provides the most detailed and direct genetic portrait of the Neolithic transition in Central Europe. Its analysis clearly demonstrates Near Eastern and Anatolian affinities and argues for a much higher genetic input from these regions. It also identifies characteristic differences from all extant (meta-)populations studied. A more accurate description of the changing genetic landscape during and after the Neolithic will require ancient genetic data from adjacent geographic regions and time periods, especially from the Near East and Anatolia. The new multiplexed SBE assays offer a powerful means of obtaining this information.

Materials and Methods

Archaeological Background

The archaeological site Derenburg Meerenstieg II (Harzkreis, Saxony-Anhalt, Germany) was excavated in three campaigns in 1997–1999, covering an area of 3 ha. The site records settlement activity ranging from the Early Neolithic (LBK) and Middle Neolithic (Rössen and Ammensleben cultures) to the Bronze and Iron Ages [43]. However, the main features at Derenburg are the LBK graveyard and part of its associated settlement, approximately 70 m to the southwest. The archaeological data indicate that the larger part of the settlement lies outside the area covered by these campaigns and has not yet been excavated. In contrast, the graveyard was recorded in its entirety (25×30 m) and comprised a total of 41 graves. Two further graves were found outside the graveyard (50 m WSW and 95 m SSE). Erosion and modern ploughing may have destroyed some graves on the plateau area, where the graves were shallow and in an average state of preservation. Graves embedded in deeper loess layers, by contrast, were excellently preserved. In total, 32 single burials were found. There were also one double burial, one triple burial, two burials in settlement pits, two or three instances of additional isolated bones within a grave, three burials with a secondary inhumation, and one empty grave. The majority of individuals (75%) at Derenburg were buried in an East–West orientation in varying flexed positions. The typology of the ceramics and associated grave goods shows that the graveyard was in use throughout the entire LBK period, from older LBK pottery (Flomborn style) to the youngest LBK pottery. Absolute radiocarbon dates confirm its use over three centuries (5,200–4,900 cal b.c.; see also Table 1 and [44]).

Ancient DNA Work

Of the 43 graves at Derenburg, 31 showed morphological preservation suitable for sampling and aDNA analyses. Five individuals had already been sampled in 2003 for our previous study [19]. These showed excellent aDNA preservation, a negligible level of contamination, and an unusual mtDNA hg distribution, which justified further investigation. We therefore processed 26 additional individuals in this study (Table 1). We amplified, cloned, and sequenced the mitochondrial HVS-I (nucleotide positions [np] 15997–16409, numbered according to [45]) as described previously [19]. Typing with a newly developed multiplex of 22 mtDNA coding region SNPs (GenoCoRe22) further supported the mtDNA hg assignments. In addition, we typed 25 Y chromosome SNPs using a second novel multiplex assay (GenoY25). Singleplex PCRs provided the final refinement of Y chromosome hg assignments. Lastly, we monitored the number of starting DNA template molecules using qPCR on seven randomly selected samples (Table S3). All aDNA work followed appropriate criteria and was performed in dedicated aDNA facilities at the Johannes Gutenberg University of Mainz and the Australian Centre for Ancient DNA (ACAD) at the University of Adelaide. All DNA extractions, as well as amplification, cloning, and sequencing of the mitochondrial control region HVS-I, were carried out in the Johannes Gutenberg University of Mainz facilities. Additional singleplex, all multiplex, and quantitative real-time amplifications, SNP typing, and direct sequencing of Y chromosome SNPs were carried out at the ACAD as described below.

SNP Selection and Multiplex Design

SNP typing via SBE reactions (also known as minisequencing) has proven to be a reliable and robust method for high-throughput analyses of polymorphisms. Examples include human mitochondrial variation [46], human X- and Y-chromosomal SNPs [47],[48], and human autosomal SNPs [49]. However, few SBE studies have addressed the need for very short amplicons to allow amplification from highly degraded DNA, and even forensic protocols have generally targeted relatively long amplicons [50]–[54]. Our first multiplex (GenoCoRe22) was designed to type a panel of 22 mitochondrial coding region SNPs that are routinely typed within the Genographic Project [25], to maximize future comparability with modern population data. A second multiplex (GenoY25) targeted 25 commonly typed Y chromosome SNPs with basal but global coverage, to maximize comparability of paternal lineages. We aimed to design highly efficient and sensitive protocols that work on highly degraded DNA and that also allow modern human DNA contamination to be detected and monitored at very low levels [51]. The GenoCoRe22 SNP panel was chosen to cover the basal branches of mitochondrial hgs across modern human mtDNA diversity [25]. The chosen SNP sites were identical to the initial set (Figure 4 in [25]) with two exceptions, both arising as compromises during primer design within a multiplex assay. For hg W we used the SNP at np 8994 instead of np 1243, and for hg R9 the SNP at np 13928 instead of np 3970. We selected the GenoY25 SNP panel for the multiplex assay using the highly resolved Y Chromosome Consortium tree and an extensive literature search for the corresponding SNP allele frequencies in European populations [13],[26],[55].
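Both panels type SNPs that define basal branches, so a haplogroup call can be read off by descending a SNP tree for as long as the derived calls allow. The Python sketch below shows that logic on a hypothetical three-branch tree. The node names and marker IDs are placeholders and not the actual GenoCoRe22 or GenoY25 markers. A real assignment would also need to handle conflicting calls (for example, derived alleles on two sister branches).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One branch of a (hypothetical) haplogroup tree."""
    name: str
    marker: Optional[str] = None  # SNP whose derived state defines this branch; None at the root
    children: List["Node"] = field(default_factory=list)


def assign_haplogroup(node: Node, calls: Dict[str, Optional[bool]]) -> str:
    """Return the most derived branch supported by the SNP calls.

    calls maps marker IDs to True (derived), False (ancestral) or None (no call).
    Descent stops when no child's defining marker is called derived, so missing
    calls yield a conservative, less derived assignment.
    """
    for child in node.children:
        if calls.get(child.marker) is True:
            return assign_haplogroup(child, calls)
    return node.name


# Hypothetical toy tree: root R splits into A and B; B1 lies downstream of B
tree = Node("R", children=[
    Node("A", marker="snpA"),
    Node("B", marker="snpB", children=[Node("B1", marker="snpB1")]),
])

print(assign_haplogroup(tree, {"snpA": False, "snpB": True, "snpB1": True}))
print(assign_haplogroup(tree, {"snpB": True, "snpB1": None}))
```

The second call stops at B because the downstream marker failed. A missing call therefore never pushes an assignment further down the tree than the data support.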

Multiplex PCR Assays GenoCoRe22 and GenoY25

Multiplex assays were set up, established, and performed at the ACAD facilities. Multiplex PCR using AmpliTaq Gold (Applied Biosystems) was conducted in 25-µl volumes. Each reaction contained 1× Buffer Gold, 6 mM (GenoCoRe22) or 8 mM (GenoY25) MgCl₂, 0.5 mM dNTPs (Invitrogen), ≤0.2 µM of each primer, 1 mg/ml RSA (Sigma), 2 U of AmpliTaq Gold polymerase, and 2 µl of DNA extract. Thermocycling started with an initial enzyme activation at 95°C for 6 min. This was followed by 40–45 cycles of denaturation at 95°C for 30 s, annealing at 60°C (GenoCoRe22) or 59°C (GenoY25) for 30 s, and elongation at 65°C for 30 s, with a single final extension at 65°C for 10 min. Each PCR included extraction blanks as well as a minimum of two PCR negatives at a ratio of 5:1. PCRs were visually checked by electrophoresis on 3.5% agarose TBE gels. PCR products were purified by mixing 5 µl of PCR product with 1 U of SAP and 0.8 U of ExoI and incubating at 37°C for 40 min, followed by heat inactivation at 80°C for 10 min. The multiplex PCR is highly sensitive because it uses fragment lengths of only 60–85 bp. For this reason, and to monitor potential human background contamination, all controls were usually included in downstream fragment analysis. Multiplex primer sequences and concentrations are given in Table S7.
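The reaction recipe above lists final concentrations. Pipetting volumes follow from C₁V₁ = C₂V₂. The minimal Python sketch below shows a master-mix calculator for the GenoCoRe22 conditions. All stock concentrations are hypothetical assumptions and not values reported here: 10× buffer, 25 mM MgCl₂, 10 mM dNTPs, 10 mg/ml RSA, 5 U/µl polymerase, and a pre-mixed primer pool at 4 µM per primer. The 10% pipetting overage is also an assumption.

```python
def stock_volume(final_conc: float, stock_conc: float, reaction_volume: float) -> float:
    """C1 * V1 = C2 * V2, solved for the volume of stock to pipette."""
    return final_conc * reaction_volume / stock_conc


def master_mix(recipe, reaction_volume, template_volume, n_reactions, overage=0.1):
    """Scale per-reaction volumes (µl) to n_reactions plus a pipetting overage.

    recipe maps reagent -> (final concentration, stock concentration), both in the
    same units for a given reagent. Water makes up the reaction volume after
    reserving template_volume, which is added separately to each tube.
    """
    per_rxn = {name: stock_volume(final, stock, reaction_volume)
               for name, (final, stock) in recipe.items()}
    water = reaction_volume - template_volume - sum(per_rxn.values())
    if water < 0:
        raise ValueError("reagent volumes exceed the reaction volume")
    per_rxn["water"] = water
    scale = n_reactions * (1 + overage)
    return {name: vol * scale for name, vol in per_rxn.items()}


# Final concentrations for GenoCoRe22 from the text; ALL stock concentrations are hypothetical
genocore22 = {
    "Buffer Gold (x)": (1, 10),            # 1x from an assumed 10x stock
    "MgCl2 (mM)": (6, 25),                 # 6 mM from an assumed 25 mM stock
    "dNTPs (mM)": (0.5, 10),               # 0.5 mM from an assumed 10 mM stock
    "RSA (mg/ml)": (1, 10),                # 1 mg/ml from an assumed 10 mg/ml stock
    "AmpliTaq Gold (U/µl)": (2 / 25, 5),   # 2 U per 25 µl from an assumed 5 U/µl stock
    "primer pool (µM each)": (0.2, 4),     # hypothetical pool at 4 µM per primer
}

for reagent, vol in master_mix(genocore22, 25, 2, n_reactions=10).items():
    print(f"{reagent:24s} {vol:6.2f} µl")
```

Water is computed last so that the mix always totals the reaction volume minus the 2 µl of DNA extract. If an assumed stock is too dilute to fit, the function raises an error rather than returning a negative volume.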

SNaPshot Typing

SBE reactions were carried out on the GenoCoRe22 and GenoY25 SNP multiplex assays using the ABI Prism SNaPshot multiplex reaction kit (Applied Biosystems), following the manufacturer's instructions. The one exception was that 10% 3 M ammonium sulfate was added to the extension primer mix to minimize artifacts [56]. SBE primers and concentrations are given in Table S7. Cycling conditions consisted of 35 cycles of denaturation at 96°C for 10 s, annealing at 55°C for 5 s, and extension at 60°C for 30 s. SBE reactions were purified with 1 U of SAP by incubation at 37°C for 40 min, followed by heat inactivation at 80°C for 10 min. Before capillary electrophoresis, 2 µl of purified SNaPshot product was added to a mix of 11.5 µl of Hi-Di Formamide (Applied Biosystems) and 0.5 µl of GeneScan-120 LIZ size standard (Applied Biosystems). Samples were denatured according to the manufacturer's instructions and then run on an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems) using POP-6 polymer (Applied Biosystems). SNaPshot typing profiles were evaluated and analyzed using custom settings in GeneMapper version 3.2 software (Applied Biosystems).
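In SBE, each extension primer incorporates a single dye-labeled ddNTP, so the dye color of a peak identifies the base. The strand from which the primer was designed determines whether that base must be complemented. The Python sketch below shows this allele-calling step. The dye-to-base mapping is the commonly cited SNaPshot assignment and should be checked against the kit documentation. The peak-height threshold is a hypothetical parameter and not a GeneMapper setting used in this study.

```python
# Assumed SNaPshot dye assignment for the four labeled ddNTPs; verify against the kit documentation
DYE_TO_BASE = {"green": "A", "black": "C", "blue": "G", "red": "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_allele(dye: str, reverse_primer: bool = False) -> str:
    """Convert the dye of an SBE peak into the base on the reference (forward) strand.

    A primer whose sequence lies on the reverse strand extends opposite the
    forward-strand base, so it incorporates (and reports) the complement.
    """
    base = DYE_TO_BASE[dye]
    return COMPLEMENT[base] if reverse_primer else base


def call_snp(peaks, reverse_primer: bool = False, min_height: float = 100.0) -> str:
    """Return the forward-strand alleles whose peak height reaches min_height.

    peaks is a list of (dye, height) tuples for one SNP; min_height is a hypothetical
    threshold. For haploid mtDNA and Y markers, more than one allele flags a possible mixture.
    """
    alleles = {call_allele(dye, reverse_primer) for dye, height in peaks if height >= min_height}
    return "".join(sorted(alleles))


print(call_snp([("green", 850.0), ("red", 40.0)]))        # forward primer, one clear peak
print(call_snp([("green", 850.0)], reverse_primer=True))  # same dye, reverse-strand primer
print(call_snp([("blue", 600.0), ("black", 450.0)]))      # two alleles above threshold
```

mtDNA and Y chromosome markers are haploid, so a two-allele call such as the third example is a useful flag for the low-level contamination monitoring described above.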

Y Chromosome SNP Singleplex PCRs and Sequencing

Based on the initial multiplex results, we tested additional Y chromosome SNPs (M285, P287, S126, and M69) to determine specific downstream subclades and gain further resolution. We chose appropriate SNP loci according to general criteria. PCR amplicons were kept shorter than 90 bp, and flanking DNA sequences were kept free of interfering polymorphisms, such as nucleotide substitutions in potential primer binding sites. We selected PCR amplification primers with a theoretical melting temperature of around 60°C in neutral buffered solutions (pH 7–8), with monovalent cation (Na⁺) concentrations of 50 mM and divalent cation (Mg²⁺) concentrations of 8 mM. We used Primer 3 (http://primer3.sourceforge.net/) to check all primer candidates for primer–dimer formation, hairpin structures, and complementarity to other primers in the multiplex. Primer characteristics were chosen to ensure equal PCR amplification efficiency for all DNA fragments, as previously described [50]. The primers were HPLC-purified and checked for homogeneity by MALDI-TOF (Thermo). Table S7 shows the sequences and concentrations of the amplification primers in the final multiplex PCR.
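Two of the sequence-level criteria above can be screened programmatically before the thermodynamic checks in Primer 3: an amplicon shorter than 90 bp, and primer binding sites free of known polymorphisms. The minimal Python sketch below shows such a screen. It assumes 1-based, inclusive reference coordinates for each primer and a user-supplied set of known polymorphic positions. Both inputs, and all coordinates in the example, are hypothetical. The sketch does not model melting temperature, salt conditions, or dimer formation.

```python
from typing import Iterable, NamedTuple


class PrimerPair(NamedTuple):
    """Primer binding sites as 1-based, inclusive reference coordinates (hypothetical layout)."""
    name: str
    fwd_start: int  # 5' end of the forward primer (start of the amplicon)
    fwd_end: int
    rev_start: int
    rev_end: int    # 5' end of the reverse primer (end of the amplicon)


def amplicon_length(pair: PrimerPair) -> int:
    return pair.rev_end - pair.fwd_start + 1


def in_primer_site(pair: PrimerPair, position: int) -> bool:
    return pair.fwd_start <= position <= pair.fwd_end or pair.rev_start <= position <= pair.rev_end


def passes_criteria(pair: PrimerPair, known_polymorphisms: Iterable[int], max_len: int = 90) -> bool:
    """Amplicon shorter than max_len bp and no known polymorphism inside either primer site."""
    if amplicon_length(pair) >= max_len:
        return False
    return not any(in_primer_site(pair, pos) for pos in known_polymorphisms)


# Hypothetical region: target SNP at 150, a nearby known substitution at 112
polymorphisms = {112, 150}
candidates = [
    PrimerPair("clean", 120, 139, 165, 184),
    PrimerPair("overlaps_snp", 105, 124, 165, 184),
    PrimerPair("too_long", 120, 139, 195, 214),
]

for pair in candidates:
    print(pair.name, amplicon_length(pair), passes_criteria(pair, polymorphisms))
```

Only the first pair passes. The second places the substitution at position 112 under its forward primer, and the third yields a 95-bp amplicon.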

Additional Y chromosome SNP singleplex PCRs were carried out in the ACAD facilities. Standard PCRs using AmpliTaq Gold (Applied Biosystems) were conducted in 25-µl volumes. Each reaction contained 1× Buffer Gold, 2.5 mM MgCl₂, 0.25 mM of each dNTP (Fermentas), 400 nM of each primer (Table S7), 1 mg/ml RSA (Sigma-Aldrich), 2 U of AmpliTaq Gold polymerase, and 2 µl of DNA extract. Thermocycling started with an initial enzyme activation at 95°C for 6 min. This was followed by 50 cycles of denaturation at 94°C for 30 s, annealing at 59°C for 30 s, and elongation at 72°C for 30 s, with a single final extension at 60°C for 10 min. Each PCR included extraction blanks as well as a minimum of two PCR negatives. PCR products were visualized and purified as described above. They were then sequenced directly in both directions using the BigDye Terminator 3.1 Kit (Applied Biosystems) according to the manufacturer's instructions. Sequencing products were purified using CleanSEQ magnetic beads (Agencourt, Beckman Coulter) according to the manufacturer's protocol and separated on a 3130xl Genetic Analyzer (Applied Biosystems). The resulting sequences were edited and aligned against the SNP reference sequences using Sequencher 4.7 (Gene Codes). The GenBank SNP accession numbers are M285, rs13447378; P287, rs4116820; S126 (also known as L30), rs34134567; and M69, rs2032673.