Spread of an Inactive Form of Caspase-12 in Humans Is Due to Recent Positive Selection

Yali Xue; Allan Daly; Bryndis Yngvadottir; Mengning Liu; Graham Coop; Yuseob Kim; Pardis Sabeti; Yuan Chen; Jim Stalker; Elizabeth Huckle; John Burton; Steven Leonard; Jane Rogers; Chris Tyler-Smith

doi:10.1086/503116

2006 Feb 21;78(4):659–670.

PMCID: PMC1424700 PMID: 16532395

Abstract

The human caspase-12 gene is polymorphic for the presence or absence of a stop codon, which results in the occurrence of both active (ancestral) and inactive (derived) forms of the gene in the population. It has been shown elsewhere that carriers of the inactive gene are more resistant to severe sepsis. We have now investigated whether the inactive form has spread because of neutral drift or positive selection. We determined its distribution in a worldwide sample of 52 populations and resequenced the gene in 77 individuals from the HapMap Yoruba, Han Chinese, and European populations. There is strong evidence of positive selection from low diversity, skewed allele-frequency spectra, and the predominance of a single haplotype. We suggest that the inactive form of the gene arose in Africa ∼100–500 thousand years ago (KYA) and was initially neutral or almost neutral but that positive selection beginning ∼60–100 KYA drove it to near fixation. We further propose that its selective advantage was sepsis resistance in populations that experienced more infectious diseases as population sizes and densities increased.

Our evolutionary past profoundly influences our current genetic makeup, including our predispositions to health and disease, and is also of great intrinsic interest. We know from fossil and archaeological records that both the human genus Homo and our species sapiens evolved in Africa. Anatomically modern Homo sapiens was present in Ethiopia ∼195 thousand years ago (KYA) (McDougall et al. 2005), but modern human behavior developed much later (∼50–100 KYA) (Henshilwood et al. 2002), and populations outside Africa derive their genes almost entirely from migrations of humans who were both anatomically and behaviorally modern, beginning ∼50–60 KYA, followed by further local adaptations (Jobling et al. 2004). Many of the changes in phenotype must have had a genetic component, and we would like to understand these, but few of the relevant genes have been identified.

Two approaches have been used to search for these evolutionarily important genes. One is to start from a phenotype of interest, for which biological information or a Mendelian disorder may point toward a particular gene. The identification of genes involved in resistance to malaria (e.g., Saunders et al. 2002) or dietary adaptation (e.g., Bersaglieri et al. 2004) and the ASPM gene (in which mutation produces microcephaly) (Zhang 2003; Evans et al. 2004; Mekel-Bobrov et al. 2005) and the FOXP2 gene (in which mutation produces speech/language disorder) (Enard et al. 2002; Zhang et al. 2002) testifies to the utility of this approach, but, in many cases, we do not know which genes influence the phenotype of interest. A complementary approach is, therefore, to identify changes in DNA sequence, gene copy number, or expression level without prior information about their phenotypic relevance. The chimpanzee genome sequence allows for genomewide comparisons (Chimpanzee Sequencing and Analysis Consortium 2005), whereas targeted studies have examined coding regions (Clark et al. 2003), gene-copy-number changes (Fortna et al. 2004), and expression-level changes, particularly in brain (reviewed by Preuss et al. [2004]). These analyses provide information mainly about fixed differences, whereas studies of variation within humans provide information about more-recent genetic changes that still differ between populations (e.g., Kayser et al. [2003]). These genomewide studies often identify large numbers of differences, most of which are likely to be neutral, but provide candidates for further investigation.

Once candidate genes have been identified, their relevance needs to be evaluated. Evolutionarily important genes will have undergone positive selection, and this leaves an imprint on the gene and its surrounding region. There is no single test for past positive selection, but patterns of amino acid change, nucleotide diversity, allele-frequency spectra, differentiation between populations, and haplotype structure can all provide some information (Ronald and Akey 2005). Genes or alleles that have been positively selected can show such properties as rapid amino acid change, low diversity, high frequencies of rare or derived alleles, large differences between populations, and/or extended haplotypes, depending on the time when selection began and the frequency of the selected allele. Genes that do show evidence of positive selection fall into two categories: those that are selected in a wide variety of species, because they are involved in processes like host-pathogen interactions or reproduction, and those involved in more human-specific traits (Vallender and Lahn 2004).

Changes in protein sequence and expression pattern have been considered general molecular mechanisms underlying human evolution (King and Wilson 1975). One variant of these, gene loss, may be a common way in which populations and species adapt, because the multiplicity of ways in which a gene can be inactivated means that loss-of-function mutations are readily available for selection to act on (Olson 1999). Examples relevant to human evolution include (1) the inactivation, several million years ago, of a myosin heavy chain gene (MYH16), expressed predominantly in masticatory muscles, that may have influenced the anatomy of the head and removed a constraint to the development of the modern brain (Stedman et al. 2004), and (2) the CCR5 Δ32 deletion that inactivates the CCR5 protein, with the result that Δ32/Δ32 homozygotes are strongly protected against HIV infection and AIDS and heterozygotes receive some protection (Dean et al. 1996). Although the Δ32 mutation must confer a selective advantage now and its current abundance has been suggested to result from selection by prehistoric infections, the variation of this gene is compatible with a neutral evolutionary past (Sabeti et al. 2005).

The caspase-12 gene (CASP12) provides another example of gene loss. It exists in two forms: full length (ancestral, active) or truncated in the middle by a stop codon at aa 125 (derived, inactive) (Saleh et al. 2004). This polymorphism has been shown to have significant phenotypic consequences: individuals with the full-length form produce lower levels of cytokines after stimulation by bacterial lipopolysaccharides, which leads to a lower initial immune response. If bacteria enter the bloodstream, however, they are at greater danger of immune overreaction and sepsis (Saleh et al. 2004). The active form was reported at a frequency of ∼20% in Africa but was rare elsewhere. Because of the biological interest of this gene and the limited amount of information available about its evolutionary history, we investigated whether the predominant inactive form spread by neutral genetic drift or because of a selective advantage associated with gene loss. We conclude that it has spread through most of the human population within the past 100 KY because of positive selection.

Material and Methods

Population Samples

The examined samples consisted of 1,064 individuals from the CEPH Human Genome Diversity Panel (HGDP-CEPH) (Cann et al. 2002) and 77 individuals from the HapMap panel (International HapMap Consortium 2003): 26 Yoruba from Ibadan, Nigeria (YRI); 26 Han Chinese from Beijing (CHB); and 25 CEPH Utah residents with ancestry from northern and western Europe (CEU).

Genotyping the Stop-Codon Polymorphism

The stop-codon polymorphism was genotyped in the HGDP-CEPH samples by SNaPshot primer extension as part of a tetraplex reaction. A fragment containing the stop-codon polymorphism was amplified using the forward primer 5′-CTCAACATCCGCAACAAAGA-3′ and the reverse primer 5′-TTGCTCTTTCAGCTGCCAAT-3′, followed by PCR extension with the primer 5′-GTATCCAAGGTTTTCAAGTAGATCTC-3′, through use of the ABI Prism SNaPshot Multiplex Kit (Applied Biosystems) according to the manufacturer’s guidelines, with minor modifications.

Resequencing and Detection of Variants

PCR-amplified fragments of ∼9–11 kb were generated from genomic DNA, then fragments of 500–700 bp overlapping by 200–400 bp were reamplified and sequenced. Primer and PCR details are given in table 1. For each individual, each nucleotide position was determined from both strands by at least two reads each. The CASP12 genomic DNA sequence (GenBank accession number NC_000011) was used as the reference sequence, and the chimpanzee Casp12 sequence (GenBank accession number NW_113990) was used in some analyses. The seven exons in the standard transcript (AceView) and an eighth exon (exon 3) present in some splice variants (Fischer et al. 2002) were considered.

Table 1.

Primers and PCR Conditions^a

Primer Name	Primer Sequence(5′→3′)	Start	End	Product Size(bp)	Overlap(bp)
Primers for amplifying large fragments^b:
CSP12L1F	ACCATAATGCCTTCATTTTCCTAGAG	104262020
CSP12L1R	TAAACTATGCCCATCTTAGGACCTTC		104273050	11,030
CSP12L2F	AAAGTCCTGTTAACTTTGAACGTTTCTT	104266097
CSP12L2R	TTTATTATTATTACAAGGTGGCCAGTCA		104275541	9,444
Primers used for reamplification and sequencing^c:
CSP12S1F	TCATTGCCTCAGCATAGATT	104262183
CSP12S1R	GCCCACCATTGAAAGACTAT		104262709	526
CSP12S2F	ACCACTATTGGGCTACCATT	104262415
CSP12S2R	GGTTTTCCCAATAACCTGAC		104262966	551	294
CSP12S3F	ATTTGGGGTCTCAAATGAAT	104262723
CSP12S3R	GTTTCCCTCTCTTCTCCAAA		104263363	640	243
CSP12S4F	AAAGTTTTCTGGGGCATAAC	104263089
CSP12S4R	AGCAACTTGGTCATCTTGAA		104263688	599	274
CSP12S5F	TTTGGAGAAGAGAGGGAAAC	104263344
CSP12S5R	ATTTGGCAAAGCTGATGTTA		104263874	530	344
CSP12S6F	CTCTGGGTTTGCAAGTAGTG	104263481
CSP12S6R	GATGCTGCCCTAAGGATAAT		104264167	686	393
CSP12S7F	TCCTATCAGGCTTCTCCTTC	104263827
CSP12S7R	GCAAGAGTCGATACATGAGG		104264515	688	340
CSP12S8F	ATTATCCTTAGGGCAGCATC	104264167
CSP12S8R	TCAGGAGAGATGCTAGTGGA		104264692	525	348
CSP12S9F	CCTCATGTATCGACTCTTGC	104264492
CSP12S9R	ACTCCCTTCCTTCCTTCTTT		104265052	560	200
CSP12S10F	ACTCAGCCTCCTCTCCTAAG	104264708
CSP12S10R	CCTTTCTTCCTTCCTTCCTT		104265213	505	344
CSP12S11F	AGGCCTAGCACACAATTACA	104264997
CSP12S11R	GGAGAACAGGAGCAATTTTT		104265678	681	216
CSP12S12F	TAGTCCCTCAGTGCTCACAT	104265439
CSP12S12R	AGCCAACCACTAAAACCATT		104265946	507	239
CSP12S13F	AAAAATTGCTCCTGTTCTCC	104265659
CSP12S13R	TTTCAAATCTTCCACACCAC		104266330	671	287
CSP12S14F	GGTTTTAGTGGTTGGCTTCT	104265930
CSP12S14R	TGCATATGTGGATGTTTGTG		104266580	650	400
CSP12S15F	AAGCAATGAAGTCCTTTTCC	104266329
CSP12S15R	ACTCAAGTGGGGTCTGTTTT		104266859	530	251
CSP12S16F	ACACACAAATGCACACACAT	104266624
CSP12S16R	AAAGACAAACCCAAGGTCAT		104267149	525	235
CSP12S17F	AACAGACCCCACTTGAGTTT	104266842
CSP12S17R	ACTCAAGGGTCTCTTTCAGG		104267360	518	307
CSP12S18F	ATGACCTTGGGTTTGTCTTT	104267130
CSP12S18R	TCTGCTGCTCCATAGTGAAT		104267755	625	230
CSP12S19F	TTGCCCAGTGGTTTTTAGTA	104267553
CSP12S19R	TTAATTGGCAGCTGAAAGAG		104268192	639	202
CSP12S20F	GGTGCAGAGCTTTTGTCTTA	104267935
CSP12S20R	GAGGGTGTATTTTCATGCAG		104268567	632	257
CSP12S21F	CTTTCAGCTGCCAATTAAGA	104268175
CSP12S21R	TCACAAAGGCCTTAAGATCA		104268751	576	392
CSP12S22F	GCCTCTCTTTCTCCATCACT	104268413
CSP12S22R	GCAGTAAGCAGTTTTGAGGA		104269035	622	338
CSP12S23F	CCTGATCTTAAGGCCTTTGT	104268730
CSP12S23R	GGAGATGTCTCAGAGAATGGT		104269343	613	305
CSP12S24F	AGGCTCTCATTCCTCAAAAC	104269006
CSP12S24R	AGACATGTTGGTCATGGAAG		104269511	505	337
CSP12S25F	AGTGCTCACAGCATGAACTT	104269168
CSP12S25R	AGAAGGTTTGTTGCCCTAAG		104269772	604	343
CSP12S26F	GAATTCTTCCATGACCAACA	104269487
CSP12S26R	GTGGGAAAAGAGGAAGAGAA		104270023	536	285
CSP12S27F	GGGCAACAAACCTTCTATTT	104269757
CSP12S27R	CTGGCATAGAAAAGCACAAC		104270408	651	266
CSP12S28F	TACCTGAGCTCTCAAATCCA	104269918
CSP12S28R	TGGGAAAGAGCATTGATAGA		104270442	524	490
CSP12S29F	TTTTGTATGCAATCCAATCC	104270190
CSP12S29R	ATGGCAATAGAGCTGATGAA		104270824	634	252
CSP12S30F	TTTGCCTATTCAACATCCAC	104270538
CSP12S30R	TTTCTTCCCTCCGTACTCTC		104271093	555	286
CSP12S31F	TGCCAAAACTAGGTCTCAAA	104270774
CSP12S31R	GCCCTGAGTAAGAACTTGGT		104271330	556	319
CSP12S32F	AGGGAGAATTGAGAGTACGG	104271064
CSP12S32R	GGGTTTTGTTTTTGCTTTTT		104271601	537	266
CSP12S33F	TTGGTAAAAGGGAGTACCAAG	104271296
CSP12S33R	CAGTGAGCCAGGATGTTTAG		104271864	568	305
CSP12S34F	CCTGCAACGTTTTATATTGC	104271596
CSP12S34R	ATAGGGAATTCATGGGTCAG		104272195	599	268
CSP12S35F	TCTGGAGTAGGAATCAGCAA	104271920
CSP12S35R	TCCCTCTGCTGAAATGTAGA		104272522	602	275
CSP12S36F	CTCTAACGTCCACTTTGTGC	104272307
CSP12S36R	ATTTTGCTTGCTGTTGTCAT		104272980	673	215
CSP12S37F	TACATTTCAGCAGAGGGAGA	104272499
CSP12S37R	TATTGTGGGGCTAACAGCTA		104273152	653	481
CSP12S38F	TGGTGAAACCCTGTGTCTAC	104272751
CSP12S38R	ATGGCATTTTTGATGATTTG		104273342	591	401
CSP12S39F	ATGACAACAGCAAGCAAAAT	104272961
CSP12S39R	ATTTGGGAACCACTACCCTA		104273480	519	381
CSP12S40F	ATATTTTGCCTGCAGTTTGA	104273196
CSP12S40R	TCCCTGAATCTATTTCACCA		104273697	501	284
CSP12S41F	TAGGGTAGTGGTTCCCAAAT	104273457
CSP12S41R	CTCCACATTTCTGCTCTCTG		104273981	524	240
CSP12S42F	GGAGAAGCTCCTGTCTTGTT	104273636
CSP12S42R	TTTATGGCTGTCCTTTGAGA		104274319	683	345
CSP12S43F	CATGTTGTAGCTGACCCATT	104273918
CSP12S43R	GAAAACACCTTCTGCTTCCT		104274563	645	401
CSP12S44F	GGTTTGCATTTTTAGTGCTG	104274180
CSP12S44R	ATGGCATCAGACAGACAAAC		104274702	522	383
CSP12S45F	GAAAAAGCTGTGAAAGCAAA	104274375
CSP12S45R	TGAGTGGATCAGGAAAGAGA		104274877	502	327
CSP12S46F	TCCTTTGGAAAATAGGAAGC	104274531
CSP12S46R	CCTTGCCATGTGAAATTAAA		104275049	518	346
CSP12S47F	TGGAAGTTAAGGGAAAGAGG	104274767
CSP12S47R	GTAGGGTAGGCATCTCTGCT		104275461	694	282

^a We designed primers from a human DNA sequence (GenBank accession number NC_000011) and used the Expand 20kb^Plus PCR System (Roche) to amplify a fragment spanning either 11,030 bp (positions on chromosome 11: 104262020–104273050) or 9,444 bp (104266097–104275541). Using these PCR products as templates, we amplified overlapping fragments of ∼500–700 bp and sequenced them with all of the nested primers. Sequence traces were processed by the program ExoTrace, developed at The Wellcome Trust Sanger Institute, and all potentially polymorphic positions flagged by the program were checked manually. Variable positions were compared in overlapping and complementary reads in all individuals. Primer sequences and PCR conditions are shown here.

^b PCR conditions for amplifying large fragments followed the manufacturer's protocol: reaction in 25 μl at 92°C for 2 min 15 s; followed by 11 cycles of 92°C for 10 s, 60°C for 30 s, and 68°C for 11 min 30 s; followed by 21 cycles of 92°C for 15 s, 60°C for 30 s, and 68°C for 11 min 30 s (increasing by 10 s every cycle); followed by a 68°C extension for 7 min.

^c PCR conditions for reamplification were as follows: reaction in 15 μl, with 0.5 μl of the template from the large-fragment PCR reaction (diluted 50 times) and 0.5 U PlatinumTaq (Invitrogen), at 94°C for 6 min; followed by 35 cycles of 94°C for 45 s, 60°C for 45 s, and 72°C for 1 min 30 s; followed by a 3-min extension at 72°C.
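The "Product Size" and "Overlap" columns of table 1 appear to follow directly from the start and end coordinates: size is the reverse-primer end minus the forward-primer start, and overlap is the previous fragment's end minus the current fragment's start. The short Python sketch below illustrates this reading for the first three nested amplicons; the function name and tuple layout are our own illustrative choices, not part of the original protocol.

```python
# Coordinates copied from table 1 (first three nested amplicons only).
# Each tuple: (amplicon name, forward-primer start, reverse-primer end).
amplicons = [
    ("CSP12S1", 104262183, 104262709),
    ("CSP12S2", 104262415, 104262966),
    ("CSP12S3", 104262723, 104263363),
]


def size_and_overlap(tiled_amplicons):
    """Return (name, product size, overlap with previous amplicon) per row.

    Assumes the amplicons are listed in tiling order, as in table 1.
    The first amplicon has no predecessor, so its overlap is None.
    """
    rows = []
    previous_end = None
    for name, start, end in tiled_amplicons:
        size = end - start
        overlap = None if previous_end is None else previous_end - start
        rows.append((name, size, overlap))
        previous_end = end
    return rows


rows = size_and_overlap(amplicons)
for name, size, overlap in rows:
    print(name, size, overlap)
```

Under this reading, the computed values for these three rows agree with the tabulated sizes (526, 551, and 640 bp) and overlaps (294 and 243 bp).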

Sequence traces were processed by the program ExoTrace, developed at The Wellcome Trust Sanger Institute (S. Leonard, unpublished material). Potentially polymorphic positions were flagged by the program and then were checked manually. Variants were accepted if they lay in a region with a Phred score ⩾30 and were detectable in other relevant high-quality reads. In a blind test, 1,328 (99.4%) of the 1,336 SNPs identified in this way corresponded to the genotype of the same sample in the HapMap database, equivalent to the accuracy of the HapMap data themselves (International HapMap Consortium 2005).

Data Analysis

For the stop-codon polymorphism, allele frequencies were determined by direct counting. A χ² test was performed for each population as well as on the pooled world population, to evaluate Hardy-Weinberg equilibrium (HWE). F statistics were calculated according to the methods of Weir and Cockerham (1984) with the program FSTAT (Goudet 1995) and resulted in a value between 0 and 1, where 0 would imply no differentiation between populations and 1, complete differentiation.
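As an illustration of direct allele counting and a χ² test of HWE, the following minimal Python sketch uses the Mbuti Pygmy genotype counts from table 2. It assumes a Pearson goodness-of-fit statistic with 1 df and no continuity correction; the exact variant of the test used in the study is not specified, and the function names are hypothetical.

```python
def allele_frequencies(n_aa, n_ga, n_gg):
    """Return (freq_A, freq_G) by direct counting in diploid genotypes."""
    total_alleles = 2 * (n_aa + n_ga + n_gg)
    freq_a = (2 * n_aa + n_ga) / total_alleles
    return freq_a, 1.0 - freq_a


def hwe_chi_square(n_aa, n_ga, n_gg):
    """Pearson chi-square statistic comparing observed genotype counts
    with Hardy-Weinberg expectations (1 df; assumed form of the test)."""
    n = n_aa + n_ga + n_gg
    p, q = allele_frequencies(n_aa, n_ga, n_gg)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ga, n_gg)
    # Skip classes with zero expectation (monomorphic samples).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Mbuti Pygmies (table 2): 1 AA, 10 GA, 4 GG.
freq_a, freq_g = allele_frequencies(1, 10, 4)
chi2 = hwe_chi_square(1, 10, 4)
CRITICAL_5_PERCENT_1DF = 3.841  # standard chi-square critical value
print(round(freq_a, 2), round(freq_g, 2), round(chi2, 3))
```

For this sample the counted frequencies match the tabulated .40 and .60, and the statistic stays below the 5% critical value, in line with the absence of HWE departures within individual populations. With samples this small the χ² approximation is rough, so an exact test may be preferable in practice.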

Linkage disequilibrium (LD) blocks were inferred from genotype data through use of the Haploview program (Barrett et al. 2005). Haplotypes were reconstructed using PHASE 2.1 (Stephens et al. 2001; Stephens and Donnelly 2003; see the Stephens Web site). A median-joining network was constructed using NETW4.1.0.9 (Bandelt et al. 1999; Fluxus). The ratio of the number of nonsynonymous substitutions per nonsynonymous site (K_a) to the number of synonymous substitutions per synonymous site (K_s), K_a/K_s, and summary statistics (Tajima 1989; Fu and Li 1993; Fu 1997; Fay and Wu 2000) were calculated using DnaSP 4.00 (Rozas et al. 2003). Several of these tests compare different estimators of the population mutation parameter, θ. Tajima’s D compares θ estimated from the number of polymorphic sites with θ estimated from the nucleotide diversity; negative values indicate an excess of rare variants, whereas positive values indicate an excess of intermediate-frequency variants. Fu and Li’s tests compare θ estimated from the number of mutations in external branches of a gene tree rooted using the chimpanzee as an outgroup, with θ estimated from the number of polymorphic sites (giving D) or the nucleotide diversity (giving F), and negative values indicate an excess of singleton mutations. Fay and Wu’s H compares θ estimated from the nucleotide diversity with a θ estimator based on the frequency of derived variants, and negative values indicate an excess of high-frequency derived alleles. The other tests compare the observed haplotype distribution with that expected under a chosen population model. Fu’s F_s is based on the probability of obtaining a sample with an equal or larger number of haplotypes than that observed, whereas the common-haplotype frequency test measures whether the most common single haplotype is expected to reach the frequency observed. Coalescent simulations were performed using the program ms (Hudson 2002) via a custom Perl script to process the output.
Extended haplotype homozygosity (EHH) or relative EHH (REHH), which measures the decay of the ancestral extended haplotype with distance due to recombination (Sabeti et al. 2002), was analyzed using the program Sweep 1.0.
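To make the comparison of θ estimators concrete, the sketch below computes Watterson's θ and Tajima's D from a sample size, a number of polymorphic sites, and a mean number of pairwise differences, following the standard formulas of Tajima (1989). It is an independent illustration, not the DnaSP implementation used in the study. The inputs are the whole-sample values reported in table 4, and the region length of 13,300 bp is our assumption based on the stated "13.3-kb" stretch.

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_diff):
    """Return (Tajima's D, Watterson's theta) for the whole region.

    n: number of chromosomes sampled
    segregating_sites: number of polymorphic sites (S)
    mean_pairwise_diff: average pairwise differences over the region (k)
    """
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = segregating_sites
    theta_w = s / a1
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - theta_w) / math.sqrt(variance), theta_w


REGION_LENGTH_BP = 13_300   # assumption: "13.3 kb" taken as 13,300 sites
PI_PER_SITE = 4.5e-4        # nucleotide diversity, whole sample (table 4)

d, theta_w = tajimas_d(n=155, segregating_sites=123,
                       mean_pairwise_diff=PI_PER_SITE * REGION_LENGTH_BP)
theta_w_per_site = theta_w / REGION_LENGTH_BP
print(round(d, 2), round(theta_w_per_site * 1e4, 1))
```

With these rounded inputs the result should land close to the tabulated whole-sample values (θ_W of about 16.5×10^-4 and D of about −2.3); small discrepancies are expected because the published diversity is rounded.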

Frequency-based ages of the stop-codon mutation were estimated as described elsewhere (Griffiths 2003). Phylogeny-based time to the most recent common ancestor (TMRCA) estimates were obtained from NETW4.1.0.9 with use of a mutation rate based on 82 fixed differences between chimpanzee and human in the 8.6-kb LD block, under the assumption that 41 mutations occurred on each lineage and that the lineages split 7 million years ago. The hypotheses of a partial and complete selective sweep were compared using a composite-likelihood-ratio test (Meiklejohn et al. 2004). Likelihoods for both partial and complete sweeps were calculated from the entire data under the assumption that the selective target is located at the site of the stop-codon mutation. To estimate the strength of putative selection on the stop-codon mutation, we applied the composite likelihood analysis (Kim and Stephan 2002) to the subsample of haplotypes carrying the inactive gene, again assuming that the selective target is the stop-codon mutation. We assumed that N_e=10,000 and r=10^-8, where N_e is the effective population size and r is the recombination rate per base per generation across the ∼13-kb region. In addition, we applied a full likelihood method (Coop and Griffiths 2004) to estimate the selection coefficient of the stop-codon polymorphism under a model of genic selection. The method assumes no recombination, so we restricted the analysis to an ∼2-kb region, surrounding the polymorphism, that showed no evidence of recombination under the four-gamete test (Hudson and Kaplan 1985). The maximum-likelihood estimate of the selection coefficient was then used to estimate the age of the polymorphism by use of the same method (Coop and Griffiths 2004). In performing this analysis, we used the per base mutation rate obtained above, an N_e of 10,000 and a generation time of 25 years. 
Both methods used to estimate the strength of selection make the assumption of a single panmictic, constant-sized population.
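The mutation-rate assumption stated above can be written out as simple arithmetic. The following Python sketch is only a worked illustration: it treats the "8.6-kb" block as exactly 8,600 bp (our assumption) and uses the 25-year generation time given in the text to convert to a per-generation rate.

```python
FIXED_DIFFERENCES = 82          # human-chimpanzee differences in the LD block
BLOCK_LENGTH_BP = 8_600         # assumption: "8.6 kb" taken as 8,600 bp
SPLIT_TIME_YEARS = 7_000_000    # assumed human-chimpanzee split time
GENERATION_TIME_YEARS = 25      # generation time used in the analysis

# Half of the fixed differences are assigned to each lineage.
mutations_per_lineage = FIXED_DIFFERENCES / 2

rate_per_bp_per_year = mutations_per_lineage / (BLOCK_LENGTH_BP * SPLIT_TIME_YEARS)
rate_per_bp_per_generation = rate_per_bp_per_year * GENERATION_TIME_YEARS

print(f"{rate_per_bp_per_year:.2e} per bp per year")
print(f"{rate_per_bp_per_generation:.2e} per bp per generation")
```

Under these assumptions the rate is roughly 6.8×10^-10 per base per year, or about 1.7×10^-8 per base per generation.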

Results

Worldwide Distribution of the Stop-Codon Polymorphism

We first investigated the distribution of the active and inactive forms of the caspase-12 gene in the HGDP-CEPH panel of 1,064 individuals from 52 worldwide populations. The results (fig. 1 and table 2) show that the active form of the gene predominates in some sub-Saharan African populations but is very rare outside Africa. Mbuti Pygmies and San have the highest frequencies of the active form (60% and 57%, respectively); the average for the sub-Saharan African populations is 28%. Outside Africa, the active allele was detected at low frequency in Israel, Pakistan, China, and Mexico (fig. 1), but the average was <1%, and 65% of the population samples were fixed for the inactive form. Although recent admixture may account for some of the active copies outside Africa (e.g., Mexico), other populations carrying active genes have no known history of recent African admixture.

Table 2.

Worldwide Distribution of the Active and Inactive Forms of the Caspase-12 Gene in the HGDP-CEPH Diversity Panel

		No. of Genotypes					Allele Frequency
Population	Geographic origin	AA	GG	GA	Total	Fail	A	G
Mozabite	Algeria (Mzab)	28		2	30		.97	.03
NAN Melanesian	Bougainville	22			22		1.00	.00
Karitiana	Brazil	24			24		1.00	.00
Surui	Brazil	21			21		1.00	.00
Cambodian	Cambodia	11			11		1.00	.00
Biaka Pygmies	Central African Republic	23	4	9	36		.76	.24
Northeast China:	China	40			40		1.00	.00
Oroqen		10			10
Daur		10			10
Hezhen		10			10
Mongola		10			10
Northwest China:	China	19			19		1.00	.00
Xibo		9			9
Uygur		10			10
Central China:	China	30			30		1.00	.00
Han		44		1	45
She		10			10
Tu		10			10
Tujia		10			10
Southwestern China:		48	0	2	50		.98	.02
Lahu		10			10
Miaozu		10			10
Naxi		8		2	10
Dai		10			10
Yizu		10			10
Colombian	Colombia	13			13		1.00	.00
Mbuti Pygmies	Democratic Republic of Congo	1	4	10	15		.40	.60
All French:	France	53			53		1.00	.00
French		29			29
French Basque		24			24
Druze	Israel (Carmel)	43		1	48	4	.99	.01
Palestinian	Israel (central)	45		4	51	2	.96	.04
Bedouin	Israel (Negev)	45		2	49	2	.98	.02
All Italian:	Italy	50			50		1.00	.00
Sardinian		28			28
Tuscan		8			8
Northern Italian	(Bergamo)	14			14
Japanese	Japan	31			31		1.00	.00
Bantu, northeastern	Kenya	11		1	12		.96	.04
Maya	Mexico	24		1	25	1	.98	.02
Pima	Mexico	25			25		1.00	.00
San	Namibia	1	2	4	7		.43	.57
Papuan	New Guinea	17			17		1.00	.00
YRI	Nigeria	18	1	6	25		.84	.16
Orcadian	Orkney Islands	15			16	1	1.00	.00
All Pakistan:	Pakistan	194	0	6	200		.99	.02
Balochi		22		3	25
Brahui		25			25
Burusho		24		1	25
Hazara		25			25
Kalash		25			25
Makrani		24		1	25
Pathan		25			25
Sindhi		24		1	25
Russian	Russia	25			25		1.00	.00
Adygei	Russia Caucasus	17			17		1.00	.00
Mandenka	Senegal	14	1	9	24		.77	.23
Yakut	Siberia	25			25		1.00	.00
All Bantu/South Africa:	South Africa	5	1	2	8		.75	.25
Bantu southeastern Pedi		1			1
Bantu, southeastern and southern Sotho				1	1
Bantu, southeastern Tswana		1	1		2
Bantu, southeastern Zulu				1	1
Bantu, southwestern Herero		2			2
Bantu, southwestern Ovambo		1			1

No disagreement with HWE was observed in individual populations, but the pooled sample departed significantly from HWE (P<.01), reflecting subdivision. The large interpopulation variation, mainly caused by differences between the African and non-African populations, led to an F_ST value of 0.274, calculated using the frequency in each individual population. To assess whether this was unusually high, we compared it with empirically derived F_ST values. These are not available on a large scale for the HGDP-CEPH panel but are available for American populations of African, Han Chinese, and European origin (Hinds et al. 2005). We therefore recalculated the caspase-12 F_ST for sub-Saharan Africans versus Han Chinese or Europeans (table 3). Since F_ST is dependent on minor-allele frequency, we used SNPs matching the caspase-12 minor-allele frequency averaged across the pair of populations in each comparison and noted the 95% empirical ranges of these control F_ST values (table 3). The African-Chinese caspase-12 F_ST value is not unusually high; the African-European value is the maximum possible for its allele frequency but, again, is not unusual and falls within the 95% CI of the control SNPs.

Table 3.

Caspase-12 and Control SNP F_ST Values

	Caspase-12		Control SNPs
Comparison	Frequency^a	F_ST	Frequency Range	No.^b	95% F_ST Range
Sub-Saharan Africa and Chinese Han	.138	.172	.132–.143	22,943	.016–.271
Sub-Saharan Africa and Europe	.132	.253	.127–.138	25,552	.011–.266

^a Frequency of the minor allele.

^b Number of control SNPs lying in this frequency range.

Is the observed predominance of the inactive form of caspase-12 due to positive selection, or does it result from factors such as a bottleneck associated with human migration out of Africa that acted on a neutral variant?

Sequence Variation of the Caspase-12 Gene

To address this question, we resequenced a 13.3-kb stretch of DNA that covers the whole caspase-12 gene and ∼0.7 kb on each side in 77 individuals from the HapMap collection (26 YRI, 26 CHB, and 25 CEU), and we investigated the evolutionary history of the region. Of our sample of 155 chromosomes (including the reference sequence), 8 carried the active form of the gene: 6 YRI, 1 CHB, and the reference sequence of unknown origin, roughly reflecting the worldwide geographical distribution. All the rest of the chromosomes carried the inactive form. A total of 123 SNPs were detected (table 4 and online-only tab-delimited SNP table.txt, which can be downloaded and opened into a spreadsheet), but these were distributed very unevenly among the forms of the gene and populations. In the inferred haplotypes, the active genes were much more diverse: the eight chromosomes carried 61 SNPs and showed a nucleotide diversity of 19.7×10^-4, whereas the 147 inactive chromosomes carried 76 SNPs and had a nucleotide diversity almost 10 times lower, 2.0×10^-4. This led to higher diversity in the YRI (9.1×10^-4) than in the other populations (1.9×10^-4 and 0.5×10^-4 in the CHB and CEU, respectively, a ratio more extreme than any encountered in a study of 132 genes in African American and European American populations [Akey et al. 2004]), although it did not entirely account for the high YRI diversity. The inactive genes were also more diverse in Africa than outside (π=4.4×10^-4 and π=0.7×10^-4, respectively; table 4). The low diversity of the inactive genes, particularly outside Africa, provided the first indication that their spread might have been rapid and thus due to positive selection.
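Nucleotide diversity (π), as used in this comparison, is conventionally defined as the average number of pairwise differences per site between sampled sequences. The Python sketch below shows that calculation on a small set of hypothetical aligned sequences; the toy data and the function name are illustrative only and are unrelated to the caspase-12 haplotypes, which were analyzed with DnaSP.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned sequences.

    Assumes all sequences have equal length and contain no gaps or
    missing data (a simplification relative to real resequencing data).
    """
    n = len(sequences)
    length = len(sequences[0])
    if n < 2 or any(len(seq) != length for seq in sequences):
        raise ValueError("need at least two aligned sequences of equal length")
    total_differences = sum(
        sum(a != b for a, b in zip(seq1, seq2))
        for seq1, seq2 in combinations(sequences, 2)
    )
    number_of_pairs = n * (n - 1) / 2
    return total_differences / number_of_pairs / length


# Hypothetical toy alignment: four chromosomes, ten sites.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGAACGTAC",
]
pi = nucleotide_diversity(toy_alignment)
print(pi)
```

In this toy alignment the six sequence pairs differ at six positions in total, so π is one difference per pair over ten sites.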

Table 4.

Caspase-12 Summary Statistics

	Sample characteristics				Allele frequency distribution tests				Haplotype tests
Location	Sample Size (Chromosomes)	Polymorphic Sites	Nucleotide Diversity (×10⁴)	θ_W (×10⁴)	Tajima’s D	Fu and Li’s D	Fu and Li’s F	Fay and Wu’s H (P)	Fu’s F_s	Common Haplotype Frequency
Entire region (13.3 kb)
Whole	155	123	4.5	16.5	−2.32^a	−2.75^a	−3.06^b	−46.2 (.002)^b	−27.7^b
African	52	99	9.1	16.5	−1.59^a	−1.05	−1.54	−28.7 (.021)^a	−5.8
European	50	7	.5	1.2	−1.57^a	−1.17	−1.54	−.9 (.287)	−6.6^b
Chinese	52	47	1.9	7.8	−2.60^b	−3.20^b	−3.59^b	−33.5 (.000)^b	−5.2
Active	8	61	19.7	17.7
Inactive (whole)	147	76	2.0	10.3
Inactive (African)	46	57	4.4	9.7
Inactive (non-African)	101	21	.7	3.0
LD block (8.6 kb)
Whole	155	90	4.5	18.2	−2.37^b	−2.46^a	−2.91^b	−38.4 (.005)^b	−18.5^b	99^b
African	52	71	9.1	17.9	−1.71^a	−1.29	−1.77	−23.2 (.027)^a	−2.9	21^b
European	50	4	.3	1	−1.67^a	−1.26	−1.63	.2 (.398)	−4.3^a	43
Chinese	52	37	2.1	9.3	−2.62^b	−3.16^b	−3.58^b	−25.1 (.000)^b	−3.2	35^b
Active	8	50	20.9	21.9
Inactive (all)	147	43	1.4	8.8
Inactive (African)	46	29	2.9	7.5
Inactive (non-African)	101	14	.6	3.1

^a P<.05.

^b P<.01 (one-sided tests).

Many analyses can be performed most simply on regions that have experienced little or no recombination. We therefore investigated the LD structure of the region and identified an ∼8.6-kb LD block containing SNPs 10–99, with the stop codon in its center, and used it, together with the complete region, in further analyses (fig. 2). Haplotypes were again inferred for the LD block, a task facilitated by the observation that 57 (74%) of the 77 individuals carried zero or one SNP in this section. We then investigated whether the inferred pattern of variation was compatible with neutral evolution.

Structure of the caspase-12 gene. The exon-intron structure (including exon 3, which is present only in some transcripts) is shown at the top, as is the whole sequenced region and the location of the 8.6-kb LD block. The lower part of the figure shows the LD block identified using Haploview. Each square represents a pairwise value of D′, with the standard color coding (red indicates LOD⩾2 and D′=1; pink indicates LOD⩾2 and D′<1; blue indicates LOD<2 and D′=1; white indicates LOD<2 and D′<1).

Neutrality Tests

We first examined the evolution of the coding region, expressed as the K_a/K_s ratio based on the human and chimpanzee sequences. This was 0.55, indicative of purifying selection over most of the evolutionary period but providing little insight into the most recent phase. Tests based on the variation within humans are better able to do this.

Neutral models of evolution provide predictions of expected allele-frequency characteristics, and observed patterns can be compared with these. We have calculated Tajima’s D (Tajima 1989), Fu and Li’s D and F (Fu and Li 1993), and Fay and Wu’s H (Fay and Wu 2000); results are summarized in table 4. Neutrality is rejected for both the entire region and the 8.6-kb LD block by all tests with use of the whole data set. In individual populations, neutrality is similarly rejected by all tests for the CHB, but only by Tajima’s D and Fay and Wu’s H for the YRI and by Tajima’s D for the CEU. These results can readily be understood in terms of a selective sweep that has proceeded to different stages in the different populations (see the “Discussion” section).

A second class of neutrality test examines haplotypes rather than single variable positions. A total of 36 haplotypes were identified (fig. 3), but one haplotype carrying the stop codon occurred 99 times and accounted for 64% of the sample (and 76% of non-African chromosomes). Thirty-six individuals (47%) were homozygous for this haplotype, so its high frequency cannot be an artifact of the haplotype-inference procedure. Fu’s F_s test (Fu 1997), performed on the entire region, shows that significantly fewer haplotypes are found in the whole sample and in CEU than expected under neutrality (table 4). In the 8.6-kb block, fewer haplotypes than expected are found in these populations also. We also used coalescent simulations (Hudson 2002) to evaluate how often a single haplotype would be expected to occur in ⩾99 of 154 chromosomes under neutrality and how often a single haplotype would be expected at the observed frequencies in the individual populations. Except among the CEU, the observed frequencies were highly significant (table 4; last column, headed “Common Haplotype Frequency”).

Inferred caspase-12 haplotypes. Only positions that are variable in humans are shown, coded according to whether they carry the same allele as chimpanzee (white) or not (yellow), except that the stop-codon polymorphism is shown in blue or red, respectively. The low diversity and high frequency of derived alleles can be seen in the inactive genes.

Therefore, according to all the tests used, sequence variation in the caspase-12 gene is significantly different from that expected under neutrality, and the properties of the LD block resemble those of the complete region. Departures from neutral expectation at a single locus can arise in many ways, including stochastic variation and demographic change, but, as discussed below, the simplest explanation for all these deviations is positive selection.

Haplotype Structure and Phylogeny

A median-joining network was constructed to show the relationships between the inferred haplotypes of the 8.6-kb LD block (fig. 4). The network had a simple structure, with little evidence of recombination or recurrent mutation, as expected from the way the region had been chosen. The eight haplotypes carrying active genes are all different from one another and from the inactive genes. All of the inactive haplotypes clustered together, with 99 chromosomes at the center of the cluster, 29 one step away, 6 two steps away, and a few more distant. Outside Africa, the most distant inactive haplotype lay only three steps from the center, whereas there was more diversity among the inactive haplotypes in Africa, and not all radiated directly from the central haplotype.

Median-joining network of inferred caspase-12 haplotypes from the 8.6-kb LD block. Roots 1, 2, and 3 are discussed in the text. Circle area is proportional to haplotype frequency, and circles are coded according to population.

EHH (Sabeti et al. 2002) is a feature of regions that have recently experienced positive selection. We have therefore explored the extended haplotype structure surrounding the caspase-12 gene. Fortunately, the stop SNP was included in the HapMap set (International HapMap Consortium 2003); therefore, we could perform this analysis entirely in silico. We first selected cores containing this SNP and tested regions of 10–100 kb on either side, but we found that neither EHH nor REHH was significantly different from the genome average. We then measured the genetic distance over which EHH remains above a threshold value (0.5 or 0.2) and compared this with the corresponding distances for all alleles on chromosome 11. These distances were 0.013 cM and 0.079 cM and fell in the 58th and 41st percentile, respectively, so, again, were not unusual (see fig. 5). A related analysis by Pritchard and coworkers revealed a similarly nonsignificant value of the measure iHH (integrated EHH) (B. Voight, S. Kudaravalli, and J. Pritchard; personal communication). One explanation could be that sufficient time has elapsed for the long-range structure of the selected haplotype to decay; therefore, we wished to understand the timing of selection more fully.
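For readers unfamiliar with the statistic, EHH at a given distance from a core is the probability that two randomly chosen chromosomes carrying the same core haplotype are identical over the whole interval from the core to that distance. The sketch below is a minimal illustration of that definition, not the Sweep implementation used in the study; the function names and the toy haplotypes are hypothetical, and phased haplotypes without missing data are assumed.

```python
from collections import Counter
from math import comb


def ehh(core_carriers, start, end):
    """EHH over marker columns [start, end) among chromosomes sharing one core.

    core_carriers: equal-length strings, all carrying the same core haplotype.
    """
    n = len(core_carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    # Group chromosomes that are identical across the whole interval.
    classes = Counter(h[start:end] for h in core_carriers)
    # Fraction of chromosome pairs that fall in the same class.
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)


def ehh_decay(core_carriers, core_index):
    """EHH for intervals of increasing length extending rightward from the core."""
    length = len(core_carriers[0])
    return [ehh(core_carriers, core_index, stop)
            for stop in range(core_index + 1, length + 1)]


# Hypothetical toy data: four chromosomes carrying the same core allele at index 0.
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
print(ehh_decay(toy, 0))
```

In a real analysis the marker index would be converted to genetic distance, and the distance at which the decay curve first drops below a threshold such as 0.5 or 0.2 would be compared with the corresponding distances for other alleles, as described above.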

*Top,* EHH as a function of genetic distance in the CEU. *Bottom,* Corresponding haplotype bifurcation diagram.

Age of the Mutation: Timing and Strength of Selection

The frequency of an allele provides one guide to its age: it begins as a single copy, and the time required to rise to an observed frequency under neutrality or different selective regimes can be estimated (Griffiths 2003). According to this model, the stop codon would require almost 1 million years to reach 96% under neutrality, but this time would be greatly reduced by positive selection—for example, to 27 KY if it conferred a selective advantage of 1% (table 5). However, unless the selection coefficient can be estimated from other sources, this method does not provide an absolute age. We next estimated the TMRCA of the inactive alleles, using a phylogeny-based method (Bandelt et al. 1999), by means of the measure ρ, the average number of mutations from the root. This requires that a root be specified, and three different roots were investigated. Through use of root 1 (fig. 4), a mean (±SD) of 552±276 KYA was obtained for the entire set of inactive haplotypes. The sensitivity of this estimate to the specification of the root is illustrated by the use of root 2, one step away, which led to a time of 397±223 KYA. A TMRCA for the star-shaped cluster, through use of root 3 and without the haplotypes that lie between this root and the active genes, gave 61±16 KYA. The first two times provide information about when the inactivation mutation occurred, whereas the third provides information about when a subset of inactive chromosomes started to expand, so they are expected to differ.
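The order of magnitude of the neutral estimate can be checked with the classical diffusion approximation for the expected age of a neutral allele at frequency p, E[age] = −4N_e p ln(p)/(1 − p) generations. This is a simpler stand-in and not the Griffiths (2003) calculation itself; the effective population size of 10,000 and the generation time of 25 years are the values assumed elsewhere in this article, and the function name is hypothetical.

```python
import math


def expected_neutral_age_generations(p, n_e):
    """Expected age, in generations, of a neutral allele now at frequency p.

    Classical diffusion approximation for a derived allele that started
    as a single copy in a population of constant effective size n_e.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -4.0 * n_e * p * math.log(p) / (1.0 - p)


N_E = 10_000                # assumed effective population size
YEARS_PER_GENERATION = 25   # assumed generation time

age_generations = expected_neutral_age_generations(0.96, N_E)
age_ky = age_generations * YEARS_PER_GENERATION / 1000.0
print(f"{age_ky:.0f} KY")
```

Under these assumptions the approximation gives a value close to the 980 KY listed for neutrality in table 5.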

Table 5.

Estimates of the Age of the Mutation That Inactivated Caspase-12 or the TMRCA of Subsets of Inactive Alleles

Basis, Reference, and Conditions or Comments	KYA
Frequency (Griffiths 2003):
Neutrality assumed	980
1% selective advantage assumed	27
5% selective advantage assumed	4.8
Phylogeny (Bandelt et al. 1999):
Root 1 used (see fig. 4)	552±276
Root 2 used (see fig. 4)	397±223
Root 3 used (see fig. 4)	61±16
Composite likelihood (Kim and Stephan 2002):
1.7% selective advantage estimated	19
Full likelihood (Coop and Griffiths 2004):
.8% selective advantage estimated	29


We then applied methods aimed at inferring the time of selection from the estimated selective advantage conferred by the stop-codon mutation. First, we attempted to estimate the strength of selection (4N_es) by using parametric models that predict the spatial pattern of nucleotide diversity and allele-frequency spectrum around the putative target of selection (composite likelihood analyses) (Kim and Stephan 2002; Meiklejohn et al. 2004). We found that the data did not provide significant support for an incomplete sweep compared with a complete one: log(L[incomplete sweep]) − log(L[complete sweep]) = 2.38 (under the assumption that θ [scaled mutation rate per site] = 0.002 = observed level of mean diversity in active genes; this is likely to be the before-sweep level of variation in the entire region). This likelihood ratio is not large enough to reject a complete sweep, when assessed using data sets simulated under a complete-sweep model (Meiklejohn et al. 2004). However, this test of incomplete versus complete sweep has rather low power and assumes sampling from a single, randomly mating population, which is clearly violated by our data. Under the assumption that an incomplete sweep of the stop-codon mutation indeed shapes the haplotype structure of the data, the strength of selection (4N_es) acting on the stop-codon mutation might be obtained by treating the 147 inactive haplotypes as if they represent a sample from a population in which a complete sweep had occurred (Meiklejohn et al. 2004). The model of a complete sweep (Kim and Stephan 2002) applied to the 147 sequences yields the estimate of 4N_es=677. If N_e=10,000, this corresponds to a selective advantage of ∼1.7%. This suggests a time for the mutation of ∼19 KYA. We also performed a full likelihood analysis of the data (Coop and Griffiths 2004), which required a data set free of recombination; we therefore restricted this analysis to a region of ∼2 kb around the stop-codon polymorphism (fig. 3). 
The likelihood surface for the selection parameter 4N_es peaked at ∼315 (fig. 6), a selective advantage of ∼0.8%. With use of this estimate of 4N_es, the time of the mutation was estimated from the ∼2-kb region, by use of the method of Coop and Griffiths (2004), to be 0.058 in units of 2N_e generations, or ∼29 KYA.
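The unit conversions in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes N_e = 10,000, as stated above, and a generation time of 25 years, which is the value used for the bottleneck simulations in the Discussion; the function names are hypothetical.

```python
N_E = 10_000                # effective population size assumed in the text
YEARS_PER_GENERATION = 25   # generation time assumed in the Discussion


def selection_coefficient(four_ne_s, n_e=N_E):
    """Convert the scaled selection parameter 4*N_e*s into s."""
    return four_ne_s / (4 * n_e)


def scaled_time_to_years(t_scaled, n_e=N_E, years_per_generation=YEARS_PER_GENERATION):
    """Convert a time in units of 2*N_e generations into years."""
    return t_scaled * 2 * n_e * years_per_generation


s_composite = selection_coefficient(677)   # composite likelihood estimate
s_full = selection_coefficient(315)        # full likelihood estimate
age_years = scaled_time_to_years(0.058)    # full likelihood age of the mutation

print(f"s (composite likelihood) = {s_composite:.1%}")
print(f"s (full likelihood)      = {s_full:.1%}")
print(f"age (full likelihood)    = {age_years / 1000:.0f} KY")
```

These values correspond to the ∼1.7% and ∼0.8% selective advantages and the ∼29 KYA estimate quoted above. The ∼19 KYA composite likelihood date depends on the sweep model itself and is not reproduced by this arithmetic.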

Likelihood surface for the selection parameter 4N_es

Finally, the geographical distribution of the alleles, combined with our understanding of modern human spread, provides indirect information about their age. Only one inactive haplotype appears to have left Africa, so this approach suggests that selection is likely to predate the exodus ∼50–60 KYA.

Discussion

The inactive form of the caspase-12 gene has spread recently throughout most of the human population. We discuss here the evidence that this occurred as a consequence of positive selection rather than of drift, the likely time scale of events, and the significance of the inactivation of this gene for human evolution.

Positive Selection for Loss of Caspase-12

Positive selection leads to the rapid increase of a particular allele and its surrounding sequences. The available tests for neutrality/selection capture different consequences of the process, and the population samples from Africa, China, and Europe illustrate different stages of the selective sweep, including complete fixation in the CEU sample. We would therefore expect the results of the tests to differ between the populations. Diversity becomes substantially reduced only as a sweep nears completion. The value for the caspase-12 gene worldwide was 4.5×10^-4 (table 4), not very different from the genomewide and chromosome 11 averages of 7.5×10^-4 and 8.4×10^-4, respectively (Sachidanandam et al. 2001), but the value in the CHB sample was reduced to 1.9×10^-4 (SD=0.9×10^-4), and that in the CEU was even lower, 0.5×10^-4 (SD=0.1×10^-4), both significantly lower than the YRI value (9.1×10^-4; SD=1.7×10^-4).

Similarly, allele-frequency spectra become greatly skewed only as a sweep nears completion. They therefore show highly significant departures from neutral expectation in the worldwide data set and in the CHB sample, where there was 1 active gene and 51 inactive genes, but not in the YRI, where there were more active genes and greater diversity among the inactive ones. However, on fixation, the variants that previously contributed the low-frequency SNPs, singletons, and high-frequency–derived SNPs to the tests are no longer variable in the population; thus, these tests show slightly significant or nonsignificant results, as for the CEU. The significance of the values obtained was tested against a model of neutral evolution, but departures from neutrality can arise from causes other than selection, such as changes in population size. We have used simulations of a population bottleneck to explore the effect of one set of nonneutral demographies on some of these statistics, assuming a population size of 10,000 before 2,000 generations ago (∼50 KYA at 25 years per generation, approximating the out-of-Africa migration), an instantaneous drop to a reduced size that remained until 1,000 generations ago (∼25 KYA, corresponding to a commonly estimated start of growth outside Africa) and then expanded exponentially back to 10,000. The reduced size ranged from 100 to 1,000 in different runs. We found (table 6) that the value of Tajima’s D in the CEU was not unusual under an extreme bottleneck but that the values observed in the CHB were never reached, even under the most extreme reduction in population size. It therefore seems that the summary statistic test results are not readily explained by a population bottleneck. 
Another way to assess the significance of the caspase-12 statistics is to compare them with empirical data, although even these comparisons must be interpreted with caution because they are based on different sample sets, which may have experienced different demographic histories. The caspase-12 values from the worldwide or CHB samples lie outside the 95% empirical range of 132 genes examined in two populations (Akey et al. 2004) and are more negative than almost all those published as examples of positive selection (table 7). Only TRPV6 in Europeans, which was detected as the most extreme outlier from 264 analyses but for which the selective agent remains unknown (Akey et al. 2004), shows lower values.
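The bottleneck demography used in these simulations can be written as a piecewise function of time. The sketch below is only an illustration of the model as described in the text, with time measured in generations before the present; the function name is hypothetical, and the exponential growth rate is derived from the requirement that the population return to 10,000 at the present.

```python
import math

ANCESTRAL_SIZE = 10_000
BOTTLENECK_START = 2_000   # generations ago
BOTTLENECK_END = 1_000     # generations ago


def population_size(generations_ago, reduced_size):
    """Population size under the bottleneck model at a given time in the past."""
    if generations_ago > BOTTLENECK_START:
        # Before the bottleneck: constant ancestral size.
        return float(ANCESTRAL_SIZE)
    if generations_ago >= BOTTLENECK_END:
        # During the bottleneck: constant reduced size.
        return float(reduced_size)
    # After the bottleneck: exponential growth back to the ancestral size.
    rate = math.log(ANCESTRAL_SIZE / reduced_size) / BOTTLENECK_END
    return reduced_size * math.exp(rate * (BOTTLENECK_END - generations_ago))


for reduced in (1_000, 500, 200, 100):
    sizes = [round(population_size(t, reduced)) for t in (2_500, 1_500, 500, 0)]
    print(reduced, sizes)
```

The four reduced sizes correspond to the four demographic models listed in table 6.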

Table 6.

Summary Statistic Tests with Use of a Bottleneck Model in the CEU and CHB Populations (13.3-kb Region)

Demographic model for every column: population size 10,000 before 2,000 generations ago, decreases to the reduced size shown at 2,000 generations ago and remains at this size until 1,000 generations ago, then increases exponentially to 10,000.

Statistic	Population	Reduced Size 1,000	Reduced Size 500	Reduced Size 200	Reduced Size 100
Nucleotide diversity (π)×10,000, mean ± SD	CEU	6.94±4.04	6.45±3.99	5.94±4.05	5.27±3.97
Nucleotide diversity (π)×10,000, mean ± SD	CHB	6.99±4.13	6.47±4.04	5.84±4.00	5.27±3.93
Nucleotide diversity (π)×10,000, 95% cutoff	CEU	2.30	1.89	1.33	.96
Nucleotide diversity (π)×10,000, 95% cutoff	CHB	2.21	1.83	1.32	.88
Polymorphic sites, mean ± SD	CEU	27±11	26±11	22±11	20±10
Polymorphic sites, mean ± SD	CHB	28±12	26±12	23±11	
Polymorphic sites, 95% cutoff	CEU				
Polymorphic sites, 95% cutoff	CHB				
Tajima’s D, mean ± SD	CEU	.26±1.01	.35±1.08	.41±1.17	.38±1.26
Tajima’s D, mean ± SD	CHB	.26±1.02	.34±1.08	.40±1.18	.36±1.27
Tajima’s D, 95% cutoff	CEU	−1.32	−1.37	−1.50	−1.63
Tajima’s D, 95% cutoff	CHB	−1.33	−1.39	−1.52	−1.67
Fay and Wu’s H, mean ± SD	CEU	−.71±5.05	−.92±4.99	−1.42±5.13	−1.63±4.77
Fay and Wu’s H, mean ± SD	CHB	−.79±5.18	−1.03±5.03	−1.49±5.11	−1.71±5.03
Fay and Wu’s H, 95% cutoff	CEU	−10.63	−10.63	−10.69	−10.89
Fay and Wu’s H, 95% cutoff	CHB	−10.95	−10.52	−11.67	−11.70
Common haplotype frequency (8.6-kb region)	CEU^a	.057	.061	.074	.091
Common haplotype frequency (8.6-kb region)	CHB^b	.002	.003	.006	.012

^a Probability of observing ⩾43 copies of the most common haplotype.

^b Probability of observing ⩾35 copies of the most common haplotype.

Table 7.

Published Summary Statistics for Human Genes

Genes	Population	Tajima’s D	Fu and Li’s D	Fay and Wu’s H (P)^a	Reference
Control genes:
132 genes (95% range)	African Americans and European Americans	−1.66 to 1.56		−26.9 to 5.5 (.006–.940)	Akey et al. 2004
Genes showing positive selection:
TRPV6^b	European Americans	−2.74		−45.4 (.0001)	Akey et al. 2004
FOXP2	World	−2.20		−12.24 (<.05)	Enard et al. 2002
G6PD	World	−1.43	−1.13	NS^c	Saunders et al. 2002
Duffy (FY)	Mandinka^b	−1.40	−1.81		Hamblin and Di Rienzo 2000
TAS2R16	Brahui^b	−1.69	−.49	−5.4 (.002)	Soranzo et al. 2005
MATP (AIM1)	Europeans^b	−2.23	−2.90	−8.0 (<.025)	Soejima et al. 2006
CYP3A4	Europeans^b	−1.76		(.006)^c	Thompson et al. 2004

Note: NS = not significant.

^a These values should not be compared directly.

^b Gene or population showing the lowest values in studies involving many genes or populations.

^c Numerical values were not given.

The unusually high frequency of a single haplotype, 21 (40%) of 52 chromosomes even in the YRI sample and higher in the other populations, provided a robust signal of departure from neutrality. It has been shown elsewhere that a single haplotype from a 62-kb region carrying 166 SNPs is unlikely to reach a frequency of even 21% under a wide range of demographic models (Mekel-Bobrov et al. 2005), so such a signal is also robust to the demographic specification. We found no evidence of an unusually extended haplotype associated with the caspase-12 gene. This can be explained by two factors. The first is the near fixation of the inactive gene, which drives other haplotypes down to low frequencies and consequently leads to low power to detect differences between haplotypes. The second is the time since the sweep began: the most significant EHH/REHH values have been reported for sweeps beginning <10 KYA (Sabeti et al. 2002; Bersaglieri et al. 2004). In conclusion, no plausible combination of demographic and stochastic factors can account for sequence variation surrounding the caspase-12 gene, but it shows exactly the signatures expected for a selective sweep that began early enough to have reached fixation in some populations but not in others. Indeed, it shows the clearest evidence of any locus documented thus far for a worldwide selective sweep in humans.

Target, Timing, and Strength of Selection

The rapid decay of LD within and surrounding the caspase-12 gene (fig. 2 and results not shown) indicates that selection is likely to be acting on the central region of the gene itself rather than on another gene in LD. Since the stop-codon polymorphism affects the phenotype and is the only variant in this region that is known to do so, we conclude that it is very likely to be the target of the selection.

Estimates of the age of the mutation or timing of selection depend on the method used, and all have wide CIs; nevertheless, all suggest that selection began in the Paleolithic period, a conclusion that is also consistent with the lack of EHH/REHH signal. The most recent—∼19 KYA—is likely to be an underestimate, since it assumed that the inactive genes represented a complete sweep, whereas the sweep is evidently incomplete, and additional time is required for fixation. Furthermore, several of the methods required assumptions about demography (a panmictic population of a constant size of 10,000) that are commonly made but are obviously oversimplifications. Interactions with other advantageous genes—a kind of assortative mating for “survivorship”—could lead to additional departures from these simple models. The date based on geography—before 50–60 KYA—thus seems to provide the firmest lower date for the time of origin of the mutation, but the upper limit remains poorly defined. Despite the considerable uncertainty about the strength and timing of selection, a selective advantage of ∼0.5%–1% beginning 60–100 KYA would explain most of our observations.

Selective Pressure

“Sepsis is the most common cause of death in infants and children in the world,” according to a recent review (Watson and Carcillo 2005, p. S3); deaths ascribed to the four major killers pneumonia, diarrhea, malaria, and measles often occur via a common pathway leading to fatal sepsis. Its incidence is likely to have been even higher before the availability of modern sanitation and medicines, and its action early in life would have made it a potent selective force. In modern hospitals, individuals with two copies of the inactive caspase-12 gene are both ∼7.8-fold more likely to escape severe sepsis and more likely to survive if they do develop it, whereas heterozygotes show an intermediate level of protection (Saleh et al. 2004). We therefore suggest that the avoidance and survival of severe sepsis was the selective force that led to the spread of the inactive form of the caspase-12 gene.

This hypothesis leads to the question of why, if the inactive caspase-12 gene is so advantageous, it has not been fixed in humans and, indeed, in other species. Many infectious diseases require large host population sizes to maintain themselves and thus would have been rare or absent in archaic humans, when population sizes were small (Dobson 1992). Consequently, in small populations, there would have been no advantage associated with the inactive gene, and the evolutionary conservation of the gene (illustrated by a low human/chimpanzee K_a/K_s ratio) suggests that there may even have been a disadvantage, although the nature of this remains to be identified. Thus, selection for the inactive gene would have occurred only when the human population size became large.

When did the population start to grow? The Neolithic transition beginning ∼10 KYA was associated with population growth and close contact with domestic animals, both of which would have increased the number of infections, but genetic studies suggest that the population started to grow long before the Neolithic period (Wall and Przeworski 2000). For example, one analysis suggested a start of expansion in sub-Saharan Africa 49–640 KYA (Reich and Goldstein 1998). According to our model, there would therefore have been an intermediate stage in which the active/inactive status of the gene was neutral or fluctuated between somewhat advantageous and disadvantageous in time or space. This could account for the accumulation of relatively diverse inactive haplotypes in Africa before the enormous expansion of a single inactive haplotype (fig. 4). But why did only a single haplotype expand? We cannot find any plausible biological difference between the most frequent haplotype and the more ancestral inactive ones (the SNPs that distinguish them lie in introns), so we suggest that it could reflect either drift or some other advantage arising in a single population; if the latter, further studies of the caspase-12 gene may help to pinpoint the population and possibly the time in which this hypothetical key advance arose. More generally, selection on the caspase-12 gene appears to have started during a key period in human evolution, when modern behavior was developing. It therefore provides an example of the signature of selection that we may expect from this time period when unknown genes that may have contributed to modern human behavior may have experienced selection, although the pattern at any particular gene will depend on many factors, including stochastic variation, local mutation and recombination rate, and the strength of selection.

The “less is more” hypothesis of the importance of gene loss for human evolution (Olson 1999) remains to be systematically evaluated, but caspase-12 provides one striking example of the advantage that gene inactivation can confer and its role in human evolution.

Supplementary Material

Table.txt: AJHGv78p659suptable3.txt (42.8 KB, txt)

Acknowledgments

We thank Joe Greenhill and Jonathan Bailey, for their contributions to generating sequence data; Chris Gillson, for assistance; Kate Rice and Bob Griffiths, for useful discussions; Peter Donnelly, for advice; Benjamin Voight, Sridhar Kudaravalli, and Jonathan Pritchard, for permission to refer to their unpublished work; and two referees, for suggesting improvements to the manuscript. We particularly thank Molly Przeworski for her advice and suggestions during the course of this study and for comments on the manuscript. This work was supported by The Wellcome Trust.

Web Resources

Accession numbers and URLs for data presented herein are as follows:

AceView, http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=35g&c=Gene&l=CASP12P1
DnaSP, http://www.ub.es/dnasp/
Fluxus, http://www.fluxus-technology.com/ (for Network4.1.0.9)
FSTAT, http://www2.unil.ch/popgen/softwares/fstat.htm
GenBank, http://www.ncbi.nlm.nih.gov/Genbank/ (for the CASP12 genomic DNA sequence [accession number NC_000011] and the chimpanzee CASP12 sequence [accession number NW_113990])
Haploview, http://www.broad.mit.edu/mpg/haploview/
ms, http://home.uchicago.edu/~rhudson1/source/mksamples.html
Stephens Web site, http://www.stat.washington.edu/stephens/software.html (for PHASE)
Sweep, http://www.broad.mit.edu/mpg/sweep/download.html (for version 1.0)

References

Akey JM, Eberle MA, Rieder MJ, Carlson CS, Shriver MD, Nickerson DA, Kruglyak L (2004) Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol 2:e286 doi:10.1371/journal.pbio.0020286
Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16:37–48
Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–265 doi:10.1093/bioinformatics/bth457
Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN (2004) Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74:1111–1120
Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, et al (2002) A human genome diversity cell line panel. Science 296:261–262 doi:10.1126/science.296.5566.261b
Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87 doi:10.1038/nature04072
Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, Todd MA, Tanenbaum DM, Civello D, Lu F, Murphy B, Ferriera S, Wang G, Zheng X, White TJ, Sninsky JJ, Adams MD, Cargill M (2003) Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960–1963 doi:10.1126/science.1088821
Coop G, Griffiths RC (2004) Ancestral inference on gene trees under selection. Theor Popul Biol 66:219–232 doi:10.1016/j.tpb.2004.06.006
Dean M, Carrington M, Winkler C, Huttley GA, Smith MW, Allikmets R, Goedert JJ, Buchbinder SP, Vittinghoff E, Gomperts E, Donfield S, Vlahov D, Kaslow R, Saah A, Rinaldo C, Detels R, O’Brien SJ (1996) Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Science 273:1856–1862
Dobson A (1992) People and disease. In: Jones S, Martin R, Pilbeam D (eds) The Cambridge encyclopedia of human evolution. Cambridge University Press, Cambridge, United Kingdom, pp 411–420
Enard W, Przeworski M, Fisher SE, Lai CS, Wiebe V, Kitano T, Monaco AP, Pääbo S (2002) Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:869–872 doi:10.1038/nature01025
Evans PD, Anderson JR, Vallender EJ, Gilbert SL, Malcom CM, Dorus S, Lahn BT (2004) Adaptive evolution of ASPM, a major determinant of cerebral cortical size in humans. Hum Mol Genet 13:489–494 doi:10.1093/hmg/ddh055
Fay JC, Wu CI (2000) Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413
Fischer H, Koenig U, Eckhart L, Tschachler E (2002) Human caspase 12 has acquired deleterious mutations. Biochem Biophys Res Commun 293:722–726 doi:10.1016/S0006-291X(02)00289-9
Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, Meltesen L, Brenton M, Hink R, Burgers S, Hernandez-Boussard T, Karimpour-Fard A, Glueck D, McGavran L, Berry R, Pollack J, Sikela JM (2004) Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol 2:937–954
Fu Y-X (1997) Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147:915–925
Fu Y-X, Li W-H (1993) Statistical tests of neutrality of mutations. Genetics 133:693–709
Goudet J (1995) FSTAT (vers 1.2): a computer program to calculate F-statistics. J Hered 86:485–486
Griffiths RC (2003) The frequency spectrum of a mutation, and its age, in a general diffusion model. Theor Popul Biol 64:241–251 doi:10.1016/S0040-5809(03)00075-3
Hamblin MT, Di Rienzo A (2000) Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am J Hum Genet 66:1669–1679
Henshilwood CS, d’Errico F, Yates R, Jacobs Z, Tribolo C, Duller GA, Mercier N, Sealy JC, Valladas H, Watts I, Wintle AG (2002) Emergence of modern human behavior: Middle Stone Age engravings from South Africa. Science 295:1278–1280 doi:10.1126/science.1067575
Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR (2005) Whole-genome patterns of common DNA variation in three human populations. Science 307:1072–1079 doi:10.1126/science.1105436
Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338 doi:10.1093/bioinformatics/18.2.337
Hudson RR, Kaplan NL (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147–164
International HapMap Consortium (2003) The International HapMap Project. Nature 426:789–796 doi:10.1038/nature02168
International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437:1299–1320 doi:10.1038/nature04226
Jobling MA, Hurles ME, Tyler-Smith C (2004) Human evolutionary genetics. Garland Science, New York and Abingdon
Kayser M, Brauer S, Stoneking M (2003) A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol Biol Evol 20:893–900 doi:10.1093/molbev/msg092
Kim Y, Stephan W (2002) Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765–777
King MC, Wilson AC (1975) Evolution at two levels in humans and chimpanzees. Science 188:107–116
McDougall I, Brown FH, Fleagle JG (2005) Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 433:733–736 doi:10.1038/nature03258
Meiklejohn CD, Kim Y, Hartl DL, Parsch J (2004) Identification of a locus under complex positive selection in Drosophila simulans by haplotype mapping and composite-likelihood estimation. Genetics 168:265–279 doi:10.1534/genetics.103.025494
Mekel-Bobrov N, Gilbert SL, Evans PD, Vallender EJ, Anderson JR, Hudson RR, Tishkoff SA, Lahn BT (2005) Ongoing adaptive evolution of ASPM, a brain size determinant in Homo sapiens. Science 309:1720–1722 doi:10.1126/science.1116815
Olson MV (1999) When less is more: gene loss as an engine of evolutionary change. Am J Hum Genet 64:18–23
Preuss TM, Caceres M, Oldham MC, Geschwind DH (2004) Human brain evolution: insights from microarrays. Nat Rev Genet 5:850–860 doi:10.1038/nrg1469
Reich DE, Goldstein DB (1998) Genetic evidence for a Paleolithic human population expansion in Africa. Proc Natl Acad Sci USA 95:8119–8123 doi:10.1073/pnas.95.14.8119
Ronald J, Akey JM (2005) Genome-wide scans for loci under selection in humans. Hum Genomics 2:113–125
Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497 doi:10.1093/bioinformatics/btg359
Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832–837 doi:10.1038/nature01140
Sabeti PC, Walsh E, Schaffner SF, Varilly P, Fry B, Hutcheson HB, Cullen M, Mikkelsen TS, Roy J, Patterson N, Cooper R, Reich D, Altshuler D, O’Brien S, Lander ES (2005) The case for selection at CCR5-Δ32. PLoS Biol 3:e378 doi:10.1371/journal.pbio.0030378
Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, et al (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928–933 doi:10.1038/35057149
Saleh M, Vaillancourt JP, Graham RK, Huyck M, Srinivasula SM, Alnemri ES, Steinberg MH, Nolan V, Baldwin CT, Hotchkiss RS, Buchman TG, Zehnbauer BA, Hayden MR, Farrer LA, Roy S, Nicholson DW (2004) Differential modulation of endotoxin responsiveness by human caspase-12 polymorphisms. Nature 429:75–79 doi:10.1038/nature02451
Saunders MA, Hammer MF, Nachman MW (2002) Nucleotide variability at G6pd and the signature of malarial selection in humans. Genetics 162:1849–1861
Soejima M, Tachida H, Ishida T, Sano A, Koda Y (2006) Evidence for recent positive selection at the human AIM1 locus in a European population. Mol Biol Evol 23:179–188 doi:10.1093/molbev/msj018
Soranzo N, Bufe B, Sabeti PC, Wilson JF, Weale ME, Marguerie R, Meyerhof W, Goldstein DB (2005) Positive selection on a high-sensitivity allele of the human bitter-taste receptor TAS2R16. Curr Biol 15:1257–1265 doi:10.1016/j.cub.2005.06.042
Stedman HH, Kozyak BW, Nelson A, Thesier DM, Su LT, Low DW, Bridges CR, Shrager JB, Minugh-Purvis N, Mitchell MA (2004) Myosin gene mutation correlates with anatomical changes in the human lineage. Nature 428:415–418 doi:10.1038/nature02358
Stephens M, Donnelly P (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73:1162–1169
Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978–989
Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595
Thompson EE, Kuttab-Boulos H, Witonsky D, Yang L, Roe BA, Di Rienzo A (2004) CYP3A variation and the evolution of salt-sensitivity variants. Am J Hum Genet 75:1059–1069
Vallender EJ, Lahn BT (2004) Positive selection on the human genome. Hum Mol Genet Spec No 2 13:R245–R254 doi:10.1093/hmg/ddh253
Wall JD, Przeworski M (2000) When did the human population size start increasing? Genetics 155:1865–1874
Watson RS, Carcillo JA (2005) Scope and epidemiology of pediatric sepsis. Pediatr Crit Care Med 6:S3–S5 doi:10.1097/01.PCC.0000161289.22464.C3
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370 [DOI] [PubMed] [Google Scholar]
Zhang J (2003) Evolution of the human ASPM gene, a major determinant of brain size. Genetics 165:2063–2070 [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang J, Webb DM, Podlaha O (2002) Accelerated protein evolution and origins of human-specific features: FOXP2 as an example. Genetics 162:1825–1835 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Table.txt

AJHGv78p659suptable3.txt (42.8 KB, txt)

[RF1] AceView, http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=35g&c=Gene&l=CASP12P1

[RF2] DnaSP, http://www.ub.es/dnasp/

[RF3] Fluxus, http://www.fluxus-technology.com/ (for Network4.1.0.9)

[RF4] FSTAT, http://www2.unil.ch/popgen/softwares/fstat.htm

[RF5] GenBank, http://www.ncbi.nlm.nih.gov/Genbank/ (for the CASP12 genomic DNA sequence [accession number NC_000011] and the chimpanzee CASP12 sequence [accession number NW_113990])

[RF6] Haploview, http://www.broad.mit.edu/mpg/haploview/

[RF7] ms, http://home.uchicago.edu/~rhudson1/source/mksamples.html

[RF8] Stephens Web site, http://www.stat.washington.edu/stephens/software.html (for PHASE)

[RF9] Sweep, http://www.broad.mit.edu/mpg/sweep/download.html (for version 1.0)
