Genetics. 2012 May;191(1):215–232. doi: 10.1534/genetics.112.139576

Evaluating the Evidence for Transmission Distortion in Human Pedigrees

Wynn K Meyer, Barbara Arbeithuber, Carole Ober, Thomas Ebner, Irene Tiemann-Boege, Richard R Hudson, Molly Przeworski
PMCID: PMC3338262  PMID: 22377632

Abstract

Children of a heterozygous parent are expected to carry either allele with equal probability. Exceptions can occur, however, due to meiotic drive, competition among gametes, or viability selection, which we collectively term “transmission distortion” (TD). Although there are several well-characterized examples of these phenomena, their existence in humans remains unknown. We therefore performed a genome-wide scan for TD by applying the transmission disequilibrium test (TDT) to three large sets of human pedigrees of European descent: the Framingham Heart Study (FHS), a founder population of European origin (HUTT), and a subset of the Autism Genetic Resource Exchange (AGRE). Genotyping error is an important confounder in this type of analysis. In FHS and HUTT, despite extensive quality control, we did not find sufficient evidence to exclude genotyping error as the cause of the strongest signals. In AGRE, however, many signals extended across multiple SNPs, a pattern highly unlikely to arise from genotyping error. We identified several candidate regions in this data set, notably a locus in 10q26.13 displaying a genome-wide significant TDT in combined female and male transmissions and a signature of recent positive selection, as well as a paternal TD signal in 6p21.1, the same region in which a significant TD signal was previously observed in 30 European males. Neither region replicated in FHS, however, and the paternal signal was not visible in sperm competition assays or as allelic imbalance in sperm. In maternal transmissions, we detected no strong signals near centromeres or telomeres, the regions predicted to be most susceptible to female-specific meiotic drive, but we found a significant enrichment of top signals among genes involved in cell junctions. These results illustrate both the potential benefits and the challenges of using the TDT to study transmission distortion and provide candidates for investigation in future studies.


ACCORDING to Mendel’s law of segregation, diploid organisms that are heterozygous at a locus are equally likely to transmit either allele to their offspring. Yet cases occur in which one allele is observed among offspring at >50% frequency. This phenomenon of observed “transmission distortion” (TD), also known as transmission ratio distortion, can result from two distinct biological processes. The first, which we call “segregation distortion,” includes meiotic drive, in which the functional products of meiosis preferentially carry one allele, and competition among gametes. Meiotic drive is more likely to occur in asymmetric meioses, such as those in human female germ cells (Pardo-Manuel de Villena and Sapienza 2001; Malik 2009). Examples include the B chromosomes most commonly observed in insects and plants and the “knob” chromosomes of maize (Östergren 1945; Peacock et al. 1981; Jones and Rees 1982). Other segregation distorters, such as the t-haplotype in mice, instead confer an advantage in competition for fertilization between gametes carrying different alleles (Lyon 2003). The second process that could lead to observed TD is ongoing viability selection: if an allele confers a viability advantage to gametes or individuals, it will appear to be transmitted to >50% of the surviving offspring of heterozygous parents. With the exception of viability selection on diploids, these phenomena are more likely to produce TD in gametes of only one sex (see Lyttle 1993).

In several known cases of segregation distortion, the advantage to the distorter allele is strong, with as many as 99% of offspring inheriting this allele (Lyttle 1993). Such an allele is unlikely to be observed segregating within a population unless it is maintained by some countervailing force, because it would rapidly drive to fixation. Yet there are numerous examples of polymorphic drivers across multiple taxa. Their maintenance in the population can often be explained by reduced fertility or fitness of adults homozygous for the driver (see Hartl 1972 and Carvalho and Vaz 1999), as in the well-known segregation distorter (SD) system in Drosophila and t-haplotypes in mice. The SD system disrupts a signaling pathway involved in nuclear localization, preventing SD+ sperm (those that do not carry the distorter) from developing normally, so that nearly 100% of transmitted gametes carry SD (Kusano et al. 2003). Males homozygous for SD have severely reduced fertility (Hartl 1973, 1974), and it is presumably this deleterious effect, in combination with suppressors of distortion, that permits the observation of polymorphism at the SD locus in natural populations of Drosophila (Hartl 1975; Hiraizumi and Thomas 1984; Presgraves et al. 2009). In mice, interactions between t-haplotype distorters and responder loci reduce motility of non-t-haplotype-bearing sperm in heterozygotes, and males homozygous for the t-haplotype are sterile (Lyon 2003; Veron et al. 2009). In these cases, the distorter allele enhances its own transmission at the expense of the organism and can thus be seen as a selfish genetic element. Beyond these two cases, segregation distortion has been detected in a wide variety of organisms, including many species of insects, plants, fungi, and vertebrates, suggesting that deleterious effects of drivers may be common (Lyttle 1993; Pardo-Manuel de Villena and Sapienza 2001; de la Casa-Esperón and Sapienza 2003).

The prevalence of distorters in natural populations has important implications for genome evolution, as well as for speciation. In particular, asymmetric female meiosis provides the opportunity for meiotic drive loci to influence the outcome of oötid competition, i.e., competition among the four products of meiosis to be included in the oocyte pronucleus. An allele affecting the orientation of chromosomes toward the pronucleus could lead either to distortion or to nondisjunction; therefore, the frequent occurrence of such alleles could potentially explain the high rates of nondisjunction observed in female Drosophila and humans (Zwick et al. 1999; Hassold and Hunt 2001). This type of meiotic drive has also been proposed as a powerful force in the evolution of centromeres, given their central importance to chromosome positioning during meiosis. Specifically, the rapid evolution of repetitive DNA in centromeres is thought to be due to competition among centromeres to bind spindle elements, with longer repeats favored. This “centromeric drive” hypothesis predicts frequent segregation distortion at the centromere in females (Henikoff et al. 2001; Malik and Henikoff 2002). The telomere may also be involved in determining orientation toward the meiotic spindle and has therefore been proposed as another potential target of female-specific meiotic drive (Novitski 1951; Anderson et al. 2008; Axelsson et al. 2010).

The dynamics of distorter alleles may also influence local patterns of meiotic recombination. In several known cases, distortion results from an interaction wherein the “drive” allele at the distorter locus acts on a “sensitive” allele at a responder locus. This dynamic produces indirect selection on linked recombination rate modifiers, whereby linked mutations on the drive/insensitive background that decrease recombination between distorter and responder will be favored (Charlesworth and Hartl 1978). Conversely, at unlinked sites, modifiers that increase recombination will be beneficial because they uncouple the distorter and responder, thereby suppressing the costly drive (Thomson and Feldman 1974; Haig and Grafen 1991). There may also be selection on modifiers of recombination that influence the stage of meiosis at which distorters gain a transmission advantage (Haig 2010; Brandvain and Coop 2012). Moreover, because systems of distortion loci and their responders coevolve rapidly and can generate Dobzhansky–Muller incompatibilities, they may play an important role in the evolution of reproductive isolation (Frank 1991; Hurst and Pomiankowski 1991). On the X chromosome, segregation distortion loci can influence sex ratios and even lead to novel sex-determining mechanisms (Jarrell 1995; Gileva 1998; Hurst and Werren 2001). Thus, understanding the prevalence of TD is important for many aspects of evolutionary genetics.

Although there are numerous examples from other organisms, the extent and influence of TD in humans remain unknown. One study found a genome-wide excess of allele sharing among siblings, suggestive of TD, in a founder population of European origin (Zöllner et al. 2004), but another reported a deficit of allele sharing in Australian and Dutch dizygotic twins (Montgomery et al. 2006). A more direct way of assessing TD is to test the null hypothesis that each allele is transmitted from heterozygous parents with probability 50%. The transmission disequilibrium test (TDT), originally designed for family-based association tests using an affected-only design, can be used to test for TD in genotyping data from pedigrees (Spielman et al. 1993). One limitation of the TDT (and of tests for excess allele sharing) is that even relatively low levels of genotyping error can produce strong apparent TD. For example, mistyping of major allele homozygote parents as heterozygotes can lead to apparent overtransmission of the major allele (Mitchell et al. 2003), as can a large proportion of missed calls among heterozygotes (see Hirschhorn and Daly 2005, Box 4). Several authors have proposed modifications or alternatives to the TDT that are more robust to genotyping errors (Gordon et al. 2001, 2004; Cheng and Chen 2007), but they suffer from a number of limitations when applied genome-wide: for instance, they cannot be used for tests in only one sex, do not address the problem of differential fractions of missing data among genotype classes, and/or are not robust to population stratification (a benefit of the original TDT).
An additional challenge for genome-wide scans is that correction for multiple testing leads to stringent cutoffs for significance, such that extremely large sample sizes are required to detect moderate TD; for example, 2839 transmissions are required to achieve 50% power to detect distortion strength (deviation from 50% transmission) of 5% at α = 10⁻⁷ (Evans et al. 2006). Power to detect TD genome-wide is therefore greatest at loci with strong TD and high minor allele frequency (MAF), because, for a given sample size, these provide the most observable transmissions from heterozygotes. A strongly distorting locus experiences a trajectory similar to that of a beneficial allele, so to observe a TD locus with high MAF, distortion must be either extremely common or counterbalanced, as is often observed in other organisms.
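The sample sizes quoted above (and the 141-transmission cutoff used later for the Affymetrix/Illumina comparison in AGRE) can be approximated with a normal approximation to the binomial. The sketch below is our own illustration, not the calculation used by Evans et al. (2006) or by the authors; it gives about 2838 and 137, close to but not identical with the published figures, which may rest on exact or noncentral χ² calculations. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def transmissions_needed(strength: float, alpha: float, power: float) -> int:
    """Approximate number of transmissions from heterozygous parents needed for a
    two-sided TDT at level `alpha` to reach `power`, when the overtransmitted
    allele is passed on with probability 0.5 + strength.

    Normal approximation to the binomial; the lower rejection tail is ignored.
    """
    p = 0.5 + strength
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # equals 0 when power = 0.5
    # With n transmissions, b - c has mean n(2p - 1) and variance 4np(1 - p) under H1;
    # solving P((b - c) / sqrt(n) > z_alpha) = power for sqrt(n) gives:
    root_n = (z_alpha + 2 * z_beta * sqrt(p * (1 - p))) / (2 * p - 1)
    return ceil(root_n ** 2)


print(transmissions_needed(0.05, 1e-7, 0.5))    # compare with the 2839 from Evans et al. (2006)
print(transmissions_needed(0.1187, 0.05, 0.8))  # compare with the 141 used for the AGRE comparison
```

Because the required n scales roughly with 1/strength², halving the distortion strength roughly quadruples the number of transmissions needed, which is why only strong TD at common variants is realistically detectable genome-wide.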

To date, three studies have looked for TD in human pedigrees using the TDT. Santos et al. (2009) applied the TDT across chromosome 6p in fathers, mothers, and both sexes of 30 HapMap Yoruba in Ibadan, Nigeria (YRI) and 30 CEPH (Utah residents with ancestry from northern and western Europe) (CEU) trios (Frazer et al. 2007) and found one experiment-wide significant region in CEU males. This study reduced the impact of multiple testing correction by using tag SNPs and investigating a small region of the genome, selected in part because it is largely syntenic with mouse chromosome 17, where t-haplotypes lie, and contains the major histocompatibility complex (MHC) region. The power of the study was limited for all but very strong TD: even if 43 parents were heterozygous (the maximum number for which a SNP would not be filtered due to deviation from Hardy–Weinberg equilibrium [HWE]), distortion strength of 27.9% would be required to achieve 50% power for experiment-wide significance (P = 2 × 10⁻⁴). The one region that these authors identified as significant at this level showed 17 of 18 transmissions of the same allele. Given the small sample, the result could be due to chance fluctuations in male transmission rate; thus, replication is necessary for the finding to be well supported and, because of the winner’s curse (Bazerman and Samuelson 1983; Göring et al. 2001), to estimate its strength. In a second study, the TDT was applied genome-wide to the HapMap samples; the authors reported 200 candidate genes containing markers in the top 0.1% of signals in one or both parents (Deng et al. 2009). None of these top signals met genome-wide significance, which is unsurprising given the small sample size of this study. Finally, Paterson et al. (2009) conducted a genome-wide assessment of TD using parents of both sexes in the Framingham Heart Study (FHS), an outbred population of European descent. They attributed most strong signals to the confounding effects of genotyping error but reported eight cases in which genotypes appeared to have been called more reliably, one of which had P < 10⁻⁷.

As these studies demonstrate, determining the full extent of TD in the human genome is hampered by the pervasive effects of genotyping error and the large sample sizes needed to obtain power for all but very strong effects. Here we used a large set of genotyped families to address the following questions: (1) Are there any well-supported examples of strong TD in contemporary human populations, (2) are there any developmental or molecular processes that tend to be overrepresented in regions with signals of TD, and (3) is there evidence for TD near human female centromeres or telomeres, the locations proposed to be most susceptible to drive in asymmetric meioses? To this end, we applied the TDT genome-wide to three large, independent European cohorts with at least 800 parent–offspring pairs each, using multiple approaches to try to overcome the problems posed by genotyping error.

Materials and Methods

Genome-wide scan for TD

Samples:

We used three sets of pedigrees:

  • (1) The FHS is a longitudinal study of individuals of European ancestry from Framingham, Massachusetts (Dawber et al. 1951, 1963; Cupples et al. 2007). The study includes three generations of individuals, collected beginning in 1948.

  • (2) The Hutterites (HUTT) are a founder population of European ancestry. The HUTT samples included in this study were collected in South Dakota (Ober et al. 2001).

  • (3) The Autism Genetic Resource Exchange (AGRE) is a set of families in which more than one member has been diagnosed with an autism spectrum disorder (Geschwind et al. 2001). The AGRE families come from several self-reported race and ethnicity categories.

Quality controls (QC) on individuals:

For FHS and AGRE, we removed individuals with <90% call rate. No individuals had >5% SNPs with Mendelian errors; to enrich for high-quality samples in AGRE, we removed the 1% of individuals with the most Mendelian errors. We confirmed reported relationships with family members using identity-by-state (IBS); we used p(IBS1) > 0.75 for parent–offspring relationships to allow for variation around the expectation of p(IBS) = 1. We removed all individuals whose IBS information indicated that they were unrelated to the other individuals in their reported pedigree. We identified monozygotic twins and mislabeled duplicates using p(IBS2) > 0.9 for full siblings and kept only the individual with the highest call rate. We checked individuals’ sexes by confirming that they had the correct X chromosome homozygosity (F). The expectation for F is near 0 in females and 1 in males; we switched the sex labels for a parent pair whenever F was greater for the mother than the father (this occurred in three cases in AGRE only). In total, this resulted in the exclusion of 142 individuals in FHS and 90 in AGRE. All above steps were conducted using PLINK v. 1.07 (Purcell et al. 2007; http://pngu.mgh.harvard.edu/purcell/plink/). QC on individuals in AGRE was performed following principal component analysis (PCA) to define a European subset (described below). The HUTT data were preprocessed to remove any individuals with <95% call rate, >4% Mendelian error rate, sample misspecification, low concordance between Affymetrix platforms, or sex mismatch. We additionally removed one individual that IBS data suggested was a twin or sample duplicate.

The TDT is not sensitive to population stratification; however, heterogeneity in ancestry could dilute the signal of a geographically restricted segregation distorter or selected allele. We therefore attempted to construct a subset of individuals with fairly homogeneous ancestry, without drastically reducing the sample size. To this end, we performed PCA on HapMap CEU genotype data (Frazer et al. 2007) using Eigenstrat (Price et al. 2006) and projected the data from AGRE and FHS pedigree founders separately onto these PCs. We plotted PC1 against PC2 and defined the “CEU ellipse” as the ellipse centered on the mean of the HapMap CEU points, with axes extending to the maxima and minima of these points. We then removed FHS/AGRE individuals whose (PC1, PC2) points fell outside a concentric ellipse that was 500% the size of this CEU ellipse, with the same axis proportions.
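The ellipse filter can be sketched as follows. Two details are our assumptions rather than the authors' specification: each semi-axis of the CEU ellipse is taken as the largest deviation of the CEU points from their mean along that PC (so the axes reach the extreme points), and “500% the size” is read as scaling both semi-axes by 5. All function names are hypothetical.

```python
from statistics import fmean


def ceu_ellipse(ceu_points):
    """Axis-aligned ellipse centered on the mean of the CEU (PC1, PC2) points.

    Assumption: each semi-axis is the maximum absolute deviation from the mean
    along that PC, so the ellipse reaches the extreme CEU points.
    """
    xs = [x for x, _ in ceu_points]
    ys = [y for _, y in ceu_points]
    cx, cy = fmean(xs), fmean(ys)
    a = max(abs(x - cx) for x in xs)
    b = max(abs(y - cy) for y in ys)
    return cx, cy, a, b


def inside_scaled_ellipse(point, ellipse, scale=5.0):
    """True if `point` lies within the concentric ellipse with semi-axes scaled by `scale`."""
    cx, cy, a, b = ellipse
    x, y = point
    return ((x - cx) / (scale * a)) ** 2 + ((y - cy) / (scale * b)) ** 2 <= 1.0


def keep_individuals(projected, ceu_points, scale=5.0):
    """Filter a dict of individual -> (PC1, PC2) projected onto the CEU PCs."""
    ellipse = ceu_ellipse(ceu_points)
    return {ind: pc for ind, pc in projected.items()
            if inside_scaled_ellipse(pc, ellipse, scale)}
```

If the intended scaling was by area rather than by axis length, passing `scale=5 ** 0.5` would implement that reading instead.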

Quality controls (QC) on SNPs:

Within each data set, we retained only SNPs that met the following criteria: >90% call rate, <20 Mendelian errors, and HWE P-value (calculated using only data set founders) ≥10⁻⁴ (this filter was not applied in HUTT, due to the interrelatedness of the founders). In FHS, we also filtered individual genotypes whose BRLMM confidence score was in the top (i.e., worst) 5% of all scores (Affymetrix 2006). To reduce genotyping error further by eliminating genotypes that appeared unlikely according to HapMap data, we imputed FHS genotypes using Impute v1 with HapMap CEU as an imputation panel (Marchini et al. 2007). We excluded all SNPs whose concordance was <0.25 + 0.65 × I, where I represents imputation information (this cutoff was based on the distribution of high-quality data on an imputation–concordance plot, as suggested by Bryan Howie, personal communication). This imputation-based filtering did not completely eliminate problematic SNPs with poor genotype clustering, as determined by visual inspection (results not shown). To exclude SNPs at which power was limited, we removed SNPs with <200 (FHS, AGRE) or <50 (HUTT) transmissions from heterozygous parents of the relevant type (with the reduced transmission requirement in HUTT due to its inclusion as a replication panel).
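The per-SNP filters above can be expressed as a single predicate. This sketch assumes the summary statistics (call rate, Mendelian error count, founders-only HWE P-value, imputation information and concordance) have already been computed, for example with PLINK and Impute; the `SnpSummary` container and `passes_snp_qc` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpSummary:
    call_rate: float                  # fraction of individuals with a genotype call
    mendelian_errors: int
    hwe_p: Optional[float]            # founders-only HWE P; None where not applied (HUTT)
    het_transmissions: int            # transmissions from heterozygous parents of the relevant type
    imputation_info: Optional[float] = None         # FHS only
    imputation_concordance: Optional[float] = None  # FHS only


def passes_snp_qc(s: SnpSummary, min_transmissions: int = 200) -> bool:
    """Apply the SNP filters described in the text (use min_transmissions=50 for HUTT)."""
    if s.call_rate <= 0.90:                       # keep only >90% call rate
        return False
    if s.mendelian_errors >= 20:                  # keep only <20 Mendelian errors
        return False
    if s.hwe_p is not None and s.hwe_p < 1e-4:    # keep HWE P >= 1e-4
        return False
    if s.imputation_info is not None and s.imputation_concordance is not None:
        # exclude SNPs whose concordance falls below 0.25 + 0.65 * info
        if s.imputation_concordance < 0.25 + 0.65 * s.imputation_info:
            return False
    return s.het_transmissions >= min_transmissions
```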

TDT:

The TDT is McNemar’s test applied to transmissions from heterozygous parents, testing $H_0: p_{A1} = p_{A2} = 1/2$, where $p_{A1}$ is the probability of transmitting the A1 allele and $p_{A2}$ is the probability of transmitting the A2 allele. The test statistic, $X = (b - c)^2/(b + c)$, where $b$ and $c$ are the numbers of observed transmissions of the A1 and A2 alleles, respectively, is asymptotically $\chi^2$ distributed with 1 d.f. (Spielman et al. 1993). We performed the TDT in all data sets for (1) all parental transmissions (“combined”), (2) paternal transmissions only (“paternal”), and (3) maternal transmissions only (“maternal”), using PLINK (Purcell et al. 2007), with all individuals in the pedigrees coded as “affected.” Raw data (transmission counts and P-values for all SNPs) for all tests for FHS will be available to approved users through dbGaP (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000007.v17.p6); data for HUTT and AGRE are provided in Table S1.
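To make the arithmetic concrete, the statistic and its P-value can be computed with the standard library alone, using the identity P(χ²₁ > x) = erfc(√(x/2)). This is an illustrative sketch, not PLINK's implementation; the function name `tdt` is hypothetical.

```python
from math import erfc, sqrt


def tdt(b: int, c: int) -> tuple:
    """TDT statistic X = (b - c)^2 / (b + c) and its asymptotic P-value.

    b, c: observed transmissions of the A1 and A2 alleles from heterozygous parents.
    """
    n = b + c
    if n == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / n
    # Upper tail of chi-square with 1 d.f.: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(stat / 2))
    return stat, p


stat, p = tdt(17, 1)  # the 17-of-18 paternal transmissions reported by Santos et al. (2009)
print(round(stat, 2), p)
```

With only 18 transmissions the χ² approximation is rough; an exact binomial test would be more appropriate at that sample size, which is one reason the scan below requires hundreds of transmissions per SNP.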

For cases in which all members of a trio are heterozygous at a locus, the allele transmitted by each parent cannot be identified without phase information. In these instances, 0.5 was added to both b and c when calculating the paternal and maternal test statistics. This biases the test statistic toward the null and produces estimates of allele transmission rates that are closer to 50% than they would be in the presence of TD. An alternative method for estimating transmission rates, which we use when estimating the strength of TD (see Discussion), is to calculate the maximum-likelihood estimate, $\theta_{p1}$, of the probability of transmitting the overrepresented allele from the parental sex of interest, with the opposite parent’s transmission rate fixed at $\theta_{p2} = 0.5$.
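A minimal sketch of that estimate, under our own assumption about the likelihood (the text does not write it out): each unambiguous transmission from the parent of interest contributes θ or 1 − θ, and each all-heterozygous trio with a heterozygous child contributes θ(1 − θ_p2) + (1 − θ)θ_p2. With θ_p2 = 0.5 that second term is constant, so under this model the estimate reduces to the fraction of unambiguous transmissions carrying the overrepresented allele, whereas the half-credit rule pulls it toward 0.5. The function names and the grid search are illustrative.

```python
from math import log


def log_likelihood(theta, n_over, n_under, n_ambiguous, theta_other=0.5):
    """Log-likelihood of theta for the parent of interest (assumed model)."""
    # Unambiguous transmissions of the overrepresented / other allele.
    ll = n_over * log(theta) + n_under * log(1 - theta)
    # All-heterozygous trios with a heterozygous child: this parent passed the
    # overrepresented allele and the other parent did not, or vice versa.
    p_het_child = theta * (1 - theta_other) + (1 - theta) * theta_other
    return ll + n_ambiguous * log(p_het_child)


def mle_theta(n_over, n_under, n_ambiguous, theta_other=0.5, steps=10_000):
    """Grid-search maximum-likelihood estimate of theta on (0, 1)."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=lambda t: log_likelihood(t, n_over, n_under, n_ambiguous, theta_other))


def half_credit_rate(n_over, n_under, n_ambiguous):
    """Transmission rate when 0.5 is added to both b and c for each ambiguous trio."""
    return (n_over + 0.5 * n_ambiguous) / (n_over + n_under + n_ambiguous)


print(mle_theta(60, 40, 20), half_credit_rate(60, 40, 20))
```

The grid search is used only to keep the example general: for θ_p2 ≠ 0.5 the ambiguous trios do carry information and no simple closed form applies.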

We considered loci to be “maternal specific” if they reached a particular significance threshold in the maternal TDT and either did not reach P < 0.01 in the paternal TDT or reached P < 0.01 in the paternal TDT with the opposite allele overtransmitted. The reverse comparison was used to identify “paternal-specific” loci.
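The sex-specific rule can be written as one function that serves both directions (pass maternal results as the focal sex for “maternal specific,” paternal results for “paternal specific”). The function and argument names are hypothetical.

```python
def is_sex_specific(p_focal: float, allele_focal: str,
                    p_other: float, allele_other: str,
                    threshold: float) -> bool:
    """Sex-specific TD call as described in the text.

    p_focal/allele_focal: TDT P-value and overtransmitted allele in the focal sex.
    p_other/allele_other: the same quantities for the opposite sex.
    """
    if p_focal >= threshold:
        return False                      # not significant in the focal sex
    if p_other >= 0.01:
        return True                       # no P < 0.01 signal in the opposite sex
    return allele_other != allele_focal   # P < 0.01, but the opposite allele is overtransmitted
```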

Permutations:

To maintain the pattern of linkage within parents contributing to the test, we permuted the data as follows: for all offspring within a family, for each chromosome, we flipped which allele was transmitted with 50% probability and otherwise kept the transmitted allele as observed. We performed this permutation for all loci that passed QC and had a sufficient number of transmissions (see above), computed the permuted test statistics, and recorded the lowest P-value genome-wide. We then took the 5th percentile of these minimum P-values across permutations as the genome-wide significance threshold. In HUTT, because of the large number of children within each family and the small overall sample size, permuting in this way does not substantially change the minimum P-value; we therefore used a Bonferroni correction to estimate genome-wide significance in this data set.
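A sketch of the permutation scheme, assuming per-family transmission counts are already tabulated per SNP and that one coin flip per family and chromosome is applied to every transmission in that family (our reading of the text). Swapping a family's b and c counts is equivalent to flipping which allele was transmitted at every SNP on that chromosome, which preserves linkage along the chromosome. The data layout and function names are hypothetical.

```python
import random
from math import erfc, sqrt


def tdt_p(b: int, c: int) -> float:
    """Two-sided TDT P-value from transmission counts b and c."""
    n = b + c
    return erfc(sqrt((b - c) ** 2 / n / 2)) if n else 1.0


def min_p_after_flip(families, rng):
    """One permutation: flip each family's transmissions chromosome by chromosome.

    families: list of dicts mapping chromosome -> list of (b, c) per SNP,
    with SNP order identical across families (hypothetical layout).
    """
    totals = {}
    for fam in families:
        for chrom, counts in fam.items():
            flip = rng.random() < 0.5  # one coin per family and chromosome
            acc = totals.setdefault(chrom, [[0, 0] for _ in counts])
            for (b, c), slot in zip(counts, acc):
                if flip:
                    b, c = c, b  # the other allele is now the transmitted one
                slot[0] += b
                slot[1] += c
    return min(tdt_p(b, c) for acc in totals.values() for b, c in acc)


def genome_wide_threshold(families, n_perm=1000, seed=1):
    """Empirical 5th percentile of the genome-wide minimum P-value."""
    rng = random.Random(seed)
    minima = sorted(min_p_after_flip(families, rng) for _ in range(n_perm))
    return minima[int(0.05 * n_perm)]
```

The sketch also shows why the scheme loses resolution in HUTT: when a few families contribute many children, there are few distinct flip patterns, so the permuted minimum P-values vary little.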

Replication:

Because of the prevalence of genotyping error in FHS, we looked for replication of the top FHS combined TDT signals in HUTT to gain confidence that some of these signals were truly due to TD. We defined SNPs as “replicating” if they reached genome-wide significance in FHS and had P < 0.01 in HUTT. We tested whether more of the FHS genome-wide significant SNPs replicated in HUTT than expected by chance by computing $(1 - F_{\text{binom}}(x; n, p))/2$, where $F_{\text{binom}}$ is the cumulative distribution function of the binomial, $x$ the observed number of replicating SNPs, $n$ the number of independent (r² < 0.2) SNPs with sufficient sample size in HUTT, and $p$ the empirical probability of any SNP having P < 0.01 in HUTT. We divided by 2 because chance overtransmission is equally likely to occur for either allele.
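The replication P-value can be computed directly from the binomial distribution. The sketch follows the formula as written; note that 1 − F(x) is P(X > x), so if the intended tail is P(X ≥ x), `binom_cdf(x - 1, n, p)` would be used instead. Function names are hypothetical.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """F_binom(x; n, p) = P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def replication_p(x: int, n: int, p: float) -> float:
    """(1 - F_binom(x; n, p)) / 2, halved because either allele may be overtransmitted by chance."""
    return (1 - binom_cdf(x, n, p)) / 2
```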

Validation:

Because of our concerns that many of the top signals in FHS and HUTT (both genotyped on Affymetrix platforms) might be driven by genotyping error, we attempted to validate the top HUTT signals using an independent technology. Specifically, we genotyped a subset of 384 HUTT on the Sequenom iPLEX Gold platform with a multiplex designed to contain five of the six genome-wide significant maternal-specific TDT SNPs and the top five combined TDT SNPs from HUTT, along with eight other SNPs.

From the iPLEX output, we eliminated individuals with <50% of genotypes successfully called. All remaining individuals were called at ≥12 of the 17 successfully typed SNPs. We removed individuals involved in at least one Mendelian error at a SNP for which the yield, peak, and clustering for that individual did not suggest genotyping error, because the identity of these individuals was uncertain. This resulted in the elimination of all but two Mendelian errors. We then removed the individuals most likely responsible for the errors at these particular SNPs, using peak height and genotype clustering (by eye) to determine which individual was of poorest quality. We removed one SNP that failed, producing yields similar to the negative controls. All remaining SNPs had call rates >94%. We additionally removed genotypes with yield <0.7.

We estimated whether the genotypes obtained from iPLEX supported the TDT results obtained from the Affymetrix arrays as follows. We first computed error rates from the Affymetrix arrays for each SNP, treating the iPLEX genotypes as the truth. We then used iPLEX genotypes for all individuals typed on that platform and generated genotypes for all other individuals at random using the error rates estimated for each SNP. We calculated the mean TDT P-value across these randomized data sets (pRandom), setting the P-value to 1 for any randomization in which the opposite allele was overtransmitted. We considered a result validated if pRandom was genome-wide significant for the relevant (combined or maternal) TDT.
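Two pieces of this procedure are sketched below: a per-SNP discordance rate between platforms and the pRandom averaging rule. The simulation of genotypes for individuals not typed on iPLEX is not shown, and a single discordance rate per SNP is our simplification; the authors may have used genotype-specific error rates. All names are hypothetical.

```python
from math import erfc, sqrt
from statistics import fmean


def tdt_p(b: int, c: int) -> float:
    """Two-sided TDT P-value from transmission counts b and c."""
    n = b + c
    return erfc(sqrt((b - c) ** 2 / n / 2)) if n else 1.0


def discordance_rate(affy: dict, iplex: dict) -> float:
    """Fraction of individuals typed on both platforms whose Affymetrix call differs
    from the iPLEX call (iPLEX treated as truth). Missing calls are None."""
    shared = [ind for ind, g in iplex.items() if g is not None and affy.get(ind) is not None]
    if not shared:
        return float("nan")
    return sum(affy[ind] != iplex[ind] for ind in shared) / len(shared)


def p_random(randomized_counts) -> float:
    """Mean TDT P-value over randomized data sets.

    randomized_counts: (b, c) per randomization, where b counts the allele that was
    overtransmitted in the original Affymetrix TDT; P is set to 1 when c > b.
    """
    return fmean(1.0 if c > b else tdt_p(b, c) for b, c in randomized_counts)
```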

Investigation of autism-related TD in AGRE:

To assess whether TD signals in AGRE could instead reflect the overrepresentation of individuals with autism, we determined whether the results differed between offspring with and without a diagnosis of autism spectrum disorder (ASD). For top SNPs in AGRE, we performed the TDT separately in ASD and non-ASD offspring. We then compared the transmission of each allele in the two subsets using Fisher’s exact test (Supporting Information, Table S2).
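The comparison amounts to a two-sided Fisher's exact test on a 2 × 2 table of allele transmissions (rows: ASD and non-ASD offspring; columns: transmissions of A1 and A2). The standard-library sketch below sums the hypergeometric probabilities of all tables no more likely than the observed one; in practice `scipy.stats.fisher_exact` does the same job. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: ASD / non-ASD offspring; columns: transmissions of A1 / A2.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum over tables at least as extreme (no more probable) than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```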

Characterizing regions with TD:

We defined a TD region as the maximal region surrounding the SNP with the lowest P-value (the “focal SNP”) that contained all SNPs with both r² > 0.5 with the focal SNP and P < 0.01, and in which more than half the SNPs, excluding the focal SNP, had P < 0.01. We used the UCSC browser (http://genome.ucsc.edu/) to identify all genes within the top 10 TD regions for each test. We selected regions for functional enrichment analysis using a P-value cutoff of 10⁻⁴ (combined TDT) or 10⁻³ (paternal and maternal TDT). We then used the DAVID bioinformatics resources website (Huang et al. 2008, 2009) to test for enrichment of gene ontologies, considering within each region the gene nearest (in physical distance) to the focal SNP, identified using the UCSC browser. To look for evidence supporting a selective sweep at or near SNPs of interest, we examined iHS (Voight et al. 2006) and XP-EHH (Sabeti et al. 2007) scores obtained from HapMap phase II data for autosomal SNPs (Frazer et al. 2007; Pickrell et al. 2009). We obtained derived/ancestral state information using Haplotter (http://haplotter.uchicago.edu/; Voight et al. 2006). SNP categories are as listed in dbSNP build 132 (http://www.ncbi.nlm.nih.gov/projects/SNP).
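The region definition can be read in more than one way. The sketch below implements one reading, which is our assumption: the region spans from the outermost SNP on each side that has both r² > 0.5 with the focal SNP and P < 0.01, and the majority condition is then checked on all SNPs inside that span, excluding the focal SNP. Inputs are parallel per-SNP lists in physical order; the function name is hypothetical.

```python
def td_region(positions, pvalues, r2_with_focal, focal):
    """Return (start_bp, end_bp, majority_ok) for the TD region around index `focal`.

    Assumed reading: bounds are the outermost SNPs with r2 > 0.5 to the focal SNP
    and P < 0.01; majority_ok is True if more than half of the other SNPs inside
    the bounds have P < 0.01.
    """
    qualifying = [i for i in range(len(positions))
                  if i == focal or (r2_with_focal[i] > 0.5 and pvalues[i] < 0.01)]
    start, end = min(qualifying), max(qualifying)
    others = [i for i in range(start, end + 1) if i != focal]
    n_sig = sum(pvalues[i] < 0.01 for i in others)
    majority_ok = not others or n_sig > len(others) / 2
    return positions[start], positions[end], majority_ok
```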

Assessing overlap with region syntenic to mouse t-haplotypes:

One reason Santos et al. (2009) provided for investigating the p arm of human chromosome 6 is that it is largely syntenic to mouse chromosome 17, where t-haplotypes are located. Given that we also find a paternal signal on chromosome 6p, we assessed whether this region shared sequence similarity to the t-haplotype region; locations of shared sequence similarity with the mouse genome for the paternal-specific TD region in 6p21.1 were determined using the UCSC genome browser conversion tool (http://genome.ucsc.edu). The only region of the mouse genome with sequence similarity spanning the entire TD region identified here is outside the annotated boundaries of t-haplotypes (Silver 1993; Wallace and Erhart 2008).

Comparing AGRE signal in 6p21.1 between Affymetrix and Illumina platforms:

We considered only those individuals who had been genotyped on both platforms and were included in the original TDT in AGRE (using Illumina data). We merged the data sets, setting any genotypes differing between Affymetrix and Illumina to missing data. Aside from elimination of Mendelian errors, no quality control steps were performed for Affymetrix genotyping data. We considered SNPs with at least 141 transmissions, the minimum sample size required for 80% power to detect TD at P < 0.05, using the estimated distortion strength of 0.1187. We additionally used this data set to calculate pairwise LD between Affymetrix and Illumina SNPs in AGRE founders.

Analysis of maternal TD near centromeres and telomeres:

We calculated genetic distance to the centromere for all SNPs using the HapMap phase II genetic map (Frazer et al. 2007) and gaps in the assembly annotated as centromeres by the UCSC Genome Browser (http://genome.ucsc.edu/), using build hg18. We also calculated genetic distance to the most telomeric SNP in HapMap phase II (Frazer et al. 2007) and physical distance between the most telomeric SNPs in our data set and gaps in the assembly annotated as telomeres by the UCSC Genome Browser (http://genome.ucsc.edu/), using build hg19 (telomere locations were not available for all chromosomes in hg18). We determined the genetic distance to the centromere and to the most telomeric HapMap SNP, as well as the number of SNPs between the SNP and the centromere and telomere, for all maternal-specific (i.e., paternal TDT P > 0.01) SNPs with P < 10⁻³. We additionally checked for marginally significant maternal TDT P-values (P < 0.05) at the SNP closest to the centromere and telomere on both arms of metacentric chromosomes and the q arm of acrocentric chromosomes.
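Genetic distance to the centromere requires a genetic (cM) position for both the SNP and the centromere boundary. A common approach, assumed here rather than taken from the text, is linear interpolation between the flanking markers of the genetic map, measuring to the nearer edge of the annotated centromere gap. Positions outside the map are clamped to its ends, which is also an assumption. Function names are hypothetical.

```python
from bisect import bisect_left


def genetic_position(bp, map_bp, map_cm):
    """Linearly interpolate the cM position of `bp` from a genetic map sorted by bp."""
    if bp <= map_bp[0]:
        return map_cm[0]          # clamp before the first map marker
    if bp >= map_bp[-1]:
        return map_cm[-1]         # clamp after the last map marker
    i = bisect_left(map_bp, bp)   # map_bp[i - 1] < bp <= map_bp[i]
    if map_bp[i] == bp:
        return map_cm[i]
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (bp - x0) / (x1 - x0)


def cm_to_centromere(snp_bp, cen_start, cen_end, map_bp, map_cm):
    """Genetic distance from a SNP to the nearer edge of the centromere gap on its arm."""
    edge = cen_start if snp_bp < cen_start else cen_end
    return abs(genetic_position(snp_bp, map_bp, map_cm) - genetic_position(edge, map_bp, map_cm))
```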

Sperm typing to test for TD in sperm production or motility

Samples:

Blood and semen from anonymous donors were provided by the Kinderwunsch Zentrum of the Landes-Frauen-und Kinderklinik, Linz, Upper Austria, Austria. All ejaculates were obtained by sterile masturbation. Blood DNA was extracted using the PAXgene blood DNA kit (Qiagen, Germany). Sperm DNA was extracted using the Gentra Puregene Cell Kit (Qiagen, Germany) with the addition of 24 µM DTT (Sigma-Aldrich, Austria) and 60 µg/ml proteinase K during the cell lysis step and 1 µl glycogen solution (Qiagen, Germany) during the DNA precipitation step. The DNA pellet was resuspended in TE buffer (pH 7.4). The genotype of donor samples was determined for three SNPs (rs9381373, rs1284965, and rs2093903) within the region of possible paternal-specific TD in 6p21.1. Sample genotypes were determined as described below for the genotyping of single molecule amplifications (SMA), except that 10 ng of genomic DNA from blood was used per reaction instead of the 1000-fold SMA dilution.

Sperm motility assay:

To test for TD in sperm motility, sperm from five patients, either normozoospermic or with mild forms of teratozoospermia, were processed as previously described (Ebner et al. 2011). In short, a special sperm-selecting chamber (Zech-selector, AssTIC AMedizintechnik GmbH, Leutsch, Austria) was used to separate highly motile spermatozoa from slower ones. This device consists of two concentric wells, overlain by a U-ring. Progressively motile spermatozoa migrate from the ejaculate in the outer well (3 ml) to concentrate in the medium-filled (BM1 medium, Eurobio, Courtaboeuf, France) inner well, using a capillary bridge created by the overlying U-ring. If the volume of ejaculate was less than 3 ml, the outer well was filled to that volume with BM1 medium. After 20 min to 1 hr, the sperm solution from the central chamber was centrifuged to concentrate highly motile male gametes. These were stored frozen at −20° until further analysis.

Single molecule amplification (SMA):

SMA was performed for all ten sperm donors identified as heterozygous for at least one of three SNPs (rs9381373, rs1284965, and rs2093903) within the region of possible paternal-specific TD in 6p21.1. Sample genotypes were determined as described below for the genotyping of SMA reactions, except that 10 ng of genomic DNA from blood was used per reaction instead of the 1000-fold dilution of an SMA reaction. A 1914-bp region containing the three SNPs was amplified using the following PCR conditions: 1× Phire HS buffer (Biozym, Austria), 0.16 mM dNTPs, 0.8 µM forward (5′-AGCCTCTTGTGCCAAACAGT-3′) and 0.8 µM reverse primers (5′-TTTTTGCTGGCAGAGGATCT-3′), 0.5× EvaGreen fluorescent DNA stain (Jena Bioscience, Germany), 0.25 µl Phire Hot Start DNA polymerase (Biozym, Austria), and 0.3–0.6 molecules of blood or sperm DNA per reaction. This amount of template ensured that <10% of the reactions had more than one molecule amplified, according to the Poisson distribution. SMA reactions were set up in a dedicated laminar flow hood decontaminated with UV light and 10% chlorine before the start of each experiment. No-template controls were included in each experiment to screen for contamination. The PCR was performed in 10 µl volumes in a real-time PCR thermocycler (CFX384 System, Bio-Rad), using an initial heating step of 94° for 2 min followed by 5 cycles of 94° for 15 sec, 65° for 15 sec, and 72° for 30 sec, and then 35 cycles of 94° for 15 sec, 68° for 15 sec, and 72° for 30 sec. We used the amplification curve and the melting-curve profile to identify PCR reactions that had amplified our region of interest. During the initial stages, while experimental conditions were being optimized, we verified amplification of the correct product by acrylamide gel electrophoresis. Approximately 20% of our SMA reactions yielded a product of the wrong size, probably due to amplification of other genomic regions. These false positives were identified by their distinct melt-curve profile and did not yield a genotyping result, so they could easily be excluded.
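The Poisson argument for the template amount can be made concrete with a short calculation. This is a minimal sketch, assuming template molecules are distributed independently among reactions (the function name is hypothetical). The fraction of multi-molecule reactions depends on whether it is expressed relative to all reactions or only to reactions that amplified, so both are returned.

```python
import math
from typing import Tuple


def multi_molecule_fractions(mean_molecules: float) -> Tuple[float, float]:
    """Poisson expectations for limiting-dilution PCR with a given mean template number.

    Returns (fraction of all reactions with >= 2 molecules,
             fraction of amplifying reactions, i.e. >= 1 molecule, with >= 2 molecules).
    """
    p0 = math.exp(-mean_molecules)      # no template molecule
    p1 = mean_molecules * p0            # exactly one molecule
    p_multi = 1.0 - p0 - p1             # two or more molecules
    return p_multi, p_multi / (1.0 - p0)


for lam in (0.3, 0.6):  # the range of mean template amounts used per reaction
    of_all, of_positive = multi_molecule_fractions(lam)
    print(f"mean {lam}: {of_all:.3f} of all reactions, {of_positive:.3f} of amplifying reactions")
```

For a donor heterozygous at the typed SNPs, a reaction containing exactly two molecules carries two different haplotypes with probability 0.5, which is why only about half of the multi-molecule reactions are expected to be detectable as heterozygous.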

Genotyping single molecule amplifications:

SMA reactions that amplified the region of interest were diluted 1000-fold and genotyped by allele-specific PCR in combination with a real-time PCR machine (CFX384 System, Bio-Rad), as described previously (Tiemann-Boege et al. 2006). The last three phosphodiester bonds at the 3′ end of the allele-specific primers were substituted by phosphorothioate bonds to increase allele-specific selectivity. The allele-specific PCR reactions were carried out in 10 µl volumes containing 5 µl of the SMA dilution, 0.4 µM allele-specific primer, and 0.4 µM outside primer, and either 1× OneTaq Reaction Buffer (NEB), 1× SYBR Green I (Invitrogen), and 0.125 U OneTaq Hot Start DNA polymerase or 1× AmpliTaq Gold buffer, 1.5 mM MgCl2, 0.16 mM dNTPs, 1× SYBR Green I (Invitrogen), and 0.25 U Z05 DNA polymerase (Roche, Austria). The primer sequences are shown in Table S3.

The reactions were run with an initial heating step of 94° for 2 min, followed by 40 cycles of 94° for 30 sec, 65° for 30 sec, and 72° for 15 sec when using OneTaq Hot Start DNA polymerase, or with 95° for 2 min, followed by 40 cycles of 95° for 30 sec, 56° for 30 sec, and 72° for 15 sec when using Z05 DNA polymerase. For each sample, two reactions were amplified, one per allele, differing only in the allele-specific primer. The genotype was assessed from the difference between the quantification cycles (Cqs) of the two allele-specific reactions: homozygous samples showed a much larger difference than heterozygous samples. To verify the genotyping data, we genotyped two SNPs for each sample, except for one donor who was heterozygous at only one of the three SNPs; for this donor, we genotyped the same SNP twice. In 98% of cases, the alleles at the two SNPs conformed to the expected haplotypes; in the rare cases in which genotypes at the two SNPs did not match (mostly samples containing more than one molecule), we repeated the genotyping and corrected the false call. Reactions that gave a heterozygous genotype at each SNP, indicating that more than one molecule had been amplified, were eliminated from the analysis. Equivalence of the two PCR methods (OneTaq Hot Start and Z05 DNA polymerase) was established by typing 48 samples from one donor with both polymerases; both methods yielded the same genotype for all samples.
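The calling and consistency logic described above can be sketched as follows. This is a hypothetical illustration, not the study's pipeline: the ΔCq cut-off (`DELTA_CQ_THRESHOLD`) is an assumed value because the threshold used is not reported, the Cq values in the usage lines are invented, and the function names are made up for this example. Haplotype labels follow the OHap/UHap notation of Table 4 (e.g., TA/CC for rs9381373 and rs2093903).

```python
from typing import Dict

# Hypothetical cut-off in cycles; the value used in the study is not reported here.
DELTA_CQ_THRESHOLD = 5.0


def call_snp(cq_a: float, cq_b: float, allele_a: str, allele_b: str,
             threshold: float = DELTA_CQ_THRESHOLD) -> str:
    """Call one SNP from the Cq values of its two allele-specific reactions.

    The allele whose primer amplifies earlier (lower Cq) is present. A small |ΔCq|
    means both primers amplified, i.e. the reaction looks heterozygous.
    """
    delta = cq_b - cq_a
    if abs(delta) < threshold:
        return "het"
    return allele_a if delta > 0 else allele_b


def classify_reaction(calls: Dict[str, str], haplotypes: Dict[str, Dict[str, str]]) -> str:
    """Assign an SMA reaction to a donor haplotype using its SNP calls.

    Returns the matching haplotype label, "exclude" if every SNP looks heterozygous
    (more than one molecule amplified), or "retype" if the calls match neither haplotype.
    """
    if all(call == "het" for call in calls.values()):
        return "exclude"
    for label, alleles in haplotypes.items():
        if all(alleles[snp] == call for snp, call in calls.items()):
            return label
    return "retype"


# Donor haplotypes in the OHap/UHap notation of Table 4 (rs9381373, rs2093903).
haps = {"TA": {"rs9381373": "T", "rs2093903": "A"},
        "CC": {"rs9381373": "C", "rs2093903": "C"}}
calls = {"rs9381373": call_snp(21.8, 31.2, "T", "C"),   # invented Cq values
         "rs2093903": call_snp(22.4, 30.9, "A", "C")}
print(classify_reaction(calls, haps))
```

A "retype" result corresponds to the nonmatching genotypes that were re-genotyped; an "exclude" result corresponds to reactions removed as multi-molecule amplifications.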

Testing allelic ratios:

Blood genotyping was performed in a subset of donors as a measure of noise. For blood samples we performed a two-sided binomial test, because we had no expectation about which allele would be overrepresented. In sperm, we tested for deviations from 50% occurrence of each allele for each donor independently, as well as for all donors combined (a χ2 test of homogeneity did not detect significant heterogeneity among donors). Because under the alternative hypothesis of TD we expected a particular allele to be overrepresented in sperm, we performed a one-sided binomial test (i.e., the P-value is the probability of observing at least as many copies of the TDT-based overtransmitted allele as were observed in the sperm genotyping, assuming a binomial with parameter 0.5). We tested unselected sperm (not selected using the motility assay) and fast sperm (the fastest 0.005–5.155% of sperm from the motility assay) separately. We additionally tested whether the transmission rate from sperm typing was compatible with that inferred from the TDT, using the likelihood-ratio test LR = Lik(θ_sperm = θ_TDT)/Lik(θ_sperm, θ_TDT), where θ is the paternal transmission rate of the TDT-based overrepresented allele at SNP rs9381373; the numerator is maximized under the constraint that sperm typing and the TDT share a single rate, and the denominator allows each its own rate. For this test, we combined allele counts from unselected and selected sperm.
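The binomial tests and the likelihood-ratio test described above can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' code: exact binomial tails are computed with integer arithmetic, and the statistic −2 log LR is referred to a χ2 distribution with one degree of freedom, an assumption based on standard asymptotics. The sperm counts in the usage lines are the combined totals from Table 4; the TDT counts are hypothetical placeholders, because the counts at rs9381373 are not given in this section.

```python
import math


def binom_upper_tail(k: int, n: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), using integer arithmetic."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def one_sided_p(n_over: int, n_under: int) -> float:
    """Sperm test: P of at least n_over copies of the TDT-overtransmitted allele."""
    return binom_upper_tail(n_over, n_over + n_under)


def two_sided_p(a: int, b: int) -> float:
    """Blood test: two-sided exact binomial test of a 50:50 ratio (symmetric at p = 0.5)."""
    n, k = a + b, max(a, b)
    if 2 * k == n:
        return 1.0
    return min(1.0, 2 * binom_upper_tail(k, n))


def _loglik(x: int, n: int, theta: float) -> float:
    """Binomial log-likelihood, omitting the constant log C(n, x)."""
    ll = x * math.log(theta) if x > 0 else 0.0
    if n - x > 0:
        ll += (n - x) * math.log(1.0 - theta)
    return ll


def lr_test_equal_rates(x1: int, n1: int, x2: int, n2: int):
    """LR test of H0: theta1 == theta2; returns (-2 log LR, chi-square(1) P-value)."""
    pooled = (x1 + x2) / (n1 + n2)                      # constrained MLE (numerator)
    ll_null = _loglik(x1, n1, pooled) + _loglik(x2, n2, pooled)
    ll_alt = _loglik(x1, n1, x1 / n1) + _loglik(x2, n2, x2 / n2)  # separate MLEs
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))       # chi-square(1) survival function


# Totals from Table 4 (unselected + fast sperm) versus hypothetical TDT counts.
sperm_over, sperm_n = 1067 + 567, 1067 + 567 + 1084 + 646
tdt_over, tdt_n = 320, 532  # hypothetical placeholder counts, not from the paper
print(lr_test_equal_rates(sperm_over, sperm_n, tdt_over, tdt_n))
```

Applied to a single donor, `two_sided_p(129, 153)` and `one_sided_p(142, 119)` correspond to the blood and unselected-sperm tests for donor 26 in Table 4.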

Research on FHS and AGRE data was approved by University of Chicago IRB 10-674-B, “Population genetic analyses of the Framingham and AGRE Data.” Because we analyzed only data that had been previously collected by other researchers for other purposes and were then made available to us, our IRB granted a waiver of consent. Research on HUTT data was approved by University of Chicago IRB numbers 5444, “Studies of fertility in Hutterite couples,” and 8073, “Genetic studies of complex phenotypes in the Hutterites.” For the sperm genotyping, blood and semen from anonymous donors were provided by the IVF clinic of the Landes-Frauen-und Kinderklinik, Linz, following protocols approved by the Ethics Committee of Upper Austria (EK-Number: 1-11 [2.1.6]).

Results

We performed the TDT using transmissions from both parents (combined), only fathers (paternal), and only mothers (maternal) in FHS, HUTT, and AGRE (see sample descriptions in Materials and Methods). In total, these pedigrees consisted of 4728 offspring with both parents genotyped (Table 1). To ensure adequate power, we required a minimum of 200 informative transmissions per SNP in FHS and AGRE and 50 in HUTT (with a reduced sample size in HUTT because it was considered a replication panel). Considering a genome-wide significance level, α, of 1.08 × 10−7 for the combined TDT in FHS and 1.10 × 10−7 in AGRE (see Materials and Methods), the distortion strength required to achieve 50% power with these sample sizes is 18.9% in both cases. Although this suggests that power is limited unless distortion is very strong, most SNPs had substantially more informative transmissions; in AGRE, the median number of transmissions was 1102, yielding 50% power for genome-wide significance at distortion strength 8.1% and for α = 10−4 at 5.9%.
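The power figures above can be approximated with a normal approximation to the TDT statistic. In this sketch, distortion strength d is interpreted as τ − 0.5, where τ is the transmission rate of the favored allele (an interpretation under which the values in this paragraph are reproduced approximately, but which is not stated in this section), and the TDT z-score is treated as normal with unit variance. As a check, the TDT function is applied to counts of 606 and 430 reconstructed from the rounded rate (0.585) and sample size (1036) listed for rs748001 in Table 2.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal


def tdt_pvalue(b: int, c: int) -> float:
    """TDT: (b - c)^2 / (b + c) is ~chi-square(1) without distortion; returns its P-value."""
    stat = (b - c) ** 2 / (b + c)
    return math.erfc(math.sqrt(stat / 2))


def tdt_power(d: float, n: int, alpha: float) -> float:
    """Approximate power of the two-sided TDT at level alpha with n informative transmissions.

    Assumes d = tau - 0.5 and treats the TDT z-score as Normal(2 * d * sqrt(n), 1).
    """
    z_crit = -_Z.inv_cdf(alpha / 2)
    shift = 2 * d * math.sqrt(n)          # expected z-score under distortion
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def strength_for_half_power(n: int, alpha: float) -> float:
    """Distortion strength at which the expected z-score equals the critical value (~50% power)."""
    return -_Z.inv_cdf(alpha / 2) / (2 * math.sqrt(n))


print(tdt_pvalue(606, 430))                     # counts reconstructed from Table 2 (rs748001)
print(strength_for_half_power(200, 1.08e-7))    # minimum FHS sample size
print(strength_for_half_power(1102, 1e-4))      # median AGRE sample size
```

The same `tdt_power` function gives roughly 83% and 22% for the FHS and HUTT replication scenarios discussed later (d = 0.1187 with 223 and 58 transmissions, α = 0.01).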

Table 1. Samples and SNPs remaining after quality control and sample size cutoffs.

| Sample | Offspring | SNPs (combined) | SNPs (paternal) | SNPs (maternal) | Genotyping array |
|---|---|---|---|---|---|
| FHSᵃ | 2,362 | 353,116 | 328,855 | 335,466 | Affymetrix GeneChip® Mapping 500K (Affymetrix, Santa Clara, CA) |
| HUTTᵃ | 848 | 538,139 | 343,945 | 353,047 | Affymetrix GeneChip® Mapping 500K and SNPChips, 5.0 and 6.0 (Affymetrix) |
| AGREᵃ | 1,518 | 491,632 | 452,468 | 462,981 | Illumina Human Hap550 BeadChip (Illumina Inc., San Diego, CA) |

ᵃ For sample names and descriptions, see Materials and Methods.

Analysis of FHS and HUTT

In the FHS data, we found an extreme excess of low P-values and inconsistent TD signals between neighboring SNPs (Figure S1), confirming that these results largely reflect false positives driven by genotyping error (Paterson et al. 2009). Visual comparison of signal intensity data and BRLMM calls (Affymetrix 2006) for the top 100 loci in each test with those for 100 random loci in FHS revealed an excess of poor clustering, apparently incorrect calls, and differential levels of missing data among genotypes at the top loci. In addition, in 85 of the top 100 loci from the combined TDT, the major allele was overtransmitted, as expected under both types of genotyping error that produce strong TD signals (i.e., major allele homozygotes frequently mistyped as heterozygotes, and a higher rate of missing calls among heterozygotes than among homozygotes; Mitchell et al. 2003; Hirschhorn and Daly 2005).
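The direction of the first type of bias can be illustrated with a simplified model. This is a hypothetical sketch, assuming Hardy–Weinberg proportions among parents, a single error mode (true major-allele homozygotes miscalled as heterozygotes at a fixed rate), and no offspring errors or missing data. Because a miscalled homozygote can only transmit the major allele, any such error pushes the apparent transmission rate of the major allele above 0.5.

```python
def apparent_major_rate(major_freq: float, hom_to_het_error: float) -> float:
    """Apparent transmission rate of the major allele among parents *called* heterozygous.

    Simplified, hypothetical model: parents are in Hardy-Weinberg proportions and a
    fraction `hom_to_het_error` of true major-allele homozygotes is miscalled as
    heterozygous. Miscalled homozygotes always transmit the major allele; true
    heterozygotes transmit it with probability 0.5. Offspring errors are ignored.
    """
    p, q = major_freq, 1.0 - major_freq
    true_het = 2 * p * q                      # genuine heterozygous parents
    false_het = p * p * hom_to_het_error      # homozygotes miscalled as heterozygous
    return (0.5 * true_het + 1.0 * false_het) / (true_het + false_het)


for err in (0.0, 0.005, 0.02):
    print(err, round(apparent_major_rate(0.8, err), 4))
```

Even a 1% miscall rate at a SNP with major allele frequency 0.8 shifts the apparent rate to about 0.51, which, over thousands of transmissions, is enough to produce a strong TDT signal.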

We therefore sought to replicate the findings from FHS in HUTT, but the top FHS signals were not enriched for replicating signals: of 263 independent SNPs that were genome-wide significant in FHS and included in the TDT in HUTT, five had P < 0.01 in HUTT with overtransmission of the same allele, whereas 2.1 were expected by chance (P = 0.206; see Materials and Methods). To evaluate whether top TDT signals derived using Affymetrix genotyping arrays were frequently driven by genotyping error, we additionally used an independent technology to regenotype all five maternal-specific and the top five combined genome-wide significant SNPs in HUTT. This validation experiment suggested that these genome-wide significant TDT results largely reflected incorrect genotype calls (Table S4). Together, these findings suggest that any true signal of TD within FHS and HUTT is obscured by noise from genotyping error.

AGRE TDT results

Next, we considered the output of the TDT in a data set generated using a different genotyping platform, the European subset of AGRE (Figure 1). These data appeared to be much less affected by poor genotype calls than the other data sets (Figure S2). In particular, very few P-values were <10−4 in the combined TDT or <10−3 in the maternal and paternal TDTs, in contrast to observations in FHS and HUTT (P-values tend to be higher for the paternal and maternal TDTs than for the combined TDT because of the smaller number of transmissions and the lack of information from triple-heterozygote trios; see Materials and Methods). Moreover, signals in AGRE were more clustered, with low P-values tending to occur at multiple neighboring SNPs.

Figure 1.

Manhattan plots for the TDT in AGRE. The TDT was performed separately considering (A) all transmissions, (B) paternal transmissions only, and (C) maternal transmissions only. The horizontal line in each plot indicates the permutation-based genome-wide significance threshold of 1.10 × 10−7 (combined), 1.20 × 10−6 (paternal), or 1.34 × 10−6 (maternal).

In the combined TDT, rs748001 on chromosome 10 reached genome-wide significance, with a P-value of 4.55 × 10−8 (permutation-based genome-wide P = 0.021; see Materials and Methods). SNPs in LD with rs748001 also had low P-values (12 SNPs with r2 > 0.3 had P < 0.01; three of these were significant at α = 10−4) (Figure 2). SNP rs748001 shows no Mendelian errors and has a call rate of 96.25% in AGRE. Together, these results indicate that the TD signal at rs748001 is not the result of genotyping error. Moreover, at this SNP, the transmission rates in ASD-only and non-ASD-only children do not differ (P = 0.129), indicating that this signal is not driven by the ascertainment of individuals with autism.

Figure 2.

Region surrounding the genome-wide significant SNP in the AGRE combined TDT. (A) The region (shaded) is shown with the nearest upstream and downstream genes. All SNPs with P < 0.01 for the combined TDT are plotted as black points. (B) A close-up of the region (shaded) is shown, with SNPs colored by their LD with the focal SNP (rs748001), which is circled in red. SNPs with | iHS | > 2 are starred. The most highly conserved of all elements conserved in vertebrates (Siepel et al. 2005) are shown in orange. SNP positions are as mapped in HG18.

The one genome-wide significant signal in the maternal TDT in AGRE is at rs12858772 on the X chromosome. Contrary to expectation under true TD, however, two SNPs in strong LD (r2 > 0.6) with this SNP do not deviate from 50% transmission of each allele (minimum pMaternal = 0.172), indicating that this signal is likely due to genotyping error (Figure S3). In the paternal TDT, there are no genome-wide significant signals, but several regions contain multiple SNPs with low P-values (P < 0.01), suggesting possible TD.

For each of the three tests, we investigated in more detail the top 10 signals in which more than half of the other SNPs in the TD region (see Materials and Methods) had P < 0.01 (Table 2). Of these 30 regions, 16 contained at least four SNPs with P < 0.01 in addition to the focal SNP, and three regions contained at least 10 such SNPs, providing strong evidence that these TD signals are not due to genotyping error (although not ruling out chance fluctuations in transmission rates).

Table 2. Top 10 SNPs in each test with regional signals of TD.

| TDTᵃ | Focal SNP | Chr | P | Rateᵇ | OAᶜ | Nᵈ | MAFᵉ | Length (kb)ᶠ | Start (HG17)ᵍ | No.ʰ | Genes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C | rs748001 | 10 | 4.55 × 10−8 | 0.585 | Anc | 1036 | 0.248 | 27.586 | 127199116 | 8 | None |
| C | rs12661087 | 6 | 3.65 × 10−6 | 0.601 | Anc | 524 | 0.124 | 203.859 | 120861096 | 6 | None |
| C | rs13112011 | 4 | 4.90 × 10−6 | 0.624 | Anc | 338 | 0.074 | 20.956 | 25233826 | 4 | None |
| C | rs1941852 | 11 | 7.91 × 10−6 | 0.613 | Der | 388 | 0.069 | 40.57 | 97380183 | 4 | None |
| C | rs10766778 | 11 | 9.74 × 10−6 | 0.570 | Anc | 1002 | 0.210 | 2.297 | 21219167 | 2 | NELL1 |
| C | rs1249667 | 1 | 1.33 × 10−5 | 0.637 | Der | 251 | 0.051 | 39.316 | 75491425 | 2 | SLC44A5 |
| C | rs16910190 | 11 | 1.35 × 10−5 | 0.578 | Der | 773 | 0.158 | 52.365 | 22773148 | 5 | GAS2, SVIP |
| C | rs12582514 | 12 | 1.59 × 10−5 | 0.558 | Der | 1374 | 0.483 | 99.547 | 21469826 | 13 | PYROXD1, RECQL, GOLT1B |
| C | rs10935427 | 3 | 1.80 × 10−5 | 0.556 | Anc | 1445 | 0.396 | 97.379 | 142396949 | 9 | ACPL2 |
| C | rs10136259 | 14 | 1.94 × 10−5 | 0.566 | Anc | 1059 | 0.220 | 7.736 | 32663208 | 3 | NPAS3 |
| P | rs12199720 | 6 | 1.77 × 10−5 | 0.593 | Anc | 532 | 0.409 | 710.622 | 44779449 | 46 | SUPT3H, MIR586, RUNX2 |
| P | rs10795851 | 10 | 2.28 × 10−5 | 0.596 | Der | 482 | 0.198 | 31.105 | 11265343 | 8 | CUGBP2 |
| P | rs2921031 | 8 | 3.29 × 10−5 | 0.601 | Der | 419 | 0.190 | 22.967 | 8365511 | 5 | None |
| P | rs2166013 | 5 | 4.11 × 10−5 | 0.606 | Der | 371 | 0.147 | 109.672 | 119407151 | 8 | None |
| P | rs1377210 | 4 | 4.11 × 10−5 | 0.627 | Der | 259 | 0.100 | 40.571 | 110022271 | 2 | AGXT2L1 |
| P | rs6084148 | 20 | 6.62 × 10−5 | 0.584 | Der | 567 | 0.237 | 36.55 | 2632924 | 5 | EBF4 |
| P | rs4261974 | 4 | 7.97 × 10−5 | 0.586 | Der | 532 | 0.220 | 9.772 | 120221079 | 3 | SYNPO2 |
| P | rs295263 | 9 | 8.46 × 10−5 | 0.603 | Der | 364 | 0.134 | 19.792 | 4829511 | 4 | RCL1, MIR101-2 |
| P | rs10506080 | 12 | 9.63 × 10−5 | 0.592 | Der | 453 | 0.174 | 69.518 | 31825919 | 5 | H3F3C |
| P | rs321202 | 5 | 1.10 × 10−4 | 0.570 | Anc | 765 | 0.414 | 122.722 | 143107350 | 6 | HMHB1 |
| M | rs17807087 | 1 | 6.14 × 10−6 | 0.633 | Anc | 290 | 0.106 | 21.172 | 229092422 | 4 | None |
| M | rs2282315 | 6 | 1.60 × 10−5 | 0.645 | Anc | 220 | 0.086 | 53.056 | 132808430 | 3 | STX7 |
| M | rs12229163 | 12 | 2.43 × 10−5 | 0.649 | Anc | 202 | 0.076 | 24.574 | 113358516 | 2 | None |
| M | rs4646421 | 15 | 2.58 × 10−5 | 0.621 | Der | 301 | 0.108 | 302.84 | 72503662 | 6 | 6ⁱ |
| M | rs10868142 | 9 | 1.38 × 10−4 | 0.597 | Anc | 387 | 0.171 | 14.718 | 84150265 | 6 | SLC28A3 |
| M | rs952893 | 2 | 1.41 × 10−4 | 0.589 | Der | 453 | 0.175 | 125.338 | 51727031 | 13 | None |
| M | rs17066329 | 4 | 1.54 × 10−4 | 0.614 | Anc | 277 | 0.112 | 11.921 | 179845588 | 2 | None |
| M | rs1372679/rs1372680 | 4 | 1.64 × 10−4 | 0.616 | Anc | 262 | 0.094 | 20.547 | 100658226 | 3 | None |
| M | rs10216366 | 8 | 1.75 × 10−4 | 0.593 | Anc | 410 | 0.157 | 64.351 | 126776817 | 4 | None |
| M | rs16973745 | 18 | 1.82 × 10−4 | 0.627 | Anc | 216 | 0.081 | 124.932 | 36610407 | 5 | None |

ᵃ Category of TDT (C, combined; P, paternal; M, maternal).
ᵇ Rate at which the overtransmitted allele was transmitted.
ᶜ Overtransmitted allele (Anc, ancestral; Der, derived).
ᵈ Sample size (number of transmissions).
ᵉ Minor allele frequency in AGRE founders.
ᶠ Length of the region of TD (see Materials and Methods).
ᵍ Start of the region of TD (see Materials and Methods).
ʰ Number of SNPs within the TD region that have P < 0.01.
ⁱ This region contains six genes: SEMA7A, UBL7, ARID3B, CLK3, EDC3, and CYP1A1.

We performed a test for enrichment of specific gene ontologies on the collection of genes nearest to the SNP with the lowest P-value in the top regions for each test, using the DAVID bioinformatics resources website (Huang et al. 2008, 2009). For this analysis, we considered all regions (described above) with lowest P-value < 10−4 (combined) or lowest P-value < 10−3 (paternal or maternal, considering sex-specific results only). The most enriched categories were, for the combined test, “alternative splicing” (P = 0.0459; 1.59-fold enrichment); for the paternal test, “vitamin metabolic process” and “cell maturation” (P = 0.0593; 30.06-fold enrichment); and for the maternal test, “vinculin, conserved site” (P = 5.27 × 10−3; 362-fold enrichment), with additional related functional categories also enriched (see Table 3; P-values are uncorrected for multiple testing but presented for comparison among categories). The top GO categories related to combined TD signals were broad and difficult to interpret, and none of those related to paternal TD signals were enriched at P < 0.05. Intriguingly, however, the maternal-specific TD signals tagged vinculin and an α-catenin, which are unlinked but share the capacity to bind actin and are involved in cytoskeletal integrity and cell spreading. If variants in these genes influence cell division or early development, this would provide a candidate mechanism for distortion in females.
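Fold enrichment, as listed in Table 3, compares the share of list genes annotated to a term with the share of background genes annotated to it. The sketch below computes it together with a one-sided hypergeometric tail probability. This is an illustration only: DAVID's own P-value is the EASE score, a modified Fisher exact test that this sketch does not reproduce, and the gene counts in the usage lines are hypothetical.

```python
import math


def fold_enrichment(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """(share of list genes in the term) / (share of background genes in the term)."""
    return (list_hits / list_total) / (pop_hits / pop_total)


def hypergeom_upper_tail(k: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """P(X >= k) when list_total genes are drawn without replacement from pop_total genes,
    pop_hits of which carry the annotation (one-sided hypergeometric test)."""
    denom = math.comb(pop_total, list_total)
    top = min(list_total, pop_hits)
    numer = sum(math.comb(pop_hits, i) * math.comb(pop_total - pop_hits, list_total - i)
                for i in range(k, top + 1))
    return numer / denom


# Hypothetical numbers: 30 genes near top signals, 2 of which carry a term
# that annotates 5 of 15,000 background genes.
print(fold_enrichment(2, 30, 5, 15000))
print(hypergeom_upper_tail(2, 30, 5, 15000))
```

With only two annotated genes in the list, very large fold enrichments (as in Table 3) can arise from small numbers, which is one reason uncorrected P-values are shown only for comparison among categories.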

Table 3. Gene ontology terms most enriched among genes nearest top TD signals.

| Test | Term | No. genes | Fold enrichment | P-valueᵃ | FDR (%) | Annotation source |
|---|---|---|---|---|---|---|
| Combined | Alternative splicing | 13 | 1.590 | 0.0459 | 38.91 | SP_PIR_KEYWORDS |
| Combined | Splice variant | 13 | 1.586 | 0.0467 | 41.31 | UP_SEQ_FEATURE |
| Paternal | Vitamin metabolic process | 2 | 30.062 | 0.0593 | 54.26 | GOTERM_BP_FAT |
| Paternal | Cell maturation | 2 | 30.062 | 0.0593 | 54.26 | GOTERM_BP_FAT |
| Maternal | Vinculin, conserved site | 2 | 362.152 | 0.00527 | 5.31 | INTERPRO |
| Maternal | Vinculin/alpha-catenin | 2 | 289.722 | 0.00659 | 6.59 | INTERPRO |
| Maternal | Fascia adherens | 2 | 159.775 | 0.0117 | 10.95 | GOTERM_CC_FAT |
| Maternal | Intercalated disc | 2 | 106.517 | 0.0175 | 15.97 | GOTERM_CC_FAT |
| Maternal | Cell–cell junction | 3 | 12.614 | 0.0203 | 18.35 | GOTERM_CC_FAT |
| Maternal | Cell–cell adherens junction | 2 | 45.65 | 0.0403 | 33.39 | GOTERM_CC_FAT |

ᵃ Uncorrected P-values are presented for comparative purposes.

A suggestive signal of paternal TD in AGRE

In the paternal-specific TDT, there was a strong signal of TD (P = 1.77 × 10−5) in a region where experiment-wide TD was previously identified in HapMap CEU males (Santos et al. 2009). The region of TD in AGRE spans ∼711 kb on chromosome 6 surrounding SNP rs12199720, which is the strongest regional signal of paternal-specific TD in the AGRE (Table 2 and Figure 3). The finding of TD in this region in the paternal but not the maternal TDT (minimum pMaternal = 0.1037) both indicates that this TD is due to a male-specific process and suggests that the signal is not due to subtle genotyping error affecting calls for parents of both sexes.

Figure 3.

Region of suggestive paternal TD in AGRE. The region (shaded) is shown with SNPs colored by their LD with the focal SNP (rs12199720), which is circled in red. In the lower half of the figure are the genes in the region, as well as lines indicating in orange the region previously reported by Santos et al. (2009) in CEU males and in purple the amplicon used for genotyping single sperm. All positions are as mapped in HG18.

If there is truly distortion in this region, the causal SNP is likely to be regulatory: the only three nonsynonymous SNPs in this region known to be polymorphic in CEU have minor allele frequencies of <0.07 (Frazer et al. 2007; 1000 Genomes Project Consortium 2010), making them unlikely to be driving the observed TD signal. The transcription factors RUNX2 and SUPT3H and the miRNA MIRN586 all fall within the region, and top SNP rs12199720 is within an intron of both RUNX2 and SUPT3H, with the nearest exon in RUNX2. These two genes play an important role in human growth; RUNX2 is involved in osteoblastic differentiation and skeletal morphogenesis (Otto et al. 1997; Ducy et al. 2000; Wheeler et al. 2000), defects in RUNX2 cause the autosomal dominant skeletal disorder cleidocranial dysplasia (CLCD) (Mundlos et al. 1997), and a SNP in an intron of SUPT3H was suggestively associated with human height (Gudbjartsson et al. 2008). Both RUNX2 and SUPT3H have moderate transcript abundance in human testis (Wang et al. 2008), and experimental evidence indicates that RUNX2 is expressed in mouse testis during spermatogenesis (Jeong et al. 2008). A segregation distorter that affects the production or maturation of sperm would be expected to show male-specific TD, the signal observed for this region.

To determine whether SNPs in this region were associated with long-range haplotypes characteristic of selective sweeps, we examined the integrated Haplotype Score (iHS; Voight et al. 2006) and cross-population extended haplotype homozygosity (XP-EHH; Sabeti et al. 2007) at SNPs throughout the region, using statistics derived from the HapMap phase II populations (Frazer et al. 2007; Pickrell et al. 2009). Of 619 SNPs in the region with minor allele frequency ≥5% in CEU, only SNP rs9357480 has an iHS score within the 1% tail of genome-wide |iHS| (iHS = 2.74, P = 0.0043). The maximum XP-EHH score in the region when comparing CEU and YRI is 0.756 at SNP rs10508643; 44.7% of SNPs genome-wide with positive XP-EHH scores have a higher score, indicating a lack of evidence for a near-complete selective sweep in Europeans. Focal SNP rs12199720 does not have an extreme value for either iHS (−0.720, P = 0.479) or XP-EHH (−0.709, P = 0.239).

The region that we identify in AGRE overlaps almost entirely with the region identified by Santos et al. (2009); of the 733 kb spanned by the union of both regions, 708 kb is in the intersection (Figure 3). The SNP that we identify as most significant, rs12199720, had P = 5.1 × 10−4 in Santos et al. (2009); it appears that study’s sample size (14 transmissions) was insufficient to detect this SNP as experiment-wide significant. The four tagSNPs that met genome-wide significance in Santos et al. (2009) were not typed in AGRE but were in strong LD (r2 = 0.749, 0.693, 0.720, and 0.339; 1000 Genomes Project Consortium 2010) with rs12199720. Additionally, 26 of the 33 SNPs in the AGRE region that were typed in both studies and had paternal P < 0.01 in our study also had paternal P < 0.01 in Santos et al. (2009). These facts strongly suggest that the source of the TD signal in both data sets is the same.

Because multiple SNPs typed in AGRE fell within the region identified by Santos et al. (2009), we sought to determine the probability of observing a P-value as low as 1.77 × 10−5 at any one of these by chance. We therefore implemented the permutation procedure described for determining genome-wide significance (see Materials and Methods) for SNPs within this region. In 1000 random permutations, the minimum P-value for any SNP in the region was 7.05 × 10−5, suggesting that the empirical probability of observing any P-value as extreme as that which we observe here is P < 0.001.
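A regional permutation test of this kind can be sketched as follows. The actual procedure is described in Materials and Methods and is not reproduced here; this sketch assumes a common scheme in which each parental transmission (meiosis) is flipped as a whole, swapping transmitted and untransmitted alleles at every SNP simultaneously so that LD among SNPs is preserved, and the minimum TDT P-value across the region is recorded for each permutation. The data encoding and function names are hypothetical.

```python
import math
import random
from typing import List

# Hypothetical encoding: matrix[m][s] = +1 if parental meiosis m transmitted allele 1 at SNP s,
# -1 if it transmitted allele 2, and 0 if the parent was homozygous (uninformative) there.
Matrix = List[List[int]]


def tdt_min_p(matrix: Matrix) -> float:
    """Smallest TDT P-value across all SNPs (columns) in the region."""
    best = 1.0
    for s in range(len(matrix[0])):
        b = sum(1 for row in matrix if row[s] == 1)
        c = sum(1 for row in matrix if row[s] == -1)
        if b + c == 0:
            continue
        stat = (b - c) ** 2 / (b + c)
        best = min(best, math.erfc(math.sqrt(stat / 2)))  # chi-square(1) tail
    return best


def regional_empirical_p(matrix: Matrix, n_perm: int = 1000, seed: int = 1):
    """Return (observed minimum P, fraction of permutations with a minimum P at least as small)."""
    observed = tdt_min_p(matrix)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        permuted = []
        for row in matrix:
            flip = rng.choice((-1, 1))  # swap transmitted/untransmitted for the whole meiosis
            permuted.append([x * flip for x in row])
        if tdt_min_p(permuted) <= observed:
            hits += 1
    return observed, hits / n_perm
```

When no permuted minimum is as small as the observed one, the estimate is 0, which is reported as an upper bound of 1/n_perm (here, P < 0.001 for 1000 permutations).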

An alternative to calculating the probability of observing such a strong signal in the same region by chance is to analyze our data together with the HapMap CEU data from Santos et al. (2009) as a meta-analysis. When we combined the inferred counts of each allele transmitted at top AGRE SNP rs12199720 in the HapMap CEU with those obtained from AGRE, the resulting TDT P-value was 1.64 × 10−6, and the empirical P-value estimate was 0.072. This permutation-based estimate is slightly inaccurate because the HapMap CEU transmissions added at this locus were not included in the permutations; however, this effect should be small. Thus, the meta-analysis suggests that our combined findings are somewhat unlikely to arise by chance, but not compellingly so.

We investigated whether FHS and HUTT also showed evidence of TD in this region. The Affymetrix platform does not include top AGRE SNP rs12199720 but does include many other SNPs in this region. Of the 72 SNPs within the region that pass QC in FHS, only two have paternal TDT P < 0.01; only one of these (rs16873103) is supported by other SNPs in LD, and this SNP is not in strong LD with rs12199720 (r2 = 0.06 in AGRE). Likewise, none of the 75 SNPs in the region that pass QC in HUTT have paternal TDT P < 0.01. The lack of signal in FHS is not due to a lack of power: given the estimated distortion strength of 0.1187, there is an estimated 83.2% power to detect P < 0.01 with the minimum observed 223 transmissions. In HUTT, however, limited power may explain the lack of signal, with only 21.7% power to detect P < 0.01 with the minimum observed 58 transmissions. Of the 62 HUTT SNPs with >80% power to detect P < 0.05, six have P < 0.05. Five of these were typed in AGRE, and all were in moderate LD with rs12199720 (r2 from 0.234 to 0.306 in AGRE), indicating that they may reflect the same signal.

To investigate whether the failure to replicate in FHS could be due to a platform-specific technical artifact, we also performed the TDT on a subset of AGRE individuals who had been genotyped on both Affymetrix and Illumina platforms. Eight of the 53 Illumina-specific SNPs in the region had P < 0.01 in this subset, compared with 7 of 58 Affymetrix-specific SNPs, and 2 of 14 overlapping SNPs. This indicates that the signal is not platform specific; therefore, the lack of replication in FHS is particularly worrisome and suggests that the signal in the other data sets is unlikely to be driven by real TD.

Sperm typing and motility assays

Because there was evidence for TD in 6p21.1 in HapMap and AGRE but no evidence in FHS and HUTT, we sought to determine whether functional assays would independently support this as a TD region. To test for evidence of distortion during meiosis, we assayed SNPs within a 1914-bp region (bp 45,283,735–45,285,648 on chromosome 6) within the SUPT3H gene using single-molecule amplification (SMA) in mature sperm (Figure 3). We genotyped one to three heterozygous SNPs in the amplified SMA reactions and used the counts of each allele to test for a deviation from 50%.

We screened on average 370 sperm molecules per donor across seven different Caucasian donors. The data were consistent with our expectation that approximately 10–15% of the reactions would have more than one molecule, with half of these detectable as heterozygotes (see Materials and Methods). None of the observed counts from donor-matched blood controls or from sperm that were not selected by motility assay deviated significantly from 50% of each allele (Table 4), indicating a lack of evidence for TD during male meiosis or the formation of mature sperm.

Table 4. Sperm typing data from single DNA molecules characterized for transmission distortion.

| Donor ID | SNPs | OHap/UHapᵃ | Blood Oᵇ | Blood Uᶜ | Blood Pᵈ | Unselected sperm Oᵇ | Unselected sperm Uᶜ | Unselected sperm Pᵉ | Fast sperm Oᵇ | Fast sperm Uᶜ | Fast sperm Pᵉ | Prop. fast sperm (%)ᶠ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | rs9381373, rs2093903 | TA/CC | 129 | 153 | 0.171 | 142 | 119 | 0.0866 | NA | NA | NA | NA |
| 1 | rs9381373, rs2093903 | TA/CC | 134 | 141 | 0.718 | 143 | 140 | 0.453 | NA | NA | NA | NA |
| 1006 | rs9381373, rs2093903 | TA/CC | 123 | 131 | 0.661 | 150 | 155 | 0.634 | NA | NA | NA | NA |
| 19 | rs9381373, rs1284965 | TG/CA | 186 | 210 | 0.248 | 249 | 251 | 0.553 | NA | NA | NA | NA |
| 21 | rs9381373, rs2093903 | TA/CC | NA | NA | NA | 173 | 200 | 0.926 | NA | NA | NA | NA |
| 37 | rs9381373, rs2093903 | TA/CC | NA | NA | NA | 109 | 124 | 0.853 | 101 | 111 | 0.775 | 0.922 |
| 45 | rs1284965 | G/A | NA | NA | NA | 222 | 211 | 0.361 | 121 | 116 | 0.398 | 5.155 |
| 2 | rs9381373, rs2093903 | TA/CC | NA | NA | NA | NA | NA | NA | 123 | 136 | 0.808 | 0.015 |
| 6 | rs9381373, rs1284965, rs2093903 | TGA/CAC | NA | NA | NA | NA | NA | NA | 112 | 134 | 0.929 | 0.007 |
| 8 | rs9381373, rs1284965, rs2093903 | TGA/CAC | NA | NA | NA | NA | NA | NA | 110 | 149 | 0.994 | 0.005 |
| Total | | | 572 | 635 | 0.0743 | 1067 | 1084 | 0.651 | 567 | 646 | 0.989 | |

ᵃ Haplotype expected to be overrepresented/haplotype expected to be underrepresented on the basis of the TDT in AGRE.
ᵇ Count of the haplotype expected to be overrepresented on the basis of the TDT.
ᶜ Count of the haplotype expected to be underrepresented on the basis of the TDT.
ᵈ Probability of observing at least as strong a deviation from 50% of each allele (two-sided binomial test).
ᵉ Probability of observing at least as many copies of the allele expected to be overrepresented as observed (one-sided binomial test).
ᶠ Proportion of total sperm selected by the motility assay as fast sperm.

We also tested whether TD in this region might influence sperm motility by assaying allelic ratios in sperm fractions containing only the fastest sperm, obtained as described in Materials and Methods. None of the samples of fastest sperm showed a statistically significant deviation from 50% of each allele (Table 4). When we combined the sperm genotyping data for all donors, the allele transmission rate inferred from sperm typing differed significantly from that inferred using the TDT (P = 1.341 × 10−4).

On the basis of transmission rates from the lowest TDT P-value SNP in the region, we estimate that the distortion strength in the region is 11.87%, with a normal approximation 95% confidence interval of [7.7%, 16.0%]. Our power to detect distortion strength of 7.7% at α = 0.05 is 70.4% with a sample size of 261 and 73.9% with a sample size of 283 (the two lowest sample sizes for sperm genotyping), suggesting that we do not lack power to detect distortion in sperm unless the true distortion strength is substantially lower than estimated here.
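The sperm-typing power estimates and the confidence interval in the paragraph above can be approximated as follows. This is a sketch under stated assumptions: distortion strength d is interpreted as τ − 0.5, power is computed for an exact two-sided binomial test at α = 0.05 (the sidedness used for the power calculation is not stated here), and the Wald interval uses the 532 paternal transmissions listed for rs12199720 in Table 2 (an assumption about which sample size was used). The results come out close to, but not exactly equal to, the reported values.

```python
import math
from statistics import NormalDist


def _pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass; adequate in floating point for n up to about 1000."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_two_sided_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Power of an exact two-sided binomial test of a 50:50 ratio when the true rate is 0.5 + d."""
    # Smallest k with P(X >= k | 0.5) <= alpha / 2; reject if X >= k or X <= n - k.
    tail, k = 0.0, n + 1
    for j in range(n, -1, -1):
        tail += _pmf(j, n, 0.5)
        if tail > alpha / 2:
            break
        k = j
    tau = 0.5 + d
    upper = sum(_pmf(j, n, tau) for j in range(k, n + 1))
    lower = sum(_pmf(j, n, tau) for j in range(0, n - k + 1))
    return upper + lower


def wald_ci_strength(d: float, n: int, level: float = 0.95):
    """Normal-approximation (Wald) interval for d, using the standard error of tau = 0.5 + d."""
    tau = 0.5 + d
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(tau * (1 - tau) / n)
    return d - half_width, d + half_width


print(exact_two_sided_power(0.077, 261), exact_two_sided_power(0.077, 283))
print(wald_ci_strength(0.1187, 532))
```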

Candidate region for TD in AGRE

In the combined test, one SNP (rs748001) achieved genome-wide significance (Table 2). As defined, the region around this SNP contains no genes, but it does contain several regions that are among the most highly conserved elements in vertebrates (i.e., within the 5000 most highly conserved elements out of 1.31 million total conserved elements; Siepel et al. 2005) (Figure 2). The maximum range at which loci are in LD (r2 > 0.1; 1000 Genomes Project Consortium 2010) with the focal SNP contains all of LOC100169752 and ∼22 kb of the 3′ end of C10ORF122, including one exon. Notably, focal SNP rs748001 is associated with a signal of recent directional selection, falling within the tail of iHS signals (iHS = 2.181, P = 0.021) in the HapMap II CEU (Frazer et al. 2007; Pickrell et al. 2009) (Figure 2). This SNP is also in LD with two SNPs that have strongly negative iHS: rs4962310 (iHS = −2.05, P = 0.014, r2 with rs748001 in CEU = 0.469) and rs11244542 (iHS = −2.01, P = 0.016, r2 with rs748001 in CEU = 0.45), and the overtransmitted allele at rs748001 is in phase with the derived allele at both of these SNPs (Frazer et al. 2007). We note, however, that none of the seven SNPs within this region that were genotyped in the FHS and HUTT had TDT P < 0.01 in these data sets. True distortion of strength 3.8% in FHS and 6.2% in HUTT would be required for 80% power at two or more of these SNPs. Because of the winner’s curse (see Ioannidis et al. 2001; Göring et al. 2001; Lohmueller et al. 2003), estimating effect size from our data would yield a substantial overestimate, so it is unclear whether the true distortion strength is large enough to achieve power in these other data sets; nevertheless, the failure to replicate in FHS suggests the absence of strong TD in this region.

Analysis of maternal TD near centromeres and telomeres

We evaluated the prevalence of TD at loci closely linked to centromeres, because these sites are likely to segregate with the untyped centromeric repeats proposed to be subject to female-specific meiotic drive. We found only one example of a maternal-specific (paternal TDT P > 0.01) SNP with P < 10−3 within 1 cM of the centromere (on chromosome 10) in AGRE, with 116 SNPs separating this SNP from the centromere; the next nearest SNP with P < 10−3 was separated from the centromere by 256 SNPs (Figure 4). Across all chromosomes, the nearest SNPs to all centromeres had maternal TDT P > 0.05 except chromosome arms 3q (P = 0.021) and 19q (P = 5.63 × 10−3). The only other SNP in strong LD (r2 > 0.6) with the most centromeric 3q SNP had maternal P = 0.261 and a stronger signal in the paternal TDT (P = 0.030), so this may be a spurious signal due to genotyping error. The SNP on chromosome 19q also had a lower P-value in the paternal TDT (P = 2.18 × 10−3), so if it is a true signal of TD, it is unlikely to be due to a mechanism specific to asymmetric meioses. With the possible exception of chromosome 22, the lack of signal was not due to SNP sparsity near centromeres; at least 25 SNPs within 1 cM of the centromere passed QC on all chromosomes except 15 (four SNPs within 1 cM) and 22 (minimum distance 2.22 cM).

Figure 4.

No strong signals of maternal-specific TD near centromeres or telomeres in AGRE. Genetic distances to (A) the centromere and (B) the telomere for all SNPs with paternal TDT P > 0.01 within 3 cM of the centromere (A) or telomere (B) are plotted against the SNPs’ maternal TDT P-values. All SNPs with P < 10−3 are colored, with the number of SNPs separating them from the centromere (A) or telomere (B) listed next to them. *The genetic distance to the most telomeric SNP in HapMap (Frazer et al. 2007) is used to approximate the genetic distance to the telomere.

We performed a similar analysis for the most distal SNPs typed in our data set, the strongest candidates for telomeric drive. None of the SNPs nearest the telomere had maternal TDT P < 10−3 and paternal TDT P > 0.01; the most distal maternal-specific SNP with P < 10−3 (on chromosome 19q) was separated from the telomere by 164 SNPs. Several of the most distal SNPs had P < 0.05, however, namely those on chromosomes 4p, 8p, and 9p. These SNPs were 63.5, 155.0, and 36.6 kb, respectively, from the proximal end of the telomeres. The genetic distance between SNPs in the data set and the telomere cannot be fully measured because of the inability to assess recombination events occurring between the most distal HapMap SNPs and the telomeres; however, all chromosome arms (excluding the p arms of acrocentric chromosomes) contained at least five SNPs within 1 cM of the most distal HapMap SNP except 1p (minimum distance 2.02 cM).

Discussion

We used two-generation pedigrees from contemporary human populations to look for ongoing TD using the TDT. This approach is known to be highly sensitive to genotyping error (Mitchell et al. 2003; Paterson et al. 2009). We observed the influence of genotyping error on our results, particularly in those data sets that were genotyped on Affymetrix genotyping arrays and called with BRLMM (Affymetrix 2006). The failure to validate TD results for the top HUTT SNPs that were regenotyped using a different technology further encourages caution in the interpretation of the strongest signals in FHS. It also suggests that, for other uses of genotyping data that may be extremely sensitive to error, results from the TDT (treating all individuals as affected) could be used to identify problematic SNPs on array-based genotyping platforms. Nonetheless, genotyping error is highly unlikely to produce signals of TD that span a broad region, encompassing many SNPs. Because we observe such regional signals, particularly in AGRE, our results cannot be entirely due to false positives resulting from genotyping error.
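For readers less familiar with the TDT (Spielman et al. 1993), here is a minimal sketch of the basic statistic when every child is counted, that is, when all individuals are treated as affected. Genotypes are assumed to be coded 0/1/2 as copies of allele A, and the helper names are hypothetical.

- Each heterozygous parent contributes either one transmission (b) or one non-transmission (c) of A.
- The statistic (b − c)²/(b + c) is referred to a χ² distribution with 1 df.

A single miscalled child genotype either moves a count between b and c or is caught only if it creates a Mendelian inconsistency. Errors that slip past QC therefore translate directly into apparent distortion, as discussed by Mitchell et al. (2003).

```python
import math
from typing import Iterable, Tuple

# Genotypes are coded as the number of copies (0, 1, 2) of allele A.
Trio = Tuple[int, int, int]  # (father, mother, child)


def tdt_counts(trios: Iterable[Trio]) -> Tuple[int, int]:
    """Transmissions (b) and non-transmissions (c) of allele A from heterozygous parents.

    Every child is counted, i.e. all individuals are treated as "affected".
    """
    b = c = 0
    for father, mother, child in trios:
        hets = (father == 1) + (mother == 1)
        if hets == 0:
            continue  # no heterozygous parent: uninformative for the TDT
        # Homozygous parents transmit a known allele (genotype 2 -> A, genotype 0 -> a).
        fixed = sum(g // 2 for g in (father, mother) if g != 1)
        from_hets = child - fixed  # copies of A that must have come from het parents
        if not 0 <= from_hets <= hets:
            raise ValueError(f"Mendelian inconsistency in trio {(father, mother, child)}")
        b += from_hets
        c += hets - from_hets
    return b, c


def tdt(b: int, c: int) -> Tuple[float, float]:
    """TDT statistic (b - c)^2 / (b + c) and its chi-square (1 df) P-value."""
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > x) = erfc(sqrt(x / 2))


# Hypothetical trios (father, mother, child).
trios = [(1, 0, 1), (1, 0, 0), (1, 1, 1), (1, 1, 2), (2, 1, 2), (1, 2, 2)]
b, c = tdt_counts(trios)
chi2, p = tdt(b, c)
print(b, c, chi2, round(p, 3))  # prints: 6 2 2.0 0.157
```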

Unlike other types of genotyping error, unidentified copy number variants (CNVs) could produce spurious TDT signals that span multiple SNPs; however, several lines of evidence argue against CNVs underlying our strongest signals. First, CNVs common enough to yield strong TD signals should produce numerous Mendelian errors and deviations from HWE, so SNPs in these regions would have been eliminated in our QC steps. Some rare cases, for instance duplications with more than one polymorphic paralog, may be more difficult to detect. To rule out the possibility that an interchromosomal duplication or paralog caused the genome-wide significant signal on chromosome 10, we verified that there was no LD between the region and other segments of the genome. Additionally, no CNVs in this region are listed in the Database of Genomic Variants, an online database of published CNVs (http://projects.tcag.ca/variation/). Finally, CNVs can produce a sex-specific signal of distortion only if one of the copies resides on a sex chromosome, and in this case the distortion should differ between male and female offspring. We found no difference in the distortion rate between male and female offspring in the region of suggestive paternal-specific TD on chromosome 6p (P = 0.5379), so this signal cannot be attributed to a duplication or paralog on a sex chromosome.
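Comparing distortion between sons and daughters can be framed as a 2 × 2 test of homogeneity on transmitted and untransmitted counts. The section reports P = 0.5379 without naming the test, so the Pearson χ² test below (no continuity correction) is an assumption. The counts in the usage line are made up.

```python
import math


def transmission_homogeneity_p(t_sons: int, u_sons: int,
                               t_daughters: int, u_daughters: int) -> float:
    """Pearson chi-square test (1 df, no continuity correction) that the transmission
    rate of the focal allele is the same to sons and to daughters.

    t_* / u_* are counts of transmitted / untransmitted copies from heterozygous parents.
    """
    table = ((t_sons, u_sons), (t_daughters, u_daughters))
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in rows or 0 in cols:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Hypothetical paternal transmission counts in one region.
print(transmission_homogeneity_p(t_sons=60, u_sons=45, t_daughters=55, u_daughters=50))
```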

In addition to false positives, another concern is that our loose filter based on deviation from HWE (P < 10⁻⁴) could cause us to miss true signals of TD. We used this filter to eliminate SNPs with unusual genotype proportions due to genotyping error; however, strong viability selection or segregation distortion could also produce a deviation from HWE. With this in mind, we investigated the strength of selection or distortion necessary to create a deviation of P < 10⁻⁴. We generated genotypes at random for all founders, using the expected frequencies under viability selection or sex-specific drive (assuming a 1:1 sex ratio). We then performed the exact test of Hardy–Weinberg on 1000 such simulated data sets. We found that, even with s = 0.5 and h = 0.5, <0.1% of cases of viability selection generated a deviation from HWE strong enough to be detected by our filter. In contrast, 18.2% of cases with (unbalanced) sex-specific distortion equal to 30% generated such a deviation. We conclude that loci experiencing very strong segregation distortion and not subject to a counterbalancing force may occasionally deviate from HWE and be excluded from our analysis, but that the effects of viability selection on HWE are negligible.
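The simulation described above can be sketched as follows. Several choices are assumptions not stated in the text:

- fitnesses of 1, 1 − hs, and 1 − s for AA, Aa, and aa;
- male-only drive parameterized by t, the probability that an Aa father transmits A (t = 0.8 is only one possible reading of "30%");
- an arbitrary founder count and allele frequency.

The exact test uses the recurrence of Wigginton, Cutler, and Abecasis (2005), which may differ from the implementation used in the study. The printed fractions depend strongly on these choices and are not expected to reproduce the percentages reported above.

```python
import random
from typing import List, Sequence


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided exact test of Hardy-Weinberg equilibrium (Wigginton et al. 2005 recurrence)."""
    n = n_het + n_hom1 + n_hom2
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    if n == 0 or rare == 0:
        return 1.0
    probs = [0.0] * (rare + 1)  # unnormalized P(het count | allele counts)
    mid = rare * (2 * n - rare) // (2 * n)  # near the most likely het count
    if (rare - mid) % 2:
        mid += 1  # het count must have the same parity as `rare`
    probs[mid] = 1.0
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:  # walk toward fewer heterozygotes
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= rare - 2:  # walk toward more heterozygotes
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1
    p_obs = probs[n_het]
    tail = sum(x for x in probs if x <= p_obs * (1 + 1e-9))
    return min(1.0, tail / sum(probs))


def viability_freqs(p: float, s: float, h: float) -> List[float]:
    """(AA, Aa, aa) among survivors; assumed fitnesses 1, 1 - h*s, 1 - s; zygotes in HWE."""
    q = 1.0 - p
    w = [p * p, 2 * p * q * (1 - h * s), q * q * (1 - s)]
    total = sum(w)
    return [x / total for x in w]


def male_drive_freqs(p: float, t: float) -> List[float]:
    """(AA, Aa, aa) in offspring when Aa fathers transmit A with probability t,
    mothers are Mendelian, and parents are in HWE with A at frequency p."""
    q = 1.0 - p
    p_sperm = p * p + 2 * p * q * t
    p_egg = p
    return [p_sperm * p_egg,
            p_sperm * (1 - p_egg) + p_egg * (1 - p_sperm),
            (1 - p_sperm) * (1 - p_egg)]


def fraction_flagged(freqs: Sequence[float], n_founders: int, n_reps: int = 1000,
                     alpha: float = 1e-4, seed: int = 1) -> float:
    """Fraction of simulated founder samples whose HWE exact-test P is below alpha."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_reps):
        draws = rng.choices((0, 1, 2), weights=freqs, k=n_founders)  # 0=AA, 1=Aa, 2=aa
        n_hom1, n_het, n_hom2 = draws.count(0), draws.count(1), draws.count(2)
        if hwe_exact_p(n_het, n_hom1, n_hom2) < alpha:
            flagged += 1
    return flagged / n_reps


# Hypothetical settings: 1000 founders, allele frequency 0.5.
print(fraction_flagged(viability_freqs(0.5, s=0.5, h=0.5), n_founders=1000))
print(fraction_flagged(male_drive_freqs(0.5, t=0.8), n_founders=1000))
```

With p = 0.5, s = 0.5, and h = 0.5, for instance, survivors have a heterozygote frequency of 0.5 against an HWE expectation of about 0.486. This is only a small excess, consistent with the conclusion that viability selection rarely trips the filter.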

Given the possibility of filtering sex-specific TD alleles due to departures from HWE, we checked whether this may have affected our analysis of maternal TD near centromeres and telomeres. In the maternal TDT, we filtered 614 SNPs due to deviations from HWE alone, and two of these had maternal P < 0.05 and were the nearest SNPs to the centromere: rs10439884 on chromosome 21p (the only SNP in the data set on chromosome 21p) and rs2873665 on chromosome 14q. These SNPs also have P < 0.05 in the paternal TDT, however, indicating that any real TD at these loci is unlikely to be due to a mechanism that relies on asymmetric meioses. All of the other six filtered SNPs within 1 cM of the centromere with maternal P < 0.05 were separated from the centromere by at least two nonfiltered SNPs. With the exception of the lone SNP on chromosome 21p, none of the SNPs filtered for deviations from HWE with maternal TDT P < 0.05 were separated from the telomeres by fewer than 16 SNPs. We therefore conclude that our investigation of maternal TD near centromeres and telomeres is unaffected by the filtering of SNPs that deviate from HWE.

Because of the sensitivity of the TDT to genotyping error, it is difficult to reach any general conclusion about the prevalence of TD in humans from these data. In addition to genotyping error, which can generate false-positive signals of TD, several factors reduce our power to detect weak to moderate TD: limited sample sizes, a bias toward the null hypothesis in the unphased parent-specific TDT, and our conservative requirement that TD signals span multiple SNPs to be believable. These considerations may help to explain why we find only one region meeting genome-wide significance in the data set with the highest-quality genotyping data, and why this signal does not replicate in our other data sets.
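One way an unphased parent-specific TDT can be biased toward the null is illustrated below. When both parents and the child are heterozygous, it is unknown which parent transmitted A. The sketch assumes, hypothetically, that such trios are split evenly: half a transmission and half a non-transmission are credited to each parent. The section does not say how ambiguous trios were handled. Splitting adds equal amounts to b and c, which inflates the denominator of (b − c)²/(b + c) without changing the numerator. Genotypes are coded as 0/1/2 copies of allele A, and all names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Trio = Tuple[int, int, int]  # (father, mother, child): copies of allele A


def parent_specific_counts(trios: Iterable[Trio],
                           split_ambiguous: bool = True) -> Dict[str, Tuple[float, float]]:
    """Transmitted (b) / untransmitted (c) counts of A from heterozygous fathers and mothers."""
    counts: Dict[str, List[float]] = {"paternal": [0.0, 0.0], "maternal": [0.0, 0.0]}
    for father, mother, child in trios:
        if father == 1 and mother == 1:
            if child == 1:
                # Phase unknown: either parent may have transmitted A.
                if split_ambiguous:  # hypothetical rule: half a T and half a U to each parent
                    for v in counts.values():
                        v[0] += 0.5
                        v[1] += 0.5
                continue
            # Child AA or aa: both parents transmitted the same allele.
            for v in counts.values():
                v[0 if child == 2 else 1] += 1
            continue
        for key, parent, other in (("paternal", father, mother), ("maternal", mother, father)):
            if parent != 1:
                continue
            from_parent = child - other // 2  # the other parent is homozygous here
            if from_parent not in (0, 1):
                raise ValueError(f"Mendelian inconsistency in trio {(father, mother, child)}")
            counts[key][0 if from_parent == 1 else 1] += 1
    return {k: (v[0], v[1]) for k, v in counts.items()}


def tdt_chi2(b: float, c: float) -> float:
    """TDT statistic (b - c)^2 / (b + c); 0 when there are no informative transmissions."""
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


# Hypothetical data: six A and two a transmissions from Aa x aa matings,
# plus four phase-ambiguous Aa x Aa -> Aa trios.
trios = [(1, 0, 1)] * 6 + [(1, 0, 0)] * 2 + [(1, 1, 1)] * 4
split = parent_specific_counts(trios)
dropped = parent_specific_counts(trios, split_ambiguous=False)
print(round(tdt_chi2(*split["paternal"]), 3), tdt_chi2(*dropped["paternal"]))  # prints: 1.333 2.0
```

Dropping ambiguous trios instead avoids adding equal counts, but it discards those families entirely.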

One suggestive paternal signal on chromosome 6p21.1 in AGRE overlaps almost entirely with a male-specific signal previously identified as experiment-wide significant in a small European sample (Santos et al. 2009). When we examine all our findings jointly with those of Santos et al., the balance of the evidence argues against true TD in this region. The primary line of evidence in support is the identification of TD in the same region in two independent European data sets, in both cases in fathers only. This repeated finding does not necessarily demonstrate, however, that the signal represents a true biological phenomenon; it may instead be due to an artifact of both data sets. The observation of suggestive TD (paternal TDT P < 0.01) specific to fathers (maternal TDT P > 0.1) at 46 SNPs in strong LD demonstrates that locus-specific genotyping error cannot be responsible for the signal in AGRE. In addition, the genotyping data supporting the signal in AGRE derive from the Illumina Human Hap550 platform, which tends to have lower rates of error-driven false positives than Affymetrix platforms (Figure S2).

However, a number of lines of evidence fail to support real TD in this region. First, our data from FHS and HUTT do not display evidence of paternal TD in this region. Power in HUTT is somewhat limited given the strength of distortion estimated in AGRE, and 10% of highly powered SNPs do show marginal TD in this data set, so this is not a clear failure to replicate; additionally, a distorter allele could have been lost through a founder effect in HUTT. The absence of a signal in FHS, however, cannot be explained by these considerations. Possible reasons for a lack of replication in FHS include that (i) the LD between SNPs on the arrays and the causal SNP differs among data sets, (ii) the observed TD is population specific, or (iii) locus-specific genotyping error is obscuring the signal in FHS. These explanations seem unlikely for three reasons. First, the FHS and AGRE samples were chosen to have similar ancestries. Second, the TD signal has been observed in two distinct data sets, one of which (AGRE) has somewhat heterogeneous ancestry. Third, multiple highly powered SNPs within the region fail to replicate in FHS. Moreover, at least on the basis of a limited number of individuals typed on both platforms, there appeared to be no difference between platforms in the ability to detect a signal in AGRE.

In addition, we do not observe traditional signatures of a selective sweep in this region; such signatures may be expected at loci subject to TD, given the distorter’s rapid trajectory through the population in the absence of long-term balancing forces. When we considered two statistics sensitive to these signatures, iHS and XP-EHH, this region was not notable in the CEU. If this region represents ongoing male-specific distortion, therefore, this distortion must act without generating a high-frequency variant on a long haplotype that can be detected by iHS and XP-EHH. This could potentially occur if selection were weaker than the measured 11.87% distortion strength implies. The occurrence of distortion in only one sex weakens the strength of selection twofold, and its occurrence only in heterozygotes weakens it further. The distorter in this region could also be counterbalanced by deleterious effects when homozygous, as in the known examples in other organisms, which would further weaken the strength of selection. Finally, a distorter or distorters could exist on multiple haplotypes, which would reduce the power of iHS or XP-EHH to detect the resulting sweep. The lack of strong LD between the focal SNP and several other SNPs within the region with P-values < 10⁻³ may indicate the presence of at least two haplotypes contributing to the TD signal (Figure 3).
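The argument about the strength of selection can be made concrete with a one-locus recursion. The sketch rests on these assumptions:

- random mating, with parents in Hardy–Weinberg proportions each generation (an approximation, since drive itself perturbs genotype proportions);
- no fitness effects;
- Aa fathers transmit A with a hypothetical probability t, while mothers are Mendelian.

Under these assumptions the per-generation change is Δp = pq(2t − 1)/2. The factor 1/2 arises because drive acts in one sex only, and the factor pq because only heterozygous fathers (frequency 2pq) can distort. How the reported 11.87% maps onto t is not specified here, so t = 0.6 below is purely illustrative.

```python
def next_freq(p: float, t: float) -> float:
    """Frequency of allele A after one generation of male-only drive.

    Assumptions: random mating, parents in HWE, Aa fathers transmit A with probability t,
    mothers transmit Mendelian, no other selection.
    """
    q = 1.0 - p
    p_sperm = p * p + 2 * p * q * t  # = p + p*q*(2t - 1): only Aa fathers (freq 2pq) distort
    p_egg = p                        # no drive in females
    return (p_sperm + p_egg) / 2     # = p + p*q*(2t - 1)/2: the one-sex factor of 1/2


def generations_to(p0: float, p_target: float, t: float, max_gen: int = 100_000) -> int:
    """Generations for A to rise from p0 to at least p_target under next_freq (needs t > 0.5)."""
    p, gen = p0, 0
    while p < p_target:
        if gen >= max_gen:
            raise RuntimeError("target not reached; is t > 0.5?")
        p, gen = next_freq(p, t), gen + 1
    return gen


# Illustrative only: t = 0.6 is an assumed transmission probability.
print(generations_to(0.01, 0.99, t=0.6))
```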

Also arguing against a real effect, the available functional data from sperm do not support a role for this region in spermatogenesis or sperm motility. When we genotyped both unselected sperm and the fastest sperm from heterozygous males, we observed no significant deviation from 50% of each allele in any of the sperm samples from 10 Austrian donors. Furthermore, the allele ratios inferred from sperm typing were significantly different from those inferred from the TDT. At least two scenarios involving real TD could produce this discrepancy between sperm typing and the TDT. First, the distortion could be heterogeneous among males, with distorters and nondistorters differing in genetic background or environmental conditions. Second, the distortion could occur through a mechanism that the assays performed here do not sufficiently capture, such as an effect on sperm survival in the female reproductive tract or on the capacity to fertilize the egg. Alternatively, the discrepancy between the TDT and sperm-typing results could simply reflect the absence of real TD in this region.
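A per-donor check for departure from a 50:50 allele ratio can be sketched as an exact two-sided binomial test on allele counts from typed sperm molecules. This is an assumption: the section does not name the test used. The sketch also tests against exactly 0.5 rather than against the donor's own blood-DNA ratio, which a real analysis would need in order to absorb assay bias. The counts in the usage line are hypothetical. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Two-sided exact binomial test of H0: each molecule carries allele A with probability 0.5.

    k of n typed molecules carry A. Under H0, P(X = i) = C(n, i) / 2**n, so the P-value
    sums all outcomes no more probable than the observed one.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    weights = [comb(n, i) for i in range(n + 1)]  # exact integer numerators
    return sum(w for w in weights if w <= weights[k]) / 2 ** n


# Hypothetical sperm-typing counts for one heterozygous donor.
print(binom_two_sided_p(k=260, n=500))
```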

Altogether, given the lack of replication in FHS, of a long-haplotype signature of selection, and of functional validation in sperm, the most parsimonious explanation of the TD signal in 6p21.1 is chance fluctuation in male transmission rates in both HapMap and AGRE. Nonetheless, the detection of nearly identical large regions displaying TD in fathers only in two independent data sets is intriguing and, in our view, warrants further investigation of this region in future pedigree or sperm analyses.

Our scan also revealed a candidate region for TD in both parents on chromosome 10, surrounding SNP rs748001, which achieved genome-wide significance in AGRE (Figure 2). The presence of multiple SNPs with low P-values (P < 0.01) in strong LD with this SNP provides evidence that this signal is highly unlikely to be driven by genotyping error. This SNP is also within the tail of iHS signals in the HapMap II CEU (empirical P = 0.021) (Frazer et al. 2007; Pickrell et al. 2009). Interestingly, the overtransmitted haplotype in the TDT contains the ancestral allele at the focal SNP. The iHS is designed to identify selective sweeps on new mutations, so the ancestral allele at rs748001 may be in LD with a derived allele that is experiencing a selective sweep. SNPs rs4962310 and rs11244542, both of which have strongly negative iHS (a signature of selection on the derived allele) and derived alleles on the same haplotype as the ancestral allele at rs748001, are candidates for a selected site tagged by rs748001. When combined with the evidence of TD in the region, these details suggest that this region may be undergoing selection or segregation distortion in contemporary humans. On the other hand, this signal is not replicated in FHS, in which power should be high unless distortion is very weak (see Results). Thus, the TD that we observe here may still be due to strong chance fluctuations in transmission; replication is required to support the conclusion that there is TD in the region.

In addition to identifying specific candidate TD regions, we used these data to assess the evidence for ongoing, strong maternal-specific TD near centromeres and telomeres. If centromeric drive is currently causing the rapid evolution of human centromeric repeats and high rates of nondisjunction in contemporary human females (Zwick et al. 1999; Malik and Henikoff 2002), we might expect to observe evidence of ongoing maternal TD near one or more centromeres. Yet we found no such evidence at the most centromeric SNPs, which suggests that there is little or no ongoing strong centromeric drive in humans. Centromeric drive may nonetheless play a role in the evolution of human centromeres if it occurs through rapid sweeps of alternative centromeric types at discontinuous intervals, such that no allele currently undergoing a sweep is at high enough frequency to be detected. This scenario would require a previous drive allele, now fixed, or a drive-suppressor allele to be responsible for the high rates of female nondisjunction currently observed (Hassold and Hunt 2001).

In summary, our findings highlight several candidate regions with suggestive evidence of TD in the human genome and provide interesting hints about the nature of TD in females, but they remain limited by the difficulty of working with error-rich genotype data from a nonmodel organism. The imminent availability of high-quality resequencing data from pedigrees (e.g., Drmanac et al. 2010), together with more complete annotations of CNVs, should nonetheless allow similar approaches to elucidate selective processes operating in contemporary populations. Sperm genotyping and motility assays, such as those conducted here, will also be particularly useful because the internal blood control can protect against spurious results due to genotyping error. Future studies implementing such assays at more loci and in more individuals, with more single molecules typed per individual, could provide mechanistic insights into loci influencing regional transmission rates in males.

Supplementary Material

Supporting Information

Acknowledgments

We thank Graham Coop, Martin Kreitman, Guy Sella, Andrew Skol, Matthew Stephens, and members of the Pritchard, Przeworski, and Stephens labs for helpful discussions, as well as Stephen Wright, David Cutler, and two anonymous reviewers. We are grateful to Jonathan Pritchard for the suggestion of experimental assays in sperm, Adi Fledel-Alon and Ellen Leffler for help working with the Framingham Heart Study (FHS), Hutterites (HUTT), and Autism Genetic Resource Exchange (AGRE) data sets, Cord Melton for help with the 1000 Genomes data, Joe Pickrell for data and discussion of tests for selection, Bryan Howie for assistance with imputation, and Kevin Ross for performing iPlex genotyping and help interpreting the resulting data. Finally, we gratefully acknowledge the resources provided by the Framingham Heart Study and the Autism Genetic Resource Exchange Consortium and the participating families. The Framingham Heart Study and the Framingham SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. The Framingham SHARe data used for the analyses described in this article were obtained through dbGaP (accession nos. phs000007.v5.p3 and phs000007.v6.p3). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI. The Autism Genetic Resource Exchange is a program of Autism Speaks and is supported, in part, by grant 1U24MH081810 from the National Institute of Mental Health to Clara M. Lajonchere (PI). W.K.M. was partially supported by National Institutes of Health (NIH) Grant T32 GM007197. The sperm typing work was supported by the Austrian Science Fund (FWF) [P23811000]. This work was also supported by NIH R01 HD21244 and NIH R01 HL085197 to C.O., and by NIH GM72861 and an ARRA supplement to NIH GM83098 to M.P. M.P. is a Howard Hughes Early Career Scientist. The AGRE Consortium includes the following: Dan Geschwind, University of California, Los Angeles, CA; Maja Bucan, University of Pennsylvania, Philadelphia, PA; W. Ted Brown, New York State Institute for Basic Research in Developmental Disabilities, Staten Island, NY; Joseph Buxbaum, Mt. Sinai School of Medicine, New York, NY; Rita M. Cantor, UCLA School of Medicine, Los Angeles, CA; John N. Constantino, Washington University School of Medicine, St. Louis, MO; T. Conrad Gilliam, University of Chicago, Chicago, IL; Clara Lajonchere, Cure Autism Now, Los Angeles, CA; David H. Ledbetter, Emory University, Atlanta, GA; Christa Lese-Martin, Emory University, Atlanta, GA; Janet Miller, Cure Autism Now, Los Angeles, CA; Stanley F. Nelson, UCLA School of Medicine, Los Angeles, CA; Gerard D. Schellenberg, University of Washington, Seattle, WA; Carol A. Samango-Sprouse, George Washington University, Washington, D.C.; Sarah Spence, University of California, Los Angeles, CA; Matthew State, Yale University, New Haven, CT; and Rudolph E. Tanzi, Massachusetts General Hospital, Boston, MA.

Footnotes

Communicating editor: S. I. Wright

Literature Cited

  1. 1000 Genomes Project Consortium, 2010. A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073.
  2. Affymetrix, 2006. BRLMM: An Improved Genotype Calling Method for the GeneChip Human Mapping 500K Array Set, white paper. http://www.affymetrix.com/support/technical/whitepapers/brlmm_whitepaper.pdf.
  3. Anderson J. A., Song Y. S., Langley C. H., 2008. Molecular population genetics of Drosophila subtelomeric DNA. Genetics 178: 477–487.
  4. Axelsson E., Albrechtsen A., Van A. P., Li L., Megens H. J., et al., 2010. Segregation distortion in chicken and the evolutionary consequences of female meiotic drive in birds. Heredity 105: 290–298.
  5. Bazerman M. H., Samuelson W. F., 1983. I won the auction but don’t want the prize. J. Conflict Resolut. 27: 618–634.
  6. Brandvain Y., Coop G., 2012. Scrambling eggs: meiotic drive and the evolution of female recombination rates. Genetics 190: 709–723.
  7. Carvalho A. B., Vaz S. C., 1999. Are Drosophila SR drive chromosomes always balanced? Heredity 83: 221–228.
  8. Charlesworth B., Hartl D. L., 1978. Population dynamics of the segregation distorter polymorphism of Drosophila melanogaster. Genetics 89: 171–192.
  9. Cheng K. F., Chen J. H., 2007. A simple and robust TDT-type test against genotyping error with error rates varying across families. Hum. Hered. 64: 114–122.
  10. Cupples L. A., Arruda H. T., Benjamin E. J., D'Agostino R. B., Sr, Demissie S., et al., 2007. The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports. BMC Med. Genet. 8: S1–S19.
  11. Dawber T. R., Meadors G. F., Moore F. E., 1951. Epidemiological Approaches to Heart Disease: The Framingham Study. American Public Health Association, New York, NY.
  12. Dawber T. R., Kannel W. B., Lyell L. P., 1963. An approach to longitudinal studies in a community: the Framingham study. Ann. N. Y. Acad. Sci. 107: 539–556.
  13. de la Casa-Esperón E., Sapienza C., 2003. Natural selection and the evolution of genome imprinting. Annu. Rev. Genet. 37: 349–370.
  14. Deng L., Zhang D., Richards E., Tang X., Fang J., et al., 2009. Constructing an initial map of transmission distortion based on high density HapMap SNPs across the human autosomes. J. Genet. Genomics 36: 703–709.
  15. Drmanac R., Sparks A. B., Callow M. J., Halpern A. L., Burnes N. L., et al., 2010. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327: 78–81.
  16. Ducy P., Schinke T., Karsenty G., 2000. The osteoblast: a sophisticated fibroblast under central surveillance. Science 289: 1501–1504.
  17. Ebner T., Shebl O., Moser M., Mayer R. B., Arzt W., et al., 2011. Easy sperm processing technique allowing exclusive accumulation and later usage of DNA-strandbreak-free spermatozoa. Reprod. Biomed. Online 22: 37–43.
  18. Evans D. M., Morris A. P., Cardon L. R., Sham P. C., 2006. A note on the power to detect transmission distortion in parent–child trios via the transmission disequilibrium test. Behav. Genet. 36: 947–950.
  19. Frank S. A., 1991. Divergence of meiotic drive-suppression systems as an explanation for sex-biased hybrid sterility and inviability. Evolution 45: 262–267.
  20. Frazer K. A., Ballinger D. G., Cox D. R., Hinds D. A., Stuve L. L., et al., 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861.
  21. Geschwind D. H., Sowinski J., Lord C., Iversen P., Shestack J., et al., 2001. The autism genetic resource exchange: a resource for the study of autism and related neuropsychiatric conditions. Am. J. Hum. Genet. 69: 463–466.
  22. Gileva E. A., 1998. Inbreeding and Sex Ratio in Two Captive Colonies of Dicrostonyx torquatus Pall., 1779: A Reply to G. H. Jarrell. Munksgaard, Copenhagen.
  23. Gordon D., Heath S. C., Liu X., Ott J., 2001. A transmission/disequilibrium test that allows for genotyping errors in the analysis of single-nucleotide polymorphism data. Am. J. Hum. Genet. 69: 371–380.
  24. Gordon D., Haynes C., Johnnidis C., Patel S. B., Bowcock A. M., et al., 2004. A transmission disequilibrium test for general pedigrees that is robust to the presence of random genotyping errors and any number of untyped parents. Eur. J. Hum. Genet. 12: 752–761.
  25. Göring H. H., Terwilliger J. D., Blangero J., 2001. Large upward bias in estimation of locus-specific effects from genome-wide scans. Am. J. Hum. Genet. 69: 1357–1369.
  26. Gudbjartsson D. F., Walters G. B., Thorleifsson G., Stefansson H., Halldorsson B. V., et al., 2008. Many sequence variants affecting diversity of adult human height. Nat. Genet. 40: 609–615.
  27. Haig D., 2010. Games in tetrads: segregation, recombination, and meiotic drive. Am. Nat. 176: 404–413.
  28. Haig D., Grafen A., 1991. Genetic scrambling as a defence against meiotic drive. J. Theor. Biol. 153: 531–558.
  29. Hartl D. L., 1972. Population dynamics of sperm and pollen killers. Theor. Appl. Genet. 42: 81–88.
  30. Hartl D. L., 1973. Complementation analysis of male fertility among the segregation distorter chromosomes of Drosophila melanogaster. Genetics 73: 613–629.
  31. Hartl D. L., 1974. Genetic dissection of segregation distortion. I. Suicide combinations of SD genes. Genetics 76: 477–486.
  32. Hartl D. L., 1975. Segregation distortion in natural and artificial populations of Drosophila melanogaster, pp. 83–91 Gamete Competition in Plants and Animals, edited by Mulcahy D. L. North-Holland Publishing, Amsterdam.
  33. Hassold T., Hunt P., 2001. To err (meiotically) is human: the genesis of human aneuploidy. Nat. Rev. Genet. 2: 280–291.
  34. Henikoff S., Ahmad K., Malik H. S., 2001. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293: 1098.
  35. Hiraizumi Y., Thomas A. M., 1984. Suppressor systems of segregation distorter (SD) chromosomes in natural populations of Drosophila melanogaster. Genetics 106: 279–292.
  36. Hirschhorn J. N., Daly M. J., 2005. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6: 95–108.
  37. Huang D. W., Sherman B. T., Lempicki R. A., 2008. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4: 44–57.
  38. Huang D. W., Sherman B. T., Lempicki R. A., 2009. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37: 1–13.
  39. Hurst G. D. D., Werren J. H., 2001. The role of selfish genetic elements in eukaryotic evolution. Nat. Rev. Genet. 2: 597–606.
  40. Hurst L. D., Pomiankowski A., 1991. Causes of sex ratio bias may account for unisexual sterility in hybrids: a new explanation of Haldane’s rule and related phenomena. Genetics 128: 841–858.
  41. Ioannidis J. P., Ntzani E. E., Trikalinos T. A., Contopoulos-Ioannidis D. G., 2001. Replication validity of genetic association studies. Nat. Genet. 29: 306–309.
  42. Jarrell G. H., 1995. A Male-Biased Natal Sex-Ratio in Inbred Collared Lemmings, Dicrostonyx groenlandicus. Munksgaard, Copenhagen.
  43. Jeong J. H., Jin J. S., Kim H. N., Kang S. M., Liu J. C., et al. , 2008.  Expression of Runx2 transcription factor in non-skeletal tissues, sperm and brain. J. Cell. Physiol. 217: 511–517. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Jones R. N., Rees H., 1982.  B Chromosomes. Academic Press, London/New York. [Google Scholar]
  45. Kusano A., Staber C., Chan H. Y., Ganetzky B., 2003.  Closing the (ran)GAP on segregation distortion in Drosophila. BioEssays 25: 108–115. [DOI] [PubMed] [Google Scholar]
  46. Lohmueller K. E., Pearce C. L., Pike M., Lander E. S., Hirschhorn J. N., 2003.  Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat. Genet. 33: 177. [DOI] [PubMed] [Google Scholar]
  47. Lyon M. F., 2003.  Transmission ratio distortion in mice. Annu. Rev. Genet. 37: 393–408. [DOI] [PubMed] [Google Scholar]
  48. Lyttle T. W., 1993.  Cheaters sometimes prosper: distortion of mendelian segregation by meiotic drive. Trends Genet. 9: 205–210. [DOI] [PubMed] [Google Scholar]
  49. Malik H. S., 2009.  The centromere-drive hypothesis: a simple basis for centromere complexity. Prog. Mol. Subcell. Biol. 48: 33–52. [DOI] [PubMed] [Google Scholar]
  50. Malik H. S., Henikoff S., 2002.  Conflict begets complexity: the evolution of centromeres. Curr. Opin. Genet. Dev. 12: 711–718. [DOI] [PubMed] [Google Scholar]
  51. Marchini J., Howie B., Myers S., McVean G., Donnelly P., 2007.  A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39: 906–913. [DOI] [PubMed] [Google Scholar]
  52. Mitchell A. A., Cutler D. J., Chakravarti A., 2003.  Undetected genotyping errors cause apparent overtransmission of common alleles in the transmission/disequilibrium test. Am. J. Hum. Genet. 72: 598–610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Montgomery G. W., Zhu G., Hottenga J. J., Duffy D. L., Heath A. C., et al. , 2006.  HLA and genomewide allele sharing in dizygotic twins. Am. J. Hum. Genet. 79: 1052–1058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Mundlos S., Otto F., Mundlos C., Mulliken J. B., Aylsworth A. S., et al. , 1997.  Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia. Cell 89: 773–779. [DOI] [PubMed] [Google Scholar]
  55. Novitski E., 1951.  Non-random disjunction in Drosophila. Genetics 36: 267–280. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Ober C., Abney M., McPeek M. S., 2001.  The genetic dissection of complex traits in a founder population. Am. J. Hum. Genet. 69: 1068–1079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Östergren G., 1945.  Parasitic nature of extra fragment chromosomes. Bot. Not. 2: 157. [Google Scholar]
  58. Otto F., Thornell A. P., Crompton T., Denzel A., Gilmour K. C., et al. , 1997.  Cbfa1, a candidate gene for cleidocranial dysplasia syndrome, is essential for osteoblast differentiation and bone development. Cell 89: 765–771. [DOI] [PubMed] [Google Scholar]
  59. Pardo-Manuel de Villena F., Sapienza C., 2001.  Nonrandom segregation during meiosis: the unfairness of females. Mamm. Genome 12: 331–339. [DOI] [PubMed] [Google Scholar]
  60. Paterson A. D., Waggott D., Schillert A., Infante-Rivard C., Bull S. B., et al. , 2009.  Transmission-ratio distortion in the Framingham Heart Study. BMC Proc. 3(Suppl. 7): S51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Peacock W. J., Dennis E. S., Rhoades M. M., Pryor A. J., 1981. Highly repeated DNA sequence limited to knob heterochromatin in maize. Proc. Natl. Acad. Sci. USA 78: 4490–4494.
  62. Pickrell J. K., Coop G., Novembre J., Kudaravalli S., Li J. Z., et al., 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19: 826–837.
  63. Presgraves D. C., Gérard P. R., Cherukuri A., Lyttle T. W., 2009. Large-scale selective sweep among Segregation Distorter chromosomes in African populations of Drosophila melanogaster. PLoS Genet. 5: e1000463.
  64. Price A. L., Patterson N. J., Plenge R. M., Weinblatt M. E., Shadick N. A., et al., 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38: 904–909.
  65. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., et al., 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81: 559–575.
  66. Sabeti P. C., Varilly P., Fry B., Lohmueller J., Hostetter E., et al., 2007. Genome-wide detection and characterization of positive selection in human populations. Nature 449: 913–918.
  67. Santos P. S., Hohne J., Schlattmann P., Konig I. R., Ziegler A., et al., 2009. Assessment of transmission distortion on chromosome 6p in healthy individuals using tagSNPs. Eur. J. Hum. Genet. 17: 1182–1189.
  68. Siepel A., Bejerano G., Pedersen J. S., Hinrichs A. S., Hou M., et al., 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.
  69. Silver L. M., 1993. The peculiar journey of a selfish chromosome: mouse t haplotypes and meiotic drive. Trends Genet. 9: 250–254.
  70. Spielman R. S., McGinnis R. E., Ewens W. J., 1993. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52: 506–516.
  71. Thomson G. J., Feldman M. W., 1974. Population genetics of modifiers of meiotic drive. II. Linkage modification in the segregation distortion system. Theor. Popul. Biol. 5: 155–162.
  72. Tiemann-Boege I., Calabrese P., Cochran D. M., Sokol R., Arnheim N., 2006. High-resolution recombination patterns in a region of human chromosome 21 measured by sperm typing. PLoS Genet. 2: e70.
  73. Veron N., Bauer H., Weisse A. Y., Luder G., Werber M., et al., 2009. Retention of gene products in syncytial spermatids promotes non-Mendelian inheritance as revealed by the t complex responder. Genes Dev. 23: 2705–2710.
  74. Voight B. F., Kudaravalli S., Wen X., Pritchard J. K., 2006. A map of recent positive selection in the human genome. PLoS Biol. 4: e72.
  75. Wallace L. T., Erhart M. A., 2008. Recombination within mouse t haplotypes has replaced significant segments of t-specific DNA. Mamm. Genome 19: 263–271.
  76. Wang E. T., Sandberg R., Luo S., Khrebtukova I., Zhang L., et al., 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456: 470–476.
  77. Wheeler J. C., Shigesada K., Gergen J. P., Ito Y., 2000. Mechanisms of transcriptional regulation by runt domain proteins. Semin. Cell Dev. Biol. 11: 369–375.
  78. Zöllner S., Wen X., Hanchard N. A., Herbert M. A., Ober C., et al., 2004. Evidence for extensive transmission distortion in the human genome. Am. J. Hum. Genet. 74: 62–72.
  79. Zwick M. E., Salstrom J. L., Langley C. H., 1999. Genetic variation in rates of nondisjunction: association of two naturally occurring polymorphisms in the chromokinesin nod with increased rates of nondisjunction in Drosophila melanogaster. Genetics 152: 1605–1614.

Associated Data

Supplementary Materials

Supporting Information

Articles from Genetics are provided here courtesy of Oxford University Press