Significance
The diversity of centromere-specific DNA repeats in different species (centromere paradox) and the seemingly parallel rapid evolution of the cenH3 histone protein have previously been interpreted to be related to evolutionary pressures acting on both molecules based on their interaction (centromere drive hypothesis). Here we describe the detailed mechanism and chronology of centromere repeat replacement, and identify inbreeding as a major driver of centromeric DNA replacement that ultimately gives rise to new tandem centromere repeats at genetically indistinguishable loci. These insights explain both the frequently observed disturbance of established centromeres in crop plants following their domestication and the rapid appearance of novel centromere repeat sequences in genetically isolated individuals in nature during speciation.
Keywords: centromere drive, centromere paradox, founder effect, hemicentric inversion, linkage disequilibrium
Abstract
Functional centromeres, the chromosomal sites of spindle attachment during cell division, are marked epigenetically by the centromere-specific histone H3 variant cenH3 and typically contain long stretches of centromere-specific tandem DNA repeats (∼1.8 Mb in maize). In 23 inbreds of domesticated maize chosen to represent the genetic diversity of maize germplasm, partial or nearly complete loss of the tandem DNA repeat CentC precedes 57 independent cenH3 relocation events that result in neocentromere formation. Chromosomal regions with newly acquired cenH3 are colonized by the centromere-specific retrotransposon CR2 at a rate that would result in centromere-sized CR2 clusters in 20,000–95,000 y. Three lines of evidence indicate that CentC loss is linked to inbreeding: (i) CEN10 of temperate lineages, presumed to have experienced a genetic bottleneck, contains less CentC than that of their tropical relatives; (ii) strong selection for centromere-linked genes in domesticated maize reduced diversity at seven of the ten maize centromeres to only one or two postdomestication haplotypes; and (iii) the centromere with the largest number of haplotypes in domesticated maize (CEN7) has the highest CentC levels in nearly all domesticated lines. Rare recombinations introduced one (CEN2) or more (CEN5) alternate CEN haplotypes while retaining a single haplotype at domestication loci linked to these centromeres. Taken together, this evidence strongly suggests that inbreeding, favored by postdomestication selection for centromere-linked genes affecting key domestication or agricultural traits, drives replacement of the tandem centromere repeats in maize and other crop plants. Similar forces may act during speciation in natural systems.
Centromere-specific tandemly arranged DNA repeats vary in length and nucleotide sequence between species. The puzzling observation that centromeres can consist of highly variable sequences despite being involved in an essential cellular function (i.e., chromosome segregation) has been termed the “centromere paradox” (1). “Centromere drive” has been proposed to preferentially segregate the “favored” centromere into the female gamete and thereby provide the selective force that acts on centromere DNA sequences and interacting proteins (2).
Maize (Zea mays ssp. mays) was domesticated between 7.5 and 10 thousand years ago (ka) from wind-pollinated outcrossing wild teosinte (Z. mays ssp. parviglumis) (3, 4) in a process that dramatically changed its morphology. Several quantitative trait loci (QTLs) responsible for these morphological changes were identified in pioneering work (5–8), and a large number of additional genetic loci involved in maize domestication and improvement were subsequently identified in genome-wide scans (9). Gene (and centromere) flow between the fully interfertile maize and teosinte subspecies has been documented (10, 11). Functional centromeres of maize consist of 1–2 Mb of DNA enriched for the tandemly arranged CentC repeat and members of the centromeric retrotransposon (CR) family (12), which are widely distributed in seed plants and have been extensively characterized (13–18). Elements belonging to the maize CR1, CR2, and CR3 subfamilies have the remarkable ability to target their integration to centromeres and thus mark the historic centromere positions (12).
FISH analysis has revealed that most centromeres of teosinte, and all centromeres of other Zea species and the more distantly related genus Tripsacum, contain large amounts of CentC (19, 20), suggesting that CentC-rich centromeres represent the ancestral state. In contrast, centromeres of domesticated maize display a remarkable variation of CentC content in different inbreds (19). Whole-genome shotgun sequencing of maize and teosinte revealed lower amounts of CentC DNA, but a higher proportion of CR2 DNA, in the former (9, 21), suggesting that CR2 is replacing CentC in domesticated corn.
Here we detail the processes that result in turnover of centromere repeats at unprecedented temporal and spatial resolution and identify selection for key centromere-linked genes as the driving force. Our analyses strongly suggest that prolonged inbreeding for favorable centromere-linked alleles results in a net loss of the tandem CentC repeat that forces spreading of the cenH3 nucleosomes to an adjacent region, or repositioning to a nearby region. Subsequent invasion of these neocentromeres by the centromere-targeting CR2 element will ultimately result in nested insertions that can give rise to novel tandem centromeric repeats.
We describe these events in detail for CEN5, where they are easily observed owing to a tightly linked domestication locus and the acquisition of several CEN5 variants by rare recombinations, but all other chromosomes exhibit the same trends, albeit to lesser degrees. In the case of maize, strong selection for favorable domestication or agronomic alleles drives the turnover of centromere repeats, but selection for any centromere-linked allele that increases fitness may replace centromere repeats in a similar manner in nature during speciation (e.g., Oryza brachyantha CentO-F) (22).
Neocentromere Formation in Maize Inbreds
We used chromatin immunoprecipitation (ChIP) with an antibody directed against maize cenH3 to determine the centromere positions of all 10 chromosomes in 26 maize lines (SI Appendix, Fig. S1) selected to capture a large fraction of the genetic diversity present in domesticated germplasm (23). Centromere 5 exhibits a number of different ChIP-seq signatures. The chromosome 5 region at 101.7–110.1 Mb, designated CEN5, contains the functional centromere in all genotypes examined, but the location of the ∼1.8-Mb cenH3 region within CEN5 varies among the inbreds (Fig. 1 A and B and SI Appendix, Figs. S2 and S3 and Tables S1 and S2). Inbred CML333, the sole inbred among those tested here that has been shown by FISH karyotyping to contain a CentC-rich CEN5 (19), shows cenH3 enrichment at CEN5M (Middle; B73 RefGen_V2 coordinates 5:105.2–106.8 Mb), a region of the B73 reference genome defined by large amounts of CentC and CR1. A large number of dated CR1 insertions in and near CEN5M of B73 (Fig. 1B) provide evidence that CEN5M served as the functional centromere for at least the past 610,000 y (610 ky). In contrast, the cenH3 mark of all other inbreds examined is located either upstream or downstream of CEN5M. CEN5L (Left; 5:102.1–103.7 Mb) is defined by the CR2-rich cenH3-binding region of inbred B73 and used by 15 lines, CEN5L′ (103.9–105.8 Mb) is defined by CML322 ChIP-seq, and CEN5R (Right; 5:107.9–109.8 Mb), a region virtually devoid of centromere repeats in the reference genome, is marked by cenH3 nucleosomes in 10 inbreds.
Fig. 1.
Functional centromeres are spatially correlated with CR2 insertions. (A) Anti–cenH3 ChIP-seq coverage at CEN5 delineates the functional centromeres of a diverse set of maize inbreds, including representatives of the blue, green, brown, orange, and red lineages, as well as inbreds IL14H, P39, and CML333 (and others in SI Appendix, Fig. S1). Only CML333 uses CEN5M exclusively. Expressed genes [fragments per kilobase of transcript per million mapped reads (FPKM) >10 in shoot apical meristem and ear; triangles] are primarily located in cenH3-free regions, whereas genes within the functional centromeres are generally not expressed (SI Appendix, Fig. S8). (B) CEN5M of the B73 reference genome is marked by CentC (green) and ancestral CR1 elements, but the functional centromere of B73 is located at CEN5L. Many more junctions (see C and D) than datable CR elements are identified owing to the incomplete assembly of the reference genome in these repeat-rich regions. (C) Ancestral elements (most of them CR1s) are clustered around CEN5M (indicated by stippled lines in A). Two clusters of ancestral elements (enclosed in stippled boxes) and other ancestral CR1 and CR2 elements associated with small upstream CentC clusters were likely relocated from CEN5M by small inversions. (D) CR2 elements shared by multiple maize inbreds inserted into their common ancestor, allowing the grouping of both inbreds and CR2 elements into colored lineages. Recently inserted lineage-specific CR2 elements identified in GSS or ChIP-seq data are found almost exclusively in neocentromeres (SI Appendix, Fig. S6). Unique CR2 insertions are those found in only one of the maize inbreds examined here. A presumed inversion (purple double-headed arrow) specific to the blue-b lines (Fig. 2) was inferred from lineage-specific CR2 elements (SI Appendix, Fig. S6). (E) Three distinct nonrecombinant CEN5 regions were identified (SI Appendix, Fig. S7) that allow phylogenetic reconstruction of CEN5 in the 28 maize inbreds examined.
These alternate cenH3 locations represent recently formed neocentromeres caused by cenH3 repositioning events that we broadly classify as “jumps” when they are clearly separated from the ancestral cenH3 location at the CentC cluster, or as “expansions” when they overlap or abut it. Maize/sorghum synteny data indicate that although the CEN5M centromere exists in the same context as the sorghum centromere, relocation of the functional centromere from CEN5M to either CEN5L or CEN5R resulted in the formation of neocentromeres over previously euchromatic gene-rich regions (SI Appendix, Fig. S4). However, although the density of expressed genes is lowest in CEN5L and CEN5R (Fig. 1A), we failed to detect differential expression of these genes in lines using different neocentromeres. In fact, two genes located in CEN5L′ are expressed at similar levels in CML322 and most other lines, even though CEN5L′ is covered by cenH3 nucleosomes in CML322. This confirms previous reports of active genes in functional centromeres (24, 25). The fact that CEN5 is retained in almost its entirety in all maize and teosinte inbreds (SI Appendix, Fig. S5), whereas the downstream region carries several large deletions in both maize and teosinte, suggests that CEN5 contains essential genes.
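The jump/expansion distinction reduces to an interval comparison. The sketch below applies it to the B73 RefGen_V2 coordinates quoted above for CEN5M, CEN5L, CEN5L′, and CEN5R. The function name and the `abut_tolerance_mb` parameter are illustrative assumptions, and the labels it produces are not necessarily those assigned in the study, which relied on the ChIP-seq profiles themselves.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start_mb, end_mb) on chromosome 5


def classify_relocation(ancestral: Interval, new: Interval,
                        abut_tolerance_mb: float = 0.0) -> str:
    """Label a cenH3 domain relative to the ancestral domain.

    abut_tolerance_mb is a hypothetical parameter: gaps up to this size
    are still treated as abutting.
    """
    a_start, a_end = ancestral
    n_start, n_end = new
    # Positive gap: the intervals are separated by that many Mb.
    # Zero or negative gap: they abut or overlap.
    gap = max(n_start - a_end, a_start - n_end)
    return "expansion" if gap <= abut_tolerance_mb else "jump"


CEN5M = (105.2, 106.8)  # ancestral CentC-rich region (coordinates from the text)
labels = {
    "CEN5L": classify_relocation(CEN5M, (102.1, 103.7)),
    "CEN5L'": classify_relocation(CEN5M, (103.9, 105.8)),
    "CEN5R": classify_relocation(CEN5M, (107.9, 109.8)),
}
print(labels)
```

With a tolerance of zero, only CEN5L′ overlaps CEN5M under these coordinates; a nonzero tolerance would also count small gaps as abutting.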
Lineage-Specific CR Insertions at CEN5
To investigate whether cenH3 relocation to neocentromeres is associated with changes in CR element distribution, we identified CR insertions within CEN5 in the genomic (26) and (where available) ChIP-seq data of 30 improved maize lines and 17 teosinte inbred lines (TILs). The majority of CR1 (81 of 100), but only a small proportion of CR2 (15 of 508), elements in CEN5 are ancestral (i.e., shared between the improved maize lines and teosinte) and located near the original centromere at CEN5M (SI Appendix, Fig. S6). Small clusters of ancestral elements found outside of CEN5M (e.g., 107.62–108.09 Mb in Fig. 1C) likely were relocated from CEN5M by small inversions.
The majority of CR2 elements are restricted to different subsets of domesticated maize lines and define five distinct lineages: 15 lines with CEN5L or CEN5L′ neocentromeres separated into three (blue, green, and brown) lineages, and 11 CEN5R lines grouped into two (red and orange) distinct lineages (Fig. 1D). A total of 302 lineage-specific CR2 insertions (93 of them unique to a specific inbred) were identified in the genomic survey sequences (GSS) of these five lineages. Three lines (IL14H, Hi60, and P39) containing only unique CR2 elements, and CML333 with only ancestral CR2 insertions, could not be grouped with another inbred. Lineage-specific CR2s are located almost exclusively within the region occupied by cenH3 nucleosomes in the respective inbred (Fig. 1 and SI Appendix, Fig. S6), most of which could have been classified as CEN5L/L′ or CEN5R based solely on the CR2 junctions present in their GSS data (SI Appendix, Fig. S6A). In contrast to CR2, only 14 CR1 elements and no CR3 elements specific to one or more colored lines were identified in the GSS data, indicating lower activity of these elements in the recent past.
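The categories used here (ancestral, lineage-specific, unique) follow from which inbreds carry a given insertion junction. The sketch below shows that bookkeeping on made-up records. The line names and insertion identifiers are hypothetical placeholders, not data from the study, and the function treats "unique" as a separate label, whereas the counts above include unique insertions among the lineage-specific ones.

```python
from typing import Dict, Set


def classify_insertion(carriers: Set[str], maize: Set[str],
                       teosinte: Set[str]) -> str:
    """Classify one CR insertion by the set of lines that carry it."""
    in_maize = carriers & maize
    in_teosinte = carriers & teosinte
    if in_maize and in_teosinte:
        return "ancestral"          # shared between maize and teosinte
    if len(in_maize) == 1:
        return "unique"             # found in a single maize inbred
    if len(in_maize) > 1:
        return "lineage-specific"   # shared by a subset of maize inbreds
    return "teosinte-only"


# Hypothetical example data, for illustration only.
maize_lines = {"maize_1", "maize_2", "maize_3"}
teosinte_lines = {"til_1", "til_2"}
insertions: Dict[str, Set[str]] = {
    "ins_a": {"maize_1", "maize_2", "til_1"},
    "ins_b": {"maize_1", "maize_2"},
    "ins_c": {"maize_3"},
}
calls = {name: classify_insertion(c, maize_lines, teosinte_lines)
         for name, c in insertions.items()}
print(calls)
```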
Reconstructing the Zea CEN5 Phylogeny
HapMap2 SNP data (26) of 28 diverse maize inbreds revealed three regions in CEN5 devoid of detectable recombinations (Fig. 1E and SI Appendix, Figs. S7–S10) and thus suitable for standard phylogenetic analysis. A robust phylogeny (Fig. 2) constructed from the largest of these recombination-free regions (CEN5L-M; 101.7–105.6 Mb) illustrates that inbred CML333 and all colored lineages shared a common ancestor before domestication, whereas inbreds P39 and IL14H contain distantly related CEN5L-M sequences. The blue, orange, and brown lineages arose from a common ancestor after domestication. Similarly, the red and green lineages arose from a common ancestor after domestication. Phylogenies constructed with the CEN5L-M and CEN5M-R regions are largely identical but confirm recombinations near CEN5M in CML333 and the brown lines (SI Appendix, Fig. S11A). A separate tree constructed using SNP data called from cenH3 ChIP-seq of Hi47 and Hi60 shows these breeding lines to be related to the red and the IL14H lineages, respectively (SI Appendix, Fig. S12).
Fig. 2.
Microevolution of Zea CEN5L-M. Phylogeny of domesticated maize and teosinte (TIL) inbreds reveals two major lineages that diverged 18.3 ka, the blue/orange/brown lineage (related to TIL14) and the red/green lineage (related to CML333, which acquired its CEN5L-M region, including the CentC-rich CEN5M, by recombination). The number of CR2 elements inserted into CEN5L (blue), CEN5M (green), or CEN5R (red) (SI Appendix, Fig. S6) is marked above each branch, and the shift of CR2 insertions from CEN5M to either neocentromere identifies the time interval of cenH3 movement (thick branch). Neocentromere formation in all five colored lineages postdates domestication, whether measured as maximum leaf–leaf (nodes marked with arrows), average leaf–node (indicated below neocentromere branch) or individual leaf–node distances (branch lengths given in SI Appendix, Table S3). Z. mays ssp. mexicana lines TIL08 and TIL25 (26) are intermingled with the 15 Z. mays ssp. parviglumis lines, indicating centromere movement among these subspecies. Deletions shared by multiple lines are indicated on the corresponding branch (SI Appendix, Fig. S5). Nodes with >90% bootstrap support are marked by dots. The neighbor-joining tree was constructed from HapMap2 SNP data and background calls and artificially rooted on Z. luxurians. Tree is drawn on two scales (black and gray). Ancestral CR2 insertions are in black irrespective of insertion site. Hi47, a red-B line, and Hi60, which contains 50 line-specific CR2 insertions into CEN5R (SI Appendix, Dataset S1), cannot be displayed on this tree because their CEN5L-M region was not sequenced.
Eight Independent Shifts of cenH3 from CEN5M to Nearby Neocentromeres
CR2 insertions were assigned to branches of the CEN5L-M tree based on their distribution in the inbreds. The branches corresponding to the time span during which the cenH3 signal moved from CEN5M to one of the neocentromeres can be identified by a concomitant shift of CR insertions (Fig. 2). Two CR2 elements that inserted into CEN5M in the progenitor of the blue and orange lineages indicate that CEN5M was the functional centromere in their common ancestor. Similarly, a CR1 element within CEN5M that is shared by the red and green lineages indicates that the common ancestor of these lineages used CEN5M. In contrast, insertion of one CR1 (SI Appendix, Fig. S6) and three CR2 elements (Fig. 2) into CEN5L (along with two CR1 insertions into CEN5M of the blue lineage ancestor) indicates that the shift of the functional centromere from CEN5M to CEN5L occurred after divergence of the blue and orange lineages. Applying this reasoning to all lineages with distinct CR2 insertion patterns shows that the 30 maize lines examined here represent eight independent relocations of the cenH3 domain from CEN5M to the flanking regions (one in each of the five colored lineages plus P39, IL14H, and Hi60). Moreover, a common ancestor gave rise to both CEN5L and CEN5R progeny in three distinct lineages (green/red, blue/orange, and IL14H/Hi60), indicating that both regions are equally suited for neocentromere formation.
Neocentromeres Form after Maize Domestication
We conclude that cenH3 relocation at CEN5 postdates domestication because (i) all 17 TILs are placed basal to the eight neocentromere formation events observed in the improved maize lines on the phylogeny, (ii) CR2 invasion of CEN5L/L′ or CEN5R is not observed in any of the 17 TILs examined, and (iii) this invasion is estimated to have begun no earlier than 9.3 ka in the five colored lineages, in agreement with neocentromere formation occurring postdomestication in these five lineages (Fig. 2 and SI Appendix, Table S3). Neocentromere formation at the distantly related CEN5 of the two sweet corn inbreds, P39 and IL14H, is estimated to have occurred 6.7 ka and 5.3 ka, respectively (SI Appendix, Text S1). CML333, the sole maize inbred still using CEN5M, reacquired this CentC-rich centromere from a wild relative via two CEN5-proximal recombinations ∼1.3 ka, recent enough to retain large amounts of its CentC. Previously identified domestication loci (9, 27) were used to validate our dating methods (SI Appendix, Text S2).
CR2 Insertion Rates at CEN5
Three CR2 insertions shared by Z. luxurians and the Z. mays subspecies, and nine CR2 insertions shared by all three Z. mays subspecies represent ancestral elements that originally inserted into a functional CEN5M (Figs. 1 and 2). The number of CR2 elements (identified in GSS and ChIP-seq) that subsequently inserted into the newly formed neocentromeres of 25 colored inbreds ranged from 21 to 75 per genotype (Fig. 2), adding on average approximately 44 ± 13 elements, or 333 ± 98 kb, to each neocentromere. Adjusted to the time spans estimated for neocentromere formation in the five colored lineages, insertion rates range from one new insertion every 94 ± 51 y to one every 171 ± 80 y. The nonblue and nonleaf branch with the highest insertion frequency (Ra) shows 61 insertions in 5,791 y, or approximately one insertion every 95 y (SI Appendix, Table S3b). These rates likely represent underestimates, because CR2 insertions into other repeats frequently cannot be detected using short read data. CR2 insertions into CR1, CR2, and CentC represent on average 44.8% of CR2 junctions in the 10 functional centromeres of 23 inbreds (SI Appendix, Table S4).
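The per-branch rate quoted for Ra is a simple ratio, and the mean amount of DNA added per element follows from the two averages given above. A minimal sketch of that arithmetic, using only numbers stated in the text (the function name is an illustrative assumption):

```python
def years_per_insertion(n_insertions: int, span_years: float) -> float:
    """Average waiting time between insertions on one branch."""
    if n_insertions <= 0:
        raise ValueError("need at least one insertion")
    return span_years / n_insertions


# Branch Ra: 61 insertions in 5,791 y (values from the text).
ra_interval = years_per_insertion(61, 5791)

# Mean DNA added per element: 333 kb spread over 44 elements.
kb_per_element = 333 / 44

print(f"Ra: one insertion every {ra_interval:.0f} y")
print(f"about {kb_per_element:.1f} kb added per CR2 element")
```

The first value rounds to the 95 y reported for Ra; the second is only a mean and ignores the spread (± 13 elements, ± 98 kb) around both averages.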
The finding that only one of 20 internal branches lacks new insertions, along with the presence of inbred-specific CR2 insertions in 22 of the 25 colored inbreds, indicates that insertions are ongoing. Aside from the major shifts in CR2 insertion sites that coincide with movement of the functional centromere from CEN5M to a nearby neocentromere, CR2 elements assigned to consecutive branches of the phylogenetic tree insert within or near the neocentromeres with no discernible spatiotemporal pattern (SI Appendix, Fig. S13).
CentC loss, cenH3 repositioning to nearby sites, and CR2 invasion of these neocentromeres together effectively replace the CentC-rich CEN5M with CR2-rich centromeres at genetically nearly indistinguishable loci, thereby establishing what has been termed the “centromere paradox.”
Other Centromeres
To determine the extent of neocentromere formation on other chromosomes, we partially assembled the other maize centromeres (SI Appendix, Text S3) and subjected them to the same analyses performed for CEN5. As in CEN5, phylogenies from recombination-free regions of the other nine centromeres revealed lineage-specific deletions, rare centromere-proximal recombinations, CentC loss, and cenH3 relocation to regions that in some cases are euchromatic in the sorghum genome, followed by CR2 invasion (SI Appendix, Figs. S1 and S14–S25, Tables S5–S14, and Text S4). The frequency of neocentromere formation varies among chromosomes, but all neocentromeres formed close to or after domestication, with those on chromosome 5 being the oldest (SI Appendix, Fig. S26).
Time since neocentromere formation was significantly correlated (r² = 0.54, P = 2.0 × 10⁻⁷; SI Appendix, Fig. S27) with the number of CR2 insertions across all nine chromosomes examined. This finding is remarkable given the difficulty in estimating the insertion rate, which is compounded by (i) variation in overlap of each neocentromere with its original centromere, (ii) false-positive and false-negative CR2 calls, (iii) variation in divergence estimates obtained from the phylogenies, and (iv) the fact that resolving the precise time of neocentromere formation and CR2 insertion is limited by the phylogenies. Genome-wide, CR2 elements insert at a rate of one per 233–401 y after neocentromere formation. CEN2 was excluded from this analysis because reliable identification of novel CR2 insertions (SI Appendix, Table S13) is prevented by the high density of ancestral CR2 elements in this centromere.
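These rates can be turned into a rough estimate of how long CR2 would need to build a centromere-sized cluster. The sketch below is a back-of-envelope calculation, not necessarily the one used for the published estimate. It assumes a constant insertion rate, a target of ∼1.8 Mb, the mean of 333 kb per 44 elements reported for CEN5, and no nesting, overlap, or deletion of elements.

```python
CENTROMERE_KB = 1800.0          # ~1.8 Mb cenH3 domain (from the text)
KB_PER_ELEMENT = 333.0 / 44.0   # mean DNA added per CR2 element at CEN5


def years_to_fill(years_per_insertion: float,
                  target_kb: float = CENTROMERE_KB,
                  kb_per_element: float = KB_PER_ELEMENT) -> float:
    """Years needed to accumulate target_kb of CR2 at a constant rate."""
    elements_needed = target_kb / kb_per_element
    return elements_needed * years_per_insertion


# CEN5 rates (94 and 171 y) and genome-wide rates (233 and 401 y) from the text.
estimates = {rate: years_to_fill(rate) for rate in (94, 171, 233, 401)}
for rate, years in estimates.items():
    print(f"one insertion per {rate} y -> about {years:,.0f} y")
```

Under these assumptions the fastest and slowest rates give roughly 22,000 y and 95,000 y, respectively, the same order as the 20,000–95,000 y stated in the Abstract.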
CentC Loss
Neocentromere formation on a given chromosome is highly correlated (P = 4 × 10⁻⁷²) with, and likely caused by, a reduction of CentC at the original centromere (SI Appendix, Fig. S17) as determined by FISH karyotyping (19) (SI Appendix, Fig. S15 and Table S7). The frequency and extent of CentC loss varies by chromosome. Averaged over the 28 maize inbreds examined here, CEN2 and CEN5 contain little CentC, whereas CEN7 is generally CentC-rich. Four centromeres contain two large CentC clusters separated by at least 1.2 Mb that are indicative of hemicentric (28) inversions (SI Appendix, Table S5). The trend toward diminished CentC at the centromeres of domesticated maize indicates that in some lineages, CentC loss by hemicentric inversions and small or large deletions outpaces CentC gain by large-scale duplications or gene conversion, presumably because the majority of double-stranded break (DSB) repairs occur via intrastrand recombination. CentC can be reacquired by centromere-proximal recombinations, as illustrated for CEN5, CEN7, and CEN10 (SI Appendix, Text S1).
In contrast to the extensive CentC loss observed at CEN5, 15 of the domesticated maize inbreds have a CentC-rich CEN10. Superimposing CEN10 CentC amounts (19) onto the CEN10 phylogeny revealed that, in general, temperate inbreds contain less CentC at this centromere than their tropical relatives (Fig. 3 and SI Appendix, Text S1). Temperate MS71, which belongs to the same CEN10 haplotype as tropical CML247, contains much less CentC at its CEN10. Similarly, the two American inbreds with the CEN10 haplotype 2 differ significantly in CEN10 CentC content, with the temperate line TX303 containing less than its tropical counterpart, CML322 (Fig. 3). However, two African haplotype 2 inbreds, one tropical and one temperate, exhibit intermediate CEN10 CentC content and appear not to follow the overall trend. In the case of tropical TZI8, which may have been brought to Nigeria from the Americas by the Portuguese around 400 y ago (29) and whose nearest relative among the 915 sequenced HapMap3 lines (30) is a teosinte (TIL03; k2p = 0.007), the CentC reduction may be caused by a founder effect (i.e., lack of genetic diversity), possibly combined with selection for a CEN10-linked agronomic trait. In contrast, the relatively large genetic distance between South African line M162W and TZI8 (k2p = 0.012), in combination with the small distance (k2p = 0.002) to “American-derived” D1139 (31), suggests that M162W represents an independent introduction of a (possibly temperate) haplotype 2 CEN10 into South African germplasm. Finally, members of the largest tropical clade (haplotype 3) have much higher amounts of CentC than inbreds belonging to the largest temperate clade (haplotype 4). These two clades are distantly related (21.8 ka) but share some similarities, including number of inbreds (6 vs. 10), maximum divergence date of clade members (3.5 vs. 4.2 ka), and a large nonrecombinant CEN10 region (20.1 vs. 15.2 Mb). 
Nonetheless, inbreds of the temperate clade contain much lower amounts of CentC at CEN10 than those of the tropical clade. Similarly, temperate lines MO17 and OH7B (haplotype 8) contain lower amounts of CentC at CEN10 than all of the tropical lines except CML103. The exceptionally low amount of CentC at CEN10 of CML103 is due to its recent acquisition of a temperate CEN10 by recombination (SI Appendix, Text S1).
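The k2p values quoted above are Kimura two-parameter distances, which correct the observed proportion of differences for the unequal rates of transitions and transversions. A minimal sketch of the standard formula for two aligned sequences follows; the function name and the handling of gaps and ambiguous calls are assumptions, and the software actually used for the published distances is not specified here.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous calls (an assumption)
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the sequences are too diverged (saturated).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy sequences: one transition in 100 sites.
toy_distance = k2p_distance("A" * 100, "G" + "A" * 99)
print(f"{toy_distance:.4f}")
```

At the small distances reported here (0.002–0.012), the correction is minor and the K2P distance is close to the raw proportion of differing sites.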
Fig. 3.
Tropical inbreds contain higher amounts of CentC at their CEN10 than comparable temperate inbreds. The CentC content of CEN10 is highly variable, but in general, temperate lines (blue) contain less CentC than tropical lines (red). Nine CEN10 haplotypes for the region spanning both CentC clusters are represented by the maize inbreds examined (SI Appendix, Fig. S31), and direct comparisons between tropical and temperate inbreds sharing the same haplotype are most informative. The temperate representative of haplotype 1 (MS71) contains less CentC at CEN10 than its tropical counterpart CML247; the estimated divergence time was 2.5–5.8 ka (SI Appendix, Fig. S32). Similarly, both temperate M162W and TX303 contain less CentC than their tropical relative CML322 (estimated divergence time, 3.1–4.5 ka). CentC reduction at CEN10 of tropical TZI8 is likely related to a founder effect of this Nigerian inbred. Other temperate/tropical comparisons are less meaningful because they involve different predomestication haplotypes, but confirm the general trend. The five tropical clade A inbreds and the related KI11 contain more CentC than the ten temperate clade A lines from which they are estimated to have diverged 21.8 ka. The three tropical lines CML277, CML69, and NC358 contain greater amounts of CentC than the temperate lines MO17 and OH7B. The sole apparent exception to the rule is tropical CML103, which appears to have acquired its temperate CEN10 containing very little CentC through recombination (between the two CentC clusters) with a temperate haplotype 8 line (SI Appendix, Fig. S31). Inbreeding of temperate lines near CEN10, likely owing to a linked improvement locus related to growth in temperate environments, has led to a decrease in the amount of CentC in CEN10 of temperate lines.
The prevalence of CentC reduction at CEN10 in temperate inbreds during the past 2.5–5.8 ky suggests a role of agricultural selection in CentC loss. Maize was domesticated in tropical southern Mexico and, beginning approximately 3,000 y ago, was introduced multiple times to temperate regions where it was selected to tolerate different environmental conditions (including flowering at long day lengths) and maintained in isolation from tropical lines until modern breeding efforts commenced (32). This founder effect, possibly enhanced by selection for favorable CEN10-linked adaptations to the new environment, such as the flowering time locus at 42.9 cM (33) that shows segregation distortion (23), favors inbreeding at CEN10. In contrast, tropical lineages retained higher levels of CentC at CEN10, likely by gene conversion (34) involving homologous chromosomes of heterozygotes with CentC-rich tropical domesticated maize and/or the overwhelmingly CentC-rich wild teosinte CEN10s.
Low Genetic Diversity at Most Maize Centromeres
The widespread CentC loss, coupled with formation of neocentromeres and subsequent CR2 insertions in maize but not in teosinte, also suggests that maize domestication and improvement play roles in this process. We used four measures to determine the genetic diversity of maize and teosinte inbreds across each chromosome. Using HapMap2 data, distance matrices, and phylogenetic trees constructed from 300-kb overlapping segments of each chromosome, we identified genetic bottlenecks in maize as regions with (i) low Tajima’s D allele frequency value, (ii) low maize–maize but high teosinte–teosinte maximum genetic distance, (iii) no or few teosinte lines intermingled with the maize clade(s), and/or (iv) one or two distinct maize lineages (haplotypes) at the time of domestication.
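Measure (i) can be made concrete with the standard definition of Tajima's D, which compares the mean pairwise diversity with the diversity expected from the number of segregating sites. The sketch below implements that textbook formula from per-site allele counts. It is not the pipeline used for the analyses described here, and the two example windows are hypothetical.

```python
import math
from typing import Sequence


def tajimas_d(n: int, allele_counts: Sequence[int]) -> float:
    """Tajima's D for n sequences.

    allele_counts holds, for each segregating site, the number of
    sequences (1..n-1) carrying one of the two alleles.
    """
    s = len(allele_counts)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical windows for 28 lines, 20 segregating sites each.
sweep_like = tajimas_d(28, [1] * 20)    # only rare variants
balanced = tajimas_d(28, [14] * 20)     # only intermediate-frequency variants
print(f"{sweep_like:.2f} {balanced:.2f}")
```

An excess of rare variants, as expected after a recent sweep or bottleneck, drives D negative, whereas an excess of intermediate-frequency variants drives it positive.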
Results obtained for chromosome 1 (Fig. 4) clearly illustrate that the CEN1 region differs from the rest of chromosome 1 by exhibiting a Tajima’s D of <−2 for maize but not for teosinte, a maximum maize–maize divergence date near 10 ka (vs. a maximum TIL–TIL date of 55 ka for the same region), and no TILs in the single maize clade that formed postdomestication, indicating very strong selection of a single CEN1 haplotype in the 28 improved maize lines examined. Six other centromeres show similar trends (SI Appendix, Figs. S18 and S19), containing only one haplotype (CEN4 and CEN6) or two haplotypes (CEN2, CEN3, CEN8 and CEN9) in CEN regions spanning 6.0–15.5 Mb for these 28 lines. Examination of two larger datasets of 51 and 895 maize lines (Fig. 5 and SI Appendix, Fig. S28) revealed that the presence of only one or two postdomestication haplotypes at these seven centromeres is not an artifact of this particular set of 28 maize inbreds examined, because in all three datasets, the majority (e.g., 99.4–99.8% of 895 lines) contain the same haplotype at CEN1, CEN4, and CEN6, and 94.4–99.1% of 895 lines contain one of the two dominant haplotypes in the other four centromeres. In addition, a genome-wide screen for potential domestication loci (maximum maize–maize and teosinte–teosinte distances of <16.54 ka and >25 ka, respectively, for 67,292 overlapping 300-kb genome segments) revealed 42 potential domestication regions (SI Appendix, Figs. S21 and S22), 28 of which correspond to domestication or improvement loci identified previously using a different method and genotypes (9). Of the 14 loci that had not been identified previously, 9 had been ignored because they were in or near centromeres (CEN1-4, CEN6, and CEN9), and 1 was missing from the previous genome assembly (9). 
Thus, there is good agreement between these two methods of identifying domestication loci, the majority of which (55% of the 300-kb windows) originate from CEN1, CEN4, and CEN6 (SI Appendix, Figs. S23 and S28, Table S9, and Text S2).
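The genome-wide screen amounts to a two-threshold filter applied to each 300-kb window. The sketch below uses the thresholds stated above (<16.54 ka for maize–maize and >25 ka for teosinte–teosinte maximum distance). The `Window` record, its field names, and the second example window are hypothetical; only the CEN1-like values (about 10 ka and 55 ka) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_MAIZE_KA = 16.54     # maize–maize maximum must be below this
MIN_TEOSINTE_KA = 25.0   # teosinte–teosinte maximum must exceed this


@dataclass
class Window:
    name: str
    max_maize_ka: float      # divergence date of most distant maize pair
    max_teosinte_ka: float   # divergence date of most distant teosinte pair


def candidate_domestication_windows(windows: Iterable[Window]) -> List[Window]:
    """Keep windows with low maize but high teosinte diversity."""
    return [w for w in windows
            if w.max_maize_ka < MAX_MAIZE_KA
            and w.max_teosinte_ka > MIN_TEOSINTE_KA]


windows = [
    Window("CEN1-like", 10.0, 55.0),         # values quoted for the CEN1 region
    Window("hypothetical-arm", 60.0, 70.0),  # made-up window without a sweep
]
hits = candidate_domestication_windows(windows)
print([w.name for w in hits])
```

Requiring high teosinte diversity in the same window guards against regions that are simply conserved in both maize and teosinte.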
Fig. 4.
Low genetic diversity near most centromeres indicates selection for centromere-linked genes. Sequence diversity along maize chromosomes was assessed by Tajima’s D, the observed minus expected site frequency spectra for maize (black) and teosinte (gray); maximum distance, the divergence date of the most distantly related maize inbreds; TIL descendants, the number of teosinte inbreds contained within the ancestral node shared by all domesticated maize lines; and number of maize lineages, the number of distinct maize lineages at 15 ka. Centromere locations determined by ChIP-seq are highlighted in green (CEN1, CEN5M, and CEN7), blue (CEN5L), and red (CEN5R). CEN1 is the region of lowest sequence diversity on chromosome 1 by all four measures. On chromosome 5, region D (yellow highlight just downstream of CEN5) has undergone a selective sweep in all 28 maize lines, but not teosinte. The low Tajima’s D in CEN5 is caused by rare recombinants arising from crossovers between CEN5 and region D that gave rise to all colored lines. For accurate representation, maize lines carrying deletions (purple lines) within and flanking region D were excluded from trees constructed from these regions (SI Appendix, Text S2). On chromosome 7, the centromere proper (green highlight) does not exhibit any of the characteristics of reduced genetic diversity seen in all other chromosomes (SI Appendix, Fig. S18); however, it is flanked by two regions with low Tajima’s D (blue arrows) that are correlated with a decrease in the number of maize lineages at 15 ka. Negative selection of the haplotypes overrepresented in these two regions at CEN7 appears to contribute to the high diversity and CentC retention rate in this centromere (SI Appendix, Text S1). The y-axis scales are linear, except in the bottom panel of each chromosome. HapMap3.1 SNP calls were used for regions lacking HapMap2 data (SI Appendix, Table S17).
Fig. 5.
Genetic diversity is reduced at most centromeres of Z. mays ssp. mays. The number of haplotypes with an estimated divergence date of >10 ka is given for each of the 10 maize centromeres and three published domestication loci known to have undergone a postdomestication sweep for each of three datasets containing 28, 51, or 895 maize lines. The 10 centromeres are divided into three groups based on the number of haplotypes represented in the 28 I dataset and the percentage of all maize lines carrying the dominant haplotype in the HapMap3 dataset (30). All three datasets show very low genetic diversity at CEN1, CEN4, and CEN6 similar to that seen at the domestication loci. The middle group of four centromeres contains two dominant haplotypes. CEN5, CEN7, and CEN10 contain five or more haplotypes, with CEN7 exhibiting the greatest diversity of haplotypes. Haplotypes are classified based on the phylogenetic trees in Fig. 2 and SI Appendix, Figs. S11, S15, S24, and S29. Data are summarized in SI Appendix, Table S18. Color code: blue, red, green, proportions of the first, second, third, etc., most abundant haplotypes in the dataset; black, haplotypes not present in dataset 28 I. Maize lines: 28 I, 28 improved inbreds, including 25 NAM lines (23), B73, Mo17, and W64A; 28 I + 23 L, dataset 28 I plus 23 landraces (all HapMap2 data); 895 HM3, 895 maize lines of the HapMap3 dataset. Domestication loci: p2, pericentromere 2 (91.5–91.6 Mb); 10 L, chromosome 10 sweep (27); D, chromosome 5 region D (9).
The presence of only one or two haplotypes at seven maize centromeres suggests tight linkage of key domestication and/or agronomic alleles to these centromeres. This is confirmed for CEN2 by a 100-kb window with low Tajima’s D located just 2 Mb upstream of the CentC cluster (SI Appendix, Fig. S18, arrowhead), in which all 28 domesticated maize inbreds form a single clade (maximum maize–maize divergence date, 10.6 ka; SI Appendix, Fig. S29), indicating that this region contains gene(s) critical for maize domestication (possibly the zinc finger protein gene GRMZM2G096281 that is highly expressed in embryo; www.maizegdb.org). In fact, a 12-Mb region containing CEN2 was identified previously as containing a domestication locus (9). Thus, the second CEN2 haplotype found in the improved lines (divergence date from the major haplotype >70 ka) was introduced into maize germplasm after domestication by a single recombination event in the progenitor of six maize inbreds (SI Appendix, Fig. S15), illustrating that the centromere sequence is of secondary importance relative to the nearby gene.
The remaining three centromeres are represented in the 28 inbreds by 5, 9, and 5 haplotypes (CEN5, CEN7, and CEN10, respectively) that diverged >10 ka (Fig. 2 and SI Appendix, Table S15 and Text S1), indicating less intense selection at these loci. Nevertheless, Tajima’s D reveals genetic bottlenecks within or near all of these centromeres (Fig. 4 and SI Appendix, Fig. S18). Most informative for understanding centromere evolution is CEN5, where Tajima’s D defines genetic bottlenecks for both CEN5 itself and the large downstream region D (115.3–126.5 Mb; Fig. 4). All domesticated maize lines form a single clade with a maximum estimated divergence date of <15.8 ka for most of region D and <10 ka in several subregions (SI Appendix, Figs. S8 and S11), providing strong evidence that region D contains one or more domestication genes. Region D does not occur in its entirety in any of the sequenced teosinte inbreds, raising the possibility that it represents a rare combination of favorable alleles (SI Appendix, Fig. S30). QTLs with significant effect on seven of nine key domestication traits are located on chromosome 5 (7), and several QTLs for important agronomic traits (35), including plant height (36) and yield (37), map near CEN5.
In contrast to region D, five distinct CEN5L-M lineages with divergence dates of >15 ka are represented in domesticated maize. These include the blue/orange/brown lineage brought in by linkage drag with region D (and subsequently subjected to rare double recombinations in the CEN5 region), and the distantly related (diverged up to 64 ka) red/green, CML333, IL14H, and P39 genotypes. The latter were introgressed into domesticated maize by very rare recombinations that may have been selected based on agriculturally important phenotypes encoded by genes located within or tightly linked to CEN5L-M of these different CEN5 haplotypes. Thus, the dominant evolutionary force acting on CEN5 is retention of region D, with possible secondary selection for or against genes within CEN5, but there is no evidence for selection of a particularly “strong” centromere or centromere repeat as predicted by the centromere drive hypothesis. Instead, the principal discernible trend appears to be that all four of the distantly related CEN5 variants (red/green, blue/orange/brown, IL14H, and P39) lost CentC during approximately 5,300 y of inbreeding, whereas the sole inbred with a CentC-rich CEN5 acquired it <1.3 ka. Because neocentromeres formed (apparently stochastically at either CEN5L or CEN5R) only after acquisition of these four functional teosinte-like centromeres, we can exclude selection for a given centromere repeat as driving CEN5 evolution.
Regions of low Tajima’s D flanking CEN7 indicate genetic bottlenecks represented by five and four haplotypes (Fig. 4, blue arrows). The high genetic variation at CEN7 appears to result from selection against the haplotypes that make up the flanking regions; the TIL03-like haplotype is present in 21 and 24 out of 28 inbreds in these CEN7 flanking regions (SI Appendix, Table S15), but in the CEN7 proper, only 3 of 895 HapMap3.1 maize lines carry the TIL03-like haplotype. Consistent with the low apparent selection pressure for a specific CEN7 haplotype is the fact that it features the brightest CentC FISH signal in 26 out of 28 inbred karyotypes (SI Appendix, Text S1).
Origin of Centromere-Linked Genes
In the course of maize genome evolution, active centromeres were relocated to formerly euchromatic regions by sequential hemicentric inversions (SI Appendix, Fig. S20 and Table S8), instantly linking large numbers of genes to these centromeres (e.g., CEN5; SI Appendix, Fig. S4). A paracentric inversion placed GRMZM2G101098, a maize gene that is highly expressed in young tissue (www.maizegdb.org), to within 100 kb of the active CentC cluster of CEN10 and several CR elements. These inversions are genetic footprints of frequent centromere-proximal DSBs that also result in (presumably illegitimate) recombinations (SI Appendix, Text S4), insertion of nonsyntenic genes from other chromosomes (SI Appendix, Table S16), and large deletions near maize centromeres (SI Appendix, Online Resources). Strong selection against their deletion effectively enriches essential, or agriculturally important, pericentric genes relative to their nonessential counterparts. Modern breeding techniques using inbreds exacerbate the need to retain the most valuable centromere-linked gene combinations obtained during rare recombination events, thus favoring the establishment of key centromere haplotypes.
Model of Centromere Evolution
Taken together, our results—selective sweeps at many maize centromeres, reduced CentC and increased CR2 content of maize lines relative to teosinte, loss of CentC during inbreeding at CEN10 of temperate lines, and maintenance of the most CentC-rich centromeres at the genetically diverse maize CEN7 and tropical maize CEN10—suggest that the direction of centromere evolution (i.e., retention of centromeres rich in tandem repeats versus nearby neocentromere formation followed by invasion of CR elements) is heavily influenced by inbreeding owing to selection for linked genes, rather than selection for specific centromere repeat sequences. Favorable combinations of multiple advantageous alleles are easily maintained as a block in the nonrecombining centromere regions, and selection for these centromere-linked allelic blocks beginning with early domesticators of maize and continuing to the present day likely favored inbreeding of the linked centromeres and concomitant CentC loss.
Although the selective pressure during domestication may be extreme, similar selection for centromere-linked genes could drive centromere selection in nature. For example, Z. mays ssp. huehuetenangensis, which may have experienced a significant genetic bottleneck as a result of being geographically isolated in western Guatemala (38), also shows reduced CentC levels at many of its centromeres (19). Finally, selection for centromere-linked genes is not specific to maize, as illustrated by centromere-proximal loci showing strong differentiation between subspecies of rabbits (39), along with the observation that an ongoing incipient speciation in mosquito (40) appears to involve a selective sweep (estimated to have occurred 8–10 ka) of a block of coadapted genes linked to the centromere of the X chromosome and thought to confer the ability to colonize an emerging new environment (African rice fields).
Relatively frequent centromere-proximal DSBs (41) have the potential to cause gradual or sudden CentC loss, and thus likely play a major role in shaping maize centromeres. The frequency of CentC loss and neocentromere formation at any given maize centromere will depend on how long the centromere has been subjected to inbreeding (generally longer for domestication loci than for agricultural improvement loci), the amount of CentC at the start of inbreeding, the ability to regenerate CentC by hybridizing with an allopatric CentC-rich centromere, and the somewhat stochastic rate of CentC loss (one large deletion vs. many small deletions). CentC loss below a critical threshold at established centromeres requires that the epigenetic cenH3 mark expand into, or jump to, the nearest region providing adequate or optimal stability. These neocentromeres are subsequently colonized by CR2s, which are more likely to promote than to impair centromere function. At the insertion rates estimated, a 1.8-Mb maize centromere could be completely replaced by the 7.571-kb CR2 in 22.3–40.7 ky (CEN5 rate) to 55.3–95.3 ky (genome-wide rate). Nested insertions will inevitably dominate as these neocentromeres mature, setting the stage for the tandem repeat formation that has already been documented for CR elements (42).
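The arithmetic behind these replacement times can be sketched as follows. This is a minimal illustration, assuming that every insertion adds one full-length, non-nested 7.571-kb CR2 copy and that the insertion rate is constant. The function names are hypothetical, and the per-ky rates it prints are back-calculated from the time spans quoted above rather than taken from the SI Appendix.

```python
CENTROMERE_BP = 1_800_000  # 1.8-Mb maize centromere (value from the text)
CR2_BP = 7_571             # 7.571-kb CR2 element (value from the text)


def copies_needed(centromere_bp=CENTROMERE_BP, element_bp=CR2_BP):
    """Number of full-length elements that together span the centromere."""
    return centromere_bp / element_bp


def implied_rate_per_ky(replacement_time_ky):
    """Insertions per 1,000 y implied by a given replacement time (assumption:
    constant rate, one full-length copy per insertion)."""
    return copies_needed() / replacement_time_ky


def replacement_time_ky(insertions_per_ky):
    """Time (ky) to accumulate enough copies at a given insertion rate."""
    return copies_needed() / insertions_per_ky


if __name__ == "__main__":
    print(f"CR2 copies needed: {copies_needed():.1f}")
    # Time spans quoted in the text (ky); labels follow the text.
    for label, t_ky in [("CEN5 rate, fast", 22.3), ("CEN5 rate, slow", 40.7),
                        ("genome-wide, fast", 55.3), ("genome-wide, slow", 95.3)]:
        print(f"{label}: {implied_rate_per_ky(t_ky):.2f} insertions per ky")
```

Under these assumptions, roughly 238 CR2 copies are needed to span 1.8 Mb, so the quoted time spans correspond to a few to about ten insertions per thousand years per centromere.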
We hypothesize that CentC (and the related CentO of rice) represent the end products of a previous cycle of neocentromere formation and subsequent retrotransposon invasion in the grass progenitor. The fact that centromeres in the majority of eukaryotes contain tandem repeats rather than transposable elements suggests that transposons represent a transient stage in centromere evolution shortly after neocentromere formation. The predominance of (possibly transposon-derived) tandem repeats may reflect the need for the efficient DSB repair [possibly by microhomology-mediated end joining (43)] that is afforded by such repeats.
Conclusion
Although centromeres can remain stable over millions of years, strong selection for CEN-linked genes caused replacement of many maize centromeres in just a few thousand years via gradual and catastrophic CentC loss, localized cenH3 repositioning, and invasion of neocentromeres by CRs, effectively resulting in replacement of the centromere repeat. These processes are most easily observed for centromere 5, where they are driven by the tightly linked region D, but were detected to varying degrees at all maize centromeres owing to selection for other domestication loci or agricultural traits. Because CR elements colonize neocentromeres only after their formation, we conclude that the change in centromeric DNA repeats is driven primarily by selection of centromere-proximal genes and not by one centromeric repeat outcompeting another as postulated by the centromere drive hypothesis. Nonetheless, rapid replacement of centromere sequences genome-wide has the potential to exert significant selective pressure on cenH3, the chromatin protein most closely associated with centromeric DNA.
Methods
Chromatin Immunoprecipitation and Sequencing.
Chromatin was isolated with antibody to cenH3. Input and ChIP DNA were subjected to Illumina sequencing (101 nt, paired ends) (SI Appendix, Methods). All sequences were deposited in GenBank (accession no. SRP067358).
Read Mapping.
Illumina read pairs were mapped onto the RefGen_v2 or the revised RefGen_v3 (containing resequenced CEN10) Z. mays genome using paired-end bowtie (-X 2000, --chunkmbs 3000, -k 3, --strata, --best, -v 2, -q) (44) (SI Appendix, Methods).
Phylogenetic Reconstructions.
Phylogenies were reconstructed primarily from HapMap2 data (26). HapMap2 SNP and nucleotide calls for B73, Mo17, W64A, the TILs, and the NAM lines based on RefGen_v2 were downloaded from www.panzea.org/lit/data_sets.html. The number of non–SNP-covered nucleotides used to calculate divergence times (i.e., background coverage) was determined by counting the number of nucleotides covered by one or more reads in the region of interest using the BAM files from the HapMap2 study downloaded from iPlant, and equals the total number of covered nucleotides minus the variant calls of the HapMap2 study (SI Appendix, Methods).
Identification and Classification of CR1, CR2, and CR3 Junctions.
GSS and cenH3 ChIP-seq reads were used to identify CR junctions, map their position in the B73 physical map, and determine their distribution in Zea lines for placement on the CEN phylogenies (SI Appendix, Methods).
Dating CEN5 Divergence and Retrotransposon Insertion Times.
Distances were calculated using the K2P model for nucleotide substitutions. The substitution rate of 3.3 × 10⁻⁸ substitutions per site per year determined for the tb1 intergenic region (45) was used to convert distance to divergence time (SI Appendix, Text S2).
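As an illustration of this dating step, the sketch below computes a Kimura two-parameter (K2P) distance between two aligned sequences and converts it to a divergence time. It assumes the standard pairwise conversion T = d / (2μ), which this section does not state explicitly (the authors' exact procedure is in SI Appendix, Text S2), and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a non-ACGT character are skipped.
    """
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def divergence_time_years(distance, rate=3.3e-8):
    """Convert a pairwise distance to years, assuming T = d / (2 * rate).

    The rate default is the tb1 intergenic estimate quoted in the text; the
    factor of 2 (both lineages accumulate substitutions) is an assumption.
    """
    return distance / (2 * rate)
```

For example, under this assumed conversion a pairwise distance of 6.6 × 10⁻⁴ substitutions per site would correspond to a divergence of about 10 ka.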
Genome-Wide Distance Matrices and Phylogenies.
FASTA files generated from SNP and background calls for each 300-kb window (30-kb steps) of the maize genome for each of the 25 NAM lines, Mo17, B73, W64A, and 17 TILs were used to calculate distance matrices and reconstruct phylogenies (SI Appendix, Methods).
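The windowing scheme described above can be sketched as follows. This is an illustrative outline only: it assumes that only full-length windows are used, that sequences are already aligned to common reference coordinates, and that a simple proportion of differing sites stands in for the distance measure. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def sliding_windows(chrom_length, window=300_000, step=30_000):
    """Yield (start, end) for full-length windows along a chromosome."""
    for start in range(0, chrom_length - window + 1, step):
        yield start, start + window


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among sites called in both sequences."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        return float("nan")
    return sum(a != b for a, b in pairs) / len(pairs)


def window_distance_matrix(seqs, start, end):
    """Pairwise distances for one window; seqs maps line name -> sequence."""
    names = sorted(seqs)
    return {(m, n): p_distance(seqs[m][start:end], seqs[n][start:end])
            for m, n in combinations(names, 2)}


if __name__ == "__main__":
    # Toy 8-bp sequences, made up for illustration (not real maize data).
    toy = {"B73": "ACGTACGT", "Mo17": "ACGTACGA", "TIL01": "ACCTACGA"}
    print(window_distance_matrix(toy, 0, 8))
    print(list(sliding_windows(360_000)))
```

Each window's distance matrix would then be passed to a tree-building step; with 30-kb steps, consecutive 300-kb windows overlap by 90%.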
Tajima’s D Statistics.
Tajima’s D statistics were calculated using HapMap2 SNPs plus regions replaced with HapMap3.1 (SI Appendix, Methods).
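For readers unfamiliar with the statistic, the sketch below computes Tajima's D from a small alignment using the standard formula (mean pairwise differences minus the segregating-site estimate of diversity, scaled by its expected standard deviation). It assumes a gap-free alignment of at least four haploid sequences; the function name is hypothetical, and the authors' actual pipeline (SI Appendix, Methods) may differ.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences without missing data.

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(sequences[0]) for s in sequences):
        raise ValueError("sequences must have equal length")

    # S: number of alignment columns with more than one allele.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    diffs = [sum(x != y for x, y in zip(a, b))
             for a, b in combinations(sequences, 2)]
    pi = sum(diffs) / len(diffs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

An excess of rare variants, as expected after a selective sweep or bottleneck, gives a negative D, whereas an excess of intermediate-frequency variants gives a positive D.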
Construction of Centromere Phylogenies.
Neighbor-joining trees were constructed from sequence alignments of SNPs called using HapMap2 data from nonrecombinant centromere-proximal regions (SI Appendix, Methods). For all trees, nonancestral CR2 elements are noted on the appropriate branch and colored by centromere region (CEN#L, CEN#M, or CEN#R).
CentC Scoring.
The amount of CentC was determined visually for all chromosomes of each inbred, and quantitatively for CEN10, using published FISH karyotypes (19) (https://birchler.biology.missouri.edu/somatic-karyotype-analysis/). Details are provided in SI Appendix, Methods.
Determining the Time Span of Neocentromere Formation.
Neocentromere formation is defined on the phylogeny by the first branch (from the root) with CR2 insertions into the neocentromere and with descendants that all form similar neocentromeres (SI Appendix, Methods).
Acknowledgments
We thank Jim Brewbaker (University of Hawaii) and Karl Kremling (Cornell University) for providing tissue of the Hi and NAM lines, respectively. Mahalo to James Birchler (University of Missouri) and Ed Buckler (Cornell University) for many thoughtful discussions. Funding was provided by the University of Hawaii, the National Science Foundation (Grant DBI 0922703), and the US Department of Agriculture (Grant NIFA5022H).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequence reported in this paper has been deposited in the GenBank database (accession no. SRP067358).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1522008113/-/DCSupplemental.
References
1. Henikoff S, Ahmad K, Malik HS. The centromere paradox: Stable inheritance with rapidly evolving DNA. Science. 2001;293(5532):1098–1102. doi: 10.1126/science.1062939.
2. Henikoff S, Malik HS. Centromeres: Selfish drivers. Nature. 2002;417(6886):227. doi: 10.1038/417227a.
3. Matsuoka Y, et al. A single domestication for maize shown by multilocus microsatellite genotyping. Proc Natl Acad Sci USA. 2002;99(9):6080–6084. doi: 10.1073/pnas.052125199.
4. Piperno DR, Flannery KV. The earliest archaeological maize (Zea mays L.) from highland Mexico: New accelerator mass spectrometry dates and their implications. Proc Natl Acad Sci USA. 2001;98(4):2101–2103. doi: 10.1073/pnas.98.4.2101.
5. Doebley J, Stec A. Genetic analysis of the morphological differences between maize and teosinte. Genetics. 1991;129(1):285–295. doi: 10.1093/genetics/129.1.285.
6. Doebley J, Stec A. Inheritance of the morphological differences between maize and teosinte: Comparison of results for two F2 populations. Genetics. 1993;134(2):559–570. doi: 10.1093/genetics/134.2.559.
7. Doebley J, Stec A, Wendel J, Edwards M. Genetic and morphological analysis of a maize-teosinte F2 population: Implications for the origin of maize. Proc Natl Acad Sci USA. 1990;87(24):9888–9892. doi: 10.1073/pnas.87.24.9888.
8. Dorweiler J, Stec A, Kermicle J, Doebley J. Teosinte glume architecture 1: A genetic locus controlling a key step in maize evolution. Science. 1993;262(5131):233–235. doi: 10.1126/science.262.5131.233.
9. Hufford MB, et al. Comparative population genomics of maize domestication and improvement. Nat Genet. 2012;44(7):808–811. doi: 10.1038/ng.2309.
10. Hufford MB, et al. The genomic signature of crop–wild introgression in maize. PLoS Genet. 2013;9(5):e1003477. doi: 10.1371/journal.pgen.1003477.
11. Ross-Ibarra J, Tenaillon M, Gaut BS. Historical divergence and gene flow in the genus Zea. Genetics. 2009;181(4):1399–1413. doi: 10.1534/genetics.108.097238.
12. Wolfgruber TK, et al. Maize centromere structure and evolution: Sequence analysis of centromeres 2 and 5 reveals dynamic loci shaped primarily by retrotransposons. PLoS Genet. 2009;5(11):e1000743. doi: 10.1371/journal.pgen.1000743.
13. Gorinsek B, Gubensek F, Kordis D. Evolutionary genomics of chromoviruses in eukaryotes. Mol Biol Evol. 2004;21(5):781–798. doi: 10.1093/molbev/msh057.
14. Miller JT, Dong F, Jackson SA, Song J, Jiang J. Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics. 1998;150(4):1615–1623. doi: 10.1093/genetics/150.4.1615.
15. Neumann P, et al. Plant centromeric retrotransposons: A structural and cytogenetic perspective. Mob DNA. 2011;2(1):4. doi: 10.1186/1759-8753-2-4.
16. Presting GG, Malysheva L, Fuchs J, Schubert I. A Ty3/gypsy retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J. 1998;16(6):721–728. doi: 10.1046/j.1365-313x.1998.00341.x.
17. Sharma A, Presting GG. Evolution of centromeric retrotransposons in grasses. Genome Biol Evol. 2014;6(6):1335–1352. doi: 10.1093/gbe/evu096.
18. Sharma A, Schneider KL, Presting GG. Sustained retrotransposition is mediated by nucleotide deletions and interelement recombinations. Proc Natl Acad Sci USA. 2008;105(40):15470–15474. doi: 10.1073/pnas.0805694105.
19. Albert PS, Gao Z, Danilova TV, Birchler JA. Diversity of chromosomal karyotypes in maize and its relatives. Cytogenet Genome Res. 2010;129(1-3):6–16. doi: 10.1159/000314342.
20. Lamb JC, Birchler JA. Retroelement genome painting: Cytological visualization of retroelement expansions in the genera Zea and Tripsacum. Genetics. 2006;173(2):1007–1021. doi: 10.1534/genetics.105.053165.
21. Bilinski P, et al. Diversity and evolution of centromere repeats in the maize genome. Chromosoma. 2015;124(1):57–65. doi: 10.1007/s00412-014-0483-8.
22. Lee HR, et al. Chromatin immunoprecipitation cloning reveals rapid evolutionary patterns of centromeric DNA in Oryza species. Proc Natl Acad Sci USA. 2005;102(33):11793–11798. doi: 10.1073/pnas.0503863102.
23. McMullen MD, et al. Genetic properties of the maize nested association mapping population. Science. 2009;325(5941):737–740. doi: 10.1126/science.1174320.
24. Nagaki K, et al. Sequencing of a rice centromere uncovers active genes. Nat Genet. 2004;36(2):138–145. doi: 10.1038/ng1289.
25. Wang K, Wu Y, Zhang W, Dawe RK, Jiang J. Maize centromeres expand and adopt a uniform size in the genetic background of oat. Genome Res. 2014;24(1):107–116. doi: 10.1101/gr.160887.113.
26. Chia JM, et al. Maize HapMap2 identifies extant variation from a genome in flux. Nat Genet. 2012;44(7):803–807. doi: 10.1038/ng.2313.
27. Tian F, Stevens NM, Buckler ES 4th. Tracking footprints of maize domestication and evidence for a massive selective sweep on chromosome 10. Proc Natl Acad Sci USA. 2009;106(Suppl 1):9979–9986. doi: 10.1073/pnas.0901122106.
28. Lamb JC, Meyer JM, Birchler JA. A hemicentric inversion in the maize line knobless Tama flint created two sites of centromeric elements and moved the kinetochore-forming region. Chromosoma. 2007;116(3):237–247. doi: 10.1007/s00412-007-0096-6.
29. Miracle MP. The introduction and spread of maize in Africa. J Afr Hist. 1965;6(1):39–55.
30. Bukowski R, et al. Construction of the third-generation Zea mays haplotype map. bioRxiv. 2015.
31. Liu Y, et al. Genetic diversity and linkage disequilibrium estimation among the maize breeding germplasm for association mapping. Int J Agric Biol. 2014;16(5):851–861.
32. Troyer AF. Background of U.S. hybrid corn. Crop Sci. 1999;39(3):601–626.
33. Buckler ES, et al. The genetic architecture of maize flowering time. Science. 2009;325(5941):714–718. doi: 10.1126/science.1174276.
34. Shi J, et al. Widespread gene conversion in centromere cores. PLoS Biol. 2010;8(3):e1000327. doi: 10.1371/journal.pbio.1000327.
35. Wright SI, et al. The effects of artificial selection on the maize genome. Science. 2005;308(5726):1310–1314. doi: 10.1126/science.1107891.
36. Larièpe A, et al. The genetic basis of heterosis: Multiparental quantitative trait loci mapping reveals contrasted levels of apparent overdominance among traits of agronomical interest in maize (Zea mays L.). Genetics. 2012;190(2):795–811. doi: 10.1534/genetics.111.133447.
37. Graham GI, Wolff DW, Stuber CW. Characterization of a yield quantitative trait locus on chromosome five of maize by fine mapping. Crop Sci. 1997;37(5):1601–1610.
38. Hufford MB, Bilinski P, Pyhäjärvi T, Ross-Ibarra J. Teosinte as a model system for population and ecological genomics. Trends Genet. 2012;28(12):606–615. doi: 10.1016/j.tig.2012.08.004.
39. Carneiro M, Ferrand N, Nachman MW. Recombination and speciation: Loci near centromeres are more differentiated than loci near telomeres between subspecies of the European rabbit (Oryctolagus cuniculus). Genetics. 2009;181(2):593–606. doi: 10.1534/genetics.108.096826.
40. Stump AD, et al. Centromere-proximal differentiation and speciation in Anopheles gambiae. Proc Natl Acad Sci USA. 2005;102(44):15930–15935. doi: 10.1073/pnas.0508161102.
41. Guerrero AA, et al. Centromere-localized breaks indicate the generation of DNA damage by the mitotic spindle. Proc Natl Acad Sci USA. 2010;107(9):4159–4164. doi: 10.1073/pnas.0912143106.
42. Sharma A, Wolfgruber TK, Presting GG. Tandem repeats derived from centromeric retrotransposons. BMC Genomics. 2013;14(1):142. doi: 10.1186/1471-2164-14-142.
43. McVey M, Lee SE. MMEJ repair of double-strand breaks (director’s cut): Deleted sequences and alternative endings. Trends Genet. 2008;24(11):529–538. doi: 10.1016/j.tig.2008.08.007.
44. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25.
45. Clark RM, Tavaré S, Doebley J. Estimating a nucleotide substitution rate for maize from polymorphism at a major domestication locus. Mol Biol Evol. 2005;22(11):2304–2312. doi: 10.1093/molbev/msi228.