Summary
□Genetic incompatibilities are widespread between species. However, it remains unclear whether they all originated after population divergence as suggested by the Bateson-Dobzhansky-Muller model, and if not, what is their prevalence and distribution within populations.
□The gene presence-absence variations (PAVs) provide an opportunity for investigating gene-gene incompatibility. Here, we searched for the repulsion of coexistence between gene PAVs to identify the negative interaction of gene functions separately in two Oryza sativa subspecies.
□Many PAVs are involved in subspecies-specific negative epistasis and segregate at low-to-intermediate frequencies in focal subspecies but at low or high frequencies in the other subspecies. Incompatible PAVs are enriched in two functional groups, defense response and protein phosphorylation, which are associated with plant immunity and consistent with autoimmunity being a known mechanism of hybrid incompatibility in plants. Genes in the two enriched functional groups are older and seldom directly interact with each other. Instead, they interact with other younger gene PAVs with diverse functions.
□Our results illustrate the landscape of genetic incompatibility at gene PAVs in rice, where many incompatible pairs have already segregated as polymorphisms within subspecies, and many are novel negative interactions between older defense-related genes and younger genes with diverse functions.
Keywords: Oryza sativa, gene presence absence variation, genetic incompatibility, defense response and protein phosphorylation
Introduction
Speciation is the process of developing reproductive isolation. The crucial step in this process is the accumulation of intrinsic genetic incompatibilities that can block gene flow between populations, and it is therefore a main focus of speciation studies. In the Bateson-Dobzhansky-Muller incompatibility (BDMI) model, the divergence at interacting loci in different populations leads to incompatible interactions in hybrids (reviewed in Fishman and Sweigart 2018). Genetic incompatibilities are strong negative interactions among two or more loci (negative epistasis), which are thought to be widespread both between and within species. Traditional mapping approaches found genes causing hybrid inviability or sterility between species, for example, the Hmr-Lhr incompatibility in Drosophila (Brideau et al. 2006). Within species, a study of Drosophila recombinant inbred lines implies that epistasis is prevalent in the population but largely beyond detection, because statistical power is limited by the sample size and the epistatic strength (Corbett-Detig et al. 2013). Several lines of evidence identified the existence of polymorphic hybrid incompatibilities through genetic mapping between species (e.g., Drosophila sibling species, Reed and Markow 2004; two species of Mimulus, Sweigart et al. 2007) or between strains in the same species (e.g., two house mice strains, Good et al. 2007, Larson et al. 2018). However, very few incompatible genes have been identified; the known cases involve interacting genes preserved by balancing selection in different populations (e.g., zeel-1 and peel-1 between Bristol and Hawaii strains of Caenorhabditis elegans, Seidel et al. 2008; NPR1 and RPP5 between incipient species of Capsella, Sicard et al. 2015).
Nevertheless, it is still challenging to characterize genetic incompatibilities in detail, especially the origin and formation of incompatibilities and whether they were segregating as polymorphisms before divergence or developed afterwards as suggested by BDMI, which is important to understand the speciation process (reviewed in Cutter 2012).
Genetic mapping, the common strategy for identifying genetic incompatibility, depends heavily on tight linkage between genetic markers and causal elements. With the large amount of population genomic data now available, methods based on negative linkage disequilibrium between loci have been widely used and tested in multiple species (for example, Arabidopsis thaliana, Simon et al. 2008; yeast, Li et al. 2013; Drosophila melanogaster, Corbett-Detig et al. 2013; and swordtail fish, Schumer et al. 2014 & 2016). All of these examples used genetic markers, mainly SNPs or microsatellites, to find inter-chromosomal genetic incompatibilities. Given that genes are the basic functional units, identifying incompatible genes in this way relies on their linkage with those genetic markers. Instead of using genetic markers, the gene presence-absence variations (gene PAVs) in the pan-genome provide a unique opportunity to survey gene-gene incompatibilities directly. Specifically, the lack of gene co-absence in the same genome implies functional essentiality, and the lack of gene co-presence suggests incompatibility caused by conflicts in their functions. Here we focus on the latter.
The cultivated Asian rice (Oryza sativa) is particularly suitable for studying the genetic incompatibilities as there are two subspecies, Oryza sativa subsp. indica and japonica. They are partially reproductively isolated without whole-genome or chromosomal duplications before or after subspecies divergence. Genetic mapping has been performed by crossing indica and japonica, and several fixed incompatible loci were identified (such as hybrid sterility of both female and male, reviewed in Ouyang and Zhang 2013). In addition to being fixed, substantial gene-gene incompatibilities might be segregating within subspecies. So far, a genomic survey of incompatibility in rice populations has not been performed. The 3K Rice Genome (3K-RG) Project provides 453 high-quality genomes with sequencing depth >20 X, including the two subspecies, 303 indica and 92 japonica, and three other groups, 33 Aus (Aus, Boro, and Rayada ecotypes), 10 Bas (Basmati and Sadri aromatic varieties), and 15 admixture/ADM accessions. These samples are distributed across most of the world's major geographic regions (Agrama et al. 2010, Wang et al. 2018). Previous studies have identified high genetic variations, including the gene presence-absence variations (gene PAVs) in the pan-genome (Sun et al. 2017, Hu et al. 2017, Wang et al. 2018).
The recent duplicates of an essential gene likely share the same function. With PAVs in these homologs, the presence of any one of the two homologs in the genome is sufficient, but the lack of co-absence could be observed due to the functional essentiality. In contrast, the lack of co-presence reflects a disadvantage of gene interactions that may contribute to incompatibility, and this could happen between genes with distinct functions. To focus on incompatibility-causing interactions arising from novel interactions of new genes, here we searched for repulsion of co-presence in rice gene PAVs as signs of negative epistasis to identify potential gene-gene incompatible pairs. Unlike the standard BDMI model, which proposes that genetic incompatibilities originate after population divergence, we found that many incompatible pairs segregate as polymorphisms. Genes involved in such negative epistasis are enriched in defense-related functions and show signs consistent with balancing selection. The results also provide genomic support to a previously proposed mechanism of gene-flow barrier in plants (Bomblies and Weigel 2007).
Materials and Methods
Data sources and curation
The rice (Oryza sativa L.) pan-genome project was described in detail by Sun et al. (2017) and Wang et al. (2018). Briefly, the 3K Rice Genome (3K-RG) Project sequenced more than 3000 rice (Oryza sativa L.) genomes. Among these, 453 accessions were sequenced with depth > 20X for PAV calling. De novo assembly for each accession was performed and aligned to the reference genome (Nipponbare RefSeq, Kawahara et al. 2013). After collapsing redundant sequences with >= 90% identity to either the reference genome or all novel sequences, the remaining contigs formed the non-redundant novel sequences (Sun et al. 2017). The pan-genome dataset was then constructed by combining the Nipponbare RefSeq and non-redundant novel sequences. After repeat masking, protein-coding genes on novel sequences were predicted and refined to remove redundant and incomplete genes. To identify the presence or absence of each gene in each accession, the "map-to-pan" strategy was used (Sun et al. 2017). Reads from each accession were mapped to the pan-genome. The read coverage at one gene was used to determine its presence, where 95% of the coding sequence and 85% of the gene body were covered (Wang et al. 2018). We included this gene PAV dataset predicted in 453 accessions (https://doi.org/10.6084/m9.figshare.c.3876022.v1). To characterize the biological features of those PAVs, the gene ages were determined by the sequence similarities with genes in other organisms (Wang et al. 2018). The PAVs are grouped into 13 taxonomic levels of gene age using the NR protein database from NCBI, from the oldest PS1 to the youngest PS13. The dataset of 13 taxonomic levels of gene age was downloaded from the Rice Pan-genome Browser (http://cgm.sjtu.edu.cn/3kricedb/data_download.php). 
Briefly, PS1 contains the oldest genes shared by all cellular organisms, PS2 genes shared among Eukaryotes, onward, PS8 genes shared among flowering plants, PS12 genes shared in the Oryza clade, and PS13 contains genes only segregating in O. sativa (for further details see Wang et al. 2018).
In addition to gene PAVs, we also included 4-fold degenerate SNPs and four other types of structural variations (duplications, deletions, inversions, and translocations). All those variants and annotations were generated by the 3K-RG project (Wang et al. 2018). The SNP data were downloaded from Rice SNP-Seek Database (https://snp-seek.irri.org/). The gene PAVs and structural variations were downloaded from the figshare database (https://doi.org/10.6084/m9.figshare.c.3876022.v1).
Characterizing variations
To evaluate the population differentiation, FST per variant was calculated between indica and japonica (Weir and Cockerham 1984). PCA was calculated for all variants mentioned above by PLINK 1.9 (Chang et al. 2015). We also investigated the differences between gene PAVs and other variants from their site frequency spectra (SFS). The SFS contains the presence frequency for the gene PAVs and SVs, as well as the derived allele frequency of 4-fold SNPs. The derived and ancestral alleles were defined by a whole-genome comparison between Nipponbare RefSeq and a closely related species, Oryza glumaepatula. Any allele shared with O. glumaepatula was defined as the ancestral state. Only SNPs whose ancestral and derived states could be identified were included.
Detecting incompatible gene pairs
In total, 24,777 gene PAVs are segregating in O. sativa. To detect significant genetic incompatibilities, we only considered polymorphic PAVs that are not close to being fixed or lost in each subspecies. We filtered PAVs for gene presence frequencies between 0.05 and 0.95 in each subspecies. As a result, 12,978 PAVs in indica and 11,509 PAVs in japonica were retained for the analyses. Among them, 8,898 PAV genes are shared. We calculated Hamming distance using these filtered PAV datasets (Fig. S1). No two accessions share >95% of gene PAVs. Therefore, we included all the accessions in our BDMI detection.
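The filtering and redundancy check described above can be sketched as follows. This is a minimal illustration on toy data, not the pipeline used in the study: the dictionary layout, the function names, and the choice to treat the 0.05 and 0.95 bounds as inclusive are all assumptions made for the example.

```python
def presence_frequency(calls):
    """Fraction of accessions in which a gene is present (calls are 0/1)."""
    return sum(calls) / len(calls)

def filter_polymorphic(pav, lo=0.05, hi=0.95):
    """Keep genes whose presence frequency lies in [lo, hi] (inclusive bounds assumed)."""
    return {g: c for g, c in pav.items() if lo <= presence_frequency(c) <= hi}

def hamming_fraction(pav, i, j):
    """Fraction of genes at which accessions i and j have different calls."""
    diffs = sum(1 for c in pav.values() if c[i] != c[j])
    return diffs / len(pav)

# Toy PAV matrix: gene -> presence (1) / absence (0) across 10 hypothetical accessions
toy_pav = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 0, 1, 1],  # frequency 0.6, kept
    "geneB": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # fixed, removed
    "geneC": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # frequency 0.1, kept
    "geneD": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # lost, removed
}

filtered = filter_polymorphic(toy_pav)
dist_01 = hamming_fraction(filtered, 0, 1)  # accessions 0 and 1 differ at geneC only
shared_01 = 1 - dist_01                     # fraction of PAV calls the two accessions share
```

Under this sketch, two accessions would be flagged as near-duplicates when `shared_01` exceeds 0.95.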
We searched for signs of negative epistasis to identify incompatible gene pairs independently in indica or japonica. Assuming that 1 represents the presence of a gene and 0 represents the absence of this gene, the co-presence of two genes (gene A and B) is denoted as 11. Similarly, the co-absence is denoted as 00, and individual gene presence is denoted as 01 or 10. Without interaction, the frequency of gene co-presence (f11) is expected to be the product of the presence frequencies of the two PAVs (PA * PB). A significantly low frequency of co-presence implies selection against the co-presence, i.e., negative epistasis. Here, D11 values of the linkage disequilibrium (i.e., D11 = f11 − PA * PB) were adopted to estimate the underrepresentation of co-presence of two genes. D11 > 0 means the “coupling” of two genes: they co-occur more often in the same genome, and D11 < 0 represents the “repulsion” of two genes: they tend not to co-occur in the same genome.
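As an illustration of the D11 definition, the following sketch computes the four genotype frequencies and D11 from two presence/absence vectors. The function names and the toy vectors are hypothetical and serve only to make the arithmetic concrete.

```python
def genotype_freqs(a, b):
    """Frequencies of the genotypes 11, 10, 01, 00 for two 0/1 PAV vectors."""
    counts = {"11": 0, "10": 0, "01": 0, "00": 0}
    for x, y in zip(a, b):
        counts[f"{x}{y}"] += 1
    n = len(a)
    return {k: v / n for k, v in counts.items()}

def d11(a, b):
    """D11 = f11 - PA * PB; negative values indicate repulsion of co-presence."""
    f = genotype_freqs(a, b)
    pa = f["11"] + f["10"]  # presence frequency of gene A
    pb = f["11"] + f["01"]  # presence frequency of gene B
    return f["11"] - pa * pb

# Toy example: the two genes never co-occur in the same accession
gene_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # PA = 0.4
gene_b = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]  # PB = 0.3
d = d11(gene_a, gene_b)                  # 0 - 0.4 * 0.3 = -0.12
```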
We applied a statistic, X(2), to capture one topography of BDMIs (Fig. S2; Li et al. 2022). PAV variability is measured as the number of PAV differences between two randomly chosen accessions, analogous to nucleotide diversity θπ. Here, the PAV diversity at one locus is hA = 2 * PA * (1 − PA) for gene A and hB = 2 * PB * (1 − PB) for gene B. In the absence of association, the variance of PAV diversity at two genes A and B is expected to be Vexp = hA + hB − (hA)² − (hB)². When two PAVs are associated by epistasis or population structure, the observed variance of PAV diversity at the two genes deviates from this expected variance. The deviation is Δ = 4 * D11 * (f11 + f00 − f10 − f01 − 2 * D11), where D11 = f11 − PA * PB. The deviation Δ normalized to the expected variance is X(2) = Δ/Vexp. Because Vexp is positive for polymorphic PAVs, the sign of Δ determines the sign of X(2). In other words, X(2) is affected by two factors, linkage disequilibrium (D11) and topography (f11 + f00 − f10 − f01). Here, only the lack of co-presence is considered, D11 < 0.
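A minimal sketch of the X(2) calculation from the four genotype frequencies is given below. It assumes that the bracket in Δ is f11 + f00 − f10 − f01 − 2 * D11, which is the form obtained when the covariance of pairwise differences at the two loci is expanded and which contains the topography term named in the text. The function name and the two toy frequency sets are illustrative only.

```python
def x2_statistic(f11, f10, f01, f00):
    """Return (D11, X(2)) for one PAV pair given its four genotype frequencies."""
    pa = f11 + f10                       # presence frequency of gene A
    pb = f11 + f01                       # presence frequency of gene B
    d = f11 - pa * pb                    # D11
    ha = 2 * pa * (1 - pa)               # PAV diversity at gene A
    hb = 2 * pb * (1 - pb)               # PAV diversity at gene B
    v_exp = ha + hb - ha ** 2 - hb ** 2  # expected variance without association
    delta = 4 * d * (f11 + f00 - f10 - f01 - 2 * d)
    return d, delta / v_exp              # undefined if both PAVs are monomorphic

# Toy case 1: co-presence missing, co-absence common
d_bdmi, x_bdmi = x2_statistic(f11=0.0, f10=0.2, f01=0.2, f00=0.6)

# Toy case 2: two diverged groups, each fixed for a different gene
d_struct, x_struct = x2_statistic(f11=0.0, f10=0.5, f01=0.5, f00=0.0)
```

In the first toy case D11 is negative and X(2) is negative; in the second, D11 is negative but X(2) is positive, which is the pattern the text attributes to population structure.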
A lack of two-PAV co-presence is present at two topographies (D11 < 0, Fig. S2). Imagine the most extreme form of population structure with two highly diverged sub-populations. One sub-population was almost fixed for the presence of gene A and the absence of gene B, and the other sub-population had the reverse trend. While the lack of co-presence was observed (D11 < 0), population structure also created the lack of co-absence (D00 < 0). This led to the overrepresentation of genotypes 10 and 01, generating a positive X(2), where D11 < 0 and f11 + f00 − f10 − f01 < 0 (Fig. S2, lower panel). On this topography, BDMI is confounded with population structure (Fig. S2; Brown et al. 1980, Smith et al. 1993). The other topography can be represented by f11 + f00 − f10 − f01 > 0. In this case, co-presence is severely underrepresented (D11 < 0) and at the lowest frequency, co-absence is underrepresented as well but at relatively high frequency, and genotypes 01 and 10 are at intermediate frequencies. Thus, X(2) becomes negative (Fig. S2; Li et al. 2022). BDMIs on this genotype-frequency topography are expected to be less affected by population structure.
Notably, among the two possible genotype frequency topographies imposed by BDMI, only one typical topography of BDMI can be identified by negative X(2). The other topography (positive X(2)) could be generated by both BDMIs and population structure (Fig. S2). To eliminate the effect of population structure on our search for negative epistasis, in this study we focused on negative X(2) and acknowledge the lack of power to detect weak BDMIs confounded with population structure. Here, we identify the extreme cases of BDMI topography among all cases with deficient gene co-presence. After considering the proportion of most extreme cases of BDMI among all BDMI topographies (X(2) < 0 and D11 < 0; Fig. S2, upper panel), we use a stricter criterion for BDMI detection in japonica than in indica: X(2) < −0.02 for indica and X(2) < −0.04 for japonica. Both criteria ensure that the negative X(2) values identify the extreme cases among all BDMI topographies with negative X(2) (< 2%; Table S1).
If all PAVs form incompatibilities with others randomly, a PAV is expected to be involved in a small number of incompatible pairs, equal on average to (number of incompatible pairs*2) / (number of incompatible PAVs) and following a Poisson distribution. Given that the observed distribution did not conform with this expectation (see Results), the genes involved in an unexpectedly high number of PAV pairs were taken as the top incompatible genes (≥ 10 pairs in indica; ≥ 15 pairs in japonica) for further analyses.
To investigate whether incompatible PAVs were clustered in the genome, we used LDna to identify the linked clusters. LDna can cluster loci by linkage disequilibrium measured by the squared correlation coefficient of pairwise loci (r², Kemppainen et al. 2015). In our study, a PAV cluster requires a minimal pairwise r² > 0.5 and a median pairwise r² of 0.8.
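For reference, the pairwise r² that such clustering operates on can be computed from two presence/absence vectors as D² divided by the product of the four marginal frequencies. The sketch below shows only this pairwise quantity; it does not reproduce LDna itself, and the function name and toy vectors are hypothetical.

```python
def pav_r2(a, b):
    """Squared correlation (r^2) between two 0/1 PAV vectors."""
    n = len(a)
    pa = sum(a) / n                                       # presence frequency of gene A
    pb = sum(b) / n                                       # presence frequency of gene B
    f11 = sum(1 for x, y in zip(a, b) if x and y) / n     # co-presence frequency
    d = f11 - pa * pb                                     # D11
    return d * d / (pa * (1 - pa) * pb * (1 - pb))        # undefined for monomorphic PAVs

gene_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
r2_ab = pav_r2(gene_a, gene_b)   # D11 = -0.12, so r^2 = 0.0144 / 0.0504
r2_self = pav_r2(gene_a, gene_a) # a PAV is perfectly correlated with itself
```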
GO and protein annotation
To analyze the gene ontology (GO) enrichment, annotation data on Os-Nipponbare-Reference-IRGSP-1.0 were downloaded from the Rice Annotation Project Database (rap-db, version 2020-12-02, https://rapdb.dna.affrc.go.jp/download/irgsp1.html). Fisher's exact tests were used to identify the significantly enriched GO categories among incompatible PAV genes compared to the background, i.e., the filtered PAV dataset in each subspecies. Any GO term with Fisher's exact test p < 0.05 was considered significantly enriched. Protein function annotation was obtained from Information Commons for Rice (IC4R, The IC4R Project Consortium 2016).
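The enrichment test for a single GO term can be sketched with the hypergeometric tail probability, which is the one-sided form of Fisher's exact test. Whether the study used a one-sided or two-sided test is not stated, so the one-sided choice here is an assumption, as are the function name and the toy counts.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for GO-term enrichment.

    N: background genes, K: background genes carrying the GO term,
    n: incompatible genes, k: incompatible genes carrying the GO term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Toy counts: 20 background genes, 5 annotated with the term,
# 4 incompatible genes, 3 of which carry the term
p_toy = fisher_enrichment_p(k=3, n=4, K=5, N=20)  # (10*15 + 5*1) / 4845
is_enriched = p_toy < 0.05
```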
Results
General patterns of 4-fold degenerate SNPs and structural variations
In total, 48,098 genes were predicted and described in the 3K-RG project (Wang et al. 2018). Among them, 24,777 genes are polymorphic, i.e., gene PAVs (presence in < 453 accessions), among which 14,800 exist in the Nipponbare IRGSP 1.0 reference genome (Nipponbare RefSeq genes, Kawahara et al. 2013) and 9,977 do not. To illustrate the genetic variation among 453 accessions, we performed principal component analysis (PCA) for 4-fold degenerate SNPs and structural variations (SVs: deletions, duplications, gene PAVs, inversions, and translocations; summarized in Table S2). Consistent with other species (e.g., Conrad & Hurles 2007) and other population structure inference (Kou et al. 2020), most types of SVs (deletions, duplications, inversions, translocations) have similar patterns of variation as 4-fold degenerate SNPs (Fig. 1a, Fig. S3). However, gene PAVs possess different patterns (Fig. 1, Fig. S3). While the five genetic groups could be separated quite well from the first two PCs of 4-fold degenerate SNPs, more within-group variation was observed for gene PAVs (Fig. 1b). Interestingly, the PC scores are strongly correlated with the total gene number in the genome of each accession (Pearson's correlation, PC1: r² = 0.588, P < 2.2e-16; PC2: r² = 0.496, P < 2.2e-16). Unlike 4-fold SNPs or other SVs, PAV patterns might be shaped by other factors in addition to demography. Like other structural variations, PAVs show lower FST between indica and japonica than 4-fold SNPs (Fig. S4, Table S3).
Figure 1.
Principal Component Analysis applied to 4-fold SNPs (a) and all gene PAVs (b) of 453 accessions. The 453 accessions were separated into five groups, including 303 indica, 92 japonica, 33 Aus (Aus, Boro, and Rayada ecotypes), 10 Bas (Basmati and Sadri aromatic varieties), and 15 admixed accessions of indica and japonica. These groups are separated more obviously by 4-fold SNPs than by gene PAVs.
Gene-gene incompatibilities are widespread in O. sativa
Incompatibility between genes is characterized as the underrepresentation of gene co-presence in the population. The underrepresentation of co-presence (denoted as genotype 11) is determined by negative values of linkage disequilibrium (LD) of genes A and B (D11 = f11 − PA * PB, where f11 is the co-presence frequency and PA and PB are the presence frequencies of the two genes). We used negative X(2) to detect one type of BDMI topography: negative epistasis decreases the frequency of co-occurrence strongly, decreases the frequencies of two repulsive genotypes indirectly, but renders co-absence segregating at relatively high frequency (Fig. S2; see Methods). In total, we identified 5,646 incompatible pairs in indica and 4,124 pairs in japonica. Those incompatible pairs include 2,125 gene PAVs in indica and 1,022 PAVs in japonica. Since the incompatible pairs were detected based on both linkage disequilibrium (weak association) and their topography (X(2), see Methods), the LD between incompatible PAVs is slightly stronger than the genomic background (the 4-fold SNPs and the background PAVs; Fig. S5). Among those pairs, indica and japonica only share two incompatible pairs, but they share 198 incompatible PAVs (Fig. S6), which is significantly more than expected under random sampling (permutation test, mean=129, p<0.01).
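The expected number of shared incompatible PAVs under random sampling can be approximated from the counts reported in the text. The sketch below assumes one plausible null model, in which incompatible PAVs are drawn uniformly at random from each subspecies' filtered background and only the 8,898 shared PAVs can overlap; the exact permutation scheme of the study is not described, so this scheme, the function names, the number of replicates, and the seed are assumptions.

```python
import random

# Counts reported in the text
N_SHARED = 8898                         # filtered PAVs shared by both subspecies
BG_INDICA, BG_JAPONICA = 12978, 11509   # filtered background PAVs
INC_INDICA, INC_JAPONICA = 2125, 1022   # incompatible PAVs
OBSERVED = 198                          # incompatible PAVs shared by both subspecies

def expected_shared(n_shared, bg_a, bg_b, inc_a, inc_b):
    """Analytic expectation of the overlap under independent uniform draws."""
    return n_shared * (inc_a / bg_a) * (inc_b / bg_b)

def simulate_shared(n_shared, bg_a, bg_b, inc_a, inc_b, reps=200, seed=1):
    """Random draws; indices below n_shared denote the same gene in both backgrounds."""
    rng = random.Random(seed)
    overlaps = []
    for _ in range(reps):
        a = {g for g in rng.sample(range(bg_a), inc_a) if g < n_shared}
        b = {g for g in rng.sample(range(bg_b), inc_b) if g < n_shared}
        overlaps.append(len(a & b))
    return overlaps

expected = expected_shared(N_SHARED, BG_INDICA, BG_JAPONICA, INC_INDICA, INC_JAPONICA)
sims = simulate_shared(N_SHARED, BG_INDICA, BG_JAPONICA, INC_INDICA, INC_JAPONICA)
sim_mean = sum(sims) / len(sims)
# Empirical p-value with the usual +1 correction
p_emp = (1 + sum(s >= OBSERVED for s in sims)) / (1 + len(sims))
```

Under this assumed null, the analytic expectation is about 129 shared PAVs, in line with the permutation mean reported in the text.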
Among the 2,125 gene PAVs involved in incompatible interactions in indica, 36.5% (775) exist in the Nipponbare reference genome, and the proportion is 37.8% (386/1,022) in japonica. Both proportions are significantly lower than in the background PAVs with presence frequency between 5% and 95% (indica: 56.8% of all background PAVs exist in the reference genome; japonica: 60.3%; binomial test, p < 2.2e-16). Most pairs include at least one gene PAV not existing in the Nipponbare genome (88.9% of incompatible pairs in indica; 97.1% in japonica), and a large proportion of pairs exist between non-reference gene PAVs (42.7% for indica and 38.2% for japonica). These large proportions suggest that the pan-genome provides a more comprehensive set of genes for searching genetic incompatibilities than the reference genome per se, since the reference genome only represents genes present within one japonica cultivar (Kawahara et al. 2013).
Since the incompatible PAVs were detected using the repulsion-type LD, one potential confounding factor is the close linkage between PAVs, which creates strong LD even without negative epistasis. For incompatible PAVs with known locations in the Nipponbare reference genome (628 pairs in indica; 118 pairs in japonica), most are long-range (>1Mb) interactions within a chromosome (130 pairs in indica; 6 pairs in japonica) or between chromosomes (Fig. S7). As shown by recent studies, LD does not extend over long distances in rice (McCouch et al. 2016). In our study, the average r² between incompatible PAVs is about 0.1 in japonica and 0.05 in indica (Fig. S5), which roughly equals the LD between two loci less than 50kb apart (McCouch et al. 2016). For those PAVs that do exist in the reference genome, our analyses showed that only 39 pairs are between genes within 1Mb and two pairs within 100kb in indica (Fig. S7). For incompatible PAV pairs in japonica, none was between genes within 1Mb. This suggests that few gene PAVs in the incompatible pairs are closely linked.
Unequal contributions of PAVs to overall incompatibility
Assuming all gene PAVs have equal chances to contribute to genetic incompatibility, each PAV would be expected to be involved in (number of incompatible pairs*2) / (number of incompatible PAVs) pairs under a Poisson distribution, where the mean and variance are equal. In contrast, the distributions of the indica and japonica datasets significantly deviate from expectations (one-sample Kolmogorov-Smirnov test, p < 2.2e-16; Fig. 2). The mean and variance are 5.31 and 37.82 in indica and 8.07 and 86.08 in japonica. PAVs are overrepresented at both the lower and higher ends of the distribution, leading to a monotonically decreasing distribution instead of a hump-shaped Poisson distribution (Fig. 2). This indicates that many cases of incompatibility are driven by a small number of gene PAVs interacting with many other PAVs. We further assigned PAVs involved in higher numbers of incompatible pairs than expected into the top PAV set (≥ 10 pairs in indica; ≥ 15 in japonica; Fig. 2; Table S4). Hence, we identified 349 top PAVs in indica (involved in 2,863 of 5,646 pairs, 50.7%) and 196 top PAVs in japonica datasets (in 2,239 of 4,124 pairs, 54.3%).
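The expected mean and the degree of overdispersion follow directly from the counts above, as the short sketch below shows. The Poisson tail function is included only to illustrate how rarely a PAV would reach the top-set cutoffs under the random expectation; the function names are hypothetical, and a plain (not zero-truncated) Poisson is assumed, as in the text.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

def poisson_tail(k, lam):
    """P(X >= k) for a Poisson distribution with mean lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# Expected pairs per PAV: each pair contributes to two PAVs
mean_indica = 2 * 5646 / 2125     # about 5.31
mean_japonica = 2 * 4124 / 1022   # about 8.07

# Variance-to-mean ratio from the reported values; a Poisson would give 1
dispersion_indica = 37.82 / 5.31
dispersion_japonica = 86.08 / 8.07

# Probability of reaching the top-set cutoffs under the Poisson expectation
tail_indica = poisson_tail(10, mean_indica)
tail_japonica = poisson_tail(15, mean_japonica)
```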
Figure 2.
Observed occurrence spectra of interactions per PAV are skewed towards low and high frequencies compared to the expected distributions for indica (a) and japonica (b). We used all significant negative interactions between all PAV loci pairs and counted the number of interactions of each PAV involved. The blue lines show the expected occurrence distributions estimated from a Poisson distribution. (c) The linkage disequilibrium clusters. We applied LD clustering to all the incompatible PAVs (indica_all, japonica_all) and PAVs involved in a high number of incompatible pairs (≥10 pairs in indica, indica_top; ≥15 pairs in japonica, japonica_top). In each cluster, the median correlation coefficient (r2) is 0.8, and the minimal r2 is 0.5. The right skew of interactions per PAV in panels (a) and (b) is not explainable by the LD clusters due to a low fraction of large clusters.
To find out whether those top PAVs truly have negative epistasis with many other PAVs or whether this pattern was confounded by close linkage between other PAVs and the top PAVs' partners, we used LDna to identify linkage clusters among all incompatible PAVs (even if they were not identified in the same incompatible pair). In general, those incompatible PAVs show very similar patterns in indica and japonica, regardless of whether they are involved in many pairs. Among them, ~80% of PAV clusters contain only one PAV, ~8% of clusters contain two PAVs, and the rest, ~12%, contain more than two PAVs (Fig. 2c). Most of the clusters contain <10 PAVs, with only two exceptions (15 and 20 PAVs) in indica. Compared to the number of incompatible pairs that top PAVs are involved in (>10 interacting partners, Fig. 2a, b), the number of PAVs per cluster is relatively small. In other words, while the LD between incompatible PAVs is higher than between 4-fold SNPs (Fig. S5), the LD is still much lower than what would be expected under close linkage (the minimal pairwise r² > 0.5 for identifying LD clusters by LDna, see Methods). Therefore, the unequal contributions of PAVs to overall incompatibility cannot be explained by one PAV interacting with another PAV that happened to be closely linked with many other PAVs, and this further supports that some PAVs are interacting hubs.
Incompatible PAV frequency spectra
The two-dimensional site frequency spectrum (2D-SFS) can describe allele frequency variability between two populations. For neutral comparison, 4-fold SNPs were filtered for minor allele frequency (MAF) > 0.05 like PAVs. The 2D-SFS of filtered 4-fold SNPs clearly shows the differentiation between indica and japonica, with a relatively high abundance of low-frequency SNPs in one subspecies and high-frequency SNPs in the other subspecies (top-left and bottom-right patches, Fig. 3a). Given the much larger number of SNPs with MAF < 0.05, such a pattern is not obvious in the 2D-SFS using all 4-fold SNPs (Fig. S8a). Distinct from 4-fold SNPs, the homologous gene PAVs in indica and japonica share similar allele frequencies along the diagonal in the 2D-SFS (Fig. 3b and S8c). This suggests the two subspecies did not strongly diverge in PAV frequencies, consistent with the observation that PC1 of PAVs does not separate subspecies (Fig. 1). PAVs are enriched in high presence frequencies in both indica and japonica (Fig. 3b and S8c), suggesting either a very recent gene loss or that selection constraints kept gene presence. If one assumes gene absence tends to be the derived state, we note the PAV 2D-SFS mostly represents the ancestral allele frequency distribution, which does not affect our conclusion.
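A 2D-SFS of this kind can be built by binning each variant's frequency in the two subspecies into a grid. The sketch below is a generic illustration; the number of bins, the function name, and the toy frequencies are assumptions rather than the settings used for Fig. 3.

```python
def sfs_2d(freq_a, freq_b, bins=10):
    """Count variants in a bins x bins grid of (frequency in A, frequency in B)."""
    grid = [[0] * bins for _ in range(bins)]
    for fa, fb in zip(freq_a, freq_b):
        i = min(int(fa * bins), bins - 1)  # a frequency of 1.0 falls in the last bin
        j = min(int(fb * bins), bins - 1)
        grid[i][j] += 1
    return grid

# Toy presence frequencies of three variants in two populations
freq_indica = [0.02, 0.5, 1.0]
freq_japonica = [0.97, 0.5, 1.0]
grid = sfs_2d(freq_indica, freq_japonica)
# The first variant (rare in one population, near fixation in the other) falls
# in an off-diagonal corner; the other two fall on the diagonal.
```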
Figure 3. Variant two-dimension site frequency spectra (2D-SFS).
(a). 2D-SFS of 4-fold SNPs with minor allele frequency > 0.05. (b). 2D-SFS of background gene PAVs, which were used to determine pairwise incompatibilities. If one assumes gene absence tends to be the derived state, we note the PAV 2D-SFS mostly represents the ancestral allele frequency, which does not affect our conclusion. (c) & (d). 2D-SFS of incompatible PAVs identified in indica and japonica subspecies. Color intensity represents the relative dot density. Incompatible PAVs are at intermediate frequencies in the subspecies they were found but at low frequencies or close to fixation in the other subspecies, suggesting the mid-frequency patterns are subspecies-specific.
Different from background PAVs, which are mostly concentrated at extreme frequencies (Fig. 3b), incompatible PAVs tend to have slightly more intermediate frequencies in the focal subspecies (the subspecies where the PAVs were identified as incompatible; Fig. 3c,d, S9 and S10), suggestive of balancing selection maintaining both PAVs in populations or sibling species (Seidel et al. 2008, Sicard et al. 2015). The presence frequencies of background PAVs are correlated between indica and japonica (Fig. 3b and S8c). However, PAVs involved in negative epistasis in the focal subspecies tend to be at extreme frequencies in the other subspecies, particularly at low (<0.05) or high (>0.95) frequencies. Moreover, indica and japonica show different patterns. Incompatible PAVs identified in indica are mostly concentrated in the low-frequency regime, and some in the high-frequency regime, in japonica. In contrast, those incompatible PAVs in japonica are at low or intermediate frequencies in indica. This pattern is more pronounced in top incompatible PAVs, where the top incompatible PAVs in a subspecies are mostly at low frequency in the other subspecies (Fig. S9 and S10). Taken together, the mutual frequency distributions indicate that BDMIs lingering in one subspecies tend to be eliminated in the other subspecies, likely by lowering the presence frequencies of the incompatible PAVs and thus decreasing the chance of co-presence.
The topographies and the genotype frequencies of incompatible PAV pairs in the focal and the other subspecies can be visualized in Fig. 4. In the focal subspecies, the genotype frequencies follow the BDMI topography (Fig. 4a,b & S2), with lower-than-expected co-presence. In the other subspecies, the co-presence genotypes (11) tend to be at low frequency as well (Fig. 4c,d). With respect to the co-absence genotype (00), they tend to be at either high or low frequencies (Fig. 4c,d). In the non-focal subspecies, a high-frequency co-absence is mostly associated with the lower presence (frequency below 0.2) of either gene and therefore low co-presence. A low-frequency co-absence is coupled with two repulsive genotypes (01/10), one at extremely low frequency (< 0.1) and the other at very high frequency (> 0.9). Consistent with the 2D-SFS of incompatible PAVs (Fig. 3), this indicates BDMI in one subspecies was likely avoided (or eliminated) in the other subspecies by either maintaining both genes at low presence frequency or fixing one gene and losing the other gene.
Figure 4.
Pairwise DMI topographies in one subspecies (a&b) and their corresponding topographies in the other subspecies (c&d). (a) 5,646 DMI topographies in indica. (b) 4,124 DMI topographies in japonica. (c) The corresponding topographies of indica DMIs (panel a) in japonica. (d) The corresponding topographies of japonica DMIs (panel b) in indica. A gene presence is coded as 1, and a gene absence is 0. The genotype combinations of two interacting PAVs (00 − co-absence, 01 / 10 − presence of either one PAV, 11 − co-presence) are listed along the horizontal axis, and the vertical axis represents frequency of these combinations. Lines connect two genotypes one step away on a topography. The black dots represent the expected genotype frequencies calculated from PAV frequencies, and the red dots are observed frequencies. The co-presence (11) is underrepresented in all four panels. The incompatible PAVs identified in one subspecies show an overrepresentation at extremely low- or high- frequencies in the other subspecies (c&d), indicating that DMI in the other subspecies is eliminated by losing both genes or fixing one and losing the other gene.
From the distinct frequency distribution of incompatible PAVs in the focal subspecies versus in the other subspecies (Fig. 3c,d & 4), it is intuitive that the incompatible PAVs have higher between-subspecies FST than the genomic background (the filtered PAVs; Fig. S11), suggesting that segregating BDMIs may have contributed to subspecies differentiation. Interestingly, within-subspecies incompatibilities better spread accessions in the focal subspecies but not accessions in the other subspecies in PCA (Fig. 5 & S12), consistent with their frequency distribution within each subspecies (Fig. 3c,d & 4).
Figure 5. Principal Component Analysis (PCA) applied to incompatible gene PAVs in indica and japonica.
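To make the FST comparison concrete, the following sketch computes FST for a single PAV from its presence frequencies in two groups. The estimator used in the study is not stated in this section, so the Nei-style formula with equal group weights shown here is an assumption, and the frequencies are hypothetical; values from this sketch are not expected to reproduce the reported FST values.

```python
def fst_two_groups(p1, p2):
    """Nei-style F_ST = (H_T - H_S) / H_T for one presence-absence marker.

    p1 and p2 are presence frequencies in the two groups, weighted equally.
    This simple estimator is an illustrative assumption, not the study's method.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # diversity of the pooled groups
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group diversity
    if h_total == 0:
        return 0.0  # marker is monomorphic in both groups
    return (h_total - h_within) / h_total


# Hypothetical frequencies: similar in both groups versus strongly diverged.
print(fst_two_groups(0.45, 0.55))
print(fst_two_groups(0.10, 0.90))
```

A PAV that is at intermediate frequency in one group but nearly fixed or lost in the other yields a higher value than a PAV with similar frequencies in both groups, which is the pattern the text describes for incompatible PAVs.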
Population structure partially affects the incompatibility distribution
We further investigated the association between incompatible PAVs and population structure. First, we applied PCA to 4-fold SNPs within subspecies as the indicator of population structure (Fig. S13). In indica, each principal component (PC1, PC2, and PC3) explains a small and nearly equal proportion of the total variance (8.1-9.6%; Fig. 6), indicating that indica has a more homogeneous population sub-structure. In japonica, PC1 explains a much larger proportion of variance than PC2 and PC3 (15.1% vs. ~7%; Fig. 6), and two major genetic clusters appear to exist (Fig. S13). Second, we calculated the correlation between gene PAVs and PC scores within subspecies. We found that most background PAVs are not associated with population sub-structure in either subspecies (Fig. S14). For the incompatible PAVs in indica, the correlation with population structure has a distribution similar to that of the background PAVs, suggesting that the incompatible PAVs we identified are not strongly confounded by population sub-structure and genome-wide linkage disequilibrium (Fig. 6 & S14). In contrast, a bimodal distribution was observed for the correlation between PC axes and PAVs in japonica (Fig. 6 & S14). The bimodal distribution is more obvious in the top incompatible group. While it is difficult to exclude the potential influence of population structure, we note that within japonica, when one separates accessions into the two groups defined by PC1 (Fig. S13, red and blue), the FST is higher in incompatible PAVs than in background PAVs or 4-fold SNPs (Fig. S15). This trend is similar to the elevated FST of incompatible PAVs between subspecies (Fig. S11). It indicates that these incompatible PAVs have more restricted gene flow between genetic groups than neutral sites and suggests their potential effect on reproductive isolation.
Figure 6.
The distribution of correlation coefficients between population structure indices and incompatible PAVs. Principal Component Analysis (PCA) was applied to 4-fold degenerate SNPs separately in indica and japonica. Pearson's correlation (r) was calculated between a PAV (binary states of presence and absence) and each of three principal components (PC1, PC2, and PC3 scores). Three PAV categories were plotted: background PAVs (light blue), incompatible PAVs (light orange), and the PAVs involved in a high number of incompatible pairs (≥ 10 pairs in indica; ≥ 15 pairs in japonica). The bin size is 0.01. The vertical axis was truncated to visualize the distribution for incompatible PAVs and top incompatible PAVs. For the whole distribution, see Figure S14.
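The correlation described in the caption, between a binary PAV and the scores of one principal component, can be sketched as follows. The PC scores are taken as given (the PCA step itself is not shown), and the function name and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical example: presence of a gene tracks the PC1 score closely,
# as would be expected for a PAV confounded with population sub-structure.
pav = [1, 1, 1, 0, 0, 0]
pc1_scores = [2.1, 1.8, 2.4, -1.9, -2.2, -2.2]
r = pearson_r(pav, pc1_scores)
print(round(r, 3))
```

A PAV with |r| close to 1 largely follows the genetic clusters, whereas a PAV with r near 0 is distributed independently of them; the histograms in Fig. 6 summarize such coefficients across PAVs.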
Incompatible genes tend to be younger
To characterize the gene PAVs in the incompatible pairs, we examined their enrichment in the 13 taxonomic levels, in which gene PAVs were grouped based on gene age (PS1-13, Wang et al. 2018). PS1 is the oldest level, with homologs found in all cellular organisms. PS13 is the youngest level, existing only in O. sativa. The gene PAVs filtered for 5% minor allele frequency were used as the genomic background, where most PAVs belong to PS1 (~25%) and PS13 (~40%; Table S5). Compared to the background, the incompatible PAVs are enriched in young taxonomic levels (from PS8, annotated from flowering plants, onwards to PS13, O. sativa specific) but depleted in the oldest level (PS1; incompatible genes vs. expectation from the genomic background, Chi-square test, p < 7e-8; Fig. 7a, Table S5). The pattern is similar for the top incompatible PAVs involved in multiple pairs in indica but not in japonica (Chi-square test, p = 4e-4 for indica, and p = 0.08 for japonica; Fig. 7a, Table S5). We further performed permutation tests to examine the relative enrichment of incompatibility between gene PAVs of different age groups. In each sampling, the same number of pairs as the observed incompatible pairs was randomly drawn in proportion to the PAV fractions in the taxonomic levels (Table S5). Overall, the incompatibilities tend to cluster among young taxonomic levels in indica, whereas they tend to be evenly distributed in japonica (Fig. 7b,c).
Figure 7. Taxonomic levels of genes.
(a) Age distribution of gene PAVs in all incompatible pairs or those involved in a high number of incompatible pairs, plotted separately for indica and japonica. The incompatible PAVs are assigned to 13 taxonomic levels classified by gene age, from the oldest (found in all organisms, PS1) to the youngest (O. sativa specific, PS13). Shown are the distributions from the background PAVs (indica_bg and japonica_bg), the PAVs involved in any incompatible pair (indica_all and japonica_all), and the PAVs involved in a high number of incompatible pairs (≥ 10 pairs in indica, indica_top; ≥ 15 pairs in japonica, japonica_top). The incompatible PAVs are significantly enriched in younger taxonomic levels compared to the genomic expectation from background PAVs (Chi-square test, p < 7e-5). (b) & (c) Age distributions between the two members of each incompatible pair. The color intensity indicates the significance (p values) of high proportions. High intensity indicates a high proportion of incompatible pairs in the specific age group. The p values were obtained from 1000 samplings. In each sampling, the same number of pairs as the observed incompatible pairs was randomly drawn in proportion to the PAV fractions in specific taxonomic levels (Table S5).
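The sampling procedure behind panels (b) and (c) can be sketched as below. This is a simplified illustration under stated assumptions: the two members of each random pair are drawn independently with replacement according to the level fractions, the p value is the share of samplings with at least as many pairs in a given age combination as observed, and the fractions, pair counts, and function names are hypothetical.

```python
import random


def sample_pair_counts(level_fractions, n_pairs, rng):
    """Draw n_pairs random pairs of taxonomic levels and count each combination."""
    levels = list(level_fractions)
    weights = [level_fractions[level] for level in levels]
    counts = {}
    for _ in range(n_pairs):
        a, b = rng.choices(levels, weights=weights, k=2)
        key = tuple(sorted((a, b)))  # pairs are unordered
        counts[key] = counts.get(key, 0) + 1
    return counts


def enrichment_p_value(observed_count, pair, level_fractions, n_pairs,
                       n_samplings=1000, seed=0):
    """Share of samplings in which `pair` occurs at least observed_count times."""
    rng = random.Random(seed)
    key = tuple(sorted(pair))
    hits = 0
    for _ in range(n_samplings):
        counts = sample_pair_counts(level_fractions, n_pairs, rng)
        if counts.get(key, 0) >= observed_count:
            hits += 1
    return hits / n_samplings


# Hypothetical background fractions, loosely following the ~25% (PS1) and
# ~40% (PS13) shares mentioned in the text; "other" pools the remaining levels.
fractions = {"PS1": 0.25, "PS13": 0.40, "other": 0.35}
p_enriched = enrichment_p_value(60, ("PS13", "PS13"), fractions, n_pairs=200)
print(p_enriched)
```

With 200 random pairs, about 32 are expected to join two PS13 genes, so an observed count of 60 is rarely or never matched by the samplings and the resulting p value is small.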
Incompatible PAVs are enriched in two functional groups
To check whether there is any enrichment in particular pathways, we performed gene ontology (GO) enrichment analyses. Among all filtered gene PAVs (those with presence frequency between 0.05 and 0.95, the background set of our analyses), around 20% were annotated with biological process, molecular function, or cellular component (19.6% in indica; 17.6% in japonica); this low annotation level was also noted by Wang et al. (2018). Among the incompatible PAVs, 380 were annotated, with 268 (10.5%) in indica and 140 (10.7%) in japonica (Table S6). Among them, 144 PAVs are included in the enriched GO terms (Fisher's exact test, p < 0.05; 107 of 268 annotated incompatible genes in indica; 50 of 140 in japonica; Table S7). Interestingly, in both indica and japonica, the significant GO terms fall into two functional groups (Table S7). The first group comprises the two most significant GO terms, defense response (biological process) and ADP binding (molecular function), which contain overlapping PAVs: among the 380 annotated incompatible gene PAVs, 33 belong to defense response and 55 belong to ADP binding, with 33 genes shared between them (Table S7). The second major group is ATP binding and protein phosphorylation. Similarly, among the 380 genes, 58 were annotated as protein phosphorylation and 62 as ATP binding, with 57 genes shared between them (Table S7). Interestingly, ADP binding and ATP binding both belong to the higher-level GO term adenyl ribonucleotide binding, but the two functional groups do not share any incompatible gene.
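The enrichment test for a single GO term can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. Whether the study used a one-sided or two-sided test is not stated here, so the one-sided form is an assumption, and the counts in the example are hypothetical rather than taken from Table S7.

```python
from math import comb


def go_enrichment_p(k, n, big_k, big_n):
    """One-sided Fisher's exact test p value, P(X >= k).

    big_n: annotated PAVs in the background
    big_k: background PAVs annotated with the GO term
    n:     annotated incompatible PAVs (a subset of the background)
    k:     incompatible PAVs annotated with the GO term
    """
    if not (0 <= k <= n <= big_n and 0 <= big_k <= big_n):
        raise ValueError("counts are inconsistent")
    total = comb(big_n, n)
    tail = sum(
        comb(big_k, i) * comb(big_n - big_k, n - i)  # math.comb is 0 if n - i > big_n - big_k
        for i in range(k, min(n, big_k) + 1)
    )
    return tail / total


# Hypothetical counts: 50 of 1000 background PAVs carry the term (5%),
# but 15 of 100 incompatible PAVs do (15%).
p_term = go_enrichment_p(k=15, n=100, big_k=50, big_n=1000)
print(p_term)
```

A term would then be called enriched when this p value falls below the chosen threshold (0.05 in the text).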
GO annotation is expected to be biased towards well-annotated genes because they are commonly observed in many organisms. Here, we focus on PS1, the oldest taxonomic level. Only ~25% of all the filtered background PAVs for BDMI detection are classified in PS1 (Table S5).
In indica, 268 incompatible PAVs were annotated to GO terms, and 68% (183/268) of them are classified in PS1; in japonica, 140 incompatible PAVs were annotated to GO terms, and 67.1% (94/140) of them are classified in PS1. Significant GO terms accumulate even more PAVs in the oldest group (87 of 107 significant PAVs are in PS1 in indica; 42 of 50 in japonica). We further checked the taxonomic levels of the partners of those PAVs in significant GO terms. The interacting partners tend to be younger (185 of 330 partners are in PS13 in indica; 102 of 196 in japonica, vs. a genomic background of ~0.4; binomial test, p < 0.001). In the interaction network of incompatible PAVs, we observed that PAVs belonging to these two functional categories seldom directly interact with each other but instead form interaction clusters through negative epistasis with other PAVs (Fig. S16). We conclude that the potential hybrid incompatibility is manifested not directly among these functional categories but instead through their interactions with gene PAVs of other functional types, especially through negative interactions between old and young PAVs.
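The binomial comparison of partner ages against the genomic background can be reproduced approximately with the counts given above. This sketch assumes a one-sided exact binomial test against a background PS13 proportion of exactly 0.4 (the text gives ~0.4), so the printed p values are illustrative rather than the study's exact figures.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    if not (0 <= k <= n and 0.0 <= p <= 1.0):
        raise ValueError("invalid arguments")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


background_ps13 = 0.4  # approximate PS13 share among background PAVs (from the text)

# Counts of PS13 partners reported above: 185 of 330 (indica), 102 of 196 (japonica).
p_indica = binomial_upper_tail(185, 330, background_ps13)
p_japonica = binomial_upper_tail(102, 196, background_ps13)
print(f"indica:   {185 / 330:.2f} observed vs 0.40 expected, p = {p_indica:.2e}")
print(f"japonica: {102 / 196:.2f} observed vs 0.40 expected, p = {p_japonica:.2e}")
```

Under these assumptions both p values fall below 0.001, in line with the threshold reported in the text.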
One good example involves three gene PAVs (Os11g0664100, Os11g0665950, and Os11g0666300; Table S6 & S8, Fig. S17). These three PAVs are annotated as "DnaJ domain containing Protein" and function in protein kinase and ATP pathways, but their protein sequences show that they are three different genes (Fig. S17). The three PAVs span 200 kb on chromosome 11 (26.7-26.9 Mb), a region that contains 10 genes (Fig. S17). Empirical data have shown that this region is a major quantitative trait locus ("Sb11i") controlling non-stress-related sterility in indica (Dingkuhn et al. 2017), and DnaJ domain containing proteins may be involved in male sterility (Yang et al. 2008). In our study, Os11g0664100 and Os11g0666300 contribute to incompatibility in japonica, and Os11g0665950 is associated with incompatibility in indica. In addition to within-subspecies incompatibility, they also contribute to differentiation between subspecies, as their FST values are relatively high (0.16, 0.46, and 0.18; Fig. S17). Strikingly, the incompatible gene in indica (Os11g0665950) is present in 35% (106/303) of indica accessions but in only 7 out of 92 japonica accessions (7.6%). Consistent with the trend reported above, the three PAVs belong to old taxonomic levels (PS1 and PS5), whereas their interacting PAVs are mostly young O. sativa specific genes (PS13) without any annotation information (Table S8).
Discussion
Genetic incompatibilities have been recognized as essential in the process of speciation. The Bateson-Dobzhansky-Muller incompatibility (BDMI) model has been used as the standard model to explain the origin of genetic incompatibilities. In this model, incompatible alleles are thought to have originated after population divergence. Here we used the rice pan-genome data to detect genetic incompatibilities caused by the co-presence of gene PAVs and found that many of them exist within subspecies, in contrast to the expectation under the BDMI model. Such polymorphism might hamper breeding efforts not only between but also within subspecies.
The presence and absence of genes form genetic incompatibilities during speciation
The classic Bateson-Dobzhansky-Muller incompatibility model (BDMI) requires two conditions. First, derived alleles were fixed in different genetic backgrounds; second, genetic incompatibilities happened after secondary contact. The efficiency of the BDMI model relies on new mutations and levels of gene flow. Instead of emphasizing gene-gene incompatibilities between reproductively isolated genetic groups, here we investigated the patterns within subspecies where the gene flow may still exist.
In terms of the presence-absence relationship between gene pairs, hybrid incompatibility might be caused by two mechanisms: simultaneous loss of functionally redundant homologs (co-absence) or novel interactions of gene functions (co-presence). In the former case, two populations may experience reciprocal loss of one member of two duplicated genes with redundant functions. Upon hybridization, members of the F2 population may lose both copies, causing hybrid inviability (reviewed in Lynch 2002). For example, in O. sativa, the gene duplicates DPL1 and DPL2 reciprocally lost their functions in indica and japonica, causing hybrid pollen incompatibilities (Mizuta et al. 2010). Therefore, one expects the lack of co-absence between gene pairs to be largely explained by the joint loss of redundant gene copies. On the other hand, in this study we focused on the lack of co-presence, which would be expected if the functions of two genes are in conflict and might involve diverse novel interactions of gene functions. For example, two copies of a duplicated histidine biosynthesis gene, HISN6A and HISN6B, were reciprocally silenced in two Arabidopsis thaliana strains, but their co-presence (homozygous-homozygous negative epistasis) causes hybrid inviability (Bikard et al. 2009). While the use of marker-marker linkage disequilibrium in standard genetic mapping could not readily discern the two mechanisms, here, using gene PAVs, we are able to focus specifically on identifying novel interactions of gene functions by searching for the lack of co-presence.
In particular, we observed that some PAVs were involved in many more incompatible pairs than expected. We suspect that those PAVs are interaction hubs. Hub genes tend to have a greater fitness effect compared to genes with fewer interactions (Josephs et al. 2017; Taylor et al. 2019). In a mouse hybrid population, genetic incompatibilities are determined by a few hub loci interacting with many other loci (Turner et al. 2014). In our study, the presence and absence of a hub gene might lead to different consequences of reproductive isolation.
Population sub-structure might facilitate genetic incompatibility
In all cases, gene PAVs involved in negative epistasis show much stronger signs of linkage disequilibrium than 4-fold SNPs or background gene PAVs in the focal subspecies (Fig. S5). However, such associations do not exist in the other subspecies, since many incompatible genes in one subspecies have been nearly fixed or lost in the other subspecies as shown by gene frequency spectra or haplotype frequencies (Fig. 3 & 4). This observation suggests either that incompatible PAVs were under independent selection pressures in the two subspecies or that the genes' co-presence, which once existed as ancestral polymorphism, has been selected against due to the negative interaction between two genes but resulted in distinct consequences in two subspecies.
How does the segregating incompatibility contribute to speciation? We observed similar patterns of incompatible gene PAVs at different taxonomic levels: incompatible PAVs show a higher FST between indica and japonica than background PAVs or 4-fold SNPs do (Fig. S11); similarly, incompatible PAVs also have higher FST between sub-populations of japonica (Fig. S15). Nevertheless, incompatible PAVs in japonica tend to be more correlated with population structure (Fig. 6 & S14). We note that distinct demographic histories might contribute to such differences between subspecies: japonica has experienced a stronger bottleneck during domestication than indica (Zhu et al. 2007). japonica likely also contains greater population sub-structure than indica because the geographic dispersal of japonica occurred much earlier than that of indica, and gene flow is extensive between geographic populations in indica but much less so in japonica (Gutaker et al. 2020). Indeed, genes associated with hybrid incompatibility have been reported to be FST outliers (Huang et al. 2012). These observations strengthen the connection between genetic incompatibility and population differentiation. On the one hand, negative epistasis is a crucial force in maintaining population structure. On the other hand, incompatibility might be a result of population structure. During the domestication process, populations have often experienced bottleneck or founder events under strong artificial selection, resulting in a drastic reduction of population size. Incompatibility genes can therefore be fixed in these small populations (Nei et al. 1983). Thus, although demographic factors might contribute to the different patterns between indica and japonica, the similar patterns of incompatible gene PAVs at different taxonomic levels are not simply a coincidence but indicate that segregating genetic incompatibility potentially contributes to population differentiation.
Many incompatibilities are enriched in two known pathways
The GO annotation of incompatible PAVs highlights two groups of genes in four GO terms: ADP binding/defense response and ATP binding/protein phosphorylation. As shown by several studies, defense-related genes potentially have pleiotropic effects that cause autoimmunity and, in turn, hybrid necrosis (reviewed in Bomblies and Weigel 2007). The genetic architecture of hybrid necrosis can follow a simple two-locus BDMI model. For instance, one NB-LRR disease resistance gene homolog is sufficient to cause hybrid necrosis when combined with one specific allele at another locus (Bomblies et al. 2007). On the other hand, the genetic architecture can be very complicated. In Arabidopsis thaliana, an immune receptor gene, DM2, has been shown to be incompatible with several genes that function differently and independently (Chae et al. 2014). This is analogous to our observation that some genes act as interaction hubs (Fig. 2, S7), suggesting pleiotropic effects. In addition, negative epistasis exists between gene clusters, e.g., between NLR and non-NLR gene clusters (Barragan et al. 2019). As defense-related genes are often rapidly evolving, they can be easily overlooked when studies are based on one reference genome, a point exemplified by NLR gene families in Arabidopsis thaliana (Lee and Chae 2020). Indeed, those genes tend to interact with unannotated genes, many of which are specific to O. sativa (Fig. 7, S17; Table S8). Future long-read genomes may help to build the whole interaction landscape and to investigate the consequences of deleterious interactions; for example, BDMIs may limit our ability to combine favorable alleles of loci across plant genomes (Chae et al. 2014).
Additionally, many genes associated with defense response are shown to be under long-term balancing selection. For example, RPM1 confers resistance to a hemibiotrophic bacterial pathogen, and an ancient, stably balanced presence-absence polymorphism across Arabidopsis thaliana is well-established for RPM1 (Stahl et al. 1999). More evidence comes from genomic studies in several species, where defense-related PAVs have been observed to be maintained in different lineages (Mace et al. 2014, Chae et al. 2014, Koenig et al. 2019, Wang et al. 2019, Van de Weyer et al. 2019, Goktay et al. 2020). Besides ADP binding/defense response, another group is PAVs enriched in ATP binding/protein phosphorylation, a group of proteins known to be also involved in plant defense response and hybrid necrosis (reviewed in Li & Weigel 2021).
The age of incompatible gene PAVs and the origin of polymorphic incompatibilities
We found that PAVs involved in incompatibilities were enriched in young age groups. Young genes are generally thought to be non-essential and less conserved than old genes. For example, their expression profile is narrow, most often restricted to the testis in animals (e.g., Betrán and Long 2003, Dai et al. 2006, Zhao et al. 2014, Villanueva-Cañas et al. 2017). Without strong selective constraints, they potentially evolved to refine regulatory networks, and such a lack of constrained and conserved functions might facilitate the evolution of novel functions such as hybrid inviability. Among the few cases where speciation genes have been identified (reviewed in Maheshwari and Barbash 2011, Ouyang and Zhang 2013), young gene duplicates have frequently been validated to mediate hybrid incompatibility (for example, Odysseus in Drosophila, Ting et al. 1998 & 2004; DPL1 & DPL2 in rice, Mizuta et al. 2010; pTAC14 in Mimulus, Zuellig and Sweigart 2018; Xmrk in swordtail fish, Powell et al. 2020).
Even though genes with gene ontology annotation information may tend to be older, incompatible PAVs in significantly enriched GO terms still have a much higher proportion in the oldest age group than all annotated gene PAVs. These incompatible PAVs in enriched GO groups seldom form negative epistasis with each other, consistent with the idea that negative interactions among old genes might be too harmful to persist over evolutionary timescales. As shown by our results, their effects on hybrid incompatibility were therefore mainly manifested through novel interactions with young PAVs. Since these older genes were likely already under balancing selection due to their defense-related functions, they would tend to have more mid-frequency absence than most genes in the genome. This might provide space for young genes (which would be incompatible with those older defense-related genes) to accumulate and rise to intermediate frequency, generating the patterns of polymorphic incompatibilities. Given this, one might expect different types of genes to be involved in BDMI and in polymorphic incompatibilities, and the latter might be enriched with genes already under balancing selection for other reasons.
Conclusion
In this study, we found prevalent negative epistasis within species, not only between but also within subspecies. In contrast to the classical BDMI model, which requires partial reproductive isolation for the respective incompatible alleles to be fixed in sub-populations, many of the incompatible PAVs segregate as polymorphisms. Negative epistasis likely arises from novel interactions between older genes with specific defense-related functions and other younger genes. Further studies on these PAVs may inform the study of speciation and help optimize breeding strategies.
Supplementary Material
Acknowledgements
We thank the 3K rice genomes project and super pan-genomic landscape of rice for data access. We acknowledge Jue Ruan and Yanni Song for clarifying the genome data processing. We thank Claudia Bank for comments on the manuscript. We thank Mark Rausher and two anonymous referees for their constructive reviews and thoughtful comments during the review process. We are grateful for the support from National Taiwan University's Computer and Information Networking Center for high-performance computing facilities. This work was supported by the Ministry of Science and Technology (Taiwan; grant number 111-2628-B-002-021 to CRL). JL was supported by funding from ERC Starting Grant (FIT2GO; grant number 804569 to Claudia Bank).
Footnotes
Author contributions
JL and CRL designed the study. JL performed data analyses. JL and CRL wrote, edited, and approved the paper.
Competing interests
None declared.
Data availability
No new sequence data was generated in this study.
References
- Agrama HA, Yan W, Jia M, Fjellstrom R, McClung AM, et al. Genetic structure associated with diversity and geographic distribution in the USDA rice world collection. Natural Science. 2010;2(04):247. [Google Scholar]
- Barragan CA, Wu R, Kim ST, Xi W, Habring A, Hagmann J, et al. RPW8/HR repeats control NLR activation in Arabidopsis thaliana. PLoS genetics. 2019;15(7):e1008313. doi: 10.1371/journal.pgen.1008313. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Betráan E, Long M. Dntf-2r, a young Drosophila retroposed gene with specific male expression under positive Darwinian selection. Genetics. 2003;164(3):977–988. doi: 10.1093/genetics/164.3.977. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bikard D, Patel D, Le Mettée C, Giorgi V, Camilleri C, Bennett MJ, et al. Divergent evolution of duplicate genes leads to genetic incompatibilities within A. thaliana. Science. 2009;323(5914):623–626. doi: 10.1126/science.1165917. [DOI] [PubMed] [Google Scholar]
- Bomblies K, Lempe J, Epple P, Warthmann N, Lanz C, Dangl JL, et al. Autoimmune response as a mechanism for a Dobzhansky-Muller-type incompatibility syndrome in plants. PLoS biology. 2007;5(9):e236. doi: 10.1371/journal.pbio.0050236. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bomblies K, Weigel D. Hybrid necrosis: autoimmunity as a potential gene-flow barrier in plant species. Nature Reviews Genetics. 2007;8(5):382–393. doi: 10.1038/nrg2082. [DOI] [PubMed] [Google Scholar]
- Brideau NJ, Flores HA, Wang J, Maheshwari S, Wang X, Barbash DA. Two Dobzhansky-Muller genes interact to cause hybrid lethality in Drosophila. Science. 2006;314(5803):1292–1295. doi: 10.1126/science.1133953. [DOI] [PubMed] [Google Scholar]
- Brown A, Feldman M, Nevo E. Multilocus structure of natural populations of Hordeum spontaneum. Genetics. 1980;96(2):523–536. doi: 10.1093/genetics/96.2.523. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chae E, Bomblies K, Kim ST, Karelina D, Zaidem M, Ossowski S, et al. Species-wide genetic incompatibility analysis identifies immune genes as hot spots of deleterious epistasis. Cell. 2014;159(6):1341–1351. doi: 10.1016/j.cell.2014.10.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4(1):s13742–015. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Conrad DF, Hurles ME. The population genetics of structural variation. Nature genetics. 2007;39(7):S30–S36. doi: 10.1038/ng2042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Consortium IP. Information commons for rice (IC4R) Nucleic acids research. 2016;44(D1):D1172–D1180. doi: 10.1093/nar/gkv1141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Corbett-Detig RB, Zhou J, Clark AG, Hartl DL, Ayroles JF. Genetic incompatibilities are widespread within species. Nature. 2013;504(7478):135–137. doi: 10.1038/nature12678. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cutter AD. The polymorphic prelude to Bateson-Dobzhansky-Muller incompatibilities. Trends in ecology & evolution. 2012;27(4):209–218. doi: 10.1016/j.tree.2011.11.004. [DOI] [PubMed] [Google Scholar]
- Dai H, Yoshimatsu TF, Long M. Retrogene movement within-and between-chromosomes in the evolution of Drosophila genomes. Gene. 2006;385:96–102. doi: 10.1016/j.gene.2006.04.033. [DOI] [PubMed] [Google Scholar]
- Dingkuhn M, Pasco R, Pasuquin JM, Damo J, Soulíe JC, Raboin LM, et al. Crop-model assisted phenomics and genome-wide association study for climate adaptation of indica rice. 2. Thermal stress and spikelet sterility. Journal of Experimental Botany. 2017;68(15):4389–4406. doi: 10.1093/jxb/erx250. [DOI] [PubMed] [Google Scholar]
- Fishman L, Sweigart AL. When two rights make a wrong: the evolutionary genetics of plant hybrid incompatibilities. Annual review of plant biology. 2018;69:707–731. doi: 10.1146/annurev-arplant-042817-040113. [DOI] [PubMed] [Google Scholar]
- Göktay M, Fulgione A, Hancock AM. A new catalog of structural variants in 1,301 A. thaliana lines from Africa, Eurasia, and north America reveals a signature of balancing selection at defense response genes. Molecular biology and evolution. 2020;38(4):1498–1511. doi: 10.1093/molbev/msaa309. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Good JM, Handel MA, Nachman MW. Asymmetry and polymorphism of hybrid male sterility during the early stages of speciation in house mice. Evolution. 2008;62(1):50–65. doi: 10.1111/j.1558-5646.2007.00257.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gutaker RM, Groen SC, Bellis ES, Choi JY, Pires IS, Bocinsky RK, et al. Genomic history and ecology of the geographic spread of rice. Nature plants. 2020;6(5):492–502. doi: 10.1038/s41477-020-0659-6. [DOI] [PubMed] [Google Scholar]
- Hu Z, Sun C, Lu Kc, Chu X, Zhao Y, Lu J, et al. EUPAN enables pan-genome studies of a large number of eukaryotic genomes. Bioinformatics. 2017;33(15):2408–2409. doi: 10.1093/bioinformatics/btx170. [DOI] [PubMed] [Google Scholar]
- Huang X, Kurata N, Wang ZX, Wang A, Zhao Q, Zhao Y, et al. A map of rice genome variation reveals the origin of cultivated rice. Nature. 2012;490(7421):497–501. doi: 10.1038/nature11532. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Josephs EB, Wright SI, Stinchcombe JR, Schoen DJ. The relationship between selection, network connectivity, and regulatory variation within a population of Capsella grandiflora. Genome biology and evolution. 2017;9(4):1099–1109. doi: 10.1093/gbe/evx068. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kawahara Y, de la Bastide M, Hamilton JP, Kanamori H, McCombie WR, Ouyang S, et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice. 2013;6(1):1–10. doi: 10.1186/1939-8433-6-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kemppainen P, Knight CG, Sarma DK, Hlaing T, Prakash A, Maung Maung YN, et al. Linkage disequilibrium network analysis (LDna) gives a global view of chromosomal inversions, local adaptation and geographic structure. Molecular ecology resources. 2015;15(5):1031–1045. doi: 10.1111/1755-0998.12369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Koenig D, Hagmann J, Li R, Bemm F, Slotte T, Neuffer B, et al. Long-term balancing selection drives evolution of immunity genes in Capsella. Elife. 2019;8:e43606. doi: 10.7554/eLife.43606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kou Y, Liao Y, Toivainen T, Lv Y, Tian X, Emerson J, et al. Evolutionary genomics of structural variation in Asian rice (Oryza sativa) domestication. Molecular biology and evolution. 2020;37(12):3507–3524. doi: 10.1093/molbev/msaa185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Larson EL, Vanderpool D, Sarver BA, Callahan C, Keeble S, Provencio LL, et al. The evolution of polymorphic hybrid incompatibilities in house mice. Genetics. 2018;209(3):845–859. doi: 10.1534/genetics.118.300840. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee RR, Chae E. Variation patterns of NLR clusters in Arabidopsis thaliana genomes. Plant Communications. 2020;1(4):100089. doi: 10.1016/j.xplc.2020.100089. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li C, Wang Z, Zhang J. Toward Genome-Wide Identification of Bateson-Dobzhansky-Muller Incompatibilities in Yeast: A Simulation Study. Genome biology and evolution. 2013;5(7):1261–1272. doi: 10.1093/gbe/evt091. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li J, Schumer M, Bank C. Imbalanced segregation of recombinant haplotypes in hybrid populations reveals inter-and intrachromosomal Dobzhansky-Muller incompatibilities. PLoS genetics. 2022;18(3):e1010120. doi: 10.1371/journal.pgen.1010120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li L, Weigel D. One Hundred Years of Hybrid Necrosis: Hybrid Autoimmunity as a Window into the Mechanisms and Evolution of Plant-Pathogen Interactions. Annual Review of Phytopathology. 2021;59 doi: 10.1146/annurev-phyto-020620-114826. [DOI] [PubMed] [Google Scholar]
- Luis Villanueva-Cañas J, Ruiz-Orera J, Agea MI, Gallo M, Andreu D, Albaá MM. New genes and functional innovation in mammals. Genome biology and evolution. 2017;9(7):1886–1900. doi: 10.1093/gbe/evx136. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lynch M. Gene duplication and evolution. Science. 2002;297(5583):945–947. doi: 10.1126/science.1075472.
- Mace E, Tai S, Innes D, Godwin I, Hu W, Campbell B, et al. The plasticity of NBS resistance genes in sorghum is driven by multiple evolutionary processes. BMC plant biology. 2014;14(1):1–14. doi: 10.1186/s12870-014-0253-z.
- Maheshwari S, Barbash DA. The genetics of hybrid incompatibilities. Annual review of genetics. 2011;45:331–355. doi: 10.1146/annurev-genet-110410-132514.
- McCouch SR, Wright MH, Tung CW, Maron LG, McNally KL, Fitzgerald M, et al. Open access resources for genome-wide association mapping in rice. Nature communications. 2016;7(1):10532. doi: 10.1038/ncomms10532.
- Mizuta Y, Harushima Y, Kurata N. Rice pollen hybrid incompatibility caused by reciprocal gene loss of duplicated genes. Proceedings of the National Academy of Sciences. 2010;107(47):20417–20422. doi: 10.1073/pnas.1003124107.
- Nei M, Maruyama T, Wu CI. Models of evolution of reproductive isolation. Genetics. 1983;103(3):557–579. doi: 10.1093/genetics/103.3.557.
- Ouyang Y, Zhang Q. Understanding reproductive isolation based on the rice model. Annual review of plant biology. 2013;64:111–135. doi: 10.1146/annurev-arplant-050312-120205.
- Powell DL, García-Olazábal M, Keegan M, Reilly P, Du K, Díaz-Loyo AP, et al. Natural hybridization reveals incompatible alleles that cause melanoma in swordtail fish. Science. 2020;368(6492):731–736. doi: 10.1126/science.aba5216.
- Reed LK, Markow TA. Early events in speciation: polymorphism for hybrid male sterility in Drosophila. Proceedings of the National Academy of Sciences. 2004;101(24):9009–9012. doi: 10.1073/pnas.0403106101.
- Schumer M, Brandvain Y. Determining epistatic selection in admixed populations. Molecular Ecology. 2016;25(11):2577–2591. doi: 10.1111/mec.13641.
- Schumer M, Cui R, Powell DL, Dresner R, Rosenthal GG, Andolfatto P. High-resolution mapping reveals hundreds of genetic incompatibilities in hybridizing fish species. Elife. 2014;3:e02535. doi: 10.7554/eLife.02535.
- Seidel HS, Rockman MV, Kruglyak L. Widespread genetic incompatibility in C. elegans maintained by balancing selection. Science. 2008;319(5863):589–594. doi: 10.1126/science.1151107.
- Sicard A, Kappel C, Josephs EB, Lee YW, Marona C, Stinchcombe JR, et al. Divergent sorting of a balanced ancestral polymorphism underlies the establishment of gene-flow barriers in Capsella. Nature communications. 2015;6(1):7960. doi: 10.1038/ncomms8960.
- Simon M, Loudet O, Durand S, Bérard A, Brunel D, Sennesal FX, et al. Quantitative trait loci mapping in five new large recombinant inbred line populations of Arabidopsis thaliana genotyped with consensus single-nucleotide polymorphism markers. Genetics. 2008;178(4):2253–2264. doi: 10.1534/genetics.107.083899.
- Smith JM, Smith NH, O’Rourke M, Spratt BG. How clonal are bacteria? Proceedings of the National Academy of Sciences. 1993;90(10):4384–4388. doi: 10.1073/pnas.90.10.4384.
- Stahl EA, Dwyer G, Mauricio R, Kreitman M, Bergelson J. Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature. 1999;400(6745):667–671. doi: 10.1038/23260.
- Sun C, Hu Z, Zheng T, Lu K, Zhao Y, Wang W, et al. RPAN: rice pan-genome browser for 3000 rice genomes. Nucleic acids research. 2017;45(2):597–605. doi: 10.1093/nar/gkw958.
- Sweigart AL, Mason AR, Willis JH. Natural variation for a hybrid incompatibility between two species of Mimulus. Evolution. 2007;61(1):141–151. doi: 10.1111/j.1558-5646.2007.00011.x.
- Taylor MA, Wilczek AM, Roe JL, Welch SM, Runcie DE, Cooper MD, et al. Large-effect flowering time mutations reveal conditionally adaptive paths through fitness landscapes in Arabidopsis thaliana. Proceedings of the National Academy of Sciences. 2019;116(36):17890–17899. doi: 10.1073/pnas.1902731116.
- Ting CT, Tsaur SC, Sun S, Browne WE, Chen YC, Patel NH, et al. Gene duplication and speciation in Drosophila: evidence from the Odysseus locus. Proceedings of the National Academy of Sciences. 2004;101(33):12232–12235. doi: 10.1073/pnas.0401975101.
- Ting CT, Tsaur SC, Wu ML, Wu CI. A rapidly evolving homeobox at the site of a hybrid sterility gene. Science. 1998;282(5393):1501–1504. doi: 10.1126/science.282.5393.1501.
- Turner LM, Harr B. Genome-wide mapping in a house mouse hybrid zone reveals hybrid sterility loci and Dobzhansky-Muller interactions. Elife. 2014;3:e02504. doi: 10.7554/eLife.02504.
- Van de Weyer AL, Monteiro F, Furzer OJ, Nishimura MT, Cevik V, Witek K, et al. A species-wide inventory of NLR genes and alleles in Arabidopsis thaliana. Cell. 2019;178(5):1260–1272. doi: 10.1016/j.cell.2019.07.038.
- Wang B, Mojica JP, Perera N, Lee CR, Lovell JT, Sharma A, et al. Ancient polymorphisms contribute to genome-wide variation by long-term balancing selection and divergent sorting in Boechera stricta. Genome biology. 2019;20(1):1–15. doi: 10.1186/s13059-019-1729-9.
- Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018;557(7703):43–49. doi: 10.1038/s41586-018-0063-9.
- Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38(6):1358–1370. doi: 10.1111/j.1558-5646.1984.tb05657.x.
- Yang KZ, Xia C, Liu XL, Dou XY, Wang W, Chen LQ, et al. A mutation in Thermosensitive Male Sterile 1, encoding a heat shock protein with DnaJ and PDI domains, leads to thermosensitive gametophytic male sterility in Arabidopsis. The Plant Journal. 2009;57(5):870–882. doi: 10.1111/j.1365-313X.2008.03732.x.
- Zhao L, Saelao P, Jones CD, Begun DJ. Origin and spread of de novo genes in Drosophila melanogaster populations. Science. 2014;343(6172):769–772. doi: 10.1126/science.1248286.
- Zhu Q, Zheng X, Luo J, Gaut BS, Ge S. Multilocus analysis of nucleotide variation of Oryza sativa and its wild relatives: severe bottleneck during domestication of rice. Molecular biology and evolution. 2007;24(3):875–888. doi: 10.1093/molbev/msm005.
- Zuellig MP, Sweigart AL. Gene duplicates cause hybrid lethality between sympatric species of Mimulus. PLoS genetics. 2018;14(4):e1007130. doi: 10.1371/journal.pgen.1007130.