PLOS Genetics. 2021 Nov 8;17(11):e1009892. doi: 10.1371/journal.pgen.1009892

The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes

Mark S Hibbins 1,*, Matthew W Hahn 1,2
Editor: Alex Buerkle 3
PMCID: PMC8601620  PMID: 34748547

Abstract

It is now understood that introgression can serve as a powerful evolutionary force, providing genetic variation that can shape the course of trait evolution. Introgression also induces a shared evolutionary history that is not captured by the species phylogeny, potentially complicating evolutionary analyses that use a species tree. Such analyses are often carried out on gene expression data across species, where the measurement of thousands of trait values allows for powerful inferences while controlling for shared phylogeny. Here, we present a Brownian motion model for quantitative trait evolution under the multispecies network coalescent framework, demonstrating that introgression can generate apparently convergent patterns of evolution when averaged across thousands of quantitative traits. We test our theoretical predictions using whole-transcriptome expression data from ovules in the wild tomato genus Solanum. Examining two sub-clades that both have evidence for post-speciation introgression, but that differ substantially in its magnitude, we find patterns of evolution that are consistent with histories of introgression in both the sign and magnitude of ovule gene expression. Additionally, in the sub-clade with a higher rate of introgression, we observe a correlation between local gene tree topology and expression similarity, implicating a role for introgressed cis-regulatory variation in generating these broad-scale patterns. Our results reveal a general role for introgression in shaping patterns of variation across many thousands of quantitative traits, and provide a framework for testing for these effects using simple model-informed predictions.

Author summary

It is now known from studying large genetic datasets that species often hybridize and cross with each other over many generations – a phenomenon known as introgression. Introgression introduces new genetic variation into a population, and this variation can cause traits to be shared among the introgressing species. When researchers study the evolution of trait variation among species, this source of trait sharing is rarely accounted for. Here, we present a statistical model of the effects of introgression on trait variation. This model predicts that, when averaged across many thousands of traits, introgressing species are consistently more similar than expected from standard approaches. Researchers studying gene expression often consider the expression of many thousands of genes, making this a case where the expected effects of introgression are likely to manifest. We tested our model prediction using ovule gene expression data from the wild tomato genus Solanum, in two groups of species with evidence of historical introgression. We found that patterns of expression similarity in both groups are consistent with their histories of introgression and the predictions from our model. Our results highlight the importance of accounting for introgression as a source of trait variation among species.

Introduction

Introgression—the historical hybridization and subsequent backcrossing of previously isolated lineages—has come to the forefront of phylogenomics with the availability of genome sequencing (reviewed in [1,2]). Introgression has been recognized as a powerful and frequent source of adaptive variation, with many charismatic examples including wing pattern mimicry in butterflies [3,4], coat color in snowshoe hares [5], herbivore resistance in sunflowers [6], high-altitude adaptation in humans [7], and fruit color in wild tomatoes [8]. Introgressed alleles do not have to underlie discrete traits to influence the course of evolution: alleles that contribute to quantitative trait variation can also lead to more similarity than expected between the introgressing lineages [9].

Empirically investigating the effects of introgressed ancestry on quantitative trait evolution remains a challenge, despite recent theoretical and methodological advances [9–11]. This is because many processes besides introgression can shape the distribution of any particular character, including incomplete lineage sorting and convergence. It is therefore necessary to sample a large number of traits in order to demonstrate a genome-wide effect of introgression. Gene expression is commonly used in comparative analyses between species [12–14], allowing for the study of thousands of quantitative traits in a phylogenetic framework. Introgressed variants acting on gene expression either in cis or in trans may affect the evolution of gene expression across the genome. This could have deleterious effects on fitness, which would be consistent with previous evidence for widespread selection against introgressed alleles [15–18].

Incomplete lineage sorting (ILS) and introgression both introduce shared history that could influence the evolution of quantitative traits, though neither of these processes is captured by a standard species phylogeny. Therefore, to paint a complete picture of trait variation among species, it is necessary to include these sources of topological discordance in order to avoid errors inherent to methods that consider only the species topology [19]. Mendes et al. [20] showed that when gene tree discordance is unaccounted for, standard comparative approaches will return inflated evolutionary rate estimates and will underestimate phylogenetic signal. Despite these challenges, no approach has included all sources of gene tree discordance in a single framework for quantitative trait evolution. Some methods have extended the classic Brownian motion model for quantitative trait evolution to include shared histories due to ILS alone [20], while other work has applied the Brownian motion model to a phylogenetic network with introgression but no ILS [9]. A method including both sources of discordance would provide a complete picture of the most common causes of shared evolutionary history and their effects on quantitative traits. This would in turn allow for more accurate inferences of key evolutionary parameters, such as the trait evolutionary rate.

To address the effects of historical introgression on quantitative traits, we first develop a Brownian motion model of trait evolution that includes both ILS and introgression, showing the expected effects of introgression on the similarity in quantitative traits across species. This model leads directly to predictions about patterns of trait-sharing on a three-taxon tree, which we test by leveraging whole-transcriptome gene expression data [21] from the wild tomato clade in the genus Solanum. This clade includes 13 species that have radiated within the last 2.5 million years, and contains high rates of gene tree discordance due to both ILS and introgression [22,23]. Using ovule expression data from two independent species triplets with different levels of introgression, we find that transcriptome-wide patterns of variation in both triplets are consistent with histories of introgression, with quantitatively stronger signals in the sub-clade with greater introgression. Our analyses demonstrate that introgression can have measurable effects across the genome, on thousands of quantitative traits.

Results

Brownian motion on a species tree

To accurately model trait variation among species, we require an understanding of the evolutionary history that relates those species, and a model for how traits are expected to evolve given that history. We present results using Brownian motion, a statistical model that is commonly applied to quantitative traits. The evolutionary history relating species has classically been provided by a species phylogeny. Under Brownian motion, the character states on the tips of this phylogeny follow a multivariate normal distribution, with the variance and covariances of this distribution provided by the branch lengths of the phylogeny [24].

Consider a phylogeny of three species with the topology ((A,B),C) (Fig 1). In units of 2N generations, species A and B split at time t1, and C split from the ancestor of A and B at time t2. The expected phylogenetic variances and covariances for three species can be expressed in a 3x3 matrix, which we denote T:

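$$T = \begin{bmatrix} \mathrm{Var}(A) & \mathrm{Cov}(AB) & \mathrm{Cov}(AC) \\ \mathrm{Cov}(BA) & \mathrm{Var}(B) & \mathrm{Cov}(BC) \\ \mathrm{Cov}(CA) & \mathrm{Cov}(CB) & \mathrm{Var}(C) \end{bmatrix} \tag{1}$$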

This matrix is multiplied by the evolutionary rate parameter of the Brownian motion model, σ², to obtain trait variances and covariances. When only the species phylogeny is considered (i.e. there is no ILS or introgression), the diagonal elements of T are determined by the total time along which evolution can occur for each lineage, so Var(A) = Var(B) = Var(C) = t2 (and each trait variance is σ²t2). The covariances are determined by the lengths of shared internal branches. In the species tree, only species A and B share an internal branch, so:

$$\mathrm{Cov}(AB \mid \text{no ILS or introgression}) = \sigma^2 (t_2 - t_1) \tag{2}$$

where σ² corresponds to the trait evolutionary rate per unit time, and Cov(AB) = Cov(BA).
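As a minimal sketch of Eqs 1 and 2, assuming times are measured backward from the present with 0 < t1 < t2, the species-tree-only matrix T can be built directly and then scaled by σ² to give trait variances and covariances. The function names are hypothetical illustrations, not part of any published package.

```python
def species_tree_T(t1: float, t2: float) -> list[list[float]]:
    """Phylogenetic matrix T for the species tree ((A,B),C) with no ILS or introgression.

    t1: time of the A-B split; t2: time of the split from C (assumed 0 < t1 < t2).
    Diagonal: total time from the root to each tip (t2).
    Off-diagonal: shared internal branch, present only for A and B (t2 - t1).
    """
    if not 0 < t1 < t2:
        raise ValueError("require 0 < t1 < t2")
    return [
        [t2, t2 - t1, 0.0],
        [t2 - t1, t2, 0.0],
        [0.0, 0.0, t2],
    ]


def scale_by_rate(T: list[list[float]], sigma2: float) -> list[list[float]]:
    """Trait variances and covariances are sigma^2 * T."""
    return [[sigma2 * x for x in row] for row in T]


# Rows and columns are ordered A, B, C.
S = scale_by_rate(species_tree_T(t1=1.0, t2=3.0), sigma2=0.5)
print(S)  # [[1.5, 1.0, 0.0], [1.0, 1.5, 0.0], [0.0, 0.0, 1.5]]
```

Under this model Cov(AC) and Cov(BC) are exactly zero, which is the assumption that gene tree discordance breaks.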

Fig 1. Modelling quantitative trait evolution under the combined effects of ILS and introgression.


1) From a phylogenetic network with known parameters, the multispecies network coalescent model can be used to predict the expected frequency and branch lengths of each gene tree topology. 2) These gene trees contribute to trait covariances through their internal branches, and to trait variances through their total heights. The contribution of each gene tree to the overall quantities in T is weighted by its expected frequency. 3) Once the values of T are computed, character states under Brownian motion can be simulated by drawing from a multivariate normal distribution with a mean of 0 and covariance matrix σ²T.

In the absence of other processes, the species pairs BC and AC have zero covariance. However, trees inferred at individual loci can disagree with the species phylogeny, in which case these species pairs could have shared internal branches, and therefore non-zero covariances. This widespread phenomenon is known as gene tree discordance [22,25–31] and has multiple biological causes [32]. Gene trees with the topologies ((B,C),A) or ((A,C),B) contain internal branches shared by species B+C and A+C, respectively (Fig 1). This results in non-zero covariance terms between these two species pairs in T, covariance that cannot arise from evolution solely on the species phylogeny. The consequence of this discordance is that some traits may be closer in value between species that are not closely related in the species tree. We must therefore include discordance in our model to appropriately capture this trait covariance.

Modelling the effects of only incomplete lineage sorting on quantitative trait variances and covariances

One of the most common causes of gene tree discordance is incomplete lineage sorting, which occurs when ancestral lineages persist through successive speciation events [33,34]. For a rooted triplet, there are four possible gene trees in the presence of ILS: one concordant tree that occurs by lineage sorting with probability $1 - e^{-(t_2 - t_1)}$, and three trees produced by ILS, each with probability $\frac{1}{3} e^{-(t_2 - t_1)}$. One of the three ILS trees is concordant, while the other two are discordant. These probabilities follow from the multispecies coalescent model. To obtain the expected trait variances and covariances in T, Mendes et al. [20] weight the expected gene tree heights and internal branch lengths, respectively, by their expected frequencies under the multispecies coalescent model. We present those results here with a slightly different formulation for consistency with the new results presented below. For the covariance between A and B, we have:

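$$\mathrm{Cov}(AB \mid \text{no introgression}) = \sigma^2 \left[ \left(1 - e^{-(t_2 - t_1)}\right) \frac{e^{t_2}(t_2 - t_1)}{e^{t_2} - e^{t_1}} + \frac{1}{3} e^{-(t_2 - t_1)} \right] \tag{3}$$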

In Eq 3, σ² corresponds to the trait evolutionary rate per 2N generations, which is the scale over which time is measured in the multispecies coalescent model. Inside the square brackets, the first term is the probability of the gene tree produced by lineage sorting, multiplied by that tree’s expected internal branch length in units of 2N generations. The second term is the probability of the concordant tree produced by ILS, which has an expected internal branch length of 1 in units of 2N generations.
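The internal-branch factor in Eq 3 can be recovered with a standard coalescent calculation. The derivation below is a sketch added for clarity, not a reproduction of S1 Text; it assumes that two lineages coalesce at rate 1 per 2N generations, writes τ = t2 − t1, lets x be the A–B coalescence time measured from t1, and uses the fact that the A+B ancestor and C then coalesce, on average, 1 unit above t2.

$$
\begin{aligned}
\Pr(x < \tau) &= 1 - e^{-\tau}, \\
\mathbb{E}[x \mid x < \tau] &= \frac{\int_0^{\tau} x e^{-x}\,dx}{1 - e^{-\tau}} = 1 - \frac{\tau e^{-\tau}}{1 - e^{-\tau}}, \\
\mathbb{E}[\text{internal branch}] &= (t_2 + 1) - \big(t_1 + \mathbb{E}[x \mid x < \tau]\big) = \frac{\tau}{1 - e^{-\tau}} = \frac{e^{t_2}(t_2 - t_1)}{e^{t_2} - e^{t_1}}, \\
\left(1 - e^{-\tau}\right)\frac{\tau}{1 - e^{-\tau}} + \frac{1}{3}e^{-\tau} &= (t_2 - t_1) + \frac{1}{3} e^{-(t_2 - t_1)}.
\end{aligned}
$$

The last line shows that the bracketed quantity in Eq 3 simplifies to the species-tree internal branch plus the contribution of the concordant ILS tree.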

Species pairs BC and AC can only have covariance from discordant trees produced by ILS, which gives:

$$\mathrm{Cov}(BC \mid \text{no introgression}) = \mathrm{Cov}(AC \mid \text{no introgression}) = \sigma^2 \left( \frac{1}{3} e^{-(t_2 - t_1)} \right) \tag{4}$$

Again, in these trees the internal branches have an expected length of 1 in units of 2N generations, so this factor is not shown explicitly.

For the expected trait variances, all three species share the same expected variance, which is the total height of all the gene trees weighted by their probabilities. These are:

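$$\mathrm{Var}(A \mid \text{no introgression}) = \mathrm{Var}(B \mid \text{no introgression}) = \mathrm{Var}(C \mid \text{no introgression}) = \sigma^2 \left[ \left(1 - e^{-(t_2 - t_1)}\right)(t_2 + 1) + e^{-(t_2 - t_1)} \left(t_2 + 1 + \tfrac{1}{3}\right) \right] \tag{5}$$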

where the first term in the square brackets is the contribution from the lineage sorting tree, and the second term is the contribution from the three ILS trees.
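A minimal Python sketch of Eqs 3–5 follows. It assumes the parameterization used in the text (times in units of 2N generations, 0 < t1 < t2) and uses the algebraically equivalent form τ/(1 − e^{−τ}) for the internal-branch factor; the function names are illustrative only.

```python
import math


def sorted_branch(tau: float) -> float:
    """Expected internal branch of the lineage-sorting gene tree (Eq 3).

    e^{t2}(t2 - t1) / (e^{t2} - e^{t1}) rewritten as tau / (1 - e^{-tau}), tau = t2 - t1.
    """
    return tau / (1.0 - math.exp(-tau))


def ils_only_T(t1: float, t2: float, sigma2: float = 1.0) -> list[list[float]]:
    """sigma^2 * T for ((A,B),C) with ILS but no introgression (Eqs 3-5), order A, B, C."""
    tau = t2 - t1
    p_ils = math.exp(-tau)  # probability that A and B fail to coalesce before t2
    # Eq 3: lineage-sorting tree plus the concordant ILS tree (internal branch 1)
    cov_ab = (1 - p_ils) * sorted_branch(tau) + p_ils / 3
    # Eq 4: one discordant ILS tree for each non-sister pair, internal branch 1
    cov_bc = cov_ac = p_ils / 3
    # Eq 5: tree heights t2 + 1 (lineage sorting) and t2 + 1 + 1/3 (ILS)
    var = (1 - p_ils) * (t2 + 1) + p_ils * (t2 + 1 + 1 / 3)
    T = [[var, cov_ab, cov_ac],
         [cov_ab, var, cov_bc],
         [cov_ac, cov_bc, var]]
    return [[sigma2 * x for x in row] for row in T]


T = ils_only_T(t1=1.0, t2=2.0)
print(round(T[0][1], 4), round(T[1][2], 4), round(T[0][2], 4))  # 1.1226 0.1226 0.1226
```

Because Cov(BC) and Cov(AC) are equal here, ILS alone produces no asymmetry between the two non-sister pairs, which is the property the Q3 statistic relies on.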

Modelling the effects of introgression and ILS on quantitative trait variances and covariances

Now, we extend these expressions for species variances and covariances to include both ILS and introgression. We envision an instantaneous introgression event between species B and C (Fig 1), which occurs at time tm. This event can be in either direction, with the probabilities of a locus following a history of C → B introgression or B → C introgression represented using δ2 or δ3, respectively. To capture the processes of ILS and introgression simultaneously, we imagine that each possible history at an individual locus can be represented by a “parent tree” within which lineage sorting or ILS occurs according to the multispecies coalescent process [10,35–37]. This is sometimes referred to as the multispecies network coalescent [38,39]. For our model, we consider three parent trees (see S3 Fig): one with no introgression, which occurs with probability 1 – (δ2 + δ3), and two parent trees for the two possible directions of introgression, which occur with probabilities of either δ2 or δ3 (these probabilities represent the "rate" of introgression in our model). Each of these three parent trees can generate four possible gene trees with three possible topologies (Fig 1, arrow 1), which vary in the frequency of topologies and expected branch lengths depending on each parent tree’s parameters (as in the model of ILS-only described in the previous section).

To obtain expressions for the expected variances and covariances under this model, we must sum the contributions of all gene trees within each parent tree, and then sum the contribution of each parent tree (Fig 1, arrow 2). For the covariance between A and B, this gives:

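$$\mathrm{Cov}(AB \mid \text{ILS and introgression}) = \sigma^2 \Big[ \big(1 - (\delta_2 + \delta_3)\big) \Big[ \left(1 - e^{-(t_2 - t_1)}\right) \frac{e^{t_2}(t_2 - t_1)}{e^{t_2} - e^{t_1}} + \frac{1}{3} e^{-(t_2 - t_1)} \Big] + \delta_2 \left( \frac{1}{3} e^{-(t_2 - t_m)} \right) + \delta_3 \left( \frac{1}{3} e^{-(t_1 - t_m)} \right) \Big] \tag{6}$$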

Note that the term inside the inner square brackets in Eq 6 is the same as in Eq 3, but is now weighted by the probability of a history with no introgression. There are also two new terms, giving the contributions of trees generated by ILS within the two parent trees that follow a history of introgression (because ILS occurs regardless of the history at a locus). For a complete derivation, including the expectations of each gene tree within each parent tree, see S1 Text.

For the covariance between B and C, we have:

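$$
\begin{aligned}
\mathrm{Cov}(BC \mid \text{ILS and introgression}) = \sigma^2 \Big[ & \big(1 - (\delta_2 + \delta_3)\big) \left[ \frac{1}{3} e^{-(t_2 - t_1)} \right] \\
& + \delta_2 \left[ \left(1 - e^{-(t_2 - t_m)}\right) \frac{e^{t_2}(t_2 - t_m)}{e^{t_2} - e^{t_m}} + \frac{1}{3} e^{-(t_2 - t_m)} \right] \\
& + \delta_3 \left[ \left(1 - e^{-(t_1 - t_m)}\right) \frac{e^{t_1}(t_1 - t_m)}{e^{t_1} - e^{t_m}} + \frac{1}{3} e^{-(t_1 - t_m)} \right] \Big]
\end{aligned} \tag{7}
$$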

Introgression occurs between B and C in our model, so B and C are sister in the parent trees that represent the two directions of introgression (see S1 Text, S3 Fig). This means that these parent trees can each produce two gene trees with BC as sister species: one from lineage sorting and one from ILS. The contributions of these two gene trees in each parent tree are captured in the last two terms of Eq 7. The first term corresponds to the contribution of ILS from the parent tree without introgression, i.e. Eq 4 weighted by the probability of no introgression.

For the covariance between A and C, we have:

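$$\mathrm{Cov}(AC \mid \text{ILS and introgression}) = \sigma^2 \left[ \big(1 - (\delta_2 + \delta_3)\big) \left( \frac{1}{3} e^{-(t_2 - t_1)} \right) + \delta_2 \left( \frac{1}{3} e^{-(t_2 - t_m)} \right) + \delta_3 \left( \frac{1}{3} e^{-(t_1 - t_m)} \right) \right] \tag{8}$$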

Since gene trees where A and C are sister can only be produced by ILS in our model, Eq 8 is simply the sum of the gene trees with this topology produced by each of the three parent trees.

Lastly, we consider the expected trait variance with introgression. As with the covariances, we sum the total contribution of each gene tree within a parent tree, and then sum these contributions across each parent tree. All three share the same gene tree heights and therefore have the same expected variances. This gives:

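$$
\begin{aligned}
\mathrm{Var}(A) = \mathrm{Var}(B) = \mathrm{Var}(C) = \sigma^2 \Big[ & \big(1 - (\delta_2 + \delta_3)\big) \left[ \left(1 - e^{-(t_2 - t_1)}\right)(t_2 + 1) + e^{-(t_2 - t_1)} \left(t_2 + 1 + \tfrac{1}{3}\right) \right] \\
& + \delta_2 \left[ \left(1 - e^{-(t_2 - t_m)}\right)(t_2 + 1) + e^{-(t_2 - t_m)} \left(t_2 + 1 + \tfrac{1}{3}\right) \right] \\
& + \delta_3 \left[ \left(1 - e^{-(t_1 - t_m)}\right)(t_1 + 1) + e^{-(t_1 - t_m)} \left(t_1 + 1 + \tfrac{1}{3}\right) \right] \Big]
\end{aligned} \tag{9}
$$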

The first term represents the contribution of the parent tree with no introgression, the same as in Eq 5. The last two terms represent the contributions to the total variance from C → B and B → C introgression, respectively. When T is updated to include all these expectations, it becomes possible to model character states under Brownian motion while accounting for both ILS and introgression.
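The same bookkeeping extends to Eqs 6–9. The sketch below is an illustration under stated assumptions, not the authors' code: it requires 0 < tm < t1 < t2 (introgression after both speciation events) and δ2 + δ3 ≤ 1, and the function names are hypothetical. Setting δ2 = δ3 = 0 should recover the ILS-only expressions of Eqs 3–5.

```python
import math


def sorted_branch(tau: float) -> float:
    """Expected internal branch of a lineage-sorting tree: tau / (1 - e^{-tau})."""
    return tau / (1.0 - math.exp(-tau))


def tree_height(root: float, p_ils: float) -> float:
    """Expected gene tree height: root + 1 without ILS, root + 1 + 1/3 with ILS."""
    return (1 - p_ils) * (root + 1) + p_ils * (root + 1 + 1 / 3)


def network_T(t1, t2, tm, d2, d3, sigma2=1.0):
    """sigma^2 * T for ((A,B),C) with ILS and B-C introgression at tm (Eqs 6-9).

    d2: probability of a C -> B history (parent tree ((B,C),A) with splits tm and t2)
    d3: probability of a B -> C history (parent tree ((B,C),A) with splits tm and t1)
    """
    if not 0 < tm < t1 < t2:
        raise ValueError("require 0 < tm < t1 < t2")
    if d2 < 0 or d3 < 0 or d2 + d3 > 1:
        raise ValueError("require d2, d3 >= 0 and d2 + d3 <= 1")
    d0 = 1.0 - (d2 + d3)
    # ILS probability in each parent tree (no coalescence along its internal branch)
    p0, p2, p3 = math.exp(-(t2 - t1)), math.exp(-(t2 - tm)), math.exp(-(t1 - tm))

    cov_ab = (d0 * ((1 - p0) * sorted_branch(t2 - t1) + p0 / 3)
              + d2 * p2 / 3 + d3 * p3 / 3)                              # Eq 6
    cov_bc = (d0 * p0 / 3
              + d2 * ((1 - p2) * sorted_branch(t2 - tm) + p2 / 3)
              + d3 * ((1 - p3) * sorted_branch(t1 - tm) + p3 / 3))     # Eq 7
    cov_ac = d0 * p0 / 3 + d2 * p2 / 3 + d3 * p3 / 3                    # Eq 8
    var = (d0 * tree_height(t2, p0) + d2 * tree_height(t2, p2)
           + d3 * tree_height(t1, p3))                                  # Eq 9
    T = [[var, cov_ab, cov_ac],
         [cov_ab, var, cov_bc],
         [cov_ac, cov_bc, var]]
    return [[sigma2 * x for x in row] for row in T]


no_gene_flow = network_T(t1=1.0, t2=2.0, tm=0.5, d2=0.0, d3=0.0)
with_gene_flow = network_T(t1=1.0, t2=2.0, tm=0.5, d2=0.1, d3=0.05)
print(no_gene_flow[1][2] == no_gene_flow[0][2])     # True: symmetric under ILS only
print(with_gene_flow[1][2] > with_gene_flow[0][2])  # True: Cov(B,C) > Cov(A,C)
```

Any positive δ2 or δ3 adds the lineage-sorting terms of Eq 7 to Cov(BC) but not to Cov(AC), which is the asymmetry tested for below.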

Testing for the effect of introgression on quantitative traits

To evaluate whether patterns of quantitative trait variation are consistent with a history of introgression, we use a simple test statistic that employs the same logic as the D3 test for introgression [40]; see also the f3 statistic [41]. Imagine that species A, B, and C have values q1, q2, and q3 for a hypothetical quantitative trait, respectively. Given the species tree ((A,B),C), and assuming the Brownian motion model of trait evolution described in the previous sections, the expected distance between trait values q2 and q3 should be equal to the expected distance between q1 and q3. This is because species C is equidistant to species A and B in the phylogeny, and this tree determines quantitative trait variances and covariances. The same relationship between distances is expected when considering the ILS-only model, because of symmetries in expected gene tree frequencies and branch lengths, and therefore in trait covariances (see Eq 4).

However, introgression can introduce additional covariance between one pair of species, resulting in that pair having more similar trait values than the other non-sister pair (see Eqs 7 and 8). This naturally leads to the following test statistic:

$$Q_3 = \frac{|q_2 - q_3| - |q_1 - q_3|}{|q_2 - q_3| + |q_1 - q_3|} \tag{10}$$

The numerator of Q3 takes the difference in trait distances between the two pairs of non-sister species; when there is no introgression, this numerator, and therefore Q3, has an expected value of 0. When a significant non-zero value of Q3 is observed, the statistic is consistent with a history of introgression. In addition, the sign of the statistic can tell us which species were involved in introgression (but not the direction of introgression). For example, a negative Q3 value would be consistent with introgression between species B and C, since that would result in q2 and q3 having more similar values (and therefore a smaller distance between them). The denominator of Q3 is the sum of the two trait distances, which normalizes the statistic to lie between −1 and 1, allowing it to be compared across traits with different mean values. We imagine that this statistic will be applied to many individual quantitative traits, each providing a separate value of Q3. The significance for a dataset consisting of many traits can then be evaluated either by testing for a mean value of Q3 significantly different from 0, or by using a sign test with the null expectation that positive and negative Q3 values should be equally frequent (see the analyses below for more details).
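Eq 10 and the two per-dataset summaries described above (the mean of Q3 across traits, and the counts of negative versus positive values) can be sketched as follows. Treating a trait with zero total distance as undefined, and leaving exact zeros out of the sign counts, are our own assumptions; the function names are hypothetical.

```python
import math
from statistics import fmean


def q3_stat(qa: float, qb: float, qc: float) -> float:
    """Eq 10 for species tree ((A,B),C): (|qB - qC| - |qA - qC|) / (|qB - qC| + |qA - qC|).

    Lies in [-1, 1]; negative values mean B and C are closer than A and C.
    Returns nan when both distances are zero (the ratio is undefined).
    """
    d_bc = abs(qb - qc)
    d_ac = abs(qa - qc)
    total = d_bc + d_ac
    if total == 0:
        return math.nan
    return (d_bc - d_ac) / total


def summarize_q3(traits):
    """traits: iterable of (qA, qB, qC) tuples, one per trait (e.g. one per gene).

    Returns (mean Q3, number of negative values, number of positive values).
    """
    values = [v for v in (q3_stat(*t) for t in traits) if not math.isnan(v)]
    n_neg = sum(v < 0 for v in values)
    n_pos = sum(v > 0 for v in values)
    return fmean(values), n_neg, n_pos


toy = [(1.0, 2.0, 2.5), (0.0, 2.0, 0.5), (3.0, 1.0, 1.5)]  # hypothetical trait values
mean_q3, n_neg, n_pos = summarize_q3(toy)
print(round(mean_q3, 4), n_neg, n_pos)  # -0.1667 2 1
```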

To confirm the effects of introgression predicted by the model, and the ability of Q3 to detect it, we performed a power analysis. First, to illustrate the conceptual basis for Q3, we contrasted two conditions: an ILS-only condition and an ILS + introgression condition (Fig 2). Both scenarios use the three-taxon tree described in previous sections, simulating quantitative traits as the sum of contributions of many genes (and therefore gene trees; see Materials and Methods). For 20,000 independent simulated traits we calculated the mean and standard error of the difference in trait value at the tips of the tree between each pair of species (Fig 2). As predicted by our model, the taxa involved in introgression had a higher covariance and more similar trait values than the non-introgressing pair of taxa when averaging across the 20,000 traits (Fig 2).
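The contrast in Fig 2 can be illustrated by drawing traits directly from a multivariate normal with covariance σ²T (step 3 of Fig 1), rather than summing per-gene contributions as in the simulations described here. The covariance matrix below is a hypothetical example with Cov(B,C) > Cov(A,C), not one computed from the paper's parameters, and the Cholesky-based draw uses only the Python standard library.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L @ L^T == S, for a small positive-definite matrix."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def draw_traits(S, n_traits, seed=1):
    """Draw n_traits independent (qA, qB, qC) vectors from a multivariate normal N(0, S)."""
    rng = random.Random(seed)
    L = cholesky(S)
    n = len(S)
    draws = []
    for _ in range(n_traits):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(tuple(sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)))
    return draws


# Hypothetical sigma^2 * T (order A, B, C) with extra B-C covariance, as introgression would add.
S = [[3.0, 1.1, 0.1],
     [1.1, 3.0, 0.6],
     [0.1, 0.6, 3.0]]
traits = draw_traits(S, 20_000)


def mean_abs_diff(i, j):
    return sum(abs(t[i] - t[j]) for t in traits) / len(traits)


ab, bc, ac = mean_abs_diff(0, 1), mean_abs_diff(1, 2), mean_abs_diff(0, 2)
q3_values = [(abs(b - c) - abs(a - c)) / (abs(b - c) + abs(a - c)) for a, b, c in traits]
mean_q3 = sum(q3_values) / len(q3_values)
print(f"|A-B| {ab:.3f}  |B-C| {bc:.3f}  |A-C| {ac:.3f}  mean Q3 {mean_q3:.4f}")
```

With these values the sister pair A and B should be closest on average, and B and C should be closer than A and C, giving a negative mean Q3.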

Fig 2. Quantifying the effect of introgression on quantitative trait variation.


For ILS-only (top row) and ILS + introgression (bottom row) conditions, we show the expected variance/covariance matrix (middle-left column, variances not shown for clarity) and the average difference in quantitative trait values between each pair of species across 20,000 simulated traits (middle-right column). The expectations for the Q3 statistic are also shown (far-right column).

Second, we performed a power analysis across 90 different parameter combinations: three values each of the timing of introgression, the level of ILS, and the number of genes, and four values of the rate of introgression. We simulated 100 datasets for each set of parameters and asked how often Q3 was significantly different from 0 in the direction predicted by introgression. We found the most important parameter to be the rate of introgression: at a rate of 1% (i.e. 1% of the genome has been introgressed), power was consistently low (1-6%) regardless of other simulation parameters (Figs 3 and S1). At higher rates of introgression, power was increased when introgression was more recent relative to speciation, when the level of ILS was lower, and when more genes (traits) were considered. When 5,000 genes were used, power reached 67% under the best-case scenario (S1 Fig); this increased to 97% with 15,000 genes (Fig 3). Simulations under a no-introgression scenario yielded false positive rates of less than 5% across all conditions (S2 Fig).

Fig 3. Power analysis of the ability of Q3 to detect a signature of introgression from 15,000 simulated genes (σ² = 1).


Each cell reports the proportion of 100 simulated datasets where Q3 was significantly different from 0 in the direction expected from the simulated history of introgression. Within each matrix, the x-axis is the time of introgression relative to speciation (larger values mean relatively more recent introgression), and the y-axis is the rate of introgression. There is one matrix for each of three times between speciation events, which determine the levels of ILS (decreasing from left to right, as the times increase). The greatest power comes in scenarios with little ILS, high rates of introgression, and recent introgression events.

Gene expression variation is consistent with inferred histories of introgression in Solanum

We used previously generated introgression and gene expression datasets from the wild tomato clade, Solanum section Lycopersicon, to empirically evaluate the effects of introgression on thousands of expression traits. This clade includes the domesticated tomato, S. lycopersicum, and its 12 wild relatives, which have all originated in the last 2.5 million years. The first dataset is a phylogenetic analysis of 29 accessions (i.e. populations) across these 13 tomato species and two outgroups [22]. This dataset includes an introgression analysis based on D-statistics [42,43] across all possible quartets, which provides a comprehensive overview of patterns of introgression in the clade. The second dataset is normalized quantitative expression of 14,556 genes expressed in ovules from six accessions across five tomato species. This includes published data for five accessions across four species [21], while data from the other two species are previously unpublished. Expression levels for each gene are represented as reads per kilobase of transcript, per million mapped reads (RPKM). Samples were collected on the day of flower opening for 1-4 individuals of each species grown in a common greenhouse [21].

Combining these two datasets, we sought to identify triplets of species with both evidence of introgression (from sequence data) and available gene expression data, so that we could apply the Q3 statistic. Additionally, we wanted these triplets to vary in the magnitude of introgression, so that the magnitude of the effect of introgression on trait variation could be evaluated in addition to the presence or absence of an effect. With these considerations in mind, we identified two triplets. The first consists of the accessions LA3475 (S. lycopersicum), LA1589 (S. pimpinellifolium), and LA0716 (S. pennellii), with LA3475 and LA1589 as sister taxa, and evidence of introgression between LA1589 and LA0716 (D = 0.057, P = 0.0015; values from [22]) (Fig 4A). Using the Dp statistic [23] on site pattern counts from [22], we obtained a value of 0.0013, corresponding to a genomic rate of introgression of 0.13%. We hereafter refer to this triplet as the “low” triplet because of the relatively low observed rate of introgression. The other triplet consists of LA3778 (S. pennellii), LA1777 (S. habrochaites), and LA1316 (S. chmielewskii), with LA3778 and LA1777 as sister taxa, and significant introgression between LA1777 and LA1316 (D = 0.135, P = 2.34 × 10⁻³⁵; values from [22]) (Fig 4A). We obtained a Dp value of 0.0744 for this triplet, corresponding to a rate of introgression of 7.44%; this value is likely an underestimate, as Dp tends to underestimate the true value at higher rates of introgression [23]. As the rate of introgression is much higher for this triplet, we refer to it as the “high” triplet.
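For reference, both introgression summaries can be computed from counts of the three informative site patterns for a triplet plus outgroup. The sketch assumes the usual definitions D = (ABBA − BABA)/(ABBA + BABA) and Dp = (ABBA − BABA)/(BBAA + ABBA + BABA), which we understand to be the form given in [23]; readers should confirm against that reference. The site-pattern counts are hypothetical, not those from [22].

```python
def d_statistics(bbaa: int, abba: int, baba: int) -> tuple[float, float]:
    """Patterson's D and the Dp statistic from site-pattern counts.

    Site patterns are written for (P1, P2, P3, outgroup):
      D  = (ABBA - BABA) / (ABBA + BABA)
      Dp = (ABBA - BABA) / (BBAA + ABBA + BABA)   # assumed definition, see [23]
    """
    numerator = abba - baba
    return numerator / (abba + baba), numerator / (bbaa + abba + baba)


# Hypothetical counts, not the values from [22].
d, dp = d_statistics(bbaa=8000, abba=1200, baba=1000)
print(round(d, 4), round(dp, 4))  # 0.0909 0.0196
```

Because Dp divides by all informative sites rather than only the two discordant patterns, it is smaller than D and sits on the scale of a proportion of the genome, which is how the values above are read as rates of introgression.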

Fig 4. Ovule gene expression variation in tomatoes is consistent with inferred histories of introgression.


A) Histories of speciation and introgression for our chosen triplets in Solanum. B) Mean and standard error of Q3 across all genes in each triplet. C) Difference in the number of genes with a negative vs. positive Q3 value for both triplets. Density plots show the distribution of this difference across 10,000 bootstrapped datasets. Observed values for the two triplets relative to the bootstrap distributions are shown with arrows.

We used expression values from 14,556 genes available in both the low and high triplets. For each gene we calculated a separate value of Q3, then averaged across genes to obtain a mean value for each triplet. We obtained transcriptome-wide mean Q3 values of -0.012 and -0.019 for the low and high triplet, respectively (Fig 4B). The values we observe are consistent with the histories of introgression inferred from the sequence data in both sign and magnitude. Both triplets have negative values, which is consistent with introgression between S. pimpinellifolium and S. pennellii in the low triplet, and between S. habrochaites and S. chmielewskii in the high triplet (see Fig 4A for the accessions assigned as q1, q2, and q3 in each triplet). The Q3 value is also more negative in the high triplet, which is consistent with the higher level of introgression inferred from sequence data.

The signal of introgression from quantitative traits was also statistically significant in both triplets, using either method for assessing significance. Using a bootstrapping approach to ask whether the mean values were different from 0 (see Materials and Methods), we obtained P = 0.0012 and P < 0.0001 for the low and high triplets, respectively (Fig 4B). We obtained similar results when testing for a significant excess of either positive or negative Q3 values (i.e. a sign test) at individual genes using bootstrapping (Fig 4C; see Materials and Methods). For the low triplet, we observed 7,432 genes with negative and 7,124 with positive Q3 values (P = 0.0134); for the high triplet, 7,533 negative and 7,020 positive (P < 0.0001). Again, the larger excess of negative Q3 values in the high triplet is consistent with a higher amount of introgression.
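The exact resampling procedure is given in Materials and Methods. As a stand-in, the sketch below resamples per-gene Q3 values with replacement and reports, for each summary, the fraction of replicates that fail to show the predicted negative signal. This one-sided scheme and the function name are our assumptions, not necessarily the authors' procedure; in pure Python, 10,000 replicates over roughly 14,500 genes will take a while, so the example uses a small toy dataset.

```python
import random
from statistics import fmean


def bootstrap_q3(values, n_boot=10_000, seed=1):
    """Resample per-gene Q3 values with replacement (an assumed, simple scheme).

    Returns two one-sided p-values: the fraction of bootstrap replicates whose
    mean Q3 is >= 0, and the fraction whose (negatives - positives) count is <= 0,
    i.e. replicates that fail to show the negative signal expected from B-C introgression.
    """
    rng = random.Random(seed)
    n = len(values)
    mean_fail = count_fail = 0
    for _ in range(n_boot):
        sample = rng.choices(values, k=n)
        if fmean(sample) >= 0:
            mean_fail += 1
        excess_negative = sum(v < 0 for v in sample) - sum(v > 0 for v in sample)
        if excess_negative <= 0:
            count_fail += 1
    return mean_fail / n_boot, count_fail / n_boot


# Hypothetical per-gene Q3 values: 60 negative, 40 positive.
toy_values = [-0.3] * 60 + [0.2] * 40
p_mean, p_sign = bootstrap_q3(toy_values, n_boot=2_000)
print(p_mean, p_sign)
```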

Gene-level analysis of expression data

The expression level of genes can be affected by either cis-acting or trans-acting variants. Because cis-acting variants are most often found near the gene they affect [44,45], we might expect these regulatory elements to share the same local gene tree topology as the nearby genic protein-coding region; any signature of introgression would likely be reflected in both regions. While recombination either before or after introgression will uncouple the tree topology in the regulatory region from that in the coding region, we might expect to see an association between patterns of similarity in expression levels and patterns of gene tree discordance if cis-acting variants are common.

To test for such a relationship, we looked for an association between coding-region tree topologies and expression similarity among species in both triplets. Using trees estimated from each protein-coding gene [22], we identified 11,061 genes for which both the tree topology and expression values from all species were available. For each gene, we obtained the rooted tree topology for the relevant triplet and also determined which pair of species was most similar in expression value. We assume that expression similarity reflects the local topology at whichever locus has the largest effect on expression, such that the most similar pair of species represents the sister species in this topology.

In the low triplet, we found no significant relationship (P = 0.776, χ² test of independence) between protein-coding gene tree topology and expression similarity (Fig 5A). In the high triplet, however, we did observe a significant relationship (P = 0.019, Fig 5B). For gene trees with a topology consistent with introgression (where S. habrochaites and S. chmielewskii are sister), there were significantly more genes where expression was also most similar between these species than expected by chance (476 observed vs. 449 expected). In other words, we found that gene expression similarity is correlated with the tree topology of protein-coding genes in the high triplet, in a fashion consistent with cis-acting effects of introgressed variation on expression.
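The χ² test of independence used for Fig 5 can be sketched for a 3×3 table of coding-sequence topology (rows) by most-similar expression pair (columns). With (3 − 1)(3 − 1) = 4 degrees of freedom, the χ² survival function has the closed form e^{−x/2}(1 + x/2), so no statistics library is needed. The function name is hypothetical and the table in the example is invented, not the counts in S1 and S2 Tables.

```python
import math


def chi2_independence_3x3(table):
    """Pearson chi-square test of independence for a 3x3 table of counts.

    Expected counts come from row and column totals. With (3-1)*(3-1) = 4 degrees
    of freedom, the chi-square survival function is exp(-x/2) * (1 + x/2).
    Returns (statistic, p-value, expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(3) for j in range(3))
    p_value = math.exp(-stat / 2) * (1 + stat / 2)
    return stat, p_value, expected


# Hypothetical table: rows = coding-sequence topology, columns = most similar expression pair.
toy_table = [[100, 50, 50],
             [50, 100, 50],
             [50, 50, 100]]
stat, p_value, expected = chi2_independence_3x3(toy_table)
print(round(stat, 6), p_value < 1e-10)  # 75.0 True
```

The expected counts returned here correspond to the "E" values reported in each cell of Fig 5.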

Fig 5.


Relationships between coding sequence tree topology (rows) and gene expression similarity (columns) in the low (A) and high (B) triplets. Note that for expression similarity, we did not explicitly construct trees from expression data; the tree representation is simply meant to depict observed expression distances. Only discordant trees and expression patterns are shown, but χ² P-values (0.776 and 0.019 for panels A and B, respectively) are reported from the full 3×3 table (see S1 and S2 Tables for the full tables). The cases where both the tree topology and pattern of expression are consistent with the inferred history of introgression for that triplet are highlighted in blue. Each cell reports the observed number of genes (O) in each category, and the number expected (E) under the null hypothesis of independence used in the χ² test.

Discussion

Phylogenetic comparative methods provide powerful tools for studying the origins of trait variation among species. However, the rampant gene tree discordance uncovered in many phylogenomic studies paints a more complicated picture of the shared history among species. To date, most models of trait evolution employed by comparative methods have assumed that only the species phylogeny contributes to trait covariance, and have ignored covariance due to discordant gene trees. Our model builds on previous work [9,20] to incorporate both ILS and introgression into a single framework that captures the most common causes of discordance and their effects on quantitative trait evolution. We show that introgression leads to more discordance and stronger patterns of covariance in quantitative traits among non-sister species than ILS alone, paralleling results for binary traits under the same multispecies network coalescent framework [10].

Our model makes several assumptions and simplifications related to expected levels of genetic covariance between species. First, we have modeled post-speciation introgression as a single instantaneous pulse of exchange between one pair of non-sister species. Many other introgression scenarios are possible, such as multiple pulses or continuous periods of gene flow. Although each of these scenarios will increase the variance in gene tree topologies, we expect that they will still leave a detectable signature on quantitative traits because they still lead to gene tree asymmetries. In contrast, other gene flow scenarios, such as introgression between sister taxa or between both pairs of non-sister taxa in a triplet at equal rates, will not result in a detectable signature of gene tree asymmetry. Second, we have assumed that the expected frequencies and coalescence times of loci contributing to trait variation follow neutral expectations. Through a local reduction in Ne, directional selection may reduce the rate of gene tree discordance due to ILS, while increasing the rate of discordance due to introgression [18,46,47]. This increase in the rate of introgression relative to ILS may allow for greater power to detect a signal of introgression in quantitative traits, as we show in our power analysis. This implies that positive selection, especially on introgressed variants [48], will make it more likely for quantitative traits to covary between non-sister taxa.

We also make key assumptions about the model of trait evolution. Our model assumes that traits evolve under a Brownian motion process, rather than alternative processes such as the Ornstein-Uhlenbeck (OU) [49] or early-burst [50,51] models. While it may be uncommon for traits to evolve according to an early-burst model [52], many quantitative characters are likely to be constrained in some way, which can be modeled by the OU process. For gene expression in particular, evidence suggests that over long phylogenetic timescales the OU process is a better fit to the data [53–55]. However, multiple non-biological factors may favor the fit of OU models over Brownian motion, including small amounts of error in measured quantitative traits [56]. While we do not expect the model of trait evolution to affect asymmetries between species in thousands of traits, future work incorporating additional models of trait evolution, and their effect on trait covariances in particular, would be useful.
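To make the contrast between the two processes concrete, the sketch below compares the expected variance of a single trait after time t under Brownian motion, σ²t, with the expected variance under an OU process started from a fixed value, σ²/(2α)·(1 − e^(−2αt)), where α is the strength of the pull toward an optimum. These are standard textbook expressions, not part of the model developed in this paper; the function names and parameter values are illustrative assumptions.

```python
import math


def bm_variance(sigma2: float, t: float) -> float:
    """Expected trait variance after time t under Brownian motion: sigma^2 * t."""
    return sigma2 * t


def ou_variance(sigma2: float, alpha: float, t: float) -> float:
    """Expected variance after time t under an OU process started at a fixed value.

    The variance saturates at sigma^2 / (2 * alpha) instead of growing without bound,
    which is how the OU process represents a constrained trait.
    """
    if alpha == 0:
        return bm_variance(sigma2, t)  # OU reduces to Brownian motion when alpha = 0
    # -expm1(-x) == 1 - exp(-x), computed accurately even when alpha * t is tiny
    return sigma2 / (2 * alpha) * -math.expm1(-2 * alpha * t)


if __name__ == "__main__":
    for t in (0.5, 1.0, 5.0, 50.0):
        print(f"t={t:>5}: BM={bm_variance(1.0, t):7.3f}  OU(alpha=1)={ou_variance(1.0, 1.0, t):7.3f}")
```

With α = 1 and σ² = 1, the OU variance levels off near 0.5 while the Brownian motion variance keeps increasing linearly with time.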

A key assumption of our statistical analysis is that each gene expression trait evolves independently. However, many genes show correlated patterns of expression, either because of locally shared cis-acting elements or because of trans-acting factors that affect the expression of many genes across the genome [44,45]. If, for instance, such a trans-acting factor is introgressed and affects many genes in a similar way, then treating each gene as an independent observation would constitute pseudoreplication of measurements. However, there are two pieces of evidence that suggest pseudoreplication is not a major problem in our analyses. First, previous data from experimental introgression lines between S. lycopersicum and S. pennellii are not consistent with a large role of introgressed loci on background gene expression: Guerrero et al. (2016) [57] found that each introgressed gene had downstream effects on the expression of only 0.4 genes on average. Second, we find here that the number of genes where expression is more similar between introgressing species is higher in the triplet with a higher rate of introgression. This is again consistent with largely local effects of each introgressed locus on gene expression. Based on these observations, we conclude that we likely have many thousands of independent data points testing the relationship between introgression and expression variation, even if the true correlation structure is unknown.

Our power analysis suggests that we should have had low power to detect an effect of introgression in the low triplet, which has a rate of introgression of less than 1%. One explanation for the fact that we do detect an effect is that the introgressed variation in this triplet affects the downstream regulation of a large set of correlated genes, although the evidence discussed in the previous paragraph argues against this possibility. Very recent introgression is also an unlikely explanation, as our power analysis shows that the timing of introgression does not have an effect at low rates. As previously discussed, directional selection on introgressed variation in the low triplet could also improve the power; evaluating this possibility would be an interesting future direction. Finally, we may simply have been fortunate to observe a positive result, even with reduced (but non-zero) power in this area of parameter space. Distinguishing random chance from other processes would also be facilitated by testing additional triplets; unfortunately, we have exhausted the independent triplets possible from our data, having used six of the eight available accessions with gene expression data. Very few multispecies transcriptomic datasets are currently available in systems with widespread introgression, though similar tests may be possible from data in the butterfly genus Heliconius [54,58]. Analyzing or generating such datasets in other systems would help to confirm the generality of our findings.

Our analysis of gene expression is consistent with the idea that introgression between wild tomato species has broadly influenced variation in gene expression among species. An alternative explanation is that species with more similar gene expression may be more likely to introgress, possibly due to reduced negative fitness consequences from hybrid dysregulation. There are again a number of pieces of evidence that argue against the latter interpretation. Guerrero et al. (2016) [57] found no evidence for an association between the magnitude of differential expression between tomato introgression lines and the sterility of hybrids. While those experiments had fewer generations of hybridization than wild introgressed populations, and were conducted in a greenhouse, they do not indicate that general expression levels are a barrier to introgression. Furthermore, here we observe a correlation between expression similarity at specific genes and the tree topology inferred from their protein-coding sequences (Fig 5B). This association suggests a direct causal effect of introgressed genes on their expression: cis-regulatory differences at introgressed loci lead to a relationship between local tree topologies and expression levels (cf. [59]). Such a relationship is highly unlikely to be due instead to a barrier to introgression. The fact that we do not observe the same correlation in the "low" triplet (Fig 5A) could be due either to a comparative lack of statistical power in this triplet, or to more recombination between the protein-coding regions from which the tree topologies were inferred and the cis-regulatory regions driving expression. Introgression will reduce the opportunities for recombination, which could explain why the "high" triplet retains a higher signal. Alternatively, it may be that trans-acting variation is much more common in this triplet, a scenario that would not lead to an association between local gene tree topologies and local gene expression. We cannot definitively distinguish among these possibilities given only the data presented here. Finally, it is possible that some form of experimental or technical artefact could be responsible for asymmetries in many traits, though we note that the sister species in both triplets examined here always show the greatest similarity in gene expression (S1 and S2 Tables). The association we observe between tree topologies and expression similarity at individual genes is also inconsistent with an artefact.

Overall, our results demonstrate both theoretically and empirically that introgression can affect patterns of quantitative trait evolution. While considerable attention and excitement have justifiably been devoted to the power of introgression as an evolutionary force shaping trait variation, this is a double-edged sword, as most phylogenetic comparative methods do not account for gene tree discordance. Previous work has shown that discordance due solely to ILS can lead to overestimates of the rate of quantitative trait evolution and to underestimates of phylogenetic signal [20]. The effects of introgression in misleading our inferences will be worse, as it both increases overall discordance and generates asymmetries in trait sharing. Future phylogenetic comparative approaches should strive to evaluate the contributions of both ILS and introgression to trait evolution, allowing for more accurate evolutionary inferences. Doing so will pave the way for more powerful inferences about the evolutionary forces that shape trait variation among species.

Materials and methods

Description of datasets

We use ovule gene expression data that is described in Moyle et al. [21]. The dataset consists of normalized quantitative expression for 14,556 genes measured in six accessions across five species (two different accessions of S. pennellii were used in the two triplets). For each accession, samples were collected on the day of flower opening for 1–4 biological replicates (individual plants) grown in a common greenhouse. When applicable, we took the average expression value across replicates within each accession for our analyses. Raw sequencing reads for this dataset are available on the SRA under BioProject PRJNA714065. Normalized expression data for each replicate, along with the scripts for all analyses, are available from https://github.com/mhibbins/intro_quant_traits.

We use phylogenomic data that is described in Pease et al. [22]. The dataset consists of transcriptomes from 29 accessions across 13 species, including the six accessions used in our analyses. Pease et al. [22] used MVFtools to estimate transcriptome-wide D-statistics for all possible rooted triplets (2925 total values) across the 27 ingroup accessions. From this dataset we selected the two triplets to use in our analyses. Pease et al. [22] also inferred gene trees for each individual protein-coding region (19,116 genes total) using RAxML [60]; we used this data in our gene-level analyses. Both datasets are published in the Dryad repository https://doi.org/10.5061/dryad.182dv.
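For readers unfamiliar with the D-statistic, the sketch below shows the basic ABBA-BABA calculation from a four-taxon alignment (three ingroup taxa P1, P2 and P3 plus an outgroup): D = (ABBA − BABA)/(ABBA + BABA). It is a simplified illustration, not the MVFtools implementation used by Pease et al. [22]; the function names and toy sequences are hypothetical. It also confirms that 2925 equals the number of ways to choose three of the 27 ingroup accessions, C(27, 3).

```python
import math
from typing import Tuple


def count_abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int]:
    """Count ABBA and BABA site patterns in four aligned sequences (P1, P2, P3, outgroup)."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        site = {a, b, c, o}
        if not site <= set("ACGT") or len(site) != 2:
            continue  # skip gaps/ambiguous bases and sites that are not biallelic
        if a == o and b == c != o:
            abba += 1  # P2 and P3 share the non-outgroup allele
        elif b == o and a == c != o:
            baba += 1  # P1 and P3 share the non-outgroup allele
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); returns 0 when there are no informative sites."""
    total = abba + baba
    return (abba - baba) / total if total else 0.0


if __name__ == "__main__":
    abba, baba = count_abba_baba("AGAC", "GAGC", "GGGC", "AAAC")
    print(abba, baba, d_statistic(abba, baba))
    print(math.comb(27, 3))  # 2925 three-accession combinations
```

A positive D indicates an excess of sites where P2 and P3 share the non-outgroup allele, the asymmetry expected from introgression between P2 and P3.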

Simulation of quantitative traits & power analyses

We simulated the effect of introgression on quantitative trait values (as shown in Fig 2) under two models: an ILS-only model and a model with ILS and introgression. For the ILS-only model, we set the time of speciation between A and B to 1 and the time at which C split from the ancestor of A and B to 1.3 (all in units of 2N generations). The introgression model used the same speciation times, with the addition of an introgression event from C into B at time 0.5, with δ2 = 0.1. From these parameters, we used our model to construct the expected variance/covariance matrices with σ² = 1 using a custom R function (script available at https://github.com/mhibbins/intro_quant_traits/blob/main/scripts/bm_model_sims.R). We then simulated trait values by drawing from a multivariate normal distribution, using the R function mvrnorm with means of 0 and the constructed matrices.
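The drawing step can be sketched in plain Python: each trait's values for species A, B and C are drawn from a multivariate normal distribution with mean zero and a given covariance matrix, here via a Cholesky factorization (playing the role of R's mvrnorm). The covariance matrix below is a hypothetical placeholder chosen only to be positive definite, with B covarying more with C than A does; constructing the expected matrix from speciation times, introgression time and rate follows the derivation in S1 Text and is not reproduced here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical expected covariance matrix for species A, B, C (sigma^2 = 1).
# NOT the values produced by the model in S1 Text; illustration only.
EXAMPLE_COV: Matrix = [
    [1.0, 0.3, 0.05],
    [0.3, 1.0, 0.1],
    [0.05, 0.1, 1.0],
]


def cholesky(cov: Matrix) -> Matrix:
    """Lower-triangular L with L @ L^T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def simulate_traits(cov: Matrix, n_traits: int, rng: random.Random) -> List[List[float]]:
    """Draw n_traits independent trait vectors (one value per species) from MVN(0, cov)."""
    L = cholesky(cov)
    n = len(cov)
    draws = []
    for _ in range(n_traits):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent standard normals
        draws.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return draws


def sample_cov(draws: List[List[float]], i: int, j: int) -> float:
    """Sample covariance between species i and j across simulated traits."""
    n = len(draws)
    mi = sum(d[i] for d in draws) / n
    mj = sum(d[j] for d in draws) / n
    return sum((d[i] - mi) * (d[j] - mj) for d in draws) / (n - 1)


if __name__ == "__main__":
    traits = simulate_traits(EXAMPLE_COV, 20000, random.Random(42))
    print("cov(A,B) =", round(sample_cov(traits, 0, 1), 3))
    print("cov(A,C) =", round(sample_cov(traits, 0, 2), 3))
    print("cov(B,C) =", round(sample_cov(traits, 1, 2), 3))
```

With many simulated traits, the sample covariances approach the entries of the input matrix, which is what allows averaging over thousands of traits to reveal asymmetries such as cov(B,C) > cov(A,C).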

We performed a power analysis to assess the statistical power of tests based on Q3. Using the simulation approach described above, we simulated 100 trait datasets under all combinations of the following parameters: 5000, 10,000, and 15,000 for the number of genes; 0.1, 0.5, and 1 for t2 − t1; 0.1, 0.25, and 0.5 for t1 − tm; and 0, 0.01, 0.05, and 0.1 for the rate of introgression. We evaluated significance for each dataset using a one-sample t-test with H0: Q3 = 0. A result was considered a true positive when P < 0.05 and the sign of the mean simulated Q3 value was consistent with the simulated history of introgression.
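The true-positive rule can be sketched as follows, taking per-trait Q3 values as given (their calculation is defined in the main text and is not reproduced here). As a simplifying assumption, the two-sided P-value uses the normal approximation to the t distribution, which is very close to the exact t-test at thousands of traits; an exact routine such as R's t.test or scipy.stats.ttest_1samp would normally be used. The argument expected_sign (+1 or −1) is a hypothetical stand-in for the sign predicted by the simulated introgression history, and the function names are illustrative.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def one_sample_test(q3: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, two-sided P) for H0: mean Q3 = 0 across traits."""
    n = len(q3)
    mean = statistics.fmean(q3)
    sd = statistics.stdev(q3)
    if sd == 0:
        return mean, (1.0 if mean == 0 else 0.0)  # degenerate case: all values identical
    t = mean / (sd / math.sqrt(n))
    # Normal approximation to the t distribution (assumption of this sketch).
    p = math.erfc(abs(t) / math.sqrt(2))
    return mean, p


def is_true_positive(q3: Sequence[float], expected_sign: int, alpha: float = 0.05) -> bool:
    """True when P < alpha and the mean Q3 has the sign predicted by the simulated history."""
    mean, p = one_sample_test(q3)
    return p < alpha and mean != 0 and (mean > 0) == (expected_sign > 0)


def power(replicates: List[Sequence[float]], expected_sign: int, alpha: float = 0.05) -> float:
    """Fraction of simulated datasets (e.g. 100 per parameter combination) scored as true positives."""
    hits = sum(is_true_positive(r, expected_sign, alpha) for r in replicates)
    return hits / len(replicates)
```

Requiring the correct sign, and not just P < 0.05, means that a significant result in the wrong direction counts against power rather than for it.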

Testing quantitative traits for introgression

We calculated average expression values across individual replicates of each accession before estimating Q3 for each gene. To assess the significance of both the mean and the signs of our estimated Q3 values, we used bootstrap resampling. For the mean Q3 value, we tested the null hypothesis of Q3 = 0 by randomly sampling 10,000 datasets of 14,556 genes each, with replacement, from the empirical gene expression dataset and calculating the mean Q3 of each. We assessed the rank i of the observed Q3 values among these resampled datasets, and estimated a two-tailed P-value using the following formula:

$$P = 1 - 2\left|0.5 - \frac{i}{n}\right|$$

where n is the number of resampled datasets (in this case, 10,000). This formula measures the deviation of the observed value from the center of the bootstrapped distribution, which has a rank of 5000. For the sign of individual genes' Q3 values, we tested the null hypothesis that the numbers of negative and positive signs are equal by randomly sampling 10,000 datasets of 14,556 genes each. For each resampled dataset we counted the number of negative and positive Q3 values, ranking the datasets from the one with the greatest excess of negative values to the one with the greatest excess of positive values. The rank of the observed data among these resampled datasets was calculated, and two-tailed P-values were evaluated using the same formula as above.
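The resampling and ranking steps can be sketched in plain Python. The helper names and the use of random.Random.choices for resampling with replacement are assumptions of this sketch rather than the authors' scripts; which value is ranked against the resampled distribution follows the procedure described above. The same bootstrap_stat helper serves both tests: pass mean for the mean Q3 test, or sign_excess (positive minus negative signs) for the sign test.

```python
import random
from typing import Callable, List, Sequence


def two_tailed_p_from_rank(i: int, n: int) -> float:
    """P = 1 - 2 * |0.5 - i/n|: 1 at the center of the resampled distribution, 0 at either extreme."""
    return 1.0 - 2.0 * abs(0.5 - i / n)


def bootstrap_stat(values: Sequence[float], stat: Callable[[List[float]], float],
                   n_boot: int, rng: random.Random) -> List[float]:
    """Resample `values` with replacement n_boot times and apply `stat` to each resample."""
    k = len(values)
    return [stat(rng.choices(values, k=k)) for _ in range(n_boot)]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sign_excess(values: Sequence[float]) -> int:
    """Positive minus negative Q3 signs; sorting by this orders datasets from negative to positive excess."""
    return sum(v > 0 for v in values) - sum(v < 0 for v in values)


def rank_among(value: float, resampled: Sequence[float]) -> int:
    """Rank i of `value`: the number of resampled statistics that fall below it."""
    return sum(r < value for r in resampled)
```

For instance, a rank of 9900 among 10,000 resampled datasets gives P of about 0.02, while a rank of exactly 5000 gives P = 1.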

For the analysis of the relationship between gene-level tree topology and expression similarity, we made use of gene trees inferred using RAxML by ref. [22]. We used the Python package ete3 [61] to prune these gene trees down to the accessions involved in our test triplets. We then obtained the overlapping set of genes for which both topologies and expression data were available, and recorded the expression “topology” based on the minimum pairwise distance in expression values. The counts of gene tree topology and expression topology were placed into a 3x3 contingency table for each triplet, and we tested for a significant association using a χ2 test of independence.
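A minimal sketch of the gene-level test, assuming each gene tree has already been pruned and reduced to its sister pair (the ete3 pruning step is not shown). The expression "topology" is taken here as the pair of species with the smallest absolute difference in replicate-averaged expression, which is one reading of "minimum pairwise distance". Expected counts come from the table margins under independence, and the P-value uses the closed-form survival function of the χ2 distribution with the 4 degrees of freedom of a 3x3 table. All function names are hypothetical; in practice a library routine such as scipy.stats.chi2_contingency would do the same calculation.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]  # sister pair, species names in sorted order


def expression_topology(expr: Dict[str, float]) -> Pair:
    """Species pair with the smallest absolute difference in expression."""
    return min(combinations(sorted(expr), 2), key=lambda p: abs(expr[p[0]] - expr[p[1]]))


def contingency_table(genes: Iterable[Tuple[Pair, Pair]], labels: Sequence[Pair]) -> List[List[int]]:
    """Rows: gene-tree sister pair; columns: expression sister pair."""
    idx = {lab: k for k, lab in enumerate(labels)}
    table = [[0] * len(labels) for _ in labels]
    for tree_pair, expr_pair in genes:
        table[idx[tree_pair]][idx[expr_pair]] += 1
    return table


def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """Expected counts under independence: row total * column total / grand total."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


def chi2_sf_df4(x: float) -> float:
    """Survival function of the chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


def chi2_independence_3x3(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and P-value for a 3x3 table (assumes no empty row or column)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    return stat, chi2_sf_df4(stat)
```

Because the test is computed over the whole 3x3 table, a significant result indicates an overall association between tree topology and expression similarity rather than an excess in any single cell.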

Supporting information

S1 Fig. Power analysis results using (A) 5000 and (B) 10,000 genes.

(EPS)

S2 Fig. Power analysis results under no introgression (i.e. false positive rates).

(EPS)

S3 Fig. Modelling introgression using parent trees.

Parent trees represent the possible histories of speciation and/or introgression at a locus, with their probabilities determined by the rate of introgression. Within each parent tree, gene trees sort according to the multispecies coalescent process. Adapted from [10].

(EPS)

S1 Table. Full 3x3 contingency table for relationship between gene-level expression and local gene tree topology in the low triplet.

(DOCX)

S2 Table. Full 3x3 contingency table for relationship between gene-level expression and local gene tree topology in the high triplet.

(DOCX)

S1 Data. Pairwise distances between simulated trait values for 20,000 genes.

(XLSX)

S2 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, with 15,000 simulated traits.

(TXT)

S3 Data. Normalized (RPKM) gene expression data from Solanum ovules used in our empirical analyses.

(CSV)

S4 Data. Gene tree topology counts for each trio calculated from the Pease et al. [22] transcriptomic dataset.

Used in the gene-level expression analysis.

(CSV)

S5 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, under the no-introgression condition.

(TXT)

S6 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, with 5000 simulated traits.

(TXT)

S7 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, with 10,000 simulated traits.

(TXT)

S1 Text. Derivation of expected trait variances and covariances under the multispecies network coalescent.

(DOCX)

Acknowledgments

We thank Leonie Moyle and Matthew Gibson for sharing data, and for comments on the manuscript.

Data Availability

Raw sequencing reads for the gene expression dataset are available on the SRA under BioProject PRJNA714065. Quantitative expression data, as well as scripts for all analyses, are available from https://github.com/mhibbins/intro_quant_traits.

Funding Statement

This work was supported by a grant (DEB-1936187) from the National Science Foundation (https://www.nsf.gov/) awarded to M.W.H. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Mallet J, Besansky N, Hahn MW. How reticulated are species? BioEssays. 2016;38(2):140–149. doi: 10.1002/bies.201500149
2. Taylor SA, Larson EL. Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nature Ecology and Evolution. 2019;3(2):170–7. doi: 10.1038/s41559-018-0777-y
3. Pardo-Diaz C, Salazar C, Baxter SW, Merot C, Figueiredo-Ready W, Joron M, et al. Adaptive introgression across species boundaries in Heliconius butterflies. PLoS Genetics. 2012;8(6):e1002752. doi: 10.1371/journal.pgen.1002752
4. Zhang W, Dasmahapatra KK, Mallet J, Moreira GR, Kronforst MR. Genome-wide introgression among distantly related Heliconius butterfly species. Genome Biology. 2016;17:25. doi: 10.1186/s13059-016-0889-0
5. Jones MR, Mills LS, Alves PC, Callahan CM, Alves JM, Lafferty DJR, et al. Adaptive introgression underlies polymorphic seasonal camouflage in snowshoe hares. Science. 2018;360(6395):1355–8. doi: 10.1126/science.aar5273
6. Whitney KD, Randell RA, Rieseberg LH. Adaptive introgression of herbivore resistance traits in the weedy sunflower Helianthus annuus. American Naturalist. 2006;167(6):794–807. doi: 10.1086/504606
7. Huerta-Sánchez E, Jin X, Asan, Bianba Z, Peter BM, Vinckenbosch N, et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014;512(7513):194–7. doi: 10.1038/nature13408
8. Gibson MJ, Torres ML, Brandvain Y, Moyle LC. Introgression shapes fruit color convergence in invasive Galapagos tomato. eLife. 2021;10:e64165. doi: 10.7554/eLife.64165
9. Bastide P, Solis-Lemus C, Kriebel R, William Sparks K, Ane C. Phylogenetic comparative methods on phylogenetic networks with reticulations. Systematic Biology. 2018;67(5):800–20. doi: 10.1093/sysbio/syy033
10. Hibbins MS, Gibson MJ, Hahn MW. Determining the probability of hemiplasy in the presence of incomplete lineage sorting and introgression. eLife. 2020;9:e63753. doi: 10.7554/eLife.63753
11. Wang Y, Cao Z, Ogilvie HA, Nakhleh L. Phylogenomic assessment of the role of hybridization and introgression in trait evolution. PLoS Genetics. 2021;17(8):e1009701. doi: 10.1371/journal.pgen.1009701
12. Rifkin SA, Kim J, White KP. Evolution of gene expression in the Drosophila melanogaster subgroup. Nature Genetics. 2003;33(2):138–44. doi: 10.1038/ng1086
13. Brawand D, Soumillon M, Necsulea A, Julien P, Csardi G, Harrigan P, et al. The evolution of gene expression levels in mammalian organs. Nature. 2011;478(7369):343–8. doi: 10.1038/nature10532
14. Davidson RM, Gowda M, Moghe G, Lin H, Vaillancourt B, Shiu SH, et al. Comparative transcriptomics of three Poaceae species reveals patterns of gene expression evolution. The Plant Journal. 2012;71(3):492–502. doi: 10.1111/j.1365-313X.2012.05005.x
15. Brandvain Y, Kenney AM, Flagel L, Coop G, Sweigart AL. Speciation and introgression between Mimulus nasutus and Mimulus guttatus. PLoS Genetics. 2014;10(6):e1004410. doi: 10.1371/journal.pgen.1004410
16. Sankararaman S, Mallick S, Dannemann M, Prufer K, Kelso J, Paabo S, et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014;507(7492):354–7. doi: 10.1038/nature12961
17. Schumer M, Xu C, Powell DL, Durvasula A, Skov L, Holland C, et al. Natural selection interacts with recombination to shape the evolution of hybrid genomes. Science. 2018;360(6389):656–60. doi: 10.1126/science.aar3684
18. Martin SH, Davey JW, Salazar C, Jiggins CD. Recombination rate variation shapes barriers to introgression across butterfly genomes. PLoS Biology. 2019;17(2):e2006288. doi: 10.1371/journal.pbio.2006288
19. Hahn MW, Nakhleh L. Irrational exuberance for resolved species trees. Evolution. 2016;70(1):7–17. doi: 10.1111/evo.12832
20. Mendes FK, Fuentes-Gonzalez JA, Schraiber JG, Hahn MW. A multispecies coalescent model for quantitative traits. eLife. 2018;7:e36482. doi: 10.7554/eLife.36482
21. Moyle LC, Wu M, Gibson MJS. Reproductive proteins evolve faster than non-reproductive proteins among Solanum species. Frontiers in Plant Science. 2021;12:635990. doi: 10.3389/fpls.2021.635990
22. Pease JB, Haak DC, Hahn MW, Moyle LC. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biology. 2016;14(2):e1002379. doi: 10.1371/journal.pbio.1002379
23. Hamlin JAP, Hibbins MS, Moyle LC. Assessing biological factors affecting postspeciation introgression. Evolution Letters. 2020;4(2):137–54. doi: 10.1002/evl3.159
24. Felsenstein J. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics. 1973;25(5):471–92.
25. Pollard DA, Iyer VN, Moses AM, Eisen MB. Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genetics. 2006;2(10):e173. doi: 10.1371/journal.pgen.0020173
26. White MA, Ane C, Dewey CN, Larget BR, Payseur BA. Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genetics. 2009;5(11):e1000729. doi: 10.1371/journal.pgen.1000729
27. Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Research. 2011;21(3):349–56. doi: 10.1101/gr.114751.110
28. Fontaine MC, Pease JB, Steele A, Waterhouse RM, Neafsey DE, Sharakhov IV, et al. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science. 2015;347(6217):1258524. doi: 10.1126/science.1258524
29. Wu M, Kostyun JL, Hahn MW, Moyle LC. Dissecting the basis of novel trait evolution in a radiation with widespread phylogenetic discordance. Molecular Ecology. 2018;27(16):3301–3316.
30. Vanderpool D, Minh BQ, Lanfear R, Hughes D, Murali S, Harris RA, et al. Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression. PLoS Biology. 2020;18(12):e3000954. doi: 10.1371/journal.pbio.3000954
31. Hime PM, Lemmon AR, Lemmon ECM, Prendini E, Brown JM, Thomson RC, et al. Phylogenomics reveals ancient gene tree discordance in the Amphibian tree of life. Systematic Biology. 2021;70(1):49–66. doi: 10.1093/sysbio/syaa034
32. Degnan JH, Rosenberg NA. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology and Evolution. 2009;24(6):332–40. doi: 10.1016/j.tree.2009.01.009
33. Hudson RR. Testing the constant-rate neutral allele model with protein sequence data. Evolution. 1983;37(1):203–17. doi: 10.1111/j.1558-5646.1983.tb05528.x
34. Pamilo P, Nei M. Relationships between gene trees and species trees. Molecular Biology and Evolution. 1988;5(5):568–83. doi: 10.1093/oxfordjournals.molbev.a040517
35. Meng C, Kubatko LS. Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. Theoretical Population Biology. 2009;75(1):35–45. doi: 10.1016/j.tpb.2008.10.004
36. Liu KJ, Dai J, Truong K, Song Y, Kohn MH, Nakhleh L. An HMM-based comparative genomic framework for detecting introgression in eukaryotes. PLoS Computational Biology. 2014;10(6):e1003649. doi: 10.1371/journal.pcbi.1003649
37. Hibbins MS, Hahn MW. The timing and direction of introgression under the multispecies network coalescent. Genetics. 2019;211(3):1059–73. doi: 10.1534/genetics.118.301831
38. Wen D, Yu Y, Nakhleh L. Bayesian inference of reticulate phylogenies under the multispecies network coalescent. PLoS Genetics. 2016;12(5):e1006006. doi: 10.1371/journal.pgen.1006006
39. Degnan JH. Modeling hybridization under the network multispecies coalescent. Systematic Biology. 2018;67(5):786–99. doi: 10.1093/sysbio/syy040
40. Hahn MW, Hibbins MS. A three-sample test for introgression. Molecular Biology and Evolution. 2019;36(12):2878–82. doi: 10.1093/molbev/msz178
41. Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009;461(7263):489–94. doi: 10.1038/nature08365
42. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, et al. A draft sequence of the Neandertal genome. Science. 2010;328(5979):710–22. doi: 10.1126/science.1188021
43. Durand EY, Patterson N, Reich D, Slatkin M. Testing for ancient admixture between closely related populations. Molecular Biology and Evolution. 2011;28(8):2239–52. doi: 10.1093/molbev/msr048
44. Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, et al. The evolution of transcriptional regulation in eukaryotes. Molecular Biology and Evolution. 2003;20(9):1377–419. doi: 10.1093/molbev/msg140
45. Hill MS, Vande Zande P, Wittkopp PJ. Molecular and evolutionary processes generating variation in gene expression. Nature Reviews Genetics. 2021;22:203–215. doi: 10.1038/s41576-020-00304-w
46. Pease JB, Hahn MW. More accurate phylogenies inferred from low-recombination regions in the presence of incomplete lineage sorting. Evolution. 2013;67(8):2376–84. doi: 10.1111/evo.12118
47. Munch K, Nam K, Schierup MH, Mailund T. Selective sweeps across twenty million years of primate evolution. Molecular Biology and Evolution. 2016;33(12):3065–74. doi: 10.1093/molbev/msw199
48. Setter D, Mousset S, Cheng X, Nielsen R, DeGiorgio M, Hermisson J. VolcanoFinder: Genomic scans for adaptive introgression. PLoS Genetics. 2020;16(6):e1008867. doi: 10.1371/journal.pgen.1008867
49. Hansen TF. Stabilizing selection and the comparative analysis of adaptation. Evolution. 1997;51(5):1341–51. doi: 10.1111/j.1558-5646.1997.tb01457.x
50. Simpson GG. Tempo and mode in evolution. New York: Columbia University Press; 1944. 237 p.
51. Blomberg SP, Garland T Jr., Ives AR. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 2003;57(4):717–45. doi: 10.1111/j.0014-3820.2003.tb00285.x
52. Harmon LJ, Losos JB, Jonathan Davies T, Gillespie RG, Gittleman JL, Bryan Jennings W, et al. Early bursts of body size and shape evolution are rare in comparative data. Evolution. 2010;64(8):2385–96. doi: 10.1111/j.1558-5646.2010.01025.x
53. Bedford T, Hartl DL. Optimization of gene expression by natural selection. Proceedings of the National Academy of Sciences. 2009;106(4):1133–8. doi: 10.1073/pnas.0812009106
54. Catalán A, Briscoe AD, Hohna S. Drift and directional selection are the evolutionary forces driving gene expression divergence in eye and brain tissue of Heliconius butterflies. Genetics. 2019;213(2):581–94. doi: 10.1534/genetics.119.302493
55. Chen J, Swofford R, Johnson J, Cummings BB, Rogel N, Lindblad-Toh K, et al. A quantitative framework for characterizing the evolutionary history of mammalian gene expression. Genome Research. 2019;29(1):53–63. doi: 10.1101/gr.237636.118
56. Cooper N, Thomas GH, Venditti C, Meade A, Freckleton RP. A cautionary note on the use of Ornstein Uhlenbeck models in macroevolutionary studies. Biological Journal of the Linnean Society. 2016;118(1):64–77. doi: 10.1111/bij.12701
57. Guerrero RF, Posto AL, Moyle LC, Hahn MW. Genome-wide patterns of regulatory divergence revealed by introgression lines. Evolution. 2016;70(3):696–706. doi: 10.1111/evo.12875
58. Catalán A, Macias-Munoz A, Briscoe AD. Evolution of sex-biased gene expression and dosage compensation in the eye and brain of Heliconius butterflies. Molecular Biology and Evolution. 2018;35(9):2120–34. doi: 10.1093/molbev/msy111
59. Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, et al. Insights into hominid evolution from the gorilla genome sequence. Nature. 2012;483(7388):169–75. doi: 10.1038/nature10842
60. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3. doi: 10.1093/bioinformatics/btu033
61. Huerta-Cepas J, Serra F, Bork P. ETE 3: Reconstruction, analysis, and visualization of phylogenomic data. Molecular Biology and Evolution. 2016;33(6):1635–8. doi: 10.1093/molbev/msw046

Decision Letter 0

Kirsten Bomblies, Alex Buerkle

19 Sep 2021

Dear Dr Hibbins,

Thank you very much for submitting your Research Article entitled 'The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some concerns that we ask you to address in a revised manuscript.

We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Alex Buerkle

Associate Editor

PLOS Genetics

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

This manuscript has been reviewed thoroughly by three referees, all of whom find the work interesting and compelling. The reviews offer suggestions for improvement of the manuscript, including some requests for possible additions to the analyses, which I encourage the authors to consider. Reporting a false positive rate along with the power analysis seems like a good idea. The reviews are also very clear in their praise for the clarity of the presentation and the important contribution this work will make to how we think about and analyze trait covariation across species. I agree and think this manuscript will be read with lots of interest.
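The distinction the editor draws can be made concrete. Power is the fraction of replicates simulated with introgression in which the test rejects, and the false positive rate is the same fraction computed on replicates simulated without introgression (the setting of S2 Fig). The Python sketch below assumes each replicate is summarized by a p-value and the sign of the statistic, as in the S2 and S5 Data files. The `Replicate` record, the `rejection_rate` function, the 0.05 threshold, and the example values are all hypothetical, and requiring the expected sign when computing power is one reasonable convention rather than necessarily the authors' choice.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Replicate:
    """One simulated replicate (hypothetical record layout)."""
    p_value: float
    sign: int  # +1 or -1: sign of the test statistic


def rejection_rate(replicates: Iterable[Replicate], alpha: float = 0.05,
                   expected_sign: Optional[int] = None) -> float:
    """Fraction of replicates with p < alpha (and, optionally, the expected sign).

    Applied to no-introgression replicates this estimates the false positive
    rate; applied to introgression replicates it estimates power.
    """
    reps = list(replicates)
    if not reps:
        raise ValueError("no replicates supplied")
    rejected = sum(
        1 for r in reps
        if r.p_value < alpha
        and (expected_sign is None or r.sign == expected_sign)
    )
    return rejected / len(reps)


# Hypothetical replicates, for illustration only.
null_reps = [Replicate(0.01, 1), Replicate(0.20, -1), Replicate(0.60, 1),
             Replicate(0.70, -1), Replicate(0.04, -1)]
intro_reps = [Replicate(0.001, 1), Replicate(0.03, -1), Replicate(0.20, 1),
              Replicate(0.01, 1)]

false_positive_rate = rejection_rate(null_reps)      # 2 of 5 rejected
power = rejection_rate(intro_reps, expected_sign=1)  # 2 of 4 rejected with the expected sign
```

Reporting both numbers for the same grid of simulation parameters lets a reader judge whether a rejection in the empirical data is more likely to reflect introgression than chance.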

As a minor point, I will add that it would be good to emphasize the time scale a bit more in the introduction. For example, on line 92, perhaps "To address the effects of introgression on quantitative traits" could be modified to "To address the effects of historical introgression on quantitative traits". I think this is worthwhile because people regularly study trait variation due to contemporary admixture and introgression with admixture mapping. The text is clear that this manuscript is about phylogenomics, deeper timescales reflected on trees, and comparative methods, but I believe readers will benefit from a very explicit statement in the introduction of the relevant time scale.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The paper by Hibbins and Hahn proposes a statistical framework for estimating the amount of introgression between two species while accounting for incomplete lineage sorting (ILS). Both ILS and introgression are common in evolution and failure to account for them can influence inferences of the rate, degree of convergence, or amount of selection that has occurred on a particular trait. As such, this work is an important step towards a more complete understanding of the evolution of quantitative traits. The paper is clearly written and the results presented in an easy to follow manner. However, I do have a couple of concerns about the current presentation.

1. The authors did not present data on false positive rates and this information is crucial for understanding the power analyses performed. This data should already be included in the simulations, so additional simulations aren’t needed.

2. The power analysis comparing a model of ILS only and of ILS + introgression used a single set of parameters and the values chosen for some parameters (e.g. number of genes and introgression amount) are more conducive to identifying a signal of introgression than what is presented in the power analysis of just the ILS+introgression model or the empirical data analyzed. It would be useful to see the results of this comparison for a range of conditions, especially those that covered what would be expected for the empirical data to better understand the ability to distinguish between these two models.

3. It is hard to tell exactly, but the power analyses performed suggest relatively low power to detect introgression in the empirically analyzed data using the ILS+introgression approach. This is particularly true for the ‘low’ data set, yet introgression is found. A discussion or resolution of this discrepancy would be useful to readers.

4. The presented Q3 statistic allows one to estimate the extent and species involved in introgression. However, as currently presented it is not integrated into a framework that allows the estimation of the rate of trait evolution or the degree of ILS. Maybe I’m mistaken, but the models seem to be nested, with the ILS only model being a special case of the ILS+introgression model. This would suggest that a likelihood ratio approach could be used to determine whether the inclusion of introgression in the model is actually warranted by the data. I recognize that this was not the approach taken by the authors, but this seems like a limitation to the current approach that would be useful to mention.
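Reviewer #1's fourth point can be sketched as follows. If the ILS-only model is the ILS + introgression model with the introgression rate fixed at zero, the two are nested, and twice the difference in their maximized log-likelihoods can serve as a likelihood ratio statistic. This Python sketch assumes both log-likelihoods are already available; the function `nested_lrt` is hypothetical and is not part of the authors' method. Because a rate of zero lies on the boundary of the parameter space, the sketch optionally uses the standard 50:50 mixture of χ²₀ and χ²₁ as the reference distribution instead of χ²₁ alone.

```python
import math
from typing import Tuple


def nested_lrt(loglik_null: float, loglik_alt: float,
               boundary: bool = True) -> Tuple[float, float]:
    """Likelihood ratio test for two nested models differing by one parameter.

    loglik_null: maximized log-likelihood of the ILS-only model
                 (introgression rate fixed at 0).
    loglik_alt:  maximized log-likelihood of the ILS + introgression model.
    boundary:    if True, treat the null value as lying on the edge of the
                 parameter space and use a 50:50 mix of chi2_0 and chi2_1.
    Returns (test statistic, p-value).
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        # No improvement in fit: every reference distribution gives p = 1.
        return 0.0, 1.0
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    if boundary:
        # The chi2_0 component is a point mass at 0, so only chi2_1 contributes.
        p *= 0.5
    return stat, p
```

For example, log-likelihoods of -100 and -96 give a statistic of 8 and a boundary-corrected p-value of about 0.0023.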

Minor point

Line 482 – missing location of script

Reviewer #2: This paper develops an approach to examine gene expression variation and covariation between species that takes into account both incomplete lineage sorting and introgression. The paper then goes on to show that gene expression covariation between species that have been subject to introgression also reflects the effects of introgression. They show this in particular within two subclades in Solanum that show varying levels of introgression in the genome.

The modelling seems sound, although I do not have the expertise to comment on it in depth. However, I do think this is a very interesting area of study as we begin to grasp the multilayered effects of interspecies hybridization on evolution. The results with gene expression data do show a significant effect of introgression, although the magnitude of the difference was not great (476 vs. 449 genes).

My one concern is that the authors apply their method to only two groups: the low and high triplet clades they identify. It would have been much stronger if they could have found other comparisons, as there will always be a question of whether this particular high triplet is a fluke. Conversely, showing what the results look like in clades with no evidence of introgression may also provide greater confidence.

Reviewer #3: The evolution of gene expression has been an important area of research in the last three decades. Nonetheless, most, if not all, studies of gene expression ignore the effect that introgression might have on the distribution of gene expression as a trait in phylogenetic trees. The first part of the paper addresses this very gap and develops an analytical framework to recapitulate the evolution of quantitative traits following a Brownian motion model of evolution in the face of introgression and incomplete lineage sorting. The piece provides a derivation of the expected trait variances and covariances among hybridizing species using the multispecies network coalescent as a framework. The methods are clear, the appendix is welcome, and the results are well presented. The authors are forthright in stating that there are other models of trait evolution, and they elaborate on their reasons for not expanding their model. The model is elegant and timely, and I think it will be an excellent addition to the literature. The authors present a fairly large simulation analysis in Figure 3, which also bolsters the importance of the piece.
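The covariance structure the reviewer describes can be approximated by simulation. Under Brownian motion with rate σ² on a single gene tree, two tips covary by σ² times the length of their shared path from the root, so averaging that shared length over gene trees drawn from each parent tree gives the expected covariance. This Monte Carlo sketch is not the analytical derivation in S1 Text. It assumes a triplet with species tree ((A,B):t1, C):t2 in coalescent units and introgression between B and C at time tm, so that a locus follows the parent tree ((B,C):tm, A):t2 with probability delta. All function and parameter names are hypothetical.

```python
import random
from typing import Dict, FrozenSet


def coalesce_triplet(sister1: str, sister2: str, outgroup: str,
                     t_split: float, t_root: float,
                     rng: random.Random) -> Dict[FrozenSet[str], float]:
    """Sample one locus' pairwise coalescence times under the multispecies coalescent.

    Parent tree: ((sister1, sister2):t_split, outgroup):t_root, in coalescent
    units (two lineages in one population coalesce at rate 1).
    """
    pairs = [frozenset((sister1, sister2)),
             frozenset((sister1, outgroup)),
             frozenset((sister2, outgroup))]
    wait = rng.expovariate(1.0)
    if t_split + wait < t_root:
        # Sisters coalesce before t_root; the outgroup joins in the root population.
        t_join = t_root + rng.expovariate(1.0)
        return {pairs[0]: t_split + wait, pairs[1]: t_join, pairs[2]: t_join}
    # No coalescence before t_root: three lineages, total coalescence rate 3.
    t_first = t_root + rng.expovariate(3.0)
    t_last = t_first + rng.expovariate(1.0)
    first_pair = rng.choice(pairs)  # each pair is equally likely to coalesce first
    return {p: (t_first if p == first_pair else t_last) for p in pairs}


def expected_covariances(t1: float, t2: float, tm: float, delta: float,
                         n_loci: int = 100_000, sigma2: float = 1.0,
                         seed: int = 1) -> Dict[str, float]:
    """Monte Carlo estimate of tip variance and pairwise covariances for A, B, C."""
    rng = random.Random(seed)
    totals = {"var": 0.0, "AB": 0.0, "AC": 0.0, "BC": 0.0}
    for _ in range(n_loci):
        if rng.random() < delta:
            # Introgression parent tree: B traces back into C's lineage at tm.
            times = coalesce_triplet("B", "C", "A", tm, t2, rng)
        else:
            # Species tree parent tree.
            times = coalesce_triplet("A", "B", "C", t1, t2, rng)
        t_root = max(times.values())  # gene tree root = latest coalescence
        totals["var"] += t_root       # every tip's root-to-tip path length
        for key in ("AB", "AC", "BC"):
            # frozenset("AB") == frozenset({"A", "B"}); shared path = root to MRCA.
            totals[key] += t_root - times[frozenset(key)]
    return {k: sigma2 * v / n_loci for k, v in totals.items()}


ils_only = expected_covariances(t1=1.0, t2=2.0, tm=0.5, delta=0.0)
with_introgression = expected_covariances(t1=1.0, t2=2.0, tm=0.5, delta=0.3)
```

With `delta=0`, the two discordant pairs (A, C) and (B, C) should have equal expected covariance, since only incomplete lineage sorting gives them a shared branch; a positive `delta` raises Cov(B, C) above Cov(A, C).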

The second part of the paper takes the theoretical predictions and uses the model to evaluate the evolution of ovule gene expression in 12 species of Solanum. Using the model described above, the authors find that gene expression similarity is correlated with the tree topology of protein-coding genes, which is consistent with cis-acting effects of introgressed variation on expression. To my knowledge, this is one of the finest assessments of the evolution of gene expression in diverging species because, instead of focusing only on reporting the results, it offers an underlying model and a potential explanation of the reasons that lead to such patterns.
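One way to test whether expression similarity depends on local gene tree topology, in the spirit of the 3x3 contingency tables of S1 and S2 Tables, is a Pearson chi-square test of independence. This is an assumption: the exact test the authors used is not described in this section, and the table orientation (rows as gene tree topologies, columns as expression-similarity categories) and the counts below are hypothetical. The sketch uses the closed-form chi-square survival function for the 4 degrees of freedom of a 3x3 table, so it needs only the standard library.

```python
import math
from typing import Sequence, Tuple


def chi_square_independence_3x3(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of independence for a 3x3 table of counts.

    Returns (statistic, p-value) with (3 - 1) * (3 - 1) = 4 degrees of freedom.
    """
    if len(table) != 3 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 3x3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(3)) for j in range(3)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(3):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero")
            stat += (table[i][j] - expected) ** 2 / expected
    # For 4 df the survival function is P(X > x) = exp(-x / 2) * (1 + x / 2).
    p = math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
    return stat, p


# Hypothetical counts: rows = gene tree topology, columns = which pair of
# species has the most similar expression for that gene.
example = [[50, 5, 5],
           [5, 50, 5],
           [5, 5, 50]]
stat, p = chi_square_independence_3x3(example)
```

A strongly diagonal table like this one yields a large statistic, whereas a table whose rows are proportional to one another yields a statistic of zero.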

The manuscript is exceptionally well written, has an excellent level of scholarship, is well contextualized, has clear results, and the discussion points out the relevant caveats. Overall, the piece is a timely and excellent contribution to the literature and I am certain will have an important effect on how we understand the evolution of gene expression, speciation, and introgression altogether.

I just have a few comments, all of which are rather minor and are meant to help the readability of the piece.

The colors in Figure 1 are hard to follow, especially for a color-blind person.

Line 282. I think “thousands of quantitative traits” refers to the expression of individual genes. I would rephrase this sentence to make this clearer.

Line 499. Elaborate on the rationale for this ordination.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Decision Letter 1

Kirsten Bomblies, Alex Buerkle

18 Oct 2021

Dear Dr Hibbins,

We are pleased to inform you that your manuscript entitled "The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Alex Buerkle

Associate Editor

PLOS Genetics

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

I appreciate the authors' clear and comprehensive responses to the suggestions and comments from the previous round of review. The manuscript is likely to prompt further empirical and methodological advances and represents a clear contribution to the field.

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-21-01009R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Kirsten Bomblies, Alex Buerkle

27 Oct 2021

PGENETICS-D-21-01009R1

The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes

Dear Dr Hibbins,

We are pleased to inform you that your manuscript entitled "The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Agnes Pap

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig

    Power analysis results using (A) 5000 and (B) 10,000 genes.

    (EPS)

    S2 Fig. Power analysis results under no introgression (i.e. false positive rates).

    (EPS)

    S3 Fig. Modelling introgression using parent trees.

    Parent trees represent the possible histories of speciation and/or introgression at a locus, with their probabilities determined by the rate of introgression. Within each parent tree, gene trees sort according to the multispecies coalescent process. Adapted from [10].

    (EPS)

    S1 Table. Full 3x3 contingency table for relationship between gene-level expression and local gene tree topology in the low triplet.

    (DOCX)

    S2 Table. Full 3x3 contingency table for relationship between gene-level expression and local gene tree topology in the high triplet.

    (DOCX)

    S1 Data. Pairwise distances between simulated trait values for 20,000 genes.

    (XLSX)

    S2 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, with 15,000 simulated traits.

    (TXT)

    S3 Data. Normalized (RPKM) gene expression data from Solanum ovules used in our empirical analyses.

    (CSV)

    S4 Data. Gene tree topology counts for each trio calculated from the Pease et al. [22] transcriptomic dataset.

    Used in the gene-level expression analysis.

    (CSV)

    S5 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, under the no-introgression condition.

    (TXT)

    S6 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, with 5000 simulated traits.

    (TXT)

    S7 Data. P-value and sign of Q3 statistic reported for each combination of parameters in our power analysis, with 10,000 simulated traits.

    (TXT)

    S1 Text. Derivation of expected trait variances and covariances under the multispecies network coalescent.

    (DOCX)

    Attachment

    Submitted filename: intro_expression_responses.docx
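The parent-tree construction described for S3 Fig has a simple closed form for the gene tree topology frequencies of a rooted triplet. Within a parent tree whose internal branch is τ coalescent units long, the multispecies coalescent gives the matching gene tree with probability 1 - (2/3)exp(-τ) and each of the other two topologies with probability (1/3)exp(-τ); the rate of introgression then weights the parent trees. This Python sketch assumes a species tree ((A,B):t1, C):t2 and introgression between B and C at time tm (with tm ≤ t1), followed at a fraction delta of loci. The names and this parameterization are hypothetical and need not match the model adapted from [10] exactly.

```python
import math
from typing import Dict, Tuple


def _msc_split(internal_branch: float) -> Tuple[float, float]:
    """(P(gene tree matches its parent tree), P(each of the two other topologies))."""
    discordant = math.exp(-internal_branch) / 3.0
    return 1.0 - 2.0 * discordant, discordant


def topology_frequencies(t1: float, t2: float, tm: float,
                         delta: float) -> Dict[str, float]:
    """Expected rooted gene tree topology frequencies for taxa A, B, C.

    Species parent tree:       ((A,B):t1, C):t2, weight 1 - delta
    Introgression parent tree: ((B,C):tm, A):t2, weight delta
    Times are in coalescent units, measured back from the present.
    """
    if not (0.0 <= tm <= t1 <= t2):
        raise ValueError("expected 0 <= tm <= t1 <= t2")
    if not (0.0 <= delta <= 1.0):
        raise ValueError("delta must be a probability")
    conc_sp, disc_sp = _msc_split(t2 - t1)  # species parent tree
    conc_in, disc_in = _msc_split(t2 - tm)  # introgression parent tree
    return {
        "((A,B),C)": (1 - delta) * conc_sp + delta * disc_in,
        "((B,C),A)": (1 - delta) * disc_sp + delta * conc_in,
        "((A,C),B)": (1 - delta) * disc_sp + delta * disc_in,
    }
```

With `delta=0` the two discordant topologies are equally frequent, which is the expectation under incomplete lineage sorting alone; introgression between B and C makes ((B,C),A) more common than ((A,C),B).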

    Data Availability Statement

    Raw sequencing reads for the gene expression dataset are available on the SRA under BioProject PRJNA714065. Quantitative expression data, as well as scripts for all analyses, are available from https://github.com/mhibbins/intro_quant_traits.
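The quantitative expression data (S3 Data) are RPKM-normalized. RPKM (reads per kilobase of transcript per million mapped reads) divides a gene's read count by its length in kilobases and by the library's mapped-read total in millions, so that values are comparable across genes of different lengths and libraries of different depths. The Python sketch below shows only that formula; it is not the authors' pipeline, and the function name and the choice to default the library total to the sum of the supplied counts are assumptions.

```python
from typing import Dict, Optional


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads.

    counts:       mapped read count per gene
    lengths_bp:   gene (transcript) length in base pairs
    total_mapped: total mapped reads in the library; if omitted, the sum of
                  the supplied counts is used (an assumption, see lead-in)
    """
    if total_mapped is None:
        total_mapped = sum(counts.values())
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return {gene: count * 1e9 / (lengths_bp[gene] * total_mapped)
            for gene, count in counts.items()}


values = rpkm({"g1": 100, "g2": 300}, {"g1": 1000, "g2": 2000},
              total_mapped=1_000_000)
```

Here `g1` has 100 reads on a 1 kb transcript in a library of one million mapped reads, so its RPKM is 100; `g2` has three times the reads on a transcript twice as long, giving 150.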

