Author manuscript; available in PMC: 2022 Jan 8.
Published in final edited form as: Nat Ecol Evol. 2021 Nov 8;5(12):1624–1636. doi: 10.1038/s41559-021-01573-2

Plasmids do not consistently stabilize cooperation across bacteria but may promote broad pathogen host-range

Anna E Dewar 1,a,*, Joshua L Thomas 1,a, Thomas W Scott 1, Geoff Wild 2, Ashleigh S Griffin 1, Stuart A West 1,b, Melanie Ghoul 1,b
PMCID: PMC7612097  EMSID: EMS135883  PMID: 34750532

Abstract

Horizontal gene transfer via plasmids could favour cooperation in bacteria, because transfer of a cooperative gene turns non-cooperative cheats into cooperators. This hypothesis has received support from theoretical, genomic and experimental analyses. In contrast, we show here, with a comparative analysis across 51 diverse species, that genes for extracellular proteins, which are likely to act as cooperative ‘public goods’, were not more likely to be carried on either: (i) plasmids compared to chromosomes; or (ii) plasmids that transfer at higher rates. Our results were supported by theoretical modelling which showed that while horizontal gene transfer can help cooperative genes initially invade a population, it has less influence on the longer-term maintenance of cooperation. Instead, we found that genes for extracellular proteins were more likely to be on plasmids when they coded for pathogenic virulence traits, in pathogenic bacteria with a broad host-range.

Key words/phrases: extracellular proteins, genetic architecture, horizontal gene transfer, inclusive fitness, kin selection, secretome

Introduction

The growth and success of many bacterial populations depend upon the production of cooperative ‘public goods’1–4. Public goods are molecules whose secretion provides a benefit to the local group of cells. Examples include iron-scavenging siderophores5, exotoxins that disintegrate host cell membranes6,7, and elastases that break down connective tissues8–10. A problem is that cooperation can be exploited by ‘cheats’: cells which avoid the cost of producing public goods but can still use and benefit from those produced by cooperative cells3,11,12. What prevents cheats from outcompeting cooperators, and ultimately destabilising cooperation?

In bacteria, some genetic elements are able to move between cells13. This horizontal gene transfer has been suggested as a mechanism to help stabilize the production of cooperative public goods14–18 (Figure 1a). If a gene coding for the production of a public good can be transferred horizontally, it would allow cheats to be ‘infected’ with the cooperative gene and turned into cooperators. Theoretical models have shown that this can facilitate the invasion of cooperative genes, in conditions where they would not be favoured on chromosomes14–18. Experiments on a synthetic Escherichia coli system have shown that location on a plasmid helped the gene for a cooperative public good to invade, particularly in structured populations18. In addition, bioinformatic analyses across a range of species found that genes that code for extracellular proteins, many of which act as public goods, are more likely to be found on plasmids than the chromosome15,19,20.

Figure 1. Three hypotheses for why selection might favour genes coding for extracellular proteins to be located on plasmids.

(a) Cooperation Hypothesis. Blue cells produce extracellular proteins which act as cooperative public goods, while red cells are ‘cheats’ which exploit this cooperation. Over time cheats grow faster than cooperators since they forgo the cost of public good production. However, because the gene for the extracellular protein is located on a plasmid, cooperators can transfer the gene to the cheats, turning them into cooperators, increasing genetic relatedness at the cooperative locus, and stabilising cooperation14–18. (b) Gain and Loss Hypothesis. The production of the extracellular protein is required in some environments, but not others. Transitions between these environments can result from temporal or spatial change. Cells are selected to either lose (Environment A) or gain (Environment B) the plasmid coding for the production of the extracellular protein. (c) Beyond Horizontal Gene Transfer Hypothesis. The location of a gene on a plasmid could provide a number of benefits, other than the possibility for horizontal gene transfer38. For example, when the quantity of extracellular protein required varies across environments (A versus B), plasmid copy number could be varied to adjust production38. Created with BioRender.com.

There are, however, three potential problems for the hypothesis that horizontal gene transfer favours cooperation. First, previous bioinformatic analyses made important first steps, but are not conclusive. One study examined only a single species, which may not be representative of all bacteria15. Two additional studies examined multiple species, but assumed that genes and genomes from the same and different species can be treated as independent data points, in a way that could have led to spurious results19,20. Statistical tests typically assume that data points are independent, and even slight non-independence can lead to heavily biased results (type I errors)21,22. There is an extensive literature in the field of evolutionary biology showing that species share characteristics inherited through common descent, rather than through independent evolution, and so cannot be considered independent data points23–25. Genomes are nested within species, and genes are nested within genomes, multiplying this problem of non-independence, analogous to the problem of pseudoreplication in experimental studies26–29. Phylogenetically-controlled bioinformatic analyses are required to address this problem of non-independence, and test the robustness of previous conclusions.

Second, from a theoretical perspective, while horizontal gene transfer can favour the initial invasion of cooperation, it is not clear if it favours the maintenance of cooperation in the long run16. For example, after a plasmid carrying a cooperative gene has spread through a population, a loss of function mutation could easily lead to a cheat plasmid evolving, which could then potentially outcompete the plasmid carrying the cooperative gene16,30. Theory is required that examines the maintenance as well as the invasion of cooperation, while accounting for important biological details, such as how plasmid transmission depends on the population frequency of the plasmid, and how frequently plasmids are lost, for example by segregation during cell division.

Third, there are alternative hypotheses for why genes coding for extracellular proteins might be preferentially carried on plasmids in some species (Figure 1)20,31. Bacteria can rapidly adapt to new and/or changing environments by acquiring new genes via horizontal gene transfer, and losing genes no longer required but costly to maintain (Figure 1b)32–34. Genes which facilitate adaptation to environmental variability are often those which code for molecules secreted outside the cell34–37. Consequently, we might expect to find genes for extracellular proteins on plasmids to facilitate rapid gain and loss of genes depending on environmental conditions, and not because they are cooperative per se. Alternatively, genes may be favoured to be on plasmids for reasons other than horizontal gene transfer (Figure 1c)38. For example, a higher plasmid copy number offers a mechanism for more expression of a gene, potentially even conditionally, in response to certain environmental conditions38. The benefit of being able to regulate gene expression in this way could be higher in genes which code for molecules that are secreted outside the cell, when different quantities of molecule are required in different environments. These different hypotheses are not mutually exclusive.

We addressed all three of these potential problems for the hypothesis that horizontal gene transfer favours cooperation. We first tested two predictions that would be expected to hold if horizontal gene transfer favours cooperation. Specifically, cooperative genes would be more likely to be found on: (i) plasmids relative to chromosomes; (ii) more mobile plasmids relative to less mobile plasmids14–20. We used phylogeny-based statistical methods that control for the problem of non-independence, analysing 1632 genomes from 51 bacterial species, to examine the location of genes that code for extracellular proteins. We then used theoretical models to examine whether horizontal gene transfer facilitates the evolution as well as the initial spread of cooperation.

Finally, we also tested alternative hypotheses for why genes coding for extracellular proteins might be preferentially carried on plasmids. We used three measures of environmental variability to ask whether species which had more variable environments were those most likely to carry genes for extracellular proteins on their plasmids. Additionally, we examined one of these measures in more detail, to help determine whether genes for extracellular proteins were located on plasmids so that they could be gained and lost easily (Figure 1b), or instead because of some additional benefit conferred by plasmid carriage (Figure 1c).

Results

Genomic Analyses

We use the approach developed by Nogueira et al.15,19,20, of using PSORTb39 to predict the subcellular location of every protein encoded by 1632 complete genomes from 51 diverse bacterial species (Extended Data Figure 1; Table S3). We are also building upon the work of researchers who pointed out that extracellular (secreted) proteins are likely to provide a benefit to the local population of cells, and hence act as cooperative public goods2,15,19,20,40. The advantage of this method is that it allows a large number of genes to be examined, across multiple species.

Overall, we found the average bacterial genome had 2696 protein-coding genes on the chromosome(s), and 223 on the plasmid(s). Of these, an average of 57 genes (~2%) coded for the production of an extracellular protein, with 52 on the chromosome(s) and 5 on the plasmid(s). This means, on average, 1.9% of chromosome genes and 2.4% of plasmid genes coded for extracellular proteins. To control for the number of genomes per species, we first calculated the mean number of genes for each species, and then the mean of these species means. Therefore, the values above give an indication of the location of genes coding for extracellular proteins in an average genome. Genes with unknown protein localisations were not included (Chromosome: 26.2%; Plasmid: 38.3%). Across species, the proportion of genes coding for extracellular proteins for plasmid(s) was generally more variable than for the chromosome(s) (Figure S2). These patterns are very similar to those found previously3,15,19,20.
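The two-step averaging described above (a mean per species, then a mean of those species means) can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study; the species labels and values are hypothetical toy data chosen to show how an over-sampled species pulls a pooled genome-level mean away from the species-balanced mean.

```python
from collections import defaultdict
from statistics import mean


def mean_of_species_means(records):
    """Average per-genome values within each species, then across species.

    records: iterable of (species, value) pairs, one pair per genome.
    Returns (overall_mean, per_species_means).
    """
    by_species = defaultdict(list)
    for species, value in records:
        by_species[species].append(value)
    per_species = {sp: mean(vals) for sp, vals in by_species.items()}
    return mean(list(per_species.values())), per_species


# Hypothetical toy data: species_A is sequenced four times as often as species_B.
genomes = [("species_A", 0.02)] * 8 + [("species_B", 0.06)] * 2

pooled = mean([value for _, value in genomes])        # weights each genome equally
balanced, per_species = mean_of_species_means(genomes)  # weights each species equally

print(f"pooled genome-level mean: {pooled:.3f}")
print(f"mean of species means:    {balanced:.3f}")
```

In the toy data the pooled mean sits close to the heavily sampled species, whereas the mean of species means gives both species equal weight.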

Extracellular proteins are not overrepresented on plasmids

We found that extracellular proteins were not more likely to be carried on plasmids compared to chromosomes (Figure 2). The difference in the proportion of genes that coded for extracellular proteins between plasmid and chromosome was not significantly different from zero across all species (MCMCglmm41; posterior mean = 0.004, 95% CI = -0.063 to 0.057, pMCMC= 0.87; n = 1632 genomes; R2 of species sample size = 0.47, R2 of phylogeny = 0.17; Table S2, row 1a). This result was robust to alternative forms of analysis. We also found no significant difference when we: (i) compared chromosomes to plasmids of only certain mobilities (Fig S3; Table S2, rows 20-22); (ii) analysed our data by two alternative methods, by looking at the ratio of proportions instead of the difference, or by considering only whether the plasmid proportion was greater than the chromosome proportion, removing any effect of the magnitude of this difference (Extended Data Figure 2; Table S2, rows 2 and 3). Our analyses use a bacterial phylogeny, which assumes plasmid evolution follows bacterial phylogeny, but we also found no significant pattern if we ignored phylogeny and analysed species as independent data points (Figure 2; Table S2, row 1b; pMCMC = 0.644).

Fig 2. Extracellular proteins are not overrepresented on plasmids.

For each species we calculated the mean difference between plasmid(s) and chromosomes in the proportion of genes coding for extracellular proteins. Species in blue have a difference greater than zero, meaning their plasmid genes code for a greater proportion of extracellular proteins than chromosome genes. Species in red have a difference less than zero, meaning their chromosome genes code for a greater proportion of extracellular proteins than plasmid genes. Error bars indicate the standard error. The dot and error bar at the top of the graph indicate the mean difference and 95% Credible Interval given by a MCMCglmm analysis across all species, controlling for phylogeny and sample size. We arcsine square root transformed proportion data before calculating the difference. Overall, there is no consistent trend that genes coding for extracellular proteins are more likely to be carried on plasmids (i.e. no consistent trend towards species in blue).
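The arcsine square root transformation mentioned in the caption can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the input proportions are simply the average plasmid and chromosome values quoted in the text (2.4% and 1.9%), used here as example numbers rather than as a reproduction of any per-species result.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square root transform of a proportion in [0, 1] (result in radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


def back_transform(x):
    """Inverse of arcsine_sqrt, returning a proportion."""
    return math.sin(x) ** 2


def transformed_difference(plasmid_prop, chromosome_prop):
    """Plasmid minus chromosome, with both proportions transformed first."""
    return arcsine_sqrt(plasmid_prop) - arcsine_sqrt(chromosome_prop)


# Example inputs taken from the averages quoted in the text.
diff = transformed_difference(0.024, 0.019)
print(f"transformed difference: {diff:.4f}")  # positive means plasmid > chromosome
```

A positive difference corresponds to a species plotted in blue, and a negative difference to a species plotted in red.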

The lack of an overall significant result was clear when looking at the raw data for the different species that we examined (Figure 2; Extended Data Figure 2). There was considerable variation across species in the location of genes coding for extracellular proteins. Overall, extracellular proteins were more likely to be on plasmids in 51% of species (26/51), and more likely to be on the chromosome(s) in 49% (25/51) of species (Extended Data Figure 2). For example, in Bacillus anthracis genes coding for extracellular proteins were three times more likely to be on plasmids, whereas in Acinetobacter baumannii genes coding for extracellular proteins were three times more likely to be on the chromosome(s) (Extended Data Figure 2). Clearly, across species, genes coding for extracellular proteins are not consistently more likely to be on plasmids.

As a control, we also analysed the genomic location of the genes coding for all other classes of protein (Extended Data Figure 1). Specifically, we analysed genes that coded for the production of Cytoplasmic, Cytoplasmic Membrane, Periplasmic, Outer Membrane and Cell Wall proteins. We found that none of these protein localisations were significantly overrepresented on plasmids or chromosomes across the 51 species (Extended Data Figure 3; Table S2, rows 5-10). Plasmids are highly variable in the genes they carry.

Importance of controlling for non-independence of genomes

Our results contrast with previous studies, which found that plasmid genes code for proportionally more extracellular proteins than chromosomes15,19,20. The first of these studies found this pattern across 20 Escherichia coli genomes15. We also found that genes coding for extracellular proteins in E. coli were more likely to be found on plasmids (Figure 2; Extended Data Figure 2). However, Figure 2 shows that this is not a consistent pattern across species: approximately half (25/51) of the species we analysed showed a pattern in the opposite direction, with genes coding for extracellular proteins more likely to be on their chromosome(s) than their plasmid(s).

Two subsequent, multi-species studies found that plasmid genes were significantly more likely to code for extracellular proteins than chromosome genes19,20. These studies used statistical tests such as the Wilcoxon signed-rank test to ask whether there was a consistent pattern, using bacterial genomes as independent data points. When we analysed our data with the same statistical methods used in these studies, we also obtained a significant result (Wilcoxon signed-rank test; V = 826530, p-value < 0.001, R2 = 0.385; n = 1632 plasmid-chromosome pairs). When analysing other questions, Garcia-Garcera & Rocha20 used MCMCglmm to control for phylogeny.

Why does using bacterial genomes as independent data points lead to a significant result? By using a Wilcoxon signed-rank test, at the level of the genome, we are implicitly assuming that all the genomes analysed are: (i) independent from one another; (ii) a representative sample of bacteria in nature. Neither of these assumptions holds for multi-species genomic datasets. First, due to shared ancestry, species are not independent from one another, and so neither are genomes in such analyses24,42. Even a slight lack of independence can lead to heavily biased results in statistical analyses and spurious conclusions21. Second, genomic databases tend to have a disproportionate abundance of certain species and genera. This will bias the results towards commonly sequenced species.

Consequently, when asking questions across species, it is inappropriate to treat all the genomes in genomic datasets as independent data points. When we performed an analysis analogous to the Wilcoxon signed-rank test, using the same untransformed data which produced a significant result above, but controlled for the number of genomes per species and the non-independence of species, we no longer found any significant difference between the proportion of plasmid and chromosome genes coding for extracellular proteins (MCMCglmm; posterior mean = 0.017, 95% CI = -0.021 to 0.057, pMCMC = 0.332; n = 1632 plasmid-chromosome paired differences in extracellular proportion; R2: species sample size = 0.46, phylogeny = 0.34; Table S2, row 4). Furthermore, we found that the number of genomes per species and the non-independence of species explained 46% and 34% of the variation in data respectively (paired plasmid and chromosome differences across our 1632 genomes). Taken together, this illustrates that it is not our data that disagree with previous studies, but our use of statistical analyses appropriate for multi-genome, multi-species datasets23–25.

These data also illustrate the importance of examining effect sizes, and not just whether results are statistically significant. With large sample sizes it is possible to get results that are significant but not biologically important. The percentage of variance explained that is considered biologically significant can depend upon the kind of data you are examining and the field of research, but a baseline of 5-10% seems reasonable for many areas of evolutionary biology (Supp. Info. 1)43–45. When bacterial genomes are assumed to be independent data points in across-species analyses, this leads to inflated sample sizes. Consequently, even when results are statistically significant at P<0.05, they can still only explain 1-2% of the variation in the data, which is clearly not biologically significant. The flip side of such considerations is that effect sizes and examination of raw data at the species level (e.g. Figure 2) are also useful checks against non-significant results due to a lack of statistical power (type II errors).

Plasmids with higher mobility do not carry more genes for extracellular proteins

We then tested another prediction of the cooperation hypothesis: cooperation is more likely to be favoured when coded for on more mobile plasmids14–18. We used data from the MOBsuite database to assign plasmids to one of three levels of mobility (Fig 3a)46,47. We classified: conjugative plasmids, which carry all genes necessary to transfer, as the most mobile; mobilizable plasmids, which are dependent upon conjugative plasmids’ machinery to transfer, as having intermediate mobility; non-mobilizable plasmids, which cannot be transferred via conjugation, as the least mobile (Fig 3a)46,48.

Figure 3. Plasmid mobility and extracellular proteins.

(a) We divided plasmids into three mobility types: non-mobilizable (lowest or no mobility); mobilizable (intermediate mobility); conjugative (highest mobility). Blue cells are potential plasmid donors, while red cells are potential recipients. Each panel shows when plasmid transfer is possible for one of the three plasmid mobility types. Non-mobilizable plasmids cannot be transferred. Mobilizable plasmids cannot be transferred alone, but they carry enough genes to ‘hijack’ the machinery of a conjugative plasmid that is in the same cell. Conjugative plasmids carry all genes necessary to transfer independently. Created with BioRender.com. (b) The 40 species which carried plasmids of all three mobilities are shown, with a panel for each of these species. Dots in each panel indicate the mean % of genes coding for extracellular proteins of all plasmids of each mobility level. The lines are the linear regression of these three points, coloured blue if the slope is positive and orange if the slope is negative. Note that each row of species has a different y-axis scale, indicated on the left, which applies to all species in that row. We arcsine square root transformed proportion data before calculating the mean for each species, and then back-transformed these values for display of the data. Overall, there is no consistent trend for genes that code for extracellular proteins to be on more mobile plasmids.

Genes coding for extracellular proteins were not more likely to be on plasmids with higher transfer rates (Figure 3b). Examining the slope of the regression between plasmid mobility and the proportion of genes coding for extracellular proteins, we found no consistent pattern across species (MCMCglmm; posterior mean = 0.006, 95% CI = -0.040 to 0.052, pMCMC = 0.73; n = 40; Table S2, row 11). This lack of a significant relationship was robust to different forms of analysis, including an examination of the means of each mobility type of each species (Figure S4; Table S2, row 12). We also found no correlation between the proportion of a species’ plasmids which can transfer and how overrepresented or underrepresented extracellular proteins are on plasmids compared to chromosomes (Extended Data Figure 4; Table S2, rows 16 and 17).
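The per-species slope described above can be illustrated with an ordinary least-squares fit through three points. This is a sketch under stated assumptions: the numeric coding of mobility (0 = non-mobilizable, 1 = mobilizable, 2 = conjugative) and the example percentages are hypothetical, and the phylogenetically controlled MCMCglmm step applied to these slopes is not reproduced here.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    return sxy / sxx


# Assumed ordinal coding of the three mobility classes (not taken from the paper).
MOBILITY = {"non-mobilizable": 0, "mobilizable": 1, "conjugative": 2}

# Hypothetical mean proportions of extracellular-protein genes for one species.
example_species = {"non-mobilizable": 0.01, "mobilizable": 0.03, "conjugative": 0.02}

xs = [MOBILITY[k] for k in example_species]
ys = [example_species[k] for k in example_species]
slope = ols_slope(xs, ys)
print(f"slope across mobility levels: {slope:.4f}")
```

With three equally spaced levels the slope depends only on the two end points, so a high value at intermediate mobility does not by itself produce a positive trend.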

To examine our assumption that mobilizable plasmids are likely to be less mobile than conjugative plasmids, we examined how frequently these two kinds of plasmids co-occurred within a genome. If mobilizable plasmids are present in the same cell as conjugative plasmids, they could be transmitted at similar rates. However, we found that of genomes with a mobilizable plasmid(s), 60% did not also carry a conjugative plasmid (434/727). In addition, when mobilizable plasmids did co-occur with a conjugative plasmid, they did not have a higher proportion of genes coding for extracellular proteins (Supp. Info. 1; Figure S6). A caveat here is that our estimates of transfer rates across different types of plasmid are relative, and it would be very useful to obtain quantitative estimates of transfer rates.

Theoretical Stability of Cooperation

Our empirical results did not support the theoretical prediction that cooperative genes should be overrepresented on plasmids, relative to the chromosome14–18,49. Consequently, we then extended existing theory, to examine whether we could find conditions where cooperative genes were not predicted to be overrepresented on plasmids. We investigated the consequences of two factors: (1) allowing for a greater range of possible genetic architectures, especially plasmids that lacked the gene for cooperation (non-cooperative or ‘cheat’ plasmids); and (2) examining the evolutionary stability (maintenance) of cooperation, not just its initial invasion16,49.

We examined two possible reasons why cooperative genes could be overrepresented on plasmids, relative to the chromosome. First, horizontal gene transfer on a plasmid could allow cooperation to be favoured in conditions where it would otherwise not be favoured14–18. For example, plasmid transfer can turn non-cooperators into cooperators, and increase relatedness at the loci for cooperation17. Second, even if horizontal gene transfer did not increase the range of biological scenarios (parameter space) where cooperation was favoured, there could be selection for cooperation to be coded for on a plasmid, rather than a chromosome.

We assumed an infinite population of haploid individuals (bacterial cells). Individuals may carry a cooperative gene, that codes for public goods production, either on a plasmid, or the chromosome, or both (redundancy). We also allowed for the possibility of: non-cooperative plasmids and chromosomes; plasmid-free cells; a cost of plasmid carriage (CC).

Each generation, the population is divided into patches, each founded by N independent cells. Cells reproduce clonally until there are a large number of cells per patch. Cells are then randomly shuffled into pairs on their patch and, if a plasmid-free individual has a plasmid-bearing partner, with probability β, the plasmid-free individual acquires a copy of its partner’s plasmid (horizontal gene transfer). Individuals with a gene for cooperation then produce a public good, at a cost CG, which generates a benefit B that is shared between all members of the patch. Individuals then survive according to their fitness. Plasmid-bearing individuals lose their plasmid with probability s. Finally, individuals disperse to found new patches.

Consistent with previous analyses, we found that, in the short term, horizontal gene transfer on a plasmid can initially help cooperation invade (Figure 4)14–18. Horizontal gene transfer increased the frequency of cooperation, by turning non-cooperators into cooperators, which also increases relatedness at the cooperative locus on the plasmid14–18,49. Relatedness is increased because, in the short term, whilst plasmids are spreading from rarity, there are many plasmid-free cells available, meaning plasmids have many opportunities to be transferred, generating genetic similarity.

Figure 4. Plasmids facilitate the invasion but not the maintenance of cooperation.

In parts (a) and (b), we plot the results of our theoretical model for the case when there is no plasmid loss (s=0). (a) Cooperation is only maintained at equilibrium (green shaded area) when it is favoured at the chromosomal level RB > CG, which is unaffected by plasmid transfer (β). (b) Plasmids can facilitate the invasion and initial spread of cooperation (blue line shoots above red line), but cooperative plasmids are eventually outcompeted by cheat plasmids (red line goes to 1). We note that, in (b), all individuals are chromosomal defectors – chromosomal cooperation was permitted, but did not evolve in this run. To generate the plots in (a) and (b), we assumed the following parameter values: (a & b) B = 1.435, CG = 0.1, CC = 0.2; (b) β = 0.5, N = 16.

In contrast, we found that transfer on a plasmid did not appreciably increase the range of parameter space where cooperation was maintained at evolutionary equilibrium (Fig 4a & 5) (Supp. Info. 4). First, in the absence of plasmid loss (s=0), cooperation was only favoured when RB-CG>0, where R is the genetic relatedness at the chromosomal (individual) level (R=1/N). Cooperation was therefore only favoured on the plasmid when it provided a kin selected benefit at the level of the chromosome (individual), as predicted by Hamilton’s rule50,51.
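As a worked check of the condition RB - CG > 0 with R = 1/N, the sketch below plugs in the parameter values given in the Figure 4 caption (B = 1.435, CG = 0.1). The function names are hypothetical, and the check covers only the no-plasmid-loss case (s = 0) described above.

```python
def relatedness(n_founders):
    """Chromosomal relatedness when each patch is founded by N independent cells."""
    if n_founders < 1:
        raise ValueError("a patch needs at least one founder")
    return 1.0 / n_founders


def cooperation_favoured(benefit, cost, n_founders):
    """Hamilton's rule in the form used in the text: R*B - C_G > 0."""
    return relatedness(n_founders) * benefit - cost > 0


B = 1.435   # benefit of the public good (Figure 4 caption)
C_G = 0.1   # cost of producing the public good (Figure 4 caption)

# Rearranging R*B > C_G with R = 1/N gives N < B / C_G.
max_founders = B / C_G

for n in (10, 14, 15, 16):
    print(n, cooperation_favoured(B, C_G, n))
```

With these values the rule requires N < 14.35, so the N = 16 used for Figure 4b lies outside the region where cooperation is maintained, in line with the cheat plasmids eventually winning in that panel.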

Figure 5. Plasmid loss can favour the maintenance of cooperation.

We plot the results of our theoretical model for different levels of plasmid loss (s=0-1). The areas encapsulated by the coloured lines show the regions of parameter space where cooperation is polymorphic at equilibrium (i.e. population comprises some cooperators & some defectors). When plasmid loss is absent (s=0), there is no polymorphism (encapsulated area collapses to nothing), meaning cooperation is only maintained at equilibrium (at fixation) when it is favoured at the chromosomal level RB > CG (to the left of the black dotted line) (R=1/N). When plasmid loss is intermediate (s=0.1,0.2,0.3,0.4), cooperation can be polymorphic at equilibrium (encapsulated areas), with cooperation being disfavoured in the encapsulated areas to the left of the black dotted line, and favoured in the encapsulated areas to the right of the black dotted line, relative to when plasmids are absent (β=0). When plasmid loss is high (s≥0.5), or when transmission (β) is low, plasmids fail to persist at equilibrium, meaning they have no long-term effect on cooperation (encapsulated areas collapse to nothing). Overall, plasmid loss can facilitate cooperation, but only if plasmid loss (s) is intermediate and transmission (β) is high. To generate this plot, we assumed the following parameter values: B = 1.435, CG = 0.1, CC = 0.2 (same as Fig. 4).

The reason for this result is that, in the absence of plasmid loss (s=0), plasmids continue to increase in frequency after invasion, ultimately reaching fixation in the population. This means that, in the long term, there are no plasmid-free individuals left to infect, which means that the overall level of horizontal gene transfer in the population goes to zero. Consequently, competition between plasmids with and without a cooperative gene (cooperators and cheats) becomes analogous to the scenario in which the gene for cooperation is on the chromosome17.
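The argument that transfer stops once plasmids fix can be illustrated with a deliberately stripped-down recursion. This is not the model analysed in the paper: it assumes a single well-mixed population and ignores patch structure, selection, public goods and the cost of plasmid carriage, keeping only the transfer step (probability β per pairing of a plasmid-free cell with a carrier) and the loss step (probability s). The function names and starting frequency are hypothetical.

```python
def plasmid_frequency_step(p, beta, s):
    """One generation of transfer followed by loss (selection ignored).

    p    : frequency of plasmid-bearing cells at the start of the generation
    beta : probability a plasmid-free cell acquires the plasmid from a carrier partner
    s    : probability a plasmid-bearing cell loses its plasmid
    Returns (next frequency, fraction of the population that received a plasmid).
    """
    transfers = beta * p * (1.0 - p)       # plasmid-free cell paired with a carrier
    p_after_transfer = p + transfers
    p_next = p_after_transfer * (1.0 - s)  # segregational loss
    return p_next, transfers


def iterate(p0, beta, s, generations):
    """Apply the step repeatedly; return the final frequency and last transfer rate."""
    p, transfers = p0, 0.0
    for _ in range(generations):
        p, transfers = plasmid_frequency_step(p, beta, s)
    return p, transfers


# No plasmid loss: the plasmid fixes and transfer events vanish.
p_no_loss, transfers_no_loss = iterate(0.01, beta=0.5, s=0.0, generations=200)

# Intermediate loss: the plasmid settles below fixation and transfer continues.
p_some_loss, transfers_some_loss = iterate(0.01, beta=0.5, s=0.1, generations=500)

# High loss: the plasmid cannot persist in this simplified recursion.
p_high_loss, _ = iterate(0.5, beta=0.5, s=0.5, generations=500)

print(p_no_loss, transfers_no_loss)
print(p_some_loss, transfers_some_loss)
print(p_high_loss)
```

In this toy recursion the non-zero equilibrium is p* = 1 - s/(β(1 - s)), a property of the simplified map rather than a result from the paper. It shows the qualitative point made in the text: with s = 0 no plasmid-free cells remain to be infected, whereas with intermediate loss some transfer persists indefinitely.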

Second, when plasmids can be lost (s>0), this can favour cooperation on plasmids, but only in certain areas of parameter space (Figure 5). Plasmid loss means that plasmids do not reach fixation in the population, and so some plasmid transfer still occurs in the evolutionary long term, increasing relatedness at the cooperative plasmid locus. This increased relatedness may favour cooperation on the plasmid, when it would not otherwise be favoured on the chromosome, if plasmids are transferred rapidly (high β) and rates of plasmid loss are intermediate (Figure 5). Specifically, plasmids need to be lost quickly enough that plasmid relatedness appreciably deviates from chromosomal relatedness, but not so quickly that plasmids are not maintained (Figure 5). Another factor that might prevent plasmids from reaching fixation is a constant, high influx of plasmid-free cells (immigration).

Overall, our model suggests that horizontal gene transfer can help cooperation initially invade, but will then often have less influence on whether cooperation is maintained in the long term (Figures 4 & 5). We are not saying that horizontal gene transfer can never favour cooperation, just that there is an appreciable area of parameter space where it does not. Consequently, our model provides an explanation for why cooperative genes are not consistently overrepresented on plasmids (Figures 2 & 3). An analogous theoretical result for the case without plasmid loss (s=0) was also found in a meta-population model by Mc Ginty et al.16. Our predictions are consistent with experiments carried out by Bakkeren et al.30, who found that location on a conjugative plasmid could help a cooperative trait invade in Salmonella Typhimurium (S.Tm), but that this was only stable with strong population bottlenecks (high relatedness). Dimitriu et al.18 found that cooperative plasmids were favoured in structured but not well-mixed populations, and that cooperation was favoured more during ‘epidemic spreads’ into a population.

In addition, we found that, when cooperation is favoured, cooperative traits are not more likely to be favoured on, or transferred to, plasmids. The reason is that, when cooperation is favoured, non-cooperators (cheats) are purged from the population, which means there is no extra fitness benefit of coding for the cooperative trait on a plasmid rather than the chromosome. Consequently, our results suggest that horizontal gene transfer only favours cooperation in a restricted area of parameter space. There could, however, be interesting transient dynamics, with cooperation being favoured temporarily (Figure 4), or cases where cooperation has other consequences, such as increasing plasmid transmission52,53. Another important factor is the rate of horizontal gene transfer. While plasmids clearly transmit fast enough to influence evolution, the transfer rates per cell per generation might not be high enough to significantly influence relatedness at the locus for cooperation (i.e. a high enough β)54.

Alternate hypotheses

Finally, we examined whether alternate hypotheses may better explain the considerable variation in the location of genes coding for extracellular proteins across species. Species which live in more variable environments may be more likely to carry extracellular genes on plasmids. This could be expected for different reasons, including plasmid transfer allowing genes for different environments to be gained and lost (Figure 1b), or plasmids conferring some other advantage not associated with horizontal gene transfer, such as allowing copy number to be conditionally adjusted (Figure 1c)31,32,38,55. There are a number of different ways to classify environmental variability, and so we used three different methods.

Broad host-range pathogens are most likely to carry genes for extracellular proteins on plasmids

We first used the diversity of pathogen hosts as a proxy for environmental variability. Although this does not capture all environmental variability experienced by species in our data set, pathogenicity is a key aspect of bacterial lifestyle that has been suggested to be important for plasmid gene content, such as antibiotic resistance and virulence factors6,40,56,57. We divided species into three categories: pathogens with broad host-range, pathogens with narrow host-range, and non-pathogens. Broad host-range pathogens are expected to encounter more variable environments than narrow host-range pathogens.

We found that pathogens with a broad host-range were more likely to carry genes coding for extracellular proteins on their plasmids, compared with both narrow host-range pathogens and non-pathogens (Figure 6a). Specifically, we compared the difference in the proportion of genes coding for extracellular proteins between plasmid(s) and chromosome(s) across these three categories of species (MCMCglmm; Narrow compared to Broad host-range pathogens: posterior mean = -0.222, 95% CI = -0.322 to -0.123, pMCMC < 0.001; Non-pathogens compared to Broad host-range pathogens: posterior mean = -0.161, 95% CI = -0.252 to -0.067, pMCMC < 0.001; n = 701 genomes; R2 of pathogenicity/host-range = 0.35, R2 of species sample size = 0.28, R2 of phylogeny = 0.11; Table S2, row 23). There was no significant difference between narrow host-range pathogens and non-pathogens in the proportion of genes coding for extracellular proteins on their plasmids compared to chromosome(s) (MCMCglmm; Non-pathogens compared to Narrow host-range pathogens: posterior mean = 0.031, 95% CI = -0.065 to 0.127, pMCMC = 0.482; n = 389; Table S2, row 25). These patterns hold irrespective of whether we included species that we could not reliably classify as either pathogens or non-pathogens, such as opportunistic pathogens, in our analyses (Extended Data Figure 5).

Figure 6. Pathogenicity, host-range and the location of genes coding for extracellular proteins.

We have divided species into either pathogens or non-pathogens, with pathogens further categorised into those with a narrow or broad host-range. The y-axis in (a) shows the difference in the proportion of genes on plasmids and chromosomes coding for extracellular proteins – this is the same as the x-axis in Figure 2. The y-axes in (b)(i) and (b)(ii) show the difference in the proportion of a subset of genes coding for extracellular proteins on plasmids and chromosomes which are predicted by MP3 as either (i) pathogenic or (ii) non-pathogenic. Each dot is the mean for all genomes in a species. Species in blue are those with the relevant subset of extracellular proteins overrepresented on plasmids, while species in red are those with the subset of extracellular proteins overrepresented on chromosomes. (c) Phylogeny based on recently published maximum likelihood tree using 16S ribosomal protein data64. The inner ring indicates whether extracellular proteins were more likely to be coded for on the plasmid(s) or chromosome(s), as in Figure 2. The outer ring indicates how we classified each species’ pathogenicity, and the presence or absence of diagonal lines for pathogens indicates narrow or broad host-range, respectively. Species with a pink or green label in the outer ring are those included in (a) and (b), since for these we could be reasonably confident of whether or not pathogenicity was an important and consistent aspect of their lifestyle. Overall, pathogens with a broad host-range are more likely to have genes coding for extracellular proteins, and particularly those involved in pathogenicity, on their plasmids.

Plasmids of broad host-range pathogens carry many pathogenicity genes

We suspected that the additional extracellular proteins coded for by plasmids of broad host-range species, compared to narrow host-range species, may be particularly involved in facilitating pathogenicity40,56,57. To investigate this, we used the program MP358 to assign each extracellular protein as either ‘pathogenic’ or ‘non-pathogenic’.

We found that plasmids of broad host-range pathogens were particularly enriched with extracellular proteins involved in facilitating pathogenicity, compared to plasmids of narrow host-range species (Figure 6b(i)). Specifically, we found that pathogens with a broad host-range were significantly more likely to code for pathogenic extracellular proteins on their plasmids compared to narrow host-range species (Figure 6b(i)) (MCMCglmm; Narrow compared to Broad host-range pathogens: posterior mean = -0.209, 95% CI = -0.350 to -0.086, pMCMC = 0.012; n=474 genomes; Table S2, row 26). In contrast, the relative location of non-pathogenic extracellular proteins did not vary between broad and narrow host-range pathogens (Figure 6b(ii)) (MCMCglmm; Narrow compared to Broad host-range pathogens: posterior mean = -0.036, 95% CI = -0.115 to 0.040, pMCMC = 0.296; n=474 genomes; Table S2, row 27). Consequently, the excess of genes coding for extracellular proteins on the plasmids of broad host-range species (Figure 6a) appears to arise due to an excess of pathogenicity genes coding for extracellular proteins (Figure 6b).

Most genomic databases are biased towards species that interact with and/or infect humans, so we examined whether human pathogens had driven the above results. In our dataset, 5 out of 10 broad host-range species and 3 out of 5 narrow host-range species can infect humans. We found no significant difference in how likely both pathogenic and non-pathogenic extracellular proteins were to be on plasmids of human pathogens compared to non-human pathogens. We also found that while host-range had a significant effect on how likely plasmids were to code for pathogenic extracellular proteins, whether a species could infect humans had no significant effect (Table S2, rows 28 to 30).

Pathogenic extracellular proteins could be preferentially coded for on plasmids to facilitate their gain and loss (Figure 1b: Gain and loss hypothesis), or because of some other benefit provided by being carried on a plasmid (Figure 1c: Beyond horizontal gene transfer hypothesis). We tested these possibilities by examining whether pathogenic extracellular proteins were more likely to be on plasmids that transfer at higher rates. This would be predicted by the gain and loss hypothesis, but not the beyond horizontal gene transfer hypothesis. We found that plasmids with higher mobility did not code for more pathogenic extracellular proteins. Specifically, across broad host-range pathogen species, the slope of the regression between plasmid mobility and the proportion of genes coding for pathogenic extracellular proteins was not consistently positive (Figure S7) (MCMCglmm; posterior mean = -0.020, 95% CI = -0.224 to 0.185, pMCMC = 0.774; n=7; Table S2, row 31). This lack of a significant relationship was robust to additional forms of analysis, such as considering all pathogenic species, including narrow host-range pathogens and those not carrying plasmids of all three mobility types (Figure S8; Table S2, rows 32 and 33).

Taken together, our results are most consistent with the hypothesis that genes coding for extracellular proteins are overrepresented on plasmids when plasmid carriage provides a benefit other than mobility (Figure 1c). A number of other factors may influence which genes are carried on plasmids, beyond horizontal gene transfer. First, there is evidence that increasing the copy number of plasmids can lead to increasing rates of evolution in the genes they carry59, and it also may act as a mechanism to increase the expression of genes carried on plasmids60,61. For example, increased expression of genes coding for extracellular public goods such as virulence factors could help invasion of a host and utilisation of host resources. This could be particularly beneficial for broad host-range pathogens that frequently invade a variety of different hosts. Copy number of plasmids has also recently been shown to lead to genetic dominance effects55, with likely implications for the phenotypes of genes selected for plasmid carriage55. Second, plasmids compete with their bacterial hosts for resources such as replication machinery and nucleotides62,63. To resolve this competition, plasmids should be under selection to reduce their cost to the host, with a likely impact on their gene content. For example, extracellular proteins are, on average, cheaper to produce than intracellular proteins15,20. Plasmid-host competition could consequently select for plasmids to carry more genes coding for cheaper proteins, and so more extracellular proteins. Our conclusion here should be seen as tentative, as some form of the gain and loss hypothesis (Figure 1b) could still be argued to be consistent with the data, if it is just the potential for horizontal gene transfer that matters, and not the rate.

Number of environments and core vs accessory genes

To further examine a potential association with environmental variability, as could be predicted by both hypotheses b (“Gain and Loss”) and c (“Beyond Horizontal Gene Transfer”), we also looked at two additional measures of environmental variability: (i) the number of five broad environments a species was sequenced in20,65,66; (ii) the proportion of a species’ genomes that is composed of ‘core’ genes, which are those found in all genomes of the species – species which experience more variable environments appear to have relatively smaller core genomes32. We found no significant correlation between either of these measures and the likelihood that genes coding for extracellular proteins were carried on plasmids (Extended Data Figure 6) (Supp. Info. 1; Table S2, rows 35 and 37). Garcia-Garcera & Rocha20 previously analysed a different but related question, examining the type of environment, and also used a MCMCglmm to control for the phylogenetic structure of the data (Supp. Info. 1). Our finding of no correlation between these two measures of environmental variability and whether plasmids code for extracellular proteins is in contrast to our above results with respect to pathogen host-range (Figure 6). This suggests that hypothesis c, which our data is most consistent with, may be important for pathogens in particular, but not necessarily across all bacterial species and lifestyles.

Complementary Analyses

There are a number of directions in which our analyses could be expanded. We focused on plasmids because they have been the focus of previous theoretical and empirical work14,1618. Other mobile genetic elements include bacteriophages and integrative conjugative elements67,68. Comparing core and accessory genes could be a potential way to capture all causes of horizontal gene transfer together15,19. We considered the relative transfer rates among mobility types; quantitative estimates of plasmid transfer rates would be very useful for further examination of plasmid mobility48,54,6971. We followed previous genomic studies by using extracellular proteins as indicators of cooperative traits2,15,19,20. The advantages of this approach are that: (i) we could compare our results with those from previous studies; (ii) secretion systems are highly conserved, allowing us to examine a large number of species, where detailed genetic annotations are lacking; (iii) cooperation mediated by extracellular proteins is usually controlled by only one gene, making them potentially more suitable for plasmid carriage compared to cassettes of multiple genes72,73. However, while extracellular proteins are likely to be cooperative traits, not all cooperative genes code for extracellular proteins (e.g. secondary metabolites such as siderophores), and not all extracellular proteins are involved in cooperation (e.g. those involved in motility such as flagellin). It would be very useful to examine more detailed annotations of social genes, and expand to other mobile genetic elements.

Discussion

We found no support for the hypothesis that horizontal gene transfer generally favours cooperation. Our genomic analyses showed that extracellular proteins are not: (i) overrepresented on plasmids compared to chromosomes (Figure 2); (ii) more likely to be carried by plasmids that transfer at higher rates (Figure 3). These patterns could be explained by our theoretical modelling, which showed that while horizontal gene transfer may help cooperation to initially invade a population, it has less influence on the maintenance of cooperation in the long term (Figures 4 & 5). Once plasmids become common, cheat plasmids that do not code for cooperation are able to outcompete cooperative plasmids, analogous to selection at the level of the chromosome16,30. Our results suggest that horizontal gene transfer on plasmids has not consistently favoured cooperation across bacterial species – but it is still possible that horizontal gene transfer could have an influence in certain scenarios or species. In contrast, we found that genes coding for extracellular proteins involved in pathogenicity and virulence are preferentially located on plasmids in pathogens with a broad host-range (Figure 6). These pathogenic virulence genes were not preferentially located on plasmids that transfer at a higher rate, suggesting that the benefit of being located on a plasmid is something other than horizontal gene transfer, such as the ability to vary copy number.

Methods

Genome Collection

We retrieved 1632 complete genomes comprising 51 bacterial species from GenBank RefSeq (https://www.ncbi.nlm.nih.gov) between February and November 2019. We used species on panX (http://pangenome.tuebingen.mpg.de)74 as a list of potential species for our dataset, since these comprise the most sequenced bacterial species. To allow comparison of chromosome and plasmid genes within the same genome, we only retrieved genomes that contained at least one plasmid sequence. We included in our analysis only species with 10 or more RefSeq genomes that each contained one or more plasmids. We retrieved up to 100 genomes for each species; this was either all complete genomes available for the species, or a random sample where more than 100 were available. Where two or more genomes had the same strain name, we randomly retrieved one genome to reduce the risk of pseudoreplication.
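The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record format, the function name `select_genomes`, the fixed random seed, and the order of operations (collapsing duplicate strain names before applying the 10-genome threshold and the 100-genome cap) are hypothetical choices, not details taken from the pipeline actually used.

```python
import random
from collections import defaultdict


def select_genomes(records, max_per_species=100, min_per_species=10, seed=0):
    """Pick genomes per species from (species, strain, accession) tuples.

    `records` is assumed to list only complete genomes that carry at
    least one plasmid. Names and ordering of steps are illustrative.
    """
    rng = random.Random(seed)

    # Group accessions by (species, strain) so repeated strain names
    # can be collapsed to a single genome.
    by_strain = defaultdict(list)
    for species, strain, accession in records:
        by_strain[(species, strain)].append(accession)

    # Keep one randomly chosen genome per strain name.
    by_species = defaultdict(list)
    for (species, _strain), accessions in sorted(by_strain.items()):
        by_species[species].append(rng.choice(sorted(accessions)))

    selected = {}
    for species, accessions in by_species.items():
        if len(accessions) < min_per_species:
            continue  # too few plasmid-bearing genomes: species excluded
        if len(accessions) > max_per_species:
            accessions = rng.sample(accessions, max_per_species)
        selected[species] = sorted(accessions)
    return selected
```

Fixing the seed makes the random sample reproducible, which is a convenience of this sketch rather than something the text specifies.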

Prediction of Subcellular Location of Proteins

We used PSORTb v.339 to predict the subcellular location of every protein encoded by each genome in our dataset. We used a Docker image of PSORTb developed by the Brinkman Lab, available at: https://github.com/brinkmanlab/psortb_commandline_docker. We chose PSORTb because it is widely regarded as one of the best performing programs of its kind75. It has also been used in previous analyses to identify ‘cooperative’ genes and/or extracellular proteins in bacteria15,20. The program has a number of modules which are trained to recognise particular features of proteins. Results from these modules are combined to give a Final Prediction for each protein. We consulted the literature to confirm the Gram stain of each of our species. For Gram-positive species, PSORTb assigns proteins to one of four locations within the cell: cytoplasmic, cytoplasmic membrane, extracellular or cell wall (Extended Data Figure 1). The locations for Gram-negative species are the same, except that cell wall is replaced with outer membrane and periplasmic, meaning there are five possible locations for proteins of Gram-negative species (Extended Data Figure 1). We used these predicted locations throughout all subsequent analyses in this work. PSORTb could not reliably assign a subcellular location to 27% of proteins we analysed, giving a final prediction of ‘unknown’ (Table S1). Unless explicitly stated, we did not include these unknown proteins in our analyses.

Predicting Plasmid Mobility

We also predicted the mobility of every plasmid in our dataset using the MOB-typer tool of the program MOBsuite46. This searches for features of plasmid sequences including the origin of transfer (oriT), relaxase and mating-pair formation to give each plasmid one of three mobility predictions: (i) conjugative, where plasmids encode all machinery required to transfer via conjugation; (ii) mobilizable, where plasmids do not encode all machinery, but encode oriT and/or relaxase, allowing them to ‘hijack’ another plasmid’s conjugation machinery and mobilize; (iii) non-mobilizable, where plasmids do not encode the genes necessary to be mobilized by themselves or other plasmids, and so cannot transfer via conjugation. 628 of the 4150 plasmids in our dataset were flagged as ‘unverified’ against the MOBsuite dataset, meaning their mobility prediction was unreliable and they were not included. This left 3522 plasmids for subsequent analysis.
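As a rough illustration of the three mobility categories, the rule can be written as a small decision function. This is a simplified reading of the definitions above and not MOB-typer's actual implementation: the function name, the boolean inputs, and the assumption that "all machinery" means a relaxase plus mating-pair formation genes are ours.

```python
def mobility_class(has_relaxase, has_mpf, has_orit):
    """Assign one of three mobility categories to a plasmid.

    Hypothetical simplification: 'conjugative' requires both a relaxase
    and mating-pair formation (MPF) genes; 'mobilizable' requires oriT
    and/or a relaxase without the full machinery.
    """
    if has_relaxase and has_mpf:
        return "conjugative"
    if has_relaxase or has_orit:
        return "mobilizable"
    return "non-mobilizable"


# Plasmids retained after dropping those flagged as 'unverified'.
n_retained = 4150 - 628  # 3522, as stated in the text
```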

Effect of Mobility on Plasmid Extracellular Protein Content

We next examined how plasmid mobility correlates with each plasmid’s extracellular protein proportion. As part of its mobility prediction, MOBsuite46 identifies sequences within each plasmid involved with conjugation. To control for the possibility that conjugative plasmids, by definition of being conjugative, must carry genes controlling this process, we subtracted the total number of these sequences from the total number of proteins when calculating the extracellular proportion of each plasmid. This is a highly conservative control, since it assumes none of the proteins predicted as extracellular are involved in conjugation. We did all analyses on these data with and without removing these mating-pair accessions to ensure any results were not affected by factors unrelated to plasmids’ extracellular protein content.
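The correction described above amounts to shrinking the denominator while leaving the numerator untouched. A minimal sketch, assuming hypothetical per-plasmid counts and a function name of our own choosing:

```python
def extracellular_proportion(n_extracellular, n_proteins, n_conjugation=0):
    """Proportion of a plasmid's proteins predicted as extracellular.

    `n_conjugation` is the number of conjugation-related sequences
    removed from the denominator. The numerator is left unchanged,
    which is what makes the control conservative: it assumes none of
    the extracellular proteins are themselves involved in conjugation.
    """
    denominator = n_proteins - n_conjugation
    if denominator <= 0:
        raise ValueError("no proteins left after removing conjugation sequences")
    return n_extracellular / denominator
```

For a hypothetical plasmid with 100 proteins, 5 of them extracellular and 20 conjugation-related, the proportion rises from 5/100 to 5/80 after the correction, so the correction can only increase the estimate for conjugative plasmids.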

Additionally, we used the plasmid mobility predictions to ask whether differences in the mobility of species’ plasmids correlated with whether genes encoding extracellular proteins are overrepresented on plasmids compared to chromosomes. We calculated the proportion of plasmids in each genome capable of transferring via conjugation (conjugative and mobilizable plasmids), and averaged across all genomes to give a general measure of the mobility of each species’ plasmids.
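The species-level mobility measure is a mean of per-genome proportions, which can be sketched as follows. The input format (one list of mobility labels per genome) and the function name are assumptions made for illustration.

```python
TRANSFERABLE = {"conjugative", "mobilizable"}


def species_mobility(genomes):
    """Mean, across genomes, of the proportion of transferable plasmids.

    `genomes` is a list in which each element lists the mobility labels
    of the plasmids in one genome (every genome has at least one plasmid).
    """
    per_genome = [
        sum(label in TRANSFERABLE for label in plasmids) / len(plasmids)
        for plasmids in genomes
    ]
    return sum(per_genome) / len(per_genome)
```

Averaging per-genome proportions, rather than pooling all plasmids of a species, gives each genome equal weight regardless of how many plasmids it carries.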

Measures of Bacterial Lifestyle and Environmental Variability

We classified a species as pathogenic if it was described in the literature as an obligate or facultative pathogen. Given some bacterial species only rarely act as pathogens, such as opportunistic pathogens, we only included species where we could be sure pathogenicity was a key aspect of their lifestyle and a regular selection pressure acting on their genome content. For this reason, we decided not to include species described as opportunistic pathogens in the literature and those which frequently live as commensals in their hosts. We classified non-pathogens as species which are strictly environmental (never live in hosts) or strictly mutualists and/or commensals (never cause pathogenicity in their hosts). There were 26 species we could not definitively assign to either of these categories. These were not included in our main analyses, although we carried out additional analyses to ensure that removing these species did not bias our results (Extended Data Figure 5).

To estimate the host-range of pathogens, we used information from the literature to determine the maximum taxonomic level of hosts each species is able to invade. We defined narrow host-range species as those which can invade either only one host species, or host species within the same genus or family. In contrast, we defined broad host-range pathogens as those capable of invading host species within the same order, class or phylum. For example, Xanthomonas citri acts as a plant pathogen within the genus Citrus76, while Pseudomonas syringae acts as a plant pathogen across multiple orders of flowering plants77. For more details and references to the literature used for this classification, please see Table S3.
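The host-range rule can be expressed as a lookup on the highest taxonomic rank spanned by a pathogen's hosts. The rank strings and the function name below are hypothetical; the actual classification was made by hand from the literature (Table S3).

```python
NARROW_RANKS = {"species", "genus", "family"}
BROAD_RANKS = {"order", "class", "phylum"}


def host_range(max_host_rank):
    """Classify a pathogen from the highest taxonomic rank its hosts span."""
    rank = max_host_rank.lower()
    if rank in NARROW_RANKS:
        return "narrow"
    if rank in BROAD_RANKS:
        return "broad"
    raise ValueError(f"unrecognised taxonomic rank: {max_host_rank}")
```

Under this rule a pathogen restricted to hosts within a single genus, such as Xanthomonas citri on Citrus, is classed as narrow.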

We completed additional analyses for two other measures of, and proxies for, environmental variability, the details and results of which can be found in Supp. Info. 1. In brief, we used previously published data which classified the habitat diversity of species using 16S rRNA environmental datasets across five broad habitats: water, wastewater, sediment, soil and host65,66. We also supplemented this with information from the literature for species not included in the published data. We used this to ask whether species which lived in multiple habitats had genes encoding extracellular proteins more overrepresented on their plasmids.

We also looked at bacterial pangenomes as a proxy for environmental variability, since it has been noted that species with a high percentage of accessory genes, defined as genes found in only a subset of genomes within a species, are generally those with more variable environments. All pangenome data were collected from panX74 (http://pangenome.tuebingen.mpg.de), since this calculates the pangenome using the same method across all of our species.

Pathogenicity categorisation of extracellular proteins

We used MP358 to examine the pathogenicity of extracellular protein-coding genes in broad host-range and narrow host-range pathogens. MP3 compares protein sequences to a curated dataset of proteins known to be involved in various aspects of pathogenicity: adhesion, invasion, secretion and resistance58. MP3 uses two modules to produce a ‘Hybrid’ prediction for each protein: either ‘Pathogenic’ or ‘Non-Pathogenic’. We used MP3 with default parameters to gain this prediction for every extracellular protein in all genomes of broad and narrow host-range species. MP3 was unable to give a prediction for approximately 9% of extracellular proteins, and so these were not included in this analysis.

For each genome in broad and narrow host-range pathogens, we summed the MP3 predictions to give the total number of ‘Pathogenic’ and ‘Non-Pathogenic’ extracellular proteins on the chromosome and on the plasmid(s). We then calculated the proportions of plasmid and chromosome genes which code for ‘Pathogenic’ and ‘Non-Pathogenic’ extracellular proteins.
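The per-genome tally can be sketched as below. The input format (a list of MP3 labels for the extracellular proteins on one replicon type, plus the total gene count for that replicon type) and both function names are assumptions for illustration; in particular, which genes count towards the denominator is our guess rather than a documented detail.

```python
from collections import Counter

LABELS = ("Pathogenic", "Non-Pathogenic")


def mp3_proportions(labels, n_genes):
    """Proportion of genes coding for each MP3 class of extracellular protein.

    `labels` holds one MP3 prediction per extracellular protein on the
    plasmid(s) or on the chromosome; `n_genes` is the total number of
    genes on that replicon type.
    """
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    counts = Counter(labels)
    return {label: counts[label] / n_genes for label in LABELS}


def plasmid_minus_chromosome(plasmid, chromosome):
    """Difference in proportions; positive values mean plasmid enrichment."""
    return {label: plasmid[label] - chromosome[label] for label in LABELS}
```

A positive difference for ‘Pathogenic’ in a genome corresponds to the blue (plasmid-overrepresented) direction in Figure 6b(i).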

Statistical analyses

MCMCglmm

Many commonly used statistical methods in biology require data points to be independent from one another. However, due to shared ancestry, species cannot be considered as independent data points24. Recently developed statistical methods now allow for phylogenetic relationships to be controlled for within mixed effects models. For all statistical analyses we used the MCMCglmm (Markov Chain Monte Carlo generalised linear mixed effects model) package in R with phylogeny as a random effect41,78. This means the phylogeny is implemented in the model as a covariance matrix of the relationships between species, which is controlled for when considering whether patterns exist across species41,78. We also included sample size as a random effect when analysing at the genome level to control for differences in the number of genomes per species. Specific details of each model can be found in Table S2. We extracted from each model the posterior mean, 95% Credible Intervals (functionally similar to 95% Confidence Intervals), and the pMCMC value (generally interpreted in a similar way to a ‘p-value’). We also calculated R2 values for models of particular interest using methods described in79,80. A detailed description of MCMCglmm can be found elsewhere41,78.

The response variable in all of our analyses is either a proportion or a measure calculated from proportions. Proportion data are bounded between 0 and 1 and have a non-normal distribution. To control for this, all proportion data in our analyses were arcsine square root transformed to improve normality.
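The arcsine square root transformation maps a proportion p to asin(√p), stretching values near 0 and 1. A minimal sketch follows; the function name is ours, and applying the transform to each proportion separately before taking any difference is an assumption, since the text does not specify the order.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square root transform of a proportion in [0, 1].

    Returns a value in radians between 0 and pi/2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))
```

The transform is only defined on [0, 1], so a difference between two proportions (which can be negative) cannot be passed in directly.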

Phylogeny

To control for species relationships, we generated a phylogeny including all 51 species in our dataset (Figure S1). We used a recently published maximum likelihood tree using 16S ribosomal protein data as the basis for our phylogeny64. This tree of life typically had only one representative species per genus. We used the R package ‘ape’ to extract all branches matching species in our dataset81. In cases where the genus representative was different to the species in our dataset, we swapped the tip name with our species, since all members of the same genus are equally related to members of a sister genus. In cases where we had multiple species within a single genus in our dataset, we used the R package ‘phylotools’ to add these species as additional branches into their genus82. We used published phylogenies from the literature to add any within-genus clustering of species’ branches. We used this phylogeny in nexus format for all our MCMCglmm analyses (Fig S1, Table S2). Methods are also available to control for uncertainty in phylogenetic reconstruction83,84, although we have not done this here.

Extended Data

Extended Data Fig. 1. Protein subcellular localisations.

Visualisation of all possible subcellular locations predicted by PSORTb. The left panel shows a cross-section of a typical Gram-negative bacterium and the right panel shows the equivalent for a Gram-positive bacterium. Both kinds of bacteria have an inner membrane, known as the cytoplasmic membrane. The main difference is that Gram-positive bacteria are surrounded by a thick layer of a molecule called peptidoglycan, while Gram-negative bacteria have a much thinner layer of peptidoglycan, and have an additional membrane. Created with BioRender.com.

Extended Data Fig. 2. Substantial variation within and between species in the genomic location of extracellular proteins.

The x-axis is the % of genomes in each species where the proportion of plasmid proteins predicted as extracellular is greater than the proportion of chromosome proteins predicted as extracellular. Crucially, this considers only whether the plasmid proportion is greater than the chromosome proportion for each genome, rather than also considering the magnitude of the difference (Figure 2). Error bars are the 95% Confidence Intervals from a binomial test on each species, comparing the number of genomes which have plasmid proportion > chromosome proportion to a null prediction of 50% of genomes. Species in blue have >50% of genomes where plasmid > chromosome extracellular proportion, meaning extracellular proteins are significantly overrepresented on plasmids. Species in red have <50% of genomes where plasmid > chromosome extracellular proportion, meaning extracellular proteins are significantly overrepresented on chromosomes. Species in grey have a 95% CI which overlaps 50%, so extracellular proteins are not significantly overrepresented on either plasmids or chromosomes in these species.
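The per-species classification in this figure can be illustrated with an exact binomial confidence interval. The sketch below uses the Clopper-Pearson interval, which is an assumption on our part (it is the interval reported by R's `binom.test`, but the caption does not name the method); all function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, target, increasing, iterations=100):
    """Solve func(p) = target for p in [0, 1], with func monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


def classify_species(k, n):
    """k of n genomes have plasmid > chromosome extracellular proportion."""
    lower, upper = clopper_pearson(k, n)
    if lower > 0.5:
        return "plasmid"      # blue: overrepresented on plasmids
    if upper < 0.5:
        return "chromosome"   # red: overrepresented on chromosomes
    return "neither"          # grey: interval overlaps 50%
```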

Extended Data Fig. 3. Difference in plasmid and chromosome proportion for all protein classes predicted by PSORTb.

The x-axis is the difference in plasmid and chromosome extracellular proportions, as in Figure 2. The y-axis is all possible subcellular locations predicted by PSORTb. These protein ‘classes’ are ordered along the y-axis by location within the cell, from intracellular to increasingly extracellular. Each dot is the posterior mean and 95% Credible Intervals from a MCMCglmm42 on the difference in plasmid and chromosome proportion across all species, accounting for phylogeny and sample size. The only proteins significantly overrepresented in either direction are unknown proteins, which make up a higher proportion of plasmid proteins in all species we analysed.

Extended Data Fig. 4. No effect of plasmid mobility on the difference in plasmid and chromosome proportion of genes coding for extracellular proteins.

The x-axis is the % of a species’ plasmids which are conjugative or mobilizable. The y-axis shows the difference in the plasmid and chromosome proportions of genes coding for extracellular proteins, as in Figure 2. Each dot is the mean for all genomes in a species. Species in blue are those with genes coding for extracellular proteins overrepresented on plasmids, while species in red have genes coding for extracellular proteins overrepresented on chromosomes.

Extended Data Fig. 5. No difference in where extracellular proteins are coded for in pathogens compared to non-pathogens.

The y-axis shows the difference in the plasmid and chromosome proportion of genes coding for extracellular proteins. Each dot is the mean for all genomes in a species. Species in blue are those with genes coding for extracellular proteins overrepresented on plasmids, while species in red have genes coding for extracellular proteins overrepresented on chromosomes. Species were categorised as pathogens or non-pathogens; those we could not classify as either are shown in the ‘Opportunistic + others’ category. The black bars indicate the mean for all species in each category.

Extended Data Fig. 6. Additional measures of environmental variability.

We used two additional methods to estimate the environmental variability encountered by these species. (a) The x-axis shows published data on the number of five broad environments each species was recorded in, which we supplemented with information from the literature to include all species. (b) The x-axis shows the proportion of each species’ genes which are ‘core’ genes, meaning they are found in all members of the species. The y-axis in both graphs shows the difference in the proportion of genes on plasmids and chromosomes coding for extracellular proteins. Each dot is the mean for all genomes in a species. Species in blue are those with extracellular proteins overrepresented on plasmids, while species in red are those with extracellular proteins overrepresented on chromosomes. For both these measures, we found no significant correlation with the genomic location of genes coding for extracellular proteins across species.

Supplementary Material

Peer Review File
Supplementary Information

Acknowledgements

We thank: Craig MacLean, Kevin Foster, Laurence Belcher, Chunhui Hao, and especially Eduardo Rocha for their helpful comments; James Robertson for providing plasmid mobility data from the MOBsuite database; the BBSRC (BB/M011224/1: A.E.D.), ERC (SESE: J.L.T., A.S.G., and M.G.; 834164: T.W.S and S.A.W.), and NSERC-CRSNG of Canada (G.W.) for funding. We also thank Alex Washburne and three anonymous reviewers for comments which greatly improved the manuscript. Conceptual figures were created with BioRender.com.

Footnotes

Author Contributions

A.E.D., J.L.T., A.S.G., S.A.W. and M.G. conceived the genomic analyses and interpreted results. A.E.D. and J.L.T. collected and analysed the genomic data, and A.E.D. produced the corresponding statistical analyses and figures. T.W.S, G.W. and S.A.W. conceived the theoretical modelling and interpreted results. T.W.S. completed the formal theoretical modelling. A.E.D., J.L.T, T.W.S., S.A.W., and M.G. wrote and/or edited the manuscript. A.E.D. wrote and put together S1, S2 and S3, and T.W.S. wrote and put together S4. All authors commented on and approved the manuscript for submission.

Competing Interests

The authors declare no competing interests.

Data Availability Statement

The dataset of genomes analysed during this study, including PSORTb results and MOBsuite plasmid mobility predictions, will be made available in the public repository Dryad upon publication at: https://doi.org/10.5061/dryad.gxd2547n4

Code Availability Statement

Code used to solve equations in the theoretical modelling section of the paper can be found at: https://github.com/ThomasWilliamScott/Plasmid_cooperation.git

References

  • 1.Foster KR. In: Social Behaviour. Szekely T, Moore AJ, Komdeur J, editors. Cambridge University Press; 2010. Social behaviour in microorganisms; pp. 331–356. [DOI] [Google Scholar]
  • 2.McNally L, Viana M, Brown SP. Cooperative secretions facilitate host range expansion in bacteria. Nat Commun. 2014;5 doi: 10.1038/ncomms5594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.West SA, Griffin AS, Gardner A, Diggle SP. Social evolution theory for microorganisms. Nat Rev Microbiol. 2006;4:597–607. doi: 10.1038/nrmicro1461. [DOI] [PubMed] [Google Scholar]
  • 4.Simonet C, McNally L. Kin selection explains the evolution of cooperation in the gut microbiota. Proc Natl Acad Sci. 2021;118 doi: 10.1073/pnas.2016046118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Griffin AS, West SA, Buckling A. Cooperation and competition in pathogenic bacteria. Nature. 2004;430:1024–1027. doi: 10.1038/nature02744. [DOI] [PubMed] [Google Scholar]
  • 6.Hale TL. Genetic basis of virulence in Shigella species. Microbiol Rev. 1991;55:206–224. doi: 10.1128/mr.55.2.206-224.1991. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Dinges MM, Orwin PM, Schlievert PM. Exotoxins of Staphylococcus aureus. Clin Microbiol Rev. 2000;13:16–34. doi: 10.1128/cmr.13.1.16-34.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Diggle SP, Griffin AS, Campbell GS, West SA. Cooperation and conflict in quorum-sensing bacterial populations. Nature. 2007;450:411–414. doi: 10.1038/nature06279. [DOI] [PubMed] [Google Scholar]
  • 9.Jones S, et al. The lux autoinducer regulates the production of exoenzyme virulence determinants in Erwinia carotovora and Pseudomonas aeruginosa. EMBO J. 1993;12:2477–2482. doi: 10.1002/j.1460-2075.1993.tb05902.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Sandoz KM, Mitzimberg SM, Schuster M. Social cheating in Pseudomonas aeruginosa quorum sensing. Proc Natl Acad Sci. 2007;104:15876–15881. doi: 10.1073/pnas.0705653104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Ghoul M, Griffin AS, West SA. Toward an evolutionary definition of cheating. Evolution. 2014;68:318–331. doi: 10.1111/evo.12266. [DOI] [PubMed] [Google Scholar]
  • 12.Butaitė E, Baumgartner M, Wyder S, Kümmerli R. Siderophore cheating and cheating resistance shape competition for iron in soil and freshwater Pseudomonas communities. Nat Commun. 2017;8:414. doi: 10.1038/s41467-017-00509-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Thomas CM, Nielsen KM. Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol. 2005;3:711–721. doi: 10.1038/nrmicro1234. [DOI] [PubMed] [Google Scholar]
  • 14.Smith J. The social evolution of bacterial pathogenesis. Proc R Soc Lond B Biol Sci. 2001;268:61–69. doi: 10.1098/rspb.2000.1330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Nogueira T, et al. Horizontal Gene Transfer of the Secretome Drives the Evolution of Bacterial Cooperation and Virulence. Curr Biol. 2009;19:1683–1691. doi: 10.1016/j.cub.2009.08.056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Mc Ginty SE, Rankin DJ, Brown SP. Horizontal gene transfer and the evolution of bacterial cooperation: mobile elements and bacterial cooperation. Evolution. 2011;65:21–32. doi: 10.1111/j.1558-5646.2010.01121.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Mc Ginty SÉ, Lehmann L, Brown SP, Rankin DJ. The interplay between relatedness and horizontal gene transfer drives the evolution of plasmid-carried public goods. Proc R Soc B Biol Sci. 2013;280:20130400. doi: 10.1098/rspb.2013.0400. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Dimitriu T, et al. Genetic information transfer promotes cooperation in bacteria. Proc Natl Acad Sci. 2014;111:11103–11108. doi: 10.1073/pnas.1406840111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Nogueira T, Touchon M, Rocha EPC. Rapid Evolution of the Sequences and Gene Repertoires of Secreted Proteins in Bacteria. PLoS ONE. 2012;7:e49403. doi: 10.1371/journal.pone.0049403. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Garcia-Garcera M, Rocha EPC. Community diversity and habitat structure shape the repertoire of extracellular proteins in bacteria. Nat Commun. 2020;11:758. doi: 10.1038/s41467-020-14572-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Kruskal W. Miracles and Statistics: The Casual Assumption of Independence. J Am Stat Assoc. 1988;83:929–940. [Google Scholar]
  • 22.Ives AR, Zhu J. Statistics for correlated data: phylogenies, space, and time. Ecol Appl Publ Ecol Soc Am. 2006;16:20–32. doi: 10.1890/04-0702. [DOI] [PubMed] [Google Scholar]
  • 23.Felsenstein J. Phylogenies and the Comparative Method. Am Nat. 1985;125:1–15. [Google Scholar]
  • 24.Harvey PH, Pagel MD. The Comparative Method in Evolutionary Biology. Oxford University Press; 1991. [Google Scholar]
  • 25.Grafen A. The phylogenetic regression. Philos Trans R Soc Lond B Biol Sci. 1989;326:119–157. doi: 10.1098/rstb.1989.0106. [DOI] [PubMed] [Google Scholar]
  • 26.Hurlbert SH. Pseudoreplication and the Design of Ecological Field Experiments. Ecol Monogr. 1984;54:187–211. [Google Scholar]
  • 27.Ruxton G, Colegrave N. Experimental Design for the Life Sciences. OUP Oxford; 2011. [Google Scholar]
  • 28.Stone GN, Nee S, Felsenstein J. Controlling for non-independence in comparative analysis of patterns across populations within species. Philos Trans R Soc B Biol Sci. 2011;366:1410–1424. doi: 10.1098/rstb.2010.0311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Ives AR, Midford PE, Garland T., Jr Within-Species Variation and Measurement Error in Phylogenetic Comparative Methods. Syst Biol. 2007;56:252–270. doi: 10.1080/10635150701313830. [DOI] [PubMed] [Google Scholar]
  • 30.Bakkeren E, et al. Cooperative virulence can emerge via horizontal gene transfer but is stabilized by transmission. bioRxiv. 2021:2021.02.11.430745. doi: 10.1101/2021.02.11.430745. [DOI] [Google Scholar]
  • 31.Ghoul M, Andersen SB, West SA. Sociomics: Using Omic Approaches to Understand Social Evolution. Trends Genet. 2017;33:408–419. doi: 10.1016/j.tig.2017.03.009. [DOI] [PubMed] [Google Scholar]
  • 32.McInerney JO, McNally A, O’Connell MJ. Why prokaryotes have pangenomes. Nat Microbiol. 2017;2:17040. doi: 10.1038/nmicrobiol.2017.40. [DOI] [PubMed] [Google Scholar]
  • 33.Niehus R, Mitri S, Fletcher AG, Foster KR. Migration and horizontal gene transfer divide microbial genomes into multiple niches. Nat Commun. 2015;6 doi: 10.1038/ncomms9924. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Cordero OX, et al. Ecological Populations of Bacteria Act as Socially Cohesive Units of Antibiotic Production and Resistance. Science. 2012;337:1228–1231. doi: 10.1126/science.1219385. [DOI] [PubMed] [Google Scholar]
  • 35.Rakoff-Nahoum S, Coyne MJ, Comstock LE. An Ecological Network of Polysaccharide Utilization among Human Intestinal Symbionts. Curr Biol. 2014;24:40–49. doi: 10.1016/j.cub.2013.10.077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Nocelli N, Bogino PC, Banchio E, Giordano W. Roles of Extracellular Polysaccharides and Biofilm Formation in Heavy Metal Resistance of Rhizobia. Materials. 2016;9:418. doi: 10.3390/ma9060418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Ciofu O, Beveridge TJ, Kadurugamuwa J, Walther-Rasmussen J, Høiby N. Chromosomal β-lactamase is packaged into membrane vesicles and secreted from Pseudomonas aeruginosa. J Antimicrob Chemother. 2000;45:9–13. doi: 10.1093/jac/45.1.9. [DOI] [PubMed] [Google Scholar]
  • 38.Rodríguez-Beltrán J, DelaFuente J, León-Sampedro R, MacLean RC, San Millán Á. Beyond horizontal gene transfer: the role of plasmids in bacterial evolution. Nat Rev Microbiol. 2021:1–13. doi: 10.1038/s41579-020-00497-1. [DOI] [PubMed] [Google Scholar]
  • 39.Yu NY, et al. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics. 2010;26:1608–1615. doi: 10.1093/bioinformatics/btq249. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Rankin DJ, Rocha EPC, Brown SP. What traits are carried on mobile genetic elements, and why? Heredity. 2011;106:1–10. doi: 10.1038/hdy.2010.24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Hadfield JD. MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package. J Stat Softw. 2010;33:1–22. [Google Scholar]
  • 42.Clutton-Brock TH, Harvey PH. Primate ecology and social organization. J Zool. 1977;183:1–39. [Google Scholar]
  • 43.Jennions MD, Møller AP. A survey of the statistical power of research in behavioral ecology and animal behavior. Behav Ecol. 2003;14:438–445. [Google Scholar]
  • 44.Crawley MJ. Statistics: An Introduction Using R. John Wiley & Sons; 2014. [Google Scholar]
  • 45.Cohen J. Statistical Power Analysis for the Behavioral Sciences. Routledge; 1988. [Google Scholar]
  • 46.Robertson J, Nash JHE. MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microb Genomics. 2018;4 doi: 10.1099/mgen.0.000206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Robertson J, Bessonov K, Schonfeld J, Nash JHE. Universal whole-sequence-based plasmid typing and its utility to prediction of host range and epidemiological surveillance. Microb Genomics. 2020;6 doi: 10.1099/mgen.0.000435. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EPC, de la Cruz F. Mobility of Plasmids. Microbiol Mol Biol Rev. 2010;74:434–452. doi: 10.1128/MMBR.00020-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Mc Ginty SÉ, Rankin DJ. The evolution of conflict resolution between plasmids and their bacterial hosts. Evolution. 2012;66:1662–1670. doi: 10.1111/j.1558-5646.2011.01549.x. [DOI] [PubMed] [Google Scholar]
  • 50.Hamilton WD. Genetical evolution of social behaviour I & II. J Theor Biol. 1964;7:1–52. doi: 10.1016/0022-5193(64)90038-4. [DOI] [PubMed] [Google Scholar]
  • 51.Hamilton WD. The Evolution of Altruistic Behavior. Am Nat. 1963;97:354–356. [Google Scholar]
  • 52.Ghigo JM. Natural conjugative plasmids induce bacterial biofilm development. Nature. 2001;412:442–445. doi: 10.1038/35086581. [DOI] [PubMed] [Google Scholar]
  • 53.Di Venanzio G, et al. Multidrug-resistant plasmids repress chromosomally encoded T6SS to enable their dissemination. Proc Natl Acad Sci U S A. 2019;116:1378–1383. doi: 10.1073/pnas.1812557116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Sheppard RJ, Beddis AE, Barraclough TG. The role of hosts, plasmids and environment in determining plasmid transfer rates: A meta-analysis. Plasmid. 2020;108:102489. doi: 10.1016/j.plasmid.2020.102489. [DOI] [PubMed] [Google Scholar]
  • 55.Rodríguez-Beltrán J, et al. Genetic dominance governs the evolution and spread of mobile genetic elements in bacteria. Proc Natl Acad Sci. 2020;117:15755–15762. doi: 10.1073/pnas.2001240117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Cornelis GR, et al. The Virulence Plasmid of Yersinia, an Antihost Genome. Microbiol Mol Biol Rev. 1998;62:1315–1352. doi: 10.1128/mmbr.62.4.1315-1352.1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Köstlbacher S, Collingro A, Halter T, Domman D, Horn M. Coevolving Plasmids Drive Gene Flow and Genome Plasticity in Host-Associated Intracellular Bacteria. Curr Biol. 2021;31:346–357.e3. doi: 10.1016/j.cub.2020.10.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Gupta A, Kapil R, Dhakan DB, Sharma VK. MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data. PLOS ONE. 2014;9:e93907. doi: 10.1371/journal.pone.0093907. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.San Millan A, Escudero JA, Gifford DR, Mazel D, MacLean RC. Multicopy plasmids potentiate the evolution of antibiotic resistance in bacteria. Nat Ecol Evol. 2017;1:0010. doi: 10.1038/s41559-016-0010. [DOI] [PubMed] [Google Scholar]
  • 60.Carrier T, Jones KL, Keasling JD. mRNA stability and plasmid copy number effects on gene expression from an inducible promoter system. Biotechnol Bioeng. 1998;59:666–672. doi: 10.1002/(sici)1097-0290(19980920)59:6<666::aid-bit2>3.0.co;2-d. [DOI] [PubMed] [Google Scholar]
  • 61.Rodríguez-Beltrán J, et al. Multicopy plasmids allow bacteria to escape from fitness trade-offs during evolutionary innovation. Nat Ecol Evol. 2018;2:873–881. doi: 10.1038/s41559-018-0529-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Dietel A-K, Kaltenpoth M, Kost C. Convergent Evolution in Intracellular Elements: Plasmids as Model Endosymbionts. Trends Microbiol. 2018;26:755–768. doi: 10.1016/j.tim.2018.03.004. [DOI] [PubMed] [Google Scholar]
  • 63.Rocha EPC, Danchin A. Base composition bias might result from competition for metabolic resources. Trends Genet. 2002;18:291–294. doi: 10.1016/S0168-9525(02)02690-2. [DOI] [PubMed] [Google Scholar]
  • 64.Hug LA, et al. A new view of the tree of life. Nat Microbiol. 2016;1:16048. doi: 10.1038/nmicrobiol.2016.48. [DOI] [PubMed] [Google Scholar]
  • 65.Garcia-Garcera M, Touchon M, Brisse S, Rocha EPC. Metagenomic assessment of the interplay between the environment and the genetic diversification of Acinetobacter. Environ Microbiol. 2017;19:5010–5024. doi: 10.1111/1462-2920.13949. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Kümmerli R, Schiessl KT, Waldvogel T, McNeill K, Ackermann M. Habitat structure and the evolution of diffusible siderophores in bacteria. Ecol Lett. 2014;17:1536–1544. doi: 10.1111/ele.12371. [DOI] [PubMed] [Google Scholar]
  • 67.Canchaya C, Fournous G, Chibani-Chennoufi S, Dillmann ML, Brüssow H. Phage as agents of lateral gene transfer. Curr Opin Microbiol. 2003;6:417–424. doi: 10.1016/s1369-5274(03)00086-9. [DOI] [PubMed] [Google Scholar]
  • 68.Burrus V, Waldor MK. Shaping bacterial genomes with integrative and conjugative elements. Res Microbiol. 2004;155:376–386. doi: 10.1016/j.resmic.2004.01.012. [DOI] [PubMed] [Google Scholar]
  • 69.O’Brien FG, et al. Origin-of-transfer sequences facilitate mobilisation of non-conjugative antimicrobial-resistance plasmids in Staphylococcus aureus. Nucleic Acids Res. 2015;43:7971–7983. doi: 10.1093/nar/gkv755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Rodríguez-Rubio L, et al. Extensive antimicrobial resistance mobilization via multicopy plasmid encapsidation mediated by temperate phages. J Antimicrob Chemother. 2020;75:3173–3180. doi: 10.1093/jac/dkaa311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Ramsay JP, Firth N. Diverse mobilization strategies facilitate transfer of non-conjugative mobile genetic elements. Curr Opin Microbiol. 2017;38:1–9. doi: 10.1016/j.mib.2017.03.003. [DOI] [PubMed] [Google Scholar]
  • 72.Jain R, Rivera MC, Lake JA. Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci U S A. 1999;96:3801–3806. doi: 10.1073/pnas.96.7.3801. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Cohen O, Gophna U, Pupko T. The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. Mol Biol Evol. 2011;28:1481–1489. doi: 10.1093/molbev/msq333. [DOI] [PubMed] [Google Scholar]
  • 74.Ding W, Baumdicker F, Neher RA. panX: pan-genome analysis and exploration. Nucleic Acids Res. 2018;46:e5. doi: 10.1093/nar/gkx977. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Gardy JL, Brinkman FSL. Methods for predicting bacterial protein subcellular localization. Nat Rev Microbiol. 2006;4:741–751. doi: 10.1038/nrmicro1494. [DOI] [PubMed] [Google Scholar]
  • 76.Ference CM, et al. Recent advances in the understanding of Xanthomonas citri ssp. citri pathogenesis and citrus canker disease management. Mol Plant Pathol. 2018;19:1302–1318. doi: 10.1111/mpp.12638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Morris CE, Lamichhane JR, Nikolić I, Stanković S, Moury B. The overlapping continuum of host range among strains in the Pseudomonas syringae complex. Phytopathol Res. 2019;1:4. [Google Scholar]
  • 78.Hadfield JD. MCMCglmm Course Notes. 2019 Available at cran.us.r-project.org/web/packages/MCMCglmm/vignettes/CourseNotes.pdf.
  • 79.Nakagawa S, Schielzeth H. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol Evol. 2013;4:133–142. [Google Scholar]
  • 80.Nakagawa S, Johnson PCD, Schielzeth H. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J R Soc Interface. 2017;11 doi: 10.1098/rsif.2017.0213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinforma Oxf Engl. 2019;35:526–528. doi: 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]
  • 82.Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012;3:217–223. [Google Scholar]
  • 83.Washburne AD, et al. Methods for phylogenetic analysis of microbiome data. Nat Microbiol. 2018;3:652–661. doi: 10.1038/s41564-018-0156-0. [DOI] [PubMed] [Google Scholar]
  • 84.Som A. Causes, consequences and solutions of phylogenetic incongruence. Brief Bioinform. 2015;16:536–548. doi: 10.1093/bib/bbu015. [DOI] [PubMed] [Google Scholar]
