Abstract
Integrons are genetic elements that facilitate gene acquisition. They have been extensively studied in clinical bacteria, but their evolutionary role in phytopathogens remains underexplored. Here, we analysed complete genomes of Xanthomonas species to investigate the origin, distribution and functional dynamics of integrons in this genus. We found that 93% of genomes harboured integrons. The integron-integrase gene intI was predominantly located downstream of ilvD, indicating an ancestral acquisition of integrons, predating diversification within the genus. Phylogenetic analyses support vertical inheritance of intI, with the exception of rare horizontal gene transfer events, notably in Xanthomonas arboricola. Despite their widespread presence, full-length intI genes and active integron platforms are only retained in some species, especially Xanthomonas campestris, which shows high integron gene cassette variability and functional integron activity. In contrast, species such as Xanthomonas cissicola and Xanthomonas phaseoli exhibit widespread intI inactivation, likely occurring early in their divergence, leading to more stable cassette arrays and conserved integron-associated phenotypes. The number and diversity of genes within cassette arrays varied significantly by species and, to a lesser extent, by the ecological context of plant host cultivation. While most cassettes encoded proteins without a known function, those with annotated roles were associated with stress response mechanisms, competitive exclusion and plant-associated functions. Together, our findings demonstrate that integrons in Xanthomonas likely originated from a single ancient acquisition event, preceding genus-wide speciation, and have co-evolved with Xanthomonas pathovars as they adapted to distinct plant hosts.
Keywords: genome evolution, horizontal gene transfer, mobile genetic elements, niche adaptation, plant pathogen
Impact Statement
This study provides the first comprehensive genus-wide analysis of integron evolution and dynamics in Xanthomonas, a globally distributed and versatile plant pathogen. Integrons are genetic platforms that allow horizontal transfer of genes. They are well characterized in human pathogens where they mediate transfer of antibiotic resistance genes, but little is known about their role in plant-associated bacteria. We showed that in Xanthomonas, the integron platform was ancestrally acquired, yet integrons have undergone repeated lineage-specific inactivation events. Despite widespread erosion of integron activity, some species such as Xanthomonas campestris maintain robust and functionally diverse integrons that continue to shape genome plasticity. Notably, high cassette diversity, combined with the presence of rare and often uncharacterized genes within these arrays (some potentially involved in environmental sensing or host interaction), suggests that integrons may serve as reservoirs of adaptive potential. Our findings reshape current views of integron function beyond antibiotic resistance and highlight their long-term role in microbial evolution, niche adaptation and genome innovation in plant-associated bacteria.
Data Summary
The authors confirm that all supporting data, code and protocols have been provided within the article or through supplementary data files. Accession numbers of the genomes analysed in this manuscript are listed in Table S1 (available in the online Supplementary Material).
Introduction
Integrons are genetic elements that facilitate horizontal gene transfer (HGT) in Bacteria [1] and Archaea [2]. They are ancient structures that have been involved in the evolution of bacterial genomes for hundreds of millions of years [3]. Functional integrons are composed of an integron-integrase gene (intI), an integron recombination site (attI) and a promoter that drives the expression of the integrase (Pint) and generally also have a second promoter (Pc), oriented for the expression of gene cassette array proteins [4,6].
Gene cassettes are mobile, non-replicating elements, which generally consist of an ORF and an attC site. They exist as circular molecules when excised from a cassette array. Although cassettes generally consist of a single ORF, cassettes that have two ORFs or ORF-less cassettes with promoter activity have been observed [7,9]. The tyrosine recombinase IntI can capture circular cassettes and integrate them into the genome by mediating the recombination between attC and attI sites, forming arrays of 1 to >200 gene cassettes. As well as mediating the insertion of cassettes, IntI can also excise cassettes from an array. Cassette arrays lacking an integrase are called CALIN (clusters of attC sites lacking integron-integrase) [10] and can be mobilized in trans by a functional IntI.
Technically, integrons themselves are not mobile, whereas their gene cassette cargo is. Integron activity can facilitate horizontal gene cassette transfer between prokaryotes, contributing to the rapid evolution and genetic diversity of Bacteria and Archaea [4]. Because DNA damage and nutrient starvation induce the activity of the integron integrase via bacterial SOS and stringent responses [11,13], acquisition of novel genetic functions is preferentially triggered during periods of environmental challenge. This provides a mechanism for adaptive innovation when survival pressures are highest.
Integrons have been extensively studied because of their role in the accumulation and dissemination of antibiotic resistance genes in clinical isolates [14,15]. However, integrons are also common in other settings such as marine environments [16], soil [17] and the plant rhizoplane [18,19]. In these environments, the majority of gene cassettes remain functionally uncharacterized, but those with predicted functions span extensive functional diversity, including traits involved in microbe–host interactions [4,20,21].
Xanthomonas is a genus of Gram-negative bacteria in the class Gammaproteobacteria that contains pathogenic strains able to cause disease in more than 400 different plants. In common with other genera of plant pathogens, Xanthomonas also contains strains associated with plants that do not cause disease symptoms [22,24]. Xanthomonas is a large genus that encompasses more than 35 species, which are subdivided into pathovars [25].
Integrons have been described in some Xanthomonas isolates [26,28]. Using PCR with site-specific primers annealing to the proximal part of intI and to the attI region, Gillings and colleagues screened 32 Xanthomonas strains representing 12 pathovars and found that all the strains analysed had an integron. In all cases, intI was integrated downstream of the acid dehydratase gene, ilvD; however, the majority of integrase genes were predicted to be inactivated by frameshifts, stop codons or large deletions. Groups of strains with the same deletions or stop codons/frameshifts in intI usually contained identical arrays of genes. There was no evidence of integrons generating diversity within pathovars but strong evidence for diversity between pathovars. With some minor exceptions, individual pathovars had distinct proximal gene cassette arrays, and every cassette identified was found only in one pathovar.
Complete genome sequences for hundreds of Xanthomonas isolates are now available, allowing for a systematic and detailed investigation of integrons in these important plant-associated bacteria. Here, we have examined the presence of integrons in all publicly available complete genomes of Xanthomonas, significantly expanding our knowledge of these elements and showing how integrons have been a dynamic component during Xanthomonas genome evolution.
Methods
Xanthomonas classification and phylogeny
Complete genomes of Xanthomonas were downloaded from the National Center for Biotechnology Information (NCBI) database (NCBI txid: 338) (708 genomes, last accessed September 2024) (Table S1). Taxonomic classifications of the genomes were based on the Genome Taxonomy Database (GTDB) [29] release 2.2.0 using GTDB-Tk v2.4.0 [30] (Table S1). The command classify_wf was used with default settings. GTDB-Tk first aligns 120 single-copy phylogenetic marker genes and then classifies each genome based on its placement into domain-specific reference trees (built from 47,899 prokaryote genomes), its relative evolutionary divergence and average nucleotide identity (ANI) to reference genomes in the GTDB. Two genomes (GCA_041228075.1 and GCA_002240395.1) did not belong to the Xanthomonas genus (Luteibacter and Luteimonas_D, respectively) and were removed from the dataset. The GTDB-Tk classification system does not strictly adhere to the traditional nomenclature of Xanthomonas strains, sometimes reassigning species. We used the GTDB-Tk classification to apply a standardized microbial taxonomy based on genome phylogeny, not to re-classify or rename isolates.
Phylogeny of the Xanthomonas genus was inferred from the alignment of the 120 marker genes produced by the GTDB-Tk command classify_wf, using the GTDB-Tk command infer (default settings with --gamma), which uses FastTree 2.1 [31] to build an approximately-maximum-likelihood phylogenetic tree. The tree was rooted with GCA_041228075.1 (Luteibacter), and the outgroup tip was then removed with the R package ape 5.8-1 [32]. Single-species phylogenies were built using Realphy [33], which maps genomes to a series of reference genomes via bowtie2 2.0.0-beta7 [34]. From these mappings, multiple sequence alignments were constructed and used to infer phylogenetic trees via PhyML 3.1 [34]. Trees were visualized using the ggtree 3.16 [35] and ggtreeExtra 2.3 [36] R packages.
Analysis of integrons
Integrons were identified with IntegronFinder 2.0.5 [37] (parameters: --local-max --gbk --promoter-attI) (Table S1). IntegronFinder 2.0.5 was run with default --calin-threshold (default: 2), and putative CALINs were identified if they carried at least two attC sites. To determine the genomic location of the integrons, identified integron sequences were extracted together with 5 kb of flanking regions upstream and downstream (fasta files deposited at https://github.com/EC-MQ/Xanthomonas_integrons) and annotated with Bakta 1.11.3 [38]. Taxonomic origins of integron gene cassette recombination sites (attCs) were predicted using the script attC-taxa.sh [39,40]. The attC-taxa pipeline (https://github.com/timghaly/attC-taxa) can detect attC sites that are conserved among 1 of the 11 bacterial taxa, comprising Alteromonadales, Methylococcales, Oceanospirillales, Pseudomonadales, Vibrionales, Xanthomonadales, Acidobacteria, Cyanobacteria, Deltaproteobacteria, Planctomycetes and Spirochaetes.
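The element categories used throughout this analysis (complete integron, In0 and CALIN) can be summarized as a small decision rule. The following Python sketch is illustrative only and is not IntegronFinder code; the function name classify_element, its arguments and the returned labels are hypothetical, and the default threshold of two attC sites simply mirrors the --calin-threshold value stated above.

```python
def classify_element(has_inti: bool, n_attc: int, calin_threshold: int = 2) -> str:
    """Assign a locus to one of the integron categories used in the text.

    Hypothetical helper: argument names and labels are assumptions made
    for illustration, not part of any published tool.
    """
    if n_attc < 0:
        raise ValueError("n_attc must be non-negative")
    if has_inti and n_attc > 0:
        return "complete integron"  # intI together with at least one attC site
    if has_inti:
        return "In0"                # integron-integrase lacking cassettes
    if n_attc >= calin_threshold:
        return "CALIN"              # cluster of attC sites lacking intI
    return "none"                   # too few attC sites to call a CALIN


if __name__ == "__main__":
    for locus in [(True, 12), (True, 0), (False, 3), (False, 1)]:
        print(locus, "->", classify_element(*locus))
```

A locus with a single attC site and no intI is left unclassified under this rule, consistent with the two-site minimum applied to putative CALINs.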
IntI sequences identified by IntegronFinder were aligned using PRANK [41] with default settings. The alignment was stripped of gaps with the goalign ‘clean sites’ command [42], and the tree was inferred with FastTree 2.1 [31] using the generalized time-reversible model (--gtr).
We used multiple approaches to annotate the cassette-encoded proteins. Proteins were functionally annotated with eggNOG-mapper v2 [43,44], executed in DIAMOND [45] mode. Additionally, Foldseek [46] was used to create a protein structural database with the ProstT5 protein language model [47]. The database was then used to perform structural alignment against the Swiss-Prot database using a Foldseek search. Foldseek convertalis was used to convert the alignment database into a tab-delimited output, and only functional annotations against the Swiss-Prot database with a coverage of 80% and an e-value <0.001 were considered.
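To illustrate what stripping gaps from an alignment involves, the Python sketch below removes every column in which at least one sequence has a gap. It is a minimal stand-in for the idea, not a reimplementation of goalign; the function name strip_gap_columns and the choice to drop a column on any single gap are assumptions made for this example.

```python
def strip_gap_columns(alignment: dict[str, str], gap_chars: str = "-") -> dict[str, str]:
    """Return a copy of the alignment without any gap-containing column.

    `alignment` maps sequence names to aligned sequences of equal length.
    Hypothetical helper written for illustration only.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the columns where no sequence carries a gap character.
    keep = [i for i in range(length) if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"intI_a": "ATG-CA", "intI_b": "ATGTC-"}
    print(strip_gap_columns(toy))
```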
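The filtering step on structural hits can be sketched as follows. This Python example assumes a tab-delimited table with a header row and the columns query, target, qcov and evalue; the column layout, the expression of coverage as a fraction between 0 and 1, and the reading of the coverage criterion as "at least 80%" are all assumptions made for illustration.

```python
import csv
import io


def filter_hits(tsv_text: str, min_cov: float = 0.80, max_evalue: float = 1e-3):
    """Keep (query, target) pairs passing the coverage and e-value cut-offs.

    Assumed input: tab-delimited text with header columns
    query, target, qcov (fraction 0-1) and evalue.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        # Coverage must reach the minimum; e-value must be strictly below the maximum.
        if float(row["qcov"]) >= min_cov and float(row["evalue"]) < max_evalue:
            kept.append((row["query"], row["target"]))
    return kept


if __name__ == "__main__":
    example = (
        "query\ttarget\tqcov\tevalue\n"
        "cassette_1\tP12345\t0.95\t1e-20\n"
        "cassette_2\tQ67890\t0.50\t1e-30\n"
    )
    print(filter_hits(example))
```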
To identify genes involved in functions of ecological relevance (trace gas oxidation, carbon cycling, nitrogen cycling, sulphur cycling, phosphorus cycling, iron cycling, plant–microbe interactions and osmotic stress tolerance), we used the EcoFoldDB-annotate pipeline [48].
Plant growth-promoting traits were identified using the Plant Growth-Promoting Traits Prediction (PGPT-Pred) tool [49]. SignalP v6.0 [50] was used for signal peptide predictions as a marker for identifying transmembrane or secreted gene products. DefenseFinder [51] was used to detect known anti-phage systems.
Unique gene orthologues encoded by the cassettes and their distributions were identified with Proteinortho 6.0.22 [52] run with the -singles option.
To detect genes encoding type III secretion effectors (T3SEs), we used a database of 66 previously characterized Xanthomonas T3SEs retrieved from the EuroXanth platform [53,54]. T3SEs in the cassettes were identified with Proteinortho v6.0.22 [52] by querying the curated effector database against the amino acid sequences of the cassette-encoded genes.
Phylogenetic signal in the number of attC sites across genomes was first assessed with Pagel’s λ tests using the R package caper 1.0.3 [55]. To evaluate the influence of cultivation type on attC site number while accounting for shared evolutionary history, phylogenetic generalized least squares regression was conducted using the R package nlme 3.1 [56].
HGT events between species
Alfy 1.0.5 [57] was used to guide the detection of highly similar cassettes present in the integron arrays. Alfy was run with the -M option, selecting only matches with a P value <0.05 within a sliding window of 100 bp. Only arrays sharing the closest homology (recombination), with at least 10% of their array sequence, were considered. The output of this was further filtered, using ANI comparisons to assess the likelihood that shared cassettes represented probable HGT events. FastANI 1.33 [58] was run between the arrays and the whole genomes of the strains. We classified an event as HGT when the ANI between arrays exceeded 96%, the ANI between the genomes harbouring them was below 95% and the array ANI was at least 2% higher than the genome ANI. Cytoscape [59] was then used to visualize the numbers of cassettes transferred between Xanthomonas species.
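The three ANI criteria used to call a probable HGT event can be written as a single predicate. The Python sketch below is a minimal illustration of that rule; the function name is_probable_hgt and its parameter names are hypothetical, and the default thresholds are the values stated in the text.

```python
def is_probable_hgt(
    array_ani: float,
    genome_ani: float,
    min_array_ani: float = 96.0,
    max_genome_ani: float = 95.0,
    min_margin: float = 2.0,
) -> bool:
    """Apply the three ANI criteria (all values in percent).

    Hypothetical helper: names are assumptions; thresholds follow the text.
    """
    return (
        array_ani > min_array_ani                   # arrays are highly similar
        and genome_ani < max_genome_ani             # host genomes fall below 95% ANI
        and array_ani - genome_ani >= min_margin    # arrays are clearly more similar than genomes
    )


if __name__ == "__main__":
    print(is_probable_hgt(array_ani=99.0, genome_ani=90.0))
    print(is_probable_hgt(array_ani=96.5, genome_ani=94.9))
```

The margin criterion is not redundant: a pair with an array ANI of 96.5% and a genome ANI of 94.9% passes the first two tests but is rejected because the difference is below 2%.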
Results and discussion
The Xanthomonas integron platform is ancestral
Xanthomonas complete genomes classified based on the GTDB as ‘Xanthomonas’ (n=629) and ‘Xanthomonas_A’ (n=78), known also as Xanthomonas groups 2 and 1, respectively [60], were analysed for integron features (Table S1). We retained the genomes of Xanthomonas_A, even though they are classified as a different genus by GTDB-Tk, because from a plant pathology perspective, they are still considered Xanthomonas and encompass Xanthomonas translucens, an important causal agent of diseases in cereal crops and forage grasses [61].
Using IntegronFinder 2.0.5 [37], we identified integrons, In0 (integron-integrases that lack cassettes) or CALINs in 93% of the genomes (657 genomes) (Fig. 1, Table S1). Xanthomonas albilineans (n=6, group 1) and Xanthomonas fragariae (n=6, group 2) were the only species with more than two genome sequences available that did not harbour either an intI or attC sites. However, integrons have been previously detected in some isolates of X. fragariae via PCR amplification [26].
Fig. 1. Distribution of integron platforms in Xanthomonas groups 1 and 2. The maximum likelihood phylogenetic tree is based on a concatenated alignment of 120 top-ranked marker proteins. The tree was rooted at the midpoint. Tips are labelled with species names assigned with GTDB phylogeny. Coloured clades show species with more than genomes. A) indicates the presence of integrons, classified as present if the isolates carried a functional integron, intI only or an attC array only. B) indicates the presence of a complete integron or attC sites in the ilvD locus. C) indicates the presence of an intI lacking cassettes (In0). D) indicates the presence of CALINs (cluster of attC sites) not in the ilvD locus. E) indicates the number of attC sites (number of cassettes) in the genome. Only bootstrap values ≥0.80 are shown. Scale bar indicates substitutions per site.
Because integrons are widespread in Xanthomonas, our first question was whether the acquisition of the integron module was ancestral. The integron-integrase encoding gene intI was integrated downstream of ilvD in all genomes, with two exceptions: isolates of Xanthomonas oryzae and Xanthomonas cucurbitae. Of the 187 X. oryzae isolates examined, those of X. oryzae pv. oryzicola (26 genomes) harbour a truncated intI (561 bp) that is not integrated downstream of ilvD. However, the integron locus (intI and cassette array) is surrounded by transposases, which could indicate either that the integron was moved via transposition or that its placement results from assembly errors, which are common in regions flanked by insertion sequences. In all seven X. cucurbitae genomes, intI was not integrated downstream of ilvD. However, downstream of ilvD, there are two genes that lack recognizable attCs but are encoded in the integrons of other genomes (including a DUF1488 encoding gene, which occurs in multiple species).
The different genomic locations of intI could be explained by either within-genome transposition or recombination (intI was initially located downstream of ilvD and then moved), or by an independent acquisition of intI from another source, which was integrated in a different locus. To test this, we constructed the phylogeny of all the Xanthomonas intIs (Fig. S1, Table S2). If intIs located elsewhere from ilvD were acquired independently, they would likely have a different evolutionary history and form separate clades in the intI phylogeny.
The intI phylogeny clusters this gene by species, generally independently of the integration locus. X. oryzae intIs cluster together, and the X. cucurbitae intIs form a clade with pv. phaseoli (Fig. S1). The only notable exception is Xanthomonas arboricola, whose strains can carry one of two intI variants. In two strains, intI clusters closest to the Xanthomonas campestris intIs and is truncated and likely not functional. Phylogenetically close X. arboricola strains do not harbour an intI at all. The other X. arboricola strains carry an intI (either full-length or with an early stop codon) that is more distantly related to all other intIs. This more distantly related intI could have been acquired from a source outside Xanthomonas. We compared the ilvD-intI locus sequence in two representative X. arboricola strains harbouring the two intI variants (GCA_018141705.1 and GCA_905367745.1). While the ilvD genes share 97.1% ANI, the intIs share 68.2% ANI. This discrepancy in ANI values between two adjacent genes suggests a recombination event [62], as abrupt shifts in ANI at gene boundaries are a common indicator of a recombination breakpoint.
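The comparison of adjacent genes can be illustrated with a simple per-gene identity calculation. The Python sketch below computes percent identity over the ungapped columns of a pairwise alignment; it is a simplified proxy and not the ANI procedure used in this study, and the function name percent_identity is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Simplified illustration; inputs must be pre-aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Difference between the two identities reported in the text (percentage points).
    print(round(97.1 - 68.2, 1))
```

Applied gene by gene along a locus, a sharp drop in identity between neighbouring genes, such as the 28.9 percentage point difference between ilvD and intI reported here, marks a candidate recombination breakpoint.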
We used two full-length intI variants (one belonging to X. arboricola and one to X. campestris) as a query in a blastn search on complete genomes in the NCBI excluding Xanthomonas (taxid: 338). The two closest intIs belonged to Lysobacter sp. CECT 30171 (locus tag: LYB30171_00805, 93% coverage and 74% identity against X. campestris intI) and [Pseudomonas] boreopolis strain GO2 (locus tag: M3M27_18655, 92% coverage and 91% identity against X. arboricola intI) (both genera in the Xanthomonadaceae family). Curiously, in isolate GO2, intI was also located downstream of ilvD. We constructed a phylogenetic tree using Xanthomonas intI sequences longer than 900 bp and included the intIs of CECT 30171 and GO2, using Vibrio sp. SCSIO 43136 intI as an outgroup (Fig. 2). The intI variants in X. arboricola were more closely related to the intI in GO2 than the other Xanthomonas intIs. ilvD phylogeny instead clusters together all X. arboricola isolates within the Xanthomonas group 2, and GO2 as basal to both Xanthomonas group 1 and 2 isolates (Fig. S2). This suggests that, in those X. arboricola strains, intI was acquired via HGT from another member of the Xanthomonadaceae family.
Fig. 2. Phylogeny of intI. Sequences of intI longer than 900 bp were aligned with PRANK, stripped of gaps and used to infer the tree with FastTree. intI from Vibrio sp. SCSIO 43136 (locus tag: J4N39_08275) was used as an outgroup. intI from group 1 Xanthomonas are indicated on the side of the plot; arrows indicate intIs of Lysobacter sp. CECT 30171 and [Pseudomonas] boreopolis strain GO2. The scale bar indicates substitutions per site. Only bootstrap values ≥0.80 are shown.
The phylogeny of ilvD and intI is incongruent, with the ilvDs of Xanthomonas group 1 forming a separate group in comparison to group 2 (Fig. S2), while the group 1 intIs are nested within the most common Xanthomonas intIs (Fig. 2). Genetic exchange between a restricted number of strains of group 2 and the entire group 1 clade has been reported [60]; therefore, it is plausible that horizontal transfer of an intI gene from a group 2 Xanthomonas species occurred during the early evolutionary history of group 1 Xanthomonas.
Together, this suggests that the acquisition of the integron platform was ancestral to the Xanthomonas genus, and then, intI diversified with the species. The exceptions are some X. arboricola strains that could have recruited intI from another source, while group 1 Xanthomonas could have acquired intI from group 2 Xanthomonas in one single-gene flow event.
Additionally, ~50% of genomes harbour CALINs, here defined as clusters of at least two attC sites, in loci other than the canonical ilvD region and in the absence of a proximal intI gene. In genomes where intI is retained, this gene is consistently integrated downstream of ilvD, while these CALINs show spatial separation, being located in other regions of the genome. Among the species represented by more than ten genomes, CALINs not adjacent to ilvD were identified at seven distinct chromosomal loci. In X. arboricola, X. campestris, X. cissicola, X. euvesicatoria and X. hortorum, CALINs consistently occurred adjacent to lamG and/or secF. In X. cissicola, additional CALINs were located near recD and a gene encoding an NAD(P)/FAD-dependent oxidoreductase. In X. oryzae, two strains possessed CALINs adjacent to an outer membrane protein gene, while in X. translucens, CALINs were observed at two distinct genomic positions. One X. campestris strain (GCA_028749605.1) harboured a CALIN on a conjugative plasmid. The relatively conserved positioning of CALINs across species supports the hypothesis of ancestral integron activity. This contrasts with experimental data from Escherichia coli, where cassette libraries integrated via attG sites (consensus recombination sequences that are not attC or attI) exhibited a broad distribution across numerous genomic loci, indicating that de novo integration is typically less site-specific [63].
Acquisition of the integron platform is ancestral but its activity is progressively lost
To investigate integron integrase activity and the complement of integron gene cassettes in Xanthomonas genomes, we focused on species which had more than ten genomes in our dataset: X. campestris (n=111) (Fig. 3), X. cissicola (n=124) (Fig. 4), X. arboricola (n=41) (Fig. S3), X. translucens (n=46) (Fig. S4), X. euvesicatoria (n=46) (Fig. S5), X. oryzae (n=161) (Fig. S6), X. phaseoli (n=23) (Fig. S7) and X. hortorum (n=12) (Fig. S8). A detailed description of the integrons in these eight species is reported in the Supplementary Results Section.
Fig. 3. Integrons in X. campestris. The phylogeny of X. campestris was built using Realphy, using GCA_000007145.1 as a reference genome. The tree was rooted with GCA_000972745 (X. arboricola); then, the tip was removed from the tree. Tip labels show pathovars, which were assigned from the literature search (Table S3). The scale bar indicates substitutions per site. A) indicates the presence of a complete integron or attC sites in the ilvD locus. B) indicates the presence of an intI lacking cassettes (In0). C) indicates the presence of CALINs (cluster of attC sites) not in the ilvD locus. D) indicates whether IntI is predicted to be functional (full length) (Table S3). E) indicates the number of attC sites (number of cassettes) in the genome. Panel F) represents the distribution of orthologous genes among all cassettes carried by the corresponding isolate inferred with Proteinortho. Orthologous genes are listed in decreasing order based on the number of strains within the species that carry them.
Fig. 4. Integrons in X. cissicola. The phylogeny of X. cissicola was built using Realphy, using GCA_000007165.1 as a reference genome. The tree was rooted with GCA_000009165.1 (X. euvesicatoria); then, the tip was removed from the tree. Tip labels show pathovars, which were assigned from the literature search (Table S3). The scale bar indicates substitutions per site. A) Indicates the presence of a complete integron or attC sites in the ilvD locus. B) indicates the presence of an intI lacking cassettes (In0). C) indicates the presence of CALINs (cluster of attC sites) not in the ilvD locus. D) indicates the number of attC sites (number of cassettes) in the genome (Table S3). Panel E) represents the distribution of orthologous genes among all the cassettes carried by the corresponding isolate inferred with Proteinortho. Orthologous genes are given in decreasing order based on the number of strains within the species that carry them.
Although integron-associated sequences are common in Xanthomonas groups 1 and 2, most strains appear to have lost integron integrase activity (intI is either truncated or absent), and the distribution of functional integrons appears to be species-specific. However, among the species analysed, X. campestris stands out for maintaining a largely functional integron platform (Fig. 3). A high proportion (85.6%) of its genomes carries a full-length intI gene, and integrons are consistently flanked by extensive cassette arrays (ranging from 3 to 29 attCs, averaging 14.3 per genome). Cassette composition is highly variable even among strains of the same pathovar, suggesting that, in X. campestris, integron activity drives rapid diversification within genetically related groups. This robust activity is also evident in the gene content: 216 orthologous cassette-encoded genes were identified, 41.1% of which are singletons (observed in only 1 instance in this dataset). Despite this extensive variability, a gene encoding a DUF1488 domain-containing protein is present in the cassettes of 81.5% of genomes, while symE (a toxin-encoding gene) is found in 56.8% of the cassette arrays. This level of integron retention suggests that active integrons continue to play a key role in the adaptive evolution of X. campestris.
X. arboricola exhibited a modest preservation of the intI gene with 7% of strains retaining a full-length intI. In the remaining genomes, intI is either deleted or contains an early stop codon. The X. arboricola phylogeny divides strains into two clades, clade A which includes pathovars that cause diseases in Prunus, Juglans and Corylus spp. (pvs. pruni, juglandis and corylina, respectively) and clade B which comprises many isolates lacking metadata (Fig. S3). Among clade A isolates, none possess a full-length intI. In contrast, clade B strains that carry intI, either full-length or truncated (accounting for 58.8% of clade B isolates), harbour a variant likely acquired via HGT, suggesting the acquisition event dated prior to the diversification of this clade. Strains of clade B also carried more cassettes and a broader array of genes compared to clade A.
X. translucens (Fig. S4) and X. euvesicatoria (Fig. S5) emphasize the dynamic nature of integrons in Xanthomonas. Both species exhibit high frequencies of full-length intI (22.7% and 45.6%, respectively), and variability in cassette content is high (44.6% and 50%, respectively).
X. oryzae includes pathogens of rice (pv. oryzae and pv. oryzicola) and pathogens of a pervasive weed species that grows along rivers and canals surrounding rice paddies (pv. leersiae). However, only X11-5A, a weakly pathogenic strain [64] distantly related to the above-mentioned pathovars, carries a full-length intI. Divergent cassette compositions were observed among the pathovars (Fig. S6). This supports the hypothesis that the integron in X. oryzae was ancestrally active and was then progressively inactivated as the pathovars diversified.
None of the X. cissicola (Fig. 4) and X. phaseoli (Fig. S7) isolates retain a full-length, functional intI. However, 28.7% and 38% of the genes carried by the cassettes in these species, respectively, appear in one isolate only (singletons). X. cissicola comprises isolates commonly known as X. citri, an important pathogen able to infect many plants including Citrus (pv. citri). X. citri pv. citri includes three recognized pathotypes: A, A* and AwA. All pv. citri genomes analysed here, which include both A and AwA pathotypes, harbour a CALIN element adjacent to secD and lack the integrase gene intI. All the CALINs in the pv. citri contain the same two gene cassettes, which are not present in any other genome analysed in this study, suggesting that the integron platform was inactivated prior to pathotype diversification (Fig. 4). A recent genomic study of 95 pv. citri strains estimated that the diversification of these pathotypes occurred 1,730 to 5,663 years ago [65]. Given that citrus domestication is thought to have occurred at least 2,000 years ago [66], the inactivation event likely predates the domestication of the host genus. One of the two cassettes encodes a protein homologous to members of the late embryogenesis abundant (LEA) protein family, which is known to confer protection against water deficit in bacteria [67]. This indicates that the cassette’s function could be oriented toward general environmental stress rather than mediating specific interactions with the plant host.
Gene cassette abundance is species-specific
We investigated whether variation in integron cassette array size, quantified by the number of attC sites per genome (Table S1), is primarily shaped by phylogeny or by the ecological context of isolation, categorized by cultivation type (crop, non-crop, crop tree, or ornamental host plants) (Table S4). We used only the species that had more than ten genomes in our dataset; additionally, we excluded genomes lacking information on the host of isolation and one strain isolated from mud. We first assessed whether variation in attC site counts exhibited a phylogenetic signal. Pagel’s λ test revealed a very strong and highly significant phylogenetic signal (λ=0.99, P<2.22e-16), indicating that attC abundance is strongly structured by evolutionary history. Given this strong phylogenetic structure, we applied a phylogenetic generalized least squares regression to evaluate the effect of cultivation type while accounting for shared ancestry. Isolates from ornamental plants had significantly more attC sites (P<0.001; mean=12.1), whereas those from crop tree environments had significantly fewer (P≈0.017; mean=3.05). No significant differences were observed for crop (mean=6.01) and non-crop (mean=9.77) isolates (Fig. S9). Overall, these results suggest that while integron array size is strongly shaped by phylogenetic inertia, reflecting higher similarity within rather than between species, ecological factors, such as characteristics of the plant host, may also contribute. The prevalence of a functional integron integrase appears to influence cassette abundance: species with predominantly inactive integrases tend to carry fewer attC sites, likely due to gradual cassette loss. However, the evolutionary forces determining why some lineages retain an active intI while others do not are unclear.
Cassette arrays in Xanthomonas harbour diverse but poorly characterized gene functions
Prediction of the taxonomic origins of integron attCs was performed with a classification tool developed for attCs, which uses both sequence and structural homology information [39], to identify matches to known attC ‘types’. The method classified all attC sites in the Xanthomonas strains analysed as originating within the Xanthomonadales. Therefore, there is no evidence of long-distance acquisition of cassettes, and the cassettes appear to circulate within the order Xanthomonadales.
Arrays greatly varied in length across genomes, with a maximum of 46 cassettes in a single genome (X. arboricola_F GCA_040182365.1) and 33 cassettes in a single array (X. arboricola GCA_041475745.1 and GCA_041475785.1) (Table S1). Despite the majority of the isolates carrying inactive integron modules, with intI deleted or truncated, the cassette arrays appear to have been extremely dynamic before the inactivation of the integron integrase.
Overall, the 706 Xanthomonas genomes carried 1,004 cassette arrays, with a total of 4,773 cassettes. Within these, Proteinortho [52] identified 1,087 different orthologous genes, of which 476 (43.8%) were singletons, unique to one isolate.
Of the 1,087 orthologous genes, only 11.9% could be classified into a known Clusters of Orthologous Groups of proteins (COG) category [68] (Table S5). This underrepresentation of known COGs among cassette proteins is well known and has been reported for integrons from a range of different hosts and environments [19,69,72]. This may be due to sampling bias in databases, with captured cassettes potentially being from uncharacterized environmental organisms [73] or may be a result of cassette genes being subject to high mutation rates after capture.
Of the most represented functions, excluding unknown and recombination and repair, 1.47% of the total cassette-encoded proteins were predicted to play a role in transcription, 0.9% in amino acid transport and metabolism, and 0.55% in defence mechanisms. To gain further insight into possible cassette functions, we performed both sequence-based homology searching against eggNOG 5.0 [44] and protein structural homology searching against the Swiss-Prot database. From this combined approach, putative functions could only be assigned to 6.8% of the proteins. Of these annotated cassette-encoded proteins, 47.7% were classified as transposases, integrases or insertion sequences (4.8% of the total). The presence of insertion sequences and transposases within gene cassettes [70], and targeting attC sites [74,75], has been frequently observed in past integron studies.
Signal peptides were identified in 11.6% of cassette-encoded proteins, using SignalP v6.0 [76]. Again, transmembrane and secreted proteins are commonly encoded by gene cassettes in Bacteria [72,77] and are hypothesized to help facilitate interactions with their broader environment. Additionally, PGPT-Pred [49] predicted 6.7% of cassette-encoded proteins to have plant growth-promoting traits such as biofertilization, plant signal production and stress control. However, the most frequent category, representing 4.9% of total proteins, was associated with competitive exclusion functions. These included antimicrobial resistance and detoxification functions, and toxin–antitoxin systems, which in a plant pathogenic context could contribute to niche adaptation by enhancing survival in a competitive microbial environment. One gene was classified by the EcoFoldDB [48] pipeline as involved in spermidine production, which can play an important role in plant growth and stress [78].
DefenseFinder [51] identified 20 orthologous genes (1.8%) as genes involved in defences against phages. Twelve different defence systems were detected; these included restriction-modification and abortive infection systems, but also the newly described Kiwa [79] and Shedu defence systems [80].
Using a database of previously characterized type three secretion system effectors (T3SEs) retrieved from the EuroXanth platform [53,54], we identified two T3SEs: AvrBs3 [a transcription activator-like effector (TALe)] in African-like pv. oryzae [81] and XopAF2 in three X. vasicola isolates. T3SEs are proteins secreted via the type III secretion system (T3SS) that suppress or induce plant defences. TALes are a particular type of T3SE; once inside the host cell, they translocate to the nucleus where their unique domain of tandemly arranged 34-aa repeats mediates binding to specific promoter elements. Several TALes are known to induce host SWEET sucrose uniporter genes, thereby facilitating sucrose efflux from xylem parenchyma into the apoplasm at the infection sites. TALes may enhance disease by targeting susceptibility genes or may trigger a resistance response, and are therefore important for pathogen host range and virulence [81].
Given that the majority of cassette-encoded proteins are of unknown function and often occur as singletons (observed only in one strain), the phenotypic impact of these cassettes remains unclear. It is uncertain whether they confer a selective advantage or are maintained in the arrays through neutral processes. A cassette that is conserved across multiple species is more likely to encode a beneficial function. The most widespread cassette, identified in 13 species, encodes symE and the non-coding small RNA sRNA-Xcc1. sRNA-Xcc1 is known to be activated by regulatory elements of the T3SS [82], suggesting a potential role in plant host colonization. In E. coli, symE forms part of the SymE-SymR toxin–antitoxin system, where SymE induces nucleoid condensation, disrupts DNA replication and transcription and causes dsDNA breaks [83]. Regulation in E. coli occurs via the cis-encoded small RNA symR [84], whereas in the integron cassettes, sRNA-Xcc1 is located upstream of symE, suggesting a divergent regulatory mechanism. Notably, unlike most integron-associated genes oriented to be transcribed from the Pc promoter (typically located within intI), both sRNA-Xcc1 and symE are oriented in the opposite direction and not under Pc control.
Excluding transposases, the second most recurrent cassette, shared among eight species, encoded a protein of unknown function. The third most recurrent, present in seven species, included two hypothetical proteins and a gene conferring resistance to bleomycin. Bleomycin resistance genes have been previously identified as enriched in gene cassettes relative to total metagenomes in environmental samples, as shown by both cassette-targeted amplicon sequencing and shotgun metagenomics [77]. One of the recurring cassettes is an ‘empty’ cassette, which can occur multiple times in the same array. This cassette could have promoter activity, as previously suggested for gene-less cassettes in other species [7,9].
Detection of inter-species horizontal transfer of cassette arrays
To assess interspecies cassette gene sharing, we examined pairs of species that most frequently share orthologous genes. As expected, species harbouring a greater number of cassettes tend to share more orthologous genes with others. For example, X. arboricola and X. campestris shared the highest number of orthologous genes (21), with X. campestris sharing the most overall (57 orthologous genes across multiple species).
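The two counts reported above (genes shared by a given pair of species, and genes a species shares with any other species) can be sketched as simple set operations. The snippet below is an illustration only; the species-to-orthogroup mapping and the orthogroup identifiers are hypothetical and do not reproduce the Proteinortho output of this study.

```python
from itertools import combinations


def shared_orthologs(species_to_genes):
    """Return {(species_a, species_b): n_shared} for every pair of species."""
    pair_counts = {}
    for a, b in combinations(sorted(species_to_genes), 2):
        pair_counts[(a, b)] = len(species_to_genes[a] & species_to_genes[b])
    return pair_counts


def shared_with_any(species_to_genes, focal):
    """Count orthologous genes of `focal` found in at least one other species."""
    others = set().union(
        *(genes for sp, genes in species_to_genes.items() if sp != focal)
    )
    return len(species_to_genes[focal] & others)


# Hypothetical cassette orthogroups per species.
toy = {
    "X. campestris": {"og1", "og2", "og3", "og4"},
    "X. arboricola": {"og2", "og3", "og5"},
    "X. hortorum": {"og4", "og6"},
}

pairs = shared_orthologs(toy)
top_pair = max(pairs, key=pairs.get)
print("Pair sharing the most orthologous genes:", top_pair, pairs[top_pair])
print("X. campestris genes shared with any species:",
      shared_with_any(toy, "X. campestris"))
```

Note that the per-species total counts each gene once, even if it is shared with several species, so it is not the sum of the pairwise values.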
We then looked for more recent evidence of HGT of cassettes present in multiple species. We used Alfy v1.0.5 [57] to identify putative regions subject to HGT, and then filtered these regions based on the difference in ANI between the cassettes and the genomes harbouring them (Fig. S10). With this approach, we detected the movement of 95 cassettes between 38 pairs of species, even between group 1 and 2 Xanthomonas (Fig. 5). Fig. 5 shows the number of cassettes exchanged between species, without normalization for genome or cassette abundance. While this approach does not account for differences in genome diversity within and between species, or for uneven sampling depth (e.g. pandemic lineages are often oversampled and appear less diverse in comparison to commensal isolates), it nevertheless provides a direct view of observed horizontal transfer events. The highest pairwise flux of cassettes (10) was between X. arboricola and X. euroxanthea and between X. campestris and X. hortorum. Overall, however, X. campestris had the highest total flux of cassettes with other species (36). One of the cassettes that showed evidence of horizontal transfer was that carrying symE and sRNA-Xcc1.
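The filtering and counting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the record fields, the identity values and the 5-percentage-point threshold (`min_delta`) are hypothetical, chosen only to show how a cassette that is much more similar between two species than their genomes are overall is retained as a candidate transfer, and how pairwise and per-species flux are then tallied.

```python
from collections import Counter


def candidate_transfers(records, min_delta=5.0):
    """Keep records whose cassette identity exceeds genome-wide ANI by >= min_delta.

    min_delta is a hypothetical threshold in percentage points.
    """
    return [
        r for r in records
        if r["cassette_identity"] - r["genome_ani"] >= min_delta
    ]


def flux_by_pair(transfers):
    """Count distinct cassettes per unordered species pair."""
    seen = {
        (tuple(sorted((r["species_a"], r["species_b"]))), r["cassette"])
        for r in transfers
    }
    return Counter(pair for pair, _cassette in seen)


def flux_by_species(pair_flux):
    """Sum pairwise flux over all partners of each species."""
    totals = Counter()
    for (a, b), n in pair_flux.items():
        totals[a] += n
        totals[b] += n
    return totals


# Hypothetical records: one row per cassette hit between two genomes.
records = [
    {"cassette": "c1", "species_a": "X. campestris", "species_b": "X. hortorum",
     "cassette_identity": 99.0, "genome_ani": 88.0},
    {"cassette": "c1", "species_a": "X. hortorum", "species_b": "X. campestris",
     "cassette_identity": 98.5, "genome_ani": 88.0},   # same cassette, second genome pair
    {"cassette": "c2", "species_a": "X. campestris", "species_b": "X. arboricola",
     "cassette_identity": 90.0, "genome_ani": 87.5},   # below threshold, discarded
    {"cassette": "c3", "species_a": "X. arboricola", "species_b": "X. euroxanthea",
     "cassette_identity": 99.5, "genome_ani": 93.0},
]

kept = candidate_transfers(records)
pair_flux = flux_by_pair(kept)
species_flux = flux_by_species(pair_flux)
print(pair_flux)
print(species_flux)
```

Separating the pairwise tally from the per-species total mirrors the distinction between the highest flux for a single pair of species and the highest flux summed over all partners of one species.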
Fig. 5. Horizontal transfer of cassettes among Xanthomonas species. Cytoscape network illustrating the transfer of cassettes between species. Nodes represent Xanthomonas species; species containing more than ten genomes in the dataset are coloured as in Fig. 1 (tip labels). Line width between species is proportional to the number of cassettes moved between species.
Conclusions
Our comprehensive analysis of 706 complete genomes reveals that the acquisition of the integron platform is likely an ancestral event in the evolution of the Xanthomonas genus. The widespread conservation of the intI gene downstream of ilvD across species, along with the intI phylogeny, strongly supports the early acquisition of the integron platform. Only in X. arboricola clade B does the ‘original’ intI appear to have been replaced via HGT with an intI from another member of the Xanthomonadaceae family.
Despite the ancestral acquisition, our results show that integron activity, defined by the presence of full-length, potentially functional intI genes, and large and diverse cassette arrays, has been progressively lost in many Xanthomonas lineages. This inactivation appears to be species-specific and often predates major evolutionary or ecological transitions, such as pathovar diversification, as seen in X. cissicola and X. oryzae. Nevertheless, in species such as X. campestris, integrons remain robustly functional and continue to contribute to genomic diversification, as evidenced by the presence of full-length intIs, high cassette variability and the presence of strain-specific gene cassettes in cassette arrays. Selective pressures and ecological niches may favour the retention of an active integron system in this species but not in the others, for reasons that remain unclear. Loss of integron integrase activity in plant-associated pathovars may be favoured when it stabilizes cassette-associated genetic traits that confer a fitness advantage. Once a beneficial cassette configuration is established, inactivation of the integrase can prevent rearrangements or excisions that would disrupt key functions, enabling these lineages to undergo clonal expansion. Integron integrase activity may also be lost in the absence of positive selection to maintain it. Integrons provide the greatest adaptive advantage in dynamic environments with a diverse pool of accessible gene cassettes. Such diversity is potentially diminished in intensive agricultural systems, where pathogens encounter repetitive host-pathogen cycles and microbial communities are less diverse than in wild plant environments [85]. Under such conditions, the selective pressure to maintain integrase activity might be reduced.
In agricultural settings, other types of mobile genetic elements carrying genes already optimized by selection in donor strains may play a more prominent role in driving plant–bacteria evolution [86,90].
The phenotypic impact of the large pool of gene cassettes residing in Xanthomonas genomes remains largely unresolved. Nearly half of the orthologous genes carried by gene cassettes occurred only once, contributing to strain-level genetic variability. These rare genes may represent a reservoir of adaptive potential but could also be selectively neutral or even transient. Cassettes in Vibrio chromosomal integron platforms have been suggested to have been selected to be as neutral as possible [7]. The frequent occurrence of hypothetical proteins, including highly conserved cassette ORFs, suggests important functional roles that remain undiscovered. For instance, a cassette carrying the DUF1488-encoding gene is present in 16.7% of the genomes across six species, sometimes with multiple copies within the same integron array. Among cassette-encoded genes with putative functions, many are linked to environmental interaction and thus have the potential to contribute to niche adaptation. For example, the most widespread cassette in our dataset encodes symE and the small non-coding RNA sRNA-Xcc1, known to be activated by T3SS regulatory proteins [82], which may influence bacterial growth during plant colonization. In Xanthomonas, we found only a small fraction of cassette genes to be classified as involved in defence mechanisms, despite the emerging role of integron platforms in encoding anti-phage genes [91,93]. However, this may be an underestimate, as work on large sedentary chromosomal integrons in Vibrio cholerae showed that 20% of their cassettes with no predicted function encode anti-phage defence systems, despite these genes not being detected as such by bioinformatic tools [91].
Importantly, our analysis revealed evidence of interspecies horizontal transfer of cassettes. For both X. campestris and X. arboricola, there is evidence of recent horizontal transfer of integron-associated cassettes with other Xanthomonas species. Interestingly, however, despite their high levels of cassette acquisition, there is little evidence of recent and direct cassette exchange between these two species. Instead, X. campestris shows the highest degree of gene sharing with X. hortorum, a species more closely related to X. arboricola than to X. campestris. Conversely, the species with which X. arboricola shares the greatest number of cassettes is X. euroxanthea, its closest phylogenetic neighbour among the genomes analysed. These patterns suggest that both phylogenetic proximity and ecological overlap, such as co-occurrence in the phyllosphere or endosphere during mixed infections, have contributed to the dynamics of cassette exchange. The shared habitat provides repeated opportunities for interspecies contact and genetic exchange, likely promoting the horizontal dissemination of integron cassettes across the genus.
In summary, our analysis indicates that integron platforms were likely acquired early in the evolution of the Xanthomonas genus and have since followed divergent evolutionary trajectories. While many lineages show signs of integron inactivation, others, such as X. campestris, retain active systems contributing to genomic diversification. The gene cassettes carried by these integrons, including many with unknown or potentially adaptive functions, add to strain-level variability and may influence niche adaptation. Patterns of HGT suggest that both phylogenetic relatedness and ecological overlap shape cassette exchange across species, highlighting integrons as dynamic elements in the evolutionary landscape of Xanthomonas.
Supplementary material
Abbreviations
- ANI: average nucleotide identity
- CALIN: clusters of attC sites lacking integron-integrase
- COG: Clusters of Orthologous Groups of proteins
- GTDB: Genome Taxonomy Database
- HGT: horizontal gene transfer
- NCBI: National Center for Biotechnology Information
- PGPT-Pred: Plant Growth-Promoting Traits Prediction
- TALe: transcription activator-like effector
- T3SE: type III secretion effector
- T3SS: type III secretion system
Footnotes
Funding: This work is supported by funding from the ARC Centre of Excellence in Synthetic Biology (CE200100029).
Contributor Information
Elena Colombi, Email: elena.colombi@mq.edu.au.
Timothy M. Ghaly, Email: timothy.ghaly@mq.edu.au.
Vaheesan Rajabal, Email: vaheesan.rajabal@mq.edu.au.
Liam D.H. Elbourne, Email: liam.elbourne@mq.edu.au.
Michael Gillings, Email: michael.gillings@mq.edu.au.
Sasha Tetu, Email: sasha.tetu@mq.edu.au.
References
- 1. Stokes HW, Hall RM. A novel family of potentially mobile DNA elements encoding site‐specific gene‐integration functions: integrons. Mol Microbiol. 1989;3:1669–1683. doi: 10.1111/j.1365-2958.1989.tb00153.x.
- 2. Ghaly TM, Tetu SG, Penesyan A, Qi Q, Rajabal V, et al. Discovery of integrons in Archaea: Platforms for cross-domain gene transfer. Sci Adv. 2022;8:eabq6376. doi: 10.1126/sciadv.abq6376.
- 3. Rowe-Magnus DA, Mazel D. Integrons: natural tools for bacterial genome evolution. Curr Opin Microbiol. 2001;4:565–569. doi: 10.1016/s1369-5274(00)00252-6.
- 4. Escudero JA, Loot C, Mazel D. Integrons as adaptive devices. In: Molecular Mechanisms of Microbial Evolution. 2018. pp. 199–239.
- 5. Hall RM, Collis CM. Mobile gene cassettes and integrons: capture and spread of genes by site-specific recombination. Mol Microbiol. 1995;15:593–600. doi: 10.1111/j.1365-2958.1995.tb02368.x.
- 6. Mazel D, Dychinco B, Webb VA, Davies J. A distinctive class of integron in the Vibrio cholerae genome. Science. 1998;280:605–608. doi: 10.1126/science.280.5363.605.
- 7. Blanco P, Hipólito A, García-Pastor L, Trigo da Roza F, Toribio-Celestino L, et al. Identification of promoter activity in gene-less cassettes from Vibrionaceae superintegrons. Nucleic Acids Res. 2024;52:2961–2976. doi: 10.1093/nar/gkad1252.
- 8. Michael CA, Labbate M. Gene cassette transcription in a large integron-associated array. BMC Genet. 2010;11:1–13. doi: 10.1186/1471-2156-11-82.
- 9. Tansirichaiya S, Mullany P, Roberts AP. Promoter activity of ORF-less gene cassettes isolated from the oral metagenome. Sci Rep. 2019;9:8388. doi: 10.1038/s41598-019-44640-2.
- 10. Cury J, Jové T, Touchon M, Néron B, Rocha EP. Identification and analysis of integrons and cassette arrays in bacterial genomes. Nucleic Acids Res. 2016;44:4539–4550. doi: 10.1093/nar/gkw319.
- 11. Cambray G, Sanchez-Alberola N, Campoy S, Guerin É, Da Re S, et al. Prevalence of SOS-mediated control of integron integrase expression as an adaptive trait of chromosomal and mobile integrons. Mob DNA. 2011;2:6. doi: 10.1186/1759-8753-2-6.
- 12. Guerin E, Cambray G, Sanchez-Alberola N, Campoy S, Erill I, et al. The SOS response controls integron recombination. Science. 2009;324:1034. doi: 10.1126/science.1172914.
- 13. Strugeon E, Tilloy V, Ploy M-C, Da Re S. The stringent response promotes antibiotic resistance dissemination by regulating integron integrase expression in biofilms. mBio. 2016;7:10. doi: 10.1128/mBio.00868-16.
- 14. Gillings M, Boucher Y, Labbate M, Holmes A, Krishnan S, et al. The evolution of class 1 integrons and the rise of antibiotic resistance. J Bacteriol. 2008;190:5095–5100. doi: 10.1128/JB.00152-08.
- 15. Hipólito A, García-Pastor L, Vergara E, Jové T, Escudero JA. Profile and resistance levels of 136 integron resistance genes. NPJ Antimicrob Resist. 2023;1:13. doi: 10.1038/s44259-023-00014-3.
- 16. Buongermino Pereira M, Österlund T, Eriksson KM, Backhaus T, Axelson-Fisk M, et al. A comprehensive survey of integron-associated genes present in metagenomes. BMC Genomics. 2020;21:1–14. doi: 10.1186/s12864-020-06830-5.
- 17. Ghaly TM, Geoghegan JL, Alroy J, Gillings MR. High diversity and rapid spatial turnover of integron gene cassettes in soil. Environ Microbiol. 2019;21:1567–1574. doi: 10.1111/1462-2920.14551.
- 18. Qi Q, Ghaly TM, Rajabal V, Russell DH, Gillings MR, et al. Vegetable phylloplane microbiomes harbour class 1 integrons in novel bacterial hosts and drive the spread of chlorite resistance. Sci Total Environ. 2024;954:176348. doi: 10.1016/j.scitotenv.2024.176348.
- 19. Rajabal V, Ghaly TM, Egidi E, Ke M, Penesyan A, et al. Exploring the role of mobile genetic elements in shaping plant–bacterial interactions for sustainable agriculture and ecosystem health. Plants People Planet. 2024;6:408–420. doi: 10.1002/ppp3.10448.
- 20. Ghaly TM, Geoghegan JL, Tetu SG, Gillings MR. The peril and promise of integrons: beyond antibiotic resistance. Trends Microbiol. 2020;28:455–464. doi: 10.1016/j.tim.2019.12.002.
- 21. Gillings MR. Integrons: past, present, and future. Microbiol Mol Biol Rev. 2014;78:257–277. doi: 10.1128/MMBR.00056-13.
- 22. Essakhi S, Cesbron S, Fischer-Le Saux M, Bonneau S, Jacques M-A, et al. Phylogenetic and variable-number tandem-repeat analyses identify nonpathogenic Xanthomonas arboricola lineages lacking the canonical type III secretion system. Appl Environ Microbiol. 2015;81:5395–5410. doi: 10.1128/AEM.00835-15.
- 23. Garita-Cambronero J, Palacio-Bielsa A, López MM, Cubero J. Pan-genomic analysis permits differentiation of virulent and non-virulent strains of Xanthomonas arboricola that cohabit Prunus spp. and elucidate bacterial virulence factors. Front Microbiol. 2017;8:573. doi: 10.3389/fmicb.2017.00573.
- 24. Vauterin L, Yang P, Alvarez A, Takikawa Y, Roth DA, et al. Identification of non-pathogenic Xanthomonas strains associated with plants. Syst Appl Microbiol. 1996;19:96–105. doi: 10.1016/S0723-2020(96)80016-6.
- 25. Timilsina S, Potnis N, Newberry EA, Liyanapathiranage P, Iruegas-Bocardo F, et al. Xanthomonas diversity, virulence and plant-pathogen interactions. Nat Rev Microbiol. 2020;18:415–427. doi: 10.1038/s41579-020-0361-8.
- 26. Barionovi D, Scortichini M. Assessment of integron gene cassette arrays in strains of Xanthomonas fragariae and X. arboricola pvs. fragariae and pruni. J Plant Pathol. 2006:279–284.
- 27. Barionovi D, Scortichini M. Integron variability in Xanthomonas arboricola pv. juglandis and Xanthomonas arboricola pv. pruni strains. FEMS Microbiol Lett. 2008;288:19–24. doi: 10.1111/j.1574-6968.2008.01315.x.
- 28. Gillings MR, Holley MP, Stokes HW, Holmes AJ. Integrons in Xanthomonas: A source of species genome diversity. Proc Natl Acad Sci USA. 2005;102:4419–4424. doi: 10.1073/pnas.0406620102.
- 29. Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 2022;50:D785–D794. doi: 10.1093/nar/gkab776.
- 30. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics. 2022;38:5315–5316. doi: 10.1093/bioinformatics/btac672.
- 31. Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490. doi: 10.1371/journal.pone.0009490.
- 32. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–528. doi: 10.1093/bioinformatics/bty633.
- 33. Bertels F, Silander OK, Pachkov M, Rainey PB, van Nimwegen E. Automated reconstruction of whole-genome phylogenies from short-sequence reads. Mol Biol Evol. 2014;31:1077–1088. doi: 10.1093/molbev/msu088.
- 34. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 35. Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017;8:28–36. doi: 10.1111/2041-210X.12628.
- 36. Xu S, Dai Z, Guo P, Fu X, Liu S, et al. ggtreeExtra: compact visualization of richly annotated phylogenetic data. Mol Biol Evol. 2021;38:4039–4042. doi: 10.1093/molbev/msab166.
- 37. Néron B, Littner E, Haudiquet M, Perrin A, Cury J, et al. IntegronFinder 2.0: identification and analysis of integrons across bacteria, with a focus on antibiotic resistance in Klebsiella. Microorganisms. 2022;10:700. doi: 10.3390/microorganisms10040700.
- 38. Schwengers O, Jelonek L, Dieckmann MA, Beyvers S, Blom J, et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb Genom. 2021;7:000685. doi: 10.1099/mgen.0.000685.
- 39. Ghaly TM, Tetu SG, Gillings MR. Predicting the taxonomic and environmental sources of integron gene cassettes using structural and sequence homology of attC sites. Commun Biol. 2021;4:946. doi: 10.1038/s42003-021-02489-0.
- 40. Ghaly TM, Penesyan A, Pritchard A, Qi Q, Rajabal V, et al. Methods for the targeted sequencing and analysis of integrons and their gene cassettes from complex microbial communities. Microb Genom. 2022;8:000788. doi: 10.1099/mgen.0.000788.
- 41. Löytynoja A. Phylogeny-aware alignment with PRANK. In: Multiple Sequence Alignment Methods. Springer; 2013. pp. 155–170.
- 42. Lemoine F, Gascuel O. Gotree/Goalign: toolkit and Go API to facilitate the development of phylogenetic workflows. NAR Genom Bioinform. 2021;3:lqab075. doi: 10.1093/nargab/lqab075.
- 43. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021;38:5825–5829. doi: 10.1093/molbev/msab293.
- 44. Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47:D309–D314. doi: 10.1093/nar/gky1085.
- 45. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12:59–60. doi: 10.1038/nmeth.3176.
- 46. van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, et al. Fast and accurate protein structure search with Foldseek. Nat Biotechnol. 2024;42:243–246. doi: 10.1038/s41587-023-01773-0.
- 47. Heinzinger M, Weissenow K, Sanchez JG, Henkel A, Mirdita M, et al. Bilingual language model for protein sequence and structure. NAR Genom Bioinform. 2024;6:lqae150. doi: 10.1093/nargab/lqae150.
- 48. Ghaly TM, Rajabal V, Russell D, Colombi E, Tetu SG. EcoFoldDB: protein structure-guided functional profiling of ecologically relevant microbial traits at the metagenome scale. bioRxiv. 2025:2025.04.02.646905. doi: 10.1101/2025.04.02.646905.
- 49. Ashrafi S, Kuzmanović N, Patz S, Lohwasser U, Bunk B, et al. Two new Rhizobiales species isolated from root nodules of common sainfoin (Onobrychis viciifolia) show different plant colonization strategies. Microbiol Spectr. 2022;10:e0109922. doi: 10.1128/spectrum.01099-22.
- 50. Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019;37:420–423. doi: 10.1038/s41587-019-0036-z.
- 51.Tesson F, Hervé A, Mordret E, Touchon M, d’Humières C, et al. Systematic and quantitative view of the antiviral arsenal of prokaryotes. Nat Commun. 2022;13:2561. doi: 10.1038/s41467-022-30269-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Klemm P, Stadler PF, Lechner M. Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs. Front Bioinform . 2023;3:1322477. doi: 10.3389/fbinf.2023.1322477. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Biessy A, Cadieux M, Ciotola M, McDuff F, Soufiane B, et al. Characterization of a plant‐pathogenic T3SS‐lacking Xanthomonas strain isolated from common ragweed. Plant Pathol. 2025;74:308–319. doi: 10.1111/ppa.14020. [DOI] [Google Scholar]
- 54.Costa J, Pothier JF, Bosis E, Boch J, Kölliker R, et al. A community-curated DokuWiki resource on diagnostics, diversity, pathogenicity, and genetic control of Xanthomonads. Mol Plant Microbe Interact. 2024;37:347–353. doi: 10.1094/MPMI-11-23-0184-FI. [DOI] [PubMed] [Google Scholar]
- 55.Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, et al. The caper package: comparative analysis of phylogenetics and evolution in R. R Package Version. 2013;5:1–36. [Google Scholar]
- 56.Pinheiro J, Bates D, DebRoy S, Sarkar D, Heisterkamp S, et al. Package ‘nlme: linear and nonlinear mixed effects models. R Core Team. 2017;version 3:274. [Google Scholar]
- 57.Domazet-Lošo M, Haubold B. Alignment-free detection of local similarity among viral and bacterial genomes. Bioinformatics. 2011;27:1466–1472. doi: 10.1093/bioinformatics/btr176. [DOI] [PubMed] [Google Scholar]
- 58.Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9:5114. doi: 10.1038/s41467-018-07641-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
- 60. Pena MM, Bhandari R, Bowers RM, Weis K, Newberry E, et al. Genetic and functional diversity help explain pathogenic, weakly pathogenic, and commensal lifestyles in the genus Xanthomonas. Genome Biol Evol. 2024;16:evae074. doi: 10.1093/gbe/evae074.
- 61. Sapkota S, Mergoum M, Liu Z. The translucens group of Xanthomonas translucens: complicated and important pathogens causing bacterial leaf streak on cereals. Mol Plant Pathol. 2020;21:291–302. doi: 10.1111/mpp.12909.
- 62. Ansari MA, Didelot X. Inference of the properties of the recombination process from whole bacterial genomes. Genetics. 2014;196:253–265. doi: 10.1534/genetics.113.157172.
- 63. Loot C, Millot GA, Richard E, Littner E, Vit C, et al. Integron cassettes integrate into bacterial genomes via widespread non-classical attG sites. Nat Microbiol. 2024;9:228–240. doi: 10.1038/s41564-023-01548-y.
- 64. Triplett LR, Hamilton JP, Buell CR, Tisserat NA, Verdier V, et al. Genomic analysis of Xanthomonas oryzae isolates from rice grown in the United States reveals substantial divergence from known X. oryzae pathovars. Appl Environ Microbiol. 2011;77:3930–3937. doi: 10.1128/AEM.00028-11.
- 65. Patané JSL, Martins J, Jr, Rangel LT, Belasque J, Digiampietri LA, et al. Origin and diversification of Xanthomonas citri subsp. citri pathotypes revealed by inclusive phylogenomic, dating, and biogeographic analyses. BMC Genomics. 2019;20:700. doi: 10.1186/s12864-019-6007-4.
- 66. Rao MJ, Zuo H, Xu Q. Genomic insights into citrus domestication and its important agronomic traits. Plant Commun. 2021;2:100138. doi: 10.1016/j.xplc.2020.100138.
- 67. Raga-Carbajal E, Espin G, Ayala M, Rodríguez-Salazar J, Pardo-López L. Evaluation of a bacterial group 1 LEA protein as an enzyme protectant from stress-induced inactivation. Appl Microbiol Biotechnol. 2022;106:5551–5562. doi: 10.1007/s00253-022-12080-0.
- 68. Galperin MY, Vera Alvarez R, Karamycheva S, Makarova KS, Wolf YI, et al. COG database update 2024. Nucleic Acids Res. 2025;53:D356–D363. doi: 10.1093/nar/gkae983.
- 69. Boucher Y, Labbate M, Koenig JE, Stokes HW. Integrons: mobilizable platforms that promote genetic diversity in bacteria. Trends Microbiol. 2007;15:301–309. doi: 10.1016/j.tim.2007.05.004.
- 70. Ghaly TM, Gillings MR, Rajabal V, Paulsen IT, Tetu SG. Horizontal gene transfer in plant microbiomes: integrons as hotspots for cross-species gene exchange. Front Microbiol. 2024;15:1338026. doi: 10.3389/fmicb.2024.1338026.
- 71. Mazel D. Integrons: agents of bacterial evolution. Nat Rev Microbiol. 2006;4:608–620. doi: 10.1038/nrmicro1462.
- 72. Rowe-Magnus DA, Guerout A-M, Biskri L, Bouige P, Mazel D. Comparative analysis of superintegrons: engineering extensive genetic diversity in the Vibrionaceae. Genome Res. 2003;13:428–442. doi: 10.1101/gr.617103.
- 73. Rodríguez Del Río Á, Giner-Lamia J, Cantalapiedra CP, Botas J, Deng Z, et al. Functional and evolutionary significance of unknown genes from uncultivated taxa. Nature. 2024;626:377–384. doi: 10.1038/s41586-023-06955-z.
- 74. Post V, Hall RM. Insertion sequences in the IS1111 family that target the attC recombination sites of integron-associated gene cassettes. FEMS Microbiol Lett. 2009;290:182–187. doi: 10.1111/j.1574-6968.2008.01412.x.
- 75. Tetu SG, Holmes AJ. A family of insertion sequences that impacts integrons by specific targeting of gene cassette recombination sites, the IS1111-attC Group. J Bacteriol. 2008;190:4959–4970. doi: 10.1128/JB.00229-08.
- 76. Teufel F, Almagro Armenteros JJ, Johansen AR, Gíslason MH, Pihl SI, et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol. 2022;40:1023–1025. doi: 10.1038/s41587-021-01156-3.
- 77. Ghaly TM, Rajabal V, Penesyan A, Coleman NV, Paulsen IT, et al. Functional enrichment of integrons: facilitators of antimicrobial resistance and niche adaptation. iScience. 2023;26:108301. doi: 10.1016/j.isci.2023.108301.
- 78. Dunn MF, Becerra-Rivera VA. The biosynthesis and functions of polyamines in the interaction of plant growth-promoting rhizobacteria with plants. Plants. 2023;12:2671. doi: 10.3390/plants12142671.
- 79. Zhang Z, Todeschini TC, Wu Y, Kogay R, Naji A, et al. Kiwa is a bacterial membrane-embedded defence supercomplex activated by phage-induced membrane changes. bioRxiv. 2023:2023.02.26.530102. doi: 10.1101/2023.02.26.530102.
- 80. Loeff L, Walter A, Rosalen GT, Jinek M. DNA end sensing and cleavage by the Shedu anti-phage defense system. Cell. 2025;188:721–733. doi: 10.1016/j.cell.2024.11.030.
- 81. Schepler-Luu V, Sciallano C, Stiebner M, Ji C, Boulard G, et al. Genome editing of an African elite rice variety confers resistance against endemic and emerging Xanthomonas oryzae pv. oryzae strains. Elife. 2023;12:e84864. doi: 10.7554/eLife.84864.
- 82. Chen X-L, Tang D-J, Jiang R-P, He Y-Q, Jiang B-L, et al. sRNA-Xcc1, an integron-encoded transposon- and plasmid-transferred trans-acting sRNA, is under the positive control of the key virulence regulators HrpG and HrpX of Xanthomonas campestris pathovar campestris. RNA Biol. 2011;8:947–953. doi: 10.4161/rna.8.6.16690.
- 83. Thompson MK, Nocedal I, Culviner PH, Zhang T, Gozzi KR, et al. Escherichia coli SymE is a DNA-binding protein that can condense the nucleoid. Mol Microbiol. 2022;117:851–870. doi: 10.1111/mmi.14877.
- 84. Kawano M, Aravind L, Storz G. An antisense RNA controls synthesis of an SOS-induced toxin evolved from an antitoxin. Mol Microbiol. 2007;64:738–754. doi: 10.1111/j.1365-2958.2007.05688.x.
- 85. Karasov TL, Almario J, Friedemann C, Ding W, Giolai M, et al. Arabidopsis thaliana and Pseudomonas pathogens exhibit stable associations over evolutionary timescales. Cell Host Microbe. 2018;24:168–179. doi: 10.1016/j.chom.2018.06.011.
- 86. Cesbron S, Briand M, Essakhi S, Gironde S, Boureau T, et al. Comparative genomics of pathogenic and nonpathogenic strains of Xanthomonas arboricola unveil molecular and evolutionary events linked to pathoadaptation. Front Plant Sci. 2015;6:1126. doi: 10.3389/fpls.2015.01126.
- 87. Colombi E, Hill Y, Lines R, Sullivan JT, Kohlmeier MG, et al. Population genomics of Australian indigenous Mesorhizobium reveals diverse nonsymbiotic genospecies capable of nitrogen-fixing symbioses following horizontal gene transfer. Microb Genom. 2023;9:000918. doi: 10.1099/mgen.0.000918.
- 88. Colombi E, Bertels F, Doulcier G, McConnell E, Pichugina T, et al. Rapid dissemination of host metabolism–manipulating genes via integrative and conjugative elements. Proc Natl Acad Sci USA. 2024;121:e2309263121. doi: 10.1073/pnas.2309263121.
- 89. Jibrin MO, Sharma A, Mavian CN, Timilsina S, Kaur A, et al. Phylodynamic insights into global emergence and diversification of the tomato pathogen Xanthomonas hortorum pv. gardneri. Mol Plant Microbe Interact. 2024;37:712–720. doi: 10.1094/MPMI-04-24-0035-R.
- 90. Sparks AH, Adorada DL, Colombi E, Kelly LA, Young A, et al. Clonal expansion from standing genetic variation underpins the evolution of Curtobacterium flaccumfaciens pv. flaccumfaciens in Australia. Phytopathology. 2025. doi: 10.1094/PHYTO-01-25-0032-R.
- 91. Darracq B, Littner E, Brunie M, Bos J, Kaminski PA, et al. Sedentary chromosomal integrons as biobanks of bacterial antiphage defense systems. Science. 2025;388:eads0768. doi: 10.1126/science.ads0768.
- 92. Getz LJ, Fairburn SR, Vivian Liu Y, Qian AL, Maxwell KL. Integrons are anti-phage defence libraries in Vibrio parahaemolyticus. Nat Microbiol. 2025;10:724–733. doi: 10.1038/s41564-025-01927-7.
- 93. Kieffer N, Hipólito A, Ortiz-Miravalles L, Blanco P, Delobelle T, et al. Mobile integrons encode phage defense systems. Science. 2025;388:eads0915. doi: 10.1126/science.ads0915.