KEYWORDS: Stenotrophomonas maltophilia complex, animal, human, environmental, genogroup, whole-genome sequencing
ABSTRACT
The Stenotrophomonas maltophilia complex (Smc) comprises opportunistic environmental Gram-negative bacilli responsible for a variety of infections in both humans and animals. Beyond its large genetic diversity, its genetic organization in genogroups was recently confirmed through the whole-genome sequencing of human and environmental strains. As animal strains are poorly represented in these analyses, we sequenced the whole genomes of 93 animal strains to determine their genetic background and characteristics. Combining these data with 81 newly sequenced human strains and the genomes available from RefSeq, we performed a genomic analysis that included 375 nonduplicated genomes with various origins (animal, 104; human, 226; environment, 30; unknown, 15). Phylogenetic analysis and clustering based on genome-wide average nucleotide identity confirmed and refined the genetic organization of Smc in at least 20 genogroups. Four new genogroups were identified, and two previously described groups were further divided into two subgroups each. Comparing the strains isolated from different host types and their genogroup affiliation, we observed a clear disequilibrium in certain groups. Surprisingly, some antimicrobial resistance genes, integrons, and/or clusters of attC sites lacking integron-integrase (CALIN) sequences targeting antimicrobial compounds extensively used in animals were mainly identified in animal strains. We also identified genes commonly found in animal strains coding for efflux systems. Our large whole-genome analysis supports the hypothesis that animals may act as a reservoir of Stenotrophomonas maltophilia complex strains and/or resistance genes for strains in humans.
IMPORTANCE Given its naturally large antimicrobial resistance profile, the Stenotrophomonas maltophilia complex (Smc) is a set of emerging pathogens of immunosuppressed and cystic fibrosis patients. As it is a group of environmental microorganisms, this adaptation to humans is an opportunity to understand the genetic and metabolic selective mechanisms involved in this process. The previously reported genomic organization was incomplete, as data from animal strains were underrepresented. We added the missing piece of the puzzle with whole-genome sequencing of 93 strains of animal origin. Beyond describing the phylogenetic organization, we confirmed the genetic diversity of the Smc, which could not be estimated through routine phenotype- or matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF)-based laboratory tests. Animal strains seem to play a key role in the diversity of Smc and could act as a reservoir for mobile resistance genes. Some genogroups seem to be associated with particular hosts; the genetic support of this association and the role of the determinants/corresponding genes need to be explored.
INTRODUCTION
The Stenotrophomonas maltophilia complex (Smc) comprises nonfermentative Gram-negative bacilli whose natural habitats are diverse, including fresh water, soils, and rhizospheres (1). In humans and animals, it is responsible for a variety of infections, mostly respiratory tract and bloodstream infections (1). From a genetic point of view, it is characterized by its diversity, which is why it is often presented as a complex rather than a single species. Nevertheless, phylogenetic studies using different molecular approaches, such as arbitrarily primed PCR (AP-PCR) and multilocus sequence typing (MLST), have established (i) a genogroup-based organization and (ii) the predominance (sometimes the exclusive presence) of strains with a human or environmental origin in some genogroups (2, 3). Additional studies based on whole-genome sequencing of an increasing number of strains confirmed this phylogenetic organization (4–6). We recently reported the disproportionate representation of some genogroups, not only among human but also among animal infectious strains (7, 8). From an epidemiological point of view, animals could be a reservoir for human diseases, whereas from a phylogenetic point of view, animals could constitute an intermediate step for the adaptation of environmental strains to humans. The putative role of animals as a “nursery” for the adaptation of environmental strains to humans has been previously reported (9).
We aimed to decipher the phylogenetic position, genetic characteristics, and putative role of Smc animal strains in human disease through whole-genome analysis. For this purpose, we sequenced two previously published collections of strains, including 93 animal and 81 human strains. We performed a genome-wide phylogenetic analysis including additional genomes with human, animal, and environmental origins available in databases.
RESULTS
Genome collection.
In addition to the 174 newly sequenced Smc strains, mainly from France (Fig. 1), we retained 201 complete genomes from the RefSeq database after removing duplicate and contaminated sequences: 145 strains from humans, 30 from environment/plants, 11 from animals, and 15 from unknown origins. Among these 375 genomes, the strains were mainly isolated from respiratory samples (n = 168), unknown origins (n = 54), blood (n = 49), plant/rhizosphere/soil (n = 17), urine (n = 16), gastrointestinal tract (n = 13), suppurating wound (n = 11), water (n = 11), upper aerodigestive tract (n = 10), and genital tract (n = 9) (Fig. 2; see Table S1 in the supplemental material). In terms of hosts, the animal strains were mainly isolated from horses (n = 82/104) (Table S1).
FIG 1.
Geographical distribution of the human and animal strains sequenced in this study (n = 174).
FIG 2.
Distribution of the origin and sample types of the 375 genomes sequenced in this study (n = 174) or collected from the RefSeq database (n = 201).
Population structure.
The MLST-based phylogeny highlighted the great diversity of the complex and confirmed its organization into genogroups, as previously described (see Fig. S1 in the supplemental material). However, many strains were not included in these genogroups, and some groups could probably be divided, as they contain divergent clades. In a second step, we performed a phylogenetic analysis based on 264 genes shared by all the strains and the outgroup Stenotrophomonas pictorum JCM 9942 (Fig. 3; see Fig. S2 in the supplemental material). The resulting phylogeny globally agreed with the genogroup organization observed through the MLST analysis, but the topology was slightly different. It confirmed the basal position of genogroups A and 9, which do not belong to S. maltophilia species sensu stricto. However, it also showed that genogroup 5 is not very close to the other Smc genogroups. The mean genome-wide average nucleotide identity (gANI) and alignment fraction (AF) values between each possible pair of strains ranged from 86.90% to 100.00% and from 67.00% to 100.00%, respectively (Fig. 4). When plotting gANI versus AF for each pair of strains, we observed a noncontinuous graph whose separate clouds correspond to intra- and intergroup comparisons, thus indicating a genogroup-like organization. Varghese et al. proposed a threshold of 96.5% of gANI to delineate a species (10). This cutoff appears extremely strict in our case given the observed diversity. Therefore, based on the shape of the plot, we chose a gANI mean value of 95% to cluster the population using the Louvain algorithm. This allowed us to identify 18 clusters of at least 2 strains, and 5 strains that remain ungrouped. Of these ungrouped strains, 1 belonged to the genospecies Smc_1, 1 to genogroup B, and the remaining 3 could not be linked to previously known genogroups (6).
Four new genogroups, namely, G, H, I, and J, were identified, and 2 previously described groups were further divided into 2 subgroups, namely, (i) 2-a and 2-b and (ii) 9-a and 9-b. Moreover, when combining the phylogenetic analysis with the gANI results, we found that genogroup 2-a was actually split into two subgroups (Fig. 3 and Fig. S2). However, there were no sequences in our set of genomes for the genogroup E described by Kaiser et al. (3).
FIG 3.
Phylogenetic tree of the 375 strains and their classification into genogroups according to gANI-based clustering, rooted on Stenotrophomonas pictorum JCM 9942.
FIG 4.
Whole-genome-based average nucleotide identity (gANI) and alignment fraction between each pair of genomes. Owing to the shape of the plot, a threshold of 95% was used to define genogroups with the Louvain clustering algorithm (red dotted line). Pairs of genomes belonging to the same genogroup are colored green, while pairs of unrelated strains are in red. The blue dotted line represents the threshold classically used to define species.
Genogroup and host distribution.
We compared the hosts from which the strains were isolated and their genogroup affiliation and observed a clear disequilibrium in certain groups (Table 1; Fig. 3 and Fig. S2). Indeed, more than 75% of the strains were of human origin in genogroups 1, 3, 6, and C, whereas more than 77% of the strains in genogroups 2-b and 5 were isolated from animals. Interestingly, genogroups 2-a (36% human, 59% animal) and D (50% human, 50% animal) consisted of a mix of strains from humans and animals, whereas genogroup 9-a mainly consisted of animal (50%) and environmental (30%) strains. Some groups, such as genogroups 4, 7, and F, comprised strains of all origins, i.e., human, animal, and environment. In the remaining groups, the low number of genomes (<10) prevents any conclusions.
TABLE 1.
Host distribution in the genogroups of the Stenotrophomonas maltophilia complex
Genogroup | Human, no. of samples (%) | Animal, no. of samples (%) | Environment/plant, no. of samples (%) | Unknown, no. of samples (%) | Total no. of samples |
---|---|---|---|---|---|
1 | 9 (100) | 0 (0) | 0 (0) | 0 (0) | 9 |
2-a | 14 (35.9) | 23 (59) | 2 (5.1) | 0 (0) | 39 |
2-b | 1 (11.1) | 7 (77.8) | 1 (11.1) | 0 (0) | 9 |
3 | 21 (77.8) | 5 (18.5) | 0 (0) | 1 (3.7) | 27 |
4 | 6 (46.2) | 5 (38.5) | 2 (15.4) | 0 (0) | 13 |
5 | 1 (4.3) | 19 (82.6) | 1 (4.3) | 2 (8.7) | 23 |
6 | 108 (78.3) | 16 (11.6) | 5 (3.6) | 9 (6.5) | 138 |
7 | 6 (60) | 2 (20) | 2 (20) | 0 (0) | 10 |
9-a (Smc_4) | 1 (10) | 5 (50) | 3 (30) | 1 (10) | 10 |
9-b (Smc_4) | 0 (0) | 0 (0) | 2 (100) | 0 (0) | 2 |
A (Smc_3) | 4 (50) | 3 (37.5) | 0 (0) | 1 (12.5) | 8 |
B | 0 (0) | 0 (0) | 1 (100) | 0 (0) | 1 |
C | 25 (86.2) | 1 (3.4) | 2 (6.9) | 1 (3.4) | 29 |
D | 6 (50) | 6 (50) | 0 (0) | 0 (0) | 12 |
F | 11 (50) | 8 (36.4) | 3 (13.6) | 0 (0) | 22 |
G | 4 (100) | 0 (0) | 0 (0) | 0 (0) | 4 |
H | 5 (55.6) | 2 (22.2) | 2 (22.2) | 0 (0) | 9 |
I | 1 (25) | 1 (25) | 2 (50) | 0 (0) | 4 |
J | 2 (100) | 0 (0) | 0 (0) | 0 (0) | 2 |
Smc_1 | 0 (0) | 0 (0) | 1 (100) | 0 (0) | 1 |
Single | 1 (33.3) | 1 (33.3) | 1 (33.3) | 0 (0) | 3 |
Resistance genes, plasmids, and integrons.
We compared the resistance genes found in the strains belonging to different hosts and genogroups. Various aminoglycoside resistance genes were found almost exclusively in strains isolated from animals, including APH(6)-Id, APH(3′′)-Ib, APH(3′)-Ia, APH(4)-Ia, ANT(2′′)-Ia, ANT(3′′)-Ia, and AAC(3)-IV. Except for AAC(3)-IV and APH(4)-Ia, which were only found in genogroups 2-a and 4 (7 strains in both cases), these genes were distributed in various genogroups. In contrast, other aminoglycoside resistance determinants were found in nearly all strains, namely, APH(3′)-IIc (n = 340) and APH(6) from S. maltophilia (n = 322), the latter being present as two variants (GenBank accession no. WP_012480135.1 and WP_063841666). The gene sul1 was also found more frequently in animal (16/104, 15.4%) than human (6/226, 2.7%) or environmental (0/30) isolates. The chloramphenicol resistance efflux transporters CmlA1 and CmlA10 were also found exclusively in strains from animals but were too few (n = 5 and n = 4, respectively) to be considered according to our parameters. None of the resistance genes were specifically found in human or environmental isolates according to our parameters, except for AAC(6′)-Iz. However, this determinant was strongly associated with the phylogeny, as 90.6% of the strains containing it belonged to genogroup 6. The presence or absence of many other genes was also often related to phylogeny. For example, AAC(6′)-Ia was almost exclusively found in genogroup 1 (9/9 versus 1/366).
We also searched for classical Stenotrophomonas beta-lactamases and found that only 10 and 3 strains did not carry the L1 and L2 beta-lactamases, respectively. Amino acid identity with the reference sequence ranged from 77.9% to 100% for L1 and from 62.3% to 99.7% for L2. No specific genogroups or hosts were disproportionally represented in the strains lacking the enzyme (data not shown). Phylogenetic analysis of the amino acid sequences of both beta-lactamases clustered them into genogroups (data not shown).
Finally, upon searching for complete integron and/or clusters of attC sites lacking integron-integrase (CALIN) sequences, we found that strains from animals were enriched (n = 24/104, 23.1%) in these mobile elements compared with the strains from other origins (human, n = 10/226, 4.4%; environment, n = 0/30). However, these data have to be confirmed because only a few strains carried these elements, namely, 14 strains with complete integrons, 18 with CALIN, and 2 strains with both.
Regarding plasmid identification, only 12 out of the 375 genomes showed a partial match with the curated plasmid databases (six genomes from animal strains of our study and six human genomes from the RefSeq database). These matching sequences corresponded to integrative and conjugative elements (ICEs) in 6 out of 12 genomes.
Concerning resistance determinants preferentially identified in animals, part or all of a mobile element was identified around many acquired-resistance genes (see Table S2 in the supplemental material). However, in some cases, the contig was too short to allow a meaningful exploration of the genetic environment of the resistance gene.
Pangenome analysis.
Pangenome analysis of the 375 deduplicated strains revealed an open pangenome of 22,936 genes, with a core genome of 1,740 genes, a softcore genome of 1,083 genes, and a shell part of 2,042 genes. Notably, the majority of the genes (18,071) were cloud genes (i.e., shared by less than 15% of strains). First, we focused on the accessory genome by analyzing the presence/absence of clusters of proteins found in less than 99% and more than 1% of strains. Proteins present in all strains of a genogroup whose strains were isolated from a diverse range of hosts were also removed to ignore lineage-associated genes. Manhattan distance-based clustering of this presence/absence matrix showed that the genogroup-like organization remained stable in this analysis (see Fig. S3 in the supplemental material). Only a few groups were split: genogroup F was split into two subgroups, one encompassing strains from humans (n = 11), animals (n = 4), and the environment (n = 3) and the other having strains from nonhuman animals only (n = 4); genogroup 2-a was split into two groups, one of them being close to the genogroup F subgroups; and only one strain from genogroup 3 was separated from the rest of the genogroup.
We then searched for genes associated with host type but independent from genogroups. Many clusters of proteins (n = 242) were mostly found in animal strains. Among them, we identified various proteins implicated in efflux, including (i) three proteins that were part of a resistance-nodulation-division (RND) efflux transporter with high homology to known proteins (GenBank accession no. WP_003092290.1, WP_003092293.1, and ALU64835.1), (ii) a protein with high homology to a small multidrug resistance (SMR) efflux transporter (WP_000539741.1), and (iii) two proteins from a multidrug RND efflux transporter (WP_004147062.1 and WP_100467923.1). The aminoglycoside resistance enzymes APH(3′′)-Ib, APH(6)-Id, and APH(3′)-Ia were also found to be abundant in animal strains in this analysis. Many conjugative elements were also identified in animal strains. Unfortunately, most of the other proteins were hypothetical. We found considerably fewer human-associated proteins (n = 22). Some of them were frequently colocalized on the same contig as three proteins that included the trehalose synthase TreS and an F420-dependent glucose-6-phosphate dehydrogenase. The remaining proteins were mainly hypothetical proteins or poorly annotated proteins.
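The frequency classes reported above can be illustrated with a minimal sketch. It assumes Roary-style cutoffs (core, ≥99% of strains; softcore, 95% to <99%; shell, 15% to <95%; cloud, <15%); only the cloud cutoff is stated in the text, and the function name `classify_gene` is hypothetical.

```python
def classify_gene(n_strains_with_gene: int, n_strains: int) -> str:
    """Assign a gene cluster to a pangenome frequency class.

    Cutoffs are assumed (Roary-style); only the 15% cloud cutoff
    is given explicitly in the text.
    """
    frac = n_strains_with_gene / n_strains
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "softcore"
    if frac >= 0.15:
        return "shell"
    return "cloud"


# Class sizes reported in the text; together they make up the pangenome.
reported = {"core": 1740, "softcore": 1083, "shell": 2042, "cloud": 18071}
pangenome_size = sum(reported.values())
```

Summing the four reported classes gives the stated pangenome size of 22,936 genes.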
We searched for an association with the respiratory or respiratory and upper aerodigestive tract sample types. However, we did not find any protein associated with respiratory origin using our threshold and found only one protein (a conjugal transfer protein TrbE) associated with the respiratory and upper aerodigestive tract samples.
DISCUSSION
Smc is an archetypical environmental opportunistic bacterium responsible for hospital-associated infections (1, 7). It can also be responsible for infections in animals, especially horses (8, 11). So, advanced study of the relationships between its genetic background, including resistance gene content, and the origin of the strains requires a global approach that includes human, environmental, and animal strains. Unfortunately, few genomes of animal strains were available in the databases compared with genomes of environmental or human strains. Indeed, animal strains were scarcely included and not specifically analyzed in most recent studies on Smc population structure (4–6). Yet, we previously reported that although some animal strains shared a genomic background with their human counterparts, others were significantly different (8). To fill this gap and to study the potential link between host, population structure, and resistance, we sequenced 174 new strains, including 93 animal strains. Combining these data with genomes from RefSeq, we obtained a unique and diverse collection of isolates to scrutinize.
A classical phylogeny analysis, solely based on MLST, resulted in the segregation of strains into many groups. Among them, we clearly identified 13 previously described genogroups (3). It is noteworthy that this organization, originally reported in 1999 using poorly reproducible molecular methods, such as amplified fragment length polymorphism (AFLP) fingerprinting, was first confirmed through MLST and then by whole-genome sequencing (2–6). However, our large-scale study allowed us to (i) identify several new putative clusters representing more than 10% of the strains and (ii) highlight that some genogroups, such as genogroups 2 and 9, actually consisted of divergent subgroups. Using a combination of whole-genome-based phylogeny and gANI/AF clustering, we were able to define the structure of the Smc population more precisely, as being composed of at least 20 genogroups. Surprisingly, the gANI values of the strains from different genogroups were below the species cutoff of 96.5%, which was proposed by Varghese et al. (10). This correlates with the high heterogeneity observed by older methods, such as ribotyping or pulsed-field gel electrophoresis (1). Some of the genogroups could be considered different species; unfortunately, none of the methods commonly available in current laboratories (phenotypic- or MALDI-TOF-based methods) are able to distinguish between these divergent groups. On the basis of these results, we propose an updated nomenclature, while striving to be as consistent as possible with the historical classification.
Some of the genogroup/host associations observed in past studies were confirmed, while new ones were observed. We found a predominance of animal strains in genogroups 5 and 9, while genogroups 2 and 6 included strains from both humans and animals, as noted in the studies by Kaiser et al. and Jayol et al. (3, 8). Furthermore, genogroups 1, 3, and C contained predominantly human isolates, whereas strains from genogroups 2-b and 5 mainly originated from animals. Genogroups 2-a and D were essentially mixed. With this larger sample, genogroup 6 appears to be predominantly of human origin, with a smaller fraction of animal isolates than previously observed in smaller population samples (7, 8).
Considering these data, we compared resistance gene content to explain the segregation. Animal strains carried aminoglycoside and sulfonamide resistance-coding genes more frequently. Of note, these antibiotics are extensively used in veterinary medicine, leading to an antibiotic selection pressure among animal strains (12). These data are also in line with the higher prevalence of phenotypical co-trimoxazole resistance observed in animal strains than in human strains (18.8% versus 5.0%) (7, 8). However, we did not test the prevalence of fluoroquinolone resistance and its associated mechanisms since there are multiple mechanisms underlying fluoroquinolone resistance, mainly the overexpression of several native efflux pumps, which are difficult to explore through DNA whole-genome sequencing. The acquired resistance genes seem to be mobilized through mobile genetic elements such as integrons or transposons. Plasmids do not appear to play a significant role in the mobilization of these genes. These findings support the hypothesis that animal strains could act as a reservoir of mobile resistance genes for human strains, especially if they belong to the same genogroups, i.e., share the same genetic background.
We observed a large and open pangenome of 22,936 genes, with a relatively small core genome of 1,740 genes. This core genome size is comparable to that proposed by Youenou et al., which comprised 1,647 coding sequences from 14 strains (13), but lower than the 2,762 genes reported by Lira et al., from 24 strains (14). Our result is likely a closer approximation of the core genome of this species, owing to the huge variety of strains in our collection. Indeed, the sample of genomes studied by Lira et al. was quite small and excluded animal strains (14). Clustering based on the accessory genome also highlighted this robust genogroup-like organization with only a few discrepancies and did not display an association between the host and accessory genome. However, a pangenome-wide association study (panGWAS) approach with Scoary enabled us to find host-associated proteins. Indeed, animal strains were rich in many proteins, including multidrug efflux proteins. As with antibiotic resistance determinants, this could also reflect a selective pressure encountered by these isolates. In human strains, our analysis highlighted, among a few others, colocalized proteins that could be part of a common pathway related to trehalose metabolism. Interestingly, trehalose metabolism has previously been linked with the emergence of hypervirulent strains of Clostridium difficile in humans (15).
Previous studies on the association between the host and Staphylococcus aureus have been able to identify a host-specific gene pool based on accessory genome clustering (16). However, such an association was not observed when clustering our Smc strains based on presence/absence of the accessory genes. In contrast, we observed a conservation of the genogroup organization with only a few exceptions (genogroup F and genogroup 2-a) that were not clearly linked with the host type. Based on these results, we can assume that the capacity to infect a specific host is likely due to a restricted number of genes or even specific alleles, although this last hypothesis cannot be tested through our approach.
Finally, we searched for proteins associated with the sample type, specifically with isolation from the respiratory tract alone or from the respiratory and upper aerodigestive tracts. However, we did not find determinants that could be linked to the pathophysiology of respiratory infection. The presence of respiratory samples in almost all genogroups also points to an overall capacity of this complex of bacteria to infect the respiratory tract (Table S1). Moreover, this group of microorganisms is not responsible for acute pneumonia in patients without comorbidity but acts as an opportunistic pathogen found in debilitated patients in intensive care units or cystic fibrosis patients. Under these conditions, it seems difficult to identify specific determinants associated with these infections only by intraspecies comparisons.
Our work suggests that, at least in part, a common reservoir exists between human, animal, and environmental strains. Some specific genogroups seem to have evolved in a host-specific way. Nevertheless, only a few bacterial determinants appear to be host specific. In the case of animals, they rather reflect a selective pressure acting on the strains, whereas in human strains, we found metabolic determinants that need to be phenotypically explored.
This work has several limitations. First, the majority of animal strains originated from horses, dogs, and cats, which are domestic animals in frequent contact with humans. Pet animals could be colonized by human strains and act as secondary reservoirs. Nevertheless, in addition to the differing genogroup distribution in animals and humans, the genetic profiles of strains from a few other animal species confirmed the distribution of animal strains among both human-affiliated (2 and 6) and animal-associated genogroups. Moreover, the characterization of genogroups that consisted predominantly of animal strains confirmed that our animal set of strains is, at least partially, different from human strains. Our findings must be confirmed through testing samples originating from wild animals. Second, a large part of our samples originated from respiratory tract infections. Additional strains from other clinical origins would improve the quality of the comparison. Additionally, the strains were mainly from a single country, which could have impacted our findings to some degree. They should be tested using a larger and more geographically diverse sample of strains. Finally, our results came from in silico analyses of gene content and need to be confirmed phenotypically.
In conclusion, we performed a large genome-wide analysis of Smc, including numerous isolates from animals, that confirmed at the genome level (i) the genogroup phylogenetic organization of this complex, including previously undescribed clades; (ii) the presence of specific and distinct animal and human genogroups; and (iii) the presence of host-associated determinants. Future work is needed to confirm these results at the phenotypic level.
MATERIALS AND METHODS
Collection of strains.
We sequenced 174 strains from two collections of S. maltophilia sensu lato, including some never previously published (7, 8). These strains were responsible for infections in animals (n = 93) and humans (n = 81) and were representative of the diversity of Smc genogroups. The characteristics of these 174 strains, including their sample type and geographical origin, are reported in Fig. 1 and 2 and in Table S1. Briefly, the animal strains were isolated from horses (n = 78), dogs (n = 7), cats (n = 3), reptiles (n = 2), and a few other animals, namely, a fish, a seal, and a lemur (n = 1 each). These clinical specimens were isolated from the respiratory tract (n = 73), the genital tract (n = 8), urine (n = 3), the upper aerodigestive tract (n = 6), and other locations (the gastrointestinal tract, pus, and unknown; n = 1 for each of these locations). In turn, the human strains were isolated from the respiratory tract (n = 33, including 11 specimens from cystic fibrosis patients); the blood (n = 18); urine (n = 10); skin, soft tissues, and pus (n = 9); an intravascular device (n = 4); the gastrointestinal tract (n = 2); the eyes (n = 2); and other locations (the upper aerodigestive tract, joints and bones, and the genital tract; n = 1 for each of these locations).
Whole-genome sequencing, assembly, and annotation.
DNA was extracted from a few colonies of each Smc strain with a DSP DNA minikit on a QIAsymphony instrument (Qiagen, Hilden, Germany) in accordance with the manufacturer’s instructions. Genomes were sequenced on an Illumina NextSeq instrument (read length, 2 × 150 bp; NextSeq 500/550 v2 reagent kits) after Nextera XT library preparation (Illumina, San Diego, CA). De novo assembly was performed using Velvet (v3.5.3), and the assembled genomes were annotated using Prokka (v1.13) (17).
Sequences from databases.
MLST sequences from the study by Kaiser et al. (3) were retrieved from the NCBI database. Complete genomes of Smc (taxid, 40324) as well as the associated BioSample report, when available, for the host and sample type were also retrieved from the RefSeq database on 31 July 2018 (Fig. 2; Table S1). The genomes from RefSeq were reannotated using Prokka and are available on Zenodo (https://doi.org/10.5281/zenodo.3530273).
Contamination and duplicate removal.
Contamination was assessed using the standard CheckM (v1.0.11) workflow (18). Strains with a contamination rate of >3% were discarded from the remaining analysis (6 strains from RefSeq). Then, the Mash distances were computed between all pairs of genomes after sketching with a sketch size of 5,000 and a k-mer size of 21 (19). To remove redundancy in our population, we excluded genomes separated by a Mash distance of <10⁻⁴. This approach enabled us to remove 77 genomes.
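The redundancy filter can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which member of a near-identical pair was kept, so the sketch uses a greedy, order-dependent rule; the threshold is taken to be 10⁻⁴; and the function name and the toy distances are hypothetical. Pairwise distances are assumed to have been computed beforehand (for example, with Mash).

```python
def deduplicate(genomes, distances, threshold=1e-4):
    """Greedy redundancy filter (assumed rule, not the authors' exact one).

    A genome is kept only if its distance to every genome already kept
    is at least `threshold`. `distances` maps frozenset({a, b}) to the
    precomputed pairwise distance.
    """
    kept = []
    for g in genomes:
        if all(distances[frozenset((g, k))] >= threshold for k in kept):
            kept.append(g)
    return kept


# Toy example with invented distances: A and B are near-identical.
genomes = ["A", "B", "C"]
distances = {
    frozenset(("A", "B")): 5e-5,
    frozenset(("A", "C")): 2e-2,
    frozenset(("B", "C")): 2e-2,
}
kept = deduplicate(genomes, distances)
```

Because the rule is greedy, the order of the input list decides which genome of a redundant pair survives; sorting by assembly quality first would be one reasonable choice.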
Genomic characterization.
(i) MLST-based phylogeny. All the MLST sequences available for Smc strains (i.e., from Kaiser et al. and from whole genomes) were concatenated (3). Strains lacking at least one sequence were excluded from this part of the analysis in order to get an unbiased phylogeny. The concatenated sequences were then aligned using MAFFT (v7.310) (20), and phylogenetic analysis was performed using FastTree (v2.1.8) with a general time reversible evolution model and a gamma distribution of the rates across sites (21). The tree was visualized and annotated using iTOL (22). We kept the genogroup names previously described by Kaiser et al. as a basis for our updated classification. Specifically, all the strains that derived from a common ancestor with a strain from Kaiser et al. were assigned to its genogroup (3).
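The concatenation step, including the exclusion of strains with a missing locus, can be sketched as below. The locus names and sequences are placeholders, not the actual MLST scheme, and the function name is hypothetical.

```python
def concatenate_loci(strain_loci, loci):
    """Concatenate locus sequences in a fixed order.

    Strains lacking at least one locus are skipped, mirroring the
    exclusion rule described in the text.
    """
    concatenated = {}
    for strain, seqs in strain_loci.items():
        if all(seqs.get(locus) for locus in loci):
            concatenated[strain] = "".join(seqs[locus] for locus in loci)
    return concatenated


# Placeholder loci and sequences for illustration only.
loci = ["locusA", "locusB"]
strain_loci = {
    "s1": {"locusA": "ATG", "locusB": "CCA"},
    "s2": {"locusA": "ATG"},  # locusB missing, so s2 is excluded
}
concatenated = concatenate_loci(strain_loci, loci)
```

The resulting sequences would then be passed to an aligner and a tree builder, as described above.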
(ii) Genome-wide average nucleotide identity (gANI) and alignment fraction (AF) computation and clustering.
We calculated gANI and AF between each pair of genomes using ANIcalculator (v1) after tRNA and rRNA exclusion as required (10). We further clustered all the pairs with a mean gANI greater than or equal to 95% with the Louvain algorithm (23). The results were used to define the updated classification, while trying as much as possible to conserve the previous names.
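The thresholding step can be illustrated with a small sketch. It assumes that each pair comes with two directional gANI values that are averaged, and it links genomes whose mean gANI is at least 95%. Note the deliberate simplification: instead of Louvain community detection, the sketch takes connected components of the thresholded graph (via union-find), which is a simpler stand-in and may merge groups that Louvain would separate. All names and values are hypothetical.

```python
def cluster_by_gani(pairs, threshold=95.0):
    """Group genomes linked by a mean gANI >= threshold.

    pairs: iterable of (genome1, genome2, gani_1to2, gani_2to1).
    Uses connected components as a stand-in for Louvain clustering.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for g1, g2, ani12, ani21 in pairs:
        r1, r2 = find(g1), find(g2)
        if (ani12 + ani21) / 2 >= threshold and r1 != r2:
            parent[r2] = r1  # merge the two components

    clusters = {}
    for g in list(parent):
        clusters.setdefault(find(g), set()).add(g)
    return sorted(clusters.values(), key=sorted)


# Invented values: s1 and s2 are close, s3 is distant from both.
pairs = [
    ("s1", "s2", 98.0, 97.6),
    ("s1", "s3", 90.1, 89.9),
    ("s2", "s3", 90.0, 90.2),
]
clusters = cluster_by_gani(pairs)
```

With real data, a graph library offering Louvain community detection would replace the union-find step.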
(iii) Pangenome analysis.
Due to the heterogeneity and diversity of the Smc population, the classical threshold for the minimum percentage identity between proteins cannot be used. In order to define the most accurate cutoff without increasing the number of multigenic families, we computed the best BLAST hit (BBH) between representative strains of each gANI-based genogroup as well as ungrouped strains with protein identity thresholds ranging from 35% to 80% in 5% steps. The lowest identity percentage enabling a ratio of BBH/total hits of 95% was chosen (i.e., 60% protein identity) (see Fig. S4 in the supplemental material). Then, we used Roary v3.12 with this identity threshold for BLASTP and the–s option to avoid splitting paralogs to compute the pangenome of the strains (24). We performed a hierarchical clustering (Manhattan distance, average linkage) based on the presence/absence of each cluster of genes in the strains after removal of (i) proteins found in more than 99% of the genomes, (ii) proteins found in less than 1% of the genomes, and (iii) proteins present only in one genogroup and associated with multiple hosts. We visualized the tree obtained from the distance matrix to search for host/accessory genome associations.
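The accessory-genome filter and the distance used for clustering can be sketched as follows. The sketch covers only the two frequency filters (found in more than 99% or in less than 1% of genomes); the third, lineage-based filter is not implemented. Function names and the toy matrix are hypothetical.

```python
def accessory_clusters(presence, strains, low=0.01, high=0.99):
    """Keep protein clusters present in 1% to 99% of strains.

    presence: {cluster: {strain: 0 or 1}}.
    """
    kept = {}
    for cluster, row in presence.items():
        frac = sum(row.get(s, 0) for s in strains) / len(strains)
        if low <= frac <= high:
            kept[cluster] = row
    return kept


def manhattan(presence, strain_a, strain_b):
    """Number of clusters whose presence/absence differs between two strains."""
    return sum(
        abs(row.get(strain_a, 0) - row.get(strain_b, 0))
        for row in presence.values()
    )


# Toy presence/absence matrix with invented clusters.
strains = ["s1", "s2", "s3", "s4"]
presence = {
    "geneA": {"s1": 1, "s2": 1, "s3": 1, "s4": 1},  # in all strains: removed
    "geneB": {"s1": 1, "s2": 1, "s3": 0, "s4": 0},
    "geneC": {"s1": 1, "s2": 0, "s3": 0, "s4": 1},
}
accessory = accessory_clusters(presence, strains)
d12 = manhattan(accessory, "s1", "s2")
d13 = manhattan(accessory, "s1", "s3")
```

The full pairwise distance matrix built this way would then feed an average-linkage hierarchical clustering.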
Finally, we explored the pangenome to find associations between proteins and host or sample type using Scoary (25). First, we compared the nonhuman animal strains (n = 104) to the human and environmental strains (n = 226 + 30). Then, we compared human strains to nonhuman animal and environmental strains. Finally, we compared strains from respiratory samples (n = 168) or from respiratory and upper aerodigestive tract samples (n = 178) with other strains (n = 153 or n = 143). Strains with an unknown sample type were discarded from this analysis. Only the most significant results were considered, with the following criteria: a specificity of >70%, a Bonferroni-corrected P value of <0.05, and an odds ratio of >5. Moreover, clusters of proteins found only in phylogenetically related strains (same group or related groups in the phylogenetic tree) were not taken into account.
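To make the three filtering criteria concrete, the following Python sketch computes specificity and odds ratio from a 2 x 2 gene/trait table and applies the cutoffs. It assumes the usual definitions (specificity as the share of trait-negative strains lacking the gene, odds ratio as the cross-product ratio); the class and function names are hypothetical, and the counts and corrected P value are toy values, not results.

```python
from typing import NamedTuple


class Table(NamedTuple):
    gene_trait: int      # gene present, trait-positive strains
    gene_notrait: int    # gene present, trait-negative strains
    nogene_trait: int    # gene absent, trait-positive strains
    nogene_notrait: int  # gene absent, trait-negative strains


def specificity(t: Table) -> float:
    """Share of trait-negative strains that lack the gene."""
    return t.nogene_notrait / (t.gene_notrait + t.nogene_notrait)


def odds_ratio(t: Table) -> float:
    """Cross-product odds ratio; inf or nan when a denominator cell is zero."""
    num = t.gene_trait * t.nogene_notrait
    den = t.gene_notrait * t.nogene_trait
    if den == 0:
        return float("inf") if num > 0 else float("nan")
    return num / den


def passes_filters(t: Table, corrected_p: float) -> bool:
    """Apply the cutoffs from the text: specificity >70%, P <0.05, OR >5."""
    return specificity(t) > 0.70 and corrected_p < 0.05 and odds_ratio(t) > 5


# Toy example: 104 animal strains versus 256 other strains (226 + 30).
toy = Table(gene_trait=60, gene_notrait=20, nogene_trait=44, nogene_notrait=236)
kept = passes_filters(toy, corrected_p=0.001)
```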
(iv) Core gene-based phylogeny.
As previously described, we performed a pangenome analysis on the 375 genomes of Smc and the genome of Stenotrophomonas pictorum JCM 9942, which was used as an outgroup. From this pangenome, we extracted the nucleotide sequences corresponding to the coding sequences found in a single copy in all 376 strains (n = 264). These sequences were individually aligned using MAFFT (v7.310) (20). Then, the alignments for each strain were concatenated and used to generate a phylogenetic tree with FastTree (v2.1.9) with a general time-reversible evolution model and a gamma distribution of the rates of change across the sites. Finally, we annotated the tree using iTOL (22).
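The selection of single-copy core genes can be sketched in Python as below. This is an illustration under assumptions: the input is taken to be a table of gene copy numbers per strain, and the function name `single_copy_core` and the toy gene and strain names are hypothetical.

```python
from typing import Dict, List


def single_copy_core(
    gene_counts: Dict[str, Dict[str, int]], strains: List[str]
) -> List[str]:
    """Return gene clusters present in exactly one copy in every strain."""
    return sorted(
        gene
        for gene, per_strain in gene_counts.items()
        if all(per_strain.get(s, 0) == 1 for s in strains)
    )


# Hypothetical toy data: two strains plus the outgroup.
strains = ["s1", "s2", "outgroup"]
gene_counts = {
    "geneA": {"s1": 1, "s2": 1, "outgroup": 1},  # single copy everywhere
    "geneB": {"s1": 2, "s2": 1, "outgroup": 1},  # duplicated in s1
    "geneC": {"s1": 1, "s2": 1},                 # absent from the outgroup
}
core = single_copy_core(gene_counts, strains)
```

Requiring exactly one copy in the outgroup as well ensures that every retained gene can be used to root the tree.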
(v) Resistance determinants, plasmids, and integrons.
Resistance determinants were analyzed through BLASTP analysis against the Bacterial Antimicrobial Resistance Reference Gene Database (BioProject accession no. PRJNA313047) with thresholds of 80% protein identity and 80% coverage. Owing to the high variability of the L1 and L2 beta-lactamases, we required only a minimum coverage of 80% for these determinants and then checked for the consistency of the sequences on the NCBI website (26, 27). Finally, we used IntegronFinder to search for complete integron and CALIN sequences (28).
The presence of plasmids was analyzed through BLASTN analysis with thresholds of 30% nucleotide identity and 30% coverage against (i) the complete genomes from the GenBank database of five plasmids identified in the Stenotrophomonas genus (GenBank accession no. L09673.1, NC_010464.1, CP031730.1, CP031731.1, and CP031732.1) and (ii) a large published database of plasmids from Enterobacteriaceae (29).
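The hit-filtering rule for resistance determinants can be expressed as a small Python sketch. It assumes that hits have already been parsed into identity and coverage percentages and that coverage refers to the reference protein; the names `Hit`, `keep_hit`, and `RELAXED`, and the toy hits, are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    identity: float  # percent protein identity
    coverage: float  # percent of the reference protein covered (assumption)


# Determinants for which only the coverage cutoff is applied (L1 and L2).
RELAXED = {"L1", "L2"}


def keep_hit(hit: Hit, min_identity: float = 80.0, min_coverage: float = 80.0) -> bool:
    """Apply the 80% identity / 80% coverage rule, relaxed for L1 and L2."""
    if hit.coverage < min_coverage:
        return False
    if hit.gene in RELAXED:
        return True  # identity is not required; sequences are checked manually
    return hit.identity >= min_identity


# Hypothetical toy hits, not results from the study.
hits = [
    Hit("geneX", 99.0, 100.0),  # kept
    Hit("L1", 72.5, 95.0),      # kept: coverage is enough for L1
    Hit("geneY", 75.0, 98.0),   # dropped: identity too low
    Hit("L2", 90.0, 60.0),      # dropped: coverage too low
]
kept_genes = [h.gene for h in hits if keep_hit(h)]
```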
For each of the acquired-resistance genes preferentially identified in animals, the contig carrying the gene was analyzed individually. We searched for integron or CALIN sequences, insertion sequences, or transposases in the vicinity of the gene. The size of the contig was also considered when none of these elements were found.
(vi) Association between genogroups and specific genes.
The presence of resistance genes was tested against the different genogroups and hosts of Smc isolates using Scoary (25). Briefly, as for the pangenome analysis, Fisher's exact test was performed, and only results with a Bonferroni-corrected P value of <0.05, a specificity of >70%, and an odds ratio of >5 were considered.
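For readers unfamiliar with the test, a two-sided Fisher's exact test followed by a Bonferroni correction can be sketched in pure Python as follows. This is a generic textbook implementation written for illustration, not the code of Scoary; the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


# Small worked example: table [[3, 1], [1, 3]].
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

In this example the tables with 0, 1, 3, or 4 in the top-left cell are at most as likely as the observed one, giving a P value of 34/70.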
Data management.
Clinical data were collected into a Microsoft Excel 2010 database that was password protected.
Data availability.
All the new genomic data are publicly available through BioProject accession no. PRJEB33154.
ACKNOWLEDGMENTS
We thank the following collaborators from the Observatoire des Résistances du Collège de Bactériologie, Virologie, Hygiène (ColBVH; https://collegebvh.org) and of the RESAPATH (https://resapath.anses.fr) networks who contributed to the collection of the Stenotrophomonas maltophilia strains: S. Aberanne (Créteil), O. Belmonte (Saint Denis De La Réunion), N. Blondiaux (Tourcoing), V. Cattoir (Caen), S. Dekeyser (Fougères), J. M. Delarbre (Mulhouse), C. Corlouer (Créteil), M. Haenni (Lyon), A. C. Jaouen (Bayonne), E. Laurens (Cholet), O. Lemenand (Saint Nazaire), E. Parisi Duchene (Bastia), B. Pangon (Le Chesnay), C. Plassart (Beauvais), S. Picot (Saint Pierre De La Réunion), and A. Vachee (Roubaix).
This work was partially funded by a grant from the Association des Biologistes de l’Ouest (ABO).
The funder had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Footnotes
Supplemental material is available online only.
REFERENCES
- 1. Brooke JS. 2012. Stenotrophomonas maltophilia: an emerging global opportunistic pathogen. Clin Microbiol Rev 25:2–41. doi: 10.1128/CMR.00019-11.
- 2. Hauben L, Vauterin L, Moore ER, Hoste B, Swings J. 1999. Genomic diversity of the genus Stenotrophomonas. Int J Syst Bacteriol 49:1749–1760. doi: 10.1099/00207713-49-4-1749.
- 3. Kaiser S, Biehler K, Jonas D. 2009. A Stenotrophomonas maltophilia multilocus sequence typing scheme for inferring population structure. J Bacteriol 191:2934–2943. doi: 10.1128/JB.00892-08.
- 4. Steinmann J, Mamat U, Abda EM, Kirchhoff L, Streit WR, Schaible UE, Niemann S, Kohl TA. 2018. Analysis of phylogenetic variation of Stenotrophomonas maltophilia reveals human-specific branches. Front Microbiol 9:806. doi: 10.3389/fmicb.2018.00806.
- 5. Ochoa-Sánchez LE, Vinuesa P. 2017. Evolutionary genetic analysis uncovers multiple species with distinct habitat preferences and antibiotic resistance phenotypes in the Stenotrophomonas maltophilia complex. Front Microbiol 8:1548. doi: 10.3389/fmicb.2017.01548.
- 6. Vinuesa P, Ochoa-Sánchez LE, Contreras-Moreira B. 2018. GET_PHYLOMARKERS, a software package to select optimal orthologous clusters for phylogenomics and inferring pan-genome phylogenies, used for a critical geno-taxonomic revision of the genus Stenotrophomonas. Front Microbiol 9:771. doi: 10.3389/fmicb.2018.00771.
- 7. Corlouer C, Lamy B, Desroches M, Ramos-Vivas J, Mehiri-Zghal E, Lemenand O, Delarbre JM, Decousser JW; Collège de Bactériologie-Virologie-Hygiène des Hôpitaux de France. 2017. Stenotrophomonas maltophilia healthcare-associated infections: identification of two main pathogenic genetic backgrounds. J Hosp Infect 96:183–188. doi: 10.1016/j.jhin.2017.02.003.
- 8. Jayol A, Corlouer C, Haenni M, Darty M, Maillard K, Desroches M, Lamy B, Jumas-Bilak E, Madec JY, Decousser JW. 2018. Are animals a source of Stenotrophomonas maltophilia in human infections? Contributions of a nationwide molecular study. Eur J Clin Microbiol Infect Dis 37:1039–1045. doi: 10.1007/s10096-018-3203-0.
- 9. Aujoulat F, Roger F, Bourdier A, Lotthé A, Lamy B, Marchandin H, Jumas-Bilak E. 2012. From environment to man: genome evolution and adaptation of human opportunistic bacterial pathogens. Genes (Basel) 3:191–232. doi: 10.3390/genes3020191.
- 10. Varghese NJ, Mukherjee S, Ivanova N, Konstantinidis KT, Mavrommatis K, Kyrpides NC, Pati A. 2015. Microbial species delineation using whole genome sequences. Nucleic Acids Res 43:6761–6771. doi: 10.1093/nar/gkv657.
- 11. Winther L, Andersen RM, Baptiste KE, Aalbæk B, Guardabassi L. 2010. Association of Stenotrophomonas maltophilia infection with lower airway disease in the horse: a retrospective case series. Vet J 186:358–363. doi: 10.1016/j.tvjl.2009.08.026.
- 12. Grave K, Torren-Edo J, Muller A, Greko C, Moulin G, Mackay D; ESVAC Group. 2014. Variations in the sales and sales patterns of veterinary antimicrobial agents in 25 European countries. J Antimicrob Chemother 69:2284–2291. doi: 10.1093/jac/dku106.
- 13. Youenou B, Favre-Bonté S, Bodilis J, Brothier E, Dubost A, Muller D, Nazaret S. 2015. Comparative genomics of environmental and clinical Stenotrophomonas maltophilia strains with different antibiotic resistance profiles. Genome Biol Evol 7:2484–2505. doi: 10.1093/gbe/evv161.
- 14. Lira F, Berg G, Martínez JL. 2017. Double-face meets the bacterial world: the opportunistic pathogen Stenotrophomonas maltophilia. Front Microbiol 8:2190. doi: 10.3389/fmicb.2017.02190.
- 15. Collins J, Robinson C, Danhof H, Knetsch CW, van Leeuwen HC, Lawley TD, Auchtung JM, Britton RA. 2018. Dietary trehalose enhances virulence of epidemic Clostridium difficile. Nature 553:291–294. doi: 10.1038/nature25178.
- 16. Richardson EJ, Bacigalupe R, Harrison EM, Weinert LA, Lycett S, Vrieling M, Robb K, Hoskisson PA, Holden MTG, Feil EJ, Paterson GK, Tong SYC, Shittu A, van Wamel W, Aanensen DM, Parkhill J, Peacock SJ, Corander J, Holmes M, Fitzgerald JR. 2018. Gene exchange drives the ecological success of a multi-host bacterial pathogen. Nat Ecol Evol 2:1468–1478. doi: 10.1038/s41559-018-0617-0.
- 17. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. doi: 10.1093/bioinformatics/btu153.
- 18. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055. doi: 10.1101/gr.186072.114.
- 19. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17:132. doi: 10.1186/s13059-016-0997-x.
- 20. Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066. doi: 10.1093/nar/gkf436.
- 21. Price MN, Dehal PS, Arkin AP. 2009. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol 26:1641–1650. doi: 10.1093/molbev/msp077.
- 22. Letunic I, Bork P. 2019. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 47:W256–W259. doi: 10.1093/nar/gkz239.
- 23. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. 2008. Fast unfolding of communities in large networks. J Stat Mech 2008:P10008. doi: 10.1088/1742-5468/2008/10/P10008.
- 24. Bourrel AS, Poirel L, Royer G, Darty M, Vuillemin X, Kieffer N, Clermont O, Denamur E, Nordmann P, Decousser JW; IAME Resistance Group. 2019. Colistin resistance in Parisian inpatient faecal Escherichia coli as the result of two distinct evolutionary pathways. J Antimicrob Chemother 74:1521–1530. doi: 10.1093/jac/dkz090.
- 25. Brynildsrud O, Bohlin J, Scheffer L, Eldholm V. 2016. Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biol 17:238. doi: 10.1186/s13059-016-1108-8.
- 26. Avison MB, Higgins CS, von Heldreich CJ, Bennett PM, Walsh TR. 2001. Plasmid location and molecular heterogeneity of the L1 and L2 beta-lactamase genes of Stenotrophomonas maltophilia. Antimicrob Agents Chemother 45:413–419. doi: 10.1128/AAC.45.2.413-419.2001.
- 27. Mojica MF, Rutter JD, Taracila M, Abriata LA, Fouts DE, Papp-Wallace KM, Walsh TJ, LiPuma JJ, Vila AJ, Bonomo RA. 2019. Population structure, molecular epidemiology, and β-lactamase diversity among Stenotrophomonas maltophilia isolates in the United States. mBio 10:e00405-19. doi: 10.1128/mBio.00405-19.
- 28. Cury J, Jové T, Touchon M, Néron B, Rocha EP. 2016. Identification and analysis of integrons and cassette arrays in bacterial genomes. Nucleic Acids Res 44:4539–4550. doi: 10.1093/nar/gkw319.
- 29. Orlek A, Phan H, Sheppard AE, Doumith M, Ellington M, Peto T, Crook D, Walker AS, Woodford N, Anjum MF, Stoesser N. 2017. A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database. Data Brief 12:423–426. doi: 10.1016/j.dib.2017.04.024.