BMC Infectious Diseases. 2026 Jan 6;26:243. doi: 10.1186/s12879-025-12501-1

Genomic variants distinguish Pseudomonas aeruginosa from cystic fibrosis sputum and bloodstream infection

Samara T Choudhury 1, Samantha K Lindberg 1, Cheryl P Andam 1
PMCID: PMC12870028  PMID: 41495674

Abstract

Background

Pseudomonas aeruginosa is an opportunistic human pathogen that causes a variety of acute and chronic infections in humans. However, the genetic basis of its ability to cause distinct diseases in different body sites remains unclear. We aim to identify the genomic elements that distinguish P. aeruginosa derived from bloodstream infections and sputum from cystic fibrosis (CF).

Methods

We carried out phylogenetic analysis and genome-wide association study (GWAS) of a P. aeruginosa dataset consisting of 840 genomes derived from CF sputum and 249 genomes from bloodstream infections.

Results

We identified 342 distinct sequence types (ST). Ten STs were most prevalent (ST111, ST244, ST235, ST549, ST463, ST233, ST309, ST654, ST253, and ST357) and collectively accounted for 38.7% of the dataset. These ten STs were detected in both bloodstream- and CF sputum-derived genomes, albeit at different proportions. We detected specific antimicrobial resistance and virulence genes that were significantly enriched in bloodstream- or CF sputum-derived genomes. GWAS based on single nucleotide polymorphisms (SNPs) in the core genome, unitigs, and accessory genes revealed genomic variants that differ between the two groups, which are associated with a variety of functions, including stress response, nutrient acquisition, defense and redox activity. Narrow-sense heritability associated with variation in infection site is moderately high for SNPs (h2 = 0.737) and accessory genes (h2 = 0.612), but low for unitig variation (h2 = 0.25). Controlling for population structure, we carried out GWAS on 110 phylogenetically matched genomes, which uncovered additional loci distinguishing bloodstream-derived and CF-derived genomes. Those genes with the most pronounced associations are related to nutrient transport and metabolism (carbohydrate, ethanol, lipid, menaquinone (vitamin K2), phenazine, phosphate), siderophore pyoverdine, peptidoglycan remodeling, and stress response.

Conclusions

Our study reveals loci that are enriched in P. aeruginosa derived from CF sputum and from bloodstream infections. The observed genetic associations stem largely from differential antibiotic selection pressure and from the dissimilarity between acute and chronic isolates. The genomic heterogeneity of P. aeruginosa from distinct diseases underscores its opportunistic lifestyle and ecological versatility, and this knowledge is critical to managing infections and improving patient outcomes.

Supplementary information

The online version contains supplementary material available at 10.1186/s12879-025-12501-1.

Keywords: Pseudomonas aeruginosa, Genome, Genome-wide association study, Bloodstream infection, Cystic fibrosis

Background

Pseudomonas aeruginosa is a ubiquitous Gram-negative saprophyte and commensal microbe that is often found in water, soil, plants and non-clinical environments associated with human activities [1]. It is also a notorious opportunistic human pathogen, causing both acute and chronic infections in the respiratory tract, bloodstream, urinary tract, eyes, and soft-tissue wounds [2]. In healthcare settings, P. aeruginosa infections are often linked to contaminated fomites (e.g., liquid dispensers, humidifiers, nebulizers, stethoscopes, ultrasound probes, suction apparatus) [3], social contact and person-to-person transmission [4], and pre-colonization and intestinal carriage that can disseminate to other body sites [5]. In immunocompromised individuals, P. aeruginosa infections are particularly severe and can have debilitating and fatal outcomes [6, 7].

P. aeruginosa infections are notably difficult to treat because of its remarkable range of mechanisms in evading the impacts of antimicrobial agents – intrinsic resistance [8], adaptive resistance [9], horizontally acquired resistance [10], and heteroresistance [11]. As a result, multidrug-, extensively drug-, and pandrug-resistant P. aeruginosa strains have emerged and are now major public health threats [12]. The World Health Organization lists carbapenem-resistant P. aeruginosa in the critical group of pathogens for which new antimicrobials are urgently needed [13]. It is also one of the ESKAPE group of nosocomial pathogens that are highly virulent and antimicrobial resistant [14]. Antimicrobial resistance (AMR) in P. aeruginosa reduces treatment options, which is especially critical for serious infections and increases the burden of disease and mortality rates [15].

The ecological versatility of P. aeruginosa stems in part from its large genome (about 6–7 Mb), which consists of a relatively small set of core genes (i.e., genes found in all or nearly all strains) and an extensive accessory gene pool (i.e., genes found in one or a few strains) [16, 17]. Pan-genome analyses show that each new P. aeruginosa genome contributes many unique, strain-specific genes and that the species pan-genome contains a vast, unlimited gene pool, i.e., an open pan-genome [16, 17]. The accessory genome of P. aeruginosa includes niche-adaptive and mobile genetic elements (e.g., prophages, transposons, plasmids, genomic islands) acquired by horizontal transfer [18]. Yet nearly a third of its gene pool remains uncharacterized [1], hindering a comprehensive understanding of the genetic basis of the wide variety of diseases this pathogen causes and its ability to adapt successfully to disparate environments.

Here, we analyzed 1089 publicly available genomes to identify bacterial genetic signatures that distinguish P. aeruginosa derived from bloodstream infections and cystic fibrosis (CF) sputum. Altogether, our results indicate that P. aeruginosa causing disease in the bloodstream and CF lung is in part influenced by distinct sets of AMR and virulence genes, as well as specific genomic variants (single nucleotide polymorphisms (SNPs), unitigs, and accessory genes). Our findings will help inform efforts for accurate trait prediction of bacterial strains and guide diagnosis and targeted treatment of P. aeruginosa infections.

Methods

Genome collection, sequence quality assessment, and annotation

We retrieved 1115 publicly available P. aeruginosa genome assemblies (complete and draft) available as of March 2024 from the National Center for Biotechnology Information (NCBI). The genome collection consists of sequences from clinical bloodstream infections and CF sputum samples, spanning the years from 1999 to 2024, and originating from geographically diverse locations worldwide. Assembly quality was evaluated using QUAST v5.0.2 [19] and CheckM v1.1.3 [20]. Low-quality genomes with < 90% genome completeness or > 5% genome contamination were excluded, following recommended thresholds in CheckM [20]. We also removed assemblies consisting of > 200 contigs or those with an N50 < 40 kbp to ensure inclusion of only high-quality sequences. To confirm that the genomes are members of the same species, we calculated the genome-wide average nucleotide identity (ANI) for all genome pairs using fastANI v1.32, and any genome with < 95% ANI to the others was excluded [21].

Altogether, the final dataset consisted of 1089 high-quality genome assemblies (completeness 98.91–100%, contamination < 3.26%) (Table S1). Contig counts ranged from 1 to 199 and genome sizes ranged from approximately 5.5 to 7.5 Mb. The genome assemblies were subsequently annotated using Prokka v1.14.6 [22] with default parameters, producing standardized GFF files of coding sequences and other features for each genome.
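The exclusion criteria above can be expressed as a single predicate. The following is a minimal Python sketch, not the authors' pipeline: the AssemblyQC record, its field names, and the choice to summarize ANI as each genome's lowest pairwise value are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AssemblyQC:
    """Per-assembly quality metrics (hypothetical field names)."""
    name: str
    completeness: float   # percent genome completeness (e.g., from CheckM)
    contamination: float  # percent contamination (e.g., from CheckM)
    n_contigs: int        # number of contigs (e.g., from QUAST)
    n50_bp: int           # N50 in base pairs (e.g., from QUAST)
    min_ani: float        # assumed summary: lowest pairwise ANI (%) to the other genomes


def passes_qc(a: AssemblyQC) -> bool:
    """Return True if the assembly survives every exclusion rule stated in the text."""
    return (
        a.completeness >= 90.0      # exclude < 90% completeness
        and a.contamination <= 5.0  # exclude > 5% contamination
        and a.n_contigs <= 200      # exclude > 200 contigs
        and a.n50_bp >= 40_000      # exclude N50 < 40 kbp
        and a.min_ani >= 95.0       # exclude < 95% ANI
    )


# Made-up example records, not data from the study.
good = AssemblyQC("genome_A", 99.5, 0.8, 85, 210_000, 98.7)
fragmented = AssemblyQC("genome_B", 99.1, 0.5, 350, 25_000, 98.9)
kept = [a.name for a in (good, fragmented) if passes_qc(a)]
```

Keeping all thresholds in one function makes it easy to see which rule removed a given assembly when the filter is applied to a table of QUAST and CheckM results.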

Pan-genome analysis and phylogenetic tree reconstruction

We used Panaroo v1.2.7 [23] to define the entire set of genes (or pan-genome [24]) of the dataset (Table S2). Panaroo was run in strict mode with the option --remove-invalid-genes to discard annotations that do not conform to expected gene structure. All gene clusters identified by Panaroo were aligned at the nucleotide level using MAFFT v7.487 [25] and the core gene sequences were concatenated. SNPs in the core genome alignment were identified using SNP-sites v2.5.1 [26]. The core genome SNP alignment (n = 374,356 SNPs) was then used to build a maximum likelihood tree using IQ-TREE v2.1.2 [27] with a generalized time reversible (GTR) [28] model of nucleotide substitution, Gamma distribution of rate heterogeneity, and 1000 bootstrap replicates. Population structure was assessed with fastBAPS v1.0.6, a Bayesian hierarchical clustering method that partitions the dataset into clusters of genetically similar sequences [29]. Pairwise SNP distances among all genomes were calculated using snp-dists v0.8.2 (https://github.com/tseemann/snp-dists).

In silico sequence typing and detection of AMR and virulence genes

Sequence types (STs) were determined for all isolates using MLST v2.19.0 (https://github.com/tseemann/mlst). This tool extracts the sequences of seven housekeeping genes (acsA, aroE, guaA, mutL, nuoD, ppsA, trpE) [30] and compares their allelic profiles to the P. aeruginosa PubMLST database to assign a known ST for each genome [31]. We identified genes associated with AMR and virulence in each genome using ABRicate v1.0.1 (https://github.com/tseemann/abricate) (Table S3) to screen assembled genomes against the Comprehensive Antibiotic Resistance Database (CARD) [32] and the Virulence Factor Database (VFDB) [33]. Gene hits were considered present if they met the minimum alignment criteria of ≥80% sequence coverage and ≥80% nucleotide identity to reference entries.
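To illustrate how the ≥80% coverage and ≥80% identity criteria turn raw hits into per-genome gene calls, the Python sketch below applies both cut-offs to a list of hits. The tuple layout and the example hits are hypothetical and do not reproduce the output format of any particular screening tool.

```python
from collections import defaultdict

MIN_COVERAGE = 80.0  # percent of the reference gene covered by the alignment
MIN_IDENTITY = 80.0  # percent nucleotide identity to the reference entry


def genes_present(hits):
    """Map each genome to the set of genes whose hits meet both thresholds.

    hits: iterable of (genome, gene, coverage_pct, identity_pct) tuples
          (a hypothetical layout chosen for this example).
    """
    present = defaultdict(set)
    for genome, gene, coverage, identity in hits:
        if coverage >= MIN_COVERAGE and identity >= MIN_IDENTITY:
            present[genome].add(gene)
    return dict(present)


# Made-up hits for illustration only.
example_hits = [
    ("genome_A", "exoU", 100.0, 99.5),  # kept
    ("genome_A", "exoS", 75.0, 99.0),   # dropped: coverage below 80%
    ("genome_B", "exoS", 95.0, 80.0),   # kept: both values at or above 80%
]
calls = genes_present(example_hits)
```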

Genome-wide association analysis

We carried out a bacterial genome-wide association study (GWAS) on the 840 genomes derived from CF sputum and 249 genomes from bloodstream infection to identify genetic variants associated with each infection site. Three classes of genetic variation were considered: SNPs in the core genome, accessory gene presence/absence, and unitigs. Core genome SNPs were derived from the alignment of core genes described above. We converted the biallelic SNPs into a pedigree file format using VCFtools v0.1.16 [34]. Presence or absence of accessory genes was represented as binary variables based on the Panaroo pan-genome matrix and created using custom Python scripts. We performed a reference-free unitig analysis using the SKA toolkit (Split Kmer Analysis; https://github.com/simonrharris/SKA) v1.0, which splits each genome sequence into 31-bp k-mers and generates a presence/absence matrix of k-mers across all genomes. To reduce spurious associations, we excluded variants or k-mers that were rare (i.e., present in < 1% of genomes) or too common (i.e., present in > 99% of genomes) prior to association testing, as these provide little statistical power or information. We used PLINK v1.9 to convert and filter genotype data for all three variant types [35]. For all three GWAS, genomic variants were mapped to the P. aeruginosa reference genome PAO1 (NCBI Accession number NC_002516.2).
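The frequency filter described above (dropping variants present in < 1% or > 99% of genomes) can be sketched in a few lines of Python. This is an illustrative stand-in for the filtering step, not the authors' code; the dictionary-of-lists representation and the variant names are assumptions.

```python
def filter_by_frequency(matrix, low=0.01, high=0.99):
    """Keep variants whose presence frequency lies within [low, high].

    matrix: dict mapping a variant ID to a list of 0/1 calls, one per genome
            (a hypothetical in-memory form of a presence/absence matrix).
    """
    kept = {}
    for variant_id, calls in matrix.items():
        frequency = sum(calls) / len(calls)
        if low <= frequency <= high:
            kept[variant_id] = calls
    return kept


# Toy matrix over 200 genomes (made-up data).
n = 200
toy = {
    "rare_kmer": [1] + [0] * (n - 1),              # 0.5% of genomes: removed
    "common_gene": [1] * (n - 1) + [0],            # 99.5% of genomes: removed
    "informative_snp": [1] * 100 + [0] * (n - 100),  # 50% of genomes: kept
}
filtered = filter_by_frequency(toy)
```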

Genome-wide association testing was then carried out for each variant using GEMMA v0.98.3, which implements the efficient mixed-model association (EMMA) algorithm [36]. In the univariate linear mixed model, the binary phenotype (sample source) was treated as the outcome (coded 0/1) and each genetic variant (SNP, accessory gene, or unitig) as a fixed-effect predictor, with a kinship matrix generated to control for clonal population structure. No additional covariates (e.g., collection year, geographic region, clinical features) were considered because these types of metadata were incomplete and not uniformly available for several of the publicly available genomes in our dataset. We considered associations to be significant genome-wide if they surpassed a Bonferroni-corrected p-value threshold at α = 0.05. We used a single genome-wide significance threshold based on genome size for all three GWAS, following the method described by Chaguza et al. [37]. Thus, we included those genetic variants with p value < 7.98 × 10⁻⁹, calculated as α/G, where the statistical significance level α = 0.05 and the genome size G = 6,264,404 bp for the PAO1 reference genome. We used this conservative Bonferroni-corrected threshold to minimize false positives in a high-dimensional setting with extensive linkage and population structure, and to remain consistent with prior microbial GWAS using genome-wide thresholds based on genome size. We also used GEMMA to estimate the narrow-sense heritability, i.e., the overall proportion of phenotypic variability (in this study referring to bloodstream infection or CF) explained by variation in the bacterial genome.
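As a worked check of the threshold arithmetic, the short Python snippet below recomputes α/G from the values stated above. The variable and function names are illustrative only.

```python
import math

ALPHA = 0.05                # significance level stated in the text
GENOME_SIZE_BP = 6_264_404  # PAO1 reference genome size stated in the text

# Bonferroni-style threshold using genome size as the number of tests.
threshold = ALPHA / GENOME_SIZE_BP
print(f"threshold = {threshold:.2e}")  # prints "threshold = 7.98e-09"

# Height of the corresponding horizontal line on a -log10(p) Manhattan plot.
manhattan_line = -math.log10(threshold)


def is_genome_wide_significant(p_value: float) -> bool:
    """Return True if a p value falls below the genome-wide threshold."""
    return p_value < threshold
```

Using genome size rather than the number of variants tested gives one shared cut-off for SNPs, unitigs, and accessory genes, which is the design choice the text attributes to the cited method.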

To account for potential confounding effects resulting from clonal population structure, we carried out a second set of GWAS on phylogenetically matched pairs of genomes, whereby a pair is selected if two genomes are each other’s closest relative based on their location in the phylogeny (Fig. 1A) and the pair consists of one genome from bloodstream and another from CF sputum. In all, this matched dataset consisted of 110 genome pairs (Figure S1). GWAS of SNPs, unitigs, and accessory genes on this matched dataset was carried out using the methods described above for the unmatched dataset. Functions of genes detected in GWAS were further obtained from the UniProtKB database [38].
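The pairing rule (two genomes that are each other's closest relative and come from different infection sites) can be sketched as a mutual-nearest-neighbor search. The Python example below is a simplified illustration that assumes a symmetric pairwise distance table (for example, tree or SNP distances) is already available; the text does not specify how closeness was computed, and the function and genome names here are hypothetical.

```python
def matched_pairs(dist, source):
    """Return mutual-nearest-neighbor pairs whose members differ in source.

    dist:   dict of dicts with symmetric pairwise distances between genomes
    source: dict mapping each genome to its infection site label
    """
    # Nearest neighbor of every genome; ties are broken by genome name.
    nearest = {}
    for genome, row in dist.items():
        others = {other: d for other, d in row.items() if other != genome}
        nearest[genome] = min(others, key=lambda other: (others[other], other))

    pairs = []
    for genome, neighbor in nearest.items():
        is_mutual = nearest[neighbor] == genome
        differs = source[genome] != source[neighbor]
        if genome < neighbor and is_mutual and differs:  # report each pair once
            pairs.append((genome, neighbor))
    return sorted(pairs)


# Toy example with made-up distances.
toy_dist = {
    "a": {"a": 0, "b": 1, "c": 5, "d": 6},
    "b": {"a": 1, "b": 0, "c": 5, "d": 6},
    "c": {"a": 5, "b": 5, "c": 0, "d": 2},
    "d": {"a": 6, "b": 6, "c": 2, "d": 0},
}
toy_source = {"a": "blood", "b": "CF", "c": "blood", "d": "blood"}
pairs = matched_pairs(toy_dist, toy_source)
```

In the toy data, a and b are mutual nearest neighbors from different sites and form a pair, whereas c and d are mutual nearest neighbors from the same site and are skipped.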

Fig. 1.

Phylogenetic relationships and population structure of 1089 P. aeruginosa genomes show high diversity. (A) maximum likelihood phylogenetic tree constructed using single nucleotide polymorphisms obtained from the sequence alignment of 3597 core genes and 1416 soft-core genes. Tree scale represents the number of nucleotide substitutions per site. The tree is rooted at the midpoint. Colored branches represent the BAPS sequence clusters (PC1 - PC5). Outer rings (from innermost to outermost) represent the year of collection, sequence type (ST), geographical region, and infection source (bloodstream or cystic fibrosis [CF] sputum). For visual clarity, only the ten major STs are displayed and other less common STs are grouped together in the category “others”. Panels B – D: proportion of genomes from blood and sputum according to collection year (B), geographical region (C), and STs (D). Colors in panels B – D correspond to those in panel A. Details of genome assembly features and metadata associated with each genome are presented in Table S1

Statistical analysis and visualization

All data processing, statistical analyses, and generation of plots were carried out using R v4.3.1 [39]. Visualization of results was performed with the R packages ggplot2 v3.4.4, qqman v0.1.8 [40], and ggtree v3.4.0 [41]. Summaries of GWAS results were visualized using allele frequency distributions and Manhattan plots. For the unmatched dataset, the Wilcoxon signed-rank test was used to compare differences in the overall number of AMR and virulence genes per genome between bloodstream- and CF sputum-derived genomes, and the Fisher exact test was used for differences in the number of specific genes between the two groups. For the phylogenetically matched genome pairs, we applied McNemar’s test to the gene presence/absence data generated by Panaroo to identify genes overrepresented in bloodstream- or CF sputum-derived genomes. A likelihood ratio test was used to compare the genetic and environmental variance for SNPs, unitigs, and accessory genes between the two sets of genomes. We considered results with a p value ≤ 0.05 to be statistically significant. Unless otherwise noted, all bioinformatics tools and software were used with their default parameters.
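For matched pairs, McNemar's test uses only the discordant pairs, i.e., pairs in which a gene is present in one member and absent in the other. The Python sketch below implements the exact (binomial) form of the test with the standard library. The text does not state whether the exact or the chi-square form was used, so this choice, the function name, and the example counts are assumptions.

```python
from math import comb


def mcnemar_exact(pairs):
    """Exact two-sided McNemar test for one gene across matched pairs.

    pairs: list of (present_in_blood, present_in_cf) booleans, one per pair.
    Returns (b, c, p_value), where b counts blood-only pairs and c counts
    CF-only pairs. Concordant pairs do not contribute to the test.
    """
    b = sum(1 for blood, cf in pairs if blood and not cf)
    c = sum(1 for blood, cf in pairs if cf and not blood)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis, discordant pairs split 50:50 (binomial with p = 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Made-up example: 10 blood-only pairs, 0 CF-only pairs, 5 concordant pairs.
example = [(True, False)] * 10 + [(True, True)] * 5
b, c, p = mcnemar_exact(example)
```

When many genes are tested this way, the resulting p values would still need a multiple-testing correction, as indicated by the corrected p-values reported for the matched analysis.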

Results

High genetic diversity characterizes P. aeruginosa from two human diseases

We retrieved and annotated 1089 publicly available genome sequences of P. aeruginosa available from NCBI. The dataset comprised 840 genomes derived from CF sputum and 249 genomes from bloodstream infections (Fig. 1A; Table S1). Available metadata from public records were primarily collection year, country/region, and isolation source, although some genomes were missing this information. Detailed clinical variables (e.g., antibiotic exposure, immune status, CF severity metrics such as FEV1/exacerbation status, or whether bacteremia was primary versus secondary) were also not consistently available. The genomes were collected between 1999 and 2024 from 32 countries spanning six continents (Fig. 1B–C). As these are publicly available genomes, sampling is uneven across time and geographic regions (Fig. 1B–C; Table S1), and clinical metadata are limited. We therefore interpret pan-genome estimates as reflective of this dataset. The pan-genome consisted of 3597 core genes (present in ≥99% of genomes), 1416 soft core genes (95 to < 99%), 1983 shell genes (15 to < 95%), and 62,133 cloud genes (< 15%), totaling 69,129 unique genes (Table S2). These values reflect the extensive accessory gene content of P. aeruginosa and are consistent with those reported in previous studies [17, 42].
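The four pan-genome categories follow directly from the fraction of genomes carrying each gene. The Python sketch below encodes the cut-offs given above and checks that the reported category counts add up to the stated total; the function name and the example gene counts are illustrative assumptions.

```python
def pangenome_category(n_present: int, n_genomes: int) -> str:
    """Classify a gene by the fraction of genomes in which it is present."""
    fraction = n_present / n_genomes
    if fraction >= 0.99:
        return "core"       # present in >= 99% of genomes
    if fraction >= 0.95:
        return "soft core"  # 95% to < 99%
    if fraction >= 0.15:
        return "shell"      # 15% to < 95%
    return "cloud"          # < 15%


N_GENOMES = 1089  # dataset size stated in the text

# Hypothetical presence counts chosen to land in each category.
examples = {n: pangenome_category(n, N_GENOMES) for n in (1089, 1050, 500, 10)}

# Consistency check on the category counts reported in the text.
reported = {"core": 3597, "soft core": 1416, "shell": 1983, "cloud": 62_133}
total_genes = sum(reported.values())
```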

Bayesian hierarchical clustering of the core genome alignment using fastBAPS [29] partitioned the P. aeruginosa genomes into five major phylogenetic groups referred to as phylogenetic clusters 1–5 (PC1 – PC5) (Fig. 1A and Table S1). Two dominant clusters, PC5 and PC4, accounted for 69.4% and 28.9% of the dataset, respectively, while PC1–3 together comprised less than 2% of the dataset. The two large clusters are widely distributed across years and geographical regions (Fig. 1B–C). Bloodstream- and CF sputum-derived genomes were present and intermingled within each of PC4 and PC5. When separated by infection site, PC5 genomes made up 66.7% and 70.2% of the bloodstream and CF sputum genomes, respectively. PC4 genomes comprised 32.1% and 28.0% of the bloodstream and CF sputum genomes, respectively.

In silico multilocus sequence typing (MLST) identified 342 distinct STs in our dataset, with ten STs containing more than 20 genomes each (ST111, ST244, ST235, ST549, ST463, ST233, ST309, ST654, ST253, and ST357) (Fig. 1D and Table S1). These ten STs collectively accounted for 38.7% of all genomes. All ten STs were detected in both bloodstream-derived and CF sputum-derived genomes, albeit at different proportions (Fig. 1D). ST111, ST244, ST549, ST463, ST654, and ST233 are part of phylogenetic cluster PC5, whereas ST235, ST253, ST309, and ST357 are members of cluster PC4. ST235, ST111, and ST244 were frequently recovered from blood, while ST549, ST463, and ST309 were primarily found in CF sputum.

Overall, we found high phylogenetic diversity of P. aeruginosa with no predominance of specific lineages in either bloodstream infection or CF.

Distribution of AMR and virulence genes

We examined the distribution of AMR and virulence-associated genes between the two infection sites (Figure S2A–B, Table S3 and Table S4). Overall, we did not find a significant difference in the total number of AMR or virulence genes per genome (Figure S2A–B; p values > 0.05 for both sets of genes, Wilcoxon signed-rank test). However, we detected specific AMR genes that were significantly enriched in bloodstream-derived genomes: aer-1 (beta-lactam), aph(3’)-IIa (aminoglycoside), aph(6)-Ic (aminoglycoside), and blmt (bleomycin) (all with p < 0.05, Fisher exact test). AMR genes that were significantly enriched in CF sputum-derived genomes included genes encoding resistance to aminoglycosides (aac(6’)-Ib7, aph(3’’)-Ib) and beta-lactams (blaOXA-101, blaOXA-395, blaOXA-847, blaOXA-848, blaOXA-905, blaPDC-16, blaPDC-374, blaPDC-55, blaVIM-2) (all with p < 0.05, Fisher exact test).

We also identified virulence-associated genes that significantly differed between bloodstream- and CF sputum-derived genomes (Table S4). Most often detected in bloodstream-derived genomes were genes with functions related to type VI secretion system (hcpA), phenazine biosynthesis (phz), synthesis of the siderophore pyoverdine (pvdE), and synthesis of the O-antigen component of the lipopolysaccharide (wbp) (all with p < 0.05, Fisher exact test). Genes that were overrepresented in CF sputum-derived genomes include those associated with alginate production which is important in biofilm production (algP), type VI secretion system (fha1, ppkA, pscN), type III secretion system (pscP), flagellar assembly (fliR), pyoverdine synthesis and transport (fpvA, pvdD, pvdI, pvdJ), quorum sensing (lasI), synthesis of siderophore pyochelin (pchE, pchH), phenazine synthesis (phz), Type IV major pilin protein (pilA), synthesis of paerucumarin chromophore linked to biofilm formation (pvcABD), O-antigen synthesis (wbp, wzx, wzy, wzz) (all with p < 0.05, Fisher exact test).

The exoenzyme U (exoU) gene was more abundant in bloodstream-derived genomes (p value = 2.31 × 10⁻⁴, Fisher’s exact test), while exoS and exoY were more common in CF sputum-derived genomes (p value = 2.10 × 10⁻⁵ and 0.02885, respectively) (Table S3 and Table S4). The exoU gene encodes a potent A2 phospholipase cytotoxin responsible for disruption of the cellular membrane and cellular lysis, while exoS encodes a bifunctional cytotoxin that also functions in cell migration, apoptosis, phagocytosis and cytoskeleton organization. ExoS is involved in the dissemination of P. aeruginosa from the pneumonic lung to the bloodstream [43]. It has also been posited that exoU and exoS define ecologically distinct P. aeruginosa groups [44] and are rarely found together in the same strain [45], which may indicate an antagonistic relationship between the two genes [46] and/or environmental selection [47]. The exoY gene encodes a type III secretion system effector protein.

Overall, we found differential distribution of known AMR and virulence genes among the genomes from bloodstream infection and CF, which may reflect differences in antimicrobial selection pressures between the two diseases.

Genome wide association analysis reveals infection site-specific genetic signatures

We hypothesized that certain genetic variants beyond the presence or absence of AMR and virulence genes distinguish bloodstream- versus CF sputum-derived P. aeruginosa. To test this, we carried out GWAS using a linear mixed model in GEMMA [36] correcting for population structure to identify genomic variants associated with infection site. Three types of genetic variants were analyzed – core genome SNPs, unitigs, and accessory genes.

SNP-based GWAS (Fig. 2A and Table S5) revealed 297 significant SNPs that were clustered into five discrete genomic regions. Five of these regions were enriched in bloodstream-derived genomes, while a single region was enriched in sputum-derived genomes. Bloodstream-enriched regions included pgrR, a regulator of iron acquisition and virulence; cmtAb, which is involved in the degradation of aromatic hydrocarbons through the p-cumate degradation pathway; fdhA, which encodes a subunit of formate dehydrogenase important for redox balance during anaerobic respiration; and emrB which is associated with multidrug efflux pumps. These genes reflect general functions related to iron homeostasis, oxidative stress response, and nutrient processing that may support survival in bloodstream environments. The function of the ydbD gene is not characterized in UniProt [38].

Fig. 2.

Genome-wide association analysis uncovers loci significantly associated with bloodstream infection and CF sputum. Manhattan plots of (A) single nucleotide polymorphisms (SNPs) in the core genome, (B) unitigs, and (C) accessory genes associated with clinical source and determined using GEMMA. In all three panels, log-transformed p-values (–log₁₀ P) are shown and mapped to P. aeruginosa reference genome PAO1 (represented by the x-axis). The genome-wide significance threshold is indicated by the dashed horizontal line. Loci with significant results and with known functions are labeled. Details of each GWAS are presented in Tables S5–S7

Unitig-based GWAS (Fig. 2B and Table S6) detected 146 significant unitigs. Unitigs associated with bloodstream infection were mapped to genes involved in key survival pathways in systemic infection. These included gltK, an amino acid transporter important for nutrient acquisition; clpX, a protease subunit involved in protein quality control and stress response; and dltA, a gene implicated in cell membrane adaptation and resistance to host defenses. Genes associated with CF are linked to mechanisms for survival in the respiratory environment. These genes include cdiA, encoding an exoprotein involved in contact-dependent growth inhibition, a bacterial mechanism used for competition and defense; tpx, encoding thiol peroxidase, which removes toxic peroxides and functions in protection against oxidative stress; and ttgC, encoding an efflux pump outer membrane protein.

GWAS of accessory genes (Fig. 2C and Table S7) was carried out using the set of genes present in 15% to 95% of genomes (n = 1,983 accessory genes). Six genes demonstrated significant association with infection site. Three genes were significantly associated with CF: oprB encoding a carbohydrate-selective porin that facilitates the transport of sugars, zwf encoding glucose-6-phosphate 1-dehydrogenase that functions in glucose metabolism and oxidative stress response, and imuB that functions in error-prone DNA repair and inducible mutagenesis. The gene imuB is also known to contribute to the generation of ciprofloxacin-resistance mutations in P. aeruginosa [48]. Significantly associated with bloodstream infection is the gene pstB, which encodes the ATPase component of the phosphate-specific transport system.

To account for the confounding effect of clonal population structure and to increase statistical power of genotype-phenotype associations, we carried out a second GWAS using 110 pairs of phylogenetically matched genomes (Figures S1 and S3). A total of 19 out of 192,325 core genome SNPs analyzed, 4864 out of 349,026 unitigs, and 434 out of 1758 accessory genes were significantly associated with infection site (Tables S8, S9, S10). GWAS of matched bloodstream- and CF sputum-derived genomes confirmed the results of the GWAS of the unmatched genomes (Fig. 2) and uncovered additional loci not identified in GWAS of unmatched genomes (Table 1). The additional loci with the most significant associations had functions related to menaquinone (vitamin K2) biosynthesis, peptidoglycan remodeling and cell wall growth, and lipid transport.

Table 1.

Significantly enriched genes in bloodstream- and cystic fibrosis (CF) sputum-derived P. aeruginosa genomes based on gene presence/absence analysis of 110 phylogenetically matched genome pairs. These genes represent the top discordant genes identified using McNemar’s test, indicating that they are significantly more likely to be present in isolates from one infection site compared to the other. Details of the GWAS for the matched pairs are presented in Tables S8–S10

| Gene | Gene function | Corrected p-value (McNemar’s test) | Enriched in |
|---|---|---|---|
| adh | Alcohol dehydrogenase; involved in ethanol and aldehyde metabolism | 1.83 × 10⁻¹⁷ | Bloodstream |
| anoR | Transcriptional regulator of anaerobic respiration and quorum sensing | 8.95 × 10⁻¹⁹ | Bloodstream |
| cdiA | Contact-dependent growth inhibition toxin | 5.72 × 10⁻²⁰ | Bloodstream |
| cyaA | Adenylate cyclase; converts ATP to cAMP, important for virulence regulation | 5.72 × 10⁻²⁰ | Bloodstream |
| exaA | Pyrroloquinoline quinone (PQQ)-dependent ethanol dehydrogenase | 2.93 × 10⁻¹⁷ | Bloodstream |
| gcd | Glucose dehydrogenase; involved in carbon metabolism | 1.31 × 10⁻²¹ | Bloodstream |
| hcpA_1 | Hemolysin co-regulated protein; effector of type VI secretion system (T6SS) | 9.15 × 10⁻¹⁹ | Bloodstream |
| hcpA_2 | Hemolysin co-regulated protein; effector of type VI secretion system (T6SS) | 3.66 × 10⁻¹⁸ | CF sputum |
| imuB | DNA polymerase V subunit; involved in SOS response and error-prone repair | 6.98 × 10⁻²⁴ | Bloodstream |
| menF | Isochorismate synthase; involved in menaquinone (vitamin K2) biosynthesis | 1.19 × 10⁻¹⁹ | Bloodstream |
| mepM | Murein endopeptidase; peptidoglycan remodeling, required for cell wall growth | 1.31 × 10⁻²¹ | Bloodstream |
| mpl | Murein peptide ligase; peptidoglycan recycling | 1.41 × 10⁻¹⁶ | Bloodstream |
| oprB_2 | Carbohydrate-selective outer membrane porin | 1.12 × 10⁻²² | Bloodstream |
| oprB_3 | Carbohydrate-selective outer membrane porin | 7.70 × 10⁻¹⁴ | CF sputum |
| phzA2 | Phenazine biosynthesis | 2.34 × 10⁻¹⁹ | CF sputum |
| phzB1 | Phenazine biosynthesis | 2.79 × 10⁻²³ | CF sputum |
| phzB2 | Phenazine biosynthesis | 6.10 × 10⁻²⁰ | CF sputum |
| phzD2 | Phenazine biosynthesis | 3.07 × 10⁻¹⁹ | Bloodstream |
| phzF | Phenazine biosynthesis | 1.75 × 10⁻¹⁸ | Bloodstream |
| phzG | Phenazine biosynthesis | 5.36 × 10⁻¹⁵ | Bloodstream |
| phzM | Phenazine biosynthesis | 7.15 × 10⁻²¹ | CF sputum |
| pipB2 | Type III effector, may influence intracellular trafficking | 3.66 × 10⁻¹⁸ | Bloodstream |
| potE | Putrescine transporter and ornithine antiporter | 7.32 × 10⁻¹⁸ | Bloodstream |
| pstA1 | Membrane component of phosphate-specific ABC transporter | 5.72 × 10⁻²⁰ | Bloodstream |
| pstB | ATP-binding component of Pst phosphate transporter; regulates uptake of inorganic phosphate | 5.72 × 10⁻²⁰ | CF sputum |
| sasA | Histidine kinase; involved in stress response | 4.47 × 10⁻²² | CF sputum |
| shlB | Hemolysin transporter, outer membrane component | 9.15 × 10⁻¹⁹ | Bloodstream |
| ttgI | Multidrug efflux transporter component | 1.31 × 10⁻²¹ | CF sputum |
| yebT | Lipid transporter | 3.07 × 10⁻¹⁹ | Bloodstream |
| zwf | Glucose-6-phosphate dehydrogenase (oxidative pentose phosphate pathway) | 1.12 × 10⁻²² | Bloodstream |

Notably, two variants of the hcpA gene (labeled hcpA_1 and hcpA_2 in Table 1), which encodes a hemolysin co-regulated protein involved in the type VI secretion system, exhibited contrasting distributions: one variant was associated with bloodstream infection, while the other was associated with CF. Despite sharing the same predicted function, they differ by 5% in their amino acid sequences. Variants of the oprB gene (labeled oprB_2 and oprB_3 in Table 1), which encodes a carbohydrate-selective outer membrane porin, showed similar results. The two oprB variants, which differ by 1% in their amino acid sequences, were each associated with either bloodstream-derived or CF-derived genomes.

Overall, we found specific genomic variants (core genome SNPs, unitigs, accessory genes) that were significantly associated with either bloodstream or CF infection.

Heritable genetic variation contributes to infection site differences in P. aeruginosa

We calculated the narrow-sense heritability (h2, range = 0 to 1) to quantify the contributions of P. aeruginosa genetic variation to site-specific infection (Fig. 3A and Table S11). High heritability values suggest that a locus or variant is under strong natural selection pressure. We found moderately high heritability for infection of different body sites based on the SNP variation in the core genome (h2 = 0.737, standard error [SE] = ±0.022) and accessory genes (h2 = 0.612, SE = ±0.047), but low heritability based on unitig variation (h2 = 0.25, SE = ±0.056). This means that SNP, accessory gene, and unitig variation contribute 73.7%, 61.2%, and 25.0%, respectively, of the phenotypic variability in terms of disease types. We also examined the underlying components of the variance to understand the patterns of heritability (Fig. 3B). For SNPs, the genetic variance (vg = 0.475) was much higher than the residual (environmental or unexplained) variance (ve = 0.019) (p = 2.2 × 10⁻⁹⁴, likelihood ratio test). Similar results were observed for accessory genes (vg = 0.571 and ve = 0.102) (p = 1 × 10⁻²³). In contrast, we did not observe a significant difference between the genetic and residual variance for unitigs (vg = 0.231 and ve = 0.140) (p = 0.0537). These results show that the type of disease (bloodstream or CF) caused by P. aeruginosa is largely attributable to genetic variation due to core genome SNPs and accessory genes.

Fig. 3.

Heritability and variance explained by genetic features associated with infection source. (A) Bar plot showing narrow-sense heritability (h2) estimates for core genome SNPs, unitigs, and accessory genes derived from a linear mixed model implemented in GEMMA. (B) Grouped bar plot showing the variance components (genetic variance and environmental variance) for each genomic feature (core genome SNPs, unitigs, accessory genes). Statistical significance was assessed using a likelihood ratio test, with a p value threshold of ≤ 0.05. Details of the heritability estimates are presented in Table S11.

Discussion

P. aeruginosa is a formidable opportunist that causes a variety of life-threatening infections. Here, we identified genomic variants that distinguish strains from bloodstream infections from those from CF sputum. Both bloodstream and CF populations of P. aeruginosa were remarkably diverse, and such heterogeneity within the species has previously been described at various scales, from global to within-host [49–51]. We did not find lineage-specific differences in infection sites: the two major phylogenetic clusters (PC4 and PC5) and the ten most prevalent STs in our dataset each included genomes from both bloodstream infection and CF sputum. This suggests that any one lineage has the potential to cause either bloodstream infection or CF lung infection, emphasizing the opportunistic lifestyle of P. aeruginosa. This is a concern because six of the predominant clones (ST111, ST244, ST235, ST309, ST253, ST357; Fig. 1) in bloodstream infections and CF are among the top ten globally disseminated epidemic clones associated with multidrug- and extensively drug-resistant phenotypes (referred to as high-risk clones) [49, 52].

P. aeruginosa causing disease in distinct body sites carries a distinct set of AMR and virulence genes as well as specific genomic variants (SNPs, unitigs, and accessory genes) that may largely reflect treatment-driven selection pressures and distinct bacterial evolutionary histories in acute versus chronic infections. In the CF lung, long-term antibiotic exposure and the viscous mucus favor genetic variants associated with enhanced biofilm production, motility and iron scavenging [53], whereas in the bloodstream the pathogen must evade immune defenses and exploit host nutrients [54]. Specifically, the management of CF lung infections and bloodstream infections involves distinct antibiotic regimens. CF isolates are subjected to chronic, suppressive, often inhaled antibiotics such as tobramycin and aztreonam [55]. In contrast, patients with bloodstream infection undergo aggressive, broad-spectrum intravenous therapy such as carbapenems and advanced beta-lactam/beta-lactamase inhibitor combinations [56, 57]. The genetic associations identified, particularly those involving AMR genes and linked loci, likely reflect selection by the differential antibiotic pressures to which P. aeruginosa is exposed. Moreover, isolates from these two sources represent fundamentally different evolutionary histories. CF isolates often reflect chronic, adapted lineages that have evolved in situ over years, accumulating classic pathoadaptive mutations (e.g., in mucA, lasR) [58, 59]. In contrast, bloodstream infections caused by P. aeruginosa are typically acute events. It is highly uncommon for P. aeruginosa bacteremia to become a chronic, persistent state without a focal primary infection (e.g., an infected prosthetic device) [60]. Moreover, a recent study identified a P. aeruginosa small RNA (sicX) that helps coordinate the transition between chronic and acute infection in human-derived samples, highlighting that regulatory shifts can contribute to infection phenotypes alongside genomic variation [61]. This acute/chronic dichotomy is therefore a powerful driver of genetic divergence. The genomic variants we report in this study will be an important basis for future experimental validation efforts to better understand the pathogenicity of P. aeruginosa. Our study expands existing knowledge about bacterial genetic variation from ecologically distinct infection sites, as has been reported in Streptococcus agalactiae [62], Helicobacter pylori [63], and Neisseria meningitidis [64].

In microbes, GWAS is a powerful approach to precisely identify genomic variants that are significantly associated with binary phenotypes, such as AMR, host specificity, and metabolic growth [65]. GWAS in P. aeruginosa has been limited, but previous studies have revealed important insights into its clinically relevant features, such as resistance to ceftazidime-avibactam [66] and murepavadin [67] as well as biofilm production [68]. Our study contributes to this growing work on applying GWAS to understand clinically relevant traits in P. aeruginosa. A GWAS comparing CF and non-CF genomes, the latter encompassing all other kinds of diseases in humans, counted 31-bp k-mers and identified specific deletions, insertions and single-nucleotide replacements that underlie CF infection [53]. These mutations include those associated with alginate production, glucose catabolism, multidrug efflux systems, mannitol utilization, cysteine uptake, and heme uptake [53]. Some of the loci and/or functions that we found to be associated with CF are consistent with the results of this previous study, such as mutations related to the pentose phosphate pathway, ABC transporters, and glycolysis/gluconeogenesis. Yet we also uncovered additional loci associated with CF and distinguished them from those associated with bloodstream infections (described in the next paragraph). We attribute these findings to our GWAS approach of using SNPs, unitigs, and accessory genes in both unmatched and matched genomes from two distinct diseases, which provided a more comprehensive, fine-scale picture of these associations. Compared with prior k-mer–based approaches, our phylogenetically matched-pair design reduces confounding due to shared ancestry by comparing nearest-neighbor genomes across infection sources, which can help refine signals to allelic variants (e.g., distinct hcpA and oprB variants) and highlight loci that persist after controlling for clonal structure.
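The idea behind a matched-pair design, comparing nearest-neighbor genomes across infection sources, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a simple greedy one-to-one matching on precomputed pairwise distances, and the function name `match_nearest_pairs`, the genome labels, and the distance values are all hypothetical.

```python
from itertools import product


def match_nearest_pairs(dist, source):
    """Greedily pair each bloodstream genome with its closest unused CF genome.

    dist   -- dict mapping frozenset({genome_a, genome_b}) to a distance
    source -- dict mapping genome name to "bloodstream" or "CF"
    Returns a list of (bloodstream_genome, cf_genome, distance) tuples.
    """
    blood = [g for g, s in source.items() if s == "bloodstream"]
    cf = [g for g, s in source.items() if s == "CF"]

    # Enumerate all cross-source candidate pairs, closest first.
    candidates = sorted(
        (dist[frozenset((b, c))], b, c) for b, c in product(blood, cf)
    )

    used, pairs = set(), []
    for d, b, c in candidates:
        # Accept a pair only if neither genome has been matched already.
        if b not in used and c not in used:
            pairs.append((b, c, d))
            used.update((b, c))
    return pairs


# Toy example with made-up genomes and distances (illustration only).
source = {"B1": "bloodstream", "B2": "bloodstream",
          "C1": "CF", "C2": "CF", "C3": "CF"}
dist = {
    frozenset(("B1", "C1")): 0.002,
    frozenset(("B1", "C2")): 0.010,
    frozenset(("B1", "C3")): 0.030,
    frozenset(("B2", "C1")): 0.004,
    frozenset(("B2", "C2")): 0.006,
    frozenset(("B2", "C3")): 0.020,
}
pairs = match_nearest_pairs(dist, source)
print(pairs)
```

Greedy matching is used here only for brevity. Genomes left without a partner (C3 in the toy data) are dropped from the matched set, and a real analysis would typically add a maximum-distance cutoff so that only closely related genomes are paired.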

Genomic variants that significantly differ between bloodstream and CF sputum isolates of P. aeruginosa are associated with diverse functional categories, including nutrient biosynthesis and transport, stress response, redox activity, and defense systems. For instance, several genes involved in phenazine biosynthesis were differentially enriched, consistent with roles in redox cycling and virulence regulation under oxygen-limited and nutrient-restricted conditions, though enrichment is also influenced by chronicity or treatment pressures [69, 70]. Genes encoding putrescine and phosphate transporters as well as ethanol and glucose dehydrogenases were also differentially distributed, emphasizing the importance of metabolic flexibility and nutrient uptake. This interpretation is supported by recent infection metabolomics work in Gram-negative bloodstream infections, where an iterative metabolomics pipeline detected bacterially derived metabolites in patient samples and identified a specific bacterial metabolic enzyme (SpeG) as a determinant of bloodstream infection pathogenesis and a potential therapeutic target [71]. The siderophore pigment pyoverdine enhances the bacterium’s ability to withstand oxidative stress in CF infections [72, 73]. Interestingly, the presence of divergent sequence variants of the same gene (hcpA and oprB), which exhibited site-specific enrichment in either bloodstream or CF sputum isolates, suggests that even subtle allelic differences may contribute to infection site specialization. We emphasize that, as with all microbial GWAS, these associations are hypothesis-generating and will require targeted experimental validation to confirm causal mechanisms. Future work investigating the utility of hcpA and oprB as potential biomarkers for predicting infection site or adaptation strategies in clinical settings would be valuable.
In practice, infection-site–associated signatures could be incorporated into genomic surveillance pipelines to support source attribution and to monitor the emergence of lineages and trait combinations over time. For bloodstream infections specifically, population-based sequencing studies have demonstrated how genomics can track dominant clones/serotypes and changes in incidence over time in geographically defined regions, supporting surveillance use cases [74]. In addition, recent work integrating bacterial genomic features with clinical data showed that virulence genotypes can improve prediction of severe outcomes in P. aeruginosa bloodstream infection, suggesting that genomic markers may ultimately contribute to prognostic models alongside clinical covariates [75]. Nevertheless, any clinical deployment of site-associated markers (including hcpA/oprB variants identified here) will require prospective validation in cohorts with linked patient metadata and standardized sampling.

Heritability estimates of the genomic variation in SNPs (0.737) and accessory genes (0.612) associated with infection site are moderately high. These results indicate that the identified variants are under strong selection pressure, as expected in these unique ecological niches, and therefore highlight the major contribution of P. aeruginosa genetic variation to disease type. Our heritability estimates are comparable to those reported for similar infection-related traits in other bacteria. For example, genetic variation in Streptococcus pneumoniae explains 70% of the variation in invasive potential for pneumococcal meningitis, but has no effect on severity [76]. Likewise, genetic variation in Staphylococcus aureus accounts for variation in pyomyositis, a bacterial infection of skeletal muscles that leads to abscess formation, with 63.8% heritability driven largely by the presence of the Panton–Valentine leukocidin toxin gene [77]. However, the contribution of pathogen genetics to disease is not always significant, as was observed in a study of N. meningitidis in which pathogen genetic variation did not account for variance in disease severity of meningococcal meningitis [78]. In our study, we show that a large proportion of the variation in disease type is due to heritable variation in multiple loci in P. aeruginosa. This knowledge is critical to understanding the underlying bacterial factors that contribute to bloodstream infection and CF lung infection, which is imperative for developing effective strategies for disease control and treatment.

We recognize the limitations of our study. First, our genome collection is unevenly distributed across geographical regions, time, and disease types. Incomplete metadata (e.g., patient treatment history, disease outcomes, co-infections, co-morbidities) also limits interpretation of our findings. In particular, without patient-level antibiotic exposure and clinical severity data, we cannot distinguish whether some AMR or virulence gene enrichments reflect within-host treatment histories or broader niche-associated selection. Unequal representation of environments can skew estimates of core and accessory gene content, which may consequently influence GWAS outcomes, and uneven sampling across years and regions may also influence inferred population structure and lineage effects. In addition, our association framework focuses primarily on chromosomal variation mapped to PAO1 and presence/absence calls, and does not explicitly resolve the contribution of mobile genetic elements (e.g., plasmids or integrative conjugative elements) that may also shape niche adaptation. Future studies focused on the contributions of horizontal gene transfer via homologous recombination and mobile genetic elements in generating bacterial genetic variation between infection sites will be insightful. Nonetheless, our phylogenetically matched-pair approach helps mitigate the confounding effect of population structure and improves power to detect genetic signals. Second, the absence of detailed patient antibiotic treatment histories is a significant limitation, as many AMR gene associations may primarily reflect exposure regimens (e.g., chronic inhaled therapies in CF versus broad-spectrum therapies in bloodstream infections). Third, as with all GWAS, we have identified candidate loci but not proven causality. Distinguishing causal genetic variants from linked, non-causal variants requires extensive experimental validation (e.g., mutagenesis, phenotype assays) to confirm the contributions of the loci we identified to disease types. Nonetheless, our study presents a first step toward prioritizing loci that may play a role in disease and uncovering as yet unknown variants that may contribute to P. aeruginosa infection. Our findings should be considered hypothesis-generating and require functional and clinical validation to confirm causal roles in infection. This work also provides a framework to investigate other traits related to disease and pathogenicity not only in P. aeruginosa but also in other pathogens with complex ecologies.

Conclusions

In conclusion, our study reveals the loci that are enriched in P. aeruginosa derived from CF sputum and from bloodstream infections. The observed genetic associations likely reflect, in large part, differential antibiotic selection pressure and the differences between acute and chronic isolates. The genomic heterogeneity of P. aeruginosa from distinct disease types underscores its opportunistic lifestyle and ecological versatility, and this knowledge is critical to improving our ability to manage infections and optimize patient outcomes. These results will help inform efforts for accurate trait prediction of bacterial strains and have direct implications for designing effective and targeted treatment strategies.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (60.5MB, xlsx)

Acknowledgements

We are grateful to the staff of the SUNY University at Albany Information Technology Services where all bioinformatics analyses were carried out. C.P.A. thanks Reginald and Priscilla Mae Farnsworth for insightful discussion and assistance.

Abbreviations

AMR

Antimicrobial resistance

CF

Cystic fibrosis

GWAS

Genome-wide association study

SNP

Single nucleotide polymorphism

ST

Sequence type

Author contributions

C.P.A. designed and guided the work. S.T.C. and S.K.L. carried out all bioinformatics analyses. S.T.C. wrote the initial manuscript. All authors have read and approved the final manuscript.

Funding

This work was supported by the National Institutes of Health (R35GM142924) to C.P.A. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript, and the findings do not necessarily reflect the views and policies of the authors’ institutions and funders.

Data availability

The dataset supporting the conclusions of this article is included within the article and its supplementary files. Genome sequence data of P. aeruginosa included in this study are available in the NCBI Sequence Read Archive. BioProject and BioSample accession numbers for each genome are listed in Table S1.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Letizia M, Diggle SP, Whiteley M. Pseudomonas aeruginosa: ecology, evolution, pathogenesis and antimicrobial susceptibility. Nat Rev Microbiol. 2025;1–17. 10.1038/s41579-025-01193-8.
  • 2. Qin S, Xiao W, Zhou C, Pu Q, Deng X, Lan L, et al. Pseudomonas aeruginosa: pathogenesis, virulence factors, antibiotic resistance, interaction with host, technology advances and emerging therapeutics. Signal Transduct Target Ther. 2022;7:199. 10.1038/s41392-022-01056-1.
  • 3. Kanamori H, Rutala WA, Weber DJ. The role of patient care items as a fomite in healthcare-associated outbreaks and infection prevention. Clin Infect Dis. 2017;65:1412–19. 10.1093/cid/cix462.
  • 4. Kidd TJ, Magalhães RJS, Paynter S, Bell SC. The social network of cystic fibrosis centre care and shared Pseudomonas aeruginosa strain infection: a cross-sectional analysis. Lancet Respir Med. 2015;3:640–50. 10.1016/S2213-2600(15)00228-3.
  • 5. Wheatley RM, Caballero JD, van der Schalk TE, De Winter FHR, Shaw LP, Kapel N, et al. Gut to lung translocation and antibiotic mediated selection shape the dynamics of Pseudomonas aeruginosa in an ICU patient. Nat Commun. 2022;13:6523. 10.1038/s41467-022-34101-2.
  • 6. Dropulic LK, Lederman HM. Overview of infections in the immunocompromised host. Microbiol Spectr. 2016;4:10.1128/microbiolspec.dmih2–0026–2016. 10.1128/microbiolspec.dmih2-0026-2016.
  • 7. Hernández-Jiménez P, López-Medrano F, Fernández-Ruiz M, Silva JT, Corbella L, San-Juan R, et al. Risk factors and outcomes for multidrug resistant Pseudomonas aeruginosa infection in immunocompromised patients. Antibiotics. 2022;11:1459. 10.3390/antibiotics11111459.
  • 8. Murray JL, Kwon T, Marcotte EM, Whiteley M. Intrinsic antimicrobial resistance determinants in the superbug Pseudomonas aeruginosa. mBio. 2015;6:10.1128/mbio.01603–15. 10.1128/mbio.01603-15.
  • 9. Coleman SR, Blimkie T, Falsafi R, Hancock REW. Multidrug adaptive resistance of Pseudomonas aeruginosa swarming cells. Antimicrob Agents Chemother. 2020;64:10.1128/aac.01999–19. 10.1128/aac.01999-19.
  • 10. Thrane SW, Taylor VL, Freschi L, Kukavica-Ibrulj I, Boyle B, Laroche J, et al. The widespread multidrug-resistant serotype O12 Pseudomonas aeruginosa clone emerged through concomitant horizontal transfer of serotype antigen and antibiotic resistance gene clusters. mBio. 2015;6:10.1128/mbio.01396–15. 10.1128/mbio.01396-15.
  • 11. Maxwell DN, Kim J, Pybus CA, White L, Medford RJ, Filkins LM, et al. Clinically undetected polyclonal heteroresistance among Pseudomonas aeruginosa isolated from cystic fibrosis respiratory specimens. J Antimicrob Chemother. 2022;77:3321–30. 10.1093/jac/dkac320.
  • 12. Saha P, Kabir RB, Ahsan CR, Yasmin M. Multidrug resistance of Pseudomonas aeruginosa: do virulence properties impact on resistance patterns? Front Microbiol. 2025;16. 10.3389/fmicb.2025.1508941.
  • 13. World Health Organization. WHO bacterial priority pathogens list, 2024: bacterial pathogens of public health importance to guide research, development and strategies to prevent and control antimicrobial resistance. 2024.
  • 14. Miller WR, Arias CA. ESKAPE pathogens: antimicrobial resistance, epidemiology, clinical impact and therapeutics. Nat Rev Microbiol. 2024;22:598–616. 10.1038/s41579-024-01054-w.
  • 15. GBD 2021 Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance 1990–2021: a systematic analysis with forecasts to 2050. Lancet. 2024;404:1199–226. 10.1016/S0140-6736(24)01867-1.
  • 16. Mosquera-Rendón J, Rada-Bravo AM, Cárdenas-Brito S, Corredor M, Restrepo-Pineda E, Benítez-Páez A. Pangenome-wide and molecular evolution analyses of the Pseudomonas aeruginosa species. BMC Genomics. 2016;17:45. 10.1186/s12864-016-2364-4.
  • 17. Freschi L, Vincent AT, Jeukens J, Emond-Rheault J-G, Kukavica-Ibrulj I, Dupont M-J, et al. The Pseudomonas aeruginosa pan-genome provides new insights on its population structure, horizontal gene transfer, and pathogenicity. Genome Biol Evol. 2019;11:109–20. 10.1093/gbe/evy259.
  • 18. Kung VL, Ozer EA, Hauser AR. The accessory genome of Pseudomonas aeruginosa. Microbiol Mol Biol Rev. 2010;74:621–41. 10.1128/MMBR.00027-10.
  • 19. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–75. 10.1093/bioinformatics/btt086.
  • 20. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–55. 10.1101/gr.186072.114.
  • 21. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9:5114. 10.1038/s41467-018-07641-9.
  • 22. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–69. 10.1093/bioinformatics/btu153.
  • 23. Tonkin-Hill G, MacAlasdair N, Ruis C, Weimann A, Horesh G, Lees JA, et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 2020;21:180. 10.1186/s13059-020-02090-4.
  • 24. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 2005;15:589–94. 10.1016/j.gde.2005.09.006.
  • 25. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80. 10.1093/molbev/mst010.
  • 26. Page AJ, Taylor B, Delaney AJ, Soares J, Seemann T, Keane JA, et al. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microb Genom. 2016;2:e000056. 10.1099/mgen.0.000056.
  • 27. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–34. 10.1093/molbev/msaa015.
  • 28. Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures on mathematics in the life sciences. 1986;17:57–86.
  • 29. Tonkin-Hill G, Lees JA, Bentley SD, Frost SDW, Corander J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res. 2019;47:5539–49. 10.1093/nar/gkz361.
  • 30. Curran B, Jonas D, Grundmann H, Pitt T, Dowson CG. Development of a multilocus sequence typing scheme for the opportunistic pathogen Pseudomonas aeruginosa. J Clin Microbiol. 2004;42:5644–49. 10.1128/JCM.42.12.5644-5649.2004.
  • 31. Jolley KA, Bray JE, Maiden MCJ. Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications. Wellcome Open Res. 2018;3:124. 10.12688/wellcomeopenres.14826.1.
  • 32. Alcock BP, Huynh W, Chalil R, Smith KW, Raphenya AR, Wlodarski MA, et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 2023;51:D690–9. 10.1093/nar/gkac920.
  • 33. Liu B, Zheng D, Zhou S, Chen L, Yang J. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Res. 2022;50:D912–7. 10.1093/nar/gkab1107.
  • 34. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–58. 10.1093/bioinformatics/btr330.
  • 35. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75. 10.1086/519795.
  • 36. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–24. 10.1038/ng.2310.
  • 37. Chaguza C, Smith JT, Bruce SA, Gibson R, Martin IW, Andam CP. Prophage-encoded immune evasion factors are critical for Staphylococcus aureus host infection, switching, and adaptation. Cell Genom. 2022;2:100194. 10.1016/j.xgen.2022.100194.
  • 38. UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51:D523–31. 10.1093/nar/gkac1052.
  • 39. R Core Team. R: a language and environment for statistical computing. 2025. https://www.R-project.Org/.
  • 40. Turner SD. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. J Open Source Softw. 2018;3:731. 10.21105/joss.00731.
  • 41. Yu G, Smith DK, Zhu H, Guan Y, Lam T-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017;8:28–36. 10.1111/2041-210X.12628.
  • 42. Whelan FJ, Hall RJ, McInerney JO. Evidence for selection in the abundant accessory gene content of a prokaryote pangenome. Mol Biol Evol. 2021;38:3697–708. 10.1093/molbev/msab139.
  • 43. Rangel SM, Diaz MH, Knoten CA, Zhang A, Hauser AR. The role of ExoS in dissemination of Pseudomonas aeruginosa during pneumonia. PLoS Pathog. 2015;11:e1004945. 10.1371/journal.ppat.1004945.
  • 44. Ozer EA, Nnah E, Didelot X, Whitaker RJ, Hauser AR. The population structure of Pseudomonas aeruginosa is characterized by genetic isolation of exoU+ and exoS+ lineages. Genome Biol Evol. 2019;11:1780–96. 10.1093/gbe/evz119.
  • 45. Nolasco-Romero CG, Prado-Galbarro F-J, Jimenez-Juarez RN, Gomez-Ramirez U, Cancino-Díaz JC, López-Marceliano B, et al. The exoS, exoT, exoU and exoY virulotypes of the type 3 secretion system in multidrug resistant Pseudomonas aeruginosa as a death risk factor in pediatric patients. Pathogens. 2024;13:1030. 10.3390/pathogens13121030.
  • 46. Beavan AJS, Domingo-Sananes MR, McInerney JO. Contingency, repeatability, and predictability in the evolution of a prokaryotic pangenome. Proc Natl Acad Sci USA. 2024;121:e2304934120. 10.1073/pnas.2304934120.
  • 47. Fondi M, Karkman A, Tamminen MV, Bosi E, Virta M, Fani R, et al. “Every gene is everywhere but the environment selects”: global geolocalization of gene sharing in environmental samples through network analysis. Genome Biol Evol. 2016;8:1388–400. 10.1093/gbe/evw077.
  • 48. Fahey D, O’Brien J, Pagnon J, Page S, Wilson R, Slamen N, et al. DinB (DNA polymerase IV), ImuBC and RpoS contribute to the generation of ciprofloxacin-resistance mutations in Pseudomonas aeruginosa. Mutat Res. 2023;827:111836. 10.1016/j.mrfmmm.2023.111836.
  • 49. Weimann A, Dinan AM, Ruis C, Bernut A, Pont S, Brown K, et al. Evolution and host-specific adaptation of Pseudomonas aeruginosa. Science. 2024;385:eadi0908. 10.1126/science.adi0908.
  • 50. Feliziani S, Marvig RL, Luján AM, Moyano AJ, Di Rienzo JA, Krogh Johansen H, et al. Coexistence and within-host evolution of diversified lineages of hypermutable Pseudomonas aeruginosa in long-term cystic fibrosis infections. PLoS Genet. 2014;10:e1004651. 10.1371/journal.pgen.1004651.
  • 51. Harrington NE, Kottara A, Cagney K, Shepherd MJ, Grimsey EM, Fu T, et al. Global genomic diversity of Pseudomonas aeruginosa in bronchiectasis. J Infect. 2024;89:106275. 10.1016/j.jinf.2024.106275.
  • 52. Del Barrio-Tofiño E, López-Causapé C, Oliver A. Pseudomonas aeruginosa epidemic high-risk clones and their association with horizontally-acquired β-lactamases: 2020 update. Int J Antimicrob Agents. 2020;56:106196. 10.1016/j.ijantimicag.2020.106196.
  • 53. Hwang W, Yong JH, Min KB, Lee K-M, Pascoe B, Sheppard SK, et al. Genome-wide association study of signature genetic alterations among Pseudomonas aeruginosa cystic fibrosis isolates. PLoS Pathog. 2021;17:e1009681. 10.1371/journal.ppat.1009681.
  • 54. Lê-Bury P, Echenique-Rivera H, Pizarro-Cerdá J, Dussurget O. Determinants of bacterial survival and proliferation in blood. FEMS Microbiol Rev. 2024;48:fuae013. 10.1093/femsre/fuae013.
  • 55. Cogen JD, Nichols DP, Goss CH, Somayaji R. Drugs, drugs, drugs: current treatment paradigms in cystic fibrosis airway infections. J Pediatr Infect Dis Soc. 2022;11(Supplement_2):S32–9. 10.1093/jpids/piac061.
  • 56. Micek ST, Lloyd AE, Ritchie DJ, Reichley RM, Fraser VJ, Kollef MH. Pseudomonas aeruginosa bloodstream infection: importance of appropriate initial antimicrobial treatment. Antimicrob Agents Chemother. 2005;49:1306–11. 10.1128/AAC.49.4.1306-1311.2005.
  • 57. Hakeam HA, Askar G, Al Sulaiman K, Mansour R, Al Qahtani MM, Abbara D, et al. Treatment of multidrug-resistant Pseudomonas aeruginosa bacteremia using ceftolozane-tazobactam-based or colistin-based antibiotic regimens: a multicenter retrospective study. J Infect Public Health. 2022;15:1081–88. 10.1016/j.jiph.2022.08.020.
  • 58. Ciofu O, Mandsberg LF, Bjarnsholt T, Wassermann T, Høiby N. Genetic adaptation of Pseudomonas aeruginosa during chronic lung infection of patients with cystic fibrosis: strong and weak mutators with heterogeneous genetic backgrounds emerge in mucA and/or lasR mutants. Microbiology (Reading). 2010;156(Pt 4):1108–19. 10.1099/mic.0.033993-0.
  • 59. Winstanley C, O’Brien S, Brockhurst MA. Pseudomonas aeruginosa evolutionary adaptation and diversification in cystic fibrosis chronic lung infections. Trends Microbiol. 2016;24:327–37. 10.1016/j.tim.2016.01.008.
  • 60. Gürtler N, Osthoff M, Rueter F, Wüthrich D, Zimmerli L, Egli A, et al. Prosthetic valve endocarditis caused by Pseudomonas aeruginosa with variable antibacterial resistance profiles: a diagnostic challenge. BMC Infect Dis. 2019;19:530. 10.1186/s12879-019-4164-3.
  • 61.Cao P, Fleming D, Moustafa DA, Dolan SK, Szymanik KH, Redman WK, et al. A Pseudomonas aeruginosa small RNA regulates chronic and acute infection. Nature. 2023;618:358–64. 10.1038/s41586-023-06111-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Chaguza C, Jamrozy D, Bijlsma MW, Kuijpers TW, et al. Population genomics of Group B Streptococcus reveals the genetics of neonatal disease onset and meningeal invasion. Nat Commun 2022;13:4215–27. 10.1038/s41467-022-31858-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Berthenet E, Yahara K, Thorell K, Pascoe B, Meric G, Mikhail JM, et al. A GWAS on Helicobacter pylori strains points to genetic variants associated with gastric cancer risk. BMC Biol. 2018;16:84. 10.1186/s12915-018-0550-3.
  • 64. Eriksson L, Johannesen TB, Stenmark B, Jacobsson S, Säll O, Hedberg ST, et al. Genetic variants linked to the phenotypic outcome of invasive disease and carriage of Neisseria meningitidis. Microb Genom. 2023;9:001124. 10.1099/mgen.0.001124.
  • 65. Power RA, Parkhill J, de Oliveira T. Microbial genome-wide association studies: lessons from human GWAS. Nat Rev Genet. 2017;18:41–50. 10.1038/nrg.2016.132.
  • 66. Chen Y, Xiang G, Liu P, Zhou X, Guo P, Wu Z, et al. Prevalence and molecular characteristics of ceftazidime-avibactam resistance among carbapenem-resistant Pseudomonas aeruginosa clinical isolates. J Glob Antimicrob Resist. 2024;36:276–83. 10.1016/j.jgar.2024.01.014.
  • 67. Hernández-García M, Barbero-Herranz R, Bastón-Paz N, Díez-Aguilar M, López-Collazo E, Márquez-Garrido FJ, et al. Unravelling the mechanisms causing murepavadin resistance in Pseudomonas aeruginosa: lipopolysaccharide alterations and its consequences. Front Cell Infect Microbiol. 2024;14:1446626. 10.3389/fcimb.2024.1446626.
  • 68. Redfern J, Wallace J, van Belkum A, Jaillard M, Whittard E, Ragupathy R, et al. Biofilm associated genotypes of multiple antibiotic resistant Pseudomonas aeruginosa. BMC Genomics. 2021;22:572. 10.1186/s12864-021-07818-5.
  • 69. Glasser NR, Kern SE, Newman DK. Phenazine redox cycling enhances anaerobic survival in Pseudomonas aeruginosa by facilitating generation of ATP and a proton-motive force. Mol Microbiol. 2014;92:399–412. 10.1111/mmi.12566.
  • 70. Schiessl KT, Hu F, Jo J, Nazia SZ, Wang B, Price-Whelan A, et al. Phenazine production promotes antibiotic tolerance and metabolic heterogeneity in Pseudomonas aeruginosa biofilms. Nat Commun. 2019;10:762. 10.1038/s41467-019-08733-w.
  • 71. Mayers JR, Varon J, Zhou RR, Daniel-Ivad M, Beaulieu C, Bhosle A, et al. A metabolomics pipeline highlights microbial metabolism in bloodstream infections. Cell. 2024;187:4095–112.e21. 10.1016/j.cell.2024.05.035.
  • 72. Nguyen AT, O’Neill MJ, Watts AM, Robson CL, Lamont IL, Wilks A, et al. Adaptation of iron homeostasis pathways by a Pseudomonas aeruginosa pyoverdine mutant in the cystic fibrosis lung. J Bacteriol. 2014;196:2265–76. 10.1128/JB.01491-14.
  • 73. da Cruz Nizer WS, Inkovskiy V, Versey Z, Strempel N, Cassol E, Overhage J. Oxidative stress response in Pseudomonas aeruginosa. Pathogens. 2021;10:1187. 10.3390/pathogens10091187.
  • 74. Peirano G, Matsumara Y, Nobrega D, Church D, Pitout JDD. Population-based genomic surveillance of Pseudomonas aeruginosa causing bloodstream infections in a large Canadian health region. Eur J Clin Microbiol Infect Dis. 2024;43:501–10. 10.1007/s10096-024-04750-w.
  • 75. Valik JK, Giske CG, Hasan B, Gozalo-Margüello M, Martínez-Martínez L, Premru MM, et al. Genomic virulence markers are associated with severe outcomes in patients with Pseudomonas aeruginosa bloodstream infection. Commun Med (Lond). 2024;4:264. 10.1038/s43856-024-00696-4.
  • 76. Lees JA, Ferwerda B, Kremer PHC, Wheeler NE, Serón MV, Croucher NJ, et al. Joint sequencing of human and pathogen genomes reveals the genetics of pneumococcal meningitis. Nat Commun. 2019;10:2176. 10.1038/s41467-019-09976-3.
  • 77. Young BC, Earle SG, Soeng S, Sar P, Kumar V, Hor S, et al. Panton-Valentine leucocidin is the key determinant of Staphylococcus aureus pyomyositis in a bacterial GWAS. Elife. 2019;8:e42486. 10.7554/eLife.42486.
  • 78. Kremer PHC, Lees JA, Ferwerda B, van de Ende A, Brouwer MC, Bentley SD, et al. Genetic variation in Neisseria meningitidis does not influence disease severity in meningococcal meningitis. Front Med (Lausanne). 2020;7:594769. 10.3389/fmed.2020.594769.

Associated Data


Supplementary Materials

Supplementary material 1 (60.5MB, xlsx)

Data Availability Statement

The dataset supporting the conclusions of this article is included within the article and its supplementary files. Genome sequence data of P. aeruginosa included in this study are available in the NCBI Sequence Read Archive. BioProject and BioSample accession numbers for each genome are listed in Table S1.


Articles from BMC Infectious Diseases are provided here courtesy of BMC
