Abstract
Pneumonia is a leading cause of morbidity and mortality in children, with bacterial pathogens being important etiologic agents. Most microbiome studies in pneumonia use technologies with limited taxonomic resolution, and few include lung aspirate or blood samples. In this study, we assessed the microbial communities of the nasopharynx, nasopharynx/oropharynx, induced sputum, lung aspirate and blood, and recovered metagenome-assembled genomes (MAGs) from the same sites using shotgun metagenomic sequencing of samples from children with severe and very severe pneumonia in The Gambia. Our data show that Proteobacteria and Firmicutes were the most common phyla across the body sites, and this was largely driven by S. pneumoniae, H. influenzae/aegyptius and M. catarrhalis. Furthermore, we observed species overlap between blood and respiratory samples, with average Jaccard similarity index values ranging from 34% to 58%. We recovered 60 MAGs of at least medium quality from these niches, of which 35 were high quality. The high-quality MAGs included 11 S. pneumoniae strains, 10 H. influenzae strains and a Limosilactobacillus with less than 95% Average Nucleotide Identity (ANI) to any known species in GTDB-Tk. We also showed that the resistomes in our MAGs were highly species-specific, with more than 70% of the detected antimicrobial resistance (AMR) genes found exclusively in a single species.
Keywords: microbiome, metagenome-assembled genomes, antimicrobial resistance, Streptococcus pneumoniae, Haemophilus strains
Introduction
Lower respiratory tract infections (LRI) pose a major public health burden and remain one of the main infectious causes of death in low- and middle-income countries1,2. LRI, defined by the Global Burden of Disease report as pneumonia or bronchiolitis, was responsible for more than 500,000 deaths in children under five years of age worldwide in 20213. A wide range of organisms cause LRI including bacteria, viruses and fungi. Among the bacterial causes of LRI in 2021, Streptococcus pneumoniae was responsible for the highest number of episodes and deaths with an estimated 97·9 million episodes and 505 000 deaths globally across all ages3. The risk factors for pneumonia include poverty, malnutrition, lack of exclusive breastfeeding and inadequate vaccination4,5.
Effective prevention and treatment strategies for pneumonia, including vaccination and antimicrobials, rely partly on the identification of the microbial etiologic agent of the pneumonia episode, which can be done using culture, nucleic acid amplification, or antigen testing. However, the use of these targeted detection methods limits our understanding of the complex interactions between microbes in the lungs that affect disease outcome6. Samples from the lower respiratory tract are difficult to obtain; therefore, the upper respiratory tract is often studied to gain insights into the microbiome and its role in disease7. Understanding the microbiome of the upper respiratory tract is important because it has been postulated that microaspiration is the primary route through which bacteria in the upper airways enter the lungs8,9, although the temporal, spatial, and regulatory steps that lead to disease are unknown. Moreover, bacterial translocation across the lung epithelium into the blood has also been reported in pneumonia cases10, highlighting the significance of profiling the microbiome of both the respiratory tract and blood in disease.
The relationship between the upper respiratory airway microbiome and respiratory health is gaining attention11,12. For instance, it has been shown that a high abundance of commensal bacterial species of the Corynebacterium and Dolosigranulum genera primes the respiratory tract in children through stimulation of innate immunity and competition for resources to restrict colonisation by pathogenic bacteria13–15. Conversely, an increased nasopharyngeal density of S. pneumoniae may be associated with respiratory illnesses in children16,17.
Our understanding of the microbiome has traditionally been based on findings from 16S rRNA sequencing technology18–20. However, this method lacks species resolution. An untargeted microbial profiling approach, such as shotgun metagenomic sequencing, will provide deeper taxonomical insight into the microbial landscape of the respiratory tract and blood during disease.
An added advantage of shotgun metagenomics over 16S rRNA sequencing is its ability to recover metagenome-assembled genomes (MAGs). MAGs are generated using machine learning methods to group assembled contigs into bins that represent individual organisms using nucleotide usage or coverage21. This method has become increasingly important in microbiome research for reconstructing the draft genomes of community members, including uncultured microbes22–24. However, many MAGs reported in published studies are of low quality and completeness24. Additionally, attempts to recover MAGs from respiratory specimens remain limited. Notably, Li et al. constructed a comprehensive respiratory genome catalogue of 552 MAGs, although these were derived from only 20 studies25.
Given the common use of antibiotic regimens in pneumonia patients, profiling the antimicrobial resistance (AMR) genes of MAGs in respiratory and blood samples is essential for informing targeted antimicrobial therapy. Moreover, profiling AMR genes contributes to broader efforts to combat antimicrobial resistance, which is a global health concern.
In this study, we performed shotgun metagenomics and analysed nasopharyngeal swab (NPS), combined nasopharyngeal and oropharyngeal swabs (NP/OP), induced sputum (IS), lung aspirate (LA), and blood samples from children diagnosed with severe pneumonia to characterise the microbiomes across these sites. Moreover, we aimed to recover MAGs from sample metagenomes of the same specimen types and to characterise their antimicrobial resistance determinants.
Results
Study population demography, clinical features and specimen types
We analysed 71 samples from 37 children who were hospitalised with pneumonia and recruited as part of the PERCH cohort study. The demographic and clinical features of the children are presented in Table 1. The majority were male (59%, 22/37), and 41% (15/37) of the children were less than 6 months old. A total of 54% (20/37) and 51% (19/37) of the cases were fully vaccinated with the Haemophilus influenzae type b vaccine and the Pneumococcal Conjugate Vaccine (PCV13), respectively. Very severe pneumonia cases accounted for 57% (21/37) of the cases. NPS comprised the largest proportion of specimens, accounting for 41% (29/71) (Supplementary information, Table S1). Lung aspirates were the least represented sample type, with seven specimens. Figure 1 shows the specimens analysed from each of the 37 children. Overall, 62% (23/37) of the children provided more than one specimen type. The number of samples analysed in this study was based on availability.
Table 1.
Demographic and clinical characteristics of children. This table includes all children who contributed at least one specimen to any analysis.
| Characteristic | Classification | Frequency, n (%) |
|---|---|---|
| Sex | Male | 22 (59) |
| | Female | 15 (41) |
| Age | 1–5 months | 15 (41) |
| | 6–11 months | 4 (11) |
| | 12–23 months | 8 (22) |
| | 24–59 months | 10 (27) |
| Hib vaccination status | Not vaccinated | 6 (16) |
| | Partially vaccinated | 9 (24) |
| | Fully vaccinated | 20 (54) |
| | No data | 2 (5) |
| PCV vaccination status | Not vaccinated | 8 (22) |
| | Partially vaccinated | 8 (22) |
| | Fully vaccinated | 19 (51) |
| | No data | 2 (5) |
| Pneumonia severity | Severe | 16 (43) |
| | Very severe | 21 (57) |
| HIV status | Positive | 2 (5) |
| | Negative | 31 (84) |
| | Unconfirmed | 4 (11) |
Figure 1.

List of specimen types analysed for each of the 37 participants. Participants are sorted based on similarity in sample availability.
Microbiome composition of children with pneumonia
Shotgun metagenomic sequencing yielded a median of 19.4 million reads per sample. On average, 85% of the reads were human-derived. After quality trimming and removal of human reads, the median read count per sample was 1.3 million. All ecological analyses, including relative abundance bar plots, diversity measures, and differential abundance testing, were performed using the SLC relative abundance and prevalence across samples. The Haemophilus aegyptius (an H. influenzae biotype) MAG obtained in our dataset clustered at 95% ANI with the H. influenzae MAGs; this SLC is therefore designated Haemophilus influenzae/aegyptius. For SLCs that lack a species-level classification, the genus name was used with the SLC identifier as a suffix. To minimise the effects of sequencing noise, taxa with a relative abundance of < 0.1% in a given sample were filtered out of that sample in the analysis.
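The per-sample abundance filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary layout and the example profile are hypothetical, and the sketch does not say whether the study renormalised abundances after filtering.

```python
def filter_low_abundance(sample_profile, threshold=0.1):
    """Keep only taxa whose relative abundance (percent) is >= threshold.

    sample_profile: dict mapping taxon name -> relative abundance in percent
    for ONE sample. The filter is applied sample by sample, so a taxon can
    be dropped from one sample and retained in another.
    """
    return {
        taxon: abundance
        for taxon, abundance in sample_profile.items()
        if abundance >= threshold
    }


# Hypothetical profile for illustration only (not data from the study).
example_profile = {
    "Streptococcus pneumoniae": 62.40,
    "Moraxella catarrhalis": 37.55,
    "Taxon_X": 0.05,  # below the 0.1% threshold, so it is removed
}
filtered_profile = filter_low_abundance(example_profile)
```

Applying the threshold within each sample, rather than across the whole dataset, preserves low-prevalence taxa that are genuinely abundant in only a few samples.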
The mean relative abundance of bacterial DNA at the phylum and species levels across all samples is shown in Fig. 2. The relative abundance of bacterial DNA at the same taxonomic levels for individual samples is shown in the Supplementary Information (Figure S1). Overall, the Proteobacteria (Pseudomonadota) phylum predominated in IS (66.0%), NPS (63.3%), LA (55.7%) and blood (66.6%) samples. In contrast, Firmicutes was the dominant phylum in NP/OP samples (46.6%). Bacteroidota and Actinobacteriota contributed less than 5% of the bacterial DNA reads in all specimen types except NP/OP. Fusobacteriota was detected at low mean compositional relative abundance in NPS, NP/OP and IS samples (0.3%–0.5%) and was absent in LA and blood samples. Patescibacteria (Candidate Phyla Radiation) was detected in one IS sample and one NP/OP sample, with compositional relative abundances of 0.6% and 0.1%, respectively, in those samples. S. pneumoniae, H. influenzae/aegyptius and M. catarrhalis consistently had the highest compositional mean relative abundances across all specimen types. M. catarrhalis predominated in IS samples with a compositional mean relative abundance of 25.0%; S. pneumoniae predominated in NP/OP (29.9%), NPS (26.4%) and LA (40%) samples. In blood samples, H. influenzae/aegyptius predominated with 31.5%. We defined a species as dominant in a sample if it had the highest relative abundance in that sample. Although S. pneumoniae, H. influenzae/aegyptius and M. catarrhalis were the most abundant based on the mean relative abundance across specimen types, they were not consistently the dominant species at the individual sample level, except in LA samples, where H. influenzae/aegyptius and S. pneumoniae were the dominant species in 57% (4/7) and 43% (3/7) of the samples, respectively (Supplementary information, Figure S2). In both IS and NP/OP specimens, eight different species dominated across the 12 samples in each group.
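The definition of a dominant species used above (the species with the highest relative abundance in a sample) can be expressed as a short sketch. The function names and the example profiles are hypothetical and are not taken from the study's data or pipeline.

```python
from collections import Counter


def dominant_species(sample_profile):
    """Return the species with the highest relative abundance in one sample.

    Ties are resolved in favour of the first species encountered; how ties
    were handled in the study is not stated, so this is an assumption.
    """
    return max(sample_profile, key=sample_profile.get)


def count_dominant_species(profiles):
    """Count how many samples each species dominates.

    profiles: dict mapping sample ID -> {species: relative abundance}.
    """
    return Counter(dominant_species(profile) for profile in profiles.values())


# Hypothetical profiles for illustration only.
example_profiles = {
    "sample_1": {"species_A": 70.0, "species_B": 30.0},
    "sample_2": {"species_A": 20.0, "species_B": 80.0},
    "sample_3": {"species_A": 55.0, "species_C": 45.0},
}
dominance_counts = count_dominant_species(example_profiles)
```

Counting dominance per sample complements the mean relative abundance: a species can have a high mean across a specimen type while dominating only a few individual samples.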
Figure 2.
Barplots of the compositional mean relative abundances of bacterial DNA at the phylum (A) and species (B) levels. All six phyla detected in our dataset are shown in the phylum-level plot (A). The top 15 bacterial species are shown in the species-level plot (B). The top 15 species were identified based on the sum of their compositional relative abundances across all specimen types. Specimen type is shown on the Y axis, and the compositional mean relative abundance of each taxonomic group is shown on the X axis.
We compared Shannon diversity across specimen types (Fig. 3). The NP/OP samples had the highest diversity (2.13), whereas the LA samples had the lowest diversity (1.19). Pairwise comparisons showed significant differences in Shannon diversity between NP/OP and LA (p = 0.0004), NP/OP and blood (p = 0.007), NPS and LA (p = 0.007), IS and blood (p = 0.036), and IS and LA (p = 0.002).
Figure 3.

Shannon diversity compared by specimen type. Median Shannon diversity values are represented by horizontal lines within each boxplot, while the upper and lower edges of each box represent the 75th and 25th percentiles. Pairwise comparisons of Shannon diversity were made using Tukey’s Honest Significant Difference test, and significant differences are indicated by an asterisk.
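For readers unfamiliar with the metric, the Shannon diversity index of a sample is H = −Σ pᵢ ln pᵢ, where pᵢ is the proportion of species i. The sketch below assumes the natural logarithm; the text does not state which log base or software was used, so both the base and the function name are assumptions.

```python
import math


def shannon_diversity(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    abundances: iterable of non-negative counts or relative abundances for
    one sample; values are rescaled to proportions that sum to 1.
    """
    values = [a for a in abundances if a > 0]
    total = sum(values)
    proportions = [a / total for a in values]
    return -sum(p * math.log(p) for p in proportions)


# Four equally abundant species give the maximum for four taxa, ln(4).
even_community = shannon_diversity([25, 25, 25, 25])
# A sample made up of a single species has zero diversity.
single_species = shannon_diversity([100])
```

A community dominated by one or two species therefore scores lower than an even community with the same number of species, which is the pattern expected when a single pathogen dominates a sample.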
The overlap in detected species between respiratory and blood samples from the same participants was evaluated using Jaccard similarity index (Fig. 4). On average, 58% of species were shared between paired NPS and blood samples. Blood and IS shared 41% of species, while 34% of species found in the NP/OP were also found in their paired blood samples.
Figure 4.
Jaccard similarity index of paired blood and respiratory body sites.
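The Jaccard similarity index for a pair of samples is the number of species detected in both samples divided by the number detected in either. A minimal sketch follows; the function name and the example species sets are hypothetical.

```python
def jaccard_index(species_a, species_b):
    """Jaccard similarity = |A intersect B| / |A union B| for two species sets.

    Returns 0.0 when both sets are empty, which is a convention chosen here
    to avoid division by zero.
    """
    set_a, set_b = set(species_a), set(species_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical paired samples from one participant (illustration only).
nps_species = {"species_A", "species_B", "species_C"}
blood_species = {"species_B", "species_C", "species_D"}
similarity = jaccard_index(nps_species, blood_species)  # 2 shared / 4 total
```

Because the index uses presence or absence only, it is insensitive to how abundant a shared species is in either sample.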
We restricted our analyses to the nasopharynx and compared the microbial DNA profiles of children with severe and very severe pneumonia. The compositional mean relative abundance of S. pneumoniae in the nasopharynx was 33.1% in severe cases and 22.1% in very severe cases (Fig. 5). M. catarrhalis was detected at a compositional mean relative abundance of 21.3% in severe cases and 27.5% in very severe cases. The compositional mean relative abundance of H. influenzae/aegyptius was similar between the two groups (11.1% in severe cases vs. 11.6% in very severe cases). Notably, the relative abundance of M. nonliquefaciens in very severe cases (13.1%) was threefold higher than that in severe cases (3.9%). However, MaAsLin3 differential abundance testing of data from nasopharyngeal samples did not yield any significant association between any species and pneumonia severity classification. There was no significant difference in nasopharyngeal Shannon diversity between severe and very severe cases (p = 0.21; Supplementary information, Figure S3). Furthermore, a PERMANOVA test detected no significant difference in the nasopharyngeal microbiome composition between the two groups (p = 0.413; Supplementary information, Figure S4).
Figure 5.
Barplots of the mean relative abundances of bacterial DNA at the species level in the nasopharynx (NPS) of severe and very severe cases. Case type is shown on the Y axis, and the relative abundance of each species is shown on the X axis.
Metagenome-assembled genomes
The MAG metrics reported follow the Minimum Information about a Metagenome-Assembled Genome (MIMAG) guidelines26. A total of 60 medium-quality MAGs (≥ 50% completeness and < 10% contamination) were recovered, of which 35 met the criteria for high-quality MAGs (> 90% completeness and < 5% contamination) based on MIMAG standards. Details of these 35 high-quality MAGs, including their contamination and completeness as calculated using CheckM, are provided in Table 2.
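The quality tiers stated above can be written as a small classifier. This sketch uses only the completeness and contamination thresholds given in the text; the full MIMAG standard additionally requires rRNA and tRNA genes for a high-quality draft, which is not modelled here. The function name is hypothetical.

```python
def classify_mag_quality(completeness, contamination):
    """Assign a quality tier from completeness and contamination (percent).

    Thresholds follow the text: high quality is > 90% completeness and
    < 5% contamination; medium quality is >= 50% completeness and < 10%
    contamination. Anything else is labelled low quality.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


# MAG4 values as reported in the table of high-quality MAGs.
mag4_quality = classify_mag_quality(completeness=98.5, contamination=0.0)
# Hypothetical bin for illustration only.
hypothetical_quality = classify_mag_quality(completeness=70.0, contamination=6.0)
```

Every bin that passes the high-quality test also satisfies the medium-quality thresholds, which is why the 35 high-quality MAGs are counted within the 60 reported above.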
Table 2.
Characterisation of high-quality metagenome-assembled genomes obtained from blood and respiratory samples
| MAG | Species | ANI with type strain (%) | Number of contigs | N50 (bp) | Genome size (Mb) | Completeness (%) | Contamination (%) | Strain heterogeneity | Source specimen |
|---|---|---|---|---|---|---|---|---|---|
| MAG1 | Neisseria meningitidis | 97.58 | 227 | 11345 | 1.80 | 95.79 | 0.23 | 0 | NP-OP |
| MAG2 | Haemophilus parahaemolyticus | 97.3 | 146 | 17420 | 1.78 | 90.94 | 1.24 | 66.67 | IS |
| MAG3 | Moraxella catarrhalis | 99.12 | 273 | 8974 | 1.75 | 92.31 | 1.38 | 22.22 | IS |
| MAG4 | Streptococcus pneumoniae | 98.53 | 114 | 26698 | 2.00 | 98.5 | 0 | 0 | LA |
| MAG5 | Moraxella catarrhalis | 99.08 | 69 | 38190 | 1.77 | 95.76 | 0.42 | 33.33 | NP-OP |
| MAG6 | Streptococcus pneumoniae | 98.62 | 69 | 44339 | 2.16 | 99.63 | 0.06 | 0 | NP-OP |
| MAG7 | Haemophilus influenzae | 97.17 | 96 | 50042 | 1.92 | 99.37 | 0.25 | 50 | IS |
| MAG8 | Haemophilus parahaemolyticus | 97.24 | 265 | 9977 | 1.83 | 90.75 | 1.05 | 28.57 | IS |
| MAG9 | Streptococcus pneumoniae | 98.63 | 244 | 11618 | 1.80 | 93.6 | 0.41 | 20 | NPS |
| MAG10 | Haemophilus influenzae | 97.14 | 56 | 93732 | 1.84 | 99.55 | 0.76 | 16.67 | LA |
| MAG11 | Streptococcus pneumoniae | 99.52 | 76 | 67168 | 2.02 | 99.63 | 0 | 0 | LA |
| MAG12 | Moraxella catarrhalis | 96.14 | 175 | 28060 | 2.00 | 97.56 | 3.54 | 10.53 | LA |
| MAG13 | Haemophilus influenzae | 97.49 | 247 | 11400 | 1.83 | 92.84 | 3.45 | 56.52 | LA |
| MAG14 | Streptococcus pneumoniae | 98.6 | 88 | 32376 | 2.05 | 99.63 | 0 | 0 | Blood |
| MAG15 | Haemophilus influenzae | 97.39 | 167 | 20433 | 1.63 | 91.37 | 0.8 | 100 | NP-OP |
| MAG16 | Streptococcus pneumoniae | 98.57 | 87 | 39492 | 2.07 | 99.44 | 0 | 0 | LA |
| MAG17 | Haemophilus influenzae | 97.1 | 60 | 71164 | 1.80 | 98.98 | 0 | 0 | LA |
| MAG18 | Haemophilus influenzae | 97.38 | 29 | 122554 | 1.79 | 99.55 | 0 | 0 | LA |
| MAG19 | Haemophilus influenzae | 98.8 | 22 | 137938 | 1.85 | 99.66 | 0.23 | 0 | Blood |
| MAG20 | Streptococcus pneumoniae | 98.45 | 197 | 16266 | 2.01 | 97.44 | 1.07 | 0 | Blood |
| MAG21 | Moraxella catarrhalis | 99.23 | 139 | 28747 | 1.94 | 97.29 | 0.63 | 75 | Blood |
| MAG22 | Moraxella catarrhalis | 99.12 | 26 | 120380 | 1.85 | 99.45 | 1.07 | 14.29 | Blood |
| MAG23 | Streptococcus pneumoniae | 98.65 | 62 | 45294 | 2.08 | 99.63 | 0 | 0 | Blood |
| MAG24 | Haemophilus influenzae | 97.11 | 26 | 186272 | 1.86 | 99.55 | 0 | 0 | Blood |
| MAG25 | Limosilactobacillus sp. | | 228 | 8961 | 1.50 | 97.81 | 0.13 | 0 | NP-OP |
| MAG26 | Neisseria meningitidis | 97.61 | 158 | 18553 | 1.97 | 99.18 | 0.38 | 0 | Blood |
| MAG27 | Streptococcus pneumoniae | 98.56 | 64 | 48015 | 2.12 | 99.63 | 0 | 0 | Blood |
| MAG28 | Moraxella nonliquefaciens | 98.47 | 56 | 343360 | 2.28 | 96.85 | 3.48 | 0 | Blood |
| MAG29 | Haemophilus influenzae | 97.21 | 157 | 29106 | 1.93 | 95.55 | 1.05 | 28.57 | Blood |
| MAG30 | Moraxella catarrhalis | 99.08 | 65 | 74719 | 1.98 | 98.88 | 0.44 | 25 | NPS |
| MAG31 | Haemophilus influenzae | 97.28 | 156 | 19152 | 1.75 | 98.05 | 0.84 | 40 | NPS |
| MAG32 | Streptococcus pneumoniae | 98.59 | 339 | 6926 | 1.83 | 91.90 | 1.37 | 83.33 | NPS |
| MAG33 | Moraxella lincolnii | 97.95 | 243 | 13659 | 2.07 | 92.60 | 0.5 | 66.67 | NP-OP |
| MAG34 | Streptococcus pneumoniae | 98.86 | 238 | 10781 | 1.90 | 94.76 | 0.75 | 50 | NPS |
| MAG35 | Moraxella nonliquefaciens | 98.22 | 103 | 57578 | 2.34 | 95.75 | 4.84 | 6.06 | NPS |
Seven species were identified from the 35 high-quality MAGs recovered. These MAGs were obtained from 22 out of 71 samples (31%), and the highest number from a single sample was three. The largest share of MAGs was recovered from blood specimens (11/35, 31%), whereas the fewest were recovered from IS (4/35, 11%). Notably, in 83% (29/35) of the high-quality MAGs, the recovered species corresponded to the species with the highest or second highest relative abundance in the sample from which the MAG was obtained. The mean number of contigs across all 35 MAGs was 136.2 (min: 22, max: 339), and the mean contig length was 23934 bp (min: 1500, max: 472844; data not shown). MAG19, which had the lowest number of contigs, had the highest genome completeness at 99.7%.
Species assignment of the genomes was based on the widely used ANI threshold of ≥ 95% compared to reference genomes. MAG25 (Limosilactobacillus sp.) did not meet this threshold for any known bacterial species and was therefore not assigned a species-level taxonomic classification. The closest match was Limosilactobacillus gastricus (GCF_001434365.1), with an ANI of 83.26%. The ANI values for the remaining 42 genetically related Limosilactobacillus genomes in the database ranged from 76.2% to 78.9% relative to our MAG. The most common species across all MAGs were S. pneumoniae (11/35, 31%) and H. influenzae (10/35, 29%). We obtained MAGs from three unique species of Moraxella, including five M. catarrhalis MAGs.
We performed in silico serotyping and MLST on the S. pneumoniae and H. influenzae MAGs. Serotype 19A was the most common S. pneumoniae serotype (45%, 5/11), as shown in the supplementary information (Table S2). Three of the five serotype 19A strains belonged to the same sequence type (ST847). Three S. pneumoniae strains could not be assigned a sequence type in the MLST database. In the only case where we recovered S. pneumoniae MAGs from two body sites in the same individual, the strain from the blood sample was serotype 19F and ST925, while the strain from the NPS was serotype 19A with an unassigned ST. All 10 of the H. influenzae strains were non-typeable. Five of them were assigned to distinct sequence types, while the other five could not be assigned a sequence type in the MLST database.
A total of 78 AMR genes, including 15 unique genes, were detected across the 35 high-quality MAGs (Fig. 6A). These genes conferred resistance to eight antibiotic classes. AMR genes conferring resistance to fluoroquinolones were the most prevalent, with four unique genes detected, followed by tetracycline resistance genes, which included three unique genes across all MAGs. The average number of genes per MAG was 2.2 (range: 0–6) (Fig. 6B). Of the nine MAGs with no AMR genes detected, 78% (7/9) were Haemophilus species, while the remaining two were MAG25 (Limosilactobacillus sp.) from the NP/OP and MAG35 (M. lincolnii) from the NPS.
Figure 6.
(A) Prevalence of AMR genes in all 35 high-quality MAGs. Bars are coloured according to antibiotic class of each corresponding gene. (B) AMR genes detected in the high-quality MAGs grouped by specimen type. The y axis of each plot lists all AMR genes detected within that specimen type. The AMR gene names are coloured according to their antibiotic class. MLS: Macrolide-Lincosamide-Streptogramins
Nm: Neisseria meningitidis, Hi: Haemophilus influenzae, Ls: Limosilactobacillus sp., Ml: Moraxella lincolnii, Mc: Moraxella catarrhalis, Sp: Streptococcus pneumoniae, Mn: Moraxella nonliquefaciens, Hp: Haemophilus parahaemolyticus
The AMR genes were largely species-specific: 73% (11/15) of the unique genes were detected in a single species. The exceptions were the AMR genes tet(B), tet(R), BRO-1 and ICR-Mc, each of which was found in MAGs from two different species. In terms of antibiotic class, AMR genes conferring resistance to the tetracycline and peptide antibiotic classes were present in MAGs of three different species.
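The species-specificity figure above is the share of unique AMR genes that were detected in MAGs of exactly one species. The sketch below shows one way to compute it from a list of (species, gene) detections. The function name and the placeholder detections are hypothetical and do not reproduce the study's resistome.

```python
from collections import defaultdict


def fraction_single_species(detections):
    """Fraction of unique genes found in MAGs of exactly one species.

    detections: iterable of (species, gene) pairs, one per gene detected in
    a MAG. Repeated detections of a gene in the same species count once.
    """
    species_by_gene = defaultdict(set)
    for species, gene in detections:
        species_by_gene[gene].add(species)
    if not species_by_gene:
        return 0.0
    single = sum(1 for hosts in species_by_gene.values() if len(hosts) == 1)
    return single / len(species_by_gene)


# Placeholder detections for illustration only.
example_detections = [
    ("species_1", "gene_a"),
    ("species_1", "gene_a"),  # same gene in a second MAG of species_1
    ("species_1", "gene_b"),
    ("species_2", "gene_b"),  # gene_b is shared by two species
    ("species_2", "gene_c"),
]
specific_fraction = fraction_single_species(example_detections)  # 2 of 3 genes

# The proportion reported in the text: 11 of 15 unique genes.
reported_percent = round(11 / 15 * 100)
```

Counting unique genes rather than total detections prevents a gene carried by every MAG of one species from inflating the estimate.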
Notably, all four H. influenzae MAGs from LA carried the LpsA gene, while none of those from NPS, NP/OP and IS did. Among the three H. influenzae MAGs from blood, two lacked the LpsA gene. All 11 S. pneumoniae MAGs carried the patA and patB genes. The only resistance genes in S. pneumoniae MAGs outside the fluoroquinolone and MLS classes were tet(M) and tet(B), which were found in blood-derived MAGs. The ICR-Mc gene was found in all M. catarrhalis MAGs.
Discussion
Few studies have characterised the respiratory or blood microbiome in children with pneumonia in such detail. Using shotgun metagenomics in this pilot study, we identified S. pneumoniae, H. influenzae/aegyptius and M. catarrhalis as the most common species across all sample types. Furthermore, we observed species overlap between blood and respiratory samples, with average Jaccard similarity index values ranging from 34% to 58%. A major strength of our work lies in the recovery of 35 high-quality MAGs from these niches, including what appears to be a novel Limosilactobacillus with less than 95% ANI to any species in GTDB-Tk.
Our analysis of the microbial profile of the respiratory tract and blood during infection achieved species-level resolution, which is seldom achieved in respiratory microbiome studies. Proteobacteria and Firmicutes dominated in the respiratory samples, and this was largely driven by S. pneumoniae, H. influenzae and M. catarrhalis. These are clinically important pathogens, and vaccines targeting S. pneumoniae and H. influenzae type b are routinely administered in many countries including The Gambia27–29. Our findings are consistent with other studies in The Gambia where S. pneumoniae and H. influenzae were identified as the primary bacterial pathogens causing pneumonia in children, based on microbiology culture and PCR analyses of lung and pleural aspirates30. The high prevalence of S. pneumoniae carriage observed in our analysis is also consistent with previous reports from The Gambia where carriage rates among children under one year of age were as high as 97%31. Although the introduction of Pneumococcal Conjugate Vaccines (PCVs) has led to the reduction of vaccine-type carriage, the overall carriage of S. pneumoniae remains unchanged. This persistence is likely due to the increase in the carriage of non-vaccine serotypes32.
S. pneumoniae serotype 19A was the most common serotype among the high-quality MAGs we recovered. This serotype increased in prevalence in many countries following the introduction of the seven-valent pneumococcal vaccine (PCV7)33. ST847, which was the most common sequence type in our data, has been associated with serotype 19A in both carriage34 and disease35. H. influenzae type b (Hib) disease and carriage have declined globally due to the introduction of the Hib vaccine36,37. However, non-typeable H. influenzae (NTHi), which was the most common type among our H. influenzae high-quality MAGs, is now responsible for the majority of cases of otitis media, sinusitis and pneumonia among patients who were vaccinated with the Hib vaccine38.
In our analysis, there were no significant differences in nasopharyngeal microbiome diversity between severe and very severe cases. Differential abundance testing also revealed no significant differences in any species by case type. This finding might be due to our small sample size given that the microbiome composition in the upper respiratory tract has been shown to serve as a marker for severity of lower respiratory tract disease39.
It is traditionally understood that blood is sterile, largely because the presence of as few as 1–10 bacterial cells per millilitre of whole blood is potentially life threatening40. However, there is growing evidence that bacterial DNA of various taxa can be detected in blood using 16S or shotgun sequencing41–43. It remains unclear whether bacterial DNA detected in blood represents viable organisms or remnants, and whether these arise from leakages from other body sites. Proteobacteria, Firmicutes and Actinobacteria have been reported as predominant phyla in blood across various conditions and age groups44–46. Consistently, our analysis of children with pneumonia revealed a similar dominance of the same phyla.
The fact that DNA from many different bacterial taxa can be detected in blood poses a problem for using bacterial reads from metagenomic sequencing to determine etiology. Barbeta et al. found 21 operational taxonomic units (OTUs) in sixteen blood samples, even though only fifteen percent of those samples had a positive blood culture47. A key finding from our study is the 34%–58% species overlap between respiratory and blood samples from the same individuals. Other sources of bacterial DNA in blood include the skin and the gut, with previous work showing a 20% overlap between bacterial OTUs in blood and the gut47.
We conducted a comprehensive sample-level recovery of genomes from metagenomes, resulting in the identification of 35 genomes that meet the MIMAG criteria for high-quality genomes. This was achieved using consensus binning, a stringent approach that leverages the strengths of the different binning algorithms (MaxBin2, MetaBAT2 and CONCOCT) while minimising their individual limitations. This strategy recovered high-quality bins that would have been missed by using a single algorithm alone.
Recovery of MAGs from metagenomes enhances the identification of organisms that are not routinely targeted for disease diagnostic purposes by other methods such as PCR. Furthermore, it is valuable for identifying species that are difficult to culture in the lab. Notably, we recovered a high-quality genome belonging to the Limosilactobacillus genus that did not meet the 95% ANI threshold for species-level assignment in the Genome Taxonomy Database. An ANI range of 95–96% is commonly used to delineate species boundaries48,49. Limosilactobacillus is a genus within the family Lactobacillaceae. It was formerly classified as Lactobacillus until a reclassification in 2020 divided Lactobacillus into 25 genera including Limosilactobacillus50. To date, up to 23 species of Limosilactobacillus have been described in both human and non-human hosts51. Our Limosilactobacillus sp. MAG was recovered from the NP/OP, and it potentially represents a previously undescribed species within this genus.
While we used CheckM metrics to ascertain the quality of our genomes, some studies have raised concern that the quality of MAGs may be overestimated by commonly used pipelines for assessing genome quality. Meziti et al.52 explored this by comparing gene variability between pathogenic Escherichia coli isolates from diarrheal samples and their corresponding MAGs recovered from the same samples. Their analysis showed that MAGs with completeness estimates near 95%, as reported by the MiGA workflow, captured only 77% of the population core genes and 50% of the variable genes. They also found that about 5% of the genes from the MAGs were missing in the isolate and were of different taxonomic origin, even though pipeline-based contamination estimates were as low as 1.5%. However, these findings were based on a single bacterial species and may not apply to the species present in our dataset. Meziti et al. further outlined key criteria that contribute to MAG reliability beyond standard pipeline-based completeness and contamination estimates. These include MAGs having fewer than 500 contigs, contig lengths greater than 1000 bp, N50 values above 20,000 bp and genome size deviations of less than 0.5 Mb compared to reference strains. Our high-quality genomes meet these benchmarks in terms of contig number, contig length and genome size consistency. Additionally, more than 60% of our MAGs had N50 values exceeding 20,000 bp, further supporting the overall high quality of our MAGs. However, we would still recommend conducting gene content comparison and SNP analysis between paired S. pneumoniae isolate and MAG strains obtained from the same sample. This will be crucial for assessing the reliability of using sample-derived S. pneumoniae MAGs for public health applications such as tracking vaccine impact and outbreak investigations.
What we have not been able to achieve is the recovery of multiple high-quality MAGs of the same species from a single sample. In each case, only one strain of a single species was successfully binned. This limitation is particularly important for S. pneumoniae, where multiple strain carriage in the upper respiratory tract is common53,54. Co-carriage was observed in 15% of children in South Africa55 using shotgun metagenomic sequencing and in more than 48% of children in Southeast Asia by microarray56. In our upper respiratory tract samples where S. pneumoniae was recovered, it is possible that more than one strain was present in some samples but only one dominant strain was recovered. This suggests that our sequencing procedure or sample-specific binning procedures are limited in their ability to recover co-existing strains of the same species in a sample. The only pair of S. pneumoniae MAGs recovered from blood and NPS of the same individual belonged to different serotypes and sequence types. This highlights that using MAGs to study strain-level similarity of S. pneumoniae across body sites may be unreliable.
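The contig-number and N50 benchmarks discussed above can be checked directly against the values listed in the table of high-quality MAGs. The sketch below is illustrative only: the list and variable names are hypothetical, the values are transcribed from that table in MAG1 to MAG35 order, and the contig-length and genome-size criteria are omitted because the required inputs (per-contig lengths and reference genome sizes) are not given in the text.

```python
# Number of contigs per MAG, MAG1..MAG35, as listed in the table.
CONTIG_COUNTS = [
    227, 146, 273, 114, 69, 69, 96, 265, 244, 56, 76, 175, 247, 88, 167,
    87, 60, 29, 22, 197, 139, 26, 62, 26, 228, 158, 64, 56, 157, 65,
    156, 339, 243, 238, 103,
]

# N50 (bp) per MAG, MAG1..MAG35, as listed in the table.
N50_VALUES = [
    11345, 17420, 8974, 26698, 38190, 44339, 50042, 9977, 11618, 93732,
    67168, 28060, 11400, 32376, 20433, 39492, 71164, 122554, 137938,
    16266, 28747, 120380, 45294, 186272, 8961, 18553, 48015, 343360,
    29106, 74719, 19152, 6926, 13659, 10781, 57578,
]

MAX_CONTIGS = 500   # benchmark: fewer than 500 contigs
MIN_N50 = 20_000    # benchmark: N50 above 20,000 bp

all_below_contig_limit = all(count < MAX_CONTIGS for count in CONTIG_COUNTS)
n50_passing = sum(1 for n50 in N50_VALUES if n50 > MIN_N50)
n50_fraction = n50_passing / len(N50_VALUES)
mean_contigs = sum(CONTIG_COUNTS) / len(CONTIG_COUNTS)
```

Under these assumptions, `n50_fraction` is the quantity behind the statement that more than 60% of the MAGs exceed the N50 benchmark, and `mean_contigs` corresponds to the mean number of contigs reported in the Results.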
While the presence of an AMR gene does not equate to clinical resistance, monitoring the presence of such genes in the respiratory tract and blood could provide insights into which antibiotics the bacteria may be resistant to. Additionally, given that some AMR genes can be transferred to other bacterial species in the same niche through horizontal gene transfer, it is important to identify which genes have the potential to spread to other bacteria. The patA and patB AMR genes were highly prevalent in our S. pneumoniae MAGs. These genes interact to form an ATP-binding cassette (ABC) antibiotic efflux pump that confers resistance to fluoroquinolones in S. pneumoniae57,58. Resistance to the common fluoroquinolone ciprofloxacin in S. pneumoniae mainly occurs through the acquisition of mutations in the quinolone resistance-determining region of the parC and gyrA genes. However, Lupien et al. have shown that the patA and patB genes also contribute to low-level resistance to ciprofloxacin in a clinical isolate59. M. catarrhalis is among the species that are intrinsically resistant to colistin. The ICR-Mc gene found in our M. catarrhalis MAGs encodes a chromosomally located colistin resistance phosphoethanolamine (PEtN) transferase60. The protein is the closest known ortholog to the products of the well-known plasmid-mediated colistin resistance genes mcr-1 and mcr-2 (ref. 60). The LpsA gene was detected in H. influenzae MAGs from LA but was absent in MAGs from NP/OP, NPS and IS. LpsA confers intrinsic resistance to peptide antibiotics61, but it is more widely known for its role in lipooligosaccharide biosynthesis62, and it has been associated with adaptation to the lung environment and invasiveness63.
One limitation of our study is the sample size; some of our findings are therefore limited in statistical power and generalisability. Secondly, we did not obtain controls at the sampling point to control for contamination during sampling. However, we were able to minimise this effect by excluding taxa corresponding to Cutibacterium acnes, Staphylococcus saprophyticus and Staphylococcus epidermidis from all samples in this analysis. This decision was made because these bacteria primarily colonise the skin rather than the specimen types we analysed, and could have contaminated our samples during collection. They have also been implicated as contaminants in microbiome studies of low-biomass samples40,64. A third limitation is that some children were administered antibiotics before sample collection, which could have affected their microbiome profiles. Finally, our analysis could not establish whether the relative abundances of the different species in our samples reflected viable bacteria or only bacteria-derived DNA. Therefore, the bacterial profiles of the samples may not represent the viable microbiome at the time of sample collection, and our findings should be interpreted with caution.
Conclusion
Despite these limitations, we have shown that, overall, bacterial DNA from S. pneumoniae, H. influenzae/aegyptius and M. catarrhalis was the most common across the different body sites we studied in children with pneumonia in The Gambia. We have also shown an overlap in DNA from bacterial taxa present in the respiratory tract and blood. In addition, we recovered high-quality MAGs from our samples, further advancing the genomics of important pathogens such as H. influenzae and S. pneumoniae. For future studies, it is essential to include sampling controls that undergo the same sequencing procedure as the samples. We also propose future metatranscriptomic studies to examine the activities of live bacteria present in blood and respiratory samples of pneumonia patients. Finally, we propose comparing MAGs and isolates of S. pneumoniae from the same respiratory or blood sample to assess how well the MAG reflects the genome of the isolate.
Methods
Study participants and sample collection
We used stored samples obtained between 2011 and 2014 from participants recruited in the Pneumonia Etiology Research for Child Health (PERCH) study65. PERCH was a case-control study that sought to characterise the causes of severe childhood pneumonia in children living in high pneumonia burden and low-resource regions, including The Gambia. The samples analysed in this study were exclusively from children with pneumonia. In the PERCH study, children aged 1–59 months who presented and were admitted to the hospital in Basse, Upper River Region of The Gambia, with severe or very severe pneumonia were recruited. Severe pneumonia was defined as cough or difficulty breathing with lower chest wall indrawing, and very severe pneumonia was defined as cough or difficulty breathing and at least one of the following signs: vomiting everything, difficulty in drinking or breastfeeding, central cyanosis, lethargy, convulsions, unconsciousness, or head-nodding. Specimens, including NPS, NP/OP swabs, blood, IS and LA, were collected at enrolment. Further details on participant recruitment, including case definitions, inclusion criteria and sample collection procedures of the PERCH study are described elsewhere65,66.
Ethics
Ethical approval for the Gambian samples used in the PERCH study was obtained from the Johns Hopkins Bloomberg School of Public Health Institutional Review Board (JHSPH IRB), and local ethical approval was obtained from the Joint MRC Unit The Gambia at The London School of Hygiene and Tropical Medicine and The Government of The Gambia Ethics Committee. Written informed consent was obtained from the patients’ guardians for all procedures performed in the study. All procedures were performed in accordance with relevant guidelines and regulations.
Laboratory procedures
DNA was extracted from all samples using QIAGEN kits, following the manufacturer’s instructions. Total DNA was extracted from the stored NPS, NP/OP, and IS samples using the DNeasy PowerSoil Kit67, whereas DNA from the LA and blood samples was extracted using the QIAamp MinElute Virus Spin Kit68 and the QIAamp DNA Blood Mini Kit69, respectively. The laboratory subscribed to a Canadian External Quality Assessment (EQA) provider. Extracted DNA samples were enriched for microbial DNA using the NEBNext Microbiome DNA kit (New England Biolabs), following the manufacturer’s instructions, before whole metagenomic sequencing on the Illumina NovaSeq platform (2 × 150 bp). Blank controls (sterile swabs) were included throughout the library preparation process to monitor for contamination in the laboratory reagents.
Sequencing reads processing
Sequencing reads were mapped to the human reference sequence (T2T), and all matching reads were removed. Quality assessment of reads and the recovery and classification of metagenomes were performed using modules in the Viral (micro) Eukaryotic Bacterial Archaeal (VEBA) open-source software suite70,71. Adapter removal and quality trimming were performed using the preprocess.py module, and paired trimmed reads were assessed and corrected using the repair.sh module of BBTools. Cleaned reads were input into the assembly.py module, where they were assembled using SPAdes-based assemblers. Binning was done using MetaBAT272, CONCOCT73, and MaxBin274, followed by DAS Tool75 for consensus binning. Metagenome-assembled genomes (MAGs) were quality assessed using CheckM76. Genome statistics such as N50, number of contigs and genome size were calculated using SeqKit77. Taxonomic classification of the genomes was performed using the Genome Taxonomy Database Toolkit (GTDB-Tk)78, based on sequence similarity (FastANI) and phylogenetic placement.
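To make the N50 statistic mentioned above concrete, the following minimal sketch computes it from a list of contig lengths: N50 is the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly size. This is an illustration only and not the SeqKit implementation; the function name and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths for illustration (not data from this study).
example_contigs = [80, 70, 50, 40, 30, 20]
print(n50(example_contigs))  # 70: 80 + 70 = 150 reaches half of 290
```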
MAGs with ≥95% Average Nucleotide Identity (ANI) were clustered to form a species-level cluster (SLC). Sequencing reads were mapped to the SLC representative genomes, and counts were generated using featureCounts as part of the coverage.py module. For samples with unbinned contigs, pseudo-coassemblies were created to enable additional binning and taxonomic assignment.
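The clustering step can be pictured with a small sketch that groups genomes whenever a pairwise ANI value meets the 95% threshold. It assumes single-linkage grouping (connected components, implemented here with a union-find structure); whether the pipeline used in this study links genomes in exactly this way is an assumption, and the genome names and ANI values below are made up.

```python
def cluster_by_ani(genomes, pairwise_ani, threshold=95.0):
    """Group genomes into clusters linked by ANI >= threshold (single linkage)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical MAG names and ANI values, for illustration only.
mags = ["mag_a", "mag_b", "mag_c", "mag_d"]
ani = {
    ("mag_a", "mag_b"): 98.7,
    ("mag_b", "mag_c"): 96.1,
    ("mag_a", "mag_c"): 94.8,  # below threshold, but linked through mag_b
    ("mag_c", "mag_d"): 82.0,
}
print(cluster_by_ani(mags, ani))
```

Under single linkage, mag_a and mag_c end up in the same cluster even though their direct ANI is below 95%, because both are linked to mag_b. A stricter scheme (for example, complete linkage) would split them.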
S. pneumoniae and H. influenzae serotyping and Multi-Locus Sequence Typing
FASTA files from S. pneumoniae MAGs were analysed using PneumoKITy79 to assign in silico S. pneumoniae serotypes. In silico serotyping of H. influenzae MAGs was performed using hicap80. Multi-locus sequence types for both S. pneumoniae and H. influenzae were assigned using PubMLST81. The seven housekeeping genes analysed for S. pneumoniae were aroE, ddl, gdh, gki, recP, spi, and xpt. For H. influenzae, the seven genes were adk, atpG, frdB, fucK, mdh, pgi and recA.
Antimicrobial Resistance determinants for MAGs
The recovered MAGs were submitted to the Comprehensive Antimicrobial Resistance Database (CARD) to identify and annotate AMR genes. Only AMR gene hits classified under “perfect” or “strict” categories and a cutoff of >90% sequence identity of the matching region were considered.
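The filtering rule described above can be sketched as follows. The dictionary keys (`gene`, `cut_off`, `identity`) are hypothetical names chosen for this illustration and are not guaranteed to match the column names of any CARD output file; the example records are made up.

```python
def filter_amr_hits(hits, min_identity=90.0, allowed=("perfect", "strict")):
    """Keep hits in an allowed category whose identity exceeds min_identity (%)."""
    kept = []
    for hit in hits:
        category = hit["cut_off"].strip().lower()
        if category in allowed and hit["identity"] > min_identity:
            kept.append(hit)
    return kept


# Made-up records for illustration (not results from this study).
example_hits = [
    {"gene": "gene_1", "cut_off": "Perfect", "identity": 100.0},
    {"gene": "gene_2", "cut_off": "Strict", "identity": 97.4},
    {"gene": "gene_3", "cut_off": "Strict", "identity": 85.0},  # identity too low
    {"gene": "gene_4", "cut_off": "Loose", "identity": 99.0},   # category excluded
]
print([hit["gene"] for hit in filter_amr_hits(example_hits)])
```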
Statistical analysis
Statistical analysis and visualisation were conducted using RStudio (version 2024.12.0.467). We applied a one-way analysis of variance (ANOVA) to compare Shannon diversity indices between specimen types and applied Tukey’s Honest Significant Difference test for pairwise comparison of specimen types. The Jaccard similarity index was calculated to assess bacterial species overlap in paired blood and respiratory samples.
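For readers less familiar with the two indices named above, the following sketch shows how they are defined. The analysis in this study was done in R; this Python version is only an illustration. The use of the natural logarithm for the Shannon index is an assumption, and the function names and example inputs are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def jaccard_index(species_a, species_b):
    """Jaccard similarity: shared species divided by all species in either sample."""
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty samples
    return len(a & b) / len(union)


# Made-up inputs for illustration (not data from this study).
print(shannon_index([10, 10, 10, 10]))
print(jaccard_index({"sp1", "sp2", "sp3"}, {"sp2", "sp3", "sp4"}))  # 0.5
```

Because the Jaccard index uses presence or absence only, two samples sharing a species at very different abundances still count it as shared.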
The Kruskal-Wallis test was used to compare Shannon diversity between the nasopharynx of severe cases and very severe cases. We computed the Aitchison distance (i.e., Euclidean distance on the centered log-ratio transformed data) and used principal coordinates analysis (PCoA) to plot the beta diversity of the nasopharynx in severe and very severe cases. Permutational multivariate analysis of variance (PERMANOVA) was performed to determine whether beta diversity differed between the two groups.
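The Aitchison distance defined above can be illustrated with a short sketch: each sample is centered log-ratio (CLR) transformed, and the Euclidean distance is then taken between the transformed vectors. The pseudocount added to handle zero counts is an assumption of this sketch (the zero-handling strategy used in the study is not stated here), and the function names are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform: log of each value minus the mean log."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(sample_a, sample_b, pseudocount=0.5):
    """Euclidean distance between two CLR-transformed samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa in the same order")
    clr_a = clr(sample_a, pseudocount)
    clr_b = clr(sample_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))


# Made-up counts for three taxa in two samples (illustration only).
print(aitchison_distance([120, 30, 0], [15, 40, 5]))
```

With no pseudocount and strictly positive counts, the distance depends only on the ratios between taxa, so a sample and a ten-fold deeper copy of it are at distance zero. This scale invariance is why the measure suits compositional sequencing data.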
Differential abundance testing was performed to identify differences in microbial species between the nasopharynx of patients with severe and very severe disease using Microbiome Multivariable Associations with Linear Models (MaAsLin3)82, accounting for both taxa prevalence and abundance. We controlled for age and applied Benjamini-Hochberg False Discovery Rate (FDR) for multiple testing correction (q value). The q value threshold for significance was set to the default <0.1 for MaAsLin3. All figures were generated using ggplot2 and microViz83.
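The multiple-testing correction named above can be sketched as follows. This is a plain implementation of the Benjamini-Hochberg step-up adjustment for illustration only, not the MaAsLin3 code; the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Made-up p-values for illustration (not results from this study).
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.1]
print(q, significant)
```

In this toy example the first three tests fall below the q < 0.1 threshold and the fourth does not.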
Acknowledgements
We acknowledge all participants who contributed samples and their guardians. We also acknowledge the contributions of the laboratory staff at the West Africa Research Platform, Medical Research Council Unit The Gambia at LSHTM, as well as the J. Craig Venter Institute, La Jolla, California.
Funding
This work is supported by the U.S. National Institutes of Health (NIH), Office of the Director (OD), the National Institute of Environmental Health Sciences (NIEHS), and the National Human Genome Research Institute (NHGRI) grant number U54HG009824.
Footnotes
Declarations
Additional information
The authors declare no competing interests.
Contributor Information
Dam Khan, Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine.
Josh L. Espinoza, J. Craig Venter Institute.
Peggy-Estelle Tientcheu, Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine.
Isaac Darko Otchere, Noguchi Memorial Institute for Medical Research.
Nuredin Ibrahim Mohammed, Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine.
Archibald Worwui, Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine.
Mark P. Nicol, University of Western Australia.
Brenda Kwambana-Adams, Liverpool School of Tropical Medicine.
Martin Antonio, Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine.
Chris L. Dupont, J. Craig Venter Institute.
Data Availability
Shotgun metagenomic sequencing data have been deposited in the NCBI Sequence Read Archive under BioProject PRJNA727021. The accession numbers of the 71 samples used in the analysis in this paper are detailed in Supplementary Table S3. The genome assemblies of the 35 high-quality MAGs reported in this paper have been deposited in the Zenodo repository at https://zenodo.org/records/18494646. Other data will be made available on request.
References
- 1. Liu L. et al. Global, regional, and national causes of under-5 mortality in 2000–15: an updated systematic analysis with implications for the Sustainable Development Goals. Lancet 388, 3027–3035. 10.1016/s0140-6736(16)31593-8 (2016).
- 2. Walker C. L. F. et al. Global burden of childhood pneumonia and diarrhoea. Lancet 381, 1405–1416. 10.1016/s0140-6736(13)60222-6 (2013).
- 3. Global national incidence and mortality burden of non-COVID-19 lower respiratory infections and aetiologies, 1990–2021: a systematic analysis from the Global Burden of Disease Study 2021. Lancet Infect. Dis. 24, 974–1002. 10.1016/s1473-3099(24)00176-2 (2024).
- 4. Goyal J. P. et al. Risk Factors for the Development of Pneumonia and Severe Pneumonia in Children. Indian Pediatr. 58, 1036–1039 (2021).
- 5. Nguyen T. K. et al. Risk factors for child pneumonia - focus on the Western Pacific Region. Paediatr. Respir Rev. 21, 95–101. 10.1016/j.prrv.2016.07.002 (2017).
- 6. Pirrone M., Pinciroli R. & Berra L. Microbiome, biofilms, and pneumonia in the ICU. Curr. Opin. Infect. Dis. 29, 160–166. 10.1097/qco.0000000000000255 (2016).
- 7. Kumpitsch C., Koskinen K., Schöpf V. & Moissl-Eichinger C. The microbiome of the upper respiratory tract in health and disease. BMC Biol. 17, 87. 10.1186/s12915-019-0703-z (2019).
- 8. Dickson R. P. et al. Bacterial Topography of the Healthy Human Lower Respiratory Tract. mBio 8, e02287-16. 10.1128/mbio.02287-16 (2017).
- 9. Mitsi E. et al. Nasal Pneumococcal Density Is Associated with Microaspiration and Heightened Human Alveolar Macrophage Responsiveness to Bacterial Pathogens. Am. J. Respir Crit. Care Med. 201, 335–347. 10.1164/rccm.201903-0607OC (2020).
- 10. Bhowmick R. et al. Systemic Disease during Streptococcus pneumoniae Acute Lung Infection Requires 12-Lipoxygenase–Dependent Inflammation. J. Immunol. 191, 5115–5123. 10.4049/jimmunol.1300522 (2013).
- 11. de Steenhuijsen Piters W. A., Sanders E. A. & Bogaert D. The role of the local microbial ecosystem in respiratory health and disease. Philos. Trans. R Soc. Lond. B Biol. Sci. 370 10.1098/rstb.2014.0294 (2015).
- 12. Camelo-Castillo A. et al. Nasopharyngeal Microbiota in Children With Invasive Pneumococcal Disease: Identification of Bacteria With Potential Disease-Promoting and Protective Effects. Front. Microbiol. 10, 11. 10.3389/fmicb.2019.00011 (2019).
- 13. Biesbroek G. et al. Early respiratory microbiota composition determines bacterial succession patterns and respiratory health in children. Am. J. Respir Crit. Care Med. 190, 1283–1292. 10.1164/rccm.201407-1240OC (2014).
- 14. Stubbendieck R. M., Hurst J. H. & Kelly M. S. Dolosigranulum pigrum: A promising nasal probiotic candidate. PLoS Pathog. 20, e1011955. 10.1371/journal.ppat.1011955 (2024).
- 15. Raita Y. et al. Maturation of nasal microbiota and antibiotic exposures during early childhood: a population-based cohort study. Clin. Microbiol. Infect. 27, 283.e1–283.e7. 10.1016/j.cmi.2020.05.033 (2021).
- 16. Chochua S. et al. Increased Nasopharyngeal Density and Concurrent Carriage of Streptococcus pneumoniae, Haemophilus influenzae, and Moraxella catarrhalis Are Associated with Pneumonia in Febrile Children. PLoS One. 11, e0167725. 10.1371/journal.pone.0167725 (2016).
- 17. Fan R. R. et al. Nasopharyngeal Pneumococcal Density and Evolution of Acute Respiratory Illnesses in Young Children, Peru, 2009–2011. Emerg. Infect. Dis. 22, 1996–1999. 10.3201/eid2211.160902 (2016).
- 18. Xu L., Earl J. & Pichichero M. E. Nasopharyngeal microbiome composition associated with Streptococcus pneumoniae colonization suggests a protective role of Corynebacterium in young children. PLoS One. 16, e0257207. 10.1371/journal.pone.0257207 (2021).
- 19. de Steenhuijsen Piters W. A. A. et al. Interaction between the nasal microbiota and S. pneumoniae in the context of live-attenuated influenza vaccine. Nat. Commun. 10, 2981. 10.1038/s41467-019-10814-9 (2019).
- 20. Kelly M. S. et al. Non-diphtheriae Corynebacterium species are associated with decreased risk of pneumococcal colonization during infancy. ISME J. 16, 655–665. 10.1038/s41396-021-01108-4 (2022).
- 21. Han H., Wang Z. & Zhu S. Benchmarking metagenomic binning tools on real datasets across sequencing platforms and binning modes. Nat. Commun. 16, 2865. 10.1038/s41467-025-57957-6 (2025).
- 22. León M. J. et al. From metagenomics to pure culture: isolation and characterization of the moderately halophilic bacterium Spiribacter salinus gen. nov., sp. nov. Appl. Environ. Microbiol. 80, 3850–3857. 10.1128/aem.00430-14 (2014).
- 23. Stewart R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961. 10.1038/s41587-019-0202-3 (2019).
- 24. Almeida A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504. 10.1038/s41586-019-0965-1 (2019).
- 25. Li Y. et al. Comprehensive human respiratory genome catalogue underlies the high resolution and precision of the respiratory microbiome. Brief. Bioinform. 26 10.1093/bib/bbae620 (2024).
- 26. Bowers R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731. 10.1038/nbt.3893 (2017).
- 27. Wahl B. et al. Burden of Streptococcus pneumoniae and Haemophilus influenzae type b disease in children in the era of conjugate vaccines: global, regional, and national estimates for 2000–15. Lancet Glob Health. 6, e744–e757. 10.1016/s2214-109x(18)30247-x (2018).
- 28. Zaman S. M. et al. Impact of routine vaccination against Haemophilus influenzae type b in The Gambia: 20 years after its introduction. J. Glob Health. 10, 010416. 10.7189/jogh.10.010416 (2020).
- 29. Mackenzie G. A. et al. A cluster-randomised, non-inferiority trial of the impact of a two-dose compared to three-dose schedule of pneumococcal conjugate vaccination in rural Gambia: the PVS trial. Trials 23, 71. 10.1186/s13063-021-05964-5 (2022).
- 30. Howie S. R. et al. Etiology of severe childhood pneumonia in the Gambia, West Africa, determined by conventional and molecular microbiological analyses of lung and pleural aspirate samples. Clin. Infect. Dis. 59, 682–685. 10.1093/cid/ciu384 (2014).
- 31. Hill P. C. et al. Nasopharyngeal carriage of Streptococcus pneumoniae in Gambian villagers. Clin. Infect. Dis. 43, 673–679. 10.1086/506941 (2006).
- 32. Kwambana-Adams B. et al. Rapid replacement by non-vaccine pneumococcal serotypes may mitigate the impact of the pneumococcal conjugate vaccine on nasopharyngeal bacterial ecology. Sci. Rep. 7, 8127. 10.1038/s41598-017-08717-0 (2017).
- 33. Ruiz García Y. et al. The importance of multidrug resistance: a systematic literature review. Expert Rev. Vaccines. 20, 45–57. 10.1080/14760584.2021.1873136 (2021).
- 34. Roca A. et al. Nasopharyngeal carriage of pneumococci four years after community-wide vaccination with PCV-7 in The Gambia: long-term evaluation of a cluster randomized trial. PLoS One. 8, e72198. 10.1371/journal.pone.0072198 (2013).
- 35. Li T. et al. Pan-Genome-Wide Association Study of Serotype 19A Pneumococci Identifies Disease-Associated Genes. Microbiol. Spectr. 11, e0407322. 10.1128/spectrum.04073-22 (2023).
- 36. Slack M. et al. Haemophilus influenzae type b disease in the era of conjugate vaccines: critical factors for successful eradication. Expert Rev. Vaccines. 19, 903–917. 10.1080/14760584.2020.1825948 (2020).
- 37. Yang Y. et al. Haemophilus influenzae type b carriage and burden of its related diseases in Chinese children: Systematic review and meta-analysis. Vaccine 35, 6275–6282. 10.1016/j.vaccine.2017.09.057 (2017).
- 38. Khattak Z. E. & Anjum F. in StatPearls (StatPearls Publishing, 2025).
- 39. Teo S. M. et al. The infant nasopharyngeal microbiome impacts severity of lower respiratory infection and risk of asthma development. Cell. Host Microbe. 17, 704–715. 10.1016/j.chom.2015.03.008 (2015).
- 40. Whittle E., Leonard M. O., Harrison R., Gant T. W. & Tonge D. P. Multi-Method Characterization of the Human Circulating Microbiome. Front. Microbiol. 9, 3266. 10.3389/fmicb.2018.03266 (2018).
- 41. Castillo D. J., Rifkin R. F., Cowan D. A. & Potgieter M. The Healthy Human Blood Microbiome: Fact or Fiction? Front. Cell. Infect. Microbiol. 9, 148. 10.3389/fcimb.2019.00148 (2019).
- 42. Schierwagen R. et al. Circulating microbiome in blood of different circulatory compartments. Gut 68, 578–580. 10.1136/gutjnl-2018-316227 (2019).
- 43. Potgieter M., Bester J., Kell D. B. & Pretorius E. The dormant blood microbiome in chronic, inflammatory diseases. FEMS Microbiol. Rev. 39, 567–591. 10.1093/femsre/fuv013 (2015).
- 44. Amar J. et al. Blood microbiota dysbiosis is associated with the onset of cardiovascular events in a large general population: the D.E.S.I.R. study. PLoS One. 8, e54461. 10.1371/journal.pone.0054461 (2013).
- 45. Liu W. et al. Characterization of blood and urine microbiome temporal variability in patients with acute myeloid leukemia. Microb. Pathog. 206, 107734. 10.1016/j.micpath.2025.107734 (2025).
- 46. Khan I. et al. Analysis of the blood bacterial composition of patients with acute coronary syndrome and chronic coronary syndrome. Front. Cell. Infect. Microbiol. 12, 943808. 10.3389/fcimb.2022.943808 (2022).
- 47. Barbeta E. et al. Biological effects of pulmonary, blood and gut microbiome alterations in patients with acute respiratory distress syndrome. ERJ Open. Res. 11 10.1183/23120541.00667-2024 (2025).
- 48. Riesco R. & Trujillo M. E. Update on the proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int. J. Syst. Evol. Microbiol. 74 10.1099/ijsem.0.006300 (2024).
- 49. Ramasamy D. et al. A polyphasic strategy incorporating genomic data for the taxonomic description of novel bacterial species. Int. J. Syst. Evol. Microbiol. 64, 384–391. 10.1099/ijs.0.057091-0 (2014).
- 50. Zheng J. et al. A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae. Int. J. Syst. Evol. Microbiol. 70, 2782–2858. 10.1099/ijsem.0.004107 (2020).
- 51. Ksiezarek M., Grosso F., Ribeiro T. G. & Peixe L. Genomic diversity of genus Limosilactobacillus. Microb. Genom. 8 10.1099/mgen.0.000847 (2022).
- 52. Meziti A. et al. The Reliability of Metagenome-Assembled Genomes (MAGs) in Representing Natural Populations: Insights from Comparing MAGs against Isolate Genomes Derived from the Same Fecal Sample. Appl. Environ. Microbiol. 87 10.1128/aem.02593-20 (2021).
- 53. Valente C. et al. Impact of the 13-valent pneumococcal conjugate vaccine on Streptococcus pneumoniae multiple serotype carriage. Vaccine 34, 4072–4078. 10.1016/j.vaccine.2016.06.017 (2016).
- 54. Kamng’ona A. W. et al. High multiple carriage and emergence of Streptococcus pneumoniae vaccine serotype variants in Malawian children. BMC Infect. Dis. 15, 234. 10.1186/s12879-015-0980-2 (2015).
- 55. Manenzhe R. I. et al. Characterization of Pneumococcal Colonization Dynamics and Antimicrobial Resistance Using Shotgun Metagenomic Sequencing in Intensively Sampled South African Infants. Front. Public. Health. 8, 543898. 10.3389/fpubh.2020.543898 (2020).
- 56. Turner P. et al. Improved detection of nasopharyngeal cocolonization by multiple pneumococcal serotypes by use of latex agglutination or molecular serotyping by microarray. J. Clin. Microbiol. 49, 1784–1789. 10.1128/jcm.00157-11 (2011).
- 57. Boncoeur E. et al. PatA and PatB form a functional heterodimeric ABC multidrug efflux transporter responsible for the resistance of Streptococcus pneumoniae to fluoroquinolones. Biochemistry 51, 7755–7765. 10.1021/bi300762p (2012).
- 58. Garvey M. I., Baylay A. J., Wong R. L. & Piddock L. J. Overexpression of patA and patB, which encode ABC transporters, is associated with fluoroquinolone resistance in clinical isolates of Streptococcus pneumoniae. Antimicrob. Agents Chemother. 55, 190–196. 10.1128/aac.00672-10 (2011).
- 59. Lupien A. et al. Genomic characterization of ciprofloxacin resistance in a laboratory-derived mutant and a clinical isolate of Streptococcus pneumoniae. Antimicrob. Agents Chemother. 57, 4911–4919. 10.1128/aac.00418-13 (2013).
- 60. Stogios P. J. et al. Substrate Recognition by a Colistin Resistance Enzyme from Moraxella catarrhalis. ACS Chem. Biol. 13, 1322–1332. 10.1021/acschembio.8b00116 (2018).
- 61. Lin J., Wang Y., Lin C., Li R. & Wang G. High Prevalence of Group III-Like Mutations Among BLPACR and First Report of Haemophilus influenzae ST95 Isolated from Blood in China. Infect. Drug Resist. 16, 999–1008. 10.2147/idr.S400207 (2023).
- 62. Hood D. W. et al. Three genes, lgtF, lic2C and lpsA, have a primary role in determining the pattern of oligosaccharide extension from the inner core of Haemophilus influenzae LPS. Microbiol. (Reading). 150, 2089–2097. 10.1099/mic.0.26912-0 (2004).
- 63. Cardoso B. et al. Genomic insights of international clones of Haemophilus influenzae causing invasive infections in vaccinated and unvaccinated infants. Microb. Pathog. 150, 104644. 10.1016/j.micpath.2020.104644 (2021).
- 64. Cho H. E. et al. Shotgun Metagenomic Sequencing Analysis as a Diagnostic Strategy for Patients with Lower Respiratory Tract Infections. Microorganisms 13 10.3390/microorganisms13061338 (2025).
- 65. Causes of severe pneumonia requiring hospital admission in children without HIV infection from Africa and Asia: the PERCH multi-country case-control study. Lancet 394, 757–779. 10.1016/s0140-6736(19)30721-4 (2019).
- 66. Scott J. A. et al. The definition of pneumonia, the assessment of severity, and clinical standardization in the Pneumonia Etiology Research for Child Health study. Clin. Infect. Dis. 54 (Suppl 2), 109–116. 10.1093/cid/cir1065 (2012).
- 67. QIAGEN. DNeasy® PowerSoil® Kit Handbook (2017). https://www.qiagen.com/us/resources/resourcedetail?id=5a0517a7-711d-4085-8a28-2bb25fab828a&lang=en
- 68. QIAGEN. QIAamp® MinElute® Virus Spin Handbook (2020). https://www.qiagen.com/us/products/discovery-and-translational-research/dna-rna-purification/multianalyte-and-virus/qiaamp-minelute-virus-kits
- 69. QIAGEN. QIAamp DNA Mini and Blood Mini Handbook (March 2024). https://www.qiagen.com/us/products/discovery-and-translational-research/dna-rna-purification/dna-purification/genomic-dna/qiaamp-dna-blood-kits
- 70. Espinoza J. L. & Dupont C. L. VEBA: a modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes. BMC Bioinform. 23, 419. 10.1186/s12859-022-04973-8 (2022).
- 71. Espinoza J. L. et al. Unveiling the microbial realm with VEBA 2.0: a modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic and viral multi-omics from either short- or long-read sequencing. Nucleic Acids Res. 52, e63. 10.1093/nar/gkae528 (2024).
- 72. Kang D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359. 10.7717/peerj.7359 (2019).
- 73. Alneberg J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods. 11, 1144–1146. 10.1038/nmeth.3103 (2014).
- 74. Wu Y. W., Simmons B. A. & Singer S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607. 10.1093/bioinformatics/btv638 (2016).
- 75. Sieber C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 3, 836–843. 10.1038/s41564-018-0171-1 (2018).
- 76. Parks D. H., Imelfort M., Skennerton C. T., Hugenholtz P. & Tyson G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055. 10.1101/gr.186072.114 (2015).
- 77. Shen W., Le S., Li Y. & Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One. 11, e0163962. 10.1371/journal.pone.0163962 (2016).
- 78. Chaumeil P. A., Mussig A. J., Hugenholtz P. & Parks D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927. 10.1093/bioinformatics/btz848 (2019).
- 79. Sheppard C. L. et al. PneumoKITy: A fast, flexible, specific, and sensitive tool for Streptococcus pneumoniae serotype screening and mixed serotype detection from genome sequence data. Microb. Genom. 8 10.1099/mgen.0.000904 (2022).
- 80. Watts S. C. & Holt K. E. hicap: In Silico Serotyping of the Haemophilus influenzae Capsule Locus. J. Clin. Microbiol. 57 10.1128/jcm.00190-19 (2019).
- 81. Jolley K. A., Bray J. E. & Maiden M. C. J. Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications. Wellcome Open. Res. 3, 124. 10.12688/wellcomeopenres.14826.1 (2018).
- 82. Nickols W. A. et al. MaAsLin 3: Refining and extending generalized multivariable linear models for meta-omic association discovery. bioRxiv (2024). 10.1101/2024.12.13.628459
- 83. Rylance J. et al. Household air pollution and the lung microbiome of healthy adults in Malawi: a cross-sectional study. BMC Microbiol. 16, 182. 10.1186/s12866-016-0803-7 (2016).