ABSTRACT
We present the draft metagenome-assembled genomes (MAGs) of 13 Haemophilus representatives from human saliva. MAGs were reconstructed by a streamlined pre-assembly mapping approach performed against 9 clinically relevant reference genomes. Overall, genomes belonging to 2 potentially novel Haemophilus species and 11 strains were recovered, as determined by genome-wide ANI analysis.
KEYWORDS: Haemophilus, saliva, oral, bacteria, metagenome
ANNOUNCEMENT
Haemophilus are small gram-negative coccobacilli that inhabit the oral cavity and upper respiratory tract of humans (1, 2). Under dysbiotic conditions, Haemophilus spp. may cause localized and systemic infections, including otitis media, sinusitis, conjunctivitis, pneumonia, chancroid, meningitis, and bacteremia (2). Members of this genus are generally fastidious and difficult to culture in the laboratory (2); hence, molecular-based investigations can help shed additional light on their virulence and pathophysiology. In this study, 13 Haemophilus metagenome-assembled genomes (MAGs) were recovered from non-stimulated saliva of healthy subjects and subjects with oral disease.
Twenty-seven volunteers were seen at the Dental Clinic of Amazonas State University (Brazil), with no distinction of gender, age, or ethnicity. All participants signed an informed consent form complying with the seventh version of the Declaration of Helsinki (2013). Non-stimulated saliva samples were collected and subjected to total DNA extraction. Metagenomic DNA was hydrodynamically sonicated, and 1.0 µg DNA from each sample was used for preparation of sample-specific libraries with the NEBNext Ultra DNA Library Prep Kit. Products of approximately 300 bp were selected and sequenced on the Illumina HiSeq 2500 platform. Paired-end reads were merged and adapter sequences excised with PEAR v.0.9.8, while host-related sequences were removed by mapping to the GRCh38.p14 human genome data set (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40, accessed on 02 June 2023) with Bowtie2 (3), using the “--un-conc” and “--very-sensitive-local” parameters. Species-level mapping of reads was performed with Bowtie2 (3) using the “--fast-local” parameter against NCBI’s reference genomes of H. aegyptius, H. ducreyi, H. haemolyticus, H. influenzae, H. parainfluenzae, H. parahaemolyticus, H. paraphrohaemolyticus, H. pittmaniae, and H. sputorum. Assembly of contigs and binning of MAGs were achieved with SPAdes v.3.15.5 (4) and MaxBin2, respectively. Completeness (50% minimum) and contamination (10% maximum) values were assessed with CheckM v.1.10.18 (5). Taxonomic placement of MAGs was achieved with GTDB-Tk v1.7.0 (6) in KBase (7), adopting ANI scores of 95% for species-level definition and between 95% and 97% for strain-level demarcation (8). General MAG annotation was performed with NCBI’s PGAP v.4.11. Antimicrobial resistance determinants were searched for with CARD 2023 (9), and carbohydrate-active enzymes were annotated with dbCAN3 (10).
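The ANI cutoffs quoted above can be read as a simple decision rule. The following is a minimal sketch of that reading, not the authors' code: the function name `classify_by_ani` and the label strings are hypothetical, and the handling of ANI values above 97% is an assumption, since the text does not state how such values were treated.

```python
# Thresholds quoted in the text (percent ANI).
SPECIES_ANI_CUTOFF = 95.0   # below this: outside any known species
STRAIN_ANI_UPPER = 97.0     # upper bound of the stated strain-level window


def classify_by_ani(best_ani: float) -> str:
    """Label a MAG from its highest ANI (%) against reference genomes.

    Hypothetical helper: names and labels are illustrative only.
    """
    if not 0.0 <= best_ani <= 100.0:
        raise ValueError(f"ANI must be a percentage, got {best_ani}")
    if best_ani < SPECIES_ANI_CUTOFF:
        return "putatively novel species"
    if best_ani <= STRAIN_ANI_UPPER:
        return "known species, distinct strain"
    # Assumption: the text does not say how values above 97% are handled.
    return "known species, above strain window"
```

Under this reading, a MAG whose best ANI against the reference database falls below 95% would be flagged as a candidate novel species.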
In all, 113 MAGs were binned from a total of 27 clinical samples. Of these, 13 MAGs fully complied with the quality and taxonomic threshold parameters and were therefore selected for taxonomic inference and gene annotation. These MAGs were recovered from 11 distinct saliva specimens, with sample OHS0020_HPI being related to a chronic periodontitis case and the remainder to healthy subjects. According to GTDB-Tk analysis, all MAGs were placed within the Haemophilus genus, 11 of which corresponded to previously unreported strains of H. haemolyticus, H. parahaemolyticus, and H. parainfluenzae. In addition, MAGs OH0009_HAE and OH0010_HAE displayed ANI values <95% with those available in GTDB, suggesting that they represent putatively novel species closely related to H. parainfluenzae. These were deposited in GenBank under the names Haemophilus bacterium OH0009 and OH0010, respectively. General taxonomic and annotation features are shown in Table 1.
TABLE 1.
General information and annotation results of 13 metagenome-assembled genomes belonging to the Haemophilus genus retrieved from non-stimulated saliva of 11 individuals
| Characteristic (data per MAG) | OHS002_HPH | OHS0003_HPI | OHS004_HHA | OHS0004_HPI | OHS0005_HPI |
|---|---|---|---|---|---|
| Species-level taxonomy | H. parahaemolyticus | H. parainfluenzae | H. haemolyticus | H. parainfluenzae | H. parainfluenzae |
| Host clinical condition | Oral health | Oral health | Oral health | Oral health | Oral health |
| SRA accession no. | SRR14122749 | SRR14122738 | SRR14122730 | SRR14122730 | SRR14122729 |
| Biosample accession no. | SAMN34352703 | SAMN34352705 | SAMN34352701 | SAMN34352706 | SAMN34352707 |
| Genbank accession no. | JAUPSI000000000 | JAUPSK000000000 | JAUPSG000000000 | JAUPSL000000000 | JAUPSM000000000 |
| N50 (bp) | 12,080 | 1,944 | 4,297 | 3,889 | 3,178 |
| Genome size (nt) | 1,847,273 | 1,130,633 | 1,266,290 | 1,640,887 | 1,537,644 |
| Genome completeness (%) | 97.84 | 68.84 | 77.93 | 92.06 | 86.26 |
| Genome contamination (%) | 2.01 | 3.98 | 2.89 | 8.54 | 6.25 |
| Genome assembly coverage (×) | 16.96 | 9.56 | 10.59 | 22.11 | 24.51 |
| GC content (%) | 40.45 | 39.86 | 38.85 | 39.85 | 39.81 |
| Protein-encoding genes | 1,823 | 1,503 | 1,322 | 1,879 | 1,846 |
| RNA genes | 45 | 21 | 18 | 35 | 41 |
| tRNA genes | 40 | 18 | 16 | 29 | 36 |
| ncRNA genes | 4 | 3 | 2 | 4 | 4 |
| Pseudogenes | 26 | 16 | 15 | 13 | 8 |
| Antibiotic resistance genes | 118 | 87 | 86 | 119 | 101 |
| Carbohydrate-active enzymes | 80 | 50 | 54 | 71 | 59 |
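As a quick consistency check, the completeness and contamination values listed in Table 1 can be compared against the stated CheckM thresholds (50% minimum completeness, 10% maximum contamination). This is a minimal sketch: the `MagQuality` type and `passes_quality` helper are hypothetical names introduced for illustration, and the numbers are transcribed from the table rows above.

```python
from typing import NamedTuple

# Quality thresholds stated in the text (percent).
MIN_COMPLETENESS = 50.0
MAX_CONTAMINATION = 10.0


class MagQuality(NamedTuple):
    """Hypothetical record holding CheckM-style metrics for one MAG."""
    name: str
    completeness: float
    contamination: float


# Values transcribed from the MAGs shown in Table 1.
TABLE1 = [
    MagQuality("OHS002_HPH", 97.84, 2.01),
    MagQuality("OHS0003_HPI", 68.84, 3.98),
    MagQuality("OHS004_HHA", 77.93, 2.89),
    MagQuality("OHS0004_HPI", 92.06, 8.54),
    MagQuality("OHS0005_HPI", 86.26, 6.25),
]


def passes_quality(mag: MagQuality) -> bool:
    """Return True if the MAG meets both thresholds (bounds inclusive)."""
    return (mag.completeness >= MIN_COMPLETENESS
            and mag.contamination <= MAX_CONTAMINATION)


# Names of the listed MAGs that satisfy both thresholds.
passing = [mag.name for mag in TABLE1 if passes_quality(mag)]
```

Treating both bounds as inclusive follows the "minimum" and "maximum" wording; a stricter pipeline might use exclusive bounds instead.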
ACKNOWLEDGMENTS
We thank the Novogene Institute (Hong Kong, China) for DNA sequencing assistance, and CAPES (process no. 062.01941/2015) and FAPEAM (process no. 024/2014) research agencies for the financial support.
Contributor Information
Daniel Saito, Email: danielsaito@yahoo.com, dsaito@uea.edu.br.
Frank J. Stewart, Montana State University, Bozeman, USA
DATA AVAILABILITY
Raw reads and biosamples belong to NCBI BioProject no. PRJNA717815 and are available under the accession numbers displayed in Table 1.
REFERENCES
- 1. Yamashita Y, Takeshita T. 2017. The oral microbiome and human health. J Oral Sci 59:201–206. doi: 10.2334/josnusd.16-0856
- 2. Nørskov-Lauritsen N. 2014. Classification, identification, and clinical significance of Haemophilus and Aggregatibacter species with host specificity for humans. Clin Microbiol Rev 27:214–240. doi: 10.1128/CMR.00103-13
- 3. Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. doi: 10.1186/gb-2009-10-3-r25
- 4. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021
- 5. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055. doi: 10.1101/gr.186072.114
- 6. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2020. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36:1925–1927. doi: 10.1093/bioinformatics/btz848
- 7. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, et al. 2018. KBase: the United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36:566–569. doi: 10.1038/nbt.4163
- 8. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9:5114. doi: 10.1038/s41467-018-07641-9
- 9. Alcock BP, Huynh W, Chalil R, Smith KW, Raphenya AR, Wlodarski MA, Edalatmand A, Petkau A, Syed SA, Tsang KK, et al. 2023. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the comprehensive antibiotic resistance database. Nucleic Acids Res 51:D690–D699. doi: 10.1093/nar/gkac920
- 10. Zheng J, Ge Q, Yan Y, Zhang X, Huang L, Yin Y. 2023. dbCAN3: automated carbohydrate-active enzyme and substrate annotation. Nucleic Acids Res 51:W115–W121. doi: 10.1093/nar/gkad328
