Draft genome sequences of 13 putatively novel Haemophilus species and strains assembled from human saliva

Daniel Saito; Cristiane Pereira Borges Saito; Fabiana de Souza Cannavan; Siu Mui Tsai

2024 Feb 20;13(3):e00945-23. doi: 10.1128/mra.00945-23

Editor: Frank J Stewart

PMCID: PMC10927657 PMID: 38376220

ABSTRACT

We present the draft metagenome-assembled genomes (MAGs) of 13 Haemophilus representatives from human saliva. MAGs were reconstructed by a streamlined pre-assembly mapping approach performed against 9 clinically relevant reference genomes. Overall, genomes belonging to 2 potentially novel Haemophilus species and 11 strains were recovered, as determined by genome-wide ANI analysis.

KEYWORDS: Haemophilus, saliva, oral, bacteria, metagenome

ANNOUNCEMENT

Haemophilus spp. are small gram-negative coccobacilli that inhabit the oral cavity and upper respiratory tract of humans (1, 2). Under dysbiotic conditions, Haemophilus spp. may cause localized and systemic infections, including otitis media, sinusitis, conjunctivitis, pneumonia, chancroid, meningitis, and bacteremia (2). Members of this genus are generally fastidious and difficult to culture in the laboratory (2); hence, molecular-based investigations can help shed additional light on their virulence and pathophysiology. In this study, 13 Haemophilus metagenome-assembled genomes (MAGs) were recovered from non-stimulated saliva of orally healthy subjects and subjects with oral disease.

Twenty-seven volunteers were seen at the Dental Clinic of Amazonas State University (Brazil), with no distinction as to gender, age, or ethnicity. All participants signed an informed consent form complying with the seventh revision of the Declaration of Helsinki (2013). Non-stimulated saliva samples were collected and subjected to total DNA extraction. Metagenomic DNA was hydrodynamically sonicated, and 1.0 µg DNA from each sample was used to prepare sample-specific libraries with the NEBNext Ultra DNA Library Prep Kit. Products of approximately 300 bp were selected and sequenced on the Illumina HiSeq 2500 platform. Paired-end reads were merged and adapter sequences removed with PEAR v.0.9.8, while host-related sequences were removed by mapping to the GRCh38.p14 human genome data set (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40, accessed on 02 June 2023) with Bowtie2 (3), using the “--un-conc” and “--very-sensitive-local” parameters. Species-level mapping of reads was performed with Bowtie2 (3) using the “--fast-local” parameter against the NCBI reference genomes of H. aegyptius, H. ducreyi, H. haemolyticus, H. influenzae, H. parainfluenzae, H. parahaemolyticus, H. paraphrohaemolyticus, H. pittmaniae, and H. sputorum. Contigs were assembled with SPAdes v.3.15.5 (4), and MAGs were binned with MaxBin2. Completeness (50% minimum) and contamination (10% maximum) values were assessed with CheckM v.1.10.18 (5). Taxonomic placement of MAGs was performed with GTDB-Tk v1.7.0 (6) in KBase (7), adopting an ANI score of 95% for species-level definition and scores between 95% and 97% for strain-level demarcation (8). General MAG annotation was performed with NCBI’s PGAP v.4.11. Antimicrobial resistance determinants were searched with CARD 2023 (9), and carbohydrate-active enzymes were annotated with dbCAN3 (10).
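The two read-mapping steps described above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual commands: the function names, index prefixes (`GRCh38_index`, `haemophilus_refs`), file names, and thread count are hypothetical, paired-end FASTQ input is assumed, and the use of `--al-conc` to keep read pairs that align to the Haemophilus references is an assumption about how mapped reads were retained. Only the `--un-conc`, `--very-sensitive-local`, and `--fast-local` options are taken from the text.

```python
import shlex


def host_removal_cmd(index, r1, r2, out_prefix, threads=4):
    """Build a Bowtie2 command that keeps read pairs NOT aligning to the host."""
    return [
        "bowtie2",
        "--very-sensitive-local",              # preset named in the text
        "-p", str(threads),                    # thread count (assumed)
        "-x", index,                           # hypothetical GRCh38 index prefix
        "-1", r1, "-2", r2,                    # paired-end input (assumed)
        # Pairs that fail to align concordantly; Bowtie2 replaces % with 1 or 2.
        "--un-conc", f"{out_prefix}.nonhost.%.fq",
        "-S", "/dev/null",                     # host alignments are discarded
    ]


def reference_mapping_cmd(index, r1, r2, out_prefix, threads=4):
    """Build a Bowtie2 command that maps reads to the Haemophilus references."""
    return [
        "bowtie2",
        "--fast-local",                        # preset named in the text
        "-p", str(threads),
        "-x", index,                           # hypothetical index of the 9 references
        "-1", r1, "-2", r2,
        # Assumption: concordantly aligned pairs are kept for assembly.
        "--al-conc", f"{out_prefix}.haemophilus.%.fq",
        "-S", f"{out_prefix}.haemophilus.sam",
    ]


if __name__ == "__main__":
    sample = "sample01"  # hypothetical sample name
    step1 = host_removal_cmd("GRCh38_index", f"{sample}_R1.fq", f"{sample}_R2.fq", sample)
    # The non-host pairs written by step 1 become the input of step 2.
    step2 = reference_mapping_cmd(
        "haemophilus_refs", f"{sample}.nonhost.1.fq", f"{sample}.nonhost.2.fq", sample
    )
    for cmd in (step1, step2):
        print(shlex.join(cmd))  # print only; pass to subprocess.run(cmd, check=True) to execute
```

The commands are built as argument lists rather than shell strings so that they can be handed to `subprocess.run` without quoting problems; the sketch only prints them and does not require Bowtie2 to be installed.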

In all, 113 MAGs were binned from a total of 27 clinical samples. Of these, 13 MAGs met the quality and taxonomic thresholds and were therefore selected for taxonomic inference and gene annotation. These MAGs were recovered from 11 distinct saliva specimens, with sample OHS0020_HPI being from a chronic periodontitis case and the remainder from healthy subjects. According to GTDB-Tk analysis, all MAGs were placed within the genus Haemophilus, 11 of which corresponded to previously unreported strains of H. haemolyticus, H. parahaemolyticus, and H. parainfluenzae. In addition, MAGs OH0009_HAE and OH0010_HAE displayed ANI values <95% with the genomes available in GTDB, suggesting that they are putatively novel genomes closely related to H. parainfluenzae. These were deposited in GenBank under the names Haemophilus bacterium OH0009 and OH0010, respectively. General taxonomic and annotation features are shown in Table 1.
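The selection and classification criteria applied here can be expressed as a short decision rule. The sketch below is illustrative only: the function and variable names are hypothetical, the completeness and contamination values are copied from Table 1, and the ANI values passed in the usage lines are made-up placeholders rather than reported results. How MAGs with ANI above 97% were handled is not stated in the text, so labeling them as matching a known strain is an assumption.

```python
MIN_COMPLETENESS = 50.0    # % completeness threshold (CheckM), from the text
MAX_CONTAMINATION = 10.0   # % contamination threshold (CheckM), from the text
SPECIES_ANI = 95.0         # % ANI species boundary, from the text
STRAIN_ANI_UPPER = 97.0    # % ANI upper bound of the strain-level window, from the text


def passes_quality(completeness, contamination):
    """Return True if a MAG meets both quality thresholds."""
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION


def classify_by_ani(best_ani):
    """Label a MAG from its highest ANI (%) against reference genomes."""
    if best_ani < SPECIES_ANI:
        return "putatively novel species"
    if best_ani <= STRAIN_ANI_UPPER:
        return "putatively novel strain"
    # Assumption: the text does not say how ANI > 97% was treated.
    return "matches a known strain (assumption)"


# (completeness %, contamination %) for the MAGs listed in Table 1
TABLE1_QC = {
    "OHS002_HPH": (97.84, 2.01),
    "OHS0003_HPI": (68.84, 3.98),
    "OHS004_HHA": (77.93, 2.89),
    "OHS0004_HPI": (92.06, 8.54),
    "OHS0005_HPI": (86.26, 6.25),
}

if __name__ == "__main__":
    for mag, (completeness, contamination) in TABLE1_QC.items():
        print(mag, "passes" if passes_quality(completeness, contamination) else "fails")
    # Placeholder ANI values, not taken from the study
    for ani in (94.2, 96.1):
        print(ani, "->", classify_by_ani(ani))
```

Keeping the thresholds as named constants makes it easy to see how the number of retained MAGs would change under stricter cutoffs.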

TABLE 1.

General information and annotation results of 13 metagenome-assembled genomes belonging to the Haemophilus genus retrieved from non-stimulated saliva of 11 individuals

Characteristic	Data for MAG
	OHS002_HPH	OHS0003_HPI	OHS004_HHA	OHS0004_HPI	OHS0005_HPI
Species-level taxonomy	H. parahaemolyticus	H. parainfluenzae	H. haemolyticus	H. parainfluenzae	H. parainfluenzae
Host clinical condition	Oral health	Oral health	Oral health	Oral health	Oral health
SRA accession no.	SRR14122749	SRR14122738	SRR14122730	SRR14122730	SRR14122729
Biosample accession no.	SAMN34352703	SAMN34352705	SAMN34352701	SAMN34352706	SAMN34352707
Genbank accession no.	JAUPSI000000000	JAUPSK000000000	JAUPSG000000000	JAUPSL000000000	JAUPSM000000000
N50 (bp)	12,080	1,944	4,297	3,889	3,178
Genome size (nt)	1,847,273	1,130,633	1,266,290	1,640,887	1,537,644
Genome completeness (%)	97.84	68.84	77.93	92.06	86.26
Genome contamination (%)	2.01	3.98	2.89	8.54	6.25
Genome assembly coverage (×)	16.96	9.56	10.59	22.11	24.51
GC content (%)	40.45	39.86	38.85	39.85	39.81
Protein-encoding genes	1,823	1,503	1,322	1,879	1,846
RNA genes	45	21	18	35	41
tRNA genes	40	18	16	29	36
ncRNA genes	4	3	2	4	4
Pseudogenes	26	16	15	13	8
Antibiotic resistance genes	118	87	86	119	101
Carbohydrate-active enzymes	80	50	54	71	59

ACKNOWLEDGMENTS

We thank the Novogene Institute (Hong Kong, China) for DNA sequencing assistance and the CAPES (process no. 062.01941/2015) and FAPEAM (process no. 024/2014) research agencies for financial support.

Contributor Information

Daniel Saito, Email: danielsaito@yahoo.com, dsaito@uea.edu.br.

Frank J. Stewart, Montana State University, Bozeman, USA

DATA AVAILABILITY

Raw reads and BioSamples belonging to NCBI BioProject no. PRJNA717815 are available under the accession numbers shown in Table 1.

REFERENCES

1. Yamashita Y, Takeshita T. 2017. The oral microbiome and human health. J Oral Sci 59:201–206. doi: 10.2334/josnusd.16-0856
2. Nørskov-Lauritsen N. 2014. Classification, identification, and clinical significance of Haemophilus and Aggregatibacter species with host specificity for humans. Clin Microbiol Rev 27:214–240. doi: 10.1128/CMR.00103-13
3. Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. doi: 10.1186/gb-2009-10-3-r25
4. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021
5. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055. doi: 10.1101/gr.186072.114
6. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2020. GTDB-TK: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36:1925–1927. doi: 10.1093/bioinformatics/btz848
7. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, et al. 2018. Kbase: the United States department of energy systems biology knowledgebase. Nat Biotechnol 36:566–569. doi: 10.1038/nbt.4163
8. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9:5114. doi: 10.1038/s41467-018-07641-9
9. Alcock BP, Huynh W, Chalil R, Smith KW, Raphenya AR, Wlodarski MA, Edalatmand A, Petkau A, Syed SA, Tsang KK, et al. 2023. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the comprehensive antibiotic resistance database. Nucleic Acids Res 51:D690–D699. doi: 10.1093/nar/gkac920
10. Zheng J, Ge Q, Yan Y, Zhang X, Huang L, Yin Y. 2023. dbCAN3: automated carbohydrate-active enzyme and substrate annotation. Nucleic Acids Res 51:W115–W121. doi: 10.1093/nar/gkad328
