BMC Genomics. 2025 Oct 10;26:908. doi: 10.1186/s12864-025-12028-4

Impact of RNA extraction on respiratory microbiome analysis using third-generation sequencing

Alice Michel 1, Marie Leoz 1, Nicolas Nesi 2, Hortense Petat 3, Meriadeg Ar Gouilh 4, Camille Charbonnier Le Clezio 5, Christophe Marguet 3, Chervin Hassel 1, Jean-Christophe Plantier 6
PMCID: PMC12512950  PMID: 41073888

Abstract

Background

The respiratory microbiome, which comprises bacteria, fungi, and viruses, plays a crucial role in respiratory health and disease. However, its study is limited by the low microbial biomass in respiratory samples and the dominance of host RNA. Metatranscriptomics offers comprehensive insights into active microbial communities and their interactions with the host but requires optimized RNA extraction protocols for robust and unbiased analysis. This study evaluated two RNA extraction kits—one employing chemical lysis (CL) and another combining chemical and mechanical lysis (CML)—to determine their effectiveness for metatranscriptomic analysis of respiratory samples.

Results

The CML protocol significantly increased double-stranded DNA (dsDNA) library yields, leading to higher sequencing read counts for both sample types (p < 0.0001). The read length was unaffected by the lysis protocol for the BAL and NPS samples. Taxonomic profiling revealed that CML enhanced the detection of robust microorganisms, such as gram-positive bacteria and fungi, without compromising viral detection.

Conclusions

The CML protocol demonstrated superior recovery of genetic material, particularly for fungi and gram-positive bacteria, making it better suited for comprehensive metatranscriptomic analyses. These findings underscore the need for tailored RNA extraction strategies on the basis of sample type and research objectives. Optimized metatranscriptomic protocols are pivotal for advancing our understanding of the respiratory microbiome and its role in health and disease.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-025-12028-4.

Keywords: Respiratory microbiome, Metatranscriptomics, RNA extraction, Chemical lysis, Mechanical lysis, Nasopharyngeal swabs, Bronchoalveolar lavage, Microbial diversity, Fungi, Gram-positive bacteria

Background

First described in the 2010s, the respiratory microbiome (composed of bacteria, fungi, and viruses) remains poorly understood despite growing interest and research efforts in this field [1, 2]. Emerging evidence suggests that dysbiosis of the respiratory microbiome may play a significant role in the pathogenesis of several respiratory diseases, including cystic fibrosis (CF), chronic obstructive pulmonary disease (COPD), lung cancer, and asthma [3–5].

Because respiratory samples typically have low microbial biomass, most microbiome studies rely on amplicon-based metagenomics (metabarcoding) to characterize microbial communities across various human anatomical sites [6]. While this approach is well suited for profiling bacterial [7, 8] and fungal populations [9], it is not adapted for studying viruses. Moreover, it provides only taxonomic insights and cannot address functional questions.

To bridge this gap, metatranscriptomics has emerged as a powerful alternative. This sequencing approach allows for simultaneous identification of active microbial species (including RNA viruses) and the genes they express, providing functional insights into microbial communities. Working with RNA reduces the likelihood of detecting nonviable organisms, a common limitation of DNA-based metabarcoding, which often targets intact DNA of dead bacteria.

Additionally, it enables the analysis of the host transcriptome, offering a more comprehensive view of interactions between the microbiome and the host in respiratory disease contexts. Long-read metatranscriptomics can be even more informative because of the possibility of resolving full-length transcripts when third-generation sequencing, such as Nanopore sequencing, is used. Moreover, nanopore sequencing provides a cost-effective alternative to short-read technologies, making it accessible for a broader range of research [10, 11].

However, metatranscriptomics presents several technical challenges. The low microbial RNA content in respiratory samples, often overshadowed by host RNA from epithelial cells (contained in nasopharyngeal swabs or bronchoalveolar lavages), requires a high sequencing depth to ensure adequate coverage of microbial transcripts. Without sufficient sequencing effort, host-derived reads dominate and compromise accurate profiling of the microbiome, which adds to the bioinformatics challenges. Upstream of sequencing, the first critical step is to recover sufficient amounts of high-quality RNA that represent the original microbiome composition. The inherent fragility of RNA requires careful handling to maintain its integrity throughout the extraction process, which involves significant physicochemical variations.

Moreover, the diverse array of microorganisms found in the respiratory microbiome (bacteria, viruses, fungi), each with unique structural properties, complicates lysis. These structural differences necessitate tailored lysis strategies: mechanical lysis, which physically disrupts robust cell walls such as those of fungi and bacteria, and chemical lysis, which uses reagents to permeabilize or dissolve cellular membranes. Mechanical lysis, often involving bead beating or sonication, is effective for microorganisms with thick or rigid cell walls but can lead to RNA degradation if it is overapplied. Conversely, chemical lysis is gentler, minimizing RNA damage, but may be insufficient to break open more resilient cell types without supplementation.

To address these complexities, this study compared two representative RNA extraction kits for two sample types: bronchoalveolar lavage (BAL), which is representative of the lower respiratory microbiome, and nasopharyngeal swab (NPS), which is representative of the upper respiratory microbiome. The first kit, which is specifically designed for viruses, employs chemical lysis exclusively and is optimized for their relatively simple structure. The second kit combines mechanical and chemical lysis, aiming to efficiently process a broader range of microbial taxa, including bacteria and fungi. Kit performance was assessed through key metrics, including RNA yield, dsDNA library yield, total read count, median read length, host read proportion and taxonomic resolution.

Methods

Sample collection

A total of 31 samples (16 NPS, 15 BAL) were collected at the Charles Nicolle University Hospital Center in Rouen, Normandy, France (N° AC-2020–4274). BALs were sampled via a 3% sodium chloride solution, while NPS samples were stored in disposable virus specimen collection tubes containing transport medium (JunNuo, Shandong Province, China). All samples tested negative for the respiratory pathogen targets (21 types or subtypes of viruses and 3 bacteria) of the ePlex RP2 CE-IVD Panel, as assessed by multiplex PCR (Genmark Diagnostics, Carlsbad, CA, USA). To facilitate downstream analyses, notably to secure sufficient sample volume for a comparison of extraction methods, each using two distinct input volumes in duplicate, samples were pooled into eight composite samples (four NPS pools and four BAL pools), each with a final volume of 5 mL.

RNA extraction

Two different extraction kits were tested for DNA and RNA extraction. The first, NucleoSpin Virus (Macherey–Nagel, Düren, Germany), uses chemical cell lysis (CL), and the second, Quick-DNA/RNA Miniprep Plus (Zymo-Research, Irvine, CA, USA), uses both chemical and mechanical cell lysis (CML) with bead beating. Both extraction methods were performed following the supplier's instructions, with 200 µL of sample input as recommended or 400 µL, which was assumed to recover a higher RNA yield. Each extraction was performed in duplicate.

Library prep and sequencing

DNase treatment and rRNA depletion

To prevent DNA contamination, RNA extraction products were treated via TURBO DNase (Invitrogen, Carlsbad, CA, USA) – a high-activity recombinant enzyme designed to remove residual DNA without compromising RNA integrity – and Baseline-ZERO DNase (Lucigen, Middleton, WI, USA). Eukaryotic ribosomal RNAs (rRNAs) were depleted via the NEBNext rRNA Depletion Kit v2 (New England Biolabs, Ipswich, MA, USA), as detailed in the supplementary methods.

RNA ligation circularization, reverse transcription and whole-transcriptome amplification

Single-stranded RNAs were circularized via T4 RNA Ligase I (New England Biolabs, Ipswich, MA, USA) according to the supplier’s recommendations. First-strand cDNA synthesis was performed via the SuperScript VILO cDNA Synthesis Kit (Invitrogen, Carlsbad, CA, USA), and whole-transcriptome amplification was performed via the QuantiTect Whole Transcriptome Kit (Qiagen, Venlo, The Netherlands), as detailed in the supplementary methods.

T7 endonuclease treatment, DNA repair and end-prep

Recognition and cleavage of nonβ DNA structures were performed on amplified dsDNA samples via T7 endonuclease I (New England Biolabs, Ipswich, MA, USA). DNA repair was performed via a combination of the NEBNext FFPE DNA Repair Mix and the NEBNext Ultra II End Repair/dA-tailing Kit (New England Biolabs, Ipswich, MA, USA), as detailed in the supplementary methods.

Barcoding, adaptor ligation and sequencing

The samples were barcoded via a Native Barcoding Extension Kit (EXP-NBD104, Oxford Nanopore Technologies, Oxford, UK). Adapter ligation and sequencing library preparation followed the Ligation Sequencing Kit (SQK-LSK109, Oxford Nanopore Technologies, Oxford, UK) supplier’s recommendations. Sequencing was performed on R9.4.1 flow cells with a MinION Mk1C device (Oxford Nanopore Technologies, Oxford, UK). Each sequencing run analyzed a single respiratory sample pool (BAL or NPS), totaling eight runs.

Positive control

A positive control (PC) of known composition was included in the study. We used the ZymoBIOMICS Microbial Community Standard (D6300, Zymo-Research, Irvine, CA, USA), a mixture of inactivated microorganisms containing 8 different bacteria and 2 fungi, and spiked it with 2 viral supernatants available in the laboratory (HIV and SARS-CoV-2, two RNA viruses). The positive control was treated the same way as the respiratory sample pools for RNA extraction and sequencing, except that its libraries were prepared using the Native Barcoding Kit 24 V14 (Oxford Nanopore Technologies, Oxford, UK) and sequenced on an R10.4.1 flow cell, in accordance with the supplier's protocol (version “NBE_9169_v114_revU_30Jan2025” of Ligation sequencing gDNA protocol).

Bioinformatics

Basecalling, trimming and quality analysis

For the NPS and BAL pools, basecalling was performed via Guppy (version 6.5.7) in recursive mode. The positive control was processed via the Dorado fast model. For both, the minimal quality score was set at 7. Adapter removal was performed via porechop-abi (version 0.5.0). Quality-based trimming of the raw reads was performed via Filtlong (version 0.2.1) and fastp (version 0.23.4). Nanoq (version 0.10.0) was used for quality analysis of both the raw and the cleaned reads.
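
To make the quality threshold concrete, the following is a minimal sketch of a read-level quality filter, not the basecallers' own code. It assumes Phred+33 quality strings and assumes that the read-level score is obtained, as nanopore tools commonly do, by averaging per-base error probabilities rather than the Phred values themselves. The function names are hypothetical.

```python
import math


def mean_qscore(quality_string: str, offset: int = 33) -> float:
    """Mean read quality computed in probability space (Phred+33 input assumed)."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each Phred score Q to its error probability 10^(-Q/10).
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    # Convert the mean error probability back to a Phred-scaled score.
    return -10 * math.log10(mean_error)


def passes_filter(quality_string: str, min_qscore: float = 7.0) -> bool:
    """Keep a read only if its mean quality reaches the threshold (7 in this study)."""
    return mean_qscore(quality_string) >= min_qscore
```

Averaging in probability space penalizes reads that mix a few very poor bases with good ones: a two-base read with Q20 and Q2 has an arithmetic mean of Q11 but a probability-space mean below Q5, so it would be rejected at a threshold of 7.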

Taxonomic classification of reads

Taxonomic classification of the cleaned reads from the NPS and BAL pools was performed via Kraken2 (v2.1.2) with the PlusPFP-2023 database (May 2023 release, 148 Gb) and a confidence score threshold of 0.2. Abundance estimation was subsequently conducted with Bracken (v2.8.0) using a threshold (-t) of 10. Reads from the positive control were mapped to reference genomes of the expected microbial taxa (eight bacteria, two fungi, and two viruses) via Minimap2 (v2.26).
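
The effect of the confidence score threshold can be illustrated with a toy example. The sketch below follows the scoring rule described in the Kraken2 manual: a label's score is the fraction of the read's k-mers that map within the clade rooted at that label, and the label is moved up the taxonomy until the score meets the threshold. It is not Kraken2's implementation; the taxonomy, k-mer counts and function names are hypothetical.

```python
def clade_kmer_count(taxon, kmer_hits, parent):
    """Count k-mers whose assigned taxon lies in the clade rooted at `taxon`."""
    total = 0
    for hit_taxon, count in kmer_hits.items():
        node = hit_taxon
        while node is not None:
            if node == taxon:
                total += count
                break
            node = parent.get(node)
    return total


def adjust_label(label, kmer_hits, parent, total_kmers, threshold=0.2):
    """Move the label up the tree until its score reaches the threshold."""
    if total_kmers <= 0:
        raise ValueError("total_kmers must be positive")
    node = label
    while node is not None:
        if clade_kmer_count(node, kmer_hits, parent) / total_kmers >= threshold:
            return node
        node = parent.get(node)
    return None  # no ancestor meets the threshold: the read is unclassified


# Hypothetical taxonomy (child -> parent) and k-mer hits for one read.
PARENT = {"Rothia mucilaginosa": "Rothia", "Rothia": "Bacteria",
          "Bacteria": "root", "root": None}
HITS = {"Rothia mucilaginosa": 10, "Rothia": 15, "Bacteria": 5}

genus_call = adjust_label("Rothia mucilaginosa", HITS, PARENT, total_kmers=100)
```

With these toy numbers, the species-level label scores 0.10 and is rejected at a threshold of 0.2, while the genus-level clade scores 0.25 and is retained. With the default threshold of 0.0, the species-level label would be kept.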

Statistics

The comparison of the RNA extraction kits and sample input volumes was assessed on multiple parameters: post-extraction RNA yield, dsDNA library yield, read counts, median read length, and proportion of reads passing sequencing filters. Two-way analysis of variance (ANOVA) was performed, followed by multiple pairwise comparisons with Bonferroni correction, implemented in the rstatix package (v0.7.2). These analyses were conducted separately for the NPS and BAL pools. To compare the theoretical genomic composition of the positive control with its observed transcriptomic profile, centered log-ratio (CLR)-based differential abundance testing was performed and the L1 distance, defined as the sum of absolute differences in relative abundances between two profiles, was calculated via the compositions (v2.0.8) and vegan (v2.6.8) packages. Alpha diversity metrics were calculated with the vegan package (v2.6.8). Data visualization was carried out via ggplot2 (v3.5.1.9000) and ggpubr (v0.6.0).
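
As an illustration of the two compositional quantities named above, the following sketch computes an L1 distance between two relative-abundance profiles and a centered log-ratio (CLR) transform. It is a plain-Python approximation, not the R code used in the study. The pseudocount used to handle zeros is an assumption (the zero-handling strategy is not stated here), and the profiles are hypothetical toy values.

```python
import math


def l1_distance(profile_a, profile_b):
    """Sum of absolute differences in relative abundance over all taxa."""
    taxa = set(profile_a) | set(profile_b)
    return sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa)


def clr(profile, pseudocount=1e-6):
    """Centered log-ratio transform; the pseudocount for zeros is an assumption."""
    log_values = {t: math.log(v + pseudocount) for t, v in profile.items()}
    mean_log = sum(log_values.values()) / len(log_values)
    return {t: lv - mean_log for t, lv in log_values.items()}


# Hypothetical toy profiles (relative abundances summing to 1).
theoretical = {"taxon_A": 0.5, "taxon_B": 0.5}
observed = {"taxon_A": 0.8, "taxon_B": 0.1, "taxon_C": 0.1}

distance = l1_distance(theoretical, observed)  # 0.3 + 0.4 + 0.1
clr_observed = clr(observed)
```

For relative-abundance profiles that each sum to 1, the L1 distance ranges from 0 (identical profiles) to 2 (no shared taxa), which makes it easy to compare extraction conditions against the same theoretical composition.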

Results

Study design

On the basis of the 15 BAL samples and the 16 NPS samples, we defined four BAL pools and four NPS pools for analysis, along with a positive control of known composition. RNA was extracted via two lysis protocols: chemical lysis only (CL) and chemical and mechanical lysis (CML). Each protocol was applied to two different sample volumes (200 µL and 400 µL) with the assumption that increasing the input volume might increase the amount of recovered biological material. All extraction conditions were performed in duplicate. Each sequencing run focused on a single pool of respiratory samples (either BAL or NPS), with a total of eight runs conducted (four BAL pools and four NPS pools). The positive control was processed in a separate independent run. The organization of the sequencing runs and extraction conditions are detailed in Fig. 1.

Fig. 1

Methods global overview. Global workflow illustrating the wet-lab methods of the study, from samples (NPS and BAL pools, positive control) to RNA extraction conditions (all treated in duplicate), including two sample input volumes and two cell lysis methods (chemical and mechanical lysis, CML, or chemical lysis only, CL), followed by whole-transcriptome amplification, library preparation and Nanopore sequencing
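
The factorial design described above can be enumerated in a few lines. This sketch assumes that all extraction conditions of a given pool were barcoded into the same sequencing run, as the text implies. The pool labels follow the naming used in the Results (BAL1 to BAL4, NPS1 to NPS4), and the variable and function names are hypothetical.

```python
from itertools import product

POOLS = [f"BAL{i}" for i in range(1, 5)] + [f"NPS{i}" for i in range(1, 5)]
PROTOCOLS = ("CL", "CML")   # chemical lysis only vs chemical and mechanical lysis
VOLUMES_UL = (200, 400)     # sample input volumes in microliters
REPLICATES = (1, 2)         # each extraction performed in duplicate


def libraries_for(pool):
    """List every extraction condition (one library each) for a given pool."""
    return [
        {"pool": pool, "protocol": protocol, "volume_ul": volume, "replicate": rep}
        for protocol, volume, rep in product(PROTOCOLS, VOLUMES_UL, REPLICATES)
    ]


# One sequencing run per pool (assumption stated above).
runs = {pool: libraries_for(pool) for pool in POOLS}
total_libraries = sum(len(libs) for libs in runs.values())
```

Under this assumption, each of the eight runs carries eight barcoded libraries (2 protocols × 2 volumes × 2 replicates), for 64 respiratory libraries in total, with the positive control sequenced separately.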

The CML protocol provides a higher dsDNA library yield in respiratory samples

The RNA concentrations were significantly higher following extraction via the CL protocol than via the CML protocol (p < 0.0001), for both the four BAL pools and the four NPS pools (Fig. 2A and B, Supplementary Table 1). The input sample volume had no significant effect on the RNA yield in the BAL pools, regardless of the extraction method (p = 0.244). However, for the NPS pools, increasing the input volume from 200 µL to 400 µL in the CL protocol significantly improved the RNA concentration (p < 0.0001) (Fig. 2B).

Fig. 2

Nucleic acid concentrations. RNA concentration after extraction for (A) BAL and (B) NPS pools, dsDNA library concentration for (C) BAL and (D) NPS pools. Data are represented as boxplots with jittered points corresponding to individual samples. The dashed green line indicates the detection threshold of the quantification assays. Significant differences between groups were assessed using a two-way ANOVA (kit × volume) with post-hoc comparisons adjusted by the Bonferroni method. The displayed p-values (indicated by ns or ****) represent these post-hoc comparisons (ns p > 0.05 and **** p ≤ 0.0001). All samples were treated in duplicate

In contrast, the concentration of dsDNA library was significantly higher when CML was used than when CL was used, for both BAL (p < 0.0001, Fig. 2C) and NPS samples (p < 0.0001, Fig. 2D). Finally, increasing the input sample volume from 200 µL to 400 µL did not significantly improve the dsDNA library yield.

The CML protocol provides a higher sequencing yield without altering read length or quality in respiratory samples

Read quality—assessed by the proportion of reads passing sequencing filters—was not significantly impacted by the lysis protocol, regardless of sample type or input volume (Fig. 3A and B, Supplementary Table 1).

Fig. 3

Quality metrics evaluation from sequencing results. Boxplots illustrating reads passing sequencing filters for (A) BAL and (B) NPS pools, read counts for (C) BAL and (D) NPS pools and median read length (in base pairs) for (E) BAL and (F) NPS. Significant differences between groups were assessed using a two-way ANOVA (kit × volume) with post-hoc comparisons adjusted by the Bonferroni method. The displayed p-values (indicated by ns or ****) represent these post-hoc comparisons (ns p > 0.05 and **** p ≤ 0.0001). All samples were treated in duplicate

Consistent with the dsDNA library concentrations prior to library preparation, the read counts were significantly higher for the BAL samples treated with CML than for those treated with CL (p < 0.0001, Fig. 3C). Increasing the sample volume to 400 µL for RNA extraction did not result in a significant increase in read counts for either CL (p = 0.955) or CML (p = 0.086). Similar results were observed for the NPS pools (Fig. 3D), with significantly higher read counts obtained via CML (p < 0.0001) but no significant improvement with a 400 µL input volume (p = 0.450 for CL; p = 0.484 for CML).

The read length was not affected by the lysis method or sample volume, as shown by the median read length, for either the BAL pool or the NPS pool (Fig. 3E and F, respectively).

The CML protocol provides a closer taxonomic profile of the positive control

The taxonomic profile of positive control reads revealed that gram-positive bacteria such as Bacillus subtilis, Listeria monocytogenes, Staphylococcus aureus and Enterococcus faecalis were underrepresented when CL was used (p < 0.001) compared with the theoretical genomic composition (Fig. 4, Supplementary Table 2). The detection rates of gram-negative Salmonella enterica (p = 0.005), Escherichia coli (p < 0.001) and Pseudomonas aeruginosa (p < 0.001), as well as those of the two fungi (p < 0.001 for Cryptococcus neoformans; p < 0.001 for Saccharomyces cerevisiae), also decreased when CL was used. Owing to the underrepresentation of these bacteria and fungi, gram-positive Limosilactobacillus fermentum as well as SARS-CoV-2 and HIV were overrepresented.

Fig. 4

Taxonomic assignation of metatranscriptomics reads from the positive control. Relative abundance of each expected Species, depending on the RNA extraction condition tested, compared to the theoretical genomic composition of the positive control. All samples were treated in duplicate

With CML extraction, the detection of gram-negative Salmonella enterica was further impaired (p < 0.001). For the other species, detection either improved (gram-positive Enterococcus faecalis, p = 0.039, and Listeria monocytogenes, p = 0.018; gram-negative Escherichia coli, p = 0.010; the fungi Cryptococcus neoformans, p < 0.001, and Saccharomyces cerevisiae, p = 0.004) or no longer differed significantly from the theoretical composition (gram-positive Bacillus subtilis and Staphylococcus aureus, gram-negative Pseudomonas aeruginosa). Gram-positive Limosilactobacillus fermentum as well as SARS-CoV-2 and HIV were still overrepresented.

The L1 distance was calculated between each observed taxonomic profile and the theoretical genomic composition (Supplementary Fig. 1). A significantly higher L1 distance was observed when the positive control was processed with CL compared to CML (p < 0.001). However, no significant difference in L1 distance was observed depending on the sample volume used for RNA extraction.

The CML protocol optimizes the respiratory microbiome study

Taxonomic profiling of reads from BAL pools revealed that host reads were predominant (Fig. 5A, 72.14% mean value), without a significant difference depending on the extraction protocol (p = 0.35). Host, bacterial and fungal reads were found in these pools. When abundant, fungi were detected via both lysis protocols (Fig. 5A, BAL1 pool, mean value of 10.30%), whereas smaller proportions of fungal reads were detected only via CML (0.10% and 0.11% mean values in the BAL2 and BAL3 pools, respectively). In the NPS pools, only host (Fig. 5B, 14.22% mean value) and bacterial reads were found. The host read proportion was not significantly affected by the extraction method (p = 0.65).

Fig. 5

Taxonomic assignation of host and microbiome reads from respiratory samples. Relative abundance of host (red), bacteria (blue), fungi (green) and viruses (yellow) associated reads for both (A) BAL pools and (B) NPS pools. All samples were treated in duplicate

Taxonomic profiling at the genus level revealed diverse compositions in the BAL pools (Fig. 6A). The most abundant genera included Malassezia, a fungus, and Rothia, a gram-positive bacterium [12], both of which were present in notable proportions in BAL samples. Pseudomonas, Klebsiella, Stenotrophomonas, and Streptococcus [3, 13] were also among the dominant taxa, with varying abundances across pools. No significant differences in the relative abundances of these major genera were observed between extraction protocols or sample volumes. However, Malassezia was exclusively detected with the CML protocol in BAL2 and BAL3. Similarly, Candida was only identified with the CML protocol across all pools.

Fig. 6

Taxonomic assignation of microbiome reads from respiratory samples. Relative abundance of microbiome specific reads (bacteria, fungi and viruses) at genus level for both (A) BAL and (B) NPS pools. All samples were treated in duplicate

The NPS pools presented a consistent taxonomic profile (Fig. 6B), with predominant genera, including Flavobacterium, Phyllobacterium, Empedobacter, Brevundimonas, and Hyphomonas. Staphylococcus and Streptococcus were also detected, particularly in the NPS2 pool.

While the RNA extraction conditions did not significantly impact the overall genus-level compositions, as indicated by the Shannon alpha diversity index results (Supplementary Fig. 2), some taxa were selectively identified with the CML protocol. For example, Corynebacterium was detected in small proportions (mean value of 0.58%) in NPS samples processed with CML but was absent in those treated with CL. Additionally, Dolosigranulum was identified only in NPS1 and NPS2 with the CML protocol.
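
For reference, the Shannon index mentioned above can be computed from genus-level read counts as follows. This is a minimal sketch using the natural logarithm, which is the default base of the vegan `diversity` function. The function name and example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical genus-level read counts for two extraction conditions.
even_community = shannon_index([10, 10, 10, 10])    # equals ln(4)
skewed_community = shannon_index([37, 1, 1, 1])     # dominated by one genus
```

Because the index weights taxa by their proportions, a genus detected at well under 1% of reads (as for Corynebacterium here) changes it only slightly, which is consistent with similar diversity values despite taxa being detected with only one protocol.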

Discussion

Metatranscriptomic sequencing is a highly demanding yet comprehensive approach, making the optimization of every step essential, particularly for RNA extraction. In this study, we focused on respiratory samples, specifically bronchoalveolar lavage (BAL) fluid and nasopharyngeal swabs (NPS), which represent the lower and upper respiratory microbiomes, respectively. Microbial biomass is particularly low in these samples [14], which poses a major challenge for sequencing applications such as metatranscriptomics.

To overcome this limitation, we implemented a whole‐transcriptome amplification (WTA) step prior to third‐generation sequencing, since direct RNA or cDNA sequencing requires higher nucleic acid inputs. We selected a primer‐agnostic WTA method to avoid preferential amplification of any taxon—which is crucial when profiling complex microbiomes of unknown compositions—and to remain compatible with virome workflows that do not rely on targeted amplification [15]. This approach delivered an unbiased, high‐yield pre-amplification step, enabling robust library preparation without compromising the breadth of microbial detection.

Studying the microbiome also necessitates a protocol capable of extracting genetic material from bacteria, fungi, and viruses alike. Therefore, we tested two distinct kits: one specifically designed for viral RNA extraction and another more generalized kit to which we added mechanical lysis, as described by Zinter et al. [16].

Interestingly, the RNA yield with the CML protocol was consistently undetectable via fluorometry, whereas the CL protocol yielded detectable, albeit low, RNA concentrations for most pools. However, dsDNA library concentration was significantly higher for samples treated with CML than for those treated with CL. This discrepancy could be explained by the presence of enzymatic inhibitors in the CL-extracted RNA samples. Notably, the CL protocol involves the use of RNAlater as an RNA stabilization solution, whereas the CML protocol uses DNA/RNA Shield. RNAlater is known to contain high concentrations of salts, which can co-purify with RNA and interfere with enzymatic reactions that are sensitive to inhibitors such as chaotropic salts or phenolic compounds [17–19]. The low biomass and very small volumes throughout the library preparation workflow did not allow for intermediate quality controls, preventing us from identifying which enzymatic steps were the most affected.

Consistent with the dsDNA library yields, the sequencing results revealed significantly higher read counts for the CML protocol than for the CL protocol. These observations suggest that fluorometric RNA quantification may not always be a reliable predictor of sequencing yield in this context. We also found that using a larger input volume (400 µL of sample instead of 200 µL) did not significantly increase the RNA, dsDNA library or sequencing yield with either the CML or the CL extraction kit on low-biomass samples.

In this study, we employed long-read sequencing to increase the taxonomic resolution and enable the reconstruction of full-length gene sequences, which are key aspects for studying complex microbial communities. The CML protocol significantly increased sequencing yields, but a potential drawback of this method is the introduction of temperature fluctuations and physical stress, which can fragment RNA molecules. Thus, it was essential to evaluate whether CML affected read length, potentially introducing biases in downstream analyses. However, we observed no significant difference in sequence length between CML and CL for either the BAL or NPS pools, confirming that CML is compatible with long-read sequencing of low-biomass samples.

The CML protocol uses tube-based bead-beating (e.g., FastPrep) for efficient and broad-spectrum lysis of microorganisms, especially in low-biomass respiratory samples. However, the enhancement in microbial detection comes with some limitations: it requires specialized equipment and, unlike the downstream RNA purification steps, it cannot be adapted to high-throughput microplate formats. To our knowledge, mechanical lysis requires higher volumes to ensure uniform bead agitation and thus remains a tube-based step. Since mechanical lysis enhances RNA recovery from fungi and gram-positive bacteria (organisms known for their resistant cell walls), the higher read counts observed in respiratory sample pools processed with CML may have been due to improved recovery of these taxa. The analysis of the positive control supported this hypothesis, as it revealed increased detection of gram-positive bacterial and fungal species. However, when the taxonomic compositions of the respiratory samples were compared, no significant differences were detected between the CML and CL protocols. This finding suggests that CML enhanced the sequencing of difficult-to-lyse microorganisms as well as those that were more easily lysed, such as gram-negative bacteria, in the low-biomass samples analyzed in this study.

Furthermore, in lower respiratory tract samples such as BAL fluid, host RNA often dominates the sequencing data because of the abundance of host cells collected during sampling. This can lead to a significant proportion of sequencing reads being unusable for microbiome analysis. Despite its advantages in improving total read recovery, CML did not significantly increase the proportion of microbial reads relative to host reads in BAL or NPS. Taxonomic profiles remained globally similar across input volumes within each protocol, which confirmed that doubling sample input is not justified, particularly when working with precious samples.

With respect to microbiome composition, taxonomic analysis of sequencing results is a critical step, as the choice of software, database, and parameters significantly impacts the outcomes. Nanopore long reads exhibit a higher error rate than Illumina reads (although this rate continues to decrease) but may provide superior resolution by covering full-length transcripts and complex genomic regions. The positive control reads were investigated by mapping them to the genome sequences of the expected taxa. The observed transcriptomic composition deviated substantially from the expected genomic composition, which can be at least partially due to differences in transcriptional activity between taxa. This highlights the challenges of directly comparing genomic and transcriptomic data. Nevertheless, the relative taxonomic abundances derived from metatranscriptomic reads remain valuable for intra-sample comparisons of extraction efficiency. Interestingly, while both protocols produced transcriptomic profiles that deviated from the theoretical genomic composition, the profile was closer to it when the CML protocol was used. Moreover, both protocols were able to detect RNA viruses, key players in respiratory disease pathology, in the positive control, thus fulfilling another crucial criterion for the choice of extraction method.

For the respiratory sample pools, taxonomic assignation was performed by using Kraken2 to compare the reads to the PlusPFP-2023 database. PlusPFP-2023 expands beyond the bacterial and viral references in the standard Kraken2 database to include fungal genomes, an essential feature for accurately characterizing respiratory microbiomes that often harbor taxa such as Malassezia and Candida. To minimize false-positive taxonomic assignments without compromising sensitivity, we used a Kraken2 confidence-score threshold of 0.2 (more stringent than the default 0.0 that is often used with Illumina reads). To our knowledge, no clear threshold is defined regarding the minimum number of microbial reads required for reliable taxonomic assignment, especially in metatranscriptomics, where read yield heavily depends on sample type and biomass. We observed that CML yielded significantly higher read counts (total and microbial), but despite the low number of microbial reads obtained with CL, the taxonomic profiles generated by the two RNA extraction protocols were mostly similar.

In the BAL pools, both protocols detected bacterial genera such as Rothia, Pseudomonas, Klebsiella, Stenotrophomonas and Streptococcus, which are recognized members of the respiratory microbiota [3, 12, 13]. Similarly, in NPS pools, both protocols successfully identified relevant microbial taxa, including Flavobacterium, Phyllobacterium, Empedobacter, Brevundimonas and Hyphomonas [20–24]. However, subtle yet biologically relevant differences were observed between protocols. Notably, Dolosigranulum and Corynebacterium, gram-positive bacteria commonly found in the respiratory tract [25], were readily identified in NPS samples with CML, while they were less frequently detected or even entirely absent with CL, despite their importance in the respiratory microbiome. Similarly, fungal taxa such as Malassezia and Candida, both of which are frequently identified in the respiratory tract [26, 27], were more effectively recovered with the CML protocol, a crucial consideration given their relevance as respiratory microbiome members.

Conclusions

Our study demonstrated that chemical and mechanical lysis (CML) achieved adequate sequencing yields for both NPS and BAL samples without significantly compromising data quality. While CML is well suited for extracting viral RNA/DNA and genetic material from hard-to-lyse bacteria and fungi, we found that it also yielded more metatranscriptomic reads from gram-negative bacteria and viruses than the CL protocol did. This finding supports its suitability for broad-spectrum applications in respiratory microbiome research.

Supplementary Information

Supplementary Material 1: Supplementary Methods (12864_2025_12028_MOESM1_ESM.docx, 34.7 KB).

Supplementary Material 2: Supplementary Table 1 (12864_2025_12028_MOESM2_ESM.docx, 166.7 KB).

Supplementary Material 3: Supplementary Table 2 (12864_2025_12028_MOESM3_ESM.docx, 121.8 KB).

Supplementary Material 4: Supplementary Figure 1 (12864_2025_12028_MOESM4_ESM.docx, 46.2 KB).

Supplementary Material 5: Supplementary Figure 2 (12864_2025_12028_MOESM5_ESM.docx, 92.3 KB).

Abbreviations

BAL: Bronchoalveolar lavage
CF: Cystic fibrosis
CL: Chemical lysis
CML: Chemical and mechanical lysis
COPD: Chronic obstructive pulmonary disease
dsDNA: Double-stranded DNA
HIV: Human immunodeficiency virus
NPS: Nasopharyngeal swab
PC: Positive control
RSV: Respiratory syncytial virus
RV: Rhinovirus
SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2
ssDNA: Single-stranded DNA

Authors’ contributions

A.M, M.L, M.AG, C.H and JC.P designed the experiments. A.M wrote the original draft. A.M performed the experiments. A.M, M.L and N.N conducted bioinformatic analysis. A.M and C.LCC conducted statistical analysis. M.L, N.N, H.P, M.AG, C.M, C.H and JC.P reviewed and edited the paper. All authors have read and approved the final manuscript.

Funding

The PhD program of Alice Michel was funded at 50% by “Région Normandie” and at 50% by “Université de Rouen Normandie” in collaboration with the “Fédération Hospitalo-Universitaire RESPIRE”. This study was co‐supported by the European Union and Région Normandie in the context of “Contrats de plan État-Région” entitled “GEMM 2023: Génétique des Maladies et des Microbes”.

Data availability

The metatranscriptomic sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA1242689. All data are publicly available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1242689.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Hilty M, Burke C, Pedro H, Cardenas P, Bush A, Bossley C, et al. Disordered microbial communities in asthmatic airways. Neyrolles O, editor. PLoS One. 2010;5(1):e8578.
2. Charlson ES, Bittinger K, Haas AR, Fitzgerald AS, Frank I, Yadav A, et al. Topographical continuity of bacterial populations in the healthy human respiratory tract. Am J Respir Crit Care Med. 2011;184(8):957–63.
3. Li R, Li J, Zhou X. Lung microbiome: new insights into the pathogenesis of respiratory diseases. Sig Transduct Target Ther. 2024;9(1):1–27.
4. Budden KF, Shukla SD, Rehman SF, Bowerman KL, Keely S, Hugenholtz P, et al. Functional effects of the microbiota in chronic respiratory disease. Lancet Respir Med. 2019;7(10):907–20.
5. Pindling S, Klugman M, Lan Q, Hosgood HD. Narrative review: respiratory tract microbiome and never smoking lung cancer. J Thorac Dis. 2023;15(8):4522–9.
6. Liu YX, Qin Y, Chen T, Lu M, Qian X, Guo X, et al. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein Cell. 2021;12(5):315–30.
7. Sun Y, Liu Y, Li J, Tan Y, An T, Zhuo M, et al. Characterization of lung and oral microbiomes in lung cancer patients using culturomics and 16S rRNA gene sequencing. Microbiol Spectr. 2023;11(3):e0031423.
8. Teo SM, Tang HHF, Mok D, Judd LM, Watts SC, Pham K, et al. Airway microbiota dynamics uncover a critical window for interplay of pathogenic bacteria and allergy in childhood respiratory disease. Cell Host Microbe. 2018;24(3):341-352.e5.
9. Lai S, Yan Y, Pu Y, Lin S, Qiu JG, Jiang BH, et al. Enterotypes of the human gut mycobiome. Microbiome. 2023;11:179.
10. Barber DG, Davies CA, Hartley IP, Tennant RK. Evaluation of commercial RNA extraction kits for long-read metatranscriptomics in soil. Microb Genom. 2024;10(9):001298.
11. Laver T, Harrison J, O’Neill PA, Moore K, Farbos A, Paszkiewicz K, et al. Assessing the performance of the Oxford Nanopore Technologies MinION. Biomol Detect Quantif. 2015;3:1–8.
12. Rigauts C, Aizawa J, Taylor SL, Rogers GB, Govaerts M, Cos P, et al. Rothia mucilaginosa is an anti-inflammatory bacterium in the respiratory tract of patients with chronic lung disease. Eur Respir J. 2022;59(5):2101293.
13. Bowron LA, Acosta N, Thornton CS, Carpentero J, Waddell BJM, Bharadwaj L, et al. The airway microbiome of persons with cystic fibrosis correlates with acquisition and microbiological outcomes of incident Stenotrophomonas maltophilia infection. Front Microbiol. 2024;15. Available from: https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2024.1353145/full. Cited 12 Dec 2024.
14. Man WH, de Steenhuijsen Piters WAA, Bogaert D. The microbiota of the respiratory tract: gatekeeper to respiratory health. Nat Rev Microbiol. 2017;15(5):259–70.
15. Reteng P, Nguyen Thuy L, Rahman M, de Bispo Filippis AM, Hayashida K, Sugi T, et al. Circular Whole-Transcriptome Amplification (cWTA) and mNGS Screening Enhanced by a Group Testing Algorithm (mEGA) Enable High-Throughput and Comprehensive Virus Identification. mSphere. 2022;7(5):e0033222.
16. Zinter MS. The pulmonary metatranscriptome prior to pediatric HCT identifies post-HCT lung injury. 2020. p. 11.
17. MalagodaPathiranage K, Cavac E, Chen TH, Roy B, Martin CT. High-salt transcription from enzymatically gapped promoters nets higher yields and purity of transcribed RNAs. Nucleic Acids Res. 2023;51(6):e36.
18. Trivedi CB, Keuschnig C, Larose C, Rissi DV, Mourot R, Bradley JA, et al. DNA/RNA preservation in glacial snow and ice samples. Front Microbiol. 2022;13:894893.
19. Saito MA, Bulygin VV, Moran DM, Taylor C, Scholin C. Examination of microbial proteome preservation techniques applicable to autonomous environmental sample collection. Front Microbiol. 2011;2:215.
20. Sze MA, Dimitriu PA, Suzuki M, McDonough JE, Campbell JD, Brothers JF, et al. Host response to the lung microbiome in chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2015;192(4):438–45.
21. Flynn M, Lyall Z, Shepherd G, Lee ONY, Da Marianna Fonseca I, Dong Y, et al. Interactions of the bacteriome, virome, and immune system in the nose. FEMS Microbes. 2022;3:xtac020.
22. Martinez V, Matabang MA, Miller D, Aggarwal R, LaFortune A. First case report on Empedobacter falsenii bacteremia. IDCases. 2023;33:e01814.
23. Ryan MP, Pembroke JT. Brevundimonas spp: Emerging global opportunistic pathogens. Virulence. 2018;9(1):480–93.
24. Johnson J, Jain KR, Patel A, Parmar N, Joshi C, Madamwar D. Chronic industrial perturbation and seasonal change induces shift in the bacterial community from gammaproteobacteria to betaproteobacteria having catabolic potential for aromatic compounds at Amlakhadi canal. World J Microbiol Biotechnol. 2023;40(2):52.
25. Drigot ZG, Clark SE. Insights into the role of the respiratory tract microbiome in defense against bacterial pneumonia. Curr Opin Microbiol. 2024;77:102428.
26. Esteban V, Gilabert P, Ferrer C, Gálvez B, Chiner E, Colom MF. Affinity of Malassezia and other yeasts for pulmonary lipids. Mycopathologia. 2025;190(1):1.
27. Krause R, Halwachs B, Thallinger GG, Klymiuk I, Gorkiewicz G, Hoenigl M, et al. Characterisation of Candida within the mycobiome/microbiome of the lower respiratory tract of ICU patients. PLoS One. 2016;11(5):e0155033.


Articles from BMC Genomics are provided here courtesy of BMC
