ABSTRACT
The advent of high-throughput sequencing techniques has recently provided an astonishing insight into the composition and function of the human microbiome. Next-generation sequencing (NGS) has become the gold standard for advanced microbiome analysis; however, 3rd generation real-time sequencing, such as Oxford Nanopore Technologies (ONT), enables rapid sequencing from several kilobases to >2 Mb with high resolution. Despite the wide availability and the enormous potential for clinical and translational applications, ONT is poorly standardized in terms of sampling and storage conditions, DNA extraction, library creation, and bioinformatic classification. Here, we present a comprehensive analysis pipeline with sampling, storage, DNA extraction, library preparation, and bioinformatic evaluation for complex microbiomes sequenced with ONT. Our findings from buccal and rectal swabs and DNA extraction experiments indicate that methods that were approved for NGS microbiome analysis cannot be simply adapted to ONT. We recommend using swabs and DNA extractions protocols with extended washing steps. Both 16S rRNA and metagenomic sequencing achieved reliable and reproducible results. Our benchmarking experiments reveal thresholds for analysis parameters that achieved excellent precision, recall, and area under the precision recall values and is superior to existing classifiers (Kraken2, Kaiju, and MetaMaps). Hence, our workflow provides an experimental and bioinformatic pipeline to perform a highly accurate analysis of complex microbial structures from buccal and rectal swabs.
IMPORTANCE Advanced microbiome analysis relies on sequencing of short DNA fragments from microorganisms like bacteria, fungi, and viruses. More recently, long fragment DNA sequencing of 3rd generation sequencing has gained increasing importance and can be rapidly conducted within a few hours due to its potential real-time sequencing. However, the analysis and correct identification of the microbiome relies on a multitude of factors, such as the method of sampling, DNA extraction, sequencing, and bioinformatic analysis. Scientists have used different protocols in the past that do not allow us to compare results across different studies and research fields. Here, we provide a comprehensive workflow from DNA extraction, sequencing, and bioinformatic workflow that allows rapid and accurate analysis of human buccal and rectal swabs with reproducible protocols. This workflow can be readily applied by many scientists from various research fields that aim to use long-fragment microbiome sequencing.
KEYWORDS: 16S rRNA, bioinformatic workflow, buccal swab, DNA extraction, eNAT, eSwab, Kaiju, Kraken2, metagenomics, MetaMaps, Metapont, microbiome, Oxford Nanopore Technologies, ONT, rectal swab, sampling, sequencing, storage
INTRODUCTION
The advent of high-throughput sequencing techniques has recently provided an astonishing insight into the composition and function of the human microbiome. Disruption of the commensal microbiome, commonly referred to as dysbiosis, is linked to several diseases, such as obesity, diabetes, chronic inflammatory disorders, and cancer (1–6). Next-generation sequencing (NGS) contributed vastly to the accumulating evidence in microbiome research of the last decade and has transformed the microbial landscape by identifying an enormous quantity of unculturable microbes (7, 8). Due to its wide availability and decreasing sequencing costs, NGS has become the gold standard for advanced microbiome analysis (9). However, there are inherent limitations of this method. The length of reads does not exceed 300 bp in the case of Illumina MiSeq. Although long-fragment sequencing is technically possible, it is not widely used. Furthermore, it is limited to 10 kb, and the capability for microbiome analysis could not yet be shown (10). Thus, the resolution of marker gene sequencing for bacteria, 16S rRNA, only provides reliable information up to the genus level, whereas targeting of particular hypervariable regions (V1 to V9) influences the microbial composition (9, 11, 12). To gain insight into species and strain compositions by NGS, metagenomic sequencing is required. However, this approach is more expensive and difficult to conduct with low-biomass environments and/or samples with high host contamination, e.g., tissue samples (9). In these cases, microbiome profiling must rely on marker gene sequencing. The limitation of insufficient species resolution by 16S rRNA sequencing can be overcome by analyzing the whole 16S rRNA gene (13, 14). The 3rd generation of sequencing subsumes methods with two distinct benefits: long reads and the possibility of real-time sequencing. Oxford Nanopore Technologies (ONT) enable the sequencing from several kilobases to >2 Mb (15, 16). Thus, it becomes possible to assemble whole genomes of bacteria and eukaryotes or even complex repetitive parts of the human genome using a few or even one contig (15–18). Several groups showed the feasibility of ONT sequencing for microbiome analysis with a diverse set of samples, i.e., mock communities (14, 19–21), low-biomass microbial compositions such as dog skin or dust (14, 22, 23), antimicrobial resistances (24), and infective pathogens from clinical samples (25, 26).
To the best of our knowledge, few studies have focused on analyzing a complex microbiome structure with ONT (27). Hence, the impact of sampling and storage conditions, DNA extraction, bioinformatic classification, and library creation on the analysis of more advanced biological specimens have not been investigated in depth. In contrast to most previous clinical studies that used stool samples, we analyzed buccal and rectal swabs. Notably, stool samples have certain limitations for clinical studies. First, longitudinal studies are increasingly being performed (9), and stool may not be immediately available in the desired time window, e.g., in the outpatient department. Second, there are cohorts of patients from which a standardized acquisition of stool is not easily feasible, e.g., infants or incontinent geriatric patients. Furthermore, the different niches of gut microbes must be taken into account (28, 29). Previous studies highlighted a taxonomic difference between biopsy specimen, rectal swabs, and stool samples. To this end, 16S rRNA and metagenomic sequencing studies revealed that swabs harbor a microbiome comparable to that of stool samples but also contain bacteria that can be found more frequently in biopsy specimens than in stool samples (28, 30–32). Therefore, rectal swabs are considered to give insight into an intestinal niche that is located between the lumen (stool) and the gut mucosa (33). Importantly, bacteria adjacent to the mucosa are crucial for the interaction with the gut immune system (34). However, the required bowel preparation prior to colonoscopy and the invasiveness of the procedure are limitations. Hence, rectal swabs have become a convenient alternative, providing more information about mucosal adherent microbes than stool samples (33). Despite intraindividual differences regarding the collection method, previous studies give evidence of a preserved microbial signature enabling the differentiation between individuals (31, 35, 36). There is growing evidence that oral microbiota can be used as biomarkers for several diseases (1, 37, 38). Although different microbial niches of the oral cavity were characterized recently (39, 40), there is still a significant lack of knowledge about the optimal sampling conditions. In past studies, patients were requested to refrain from food, drinks, and oral hygiene between 30 min and several hours (41–43). To our knowledge, no conclusive data exist clarifying which sampling point is ideal and what transient impact food and drinking have on the oral microbiome.
In particular, the bioinformatics remain challenging for ONT due to the combination of the high base call error rate of ∼10% and long reads (44). Previous studies provided bioinformatics workflows (20, 25). However, these wrapper programs were designed for the identification of single infectious microorganisms. Here, we present a customizable workflow algorithm that can run on different systems due to its convenient docker format (Github; https://github.com/microbiome-gastro-UMG/MeTaPONT/). This program was able to reduce the false-positive rate more sufficiently than classifiers designed for NGS.
To address these caveats, we conducted comprehensive experiments establishing a reliable wet-bench and bioinformatic workflow for both 16S rRNA and metagenomic sequencing, including detailed protocols that are easily adaptable for other researchers.
RESULTS
Swab reliability and storage conditions.
Swabs are commonly used for analyzing different sites of human microbiota, and their practicability is widely proven (35, 39, 40). We hypothesized that the choice of the swab and its medium will have an impact on microbiome analysis.
eNAT and eSwab are known to preserve microbes reliably, and their feasibility for microbiome analysis was recently shown (45–47). For reasons of comparability, one stool batch from a volunteer was homogenized. The swabs were dipped in stool, and a small portion was directly transferred to lysis buffer (from the MagMAX microbiome ultra nucleic acid isolation kit). The samples were immediately extracted (day 0 [d0]) or stored for 3 or 7 days under different temperature conditions (Fig. 1a). Beta diversity showed a clustering of most storage conditions (Fig. 1b). The microbial composition at the species level revealed a wide overlap between the majority of samples (Fig. 1c). Two outliers significantly explained ∼80% of the observed differences. These two samples were eSwabs stored at room temperature (RT) for 3 and 7 days. Facultative anaerobic genera, like Escherichia coli, Citrobacter freundii, Enterobacter cloacae, and Klebsiella aerogenes, were especially increased. Thus, we conclude that eSwabs must be frozen immediately after sampling. eNATs seem to preserve the microbial composition even if samples were stored at room temperature for 7 days. Despite the same procedure, eNAT swabs showed reduced DNA content after extraction and failed 16S PCR in some instances. A systematic comparison of DNA concentration revealed a significantly higher concentration in both specimens collected by eSwabs. The measured purity was also in parts significantly higher in eSwabs compared to eNATs (Fig. 1d). Therefore, eSwabs with immediate freezing were used for subsequent experiments.
The quantity of stool has an impact on microbial composition.
Different microbes colonize the gut at different sites (28). Following this paradigm, it could be expected that rectal swabs with higher quantities of stool may harbor a different microbial composition. To this end, we defined three grades of stool contamination before allocating 12 rectal swabs from 2 independent people (grade 0, 3; grade +, 4; grade ++, 5) (Fig. 2a). Regarding the amount of microbial DNA compared to human DNA, there was a significant increase of microbial genes in the samples with a high quantity of stool (Fig. 2b). Despite low microbial DNA content in clean rectal swabs, metagenomic sequencing yielded a sufficient sequencing depth, with more than 7,200 microbial reads/sample, allowing profound microbial analysis. Interestingly, the microbial composition at the species level revealed, among others, a higher portion of Corynebacterium species, Corynebacterium jekeium, and Akkermansia muciniphila in swabs defined as grade 0 (Fig. 2c). Accordingly, both genera (Corynebacterium and Akkermansia) were more frequently found adherent to the mucosa than in stool samples (32, 48). Besides these minor differences, all swabs contained highly similar microbiota. All three mentioned species were also identified in more feculent swab samples with lower abundance (<2%) and were categorized in the green bar (other) (see Table S1 in the supplemental material). To further investigate whether highly feculent swabs are more similar to stool than to grade 0 swabs, 3 stool samples from consecutive days and 2 swabs within 7 h after bowel movement were collected from one individual. Indeed, the quantity of stool contamination has only a minor impact on microbial composition if swabs of different contaminations were compared to stool samples (Fig. S1).
Impact of eating and drinking on the buccal microbiome.
Due to thermal and mechanical stimuli, it can be assumed that eating and drinking provoke a temporal alteration of the oral microbiome. Therefore, we aimed to investigate the impact of eating and drinking on the buccal microbiome (Fig. 3a). Unweighted UniFrac distance discovered a significant modification (R2 = 0.31) of the buccal microbiome 5 and 30 min after eating (persons 1, 2, and 4) (Fig. 3b). Drinking water did not significantly affect the buccal bacterial composition. Importantly, despite the transient shift, the intraindividual microbial signature seemed to persist due to consistent personal clusters.
Impact of DNA extraction protocol on DNA concentration, sequenced species, and read count for 16S rRNA and metagenomic sequencing.
There are several reports about the impact of DNA extraction methods on microbiome analysis (11, 46, 49–51). In order to standardize the microbiome analysis, a protocol by International Human Microbiome Standards (IHMS) is recommended for fecal samples (52). Therefore, we tested four different DNA extraction kits, of which three had shown reliable results for microbiome analysis from swabs (46) (Text S1). Two protocols achieved a significantly higher DNA concentration (measured by Qubit) in both buccal and rectal swabs compared to the other isolation methods. Invitrogen (IHMS) and Qiagen investigator kits (original) reproducibly yielded concentrations above the required threshold (>10 ng/μl, eluted in 50 μl) recommended for ONT metagenomic sequencing (dashed line) (Fig. 4a).
Despite some minor variability in A260/A280 ratio (Fig. S2a), most samples exceeded the required 1.8 value. However, remarkable divergences were observed regarding the NanoDrop A260/A230 ratio. Only the samples extracted by the Invitrogen (IHMS) protocol had values around the recommended A260/A230 ratio of 2 (Fig. S2b). To examine the reliability of extracted DNA methods on 16S rRNA and metagenomic sequencing approaches, at least four samples per protocol were sequenced. Using the rarefaction curve, a minimum required sequencing depth can be defined, where most samples reached a saturation. For 16S rRNA sequencing, a throughput of 250,000 sequences per sample was determined (Fig. 4b). The average alpha diversity did not show significant variations between the DNA isolation methods (Fig. 4c). However, most samples extracted by the Qiagen microbiome kit remained under this threshold of 250,000, and a tendency toward a lower alpha diversity was seen. Using unweighted UniFrac distance, which is recommended for rarefied samples (53), no DNA isolation protocol clustered significantly in both sample sites (Fig. S3).
Similar results were obtained by analyzing buccal and rectal swabs by metagenomics. Here, all samples isolated with the Qiagen microbiome failed to reach the threshold of 250,000 reads, and no differences regarding the alpha diversity were observed (Fig. 4d and e). Interestingly, there was a tendency where medians of samples isolated by IHMS protocols were higher regarding alpha diversity in metagenomic sequencing than their original counterparts. This might be explained by different read counts (sequencing depth). To prove whether the DNA extraction protocol has an impact on sequencing depth, five metagenomics experiments (10 to 12 samples per flow cell) were combined. Here, all samples were pooled equimolarly before flow cell loading. To avoid biases introduced by different flow cell pore counts, Qiagen IHMS was arbitrarily used as a reference. All read counts of the other methods were normalized to this protocol. Despite highly similar DNA input, a variety of sequencing depths per protocol was observed. Applied Biosystems and Qiagen investigator IHMS protocols tended to yield more sequences than the original protocols, whereas both Invitrogen protocols tended to yield the most profound throughput (Fig. 4f). Regarding the base call qscore, the protocols did not differ significantly and ranged between 11 and 12 (Fig. S4a and b). DNA extraction methods also had an impact on the average sequence length. Interestingly, Applied Biosystem and Qiagen investigator kits with the original protocols produced the longest fragments (Fig. S4c).
To summarize, we observed no biologically relevant differences regarding alpha and beta diversity and qscore from the base call. However, we found significant differences between extraction protocols regarding DNA concentration measured by Qubit and purity measured by NanoDrop, which has important implications for subsequent library preparation and sequencing throughput. Consequently, we used the PureLink microbiome DNA purification kit (Invitrogen), modified according to IHMS protocols, for further experiments.
Classification of long noisy reads with Centrifuge.
The bioinformatic workflow applied for these experiments used three main programs (Fig. 5a): (i) Guppy for high-accuracy base calling; (ii) Centrifuge for classification; and (iii) Minimap2 as an alignment control (54, 55). Guppy was applied for base calling as it is currently the best-performing and the only officially supported base caller for ONT sequencing, outperforming traditional programs like Albacore, Flappie, and Scrappie (44). Centrifuge is used by the ONT-related platform Epi2me, which, among others, provides real-time 16S rRNA sequencing (14). This classifier has sensitive and fast annotation that do not require high computational capacity. Using 12 rectal swabs from the stool contamination experiment, a Venn diagram compared the read to taxonomic identification number (taxID) annotations between centrifuge alone (taking only the taxID with the highest quality score, which was the first in the row) and Centrifuge plus Minimap2 (Fig. 5b). More than one-quarter of sequence to taxID classification mismatched.
Low-quality sequences were excluded by defining cutoffs for (i) Centrifuge quality score and (ii) maximum number of annotations to a single read. Sanderson et al. postulated a threshold for a quality score of 150 (25). If this threshold was increased, a lot more sequences would be removed. With focus on the slope, defined by difference quotients (g), the increased threshold of the score did not influence the Minimap2 controlled sequences to such an extent as it did for the Centrifuge only classified reads (Fig. 5c). In other words, Minimap2 removed a high proportion of the reads, which will be excluded by a higher Centrifuge score. Therefore, the relatively low threshold of 150 proposed by Sanderson et al. was maintained.
For metagenomic sequencing, it is expected that highly conserved sequences or reads with redundant base sequences [like poly(A) fragments] will be obtained, resulting in a multitude of potential matches (for some sequences, more than 1,000 matches of different taxIDs per read were observed). To prove that these reads harbor low information and can be omitted to save computational time and to decrease the false-positive rate, we calculated a hit length/query length ratio. For most sequences, Centrifuge labeled fewer than 50 taxIDs, and the classification was based on an annotation length (hit length) of 40% of the total sequence length (query length) (Fig. 5d). The majority of sequences with more than 50 different taxID matches ranged below the first quartile of this ratio (Fig. 5e). Similar findings were seen regarding the Centrifuge quality score, which was mostly below the first quartile of all reads in the group with more than 50 matches (Fig. 5f). To investigate whether the underlying library is sufficient for microbiome studies, a mock community with 14 common gut bacteria, one archaeon, and 2 fungi was analyzed using metagenomic sequencing. We evaluated four different indices/libraries: 2 preformed indices supplied by Centrifuge (p + h + v and p) as well as one library containing all NCBI Refseq complete genomes of bacteria, fungi, archaea, virus, and human and one including all incomplete and complete genomes in the NCBI nt database. These indices were analyzed by using three parameters: precision, recall, and area under precision recall curve (AUPR) (details in Text S1). These three parameters are commonly applied across benchmarking studies (56, 57). Preformatted indices (pvh and p) contained only 11 and 13 species, respectively (Fig. 5g). Surprisingly, the comprehensive library including all NCBI Refseq complete genomes still failed to identify 3 species from the mock community: Prevotella corporis, Veillonella rogosa, and Candida albicans (Table S2). A recall of 100% and high AUPR was only achieved by the index that includes all species from the NCBI BLAST nt database (Fig. 5g). The lowest precision was calculated for the preformatted index pvh.
Evaluation of different Minimap2 parameters.
Centrifuge showed a convincing sensitivity, whereas the rate of false positives remained unacceptably high. Therefore, an alignment control by Minimap2 according to Sanderson et al. was established (25). To exclude low-quality reads and, thus, possible false-positive species, the alignment score (AS) and coverage (Cov) were further adjusted. To investigate which combination of these parameters kept all true-positive species and removed a maximum of false positives, a benchmarking was conducted by calculating precision, recall, and AUPR for each Cov and AS combination (Fig. 6a). An AS of 1,500 with a Cov of 50 obtained the highest precision and AUPR of all combinations with a recall of 100%. If the AS is increased to 2,000, the AUPR and precision values were still higher but also true-positive species were excluded.
To examine the impact of different Minimap2 coverages and alignment scores on the number of sequences and alpha diversity, 12 rectal swabs (metagenomic sequencing) and 8 (16S rRNA sequencing) samples were analyzed. As expected, more reads deriving from metagenomic sequencing were removed with an increased score and coverage (Fig. 6b). However, a higher AS influenced 16S rRNA sequences more than increased coverage. The same dynamics were observed for the alpha diversity (Fig. 6c). The precisions and AUPRs of both Cov 10 and 50 are comparable. To further investigate the impact of the coverage on microbial composition, 12 rectal swabs were benchmarked. The application of a higher Cov seems reasonable to filter out high-abundance, low-quality sequences (Fig. 6d). These reads were incorrectly annotated to environmental bacteria, i.e., Mycobacterium branderi and species C057 (Nostoc) or Escherichia coli, which are not usually observed with this high abundance or at all (black arrows in Fig. 6d).
The average read length was also affected by the coverage. Interestingly, a Cov of 80 significantly favored longer reads, whereas the average sequence length did not alter much between other AS and Cov thresholds (Fig. S5). To conclude, these benchmarking experiments and the established bioinformatic workflow that was tested on a mock community delivered highly accurate and reliable results. A diagram summarizes the optimized workflow (Fig. 7).
Comparison of classifiers with simulated data.
There exists an increasing number of classifiers with different underlying algorithms that allow profound metagenomic analysis of complex microbial structures (56, 57). However, most of them were designed for NGS sequences, whereas some programs, like MetaMaps, were particularly developed for long reads with high error rates (58). Our presented workflow, Metapont, was compared to existing programs by using four simulated data sets. Deep learning allows us to simulate ONT sequencing with long reads and a realistic error rate (59). These simulated sequencing sets differ in their compositional complexity, containing 10 to 77 taxa (bacteria, fungi, and archaea) (Table S3). To this end, the 10-taxon-set contained 8 pathogenic bacteria and 2 fungi. The gut-set was similar to the mock community used for evaluation of libraries and Minimap2 parameters. The tumor-set simulated a human tumor (diverse solid tumors) with 99.9% human DNA and 8 different bacteria. This composition was based on the Nejman et al. intratumoral microbiome study (60). The last set with 77 taxa was adapted from the Dilthey et al. MetaMaps study (58). Metapont was compared with MetaMaps, Kraken2, and Kaiju (58, 61, 62). The last classifier is based on protein sequences, whereas the other ones are based on DNA sequences. Subsequently, the precision, recall, and AUPR were calculated. To avoid biases introduced by different libraries, comprehensive indices were built for Kaiju, Kraken2, and Metapont to recognize all taxa (recall of 100%). Unfortunately, currently there is no comprehensive library available for MetaMaps that includes all complete genomes of bacteria, fungi, and archaea (63). Therefore, this program yielded lower recalls except for the simulated data set, which was adapted from the programmers’ study (58) (Fig. 8).
Notably, Metapont outperformed Kaiju and Kraken2 in all simulated sets in regard to precision, especially in the more complex microbiome sets (tumor and metamaps), whereas similar results for AUPR were yielded for these three classifiers. Given the link between the false-positive rate and the precision, these results demonstrate that the sequence of Centrifuge plus Minimap2 with adjusted AS and Cov can remove false-positive taxa most sufficiently in ONT sequencing experiments.
DISCUSSION
Here, we present a benchmarking study for establishing a comprehensive workflow from sampling over DNA extraction and sequencing to bioinformatic analysis of microbiome data for ONT sequencing of complex microbiota. Although the validity of rectal swabs was shown several times before (33, 35, 36, 64, 65), there are several pitfalls for swab sampling. We could demonstrate that stool contamination has an impact on microbial DNA proportion and microbial composition. Therefore, to obtain reliable results, collecting rectal swabs with only minor stool contamination is recommended. Another drawback of using swabs instead of stool samples is a potential lack of biomass (31). However, our experiments proved that the amount of DNA isolated from swabs was reproducibly sufficient for complex microbiome analysis on the species level sequenced with 16S rRNA and metagenomic approaches.
Regarding the buccal swabs, our data revealed a transient alteration of the microbiome after food intake; nevertheless, an intraindividual microbial signature was preserved. Accordingly, the most reliable time point is the morning swab before tooth brushing and breakfast. If this time point is not feasible, a couple of hours need to pass between the last food intake and microbial sampling, whereas a small amount of drinking water will not disturb the microbial composition.
It is common knowledge that the method of DNA extraction strikingly influences microbiome analysis (50, 51). In general, IHMS modified protocols obtained more sequences than their original counterparts. A reason for the different sequence depths could be the portion of guanidine and other contaminants. The first one is known to interfere with the ONT library preparation and results in a low A260/A230 ratio by NanoDrop (66). These findings lead to the conclusion that the IHMS protocol yielded higher throughput due to additional washing steps (see the supplemental material). The Invitrogen protocols (original and IHMS modified) have strikingly higher A260/A230 ratio values, which likely result in more sufficient sequencing depth due to the lower contamination (guanidine, residual salt, etc.) content of this kit.
There is growing evidence that ONT sequencing can perform accurate microbiome analysis (14, 19, 25, 67). In line with previous studies, we show that Centrifuge achieved a high sensitivity (19, 25). In contrast to Sanderson et al. (25) and Leidenfrost et al. (19), a more complicated mock community was chosen containing taxa that are not frequently implemented in preformatted libraries. To the best of our knowledge, all previous ONT microbiome studies classified with Centrifuge only used preformatted indices/libraries (phv or p). Interestingly, a database that was built according to official Centrifuge instructions, and that contained a big set of complete genomes from NCBI Refseq, failed to identify 3 species. One of them was Candida albicans, a well-known and important pathogen. Unfortunately, only 18 fungi were downloaded and included in this index. To this end, it is not advisable to use this library for complex microbiome analysis that aims to focus on fungi. Therefore, the most comprehensive database that detects all mock community species was used for all experiments if not stated otherwise. The drawback of a library that includes all known species is the high memory capacity and more time-consuming analyses.
Besides excellent AUPR and recall, the precision still remained low. This is explained by the structure of the analyzed mock community, which included extremely low-abundance species. Four species were below 0.1%, whereby the lowest-abundance species (Clostridium perfringens) accounted for only 1 × 10−4%. Therefore, only all taxa below 1 × 10−5% were filtered out by calculating the overall precision and recall. With this low prevalence filter, plenty of incorrectly annotated species were included in the analysis. If only the most abundant taxa were compared to the theoretical mock community composition, one species would be falsely classified, whereas the correct genus (Veillonella) would be annotated. We longitudinally investigated the reliability of 16S rRNA and metagenomic ONT sequencing for rectal and buccal swabs from the same individual. This longitudinal observation revealed a high consistency regarding microbial composition for both sequencing methods and biological specimens.
Furthermore, we simulated different sets of microbial ONT sequences and compared our workflow with three classifiers. MetaMaps was exclusively developed for long and noisy reads generated by ONT or PacBio (58). Currently, there is only one library available for this program containing >10,000 genomes. To this end, it performed best for the simulated data (metamaps-set), which was adapted from the community evaluated in the original study (58). However, it achieved insufficient AUPR and recall values, even in simulated sets with low complexity (10-taxa). This finding emphasized the need for a classification tool that analyzes complex microbiome structures sequenced by ONT with a comprehensive library. Metapont outperformed Kaiju and Kraken 2 regarding the precision in all simulated data sets. In other words, Metapont classified fewer false-positive reads than the other two conventional classifiers designed for NGS sequencing due to the adjusted Minimap2 coverage and the alignment score parameter.
Conclusions.
Here, we present a comprehensive analysis pipeline with sampling, storage, DNA extraction, library preparation, and bioinformatic evaluation for complex microbiomes sequenced with ONT. Our findings from the swabs and DNA extraction experiments indicate that methods that were approved for NGS microbiome analysis cannot be simply adapted to ONT. We recommend using swabs and DNA extraction protocols with appropriate medium or extended washing steps. Both 16S rRNA and metagenomic sequencing achieved reliable and reproducible results. Still, the relatively high error rate of ONT sequences remains a bioinformatic challenge. Our benchmarking experiments reveal thresholds for analysis parameters that achieved excellent precision, recall, and AUPR values and is superior to existing classifiers. Hence, with the published bioinformatic pipeline, it is possible to achieve a highly accurate analysis of complex microbial structures. This workflow can be easily downloaded as a docker and individually customized by other scientists.
MATERIALS AND METHODS
Swabs.
For the experiments designed to define the swab with the best reliability, Copan liquid Amies elution swabs (eSwab) containing 1 ml medium and eNAT (Copan) with 2 ml medium were evaluated. A stool sample from a nonvegetarian volunteer was homogenized with phosphate-buffered saline (PBS). Both swabs were dipped in the stool. Additionally, a small stool portion was directly mixed with lysis buffer. Subsequently, a set of three different samples were either directly extracted (d0), stored for 3 days (d3) at room temperature (RT) or at −20°C, or stored for 7 days (d7) at RT, −20°C or −80°C.
Buccal and rectal swabs for subsequent experiments were collected from a nonvegetarian volunteer using eSwabs. Rectal swabs were collected by one physician and were not self-administered. For this purpose, swabs were inserted 5 to 6 cm into the rectum and rotated 5 to 6 times. If not stated otherwise, rectal swabs had a quantity of stool ranging from 0 to + (Fig. 2a). For buccal swabs, both sides of the oral cavity were wiped thoroughly for at least 10 s. All samples were stored within 3 h at −80°C.
To analyze the impact of eating and drinking on the buccal microbiome, four nonvegetarian volunteers collected buccal swabs (eSwab) for 2 days according to the protocol (Fig. 3a). Text S1 in the supplemental material contains a detailed description. As a positive control, a gut mock community was purchased from ZymoBIOMICS gut microbiome standard (Zymo Research), extracted, and sequenced alongside swab samples.
All samples were derived from contributing scientists who are also listed as authors. The study was reviewed and approved by the Ethics Committee of the University Center Goettingen (no. 11/7/19).
DNA extraction and purification.
Four DNA extraction kits were evaluated. To this end, two different protocols for three isolation kits were applied. First, the manufacturer’s protocols, referred to as original, were used for MagMAX microbiome ultra nucleic acid isolation kit (Applied Biosystems), PureLink microbiome DNA purification kit (Invitrogen), QIAmp DNA investigator kit (Qiagen), and QIAmp DNA microbiome kit (Qiagen). Second, the protocols were modified according to the International Human Microbiome Standard (IHMS) (52). For the QIAmp DNA microbiome kit (Qiagen), a modification was not possible. If not stated otherwise, a PureLink microbiome DNA purification kit (Invitrogen) was used for DNA isolation. The detailed protocols are listed in Text S1. All samples were eluted with 50 μl elution buffer.
The purity and concentration of extracted DNA was measured using a Nanophotometer P330 (INTAS Göttingen) and Qubit3 (dsDNA HS assay kit; Thermo Fisher Scientific, Waltham, MA). Samples extracted with Qiagen kits or an Applied Biosystem kit were purified by a OneStep PCR inhibitor removal kit (Zymo Research) prior to sequencing. DNA isolated with Invitrogen only underwent a cleaning step if NanoDrop ratios (A260/A230 and A260/A280) were below 2 (A260/A230 ) or below 1.8 (A260/A280), respectively.
Library preparation, sequencing, and base calling.
Extracted DNA samples were prepared for 16S rRNA gene sequencing using either SQK-16S024 (ONT) or SQK-RAB204 (ONT). For metagenomic sequencing, the ligation sequencing kit (SQK-LSK109) was applied in combination with Native Barcoding Expansion 1–12 (EXP-NBD104). Either 500 ng or the maximum amount of extracted DNA per sample was prepared for metagenomic sequencing. All samples were sequenced with MinION (ONT) or GridION (ONT) using R9.4 flow cells. The sequencing was controlled with the MinKNOW v. 20.06.4. The sequence duration ranged from 48 to 72 h depending on the throughput. Fast5 files were base called and demultiplexed using Guppy (ONT) version 4.0.15 running with GPU driver cuda with default parameters. All fastq files and corresponding metadata were uploaded in Qiita (study number 13720) (68) and https://github.com/microbiome-gastro-UMG/MeTaPONT/.
Classification.
The bioinformatic workflow is summarized in Fig. 5a. Centrifuge (version 1.0.4-beta) was used for classification (55) and Minimap2 (version 2.17) for alignment control (54). A detailed description is provided in Text S1. A threshold filtering and the benchmarking were realized with a python script (version 3.7.7, pandas 1.1.3).
All listed tools were combined in a wrapper program. The classification analysis pipeline can be downloaded as a docker. Detailed instruction and download information can be found at Github at https://github.com/microbiome-gastro-UMG/MeTaPONT/.
Four different libraries/indices were tested with a gut mock community. Two were the preformatted indices, which can be downloaded from the official Centrifuge homepage (bacteria, archaea, viruses, human [compressed], 6 December 2016 [phv], and bacteria, archaea [compressed], 15 April 2018 [p]). The third was built on 10 September 2020 according to the official instructions of the Centrifuge manual containing all complete NCBI bacterial, fungi, archaea, human, and mouse RefSeq genomes (NCBI). Low-complexity regions were masked by NCBI-tool dustmasker (version 0.1.03). The fourth library comprises all genomes from the NCBI BLAST’s nt database and was built on 15 October 2020.
Simulated data.
Four sets of simulated data were built using DeepSimulator 1.5 (59). Fasta files were downloaded from NCBI, and steps for fast5 creation were conducted with standard parameters as instructed previously (69). A bash script for downloading, simulating, and allocation was written and is publicly available (https://github.com/microbiome-gastro-UMG/MeTaPONT/). Simulated data were classified with 4 programs: Kaiju, Kraken2, Metamaps, and Metapont. More details about the programs and libraries are provided in the supplemental material.
The 10-taxa-set contains 8 bacteria and 2 fungi (abundance 10% per taxa). The gut-set included a similar microbial composition like mock gut community (with highly divergent abundances 14% to 0.0001%), 14 bacteria, 2 fungi, and 1 archaeon. Here, Veillonella dispar and Prevotella intermedia replaced the species from the same genus, which were found in the mock community. The tumor-set simulates a human tumor microbiome. Eight bacteria constitute only 0.1% of the DNA, whereas the majority belongs to the host (Homo sapiens). Bacteria composition is similar to that of intratumoral microbiome (60). MetaMaps-set contained most species (77 bacteria) from the simulated data, which were analyzed by Dilthey et al. (58). Table S3 contains all simulated species names and abundances.
Downstream analysis.
All downstream analysis was conducted by R (version 3.6.3). Operational taxonomic unit (OTU) tables were preprocessed by prevalence filtering and rarefaction. For filtering, PERFect was applied with the “simultaneous” approach (70). When rarefaction was performed, the figure legends explain the normalized OTU count per sample. Alpha- and beta-diversity, taxonomic composition, and rarefaction curve was calculated from a phyloseq object (packages: phyloseq 1.30.0, ape 5.4-1, tidyverse 1.3.0, ggplot2 3.3.2, vegan 2.5.6, pairwiseAdonis 0.0.1, data.table 1.13.0, gridExtra 2.3, ggpubr 0.4.0, car 3.0–10, reshape2 1.4.4, and rstatix 0.6.0). To generate the required taxon table, the program taxonkit (71) was applied and all taxID from NCBI were extracted in May 2020. ReadIDs, length, quality score, and barcode of each sequence were taken from sequencing_summary.txt, which was created during the Guppy base call.
Statistics.
Before applying statistic calculations for P values, normal distribution and homogeneity of variance was tested, performing Shapiro-Wilk test, plotting QQ-plot, and Levene test, respectively. If both conditions were given, t test for two groups or analysis of variance (ANOVA) and post hoc Tukey test for data sets with more than two groups were performed. If no normality was tested, Wilcoxon rank test for two groups and Kruskal-Wallis with additional pairwise Wilcoxon test for more than two groups were performed. To determine the explained distance of beta diversity by certain factors, adonis and pairwise adonis tests were applied after checking normality and homogeneity of variance using betadisper and permutest. Beta diversities are ordinated with principal coordinate analysis (PCoA). For each plot, significances were visualized: *, P < 0.05; **, P < 0.01; ***, P < 0.001.
Data availability.
The classification analysis pipeline can be downloaded as a docker. Detailed instructions and download information can be found at Github (https://github.com/microbiome-gastro-UMG/MeTaPONT).
Supplementary Material
ACKNOWLEDGMENT
We have no conflict of interest to declare.
Contributor Information
Albrecht Neesse, Email: albrecht.neesse@med.uni-goettingen.de.
Nicola Segata, University of Trento.
REFERENCES
- 1.Fan X, Alekseyenko AV, Wu J, Peters BA, Jacobs EJ, Gapstur SM, Purdue MP, Abnet CC, Stolzenberg-Solomon R, Miller G, Ravel J, Hayes RB, Ahn J. 2018. Human oral microbiome and prospective risk for pancreatic cancer: a population-based nested case-control study. Gut 67:120–127. doi: 10.1136/gutjnl-2016-312580. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Duvallet C, Gibbons SM, Gurry T, Irizarry RA, Alm EJ. 2017. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat Commun 8:1784. doi: 10.1038/s41467-017-01973-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. 2006. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444:1027–1031. doi: 10.1038/nature05414. [DOI] [PubMed] [Google Scholar]
- 4.Wu H, Esteve E, Tremaroli V, Khan MT, Caesar R, Mannerås-Holm L, Ståhlman M, Olsson LM, Serino M, Planas-Fèlix M, Xifra G, Mercader JM, Torrents D, Burcelin R, Ricart W, Perkins R, Fernàndez-Real JM, Bäckhed F. 2017. Metformin alters the gut microbiome of individuals with treatment-naive type 2 diabetes, contributing to the therapeutic effects of the drug. Nat Med 23:850–858. doi: 10.1038/nm.4345. [DOI] [PubMed] [Google Scholar]
- 5.Khan I, Ullah N, Zha L, Bai Y, Khan A, Zhao T, Che T, Zhang C. 2019. Alteration of gut microbiota in inflammatory bowel disease (IBD): cause or consequence? IBD treatment targeting the gut microbiome. Pathogens 8:126. [PMC]. doi: 10.3390/pathogens8030126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F, Liang S, Zhang W, Guan Y, Shen D, Peng Y, Zhang D, Jie Z, Wu W, Qin Y, Xue W, Li J, Han L, Lu D, Wu P, Dai Y, Sun X, Li Z, Tang A, Zhong S, Li X, Chen W, Xu R, Wang M, Feng Q, Gong M, Yu J, Zhang Y, Zhang M, Hansen T, Sanchez G, Raes J, Falony G, Okuda S, Almeida M, LeChatelier E, Renault P, Pons N, Batto J-M, Zhang Z, Chen H, Yang R, Zheng W, Li S, Yang H, et al. 2012. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490:55–60. doi: 10.1038/nature11450. [DOI] [PubMed] [Google Scholar]
- 7.Shendure J, Ji H. 2008. Next-generation DNA sequencing. Nat Biotechnol 26:1135–1145. doi: 10.1038/nbt1486. [DOI] [PubMed] [Google Scholar]
- 8.Schloss PD, Handelsman J. 2005. Metagenomics for studying unculturable microorganisms: cutting the Gordian knot. Genome Biol 6:229. doi: 10.1186/gb-2005-6-8-229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Knight R, Vrbanac A, Taylor BC, Aksenov A, Callewaert C, Debelius J, Gonzalez A, Kosciolek T, McCall L-I, McDonald D, Melnik AV, Morton JT, Navas J, Quinn RA, Sanders JG, Swafford AD, Thompson LR, Tripathi A, Xu ZZ, Zaneveld JR, Zhu Q, Caporaso JG, Dorrestein PC. 2018. Best practices for analysing microbiomes. Nat Rev Microbiol 16:410–422. doi: 10.1038/s41579-018-0029-9. [DOI] [PubMed] [Google Scholar]
- 10.Li R, Hsieh C-L, Young A, Zhang Z, Ren X, Zhao Z. 2015. Illumina synthetic long read sequencing allows recovery of missing sequences even in the “finished” C. elegans genome. Sci Rep 5:10814. doi: 10.1038/srep10814. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Teng F, Darveekaran Nair SS, Zhu P, Li S, Huang S, Li X, Xu J, Yang F. 2018. Impact of DNA extraction method and targeted 16S-rRNA hypervariable region on oral microbiota profiling. Sci Rep 8:16321. doi: 10.1038/s41598-018-34294-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Chen Z, Hui PC, Hui M, Yeoh YK, Wong PY, Chan MCW, Wong MCS, Ng SC, Chan FKL, Chan PKS. 2019. Impact of preservation method and 16S rRNA hypervariable region on gut microbiota profiling. mSystems 4:e00271-18. [PMC]doi: 10.1128/mSystems.00271-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Johnson JS, Spakowicz DJ, Hong B-Y, Petersen LM, Demkowicz P, Chen L, Leopold SR, Hanson BM, Agresta HO, Gerstein M, Sodergren E, Weinstock GM. 2019. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat Commun 10:5029. doi: 10.1038/s41467-019-13036-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Cuscó A, Catozzi C, Viñes J, Sanchez A, Francino O. 2018. Microbiota profiling with long amplicons using Nanopore sequencing: full-length 16S rRNA gene and the 16S-ITS-23S of the rrn operon. F1000Res 7:1755. doi: 10.12688/f1000research.16817.2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Jain M, Koren S, Miga KH, Quick J, Rand AC, Sasani TA, Tyson JR, Beggs AD, Dilthey AT, Fiddes IT, Malla S, Marriott H, Nieto T, O'Grady J, Olsen HE, Pedersen BS, Rhie A, Richardson H, Quinlan AR, Snutch TP, Tee L, Paten B, Phillippy AM, Simpson JT, Loman NJ, Loose M. 2018. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36:338–345. doi: 10.1038/nbt.4060. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Tyson JR, O'Neil NJ, Jain M, Olsen HE, Hieter P, Snutch TP. 2018. MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome. Genome Res 28:266–274. doi: 10.1101/gr.221184.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Moss EL, Maghini DG, Bhatt AS. 2020. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat Biotechnol 38:701–707. doi: 10.1038/s41587-020-0422-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Loman NJ, Quick J, Simpson JT. 2015. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 12:733–735. doi: 10.1038/nmeth.3444. [DOI] [PubMed] [Google Scholar]
- 19.Leidenfrost RM, Pöther D-C, Jäckel U, Wünschiers R. 2020. Benchmarking the MinION: evaluating long reads for microbial profiling. Sci Rep 10:5125. doi: 10.1038/s41598-020-61989-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Kai S, Matsuo Y, Nakagawa S, Kryukov K, Matsukawa S, Tanaka H, Iwai T, Imanishi T, Hirota K. 2019. Rapid bacterial identification by direct PCR amplification of 16S rRNA genes using the MinION nanopore sequencer. FEBS Open Biol 9:548–557. doi: 10.1002/2211-5463.12590. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Nicholls SM, Quick JC, Tang S, Loman NJ. 2019. Ultra-deep, long-read nanopore sequencing of mock microbial community standards. Gigascience 8:giz043. [PMC]doi: 10.1093/gigascience/giz043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Cuscó A, Belanger JM, Gershony L, Islas-Trejo A, Levy K, Medrano JF, Sánchez A, Oberbauer AM, Francino O. 2017. Individual signatures and environmental factors shape skin microbiota in healthy dogs. Microbiome 5:139. doi: 10.1186/s40168-017-0355-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Nygaard AB, Tunsjø HS, Meisal R, Charnock C. 2020. A preliminary study on the potential of Nanopore MinION and Illumina MiSeq 16S rRNA gene sequencing to characterize building-dust microbiomes. Sci Rep 10:3209. doi: 10.1038/s41598-020-59771-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Arango-Argoty GA, Dai D, Pruden A, Vikesland P, Heath LS, Zhang L. 2019. NanoARG: a web service for detecting and contextualizing antimicrobial resistance genes from nanopore-derived metagenomes. Microbiome 7:88. doi: 10.1186/s40168-019-0703-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Sanderson ND, Street TL, Foster D, Swann J, Atkins BL, Brent AJ, McNally MA, Oakley S, Taylor A, Peto TEA, Crook DW, Eyre DW. 2018. Real-time analysis of nanopore-based metagenomic sequencing from infected orthopaedic devices. BMC Genomics 19:714. doi: 10.1186/s12864-018-5094-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Moon J, Jang Y, Kim N, Park WB, Park K-I, Lee S-T, Jung K-H, Kim M, Lee SK, Chu K. 2018. Diagnosis of Haemophilus influenzae pneumonia by nanopore 16S amplicon sequencing of sputum. Emerg Infect Dis 24:1944–1946. doi: 10.3201/eid2410.180234. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Shin J, Lee S, Go M-J, Lee SY, Kim SC, Lee C-H, Cho B-K. 2016. Analysis of the mouse gut microbiome using full-length 16S rRNA amplicon sequencing. Sci Rep 6:29681. doi: 10.1038/srep29681. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Albenberg L, Esipova TV, Judge CP, Bittinger K, Chen J, Laughlin A, Grunberg S, Baldassano RN, Lewis JD, Li H, Thom SR, Bushman FD, Vinogradov SA, Wu GD. 2014. Correlation between intraluminal oxygen gradient and radial partitioning of intestinal microbiota. Gastroenterology 147:1055–1063. doi: 10.1053/j.gastro.2014.07.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, Gill SR, Nelson KE, Relman DA. 2005. Diversity of the human intestinal microbial flora. Science 308:1635–1638. doi: 10.1126/science.1110591. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Araújo-Pérez F, McCoy AN, Okechukwu C, Carroll IM, Smith KM, Jeremiah K, Sandler RS, Asher GN, Keku TO. 2012. Differences in microbial signatures between rectal mucosal biopsies and rectal swabs. Gut Microbes 3:530–535. doi: 10.4161/gmic.22157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Budding AE, Grasman ME, Eck A, Bogaards JA, Vandenbroucke-Grauls CMJE, van Bodegraven AA, Savelkoul PHM. 2014. Rectal swabs for analysis of the intestinal microbiota. PLoS One 9:e101344. doi: 10.1371/journal.pone.0101344. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Ringel Y, Maharshak N, Ringel-Kulka T, Wolber EA, Sartor RB, Carroll IM. 2015. High throughput sequencing reveals distinct microbial populations within the mucosal and luminal niches in healthy individuals. Gut Microbes 6:173–181. doi: 10.1080/19490976.2015.1044711. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Jones RB, Zhu X, Moan E, Murff HJ, Ness RM, Seidner DL, Sun S, Yu C, Dai Q, Fodor AA, Azcarate-Peril MA, Shrubsole MJ. 2018. Inter-niche and inter-individual variation in gut microbial community assessment using stool, rectal swab, and mucosal samples. Sci Rep 8:4139. doi: 10.1038/s41598-018-22408-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Zheng D, Liwinski T, Elinav E. 2020. Interaction between microbiota and immunity in health and disease. Cell Res 30:492–506. doi: 10.1038/s41422-020-0332-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Biehl LM, Garzetti D, Farowski F, Ring D, Koeppel MB, Rohde H, Schafhausen P, Stecher B, Vehreschild MJGT. 2019. Usability of rectal swabs for microbiome sampling in a cohort study of hematological and oncological patients. PLoS One 14:e0215428. doi: 10.1371/journal.pone.0215428. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Bassis CM, Moore NM, Lolans K, Seekatz AM, Weinstein RA, Young VB, Hayden MK, for the CDC Prevention Epicenters Program. 2017. Comparison of stool versus rectal swab samples and storage conditions on bacterial community profiles. BMC Microbiol 17:78. doi: 10.1186/s12866-017-0983-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Si J, Lee C, Ko G. 2017. Oral Microbiota: microbial biomarkers of metabolic syndrome independent of host genetic factors. Front Cell Infect Microbiol 7:516. doi: 10.3389/fcimb.2017.00516. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Said HS, Suda W, Nakagome S, Chinen H, Oshima K, Kim S, Kimura R, Iraha A, Ishida H, Fujita J, Mano S, Morita H, Dohi T, Oota H, Hattori M. 2014. Dysbiosis of salivary microbiota in inflammatory bowel disease and its association with oral immunological biomarkers. DNA Res 21:15–25. doi: 10.1093/dnares/dst037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. 2009. Bacterial community variation in human body habitats across space and time. Science 326:1694–1697. doi: 10.1126/science.1177486. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486:207–214. doi: 10.1038/nature11234. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Woo JS, Lu DY. 2019. Procurement, transportation, and storage of saliva, buccal swab, and oral wash specimens. Methods Mol Biol 1897:99–105. doi: 10.1007/978-1-4939-8935-5_10. [DOI] [PubMed] [Google Scholar]
- 42.Camelo-Castillo AJ, Mira A, Pico A, Nibali L, Henderson B, Donos N, Tomás I. 2015. Subgingival microbiota in health compared to periodontitis and the influence of smoking. Front Microbiol 6:119. doi: 10.3389/fmicb.2015.00119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Wei Y, Shi M, Zhen M, Wang C, Hu W, Nie Y, Wu X. 2019. Comparison of subgingival and buccal mucosa microbiome in chronic and aggressive periodontitis: a pilot study. Front Cell Infect Microbiol 9:53. doi: 10.3389/fcimb.2019.00053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Wick RR, Judd LM, Holt KE. 2019. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol 20:129. doi: 10.1186/s13059-019-1727-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Demuyser T, De Geyter D, Van Dorpe D, Vandoorslaer K, Wybo I. 2018. Extensive evaluation of fastidious anaerobic bacteria recovery from the Copan eSwab transport system. J Microbiol Methods 144:73–78. doi: 10.1016/j.mimet.2017.11.009. [DOI] [PubMed] [Google Scholar]
- 46.Bjerre RD, Hugerth LW, Boulund F, Seifert M, Johansen JD, Engstrand L. 2019. Effects of sampling strategy and DNA extraction on human skin microbiome investigations. Sci Rep 9:17287. doi: 10.1038/s41598-019-53599-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Young RR, Jenkins K, Araujo-Perez F, Seed PC, Kelly MS. 2020. Long-term stability of microbiome diversity and composition in fecal samples stored in eNAT medium. Microbiologyopen 9:e1046. doi: 10.1002/mbo3.1046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Zhu L, Lu X, Liu L, Voglmeir J, Zhong X, Yu Q. 2020. Akkermansia muciniphila protects intestinal mucosa from damage caused by S. pullorum by initiating proliferation of intestinal epithelium. Vet Res 51:34. doi: 10.1186/s13567-020-00755-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Helmersen K, Aamot HV. 2020. DNA extraction of microbial DNA directly from infected tissue: an optimized protocol for use in nanopore sequencing. Sci Rep 10:2985. doi: 10.1038/s41598-020-59957-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Sinha R, Abu-Ali G, Vogtmann E, Fodor AA, Ren B, Amir A, Schwager E, Crabtree J, Ma S, Abnet CC, Knight R, White O, Huttenhower C, Microbiome Quality Control Project Consortium. 2017. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol 35:1077–1086. doi: 10.1038/nbt.3981. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Costea PI, Zeller G, Sunagawa S, Pelletier E, Alberti A, Levenez F, Tramontano M, Driessen M, Hercog R, Jung F-E, Kultima JR, Hayward MR, Coelho LP, Allen-Vercoe E, Bertrand L, Blaut M, Brown JRM, Carton T, Cools-Portier S, Daigneault M, Derrien M, Druesne A, de Vos WM, Finlay BB, Flint HJ, Guarner F, Hattori M, Heilig H, Luna RA, van Hylckama Vlieg J, Junick J, Klymiuk I, Langella P, Le Chatelier E, Mai V, Manichanh C, Martin JC, Mery C, Morita H, O'Toole PW, Orvain C, Patil KR, Penders J, Persson S, Pons N, Popova M, Salonen A, Saulnier D, Scott KP, Singh B, et al. 2017. Towards standards for human fecal sample processing in metagenomic studies. Nat Biotechnol 35:1069–1076. doi: 10.1038/nbt.3960. [DOI] [PubMed] [Google Scholar]
- 52.IHMS. 2020. Standard operating procedure for fecal samples DNA Extraction, protocol Q INRA. http://www.human-microbiome.org/index.php?id=Sop&num=006. Accessed 30 October 2020.
- 53.Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, Hyde ER, Knight R. 2017. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 5:27. doi: 10.1186/s40168-017-0237-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. doi: 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Kim D, Song L, Breitwieser FP, Salzberg SL. 2016. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721–1729. doi: 10.1101/gr.210641.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Ye SH, Siddle KJ, Park DJ, Sabeti PC. 2019. Benchmarking metagenomics tools for taxonomic classification. Cell 178:779–794. doi: 10.1016/j.cell.2019.07.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.McIntyre ABR, Ounit R, Afshinnekoo E, Prill RJ, Hénaff E, Alexander N, Minot SS, Danko D, Foox J, Ahsanuddin S, Tighe S, Hasan NA, Subramanian P, Moffat K, Levy S, Lonardi S, Greenfield N, Colwell RR, Rosen GL, Mason CE. 2017. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol 18:182. doi: 10.1186/s13059-017-1299-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Dilthey AT, Jain C, Koren S, Phillippy AM. 2019. Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps. Nat Commun 10:3066. doi: 10.1038/s41467-019-10934-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Li Y, Wang S, Bi C, Qiu Z, Li M, Gao X. 2020. DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing. Bioinformatics 36:2578–2580. doi: 10.1093/bioinformatics/btz963. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Nejman D, Livyatan I, Fuks G, Gavert N, Zwang Y, Geller LT, Rotter-Maskowitz A, Weiser R, Mallel G, Gigi E, Meltser A, Douglas GM, Kamer I, Gopalakrishnan V, Dadosh T, Levin-Zaidman S, Avnet S, Atlan T, Cooper ZA, Arora R, Cogdill AP, Khan MAW, Ologun G, Bussi Y, Weinberger A, Lotan-Pompan M, Golani O, Perry G, Rokah M, Bahar-Shany K, Rozeman EA, Blank CU, Ronai A, Shaoul R, Amit A, Dorfman T, Kremer R, Cohen ZR, Harnof S, Siegal T, Yehuda-Shnaidman E, Gal-Yam EN, Shapira H, Baldini N, Langille MGI, Ben-Nun A, Kaufman B, Nissan A, Golan T, Dadiani M, et al. 2020. The human tumor microbiome is composed of tumor type-specific intracellular bacteria. Science 368:973–980. doi: 10.1126/science.aay9189. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Menzel P, Ng KL, Krogh A. 2016. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun 7:11257. doi: 10.1038/ncomms11257. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol 20:257. doi: 10.1186/s13059-019-1891-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.GitHub. 2021. DiltheyLab/MetaMaps. https://github.com/DiltheyLab/MetaMaps/issues. Accessed 12 June 2021.
- 64.Patel AL, Mutlu EA, Sun Y, Koenig L, Green S, Jakubowicz A, Mryan J, Engen P, Fogg L, Chen AL, Pombar X, Meier PP, Keshavarzian A. 2016. Longitudinal survey of microbiota in hospitalized preterm very-low-birth-weight infants. J Pediatr Gastroenterol Nutr 62:292–303. doi: 10.1097/MPG.0000000000000913. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Reyman M, van Houten MA, Arp K, Sanders EAM, Bogaert D. 2019. Rectal swabs are a reliable proxy for faecal samples in infant gut microbiota research based on 16S-rRNA sequencing. Sci Rep 9:16072. doi: 10.1038/s41598-019-52549-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Oxford Nanopore Technologies. 2020. Creating complete sample-to-analysis workflows with the interactive protocol selector. https://nanoporetech.com/resource-centre/creating-complete-sample-analysis-workflows. Accessed 17 December 2020.
- 67.Santos A, van Aerle R, Barrientos L, Martinez-Urtaza J. 2020. Computational methods for 16S metabarcoding studies using Nanopore sequencing data. Comput Struct Biotechnol J 18:296–305. doi: 10.1016/j.csbj.2020.01.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Gonzalez A, Navas-Molina JA, Kosciolek T, McDonald D, Vázquez-Baeza Y, Ackermann G, DeReus J, Janssen S, Swafford AD, Orchanian SB, Sanders JG, Shorenstein J, Holste H, Petrus S, Robbins-Pianka A, Brislawn CJ, Wang M, Rideout JR, Bolyen E, Dillon M, Caporaso JG, Dorrestein PC, Knight R. 2018. Qiita: rapid, web-enabled microbiome meta-analysis. Nat Methods 15:796–798. doi: 10.1038/s41592-018-0141-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.GitHub. 2021. liyu95/DeepSimulator. https://github.com/liyu95/DeepSimulator. Accessed 10 June 2021.
- 70.Smirnova E, Huzurbazar S, Jafari F. 2019. PERFect: PERmutation Filtering test for microbiome data. Biostatistics 20:615–631. doi: 10.1093/biostatistics/kxy020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Shenwei356. 2020. Usage-TaxonKit-NCBI taxonomy toolkit. https://bioinf.shenwei.me/taxonkit/usage/. Accessed 30 October 2020.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The classification analysis pipeline can be downloaded as a docker. Detailed instructions and download information can be found at Github (https://github.com/microbiome-gastro-UMG/MeTaPONT).