Abstract
Metagenomics is a powerful tool to identify novel or unexpected pathogens, since it is generic and relatively unbiased. The limit of detection (LOD) is a critical parameter for the routine application of methods in the clinical diagnostic context. Although attempts to determine LODs for metagenomic next-generation sequencing (mNGS) have been made previously, these were only applicable to specific target species in defined sample matrices. Therefore, we developed and validated a generalized probability-based model to assess the sample-specific LOD of mNGS experiments (LODmNGS). Initial rarefaction analyses with datasets of Borna disease virus 1 human encephalitis cases revealed a stochastic behavior of virus read detection. Based on this, we transformed the Bernoulli formula to predict the minimal dataset size necessary to detect one virus read with a probability of 99%. We validated the formula with 30 datasets from diseased individuals, resulting in an accuracy of 99.1% and an average of 4.5 ± 0.4 viral reads found in the calculated minimal dataset size. By modeling the virus genome size, the virus concentration, and the total RNA concentration, we demonstrated that the main determinant of mNGS sensitivity is the virus-sample background ratio. The predicted LODmNGS for the respective pathogenic virus in the datasets were congruent with the virus concentration determined by RT-qPCR. Theoretical assumptions were further confirmed by correlation analysis of mNGS and RT-qPCR data from the samples of the analyzed datasets. Owing to the generalized concept of LODmNGS, this approach should guide standardization of mNGS application.
Keywords: Metagenomics, Next-generation sequencing, Detection limit, Sensitivity, Bernoulli process, qPCR
1. Introduction
Metagenomic next-generation sequencing (mNGS) is a powerful tool to identify the DNA or RNA of novel or unexpected pathogens in a single assay. It enables a relatively unbiased detection of all organisms present in a sample, including viruses, bacteria, fungi, and parasites [1]. It therefore has great potential to fill the gap of detecting undiagnosed causative agents in diseased patients [2], [3], [4]. Routine molecular diagnostic methods like real-time quantitative PCR (qPCR) are highly sensitive and specific and can be standardized [5]. However, the specificity hampers the detection of newly emerging pathogens or distant relatives, like the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the variegated squirrel bornavirus 1 (VSBV-1) [6], [7]. Additionally, qPCR detects only those pathogens that are specifically targeted, so unexpected pathogens are missed [8]. This diagnostic gap can lead to a fatal outcome for patients, due to a delayed development and/or implementation of clinical intervention strategies, like vaccination, medication, treatment, and quarantine. For this reason, mNGS is increasingly applied in clinical settings [9]. Technological and bioinformatics advances have made it even more attractive [10], [11], [12], [13], [14]. In recent years, ring trials of bioinformatics pipelines [15], [16] and clinical retrospective and prospective studies were performed focusing on proof of concept, turnaround times, accuracy, thresholds to prevent false-positive calls, quality metrics, and analytical and diagnostic specificity and sensitivity [17], [18], [19], [20].
Sensitivity is one of the major factors to assess the power of a diagnostic method. At first glance, the sensitivity of mNGS is determined by the number of sequenced reads: the more reads are sequenced, the higher the sensitivity. However, the selection of the required data depth has been based mainly on economic factors and on empirical and ad hoc heuristic models, resulting in published datasets that range from 5 to 24 million reads [17], [18], [19], [21], [22]. Especially for tissue samples, unbiased sequencing usually results in high background levels of often >99%, which is an inherent disadvantage of mNGS, limiting the analytical sensitivity at constant data depths [22]. To address this issue, targeted pathogen enrichment techniques and host depletion have been applied [23], [24], [25]. However, they are expensive, complex, and not available for every host or pathogen, and moreover do not support the detection of hitherto unknown pathogens. The heterogeneous composition of host and pathogen is consequently a key problem in mNGS analysis. Low levels of pathogen reads further complicate the differentiation from commensals and contaminants. Hence, data interpretation has been supported by statistical assessment (z-scores) [26] or methodical parameters, for example calculation of the pathogen reads per million (rpm) [18], to make positive calls based on the pathogen read numbers and proportions. Furthermore, the detection rate is influenced by the genome size of the specific target. In coverage theories, the genome size determines the necessary sequencing effort. To achieve equal sequence depth, a higher sequence data input into assembly is needed for larger genomes than for smaller genomes. Likewise, at uniform abundance levels, a single sequencing read is more likely to originate from a large genome than from a small one [27], [28].
The detection of a species in a specimen thus depends on its abundance, the relative genome size, and the data depth [27]. Therefore, the design of mNGS experiments should take these factors into account to find the needle in the metagenome haystack [29], since low-abundance pathogens have also been linked to severe diseases [30], [31].
So far, sensitivity assessments of mNGS have been made by comparison with routine methods at qualitative or semi-quantitative levels (Cq values) and by spiking a collection of pathogens in serial dilutions in a specific sample matrix [18], [20], [22], [32], [33]. However, due to the core property of mNGS to detect all nucleic acids with nearly identical probability, a generalization of these pathogen/matrix combination specific results is not possible. Thus, the definition of a limit of detection (LOD) for mNGS (LODmNGS), as applied for other routine methods, is hampered due to the many variables influencing the sensitivity.
Hence, the aim of this study was the development and validation of a pathogen/matrix independent generally applicable mathematical model to assess the detection limit of mNGS experiments. This approach should guide standardization of mNGS application. Therefore, we developed and validated a straightforward analytical tool to assess the sample-specific LODmNGS, which is critical for the routine application of mNGS in the clinical diagnostic context.
2. Experimental procedures
2.1. Samples and datasets
The study included 30 disease-associated samples and the respective datasets from human and animal cases (Table 1), confirmed by RT-qPCR and mNGS from total RNA. Briefly, five samples originated from brain material of human fatal encephalitis cases caused by Borna disease virus 1 (BoDV-1) [8], [31]. Twenty-five samples with different sample matrices, including lung, brain, heart, liver, and spleen were derived from various host species infected with rustrela virus (RusV) [34], a pegivirus (PGV), or with West Nile virus (WNV) lineage 2 [35], respectively. In the analysed mNGS datasets, virus-specific reads were identified by assembler/mapping analysis after quality and adapter trimming implemented in the 454 software suite (v3.0; Roche). The quality of the library and dataset was checked using FastQC [36] and R-packages bioanalyzeR [37] and qrqc [38] in R-Studio [39] with R (v4.0.2; [40]). Subsequently, the percentage of the respective target virus in the dataset was calculated from the number of virus-specific reads and the total number of reads of that dataset.
Table 1.
Viral target | Library ID | Host | Tissue | Total RNA (ng/µl) | RT-qPCR (Cq value) | mNGS total reads | mNGS target virus reads | mNGS target virus percentage (%) |
---|---|---|---|---|---|---|---|---|
BoDV-1 | lib02012 | Human | brain | 25.7 | 15.7 | 2.69E+06 | 2.15E+04 | 8.00E-01 |
BoDV-1 | lib02246 | Human | brain | 37.6 | 23.0 | 7.65E+06 | 3.20E+01 | 4.18E-04 |
BoDV-1 | lib02462 | Human | brain | 17.0 | 17.8 | 4.60E+06 | 4.61E+03 | 1.00E-01 |
BoDV-1 | lib02557 | Human | brain | 4.1 | 19.3 | 1.15E+07 | 1.96E+04 | 1.71E-01 |
BoDV-1 | lib02558 | Human | brain | 18.4 | 20.5 | 3.93E+06 | 2.70E+01 | 6.86E-04 |
PGV | lib03148 | European hamster | lung | 136.4 | 26.8 | 1.45E+06 | 2.10E+01 | 1.44E-03 |
PGV | lib03150 | European hamster | lung | 260.2 | 28.0 | 1.76E+06 | 1.00E+01 | 5.67E-04 |
RusV | lib03123 | Donkey | brain | 249.0 | 26.2 | 2.65E+06 | 1.30E+01 | 4.91E-04 |
WNV | lib02898 | Great Grey Owl | organ pool | 598.3 | 11.3 | 6.82E+06 | 3.97E+05 | 5.83E+00 |
WNV | lib02914 | Goshawk | brain | 197.3 | 15.7 | 6.67E+06 | 3.01E+04 | 4.52E-01 |
WNV | lib02959 | Goshawk | brain | 80.3 | 17.6 | 7.99E+06 | 2.79E+04 | 3.49E-01 |
WNV | lib03378 | Snowy Owl | heart | 217.4 | 21.4 | 2.71E+06 | 8.82E+02 | 3.26E-02 |
WNV | lib03379 | Great Grey Owl | liver | 898.5 | 14.3 | 3.00E+06 | 3.61E+04 | 1.20E+00 |
WNV | lib03380 | Snowy Owl | liver | 832.1 | 12.3 | 3.87E+06 | 2.12E+05 | 5.46E+00 |
WNV | lib03381 | Blue Tit | brain | 40.6 | 21.7 | 4.38E+06 | 2.88E+04 | 6.57E-01 |
WNV | lib03382 | Snowy Owl | liver | 715.2 | 16.6 | 3.55E+06 | 1.84E+04 | 5.17E-01 |
WNV | lib03415 | Snowy Owl | heart | 189.9 | 16.8 | 2.47E+06 | 9.73E+03 | 3.94E-01 |
WNV | lib03416 | Andean Flamingo | heart | 119.5 | 17.2 | 8.37E+05 | 3.36E+03 | 4.02E-01 |
WNV | lib03417 | Goshawk | heart | 188.9 | 17.6 | 1.05E+06 | 3.58E+03 | 3.41E-01 |
WNV | lib03418 | Goshawk | brain | 134.7 | 12.5 | 2.72E+06 | 2.94E+05 | 1.08E+01 |
WNV | lib03419 | Goshawk | brain | 535.9 | 13.0 | 9.25E+05 | 4.04E+04 | 4.37E+00 |
WNV | lib03420 | Goshawk | brain | 411.5 | 16.1 | 7.98E+05 | 5.53E+03 | 6.93E-01 |
WNV | lib03422 | Great Tit | liver/heart | 1180.9 | 11.7 | 1.23E+06 | 1.56E+05 | 1.27E+01 |
WNV | lib03423 | Eurasian Golden Plover | liver/spleen | 472.7 | 19.2 | 1.58E+06 | 1.32E+04 | 8.38E-01 |
WNV | lib03424 | Goshawk | brain | 446.0 | 16.7 | 1.02E+06 | 4.92E+03 | 4.81E-01 |
WNV | lib03425 | Snowy Owl | liver | 619.9 | 14.2 | 1.57E+06 | 3.21E+04 | 2.05E+00 |
WNV | lib03426 | Snowy Owl | liver | 513.1 | 17.2 | 3.77E+06 | 1.17E+04 | 3.10E-01 |
WNV | lib03449 | Humboldt-Penguin | heart | 270.7 | 12.1 | 2.72E+06 | 3.06E+05 | 1.12E+01 |
WNV | lib03450 | Goshawk | brain | 291.5 | 14.5 | 2.32E+06 | 3.15E+04 | 1.36E+00 |
WNV | lib03451 | Horse | spinal cord | 34.7 | 28.2 | 2.94E+06 | 1.80E+01 | 6.11E-04 |
Abbreviations: BoDV-1, Borna disease virus 1; PGV, pegivirus; RusV, rustrela virus; WNV, West Nile virus lineage 2; RNA (total), ribonucleic acid concentration of the sample; RT-qPCR, reverse transcriptase real-time PCR; Cq, quantification cycle; mNGS, metagenomic next-generation sequencing.
2.2. Wet-lab procedures
Total RNA concentrations were quantified using a Nanodrop ND1000 instrument (Peqlab, Erlangen, Germany). The DNA library concentration was measured using the Bioanalyzer 2100 (Agilent Technologies, CA, USA). Absolute quantification of the viral RNA and the double-stranded virus cDNA (library) was performed by specific 5′ nuclease RT-qPCR and qPCR, respectively (SensiFAST™ Probe No-ROX One-Step Kit, meridian Bioscience, Tennessee, USA). For BoDV-1, Mix1 targeting the P gene was used [8]. For PGV, we used a specific assay confirmed in silico and in vitro. For WNV, the INEID-assay targeting the 5′ untranslated region was used [41]. For RusV, an assay targeting the non-structural gene was used [34]. For absolute quantification, a plasmid or synthetic dsDNA (gBlocks®, Integrated DNA Technologies, Leuven, Belgium) calibration standard was applied in duplicate in ten-fold dilution series from 1.0E+06 to 1.0E+01 copies per µl (c/µl) in accordance with the MIQE Guidelines [42]. RT-qPCR calibration curves for BoDV-1, PGV, WNV, and RusV showed an efficiency between 96.6% and 103.1%, with R² ranging from 0.9998 to 1.0 and slopes in the range from -3.407 to -3.348. For BoDV-1 RNA only, retrospective absolute quantification was carried out with an external calibration curve. An internal standard was used for normalization between the runs. The qPCR calibration curves for the quantification of target virus fragments in the library showed an efficiency between 100.4% and 103.2%, with R² ranging from 0.993 to 1.0 and slopes in the range from -3.312 to -3.247. For this, 14 libraries, comprising nine WNV (lib03416 to lib03425), two BoDV-1 (lib02246, lib02462), two PGV (lib03148, lib03150), and one RusV (lib03123), were analyzed.
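The link between calibration slope and amplification efficiency can be illustrated with the usual standard-curve relation E = 10^(-1/slope) - 1. The following Python sketch shows this relation and the inversion of a calibration line into copies per µl. The function names and the intercept used in the example are hypothetical and are not taken from the study.

```python
def amplification_efficiency(slope: float) -> float:
    """Amplification efficiency in percent from a standard-curve slope
    (Cq plotted against log10 of the copy number)."""
    return (10.0 ** (-1.0 / slope) - 1.0) * 100.0


def copies_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Invert the calibration line Cq = slope * log10(copies) + intercept.
    The intercept is assay specific; any value passed here is an assumption."""
    return 10.0 ** ((cq - intercept) / slope)


if __name__ == "__main__":
    # Slopes reported in the text for two of the calibration curves
    for slope in (-3.407, -3.312):
        print(slope, round(amplification_efficiency(slope), 1))
    # Hypothetical calibration line with intercept 38.0
    print(copies_from_cq(24.7, -3.32, 38.0))
```

A slope of -3.407 corresponds to an efficiency of about 96.6% and a slope of -3.312 to about 100.4%, in agreement with the lower bounds reported above.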
2.3. Rarefaction analysis
Rarefaction analyses were performed initially with the five BoDV-1 metagenomics datasets only (lib02012 to lib02558; Table 1). Reads were mapped along the BoDV-1 reference sequence (NC_001607.1) using the 454 software suite (v3.0; Roche, Mannheim, Germany) to identify reads of viral origin. Complete lists of read accessions of the individual libraries were extracted. Then, random subsets of read accessions of each library comprising 1.0E+02, 1.0E+03, 1.0E+04, 5.0E+04, 1.0E+05, 5.0E+05, 1.0E+06, 2.0E+06, and 3.0E+06 reads were retrieved from these lists using the Linux command ‘shuf’. In these subsets, read accessions representing viral reads were identified using the Linux command ‘fgrep -f’ and the list of accessions representing reads of known viral origin. For each subset size, analyses were repeated 100 times, and for each repetition the presence or absence of BoDV-1 as well as the number of BoDV-1 reads were recorded. If the detection rate (presence or absence of BoDV-1 reads) in a given subset size exceeded 95%, only five repetitions were performed because of the low variation in results.
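The subsampling procedure can be sketched in Python as a stand-in for the ‘shuf’ and ‘fgrep -f’ steps. This is a minimal illustration under simplifying assumptions: reads are represented by integer indices instead of accessions, the first viral_reads indices are treated as the reads of viral origin, and the function name, parameters, and example numbers are hypothetical.

```python
import random


def rarefy(total_reads, viral_reads, subset_size, repeats=100, seed=0):
    """Draw random subsets without replacement and return
    (detection rate, mean number of viral reads per subset)."""
    rng = random.Random(seed)
    detected = 0
    viral_total = 0
    for _ in range(repeats):
        # Equivalent of drawing subset_size accessions with 'shuf -n'
        subset = rng.sample(range(total_reads), subset_size)
        # Equivalent of matching against the viral accession list
        hits = sum(1 for read_id in subset if read_id < viral_reads)
        viral_total += hits
        if hits > 0:
            detected += 1
    return detected / repeats, viral_total / repeats


if __name__ == "__main__":
    # Hypothetical library: 100,000 reads, 0.8% of them viral
    for size in (10, 100, 1000):
        print(size, rarefy(100_000, 800, size))
```

With a virus read percentage of 0.8%, small subsets miss the virus in most repetitions, whereas subsets of 1,000 reads detect it in practically every repetition.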
2.4. Reference sequences
For calculations exploring factors that influence the limit of detection, we included the following sequences of RNA virus family representatives: West Nile virus (WNV), NC_001563.2, Flaviviridae; Borna disease virus 1 (BoDV-1), NC_001607, Bornaviridae; Rift Valley fever virus (RVF), NC_014397, NC_014396, NC_014395, Bunyaviridae; Sindbis virus (SINV), NC_001547, Togaviridae; Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), NC_045512.2, Coronaviridae; Human coxsackievirus A (CV-A2), NC_038306, Picornaviridae; Measles virus (MV), NC_001498, Paramyxoviridae; Rabies lyssavirus (RABV), NC_001542, Rhabdoviridae; Rubella virus (RuV), NC_001545.2, Matonaviridae; Influenza A virus (IAV), GCA_001343785, Orthomyxoviridae; Hepatitis delta virus (HDV), NC_001653.2, Deltavirus incertae sedis.
3. Results
3.1. Virus read proportion and dataset size determine virus detection
As a starting point for the analyses, we performed a rarefaction analysis. We repeatedly determined the detection (presence/absence) of virus reads in data subsets of different sizes (100 repeats per subset size). From the results of these repeated drawings, we calculated the positivity rate, i.e. the detection rate of the virus in a given dataset size. For this, we used a set of five datasets generated from BoDV-1-positive samples covering a range of virus read percentages from 4.2E-04% to 8.0E-01% (Table 1). The detection rate of BoDV-1 reads in subsets of these datasets differed (Fig. 1). In subsets of datasets with a low virus read percentage (lib02246 = 4.2E-04%, lib02558 = 6.9E-04%), BoDV-1 read detection was possible at a partial dataset size of 1.0E+06 reads with a 100% detection rate and at 5.0E+05 reads with a 97% detection rate. At higher virus read percentages in the range of 1.0E-01% to 8.0E-01%, BoDV-1 read detection was possible at low partial dataset sizes of 1.0E+03 reads for lib02012 and 1.0E+04 reads for lib02462 and lib02557 (detection rates of 100%). The number of BoDV-1 reads per partial dataset size increased linearly for all virus read percentages (R² ≥ 0.9851, p ≤ 0.001), despite detection rates of <100% (Extended Data Fig. 1).
3.2. Virus detection by mNGS is a Bernoulli process
Based on the results of the rarefaction analyses (=stochastic behavior of virus read detection influenced by dataset size and virus read proportion), we sought a mathematical formula to predict the minimum required dataset size to detect one virus read with a reasonable detection rate. The Bernoulli process describes a discrete stochastic process with only two possible results (presence/absence), coupled with a statement about the probability of occurrence. The equation for the standard Bernoulli process is shown in Equation 1. The notations for the mathematical derivation can be found in Table 2.
Table 2.
Variable | Meaning |
---|---|
n | Number of trials (size of dataset) |
p | Probability of occurrence (of a viral read; virus read proportion) |
k | Number of matches to obtain |
Equation 1: Bernoulli process
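With the notation of Table 2, the probability of observing exactly k viral reads among n reads is given by the binomial probability mass function. The form below is the standard textbook expression and is stated here as an assumption about the equation referred to as Equation 1.

```latex
P(X = k) = \binom{n}{k}\, p^{k}\, (1 - p)^{\,n-k}
```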
The LOD is defined as the lowest quantity that can be detected with reasonable certainty for a given analytical procedure [43]. The chance to detect at least one viral read should be close to 100%. To estimate the dataset size n, i.e. the number of reads sequenced for a library (mNGS), that is necessary to find one viral read (k = 1) with an event probability of α = 0.99 (0 < α < 1) at a given probability p, Equation 1 has to be transformed. This was done by introducing the probability of the complementary event (no viral read among n reads) into the equation. Subsequently, natural logarithms were taken and the equation was solved for n (Equation 2; Eq. 2). To use the virus read proportion of a sample directly, we set p equal to the virus read percentage divided by 100.
Equation 2: Transformed Bernoulli process
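A derivation consistent with the steps named in the text (complementary event, natural logarithms, solving for n) is sketched below. It is a reconstruction under the assumption that Equation 1 is the binomial probability mass function. V_% is a placeholder symbol for the virus read percentage and is not necessarily the notation of the original.

```latex
\begin{aligned}
P(X \ge 1) &= 1 - P(X = 0) = 1 - (1 - p)^{n} \ge \alpha \\
(1 - p)^{n} &\le 1 - \alpha \\
n \,\ln(1 - p) &\le \ln(1 - \alpha) \\
n &\ge \frac{\ln(1 - \alpha)}{\ln(1 - p)}, \qquad p = \frac{V_{\%}}{100}
\end{aligned}
```

The inequality changes direction in the last step because ln(1 - p) is negative. For α = 0.99 and a virus read percentage of 5.83% (lib02898, Table 1), this expression gives n ≈ 77, which agrees with the value listed for that library in Table 3.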
The transformed Bernoulli formula was validated with the virus read percentages obtained from the mNGS analysis of the 30 trimmed and quality-checked datasets from diseased animals and humans (Table 1, Extended Data Figs. 2, 3, and 4). These percentages were calculated from the number of virus-specific reads identified by assembler/mapping analysis and the total number of sequenced reads of a library. For each library, one hundred subsamples were drawn with replacement from the total read accession numbers and compared with the accession list of mapped virus reads. At a qualitative level, the mean accuracy of Eq. 2 in predicting the dataset size n for a virus read was 99.1%, within a range of 93.0 to 100.0% (Table 3). We checked the assumption of k = 1 virus read in Eq. 2 by counting the respective virus reads in the subsets. This resulted in k = 4.5 ± 0.4 reads in n (Table 3). As a cross-check of Eq. 2, we reconstructed the number of virus reads from the mNGS analyses. To do this, we divided the individual dataset sizes (Table 1) by the calculated n (Table 3). Since k ≠ 1, we included k = 4.5 as a multiplication factor in Equation 3.
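The calculation of n can be illustrated with a short Python sketch. It assumes that Eq. 2 has the form n = ln(1 - α) / ln(1 - p) with p equal to the virus read percentage divided by 100, which is a reconstruction from the description in the text. The function names are hypothetical.

```python
import math


def min_dataset_size(virus_percent: float, alpha: float = 0.99) -> float:
    """Dataset size n needed to find at least one viral read with
    probability alpha (assumed form of Eq. 2)."""
    p = virus_percent / 100.0
    return math.log(1.0 - alpha) / math.log1p(-p)


def detection_probability(virus_percent: float, n_reads: float) -> float:
    """Probability of at least one viral read among n_reads."""
    p = virus_percent / 100.0
    return 1.0 - (1.0 - p) ** n_reads


if __name__ == "__main__":
    # Rounded virus read percentages from Table 1
    for lib, pct in (("lib02898", 5.83), ("lib03418", 10.8), ("lib03422", 12.7)):
        n = min_dataset_size(pct)
        print(lib, round(n), round(detection_probability(pct, round(n)), 4))
```

With the rounded percentages of Table 1, the sketch gives about 77, 40, and 34 reads for lib02898, lib03418, and lib03422, in line with the values of n in Table 3. Small deviations for other libraries are to be expected because the tabulated percentages are rounded.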
Table 3.
Viral target | ID | n | Accuracy (%) | Viral reads (mean) | Viral reads (SD) |
---|---|---|---|---|---|
BoDV-1 | lib02012 | 559 | 99 | 4.7 | 2.2 |
BoDV-1 | lib02246 | 1,101,151 | 100 | 4.7 | 2.0 |
BoDV-1 | lib02462 | 4,603 | 93 | 2.8 | 1.8 |
BoDV-1 | lib02557 | 2,693 | 100 | 4.3 | 2.2 |
BoDV-1 | lib02558 | 670,830 | 100 | 4.4 | 1.9 |
PGV | lib03148 | 318,891 | 100 | 4.5 | 1.9 |
PGV | lib03150 | 807,922 | 100 | 4.4 | 1.7 |
RusV | lib03123 | 939,828 | 100 | 4.5 | 1.7 |
WNV | lib02898 | 77 | 98 | 4.7 | 2.1 |
WNV | lib02914 | 1,017 | 99 | 4.5 | 2.2 |
WNV | lib02959 | 1,317 | 100 | 4.5 | 2.4 |
WNV | lib03378 | 14,144 | 97 | 4.5 | 2.1 |
WNV | lib03379 | 381 | 100 | 4.5 | 1.7 |
WNV | lib03380 | 82 | 100 | 4.8 | 2.0 |
WNV | lib03381 | 699 | 99 | 4.8 | 2.4 |
WNV | lib03382 | 888 | 100 | 5.0 | 2.3 |
WNV | lib03415 | 1,166 | 99 | 4.7 | 1.9 |
WNV | lib03416 | 1,144 | 97 | 4.5 | 2.0 |
WNV | lib03417 | 1,349 | 98 | 4.7 | 2.1 |
WNV | lib03418 | 40 | 100 | 4.3 | 1.8 |
WNV | lib03419 | 103 | 100 | 4.7 | 2.2 |
WNV | lib03420 | 662 | 100 | 4.7 | 2.1 |
WNV | lib03422 | 34 | 100 | 4.3 | 1.8 |
WNV | lib03423 | 547 | 99 | 4.8 | 2.1 |
WNV | lib03424 | 955 | 100 | 4.2 | 1.9 |
WNV | lib03425 | 223 | 98 | 4.7 | 2.1 |
WNV | lib03426 | 1,481 | 99 | 5.0 | 2.0 |
WNV | lib03449 | 39 | 100 | 4.3 | 2.0 |
WNV | lib03450 | 337 | 99 | 4.6 | 2.2 |
WNV | lib03451 | 754,944 | 100 | 4.8 | 1.5 |
Mean | | | 99.1 | 4.5 | 2.0 |
SD | | | 1.5 | 0.4 | 0.2 |
Abbreviations: BoDV-1, Borna disease virus 1; PGV, pegivirus; RusV, rustrela virus; WNV, West Nile virus lineage 2; n, theoretically required dataset size for 1 virus read; SD, standard deviation.
Equation 3: Recovery of virus read numbers
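From the description in the text (dataset size divided by n, multiplied by k), Equation 3 can be written as shown below. The symbol on the left-hand side is a placeholder for the recovered number of virus reads and is an assumption of this sketch.

```latex
\hat{N}_{\mathrm{virus}} = \frac{r}{n} \cdot k
```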
where r = actually available dataset size, n = theoretically required dataset size for ≥1 virus read (Eq. 2), and k = multiplication factor. The recovery rate was 97.99% (median; Extended Data Fig. 5).
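The recovery calculation and the size of k can be explored with the following Python sketch. It assumes that Eq. 2 has the form n = ln(1 - α) / ln(1 - p) and that Eq. 3 is (r / n) · k. The function names are hypothetical. The remark on the expected read count n · p is an observation added here and is not a statement from the study: for small p, n · p is approximately -ln(1 - α), i.e. about 4.6 for α = 0.99, which may explain why the empirical k is close to 4.5.

```python
import math


def recovered_reads(r: float, n: float, k: float = 4.5) -> float:
    """Virus read number recovered from dataset size r and the
    theoretically required dataset size n (assumed form of Eq. 3)."""
    return r / n * k


def expected_reads_at_n(virus_percent: float, alpha: float = 0.99) -> float:
    """Expected number of viral reads (n * p) in a dataset of the size n
    given by the assumed form of Eq. 2."""
    p = virus_percent / 100.0
    n = math.log(1.0 - alpha) / math.log1p(-p)
    return n * p


if __name__ == "__main__":
    # lib02246: r from Table 1, n from Table 3 (32 virus reads reported)
    print(round(recovered_reads(7.65e6, 1_101_151), 1))
    # Expected read counts at n for a low and a high virus read percentage
    print(round(expected_reads_at_n(4.18e-4), 2), round(expected_reads_at_n(12.7), 2))
```

For lib02246 the sketch recovers about 31 reads, compared with the 32 virus reads listed in Table 1. The expected count at n is about 4.6 for low percentages and about 4.3 for percentages above 10%, which is in the range of the mean viral read counts in Table 3.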
3.3. Modelling factors that impact mNGS sensitivity
As mentioned above, empirical data show that the detection of a species depends on its abundance, the relative genome size, and the dataset size. We used R Studio [39] to investigate the influence of these factors on mNGS sensitivity. To be able to apply Eq. 2 for the prediction of the necessary sequencing effort, we approximated the virus percentage as the ratio of the amounts (in g) of viral RNA and total RNA in the sample. The amount of viral RNA was calculated from the virus genome copy number, and the amount of total RNA was determined photometrically (Equation 4; Eq. 4).
Equation 4: Prediction of the virus percentage
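Based on the variable definitions that accompany Equation 4 and on the mass ratio described above, one consistent way to write the equation is shown below. V_% (virus percentage) and C_RNA (total RNA concentration in ng/µl) are placeholder symbols introduced for this sketch.

```latex
V_{\%} = \frac{c \cdot nt \cdot 340 \cdot 1.6605402 \times 10^{-15}}{C_{\mathrm{RNA}}} \times 100
```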
where i = virus genome copies, nt = virus genome size, 340 Da = mean weight of one RNA nucleotide in Dalton, 1.6605402E – 15 = weight of one Da in nanogram, and c = i/µl.
Applying Eq. 2 in combination with Eq. 3 and Eq. 4, we modeled the virus percentage in dependence on different factors but with constant α = 0.99 (Fig. 2). First, we investigated the effect of the virus percentage on the expected number of BoDV-1 reads in a dataset of defined size (r = 5.0E+06 reads) in dependence on the virus copy number per µl and the total RNA concentration. To assess the sensitivity, a tenfold serial dilution of 1.0E+00 to 1.0E+06 c/µl of the BoDV-1 genome (8910 nt, NC_001607) was used, while the total RNA concentration was increased (1 to 100 ng/µl) (Fig. 2a). As Fig. 2a shows, the expected number of BoDV-1 reads differed within and between virus concentrations, with virus reads decreasing as the total RNA concentration increased. To illuminate qualitative diagnostic aspects, we calculated the necessary dataset size n for the same dependencies as in Fig. 2a with an upper cut-off for dataset size set at 1.5E+07 reads (Fig. 2b). This showed that with copy numbers higher than 1.0E+05 c/µl, BoDV-1 was detectable independently of the background, i.e. at every virus percentage examined. In contrast, with BoDV-1 copy numbers below 1.0E+04 c/µl, virus reads were only detectable at total RNA concentrations lower than approx. 50 ng/µl (Fig. 2b). With a BoDV-1 copy number below 1.0E+02 c/µl, no detection was possible with a dataset size of 5.0E+06 reads (Fig. 2b).
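The modeling steps can be sketched in Python as follows. The sketch assumes that Eq. 4 is the mass ratio of viral RNA to total RNA expressed in percent and that Eq. 2 has the form n = ln(1 - α) / ln(1 - p). The expected read number is computed as the binomial expectation r · p, a simplification that stands in for Eq. 3. All function names are hypothetical.

```python
import math

DALTON_IN_NG = 1.6605402e-15  # mass of one dalton in nanograms (value from the text)
RNA_NT_DA = 340.0             # mean mass of one RNA nucleotide in dalton (value from the text)


def virus_percent(copies_per_ul, genome_nt, total_rna_ng_per_ul):
    """Virus percentage as mass ratio of viral RNA to total RNA (assumed Eq. 4)."""
    viral_ng_per_ul = copies_per_ul * genome_nt * RNA_NT_DA * DALTON_IN_NG
    return viral_ng_per_ul / total_rna_ng_per_ul * 100.0


def expected_reads(dataset_size, percent):
    """Binomial expectation of viral reads in a dataset of the given size."""
    return dataset_size * percent / 100.0


def min_dataset_size(percent, alpha=0.99):
    """Dataset size needed for at least one viral read with probability alpha."""
    return math.log(1.0 - alpha) / math.log1p(-percent / 100.0)


if __name__ == "__main__":
    # BoDV-1 (8910 nt) at 1.0E+04 c/ul in 1 and 50 ng/ul total RNA
    for rna in (1.0, 50.0):
        pct = virus_percent(1.0e4, 8910, rna)
        print(rna, pct, expected_reads(5.0e6, pct), min_dataset_size(pct))
```

For 1.0E+04 c/µl of BoDV-1 in 50 ng/µl total RNA, the sketch yields about 5 expected reads in 5.0E+06 reads and a required dataset size of about 4.6E+06 reads. At 1 ng/µl total RNA, the required dataset size drops to about 9.2E+04 reads.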
In order to generalize the model, we investigated the influence of the genome size on the virus read numbers at a given dataset size (Fig. 2c) and the necessary dataset size (Fig. 2d). For these analyses, we repeated the calculations with representative genome sizes for small, medium and large RNA virus genomes (7.5 kb, 15 kb, and 30 kb) at a concentration of 1.0E+04 c/µl. As Fig. 2c shows, the number of virus reads that can be expected in a dataset of 5.0E+06 reads depends on the genome size. The detection of a read from a virus with a small genome (7.5 kb) required larger dataset sizes (n) than the detection of a read from viruses with larger genomes (15 and 30 kb; Fig. 2d).
To assess the meaningfulness of the result obtained with a certain assay, the limit of detection (LOD) of that assay needs to be defined. Although in practice the LOD of qPCR depends on the specific assay, theoretically the LOD of qPCR is at a genome copy number of 3 c/µl and is independent of the genome size. As shown above, the sensitivity of mNGS depends on both virus copy number and virus genome size. In order to investigate the limit of detection for mNGS analysis, we calculated the minimum virus genome copy number that allows for the detection of a virus in a dataset of 5.0E+06 reads generated from a sample with 30 ng/µl total RNA. Specifically, we examined the effect of the genome size (1.5 kb to 30 kb) on the detection limit of an mNGS analysis. For this, we calculated the LOD of an mNGS analysis as follows: for each i (1.0E+00 ≤ i ≤ 1.0E+06 c/µl; Eq. 4), the theoretically necessary minimal dataset size n was calculated according to Eq. 2. The LOD is then defined as the minimal c/µl for which 1 viral read can be expected in a dataset of 5.0E+06 reads. As shown in Fig. 2e, the LOD varies among the genome sizes. The LODs for the very large SARS-CoV-2 genome (1686 c/µl) and the very small HDV genome (29106 c/µl) differ by a factor of 17.3.
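The LOD calculation described above can be sketched in closed form instead of scanning copy numbers. The Python sketch below assumes the forms of Eq. 2 and Eq. 4 reconstructed from the text (n = ln(1 - α) / ln(1 - p), and the virus percentage as mass ratio of viral to total RNA). The function name is hypothetical, and the genome lengths used in the example (1,682 nt and 29,903 nt) are assumed lengths for the HDV and SARS-CoV-2 reference sequences.

```python
import math

DALTON_IN_NG = 1.6605402e-15  # mass of one dalton in nanograms (value from the text)
RNA_NT_DA = 340.0             # mean mass of one RNA nucleotide in dalton (value from the text)


def lod_copies_per_ul(genome_nt, total_rna_ng_per_ul, dataset_size, alpha=0.99):
    """Minimal copy number per ul for which at least one viral read is
    expected with probability alpha in a dataset of the given size."""
    # Smallest read proportion p for which n (Eq. 2) does not exceed dataset_size
    p_min = -math.expm1(math.log(1.0 - alpha) / dataset_size)
    ng_per_copy = genome_nt * RNA_NT_DA * DALTON_IN_NG
    return p_min * total_rna_ng_per_ul / ng_per_copy


if __name__ == "__main__":
    # 5.0E+06 reads, 30 ng/ul total RNA, assumed genome lengths
    for name, nt in (("HDV", 1682), ("SARS-CoV-2", 29903)):
        print(name, round(lod_copies_per_ul(nt, 30.0, 5.0e6)))
```

Under these assumptions, the LOD is inversely proportional to the genome size. The values obtained are of the same order as those reported above, although exact agreement is not expected because the genome lengths and the closed-form calculation are assumptions of this sketch.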
To evaluate the sensitivity independently of the pathogen (genome size and copy number) and the total nucleic acid concentration, we calculated n (the necessary dataset size to detect 1 viral read) for a range of virus percentages (5.0E-05% to 1.0E-03%). In this analysis, we observed an exponential decrease in the required dataset size n with increasing virus percentage (Fig. 2f). For all virus percentages ≥ 0.0001%, the pathogen was detectable with a dataset size of 5.0E+06 reads. For virus percentages < 0.0001%, a larger number of sequenced reads was necessary, indicating that in theory the sensitivity can be scaled by scaling the dataset size.
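For small read proportions, ln(1 - p) is approximately -p, so the required dataset size becomes inversely proportional to the virus percentage. The approximation below assumes that Eq. 2 has the form n = ln(1 - α) / ln(1 - p) with p equal to the virus percentage divided by 100. V_% is a placeholder symbol for that percentage.

```latex
n = \frac{\ln(1-\alpha)}{\ln(1-p)} \approx \frac{-\ln(1-\alpha)}{p}
  = \frac{-100\,\ln(1-\alpha)}{V_{\%}}
  \approx \frac{460.5}{V_{\%}} \quad (\alpha = 0.99)
```

At a virus percentage of 0.0001%, this gives about 4.6E+06 reads, just below the 5.0E+06 reads considered above. At 5.0E-05%, it gives about 9.2E+06 reads.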
3.4. RT-qPCR and mNGS are significantly correlated
As a proof of concept that the virus percentage in mNGS analysis is defined as the ratio of the mass of viral nucleic acids and total RNA, we compared mNGS and RT-qPCR. To this end, we calculated the virus percentage from the quantitative RT-qPCR results by Eq. 4. For these calculations, the genome sizes of the individual viruses (BoDV-1, 8910 nt; RusV, 9322 nt; WNV, 11,080 nt; PGV, 11,520 nt) were used. The RT-qPCR-derived virus percentage correlated highly significantly with the mNGS-derived virus percentage (r = 0.82, p < 0.0001). Unexpectedly, with a single exception (BoDV-1 in lib02246; Fig. 3a, Extended Data Table 1), the mNGS-derived percentages were higher (median 61.7 times, IQR 35.8 to 107.8) than the RT-qPCR-derived percentages (Fig. 3b). Therefore, to trace the source of this deviation, we determined the virus percentage in the sequencing-ready libraries (library-derived percentage). To this end, we analyzed 14 libraries by qPCR and Agilent Bioanalyzer. For the calculation of the library-derived percentage, we changed the conversion factor in Eq. 4 from 340 Da for RNA to 660 Da for dsDNA and put the amount of qPCR target molecules in relation to the DNA library concentration. For a subset of five libraries, we observed an increase of the virus percentage in the library (median = 1.2E-03%) in comparison with RT-qPCR (median = 2.2E-05%; Extended Data Fig. 6). This coincided with the mNGS-derived percentages of these libraries. However, the same libraries had an increased library-derived percentage in comparison to the mNGS-derived percentage (median = 5.7E-04%). Unfortunately, nine WNV libraries had to be excluded from this analysis of the library-derived percentage due to methodological constraints. Here, the RT-qPCR assay is located at the 5′-terminus of the genome, which is not converted efficiently during library preparation, as displayed by qPCR data and genome coverage analyses (data not shown). Hence, no reliable determination of the library-derived percentage was possible.
3.5. Detection limits of mNGS appear primarily determined by total RNA concentration
As outlined above, published studies often attempt to define the sensitivity of mNGS by comparison with routine diagnostic methods. Therefore, we conducted a systematic comparison of the LODs calculated from mNGS data with the virus genome copy numbers determined by RT-qPCR from the identical sample. To this end, we calculated the LODmNGS by applying Eq. 2 and its modification (Eq. 4, calculation of LOD) to the datasets used in this study (Table 1). The LODmNGS calculated for the individual libraries differed, apparently in relation to the total RNA concentration rather than to the number of sequenced reads or the virus species (Fig. 4a; Extended Data Table 1). LODmNGS values were considered plausible if they were lower than or equal to the virus copy numbers per µl as determined by RT-qPCR. This was true in 25/30 cases (Fig. 4b, Extended Data Table 1). Although the detection of virus concentrations below the calculated LOD is by definition very unlikely, this was observed for five libraries containing different viruses (lib02558, BoDV-1; lib03148 and lib03150, PGV; lib03123, RusV; lib03451, WNV; Fig. 4b and Extended Data Table 1). For these five samples, we recalculated the LODmNGS for event probabilities from α = 0.01 to 0.99 (in steps of 0.01) and compared it to the c/µl determined by RT-qPCR (Fig. 4c). In all these cases, the recalculated LODmNGS was plausible according to the definition above, albeit with a reduced α in the range of 0.09 to 0.75 (Fig. 4c).
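The stepwise recalculation of the LOD for different event probabilities can be sketched as follows. The Python sketch assumes the reconstructed forms of Eq. 2 and Eq. 4 and searches the grid α = 0.01 to 0.99 for the largest α at which the LOD does not exceed an observed copy number. The function names and the example input values are hypothetical and do not correspond to a library of this study.

```python
import math

DALTON_IN_NG = 1.6605402e-15  # mass of one dalton in nanograms (value from the text)
RNA_NT_DA = 340.0             # mean mass of one RNA nucleotide in dalton (value from the text)


def lod_copies_per_ul(alpha, genome_nt, total_rna_ng_per_ul, dataset_size):
    """LOD in copies per ul for a given event probability alpha."""
    p_min = -math.expm1(math.log1p(-alpha) / dataset_size)
    ng_per_copy = genome_nt * RNA_NT_DA * DALTON_IN_NG
    return p_min * total_rna_ng_per_ul / ng_per_copy


def max_plausible_alpha(observed_copies_per_ul, genome_nt,
                        total_rna_ng_per_ul, dataset_size):
    """Largest alpha on the grid 0.01 to 0.99 for which the LOD is lower
    than or equal to the observed copy number; None if there is none."""
    best = None
    for step in range(1, 100):
        alpha = step / 100
        lod = lod_copies_per_ul(alpha, genome_nt, total_rna_ng_per_ul, dataset_size)
        if lod <= observed_copies_per_ul:
            best = alpha
    return best


if __name__ == "__main__":
    # Hypothetical sample: 8910 nt genome, 50 ng/ul total RNA, 5.0E+06 reads
    for copies in (1.0e0, 1.0e3, 1.0e4):
        print(copies, max_plausible_alpha(copies, 8910, 50.0, 5.0e6))
```

In this hypothetical example, 1.0E+04 c/µl is plausible at α = 0.99, whereas 1.0E+03 c/µl is only plausible up to α = 0.39, which mirrors the reduced event probabilities described above.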
3.6. mNGS and LODmNGS are significantly correlated with RT-qPCR values
Finally, we examined the correlation between the various sample and dataset characteristics examined above. To this end, values were log-transformed prior to the calculation of Spearman correlations and p-values with the rcorr() function of the Hmisc package [44] in R Studio. The correlation matrix was created with the corrplot package [45]. In this analysis, we included Cq values and virus copy numbers calculated from RT-qPCR values (Cq, c/µl), dataset size, number of virus reads, mNGS-derived virus percentage, n, and LODmNGS. An inverse correlation of semi-quantitative (Cq) and absolute quantitative (c/µl) RT-qPCR values was observed (Fig. 5). As Fig. 5 shows, this analysis revealed highly significant (p < 0.01) correlations of RT-qPCR values with mNGS values (viral reads, mNGS-derived virus percentage) and with formula-derived values (n, LODmNGS). The correlation between mNGS and formula-derived values is expected, since the formula-derived values depend on the mNGS data. None of the categories correlated significantly with the dataset size. This correlation analysis shows that the calculation of LODmNGS and the necessary dataset size n is possible and yields meaningful results. These allow the assessment of the mNGS-based detection limit depending on the virus percentage.
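The rank-based correlation can be illustrated with a small, self-contained Python sketch. It is a stand-in for the rcorr() call and not the analysis of the study: it implements the Spearman coefficient as the Pearson correlation of average ranks, omits p-values, and uses only the Cq values and virus read percentages of the five BoDV-1 libraries from Table 1 as example input. The function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # BoDV-1 libraries from Table 1: Cq values and virus read percentages
    cq = [15.7, 23.0, 17.8, 19.3, 20.5]
    percent = [8.00e-01, 4.18e-04, 1.00e-01, 1.71e-01, 6.86e-04]
    print(spearman(cq, percent))
```

Because the Spearman coefficient depends only on ranks, a log transformation of the input values does not change its value. For the five BoDV-1 libraries, the sketch yields a coefficient of -0.9, i.e. higher Cq values go along with lower virus read percentages.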
4. Discussion
We developed a straightforward probability-based mathematical approach to assign an individual detection limit per sample for mNGS analysis. We followed a sample matrix-independent approach to preserve the advantageous non-specificity of mNGS in pathogen detection and at the same time make a statistical statement about the probability of virus detection at a certain data depth. The assessment of an mNGS result must always take into account the specific detection limit of the analysis for a certain pathogen and the analysis of specific parameters (total nucleic acid input, expected pathogen genome size, dataset size; compare Fig. 2e). Our model incorporates the hitherto known factors influencing LODmNGS, whereby valuable information can be derived for the assessment of mNGS experiments and related expectations. The expression of LODmNGS in copies per microliter enables comparison with RT-qPCR-derived concentrations. To the best of our knowledge, we showed for the first time a direct relationship between the ratio of viral and total RNA and its ratio after mNGS analysis.
Rarefaction analysis of the BoDV-1 datasets showed that the relationship between virus detection rate and dataset depth depends on the virus read percentage. Therefore, we concluded that the presence or absence of a virus read at a certain minimal dataset size follows a Bernoulli distribution, a discrete probability distribution with binary outcomes. We transformed the formula of the Bernoulli process into Eq. 2 to calculate the dataset size required for virus detection with a given probability. We set α = 0.99 (99%) to detect the virus read with a probability close to 100%. The introduction of a probability for the detection limit for mNGS is thus in line with the general definition of LOD [43]. The verification of Eq. 2 with datasets from diseased animals and humans showed a high accuracy and repeatability, confirming our probability-based approach. However, the accuracy of n (minimal dataset size for the detection of one virus read) depended on the accurate determination of the virus-background ratio, i.e. the virus read percentage. We argue that an incorrect assignment of virus reads in the determination of the virus-background ratio for BoDV-1 in lib02462 resulted in a slightly reduced accuracy (93%) and 2.8 ± 1.8 virus reads. In Eq. 2, we set k = 1 virus read to calculate n. Nevertheless, in datasets of size n, a mean of 4.5 ± 0.4 reads was counted. Finally, we confirmed the applicability and correctness of Eq. 2 and k = 4.5 by recovering the actual virus read numbers with an accuracy of 97.99%.
We demonstrated that the critical factor of mNGS sensitivity is the virus-background ratio. We observed a logarithmic relationship between this ratio and n, indicating that a pathogen abundance level of > 0.001% is already reliably detectable within a dataset of 5.0E+06 reads (Fig. 2f). Because of this logarithmic relationship, lower ratios require disproportionately large datasets. Interestingly, a log relationship of Cq values and mapped reads of viral pathogens in nasopharyngeal swabs has already been observed [32]. This observation also fits with published findings [30] that the selection of a suitable sample is critical for the success of mNGS analyses.
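The scaling between the virus-background ratio and n can be made explicit with a short derivation. It rests on the assumption that Eq. 2 has the standard "at least one success" form for independent Bernoulli trials; the symbol p for the fraction of virus reads is introduced here for illustration only.

```latex
\begin{align}
1 - (1 - p)^{n} &\geq \alpha \\
n &\geq \frac{\ln(1 - \alpha)}{\ln(1 - p)}
   \approx \frac{-\ln(1 - \alpha)}{p} \qquad (p \ll 1) \\
\log_{10} n &\approx \log_{10}\!\bigl(-\ln(1 - \alpha)\bigr) - \log_{10} p
\end{align}
```

For α = 0.99, −ln(1 − α) ≈ 4.6, so under this approximation every tenfold decrease in p requires a tenfold larger dataset. For p = 1.0E-05 (0.001%) the approximation gives n ≈ 4.6E+05 reads, which is well within a dataset of 5.0E+06 reads.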
However, the virus-background ratio is a relative value. The nucleic acid amount of larger viruses is naturally higher than that of smaller ones at the same concentration, i.e. genome copy number (c/µl). The effect of the genome size on the probability of occurrence of a single species read has already been reported [27]. Moreover, the genome size has already been taken into account in the normalization of read counts (RPKM [20], VTMK [46]) and in experimental planning for assembly approaches [28]. We also observed an effect of the genome size on LODmNGS. The LODmNGS decreased with increasing genome size (Fig. 2e). The LODmNGS for SARS-CoV-2 was 17.3 times lower than for HDV. When comparing large DNA viruses, bacteria or parasites, the impact of genome size on the differences in the LOD will be more pronounced. Additionally, the basic assumption of our calculations and of RT-qPCR quantification relies on linking a target read or amplicon of small size to a genomic equivalent, neglecting differences in genome coverage as potentially caused by transcriptional gradients or the expression of subgenomic RNA found in several species [47]. Furthermore, we show that the virus concentration alone is not a reliable indicator of mNGS sensitivity (compare Fig. 2a, 2b). With a decreasing virus-background ratio, i.e. increasing background, the same virus genome copy number can lead to different numbers of virus reads and different dataset sizes required for detection. In absolute read numbers, a dataset of 5.0E+06 reads generated from a sample with 50 ng/µl total RNA and a virus concentration of 1.0E+04 c/µl would contain 5 BoDV-1 reads, while at 1 ng/µl total RNA with the same virus concentration, the same dataset would comprise approx. 400 viral reads (Fig. 2a, 2b). Consequently, with a virus concentration of 1.0E+04 c/µl and 1 ng/µl total RNA, only 1.0E+05 reads (minimal dataset size n) are needed for detection of BoDV-1, whereas 5.0E+06 reads are needed at 50 ng/µl total RNA.
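The interplay of genome size, virus concentration and total RNA background can be illustrated with a rough mass-fraction calculation. This is a sketch under stated assumptions, not the model used for Fig. 2: the function name, the average nucleotide mass of 340 g/mol and the approximate BoDV-1 genome length of 8,900 nt are assumptions made here, and library preparation effects are ignored.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def virus_mass_fraction(copies_per_ul: float,
                        genome_length_nt: int,
                        total_rna_ng_per_ul: float,
                        nt_mass_g_per_mol: float = 340.0) -> float:
    """Mass of viral RNA divided by mass of total RNA in the sample.

    Assumption: the fraction of virus reads equals this mass fraction.
    """
    # grams of viral RNA per microliter, converted to nanograms
    virus_ng_per_ul = (copies_per_ul * genome_length_nt
                       * nt_mass_g_per_mol / AVOGADRO * 1e9)
    return virus_ng_per_ul / total_rna_ng_per_ul


BODV1_GENOME_NT = 8900   # assumed approximate genome length
DATASET_SIZE = 5_000_000  # reads

# Same virus concentration (1.0E+04 c/µl), different total RNA background.
reads_high_bg = DATASET_SIZE * virus_mass_fraction(1e4, BODV1_GENOME_NT, 50.0)
reads_low_bg = DATASET_SIZE * virus_mass_fraction(1e4, BODV1_GENOME_NT, 1.0)
```

With these assumed values, the sketch yields about 5 expected virus reads at 50 ng/µl total RNA and fifty times more at 1 ng/µl. The absolute numbers depend on the assumed nucleotide mass and genome length and need not match the values shown in Fig. 2.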
The effect of high and low background is well known [17], [18], [19], [33], [48]. Consequently, highly abundant pathogens are more obvious than low-abundance pathogens, for which the differentiation from a contaminant becomes more important [49], [50]. At a low pathogen read and abundance level, assembly approaches may fail or threshold criteria used to differentiate clinically relevant pathogens from contaminants may not be met [18], [19], but even a single pathogen read should be reviewed carefully and should not be rejected per se [19], [30]. Nevertheless, it is not advisable to derive a diagnosis or even a clinical treatment strategy from single or few reads. For single or low-abundance pathogen reads in particular, a data analyst has to exclude a false assignment, e.g. due to low-complexity regions. However, knowledge of LODmNGS can help to assess and rank the obtained results and provide valuable information for deciding whether or not the findings are worth following up.
Deducing the virus-background ratio from absolute quantitative RT-qPCR is in principle possible (r = 0.82, p < 0.0001, Fig. 3). We also confirmed the correlation of RT-qPCR values and mNGS results (Fig. 5). The observed factor of 61.7 between the mNGS-derived and the RT-qPCR-derived ratio is presumably a combined effect of several experimental factors: (i) the use of an external DNA standard (according to the original publication [51]) instead of an RNA standard may render the absolute quantification of our RT-qPCR assays somewhat inexact by disregarding the efficiency of the reverse transcriptase [52]; (ii) it is presupposed that a suitable method for measuring the total RNA concentration is applied in order not to flaw the RT-qPCR or LODmNGS calculations; this is especially true for samples with low biomass (<10 ng); however, we did not observe substantial differences between the low-biomass library lib02557 (4.1 ng/µl) and all other analyzed libraries (≥17 ng/µl; Table 1, Extended Data Table 1, Fig. 4); (iii) presumably most importantly, library preparation affects the ratio finally observed by mNGS; it alters the composition of the total nucleic acids by enzymatic modifications including reverse transcription, fragment end polishing, and adapter ligation. Finally, size selection also affects the composition by removing small and large fragments from the sample during library preparation [10]. To assess the impact of library preparation and distinguish its effect from potential sequencing bias, we determined the ratio at the library level for a set of analyzed libraries. Although due to technical constraints only a subset of the data could be taken into account, it appears that the main difference between RT-qPCR and mNGS is introduced during library preparation. This does not rule out differences in viral read proportions between datasets that derive from different sequencing platforms and their respective library preparation workflows, which also affect the mNGS-derived ratio [53], [54]. Rather, it can be expected that each workflow from sample to sequence dataset will have its specific factor between RT-qPCR and mNGS. Therefore, further studies are needed to identify such factors to adjust our model and increase its level of precision.
In Fig. 4a, we modelled the LOD for 30 datasets that originated from various sample matrices of diseased animals and humans. We calculated the individual LODmNGS for every sample based on the target virus, total RNA concentration, and dataset size. While LODmNGS increased with increasing total RNA concentration, the impact of the dataset size was negligible. This missing influence of the dataset size may be caused by the selected datasets, although these were randomly selected from the available datasets. The accuracy of the calculated LOD remains to be assessed by systematic comparison of mNGS-negative but RT-qPCR-positive samples. Nevertheless, in 25/30 cases the RT-qPCR-derived quantitative values were above the mNGS LOD, supporting the dependencies between sample and LODmNGS elaborated in this paper. In the remaining cases, LODmNGS was higher than the concentration derived from RT-qPCR. All of these had a virus-background ratio of 1.44E-03 – 6.86E-04 and ≤27 virus reads. We argue that the data depth used for these samples was too low to fulfill the 99% probability requirement for the occurrence of at least one viral read in a data subset. Systematic analyses are needed to evaluate the effect of data depth and probability of detection as well as to validate the predicted and actual LOD.
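How a sample-specific LOD in copies per microliter might be derived from the three inputs named above can be sketched as follows. This is an illustration under stated assumptions and not the authors' model: it assumes independent Bernoulli sampling of reads, equates the fraction of virus reads with the viral share of total RNA mass, uses an assumed average nucleotide mass of 340 g/mol, and ignores any workflow-specific factor between RT-qPCR and mNGS. All function and parameter names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def lod_copies_per_ul(dataset_size: int,
                      total_rna_ng_per_ul: float,
                      genome_length_nt: int,
                      alpha: float = 0.99,
                      nt_mass_g_per_mol: float = 340.0) -> float:
    """Lowest virus concentration (copies/µl) expected to yield at least
    one virus read with probability alpha in `dataset_size` reads."""
    # Smallest read fraction p with 1 - (1 - p)^N >= alpha,
    # i.e. p = 1 - (1 - alpha)^(1/N), written in a numerically stable way.
    min_fraction = -math.expm1(math.log1p(-alpha) / dataset_size)
    # Mass of a single genome copy in nanograms.
    genome_mass_ng = genome_length_nt * nt_mass_g_per_mol / AVOGADRO * 1e9
    # Viral mass per microliter needed to reach min_fraction, in copies.
    return min_fraction * total_rna_ng_per_ul / genome_mass_ng


# Example: 5.0E+06 reads, 50 ng/µl total RNA, genome of about 8,900 nt.
lod_example = lod_copies_per_ul(5_000_000, 50.0, 8900)
```

In this sketch the LOD rises linearly with the total RNA concentration and falls with increasing genome length and dataset size; for the example inputs it is on the order of 1.0E+04 c/µl.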
In previous studies, the detection cut-offs of mNGS have been linked to Cq ~32 and ~36 in nasopharyngeal swabs, aspirates, or sputum for different virus panels [20], [32] or have been evaluated by serial dilution of a set of pathogens, including human immunodeficiency virus and cytomegalovirus with 313 and 14 copies/ml in CSF samples [18]. Although these results highlight the limitations and the power of mNGS, they are hardly transferable to other matrices and viruses. Additionally, differences in sequencing depth complicate a generalization of the detection limit. A single, universal value for LODmNGS therefore does not seem suitable; the detection limit appears to be matrix- and pathogen-specific [17], [18], [20], [32]. However, our approach supports the standardization of the mNGS detection limit across matrices and pathogens.
5. Conclusion
The assessment of the detection limit is of major interest for the application of shotgun mNGS in clinical laboratories. Therefore, we developed and validated a straightforward analytical tool to assess the sample-specific LODmNGS, considering nucleic acid concentration, genome length, and data depth. For this calculation, we define the total nucleic acid concentration as the background for modeling the LODmNGS. The results of these calculations are congruent with RT-qPCR results. This mathematical and sample matrix-independent approach may lead to a more transferable and standardizable LOD for future mNGS experiments.
CRediT authorship contribution statement
Arnt Ebinger: Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Data curation, Writing - original draft, Writing - review & editing. Susanne Fischer: Conceptualization, Methodology, Validation, Writing - review & editing. Dirk Höper: Conceptualization, Funding acquisition, Methodology, Supervision, Writing - original draft, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We thank Pauline Santos and Sten Calvelage (Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany) for their support in the selection of samples and datasets. We are also grateful to Claudia Wylezich and Martin Beer (Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany) for helpful comments and discussions. This work was supported by the Federal Ministry of Education and Research within the research consortium “ZooBoCo” (Grant No. 01KI1722A).
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.csbj.2020.12.040.
References
- 1. Forbes J.D., Knox N.C., Ronholm J., Pagotto F., Reimer A. Metagenomics: the next culture-independent game changer. Front Microbiol. 2017;8:1069. doi: 10.3389/fmicb.2017.01069.
- 2. Thomas M.K., Murray R., Flockhart L., Pintar K., Fazil A. Estimates of foodborne illness–related hospitalizations and deaths in Canada for 30 specified pathogens and unspecified agents. Foodborne Pathogens and Disease. 2015;12(10):820–827. doi: 10.1089/fpd.2015.1966.
- 3. Fenollar F., Raoult D. Molecular diagnosis of bloodstream infections caused by non-cultivable bacteria. Int J Antimicrob Agents. 2007;30:7–15. doi: 10.1016/j.ijantimicag.2007.06.024.
- 4. Glaser C.A., Honarmand S., Anderson L.J., Schnurr D.P., Forghani B. Beyond viruses: clinical profiles and etiologies associated with encephalitis. Clin Infect Dis. 2006;43(12):1565–1577. doi: 10.1086/509330.
- 5. ISO (International Organization for Standardization) (2011) Microbiology of food and animal feeding stuffs. Real-time polymerase chain reaction (PCR) for the detection of food-borne pathogens. General requirements and definitions. ISO 22119:2011.
- 6. Zhou P., Yang X.-L., Wang X.-G., Hu B., Zhang L. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579(7798):270–273. doi: 10.1038/s41586-020-2012-7.
- 7. Hoffmann B., Tappe D., Höper D., Herden C., Boldt A. A variegated squirrel bornavirus associated with fatal human encephalitis. N Engl J Med. 2015;373(2):154–162. doi: 10.1056/NEJMoa1415627.
- 8. Schlottau K., Forth L., Angstwurm K., Höper D., Zecher D. Fatal encephalitic borna disease virus 1 in solid-organ transplant recipients. N Engl J Med. 2018;379(14):1377–1379. doi: 10.1056/NEJMc1803115.
- 9. Han D., Li Z., Li R., Tan P., Zhang R. mNGS in clinical microbiology laboratories: on the road to maturity. Crit Rev Microbiol. 2019;45(5-6):668–685. doi: 10.1080/1040841X.2019.1681933.
- 10. Wylezich C., Papa A., Beer M., Höper D. A versatile sample processing workflow for metagenomic pathogen detection. Sci Rep. 2018;8(1) doi: 10.1038/s41598-018-31496-1.
- 11. Andrusch A, Dabrowski PW, Klenner J, Tausch SH, Kohl C, et al. (2018) PAIPline: pathogen identification in metagenomic and clinical next generation sequencing samples. Bioinformatics 34: i715-i721.
- 12. Naccache S.N., Federman S., Veeraraghavan N., Zaharia M., Lee D. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014;24(7):1180–1192. doi: 10.1101/gr.171934.113.
- 13. Scheuch M., Höper D., Beer M. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. BMC Bioinf. 2015;16(1) doi: 10.1186/s12859-015-0503-6.
- 14. Chiu C.Y., Miller S.A. Clinical metagenomics. Nat Rev Genet. 2019;20(6):341–355. doi: 10.1038/s41576-019-0113-7.
- 15. Brinkmann A, Andrusch A, Belka A, Wylezich C, Höper D, et al. (2019) Proficiency Testing of Virus Diagnostics Based on Bioinformatics Analysis of Simulated In Silico High-Throughput Sequencing Data Sets. J Clin Microbiol 57.
- 16. Junier T, Huber M, Schmutz S, Kufner V, Zagordi O, et al. (2019) Viral Metagenomics in the Clinical Realm: Lessons Learned from a Swiss-Wide Ring Trial. Genes (Basel) 10.
- 17. Blauwkamp T.A., Thair S., Rosen M.J., Blair L., Lindner M.S. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. Nat Microbiol. 2019;4(4):663–674. doi: 10.1038/s41564-018-0349-6.
- 18. Miller S., Naccache S.N., Samayoa E., Messacar K., Arevalo S. Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid. Genome Res. 2019;29(5):831–842. doi: 10.1101/gr.238170.118.
- 19. Wilson M.R., Sample H.A., Zorn K.C., Arevalo S., Yu G. Clinical metagenomic sequencing for diagnosis of meningitis and encephalitis. N Engl J Med. 2019;380(24):2327–2340. doi: 10.1056/NEJMoa1803396.
- 20. Bal A., Pichon M., Picard C., Casalegno J.S., Valette M. Quality control implementation for universal characterization of DNA and RNA viruses in clinical respiratory samples using single metagenomic next-generation sequencing workflow. BMC Infect Dis. 2018;18(1) doi: 10.1186/s12879-018-3446-5.
- 21. Thoendel M, Jeraldo P, Greenwood-Quaintance KE, Chia N, Abdel MP, et al. (2017) A Novel Prosthetic Joint Infection Pathogen, Mycoplasma salivarium, Identified by Metagenomic Shotgun Sequencing. Clin Infect Dis 65: 332-335.
- 22. Graf E.H., Simmon K.E., Tardif K.D., Hymas W., Flygare S. Unbiased detection of respiratory viruses by use of RNA sequencing-based metagenomics: a systematic comparison to a commercial PCR panel. J Clin Microbiol. 2016;54(4):1000–1007. doi: 10.1128/JCM.03060-15.
- 23. Forth J.H., Tignon M., Cay A.B., Forth L.F., Höper D. Comparative analysis of whole-genome sequence of African swine fever virus Belgium 2018/1. Emerg Infect Dis. 2019;25(6):1249–1252. doi: 10.3201/eid2506.190286.
- 24. Lee J.S., Mackie R.S., Harrison T., Shariat B., Kind T. Targeted enrichment for pathogen detection and characterization in three felid species. J Clin Microbiol. 2017;55(6):1658–1670. doi: 10.1128/JCM.01463-16.
- 25. Hasan M.R., Rawat A., Tang P., Jithesh P.V., Thomas E. Depletion of human DNA in spiked clinical specimens for improvement of sensitivity of pathogen detection by next-generation sequencing. J Clin Microbiol. 2016;54(4):919–927. doi: 10.1128/JCM.03050-15.
- 26. Wilson M.R., O’Donovan B.D., Gelfand J.M., Sample H.A., Chow F.C. Chronic meningitis investigated via metagenomic next-generation sequencing. JAMA Neurol. 2018;75(8):947. doi: 10.1001/jamaneurol.2018.0463.
- 27. Venter J.C., Remington K., Heidelberg J.F., Halpern A.L., Rusch D. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74. doi: 10.1126/science.1093857.
- 28. Wendl M.C., Kota K., Weinstock G.M., Mitreva M. Coverage theories for metagenomic DNA sequencing based on a generalization of Stevens’ theorem. J Math Biol. 2013;67(5):1141–1161. doi: 10.1007/s00285-012-0586-x.
- 29. Kowalchuk G.A., Speksnijder A.G.C.L., Zhang K., Goodman R.M., van Veen J.A. Finding the needles in the metagenome haystack. Microb Ecol. 2007;53(3):475–485. doi: 10.1007/s00248-006-9201-2.
- 30. Forth L.F., Scholes S.F.E., Pesavento P.A., Jackson K., Mackintosh A. Novel picornavirus in lambs with severe encephalomyelitis. Emerg Infect Dis. 2019;25(5) doi: 10.3201/eid2505.181573.
- 31. Niller H.H., Angstwurm K., Rubbenstroth D., Schlottau K., Ebinger A. Zoonotic spillover infections with Borna disease virus 1 leading to fatal human encephalitis, 1999–2019: an epidemiological investigation. Lancet Infect Dis. 2020;20(4):467–477. doi: 10.1016/S1473-3099(19)30546-8.
- 32. Thorburn F., Bennett S., Modha S., Murdoch D., Gunson R. The use of next generation sequencing in the diagnosis and typing of respiratory infections. J Clin Virol. 2015;69:96–100. doi: 10.1016/j.jcv.2015.06.082.
- 33. Schlaberg R., Chiu C.Y., Miller S., Procop G.W., Weinstock G. Validation of metagenomic next-generation sequencing tests for universal pathogen detection. Arch Pathol Lab Med. 2017;141(6):776–786. doi: 10.5858/arpa.2016-0539-RA.
- 34. Bennett A.J., Paskey A.C., Ebinger A., Pfaff F., Priemer G. Relatives of rubella virus in diverse mammals. Nature. 2020;586(7829):424–428. doi: 10.1038/s41586-020-2812-9.
- 35. Ziegler U, Santos PD, Groschup MH, Hattendorf C, Eiden M, et al. (2020) West Nile Virus Epidemic in Germany Triggered by Epizootic Emergence, 2019. Viruses 12.
- 36. Babraham Bioinformatics FastQC. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
- 37. Foley J (2020) bioanalyzeR: Analysis of Agilent electrophoresis data. R package version 0.5.0.
- 38. Buffalo V (2020) qrqc: Quick Read Quality Control. R package version 1.44.0. http://github.com/vsbuffalo/qrqc
- 39. RStudio Team (2020) RStudio: Integrated Development for R. Version 1.2.5042. RStudio, Inc., Boston, MA. http://www.rstudio.com/.
- 40. R Core Team (2020) R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. https://www.R-project.org/.
- 41. Eiden M., Vina-Rodriguez A., Hoffmann B., Ziegler U., Groschup M.H. Two new real-time quantitative reverse transcription polymerase chain reaction assays with unique target sites for the specific and sensitive detection of lineages 1 and 2 West Nile virus strains. J Vet Diagn Invest. 2010;22(5):748–753. doi: 10.1177/104063871002200515.
- 42. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, et al. (2009) The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55: 611-622.
- 43. Compiled by McNaught AD, Wilkinson A (1997) IUPAC. Compendium of Chemical Terminology, 2nd ed. (the “Gold Book”). Online version (2019-) created by Chalk SJ. ISBN 0-9678550-9-8. https://doi.org/10.1351/goldbook.
- 44. Harrell FE Jr. (2020) Hmisc: Harrell Miscellaneous. R package version 4.4-0. https://CRAN.R-project.org/package=Hmisc.
- 45. Wei T, Simko V (2017) R package “corrplot”: Visualization of a Correlation Matrix (Version 0.84). https://github.com/taiyun/corrplot.
- 46. Yang J., Yang F., Ren L., Xiong Z., Wu Z. Unbiased parallel detection of viral pathogens in clinical samples by use of a metagenomic approach. J Clin Microbiol. 2011;49(10):3463–3469. doi: 10.1128/JCM.00273-11.
- 47. Lefkowitz E.J., Dempsey D.M., Hendrickson R.C., Orton R.J., Siddell S.G. Virus taxonomy: the database of the International Committee on Taxonomy of Viruses (ICTV) Nucl Acids Res. 2017;46:D708–D717. doi: 10.1093/nar/gkx932.
- 48. Van Borm S., Fu Q., Winand R., Vanneste K., Hakhverdyan M. Evaluation of a commercial exogenous internal process control for diagnostic RNA virus metagenomics from different animal clinical samples. J Virol Methods. 2020;283:113916. doi: 10.1016/j.jviromet.2020.113916.
- 49. Zinter M.S., Mayday M.Y., Ryckman K.K., Jelliffe-Pawlowski L.L., DeRisi J.L. Towards precision quantification of contamination in metagenomic sequencing experiments. Microbiome. 2019;7(1) doi: 10.1186/s40168-019-0678-6.
- 50. Davis N.M., Proctor D.M., Holmes S.P., Relman D.A., Callahan B.J. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome. 2018;6(1) doi: 10.1186/s40168-018-0605-2.
- 51. Schindler A.R., Vögtlin A., Hilbe M., Puorger M., Zlinszky K. Reverse transcription real-time PCR assays for detection and quantification of Borna disease virus in diseased hosts. Mol Cell Probes. 2007;21(1):47–55. doi: 10.1016/j.mcp.2006.08.001.
- 52. Schwaber J., Andersen S., Nielsen L. Shedding light: the importance of reverse transcription efficiency standards in data interpretation. Biomol Detect Quantif. 2019;17:100077. doi: 10.1016/j.bdq.2018.12.002.
- 53. Marine R.L., Magaña L.C., Castro C.J., Zhao K., Montmayeur A.M. Comparison of Illumina MiSeq and the Ion Torrent PGM and S5 platforms for whole-genome sequencing of picornaviruses and caliciviruses. J Virol Methods. 2020;280:113865. doi: 10.1016/j.jviromet.2020.113865.
- 54. Forth L.F., Höper D. Highly efficient library preparation for Ion Torrent sequencing using Y-adapters. Biotechniques. 2019;67(5):229–237. doi: 10.2144/btn-2019-0035.