Abstract
Background
Prosthetic joint infections are clinically difficult to diagnose and treat. Previously, we demonstrated metagenomic sequencing on an Illumina MiSeq replicates the findings of current gold standard microbiological diagnostic techniques. Nanopore sequencing offers advantages in speed of detection over MiSeq. Here, we report a real-time analytical pathway for Nanopore sequence data, designed for detecting bacterial composition of prosthetic joint infections but potentially useful for any microbial sequencing, and compare detection by direct-from-clinical-sample metagenomic nanopore sequencing with Illumina sequencing and standard microbiological diagnostic techniques.
Results
DNA was extracted from the sonication fluids of seven explanted orthopaedic devices, and additionally from two culture negative controls, and was sequenced on the Oxford Nanopore Technologies MinION platform. A specific analysis pipeline was assembled to overcome the challenges of identifying the true infecting pathogen, given high levels of host contamination and unavoidable background lab and kit contamination.
The majority of DNA classified (> 90%) was host contamination and discarded. Using negative control filtering thresholds, the species identified corresponded with both routine microbiological diagnosis and MiSeq results. By analysing sequences in real time, causes of infection were robustly detected within minutes from initiation of sequencing.
Conclusions
We demonstrate a novel, scalable pipeline for real-time analysis of MinION sequence data and use of this pipeline to show initial proof of concept that metagenomic MinION sequencing can provide rapid, accurate diagnosis for prosthetic joint infections. The high proportion of human DNA in prosthetic joint infection extracts prevents full genome analysis from complete coverage, and methods to reduce this could increase genome depth and allow antimicrobial resistance profiling. The nine samples sequenced in this pilot study have shown a proof of concept for sequencing and analysis that will enable us to investigate further sequencing to improve specificity and sensitivity.
Electronic supplementary material
The online version of this article (10.1186/s12864-018-5094-y) contains supplementary material, which is available to authorized users.
Keywords: Nanopore, Prosthetic joint infection, Metagenomics, Real-time, Clinical, Device-related infection
Background
Infection remains a feared and devastating complication of orthopaedic implant surgery. It occurs in up to 2% of prosthetic joint replacements [1] and may present several years after implantation [2]. Recent studies in England of joint revisions undertaken for infection report an increase in prevalence for both knee and hip revisions between 2003 and 2014 (2.5-fold and 7.5-fold and 2.3-fold and 3.0-fold increase following primary and revision knee and hip replacements respectively [3, 4]). It has been estimated that in the USA, there will be more than 65,500 infected joint replacements per year by 2020 [5]. Improvements in speed and accuracy of diagnosis may improve outcomes following revision surgery by allowing more targeted therapy. PJI diagnosis can be challenging as infections may be associated with biofilms that colonise the orthopaedic devices [6], with a small but potentially problematic number caused by fastidious or slow-growing organisms that are not detectable by culture or from patients who have received prior antibiotics. Although culture of multiple periprosthetic tissue (PPT) samples remains the gold standard for microbial detection, it is relatively insensitive, with only approximately 65% of causative bacteria detected even when multiple PPT samples are collected [7–9].
Development of molecular methods, such as 16 s rRNA sequencing, can be more sensitive in detection of PJI [10]. An alternative is the use of metagenomic shotgun sequencing that can detect full bacteria genomes directly from a sample. Sequencing directly from samples can provide accurate diagnostic information for PJIs when compared to laboratory culture and can also detect additional organisms [11, 12] and potentially provide additional information such as presence of antimicrobial resistance genes [12].
Using third generation sequencing technology, developed by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), longer read lengths in faster turnarounds are possible. The ONT MinION potentially could allow analysis to be conducted in real-time with obvious advantages to clinical diagnosis of infection. Examples of metagenomic pathogen studies using MinION include viral detection from serum [13] and bacteria from urines [12]. These previous studies have shown proof-of- principle for direct from sample clinical sequencing using ONT MinION. However, PJI sequencing has a further challenge of high human DNA contamination which require specific laboratory preparation and bioinformatic analyses to overcome. A previous study using ONT MinION sequencing to identify pathogens within highly human DNA contaminated pleura effusion samples used 16 s rDNA sequencing [14]. This proved quick identification was possible in high host DNA samples but could not provide further genomic information.
Here we describe proof-of-principle for the use of ONT MinION sequencing for the diagnosis of PJI when compared to standard microbiological culture and Illumina sequencing. We describe an analysis work-flow that differentiates between predicted infection species and background contamination and can be run during sequencing for real-time species detection.
Methods
Samples
Samples used in this study were collected by the Bone Infection Unit at the Nuffield Orthopaedic Centre (NOC) in Oxford University Hospitals (OUH), UK, as previously described [11]. Nine samples previously assessed by Illumina MiSeq sequencing were chosen for further analysis by ONT MinION sequencing. Samples were chosen from the remaining DNA extracts that had sufficient DNA to either be sequenced directly, or amplified and sequenced, and to represent a range of disparate species and compositions.
DNA preparation and sequencing
Libraries were prepared for sequencing on an Oxford Nanopore MinION (Oxford Nanopore Technologies (ONT)) using genomic DNA previously extracted from sonication fluid samples [11]. Samples 259, 312, 335, 352 and 354 were prepared using the 1D genomic DNA by ligation protocol (SQK-LSK108) (ONT). Samples 229, 249, 506 and 509 had insufficient DNA for this protocol so were prepared using either a PCR-based protocol for low input genomic DNA with modified primers (DP006_revB_14Aug2015), followed by rapid sequencing adapter ligation (ONT) (sample 229) or the 1D low input genomic DNA with PCR protocol (SQK-LSK108) (ONT) (samples 249, 506 and 509). Briefly, the protocols comprise DNA end-repair and dA-tailing (NEBNext Ultra II End Repair/dA-Tailing Module, New England Biolabs (NEB), Ipswich, MA, USA) followed by purification using AMPure XP solid phase reversible immobilisation (SPRI) beads (Beckman Coulter, High Wycombe, UK); Sequencing adapter ligation (Blunt/TA Ligase Master Mix, NEB) followed by additional SPRI bead purification. For the samples with insufficient DNA requiring PCR amplification, additional steps between end-repair and sequencing adapter ligation included; PCR adapter ligation (Blunt/TA Ligase Master Mix, NEB) followed by SPRI bead purification; PCR amplification (Phusion High Fidelity PCR Master Mix, NEB) with 18 cycles (samples 229 and 249) or 24 cycles (samples 506 and 509) followed by additional SPRI bead purification. Samples were sequenced on FLO-MIN105 (v.R9) (sample 229) or FLO-MIN106 (v.R9.4) (all other samples) SpotON flowcells.
PCR analysis of sample 354a
Quantitative real-time PCR (q-PCR) was performed for sample 354a to determine relative amounts of both Arcanobacterium haemolyticum and Fusobacterium nucleatum DNA in the original sonication fluid genomic DNA extract. qPCR was performed on a Stratagene MX3005P QPCR System (Agilent Technologies, Santa Clara, CA, USA) using Luna Universal Probe qPCR Master Mix (New England Biolabs, Ipswich, MA, USA). For A. haemolyticum, primers and probe were designed to target the phospholipase D gene: forward primer ATGTACGACGATGAAGACGCG (previously published, [15]), reverse primer TTGATTGCGTCATCGACACT, probe [6FAM]-TTGGTAGTGCGGCTGCTGCGCC-[TAM]. For F. nucleatum, primers and probe were designed to target the nusG gene: forward primer CAGCAACTTGTCCTTCTTGATCC, reverse primer CTGGATTTGTAGGAGTTGGTTC, probe [6FAM]-AGACCCTATTCCTATGGAAGAGGAAGAAGTA-[TAM]. Reactions were performed in 20 μl with 2ul of template DNA, 0.4 μM of each primer and 0.2 μM of the probe. Cycling conditions were an initial denaturation at 95 °C for 1 min, followed by 40 cycles of 95 °C denaturation for 15 s and 60 °C extension for 30 s. Genomic DNA, extracted from cultures of A. haemolyticum (Type Strain NCTC 8452) and F. nucleatum subspecies vincentii (Type Strain ATCC 49256), was diluted to 100,000 genome copies per μl then serially diluted to 10 genome copies per μl and used to create copy number standard curves for both species. Negative controls, replacing template DNA with water, were also performed. All reactions were performed in triplicate.
Bioinformatics analysis
We assembled an analysis pipeline for detection of bacterial pathogens using ONT MinION sequencing of orthopaedic device infections. The pipeline includes filtering steps for the genetic sequence data that have been tuned on seven positive samples with known infections and two culture negative samples.
The analysis was performed within a Nextflow workflow [16] with the software contained within a Singularity [17] image generated from a Docker repository [18]. This workflow and software are available for public use, [19], with our intention for the analysis to be reproducible or replicable with other datasets on most systems.
The workflow, CRuMPIT, has three major components, as shown in Fig. 1. The first monitors the output of a MinION device or devices and creates batches of fast5 files (default 1000) as they are written to a storage drive location, Fig. 1 (a,b). The second receives the fast5 files and uses a Nextflow workflow that basecalls data to be classified and aligns them to specific reference sequences with results pushed to a database, Fig. 1(c). Thirdly, analysis results including species identified, are determined and continually updated as the run progresses, Fig. 1(d).
During the progression of this project, ONT have released several different software applications for basecalling, with each version improving accuracy [20]; we used the most up to date and reliable version at the time of sequencing. Basecalling from the fast5 files used different versions of either Metrichor (dragonet), MinKNOW-Live or ONT Albacore, Table 1. Fastq files were generated from the Metrichor or MinKNOW basecalled fast5 files using fast5watcher.py (commit b88e14a) [21] for downstream analysis. Albacore is now used as the basecaller within the CRuMPIT workflow, with sequences basecalled directly to fastq files for analysis. Experimental use of Guppy (ONT developer access required, version 0.3.0) as a basecaller was performed to compare speeds. An additional Porechop [22] (v0.2.3) step for de multiplexing barcodes was added for use with Guppy.
Table 1.
Sample | Basecaller | Software version |
---|---|---|
229a | Metrichor (dragonet) | 1.22.4 |
249a | MinKNOW-Live-Basecalling | 1.4.3 |
259a | MinKNOW-Live-Basecalling | 1.3.30 |
312a | Metrichor (dragonet) | 1.23.0 |
335a | Metrichor (dragonet) | 1.23.0 |
352a | MinKNOW-Live-Basecalling | 1.1.21 |
354a | MinKNOW-Live-Basecalling | 1.1.20 |
509a | ONT Albacore Sequencing Software | 1.1.0 |
506a | ONT Albacore Sequencing Software | 1.1.0 |
To minimise spurious read classifications caused by repeat regions, sequences within the fastq files were separated based on molecular complexity, with only high complexity reads analysed further. Complexity was calculated using a dust score threshold of seven with prinseq-lite-0.20.4 [23] which removes reads containing sequences consisting only of homopolymer, dipolymer and triploymer repeats.
Centrifuge [24] was used to classify sequencing reads to a taxonomic identifier. We used Centrifuge instead of Kraken [25] for this analysis because the initial starting match uses kmers of length 16, which is more suited to the Nanopore error profile compared to Kraken where databases are built with a default kmer size of 31. Additionally, the Centrifuge indexes require significantly less storage and memory compared to Kraken. A Centrifuge index [24] was constructed using bacterial and viral genomes downloaded from NCBI RefSeq as of 03-March-2017, and the human reference genome (GRCh38). Low complexity regions with a dust score greater than 20 in the reference sequences were masked using dustmasker (v 1.0.0, NCBI). Alternatively, the precompiled “P_compressed_b + v + h” available to download from the Centrifuge authors was also used, yielding very similar results to our database. We used our database for this analysis because it is a more recent and complete dataset. However, for ease of reproducibility, the precompiled databases can also be used.
Sequences with a taxonomic id, or a descendant, that belonged to a list of bacterial reference genome sequences downloaded from NCBI RefSeq, were mapped using minimap2 [26] (v2.2-r409). To be considered for detection, bacterial species were first classified by Centrifuge with a score of 150 or greater with over 10% of the classified bacterial bases. The score of 150 was chosen as a suitable cutoff after several thresholds were tested, Additional file 1: Figure S1. To remove spurious hits and background lab contamination, species were reported if they accounted for over 10% of the classified bacterial bases by Centrifuge which also removed the majority of negative control hits, Additional file 2: Figure S2. Alternatively, a read number threshold could have been chosen, however the margin of proportional read numbers was deemed too narrow between positive samples and negative controls. Therefore, a further mapping step was added to validate the Centrifuge classification.
To be confirmed as a positive the mapped reads required a mapping quality score (mapq) of 50 or above and had to account for greater than 1% of the classified bacterial bases. Mapq 50 was used to ensure high quality alignments and helped to remove any remaining indiscriminate alignments, Additional file 3: Figure S3. The 1% bases threshold was used after plotting bases over reads for positive samples and negative controls, Additional file 4: Figure S4. However, if a detection species meets these criteria, the mapped reads can have any Centrifuge score and are included in further analysis. Therefore, more reads can be included if mapping provides satisfactory alignment over Centrifuge classification. This filtering method was tuned to remove all hits from the negative controls but leave as many validated positive detection species reads as possible. It is therefore a heuristic method and can be tuned with greater power when more samples have been processed.
The entire workflow was run in Nextflow [16] with the software contained inside a Singularity [17] image. This has enabled the entire pipeline to run on a distributed cluster (SLURM [27]) with the flexibility to run on other platforms including locally on a single computer. A SLURM cluster was setup and used to handle the high computational demands of basecalling with Albacore, with the remaining pipeline requiring less computer time to complete. The cluster setup was built from a head node and four worker nodes with a total of 21 worker cores. Centrifuge was only run on two of the nodes, each with at least 16gb of memory. The workflow can be run in real time and detect new fast5 files from a MinION sequencing run, process them and push the data to a MongoDB database for analysis.
Results
Sample composition after analysis
Nine samples previously sequenced with an Illumina MiSeq were sequenced using the Oxford Nanopore MinION platform. Seven samples were extracted from bacterial culture positive sonication fluid. The remaining two samples, extracted from culture negative sonication fluid, were used as negative controls. Between 0.2 and 2.8 gigabases were basecalled for each sequencing run, with read lengths averaging between 500 bp and 1.7 kb (Table 2).
Table 2.
Total | Mean | Median | Low complexity | Human | Bacteria | |||||
---|---|---|---|---|---|---|---|---|---|---|
Sample | Bases | Reads | read length | read length | bases | reads | bases | reads | bases | reads |
229a | 204,346,556 | 124,218 | 1645.06 | 1745 | 113,836 | 69 | 198,972,861 | 117,821 | 1,692,097 | 914 |
249a | 723,925,562 | 585,098 | 1237.27 | 1006 | 403,668 | 370 | 563,888,189 | 411,612 | 44,502,912 | 34,773 |
259a | 1,057,865,247 | 600,291 | 1762.25 | 1321 | 390,209 | 289 | 949,663,786 | 502,426 | 512,827 | 312 |
312a | 1,121,119,742 | 1,004,818 | 1115.74 | 674 | 1,905,423 | 1044 | 1,038,235,876 | 882,763 | 30,426,198 | 14,948 |
335a | 2,847,687,425 | 1,717,810 | 1657.74 | 1171 | 2,835,054 | 1362 | 2,783,128,118 | 1,605,466 | 1,388,748 | 989 |
352a | 803,638,340 | 986,867 | 814.33 | 609 | 567,656 | 630 | 669,796,136 | 752,022 | 459,779 | 579 |
354a | 706,380,170 | 945,929 | 746.76 | 596 | 680,560 | 848 | 570,485,740 | 717,662 | 2,151,551 | 2443 |
509a | 2,740,060,527 | 4,940,241 | 554.64 | 439 | 16,355,839 | 24,413 | 1,199,779,866 | 1,352,438 | 6,240,425 | 2628 |
506a | 2,451,399,949 | 4,700,013 | 521.57 | 431 | 20,014,343 | 23,631 | 1,161,796,584 | 1,671,726 | 4,705,919 | 2139 |
Bacteria, Human with a centrifuge score greater than 150, and total reads including unclassified reads. Samples 509a and 506a are culture negatives and used as negative controls. Results are after removing low complexity reads
The majority of classified reads were human, Table 2, with a range of 80% to 97% of bases in the sequenced culture positive samples coming from host contamination. A range of 0.04% to over 6% of bases were classified as bacterial by Centrifuge in the culture positive samples, Table 2.
Our analysis workflow identified one or more bacterial species per sample, with the exception of the two culture negative samples, 509a and 506a (Table 3). One sample, 354a, was polymicrobial, with Enterococcus faecalis, Arcanobacterium haemolyticum and Fusobacterium nucleatum identified. Two species of the same genus, Bacillus cereus and Bacillus thuringiensis, were identified in sample 352a. All other samples had only a single bacterial species identified.
Table 3.
Sample | ONT minion species | TaxID | Mapped reads (% of identified bacterial) | Mapped bases (% of identified bacterial) | Sonication species | Tissue culture species | MiSeq reads (% of bacterial) |
---|---|---|---|---|---|---|---|
229a | Staphylococcus aureus | 1280 | 815 (89) | 1,912,820 (113) | S. aureus | S. aureus | 6038 (98) |
249a | Cutibacterium acnes | 1747 | 23,500 (68) | 29,443,269 (66) | P. acnes | P. acnes | 108,940 (100) |
259a | Staphylococcus epidermidis | 1282 | 155 (50) | 223,611 (44) | S. epidermidis | S. epidermidis | 749 (86) |
312a | Citrobacter koseri | 545 | 11,629 (78) | 24,631,203 (81) | C. koseri | C. koseri | 221,516 (95) |
335a | Morganella morganii | 582 | 613 (62) | 515,991 (37) | M. morganii | M. morganii | 3555 (94) |
352a | Bacillus thuringiensis | 1428 | 41 (7) | 27,026 (6) | Bacillus species | Bacillus species | 1109 (86*) |
Bacillus cereus | 1396 | 119 (21) | 85,627 (19) | ||||
354a | Arcanobacterium haemolyticum | 28,264 | 584 (24) | 547,413 (25) | A. haemolyticum | 11,182 (72) | |
Fusobacterium nucleatum | 851 | 529 (22) | 493,717 (23) | 1156 (7) | |||
Enterococcus faecalis | 1351 | 225 (9) | 223,665 (10) | E. faecalis | E. faecalis | 1173 (8) | |
506a | Non detected | No growth | No growth | Non detected | |||
509a | Non detected | No growth | No growth | Non detected |
Samples 509a and 506a are culture negatives and used as negative controls, no bacterial species were detected after filtering thresholds were used. Species detected from sonication fluid, tissue culture and MiSeq sequence analysis using Kraken. Adapted from [11]. (*) indicates % of bacterial reads taken from the Bacillus cereus group level (taxonomic id of 86,661)
The results from ONT MinION sequencing correspond with previously published analysis of the same samples by conventional microbiology culture and metagenomic Illumina MiSeq sequencing, Table 2 [11]. A notable difference between the two molecular analyses can be seen in sample 352a, where ONT MinION sequencing enabled species level detection. The Illumina short read sequencing identified Bacillus spp. only (agreeing with the corresponding culture results) whereas ONT MinION sequencing identified two species from the Bacillus cereus group: Bacillus cereus and Bacillus thuringiensis. It is worth noting that speciation within the Bacillus cerus group is problematic as species within this group share a high level of genome sequence identity [28]. Further investigation would be required to determine whether both species are actually present in this sample.
Another difference observed between the two sequencing techniques is in sample 354a, and concerns the relative abundance of sequencing reads/bases for the multiple species classified in this polymicrobial sample. The Illumina MiSeq sequencing identified A. haemolyticum as the most abundant species, at 72% of bacterial reads, with F. nucleatum representing 7% of bacterial reads. However, ONT MinION sequencing classified very similar base numbers for both F. nucleatum and A. haemolyticum (493,717 and 547,413 bases respectively) We speculated that this observed difference in proportions of reads for the F. nucleatum and A. haemolyticum was caused by platform sequencing bias, possibly as a result of variable genome GC content: The A. haemolyticum genome is 54% GC, compared to 27% for F. nucleatum. We used qPCR to test our hypothesis, and investigate which platform represents an estimate of genome abundance of these two species that is closest to the original DNA extract from sample 354a. qPCR results detected approximately equal copy numbers of both A. haemolyticum and F. nucleatum genomes in the original DNA extract, suggesting that ONT MinION sequencing has given a more accurate representation of species abundance in sample 354a, Table 4. However, standard deviations were high therefore further investigation will be needed to confirm this.
Table 4.
Species | Std curve RSq | Efficiency | Replicate | CT | Copies | Average ± Std Dev |
---|---|---|---|---|---|---|
Arcanobacterium haemolyticum | 0.991 | 89.20% | 1 | 29.12 | 2356 | 3214 ± 965 |
2 | 28.72 | 3028 | ||||
3 | 28.19 | 4258 | ||||
Fusobacterium nucleatum | 0.999 | 86.00% | 1 | 28.93 | 4269 | 3421 ± 1304 |
2 | 30.22 | 1919 | ||||
3 | 29.01 | 4075 |
Real time analysis
Using the ONT MinION platform, it was possible to analyse sequences in real-time, and predict the species composition of culture positive samples minutes after data acquisition. Samples containing a larger yield of bacterial DNA, such as 354a and 249a, produced several hundred kilobases of sequences within the first two of hours, Fig. 2a, b. Samples with lower yields, such as 352a, produced less sequence data, with several kilobases generated in the first 2 hours, Fig. 2c. For all the species identified that passed the analysis thresholds, however, the sequences generated after data acquisition were consistent with the species identified by traditional culture methods and MiSeq sequencing, Fig. 3. Each batch analysed within the Nextflow workflow took between four and fifteen minutes to process using a single core, depending on which node the job was submitted to, Additional file 5: Figure S5a. Therefore, real-time in this context needs to include this bioinformatics analysis time, the majority of which is basecalling. Encouragingly, basecalling speed was improved dramatically by using Guppy and utilising the graphics card of a single local PC, Additional file 5: Figure S5b. This enabled CRuMPIT analysis to be fully conducted on a single computer and time to detection more than halved.
Discussion
Here we demonstrate proof-of-principle that long-read sequencing using the ONT MinION can detect bacterial infections from DNA extracted directly from sonication fluid samples, and potentially do so within minutes of starting sequencing. If DNA extraction techniques can be similarly optimised, these technologies have the potential to make intra-operative diagnosis of the causes of specific infections possible. This would allow both local and systemic antibiotics to be targeted to the causative organisms in prosthetic joint infections, starting at the time of surgery.
Analysis of the MinION data indicates concordance with the current gold standard laboratory culture and also Illumina short-read sequencing. In addition, we present a new analytical tool, CRuMPIT, which automates analysis of MinION data in this setting, and could be applied by other researchers and clinicians. By using negative controls we were able to determine signatures of background contamination - a challenge to diagnostic metagenomic interpretation [11, 29]. The thresholds and scores used within our bioinformatics workflow were determined after sequencing two negative controls that allow us to create heuristic thresholds to remove background sequences from kit contamination and false positives without masking the infection species. It will be important to determine the limits of detection for bacterial DNA in high host contaminated samples. Future studies will involve sequencing more samples and-spiked in references so refined threshold scores can be determined. This can be done as before with a Youden Index and J-statistic [11]. Sensitivity and specificity of MinION cannot be determined from this study and therefore further, more extensive studies are required before use in a routine diagnostic microbiology laboratory can be recommended.
Although we were able to predict each species present within the sequenced samples, the vast majority of DNA sequenced was human, from host contamination, despite efforts to reduce this in the laboratory preparation. Depletion of host DNA contamination will facilitate greater pathogen genome sequencing coverage but this continues to present challenges as the numbers of bacterial cells in joint infections is low [7] in relation to human cells. Previous studies with ONT MinION on direct clinical samples have used samples with relatively high concentrations of bacteria in urine [30] (compared to PJI samples) or moderate to high viral titres in blood [13]. The MinION has also been used for metagenomics in environmental samples [31]. However, reduction of human DNA could allow better genotyping, transmission analysis and antimicrobial resistance gene prediction as the proportion of bacterial DNA increases. Currently, this depends on laboratory development to reduce the number of human cells in samples rather than downstream bioinformatic analysis.
The sequencing yields here were low compared to other ONT MinION sequencing yields sequenced within the same lab (data not shown). DNA read lengths sequenced in this project are also relatively short, with the average under 1 kilobase, where mean read lengths can be expected over 10 kilobases with this method. This is likely due to the DNA extraction methods used, as they were optimised for MiSeq sequencing. However, of the four samples processed by PCR due to low DNA concentration, there was variation in read length and depth ranging from highest to lowest.
There are known biases for organisms associated with GC content in using PCR-based methods for sample preparation [32] and with Illumina metagenomic data [33]. We found some evidence that MinION sequencing may better reflect the relative abundance of pathogen DNA in polymicrobial infections, as it appeared less prone to GC biases than Illumina MiSeq short-read sequencing.
Detection of the species was possible within minutes of the sequencing run starting, and this includes the time required to process the sequencing data, with basecalling being the biggest bottleneck. The fast5 file batch size has an effect on turnaround time and reducing batch sizes is preferable for longer reads that take more time to basecall. We have tested the pipeline on a single PC and on a SLURM cluster on the same network as the computers running the MinION sequencers, enabling us to scale to the rate of sequencing and basecall with greater throughput than we could with a single machine, and analyse multiple sequencing runs in parallel.
A limitation of this study was seen in runs where reads were live basecalled with the MinKNOW basecaller: the runs produced data too quickly for the system to keep up. Retrospective basecalling was not possible at the time and the skipped reads have since been discarded. Therefore, in future studies using Albacore, as is the case with the most recent two sequencing runs (506a and 509a), we expect the average DNA yields to increase, which will aid species classification and potential genome completion.
The ONT MinION sequencing process has undergone continual development with substantial improvements since this project began. Therefore, we have used three different basecallers, Metrichor, MinKNOW and Albacore, for converting the raw signal or event data to DNA sequences. It is possible to rebasecall some of this data, but as we no longer have access to some sample raw data files, we cannot rebasecall all the samples. Also, as this would not reflect the real-time analysis carried out, we have not rebasecalled all samples with the same software version. Future studies should continue to use the most accurate, current, and efficient basecaller for real-time analysis. Furthermore, as ONT routinely updates protocols and computational tools, the impact on clinical diagnostics would need to be constantly evaluated and tested to achieve and maintain accreditation.
Although analysis of the sequencing is close to real-time, the DNA extraction and library preparation takes several hours, with 1D ligation preparation currently taking approximately 70 min or PCR amplification taking 150 mins. There are rapid library preparation kits available, however we feel the sequencing yield is currently too low for these to be a viable route to detection of pathogens directly from samples, particularly in samples with high host contamination. In addition, future studies will need to replicate samples to show this process is reproducible. This project was a proof of concept, but to be cost effective in the future, multiplexing of samples, smaller cheaper flowcells or reusable/washable flowcells may need to be employed.
Conclusions
The study shows reliable detection of infection species composition in prosthetic joint infections using ONT MinION sequencing. This represents proof of concept for utilising real time ONT MinION sequencing for PJI diagnostics. The speed of detection indicates that this technology has the potential to deliver results to the clinician in a timelier manner than traditional microbiological methods. Reduction of diagnostic time could have a significant positive influence on patient outcome, allowing prompt, targeted antimicrobial therapy.
The development of a reproducible workflow, as described in this study, has potential use for any clinical sample metagenomic ONT MinION sequencing, not just sonication fluids. The software used for analysis is provided [19] and can be installed and run locally or in a distributed cluster to scale with throughput. The controlling and analysis of the workflow is written in a python3 wrapper that relies on open source tools including, pysam [34], Biopython [35], Pandas [36], Matplotlib [37], ETE3 toolkit [38] and Numpy [39].
Additional files
Acknowledgements
The authors thank the microbiology laboratory staff of the John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, for providing assistance with sample collection and processing. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
Funding
The research was funded by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). DWC and TEAP are NIHR Senior Investigators. DWE is a NIHR Clinical Lecturer. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The raw FASTQ sequencing data from this publication are archived to EBI ENA with the bioproject identifier, PRJEB23460, and available to download. We hope to make the constructed centrifuge index available to download when we find a host for it and future iterations, until then please contact the corresponding author for access.
Abbreviations
- A. haemolyticum
Arcanobacterium haemolyticum
- CRuMPIT
Clinical Real-time Metagenomics Pathogen Identification Test
- DNA
Deoxyribonucleic acid
- F. nucleatum
Fusobacterium nucleatum
- GC-content
Guanine-cytosine content
- MA
Massachusetts
- NEB
New England Biolabs
- NOC
Nuffield Orthopaedic Centre
- ONT
Oxford Nanopore Technologies
- OUH
Oxford University Hospitals
- PacBio
Pacific Biosciences
- PCR
Polymerase chain reaction
- PJI
Prosthetic joint infections
- PPT
multiple periprosthetic tissue
- q-PCR
Quantitative real-time PCR
- SLURM
Simple Linux Utility for Resource Management
- SPRI
Solid phase reversible immobilisation
Authors’ contributions
NDS performed bioinformatic analysis including creating CRuMPIT and wrote the manuscript. TLS performed all molecular laboratory work including DNA extraction, library preparation, sequencing run and qPCR. TLS,DF,TEAP and DWE helped prepare manuscript. JS helped setup computational infrastructure. TEAP, DWE and DWC contributed intellectually to the direction of the project and helped finish manuscript. BLA,AJB,MAM,SO and AT provided and processed biological samples. All authors read and approved the final version of the manuscript.
Ethics approval and consent to participate
For this study, no ethical review was required, because the study was a laboratory method development study focusing on bacterial DNA extracted from discarded samples identified only by laboratory numbers, with no personal or identifiable data. Sequencing reads identified as human on the basis of Centrifuge were counted and immediately discarded.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Matthews P. C, Berendt A. R, McNally M. A, Byren I. Diagnosis and management of prosthetic joint infection. BMJ. 2009;338(may29 1):b1773–b1773. doi: 10.1136/bmj.b1773. [DOI] [PubMed] [Google Scholar]
- 2.Huotari K, Peltola M, Jamsen E. The incidence of late prosthetic joint infections: a registry-based study of 112,708 primary hip and knee replacements. Acta Orthop. 2015;86:321–325. doi: 10.3109/17453674.2015.1035173. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Lenguerrand E, Whitehouse MR, Beswick AD, Jones SA, Porter ML, Blom AW. Revision for prosthetic joint infection following hip arthroplasty: evidence from the National Joint Registry. Bone Jt Res. 2017;6:391–398. doi: 10.1302/2046-3758.66.BJR-2017-0003.R1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Lenguerrand E, Whitehouse MR, Beswick AD, Toms AD, Porter ML, Blom AW, et al. Description of the rates, trends and surgical burden associated with revision for prosthetic joint infection following primary and revision knee replacements in England and Wales: an analysis of the National Joint Registry for England, Wales, Northern Ire. BMJ Open. 2017;7:e014056. doi: 10.1136/bmjopen-2016-014056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Kurtz Steven M., Lau Edmund, Watson Heather, Schmier Jordana K., Parvizi Javad. Economic Burden of Periprosthetic Joint Infection in the United States. The Journal of Arthroplasty. 2012;27(8):61-65.e1. doi: 10.1016/j.arth.2012.02.022. [DOI] [PubMed] [Google Scholar]
- 6.Rochford ET, Richards RG, Moriarty TF. Influence of material on the development of device-associated infections. Clin Microbiol Infect. 2012;18:1162–1167. doi: 10.1111/j.1469-0691.2012.04002.x. [DOI] [PubMed] [Google Scholar]
- 7.Atkins BL, Athanasou N, Deeks JJ, Crook DW, Simpson H, Peto TE, et al. Prospective evaluation of criteria for microbiological diagnosis of prosthetic-joint infection at revision arthroplasty. The OSIRIS collaborative study group. J Clin Microbiol. 1998;36:2932–2939. doi: 10.1128/jcm.36.10.2932-2939.1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Osmon DR, Berbari EF, Berendt AR, Lew D, Zimmerli W, Steckelberg JM, et al. Diagnosis and management of prosthetic joint infection: clinical practice guidelines by the Infectious Diseases Society of America. Clin Infect Dis. 2013;56:e1–25. doi: 10.1093/cid/cis803. [DOI] [PubMed] [Google Scholar]
- 9.Bejon P, Berendt A, Atkins BL, Green N, Parry H, Masters S, et al. Two-stage revision for prosthetic joint infection: predictors of outcome and the role of reimplantation microbiology. J Antimicrob Chemother. 2010;65:569–575. doi: 10.1093/jac/dkp469. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Marín M, Garcia-Lechuz JM, Alonso P, Villanueva M, Alcalá L, Gimeno M, et al. Role of universal 16S rRNA gene PCR and sequencing in diagnosis of prosthetic joint infection. J Clin Microbiol. 2012;50:583–589. doi: 10.1128/JCM.00170-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Street TL, Sanderson ND, Atkins BL, Brent AJ, Cole K, Foster D, et al. Molecular diagnosis of orthopaedic device infection direct from sonication fluid by metagenomic sequencing. J Clin Microbiol. 2017;55(8):2334–2347. doi: 10.1128/JCM.00462-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Ruppe E, Lazarevic V, Girard M, Mouton W, Ferry T, Laurent F, et al. Clinical metagenomics of bone and joint infections: a proof of concept study. Sci Rep. 2017;7:7718. doi: 10.1038/s41598-017-07546-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Greninger AL, Naccache SN, Federman S, Yu G, Mbala P, Bres V, et al. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome Med. 2015;7:99. doi: 10.1186/s13073-015-0220-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Mitsuhashi S, Kryukov K, Nakagawa S, Takeuchi JS, Shiraishi Y, Asano K, et al. A portable system for rapid bacterial composition analysis using a nanopore-based sequencer and laptop computer. Sci Rep. 2017;7:5657. doi: 10.1038/s41598-017-05772-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Hassan AA, Ülbegi-Mohyla H, Kanbar T, Alber J, Lämmler C, Abdulmawjood A, et al. Phenotypic and genotypic characterization of arcanobacterium haemolyticum isolates from infections of horses. J Clin Microbiol. 2009;47:124–128. doi: 10.1128/JCM.01933-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35:316–319. doi: 10.1038/nbt.3820. [DOI] [PubMed] [Google Scholar]
- 17.Kurtzer GM, Sochat V, Bauer MW. Singularity: scientific containers for mobility of compute. PLoS One. 2017;12:1–20. doi: 10.1371/journal.pone.0177459. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.What is docker. 2017. https://www.docker.com/what-docker. Accessed Nov 2017.
- 19.Sanderson ND. Clinincal Real-time Metagenomics Pathogen Identification Test (CRuMPIT). https://gitlab.com/ModernisingMedicalMicrobiology/CRuMPIT. Accessed Nov 2017.
- 20.Wick RR, Judd LM, Holt KE. Comparison Of Oxford Nanopore Basecalling Tools. 2017. doi:10.5281/ZENODO.1043612.
- 21.Sanderson ND. fast5watcher.py. 2017. https://github.com/nick297/fast5_scripts. Accessed Nov 2017.
- 22.Wick RR. Porechop. 2018. https://github.com/rrwick/Porechop. Accessed Nov 2017.
- 23.Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011;27:863–864. doi: 10.1093/bioinformatics/btr026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Kim D, Song L, Breitwieser FP, Salzberg SL. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 2016;26:1721–1729. doi: 10.1101/gr.210641.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15:R46. doi: 10.1186/gb-2014-15-3-r46. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Li H. Minimap2: fast pairwise alignment for long nucleotide sequences. 2017:2–5. 10.1101/169557.
- 27.Jette MA, Yoo AB, Grondona M. SLURM: simple Linux utility for resource management. In: In Lecture Notes in Computer Science: Proceedings of Job Scheduling Strategies for Parallel Processing (JSSPP) 2003. Berlin: Springer-Verlag; 2002. p. 44–60. https://link.springer.com/chapter/10.1007/10968987_3.
- 28.Helgason E, Økstad OA, Caugant DA, Johansen HA, Fouet A, Mock M, et al. Bacillus anthracis, Bacillus cereus, and bacillus thuringiensis - one species on the basis of genetic evidence. Appl Environ Microbiol. 2000;66:2627–2630. doi: 10.1128/AEM.66.6.2627-2630.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Kearney MF, Spindler J, Wiegand A, Shao W, Anderson EM, Maldarelli F, et al. Multiple sources of contamination in samples from patients reported to have XMRV infection. PLoS One. 2012;7:e30889. doi: 10.1371/journal.pone.0030889. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Schmidt K, Mwaigwisya S, Crossman LC, Doumith M, Munroe D, Pires C, et al. Identification of bacterial pathogens and antimicrobial resistance directly from clinical urines by nanopore-based metagenomic sequencing. J Antimicrob Chemother. 2017;72:104–114. doi: 10.1093/jac/dkw397. [DOI] [PubMed] [Google Scholar]
- 31.Brown BL, Watson M, Minot SS, Rivera MC, Franklin RB. MinION nanopore sequencing of environmental metagenomes: a synthetic approach. Gigascience. 2017;6:1–10. doi: 10.1093/gigascience/gix007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Jones MB, Highlander SK, Anderson EL, Li W, Dayrit M, Klitgord N, et al. Library preparation methodology can influence genomic and functional predictions in human microbiome research. Proc Natl Acad Sci. 2015;112:14024–14029. doi: 10.1073/pnas.1519288112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Schirmer M, D’Amore R, Ijaz UZ, Hall N, Quince C. Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data. BMC Bioinformatics. 2016;17:125. doi: 10.1186/s12859-016-0976-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Pysam. https://github.com/pysam-developers/pysam. Accessed Nov 2017.
- 35.Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.McKinney W. pandas: a Foundational Python Library for Data Analysis and Statistics. Python High Perform Sci Comput. 2011. p. 1–9. https://www.scipy.org/citing.html.
- 37.Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9:90–95. doi: 10.1109/MCSE.2007.55. [DOI] [Google Scholar]
- 38.Huerta-Cepas J, Serra F, Bork P. ETE 3: reconstruction, analysis, and visualization of Phylogenomic data. Mol Biol Evol. 2016;33:1635–1638. doi: 10.1093/molbev/msw046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Oliphant TE. Guide to NumPy. Trelgol Publ. 2006;1:378. doi: 10.1016/j.jmoldx.2015.02.001. [DOI] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The raw FASTQ sequencing data from this publication are archived to EBI ENA with the bioproject identifier, PRJEB23460, and available to download. We hope to make the constructed centrifuge index available to download when we find a host for it and future iterations, until then please contact the corresponding author for access.