NanoForms: an integrated server for processing, analysis and assembly of raw sequencing data of microbial genomes, from Oxford Nanopore technology

Anna Czmil; Michal Wronski; Sylwester Czmil; Marta Sochacka-Pietal; Michal Cmil; Jan Gawor; Tomasz Wołkowicz; Dariusz Plewczynski; Dominik Strzalka; Michal Pietal

PeerJ. 2022 Mar 29;10:e13056. doi: 10.7717/peerj.13056

Editor: Adam Witney

PMCID: PMC8973472 PMID: 35368340

Abstract

Background

Next Generation Sequencing (NGS) techniques dominate today’s landscape of genetics and genomics research. Though Illumina still dominates worldwide sequencing, Oxford Nanopore is one of the leading technologies currently being used by biologists, medics and geneticists across various applications. Oxford Nanopore is automated and relatively simple for conducting experiments, but generates gigabytes of raw data, which must be processed by an often ambiguous set of alternative bioinformatics command-line tools and genomics frameworks that require a knowledge of bioinformatics to run.

Results

We established an inter-collegiate collaboration across experimentalists and bioinformaticians in order to provide a novel bioinformatics tool, free for academics. This tool allows people without extensive bioinformatics knowledge to simply process their raw genome sequencing data. Currently, due to ICT resources’ maintenance reasons, our server is only capable of handling small genomes (up to 15 Mb). In this paper, we introduce our tool, NanoForms: an intuitive and integrated web server for the processing and analysis of raw prokaryotic genome data, coming from Oxford Nanopore. NanoForms is freely available for academics at the following locations: http://nanoforms.tech (webserver) and https://github.com/czmilanna/nanoforms (GitHub source repository).

Keywords: NGS, Bioinformatics, Oxford Nanopore, Genomics, Webserver, DNA sequencing, DNA assembly, Microbial genomes

Introduction

Next Generation Sequencing technologies, such as Illumina (Gloor et al., 2010), Pacific BioSciences (Rhoads & Au, 2015) and Oxford Nanopore (Jain et al., 2016), dominate today’s landscape of genetics and genomics research. Each technology has its advantages, disadvantages and market share in specific applications and niches. The applications of Oxford Nanopore technology include eDNA extraction and sequencing (e.g., Garlapati et al., 2019), rapid viral sequencing (including the current challenge of SARS-CoV-2 (Wang et al., 2020)), human genome comparative sequencing (De Coster et al., 2019) and many others. Short-read sequencing technologies such as Illumina have made bacterial genome sequencing relatively cheap and accessible. However, the procedure of closing microbial genomes is often costly and laborious. Assembly of short reads from genomes that are repetitive and/or have extreme %GC content remains challenging. These difficulties can be mostly overcome by using single-molecule, long-read sequencing technologies such as Oxford Nanopore. Nanopore sequencing helps with closing bacterial genomes (Risse et al., 2015; Kawalek et al., 2020), supports two strategies for bacterial genome assembly (Goldstein et al., 2019), helps to obtain complete bacterial chromosomes from microbiomes (Moss, Maghini & Bhatt, 2020) and is used in routine microbial genome sequencing (Wick et al., 2017a). Two main strategies are used to assemble bacterial genomes using long-read sequencing. In the first, nanopore reads are used for long-read-only genome assembly, followed by polishing with Illumina reads. Alternatively, long reads are used to enhance genome assemblies that are generated from short-read Illumina data. In such a case, nanopore reads can scaffold contigs generated by short-read sequencing. It is also now possible to extract 3D structures of the genome using Oxford Nanopore (Ulahannan et al., 2019).

Oxford Nanopore is relatively user-friendly, easily operated, inexpensive, and can be simply adjusted to allow for rapid sample processing (including outdoor usage). The downside is, after the experiment is done, researchers are left with a huge amount of raw data. There is a wide variety of software tools available to perform taxonomic classification of the raw data. There are also many comparative studies that evaluate the top performing bioinformatics tools, provide recommendations for use cases, and show how to run these tools (McIntyre et al., 2017; Escobar-Zepeda et al., 2018; Simon et al., 2019). An inexperienced user, however, could easily become overwhelmed by the complexity of the data, the fast pace of tool development, and the version updates, command changes, installation problems, etc.

Materials & Methods

We used the following technologies to create the NanoForms server: Python language, Linux/UNIX/BSD operating system, Django application server, Workflow Description Language and Cromwell, Crontab, Docker and BioContainers (Da Veiga Leprevost et al., 2017), and a custom set of bioinformatics tools. The NanoForms server is freely available for academic use and a commercial release of the server (for non-academics, businesses, etc.) is planned. We also provide its source code (under GPLv3 license) for non-commercial uses. The server is fully virtualized, with about 30 processor cores and 120 GB RAM available on average. The infrastructure is also hosted in a virtual environment. It can handle about 5-10 parallel jobs (taking into account the dataset size limit of 15 GB). In general, the processing time of a single run depends on the configuration, the sample size and the assembly type (short reads, long reads or both). It takes nearly two hours to process a 220 MB sample of ONT data. Computation time grows in a near-linear manner, with a 1 GB dataset taking about five hours. Hybrid assemblies, which include a Unicycler polishing stage, can take up to 10 h for a combined dataset of about 1 GB (300 MB of ONT and 600 MB of Illumina data). However, actual run times will depend on server usage, so for complete control over timings one can install a local version of NanoForms. The detailed diagram of the server workflow is shown in Fig. 1.
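The two ONT-only timings quoted above can be turned into a rough planning estimate. The following is a minimal sketch that simply draws a straight line through those two points; the function name and the use of linear interpolation are our assumptions for illustration, and real run times will vary with server load, configuration and assembly type.

```python
# Rough run-time estimate from the two ONT-only timings quoted in the text.
# The anchors are approximate ("nearly two hours", "about five hours"),
# so the result is an order-of-magnitude guide, not a guarantee.
ANCHORS_MB_HOURS = [(220.0, 2.0), (1000.0, 5.0)]


def estimate_hours(size_mb: float) -> float:
    """Linearly interpolate (or extrapolate) run time for an ONT dataset."""
    (x0, y0), (x1, y1) = ANCHORS_MB_HOURS
    slope = (y1 - y0) / (x1 - x0)  # hours per additional MB
    return y0 + slope * (size_mb - x0)


if __name__ == "__main__":
    for size in (220, 500, 1000):
        print(f"{size} MB -> about {estimate_hours(size):.1f} h")
```

Such an estimate applies only to long-read runs; hybrid runs are reported to take considerably longer for a comparable amount of data.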

The computation steps are run one after another; between steps, the user is given the partial results and can take action, *e.g.*, supply more specific parameters for the next programs, based on the data available (*e.g.*, quality, quantity *etc*.). At the end, a report is generated and delivered in the form of a PDF file.

The current version of the server includes the following bioinformatics applications: NanoPlot 1.32.0 and NanoFilt 2.7.1 (De Coster et al., 2018), FastQC 0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), Flye 2.8.1 (Lin et al., 2016), Bandage 0.8.1 (Wick et al., 2015), Rebaler 0.2.0 (https://github.com/rrwick/Rebaler), Medaka 1.0.1 (https://github.com/nanoporetech/medaka), Quast 3.2 (Mikheenko et al., 2018), Filtlong 0.1.0 (https://github.com/rrwick/Filtlong), Prokka 1.14.6 (Seemann, 2014), Kraken Tools 0.1 (Davis et al., 2013), Kraken 2.1.0 (Wood, Lu & Langmead, 2019) and Krona 2.7.1 (Ondov, Bergman & Phillippy, 2011). For hybrid genome assembly, we use Unicycler 0.3.0b (Wick et al., 2017b), and Fastp 0.18.0 (Chen et al., 2018) for filtering data from Illumina. The hybrid assembly strategy was developed to overcome the limitations of both Illumina and Oxford Nanopore sequencing and to unlock their full potential for genome assembly. Oxford Nanopore long reads can scaffold contigs generated from Illumina short reads and disambiguate regions of the assembly graph that cannot be resolved by short reads alone, as implemented in the Unicycler assembler (Chen, Erickson & Meng, 2020).
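To illustrate how some of the listed tools might be chained in a long-read-only run, the sketch below builds (but does not execute) command lines for Flye, Medaka and Prokka. This is not the NanoForms workflow itself, which is described above as using Workflow Description Language, Cromwell and Docker; the ordering, the directory layout, the `sample` prefix and the function name are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def long_read_commands(reads: str, outdir: str, threads: int = 4) -> list[list[str]]:
    """Build assembly, polishing and annotation commands for one ONT sample.

    Hypothetical helper: it only returns argument lists, so nothing is run
    unless the caller passes them to subprocess.run().
    """
    out = PurePosixPath(outdir)
    flye_dir = out / "flye"
    medaka_dir = out / "medaka"
    prokka_dir = out / "prokka"
    return [
        # De novo assembly of raw nanopore reads; Flye writes assembly.fasta.
        ["flye", "--nano-raw", reads, "--out-dir", str(flye_dir),
         "--threads", str(threads)],
        # Polish the draft with the same reads; Medaka writes consensus.fasta.
        ["medaka_consensus", "-i", reads, "-d", str(flye_dir / "assembly.fasta"),
         "-o", str(medaka_dir), "-t", str(threads)],
        # Annotate the polished assembly.
        ["prokka", "--outdir", str(prokka_dir), "--prefix", "sample",
         str(medaka_dir / "consensus.fasta")],
    ]


if __name__ == "__main__":
    for cmd in long_read_commands("reads.fastq", "results", threads=8):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch free of side effects and avoids shell-quoting issues if the commands are later executed.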

The human genome comprises approximately 3 Gb of nucleotides, while a typical raw data set from Oxford Nanopore sequencing (before base-calling) exceeds 1 TB. This makes it difficult to upload the data even over a high-speed connection (at 100 Mbps, the transfer alone would take over 24 h). In addition, such large amounts of data would require substantial funding for computing resources, as the required calculations would be considered Big Data. Because of these technological issues, we narrowed the analyses to genomes of prokaryotic sizes (up to 15 Mb in length, up to 15 GB in file size), but our server can also handle small eukaryotic genomes (such as S. cerevisiae). Unicycler may not be the best tool for yeast genomes, but Flye and the nanopore assembly pipeline from NanoForms might easily be used for small eukaryotic genomes (see: Martín-Hernández et al., 2021). After deploying this server and gathering remarks from the users, we are considering designing another milestone which might address this problem. We also plan to launch the commercial version of the server for institutional clients.
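The transfer-time figure can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bit/s) and an ideal link with no protocol overhead; the function name is ours.

```python
def transfer_hours(size_bytes: float, link_mbps: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and congestion."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds / 3600


TB = 10**12
GB = 10**9

if __name__ == "__main__":
    # Exactly 1 TB over a 100 Mbps link.
    print(f"1 TB at 100 Mbps: {transfer_hours(1 * TB, 100):.1f} h")
    # The 15 GB upload limit over the same link.
    print(f"15 GB at 100 Mbps: {transfer_hours(15 * GB, 100) * 60:.0f} min")
```

Under these assumptions, exactly 1 TB needs about 22 h; since raw data sets exceed 1 TB and real links carry overhead, a figure above 24 h is plausible. By comparison, a file at the 15 GB limit would take about 20 minutes.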

Results

We introduce NanoForms: an intuitive and integrated web server for the processing and analysis of raw data from small genomes, generated by Oxford Nanopore technology. The user uploads a single archived sequence file (FASTQ) or a list of archived sequence files, and the data is then preprocessed. The user then chooses several options on the go and, after the subsequent steps, obtains the DNA/RNA sequence in the form of a FASTA file, as well as an HTML summary with reports, images and statistics of the calculations performed.

The initial output from the NanoPlot program (read length vs. read quality) gives the user a quick overview (as shown in Fig. 2) that helps him or her decide whether to continue the analysis or to go back to the lab to fix the sample. The final output of the NanoForms service is an assembled genome in FASTA format, Prokka annotation files and a Bandage diagram, allowing for easy graphical assessment of assembly completeness. An example, derived for Bacillus subtilis, is depicted in Fig. 3. Just as the MinION is marketed as needing only minor laboratory skills to operate, NanoForms needs practically no bioinformatics skills to produce the sequence (however, more skilled users can benefit from extra, optional commands and options that they can provide during the course of the analysis). Therefore, we claim that our NanoForms server, in combination with Oxford Nanopore technology, has ultimately made NGS available for all, including both biologists and bioinformaticians, as specialized skills are no longer needed to perform certain NGS tasks and analyses. On the server website, after logging in, there are several toy datasets already provided to the user, such as Bacillus subtilis SRX6978160. Users can use these datasets for quality tests or data assembly using the respective forms provided in NanoForms.

Based on this information on the server website, the user can decide whether the quality of the data is good enough to continue the analysis, thus saving project time, or decide to what extent to crop the data to exclude low-quality short reads.
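The read length vs. read quality view, and the subsequent cropping of short, low-quality reads, can be approximated in a few lines of plain Python. This is only a sketch under stated assumptions: well-formed four-line FASTQ records with non-empty reads, Phred+33 encoding, and a mean quality computed from the mean error probability. The server itself relies on NanoPlot and NanoFilt; the function names and thresholds below are hypothetical.

```python
import math
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ text (no line-wrapped sequences)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:], seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score, averaged over error probabilities rather than scores."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[Record], min_length: int,
                 min_quality: float) -> List[Record]:
    """Keep reads that are long enough and of sufficient mean quality."""
    return [r for r in records
            if len(r[1]) >= min_length and mean_quality(r[2]) >= min_quality]


EXAMPLE = """@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGT
+
!!!!
""".splitlines()

if __name__ == "__main__":
    for name, seq, qual in parse_fastq(EXAMPLE):
        print(name, len(seq), round(mean_quality(qual), 1))
    kept = filter_reads(parse_fastq(EXAMPLE), min_length=5, min_quality=10)
    print([name for name, _, _ in kept])
```

Averaging error probabilities instead of raw Phred scores penalizes reads with a few very poor bases, which is usually the more conservative choice when deciding what to discard.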

Sometimes long-read nanopore data alone yield several separate genome fragments (contigs); the hybrid assembly option provided by NanoForms can often resolve these ambiguities. The Bandage image is normally the last stage of the NanoForms protocol; however, all figures and statistics produced in the course of the analysis are delivered to the user as a report.

Discussion

We performed a detailed analysis and comparison of similar services available for researchers. The first service to be tested was the CGE (Larsen et al., 2017) server. We checked and tested this service at the very beginning of this project and, at that time, it was an up-to-date and convenient service that was easy to use for biologists without technical knowledge. Shortly before drafting this manuscript, the part of the server dedicated to genome assembly went offline, so the only accepted input is now contigs in FASTA (Pearson, 1990) format. The rest of the tools tested are free to use but, as standalone programs, they are not interconnected into a comprehensive pipeline. In addition, no figures are generated, which makes qualitative analyses more difficult.

The Enterobase server (Zhou et al., 2020) is aimed at wgMLST analyses. This server is mainly designed for genotyping isolates. Users can screen the database against specific STRs etc., and the service can also generate phylogenetic trees. Enterobase is dedicated to analyses of gut bacteria and supports Illumina or PacBio reads. The user only needs to provide the FASTQ (Cock et al., 2010) files, which need to be compressed by the gzip tool and also, manually curated, before running the service. The figures can be generated or plotted, however this requires additional manual user intervention. The Enterobase server does not accept nanopore data.

Another interesting tool, though with significantly diminished accessibility, is the Galaxy Tools service (Cock et al., 2013). Our experience testing this service suggests that the stand-alone version of the server needs to be installed and run locally for optimal use, but for smaller analyses it can also be run on the public server. The server provides workflows (though only an empty set, for the novice user) which the user can adjust as needed. The tool supports both Illumina and long-read (nanopore, PacBio) data input. It is worth noting that both CGE and Galaxy Tools offer additional applications for practical genome analysis, such as resistance, serotype or virulence, and maintain updated databases of these genetic targets.

While preparing and programming our tool, a software report was published about a new toolkit, NanoGalaxy, dedicated to nanopore data processing (De Koning et al., 2020). NanoGalaxy is an extension of the aforementioned Galaxy Tools in the area of support for nanopore data. In some ways NanoForms and NanoGalaxy seem to have similar features and functionalities. Like standard Galaxy Tools, NanoGalaxy seems to be more powerful, but on the other hand it is aimed at more advanced users with more bioinformatics skills. In our subjective opinion, NanoForms is easier to use for non-bioinformatics users, but on the other hand it can be treated as a rather closed “black box”. NanoGalaxy is more complex and has more functionalities, but getting familiar with the numerous available algorithms requires some bioinformatics experience. NanoGalaxy and NanoSPC (Xu et al., 2020) deliver similar results and have similar capabilities to our server. NanoSPC is focused only on nanopore data and, as a result, it is not possible to perform, e.g., hybrid assembly there. It is also focused mostly on metagenomics, identification of pathogens and variant calling, but it seems to be easy to use even for users with limited dry-lab knowledge.

Another tool we tested was Patric (Gillespie et al., 2011). This platform provides bioinformatics analyses of all bacteria, with a special focus on pathogens. It supports hybrid genome assembly in the form of short + long reads (PacBio and nanopore are supported). Table 1 provides a comprehensive summary of the features of the quoted services, compared on the basis of NanoForms functionality in terms of nanopore data processing. We decided not to include “demultiplexing reads support” in our table, as the most current version of the MinKnow software (the native software of Oxford Nanopore) already supports this feature.

Table 1. NanoForms server vs. other services: the comparison.

Our service successfully fills in the gap for NGS genome assembly, in regard to the fully automated (but interactive) pipelined nanopore data processing.

Feature	CGE	Enterobase	Galaxy Tools	EPI2ME	NanoPipe	Patric	NanoGalaxy	NanoSPC	NanoForms
Raw data processing	+	+	+	+	+	+	+	+	+
Interactive interface	+	+	+	+	+	+	+	+	+
QA, reports	+	+	+	+	+	+	+	+	+
Free for academics	+	+	+	+	+	+	+	+	+
Sequence assembly	−	+	+	−	+^a	+	+	+	+
Qualitative analyses: images	−	+	+	+	+	+	+	+	+
Nanopore data processing	+	−	+	+	+	+	+	+	+
Tools connected into the pipeline	−	+	+/−^b	−	−	+/−^b	+	+	+
Hybrid assembly: nanopore + Illumina	−	−	+	−	−	+	+	−	+
Special nanopore visualization tools	−	−	+	+	−	−	+	+	+
Summary report generated	−	−	−	−	+	−	−	+	+
Ease of use^c	+	+/−	+/−	+	+	+	+	+	+
Dry-lab knowledge unnecessary^c	+	+	−	−	+	−	−	+	+

Notes.

^a But not de novo assembly.

^b There are no automatic pipelines, but users can develop them themselves.

^c Subjective assessments of the authors.

As we were pursuing this project in mid-2020, Oxford Nanopore announced the availability of their EPI2ME (http://epi2me.nanoporetech.com/, requires login) cloud-based workflow to process raw nanopore data. The platform’s intended use is nanopore data analysis only, without options for sequence assembly or trimming. After base calling, reads are uploaded to the server via the EPI2ME Agent, which is a stand-alone program. The user can pick one of several workflows (e.g., microbiological classification or human genome analysis), which are triggered and executed in real time. The table also lists the characteristics of the last tool we reviewed: NanoPipe (Shabardina et al., 2019).

Conclusions

In summary, our NanoForms server, freely available for all academics, combines high-speed prokaryotic genome assembly with an intuitive, interactive interface. According to the Oxford Nanopore MinION product specification page, a mid-range laptop and the device are sufficient for the user to obtain the sequence of the sample. No further resources are needed, and after a short break following sequencing the user can continue the genomic analyses with the NanoForms service.

Acknowledgments

MP wants to thank Przemyslaw Wroblewski and Giovanni Mazzocco who (with Dariusz Plewczynski) pioneered the NanOline web server at CeNT University of Warsaw, implementing the initial idea for nanopore genome assemblies of human genomes preceding the release of the NanoForms server. MP would also like to thank Kacper Kroczek for additional testing and for checking documentation consistency on the server homepage. In addition, MP would like to thank Michal Madera and the SoftSystem Sp. z o.o. (Ltd.) company for providing some parts of the code necessary for the server and for delivering auxiliary server infrastructure.

Funding Statement

The work of Dominik Strzalka, Michal Pietal, Michal Cmil, Marta Sochacka-Pietal, Michal Wronski and Anna Czmil, as well as the paper itself, was financed by the Subcarpathian Center for Innovation (Podkarpackie Centrum Innowacyjności - PCI), Teofila Lenartowicza 4 Street, 35-051 Rzeszów, Poland, under grant no. F3_116, contract no. 05/PRZ/1/DG/PCI/2019, “Oxford Nanopore technology: optimization of enzymes and analysis of genomic data for commercial applications”. Michal Pietal and Dariusz Plewczynski were supported by the National Science Center grant (2018/02/X/NZ2/00622) “Identification of structural variants in the human genome using long fragments from next generation sequencing, based on Oxford Nanopore technology”. Dariusz Plewczynski was supported by the Polish National Science Centre (2019/35/O/ST6/02484), and the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund (TEAM to Dariusz Plewczynski) “Three-dimensional Human Genome structure at the population scale: computational algorithm and experimental validation for lymphoblastoid cell lines of selected families from 1000 Genomes Project”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

Anna Czmil, Sylwester Czmil and Michal Cmil analyzed the data, prepared figures and/or tables, application programming, and approved the final draft.

Michal Wronski analyzed the data, authored or reviewed drafts of the paper, application programming, and approved the final draft.

Marta Sochacka-Pietal conceived and designed the experiments, performed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.

Tomasz Wołkowicz analyzed the data, prepared figures and/or tables, and approved the final draft.

Dariusz Plewczynski and Dominik Strzalka conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Michal Pietal conceived and designed the experiments, authored or reviewed drafts of the paper, application programming, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The source code of the NanoForms server is available at Github: https://github.com/czmilanna/nanoforms. This is the source for standalone server installation. The server is available at https://nanoforms.tech/.

The Bacillus subtilis sequences are available at ENA: SRX6978160.

References

Chen, Erickson & Meng (2020). Chen Z, Erickson DL, Meng J. Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing. BMC Genomics. 2020;21(1):1–21. doi: 10.1186/s12864-019-6419-1.
Chen et al. (2018). Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. doi: 10.1093/bioinformatics/bty560.
Cock et al. (2010). Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research. 2010;38(6):1767–1771. doi: 10.1093/nar/gkp1137.
Cock et al. (2013). Cock PJ, Grüning BA, Paszkiewicz K, Pritchard L. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ. 2013;1:e167. doi: 10.7717/peerj.167.
Da Veiga Leprevost et al. (2017). Da Veiga Leprevost F, Grüning BA, Alves Aflitos S, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017;33(16):2580–2582. doi: 10.1093/bioinformatics/btx192.
Davis et al. (2013). Davis MP, Van Dongen S, Abreu-Goodger C, Bartonicek N, Enright AJ. Kraken: a set of tools for quality control and analysis of high-throughput sequence data. Methods. 2013;63(1):41–49. doi: 10.1016/j.ymeth.2013.06.027.
De Coster et al. (2018). De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–2669. doi: 10.1093/bioinformatics/bty149.
De Coster et al. (2019). De Coster W, De Rijk P, De Roeck A, De Pooter T, D’Hert S, Strazisar M, Sleegers K, Van Broeckhoven C. Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome. Genome Research. 2019;29(7):1178–1187. doi: 10.1101/gr.244939.118.
De Koning et al. (2020). De Koning W, Miladi M, Hiltemann S, Heikema A, Hays J, Flemming S, Van den Beek M, Mustafa D, Backofen R, Grüning B, Stubbs A. NanoGalaxy: nanopore long-read sequencing data analysis in Galaxy. GigaScience. 2020;10(9):giaa105. doi: 10.1093/gigascience/giaa105.
Escobar-Zepeda et al. (2018). Escobar-Zepeda A, Godoy-Lozano EE, Raggi L, Segovia L, Merino E, Gutiérrez-Rios RM, Juarez K, Licea-Navarro AF, Pardo-Lopez L, Sanchez-Flores A. Analysis of sequencing strategies and tools for taxonomic annotation: defining standards for progressive metagenomics. Scientific Reports. 2018;8(1):1–13. doi: 10.1038/s41598-018-30515-5.
Garlapati et al. (2019). Garlapati D, Charankumar B, Ramu K, Madeswaran P, Murthy MR. A review on the applications and recent advances in environmental DNA (eDNA) metagenomics. Reviews in Environmental Science and Bio/Technology. 2019;18(3):389–411. doi: 10.1007/s11157-019-09501-4.
Gillespie et al. (2011). Gillespie JJ, Wattam AR, Cammer SA, Gabbard JL, Shukla MP, Dalay O, Driscoll T, Hix D, Mane SP, Mao C, Nordberg EK. PATRIC: the comprehensive bacterial bioinformatics resource with a focus on human pathogenic species. Infection and Immunity. 2011;79(11):4286–4298. doi: 10.1128/IAI.00207-11.
Gloor et al. (2010). Gloor GB, Hummelen R, Macklaim JM, Dickson RJ, Fernandes AD, MacPhee R, Reid G. Microbiome profiling by illumina sequencing of combinatorial sequence-tagged PCR products. PLOS ONE. 2010;5(10):e15406. doi: 10.1371/journal.pone.0015406.
Goldstein et al. (2019). Goldstein S, Beka L, Graf J, Klassen JL. Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing. BMC Genomics. 2019;20(1):1–17. doi: 10.1186/s12864-018-5379-1.
Jain et al. (2016). Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biology. 2016;17(1):239. doi: 10.1186/s13059-016-1103-0.
Kawalek et al. (2020). Kawalek A, Kotecka K, Modrzejewska M, Gawor J, Jagura-Burdzy G, Bartosik AA. Genome sequence of Pseudomonas aeruginosa PAO1161, a PAO1 derivative with the ICE Pae 1161 integrative and conjugative element. BMC Genomics. 2020;21(1):1–12. doi: 10.1186/s12864-019-6419-1.
Larsen et al. (2017). Larsen MV, Joensen KG, Zankari E, Ahrenfeldt J, Lukjancenko O, Kaas RS, Roer L, Leekitcharoenphon P, Saputra D, Cosentino S, Thomsen MCF. Applied genomics of foodborne pathogens. Springer; Cham: 2017. The CGE tool box; pp. 65–90.
Lin et al. (2016). Lin Y, Yuan J, Kolmogorov M, Shen MW, Chaisson M, Pevzner P. Assembly of long error-prone reads using de Bruijn graphs. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(52):E8396–E8405. doi: 10.1073/pnas.1604560113.
Martín-Hernández et al. (2021). Martín-Hernández GC, Müller B, Chmielarz M, Brandt C, Hölzer M, Viehweger A, Passoth V. Chromosome-level genome assembly and transcriptome-based annotation of the oleaginous yeast Rhodotorula toruloides CBS 14. bioRxiv. 2021 doi: 10.1016/j.ygeno.2021.10.006.
McIntyre et al. (2017). McIntyre AB, Ounit R, Afshinnekoo E, Prill RJ, Hénaff E, Alexander N, Minot SS, Danko D, Foox J, Ahsanuddin S, Tighe S. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biology. 2017;18(1):1–19. doi: 10.1186/s13059-016-1139-1.
Mikheenko et al. (2018). Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018;34(13):i142–i150. doi: 10.1093/bioinformatics/bty266.
Moss, Maghini & Bhatt (2020). Moss EL, Maghini DG, Bhatt AS. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nature Biotechnology. 2020;38(6):701–707. doi: 10.1038/s41587-020-0422-6.
Ondov, Bergman & Phillippy (2011). Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12(1):1–10. doi: 10.1186/1471-2105-12-1.
Pearson (1990). Pearson WR. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 1990;183:63–98. doi: 10.1016/0076-6879(90)83007-v.
Rhoads & Au (2015). Rhoads A, Au KF. PacBio sequencing and its applications. Genomics, Proteomics & Bioinformatics. 2015;13(5):278–289. doi: 10.1016/j.gpb.2015.08.002.
Risse et al. (2015). Risse J, Thomson M, Patrick S, Blakely G, Koutsovoulos G, Blaxter M, Watson M. A single chromosome assembly of Bacteroides fragilis strain BE1 from Illumina and MinION nanopore sequencing data. Gigascience. 2015;4(1):s13742–015. doi: 10.1186/s13742-015-0101-6.
Seemann (2014). Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–2069. doi: 10.1093/bioinformatics/btu153.
Shabardina et al. (2019). Shabardina V, Kischka T, Manske F, Grundmann N, Frith MC, Suzuki Y, Makałowski W. NanoPipe—a web server for nanopore MinION sequencing data analysis. GigaScience. 2019;8(2):giy169. doi: 10.1093/gigascience/giy169.
Simon et al. (2019). Simon HY, Siddle KJ, Park DJ, Sabeti PC. Benchmarking metagenomics tools for taxonomic classification. Cell. 2019;178(4):779–794. doi: 10.1016/j.cell.2019.07.010.
Ulahannan et al. (2019). Ulahannan N, Pendleton M, Deshpande A, Schwenk S, Behr JM, Dai X, Tyer C, Rughani P, Kudman S, Adney E, Tian H. Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure. bioRxiv. 2019:833590.
Wang et al. (2020). Wang M, Fu A, Hu B, Tong Y, Liu R, Liu Z, Gu J, Xiang B, Liu J, Jiang W, Shen G. Nanopore targeted sequencing for the accurate and comprehensive detection of SARS-CoV-2 and other respiratory viruses. Small. 2020;16(32):2002169. doi: 10.1002/smll.202002169.
Wick et al. (2017a). Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microbial Genomics. 2017a;3(10):e000132. doi: 10.1099/mgen.0.000132.
Wick et al. (2017b). Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLOS Computational Biology. 2017b;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595.
Wick et al. (2015). Wick RR, Schultz MB, Zobel J, Holt KE. Bandage: interactive visualisation of de novo genome assemblies. Bioinformatics. 2015;31(20):3350–3352. doi: 10.1093/bioinformatics/btv383.
Wood, Lu & Langmead (2019). Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology. 2019;20(1):257. doi: 10.1186/s13059-019-1891-0.
Xu et al. (2020). Xu Y, Yang-Turner F, Volk D, Crook D. NanoSPC: a scalable, portable, cloud compatible viral nanopore metagenomic data processing pipeline. Nucleic Acids Research. 2020;48(W1):W366–W371. doi: 10.1093/nar/gkaa413.
Zhou et al. (2020). Zhou Z, Alikhan NF, Mohamed K, Fan Y, Achtman M, Brown D, Chattaway M, Dallman T, Delahay R, Kornschober C, Pietzka A. The EnteroBase user’s guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity. Genome Research. 2020;30(1):138–152. doi: 10.1101/gr.251678.119.

Associated Data

Data Availability Statement

The following information was supplied regarding data availability:

The Bacillus subtilis sequences are available at ENA: SRX6978160.
