2024 May 16;30:1611676. doi: 10.3389/pore.2024.1611676

TABLE 1.

Summary of the most recent and commonly used long-read bioinformatics tools.

Long-read bioinformatics tools
| Data analysis step | Tool name | Background and performance | References |
| --- | --- | --- | --- |
| Whole analysis process | PacBio: SMRT Link (Pacific Biosciences); Nanopore: EPI2ME Labs (Oxford Nanopore Technologies) | Complex, user-friendly interfaces capable of performing the whole analysis process except error correction | |
| QC metrics | FastQC, MultiQC, LongQC, NanoPack, MinIONQC, NanoR, RNA-SeQC | Quality control (QC) tools suitable for both long- and short-read sequencing approaches. They run QC checks on raw sequence data (FastQC) or on whole datasets (MultiQC) and give detailed feedback on any problems found. A dedicated algorithm (RNA-SeQC) was developed for RNA-seq data | [47–54] |
| Base calling | SMRT analysis tools, Dorado, Guppy | Base calling methods based on neural networks and statistical methods. SMRT reads require specific analysis tools; Dorado and Guppy were developed for nanopore sequencing (NS) reads | [55–57] |
| Variant calling | Clair3, Sniffles | Sniffles performs structural variant calling on noisy long-read data. Clair3 is a deep-neural-network-based variant caller that is also capable of haplotype-sensitive variant detection and of detecting variants in sequencing data containing modified bases | [58–60] |
| Variant calling | wf-human-variation, wf-somatic-variation | Complex, command-line-compatible workflows for NS variant detection. Tumor and normal data can be analyzed separately or together on demand, and detailed analysis reports are produced | [61] |
| Modified base calling | Modbamtools, Guppy, Medaka, DeepSignal, DeepMod | Tools to manipulate and visualize DNA/RNA base modification and methylation data stored in .bam format. Some of them are suitable for all long-read techniques. The detectable modified bases are 5mC, 5hmC and 6mA | [33, 57–59, 62, 63] |
| Genome assembly | Flye, Canu, HiCanu, BLASR, FALCON | De novo genome assembly of high-noise single-molecule sequencing data. Some tools are based on graph construction (Flye), others use a hierarchical genome assembly process with clustering (BLASR) and overlap-based error correction, and some also carry out phasing (FALCON) | [64–68] |
| Visualization | NanoPack; R packages: maftools, ggplot2; Python packages: matplotlib (pyVolcano) | Packages offering universal and problem-specific solutions for long-read data visualization | [50, 69–72] |
| Error correction | Pilon, Racon, DeepConsensus, Medaka | Neural-network- and transformer-based methods intended as standalone modules for correcting raw contigs generated by rapid assembly methods, with or without a consensus step. An advantage of transformer-based error correction methods is that they leverage a unique alignment loss to correct sequencing errors | [33, 35, 71] |

Additional packages are listed at https://long-read-tools.org and on other bioinformatics-related pages.
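
To give a concrete sense of what the QC metrics row of Table 1 refers to, the following is a minimal Python sketch of three summary statistics that are commonly reported for long-read data: read count, read-length N50, and mean read quality. It is an illustration only and does not reproduce the implementation of any tool listed in the table. It assumes an uncompressed FASTQ stream with exactly four lines per record and Phred+33 quality encoding; the function names (`parse_fastq`, `mean_phred`, `n50`, `qc_summary`) are hypothetical, and averaging quality in error-probability space is one common convention rather than the only one.

```python
import io
import math


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of stream
            return
        if not header.strip():  # tolerate blank lines between or after records
            continue
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean read quality, averaged over error probabilities (assumes Phred+33)."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads


def qc_summary(handle):
    """Collect basic per-dataset QC statistics from a FASTQ stream."""
    lengths, quals = [], []
    for _, seq, qual in parse_fastq(handle):
        if not seq or not qual:
            continue  # skip empty or truncated records
        lengths.append(len(seq))
        quals.append(mean_phred(qual))
    n_reads = len(lengths)
    return {
        "reads": n_reads,
        "bases": sum(lengths),
        "n50": n50(lengths),
        "mean_length": sum(lengths) / n_reads if n_reads else 0.0,
        "mean_quality": sum(quals) / n_reads if n_reads else 0.0,
    }


if __name__ == "__main__":
    # Toy input: three made-up reads, not real sequencing data.
    toy = io.StringIO(
        "@r1\nACGT\n+\nIIII\n"
        "@r2\nACGTAC\n+\nIIIIII\n"
        "@r3\nACGTACGTAC\n+\nIIIIIIIIII\n"
    )
    print(qc_summary(toy))
```

For real datasets, which are usually gzip-compressed and far larger, the dedicated QC tools in Table 1 are the appropriate choice. The sketch is only meant to show what the reported numbers mean.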