2023 Jul 13;12(7):997. doi: 10.3390/biology12070997

Table 3.

Bioinformatic steps and tools used for NGS data analysis.

| Analysis | Commonly Used Tools |
|---|---|
| **Common Analysis** | |
| Quality check of sequences | FastQC [90], FASTX-toolkit [91], MultiQC [92] |
| Trimming of adaptors and low-quality bases | Trimmomatic [93], Cutadapt [94], fastp [95] |
| Alignment of sequence reads to reference genome | BWA [96], Bowtie [97], DRAGMAP [98] |
| Report visualization | MultiQC [92] |
| **Whole-Genome Sequencing/Whole-Exome Sequencing/Targeted Panel** | |
| Removal of duplicate reads | Picard [99], Sambamba [100] |
| Variant calling (single-nucleotide polymorphisms and indels) | GATK [101], FreeBayes [102], Platypus [103], VarScan [104], DeepVariant [105], Illumina DRAGEN [106] |
| Filter and merge variants | bcftools [107] |
| Variant annotation | ANNOVAR [108], Ensembl VEP [109], SnpEff [110], Nirvana [111] |
| Structural variant calling | DELLY [112], LUMPY [113], Manta [114], GRIDSS [115], Wham [116], Pindel [117] |
| Copy number variation (CNV) calling | CNVnator [118], GATK gCNV [119], cn.MOPS [120], cnvCapSeq (targeted sequencing) [121], ExomeDepth (CNVs from exome) [122] |
| **Transcriptomics** | |
| Alignment of reads to reference | Splice-aware aligners such as TopHat2 [123], HISAT2 [124], and STAR [125] |
| Transcript quantification | featureCounts [126], HTSeq-count [127], Salmon [128], kallisto [129] |
| Differential gene expression analysis and enrichment of gene categories | DESeq2 [130], edgeR [131], DAVID [132], clusterProfiler [133], Enrichr [134] |
| **Epigenomics-Methyl Seq** | |
| Sequence aligners | bwa-meth [135], BS-Seeker2 [136], Bismark [137] |
| Methylation level quantification | MethylDackel * |
| Differential methylation | Metilene [138], BSsmooth [139], methylKit [140] |
| **Epigenomics-ChIP seq** | |
| Removal of PCR duplicates | Samtools [107] |
| Peak calling | MACS2 [141], SICER2 [142], SPP [143] |
| Peak filtering | Bedtools [144] |
| Enrichment quality control | ChIPQC [145], Phantompeakqualtools [146] |
| Enrichment comparison | DiffBind [147], MAnorm [148], MMDiff [149] |
| Motif analysis | MEME-ChIP [150], HOMER [151], RSAT [152] |
| **16S rRNA seq** | |
| 16S rRNA seq analysis pipelines | QIIME2 [82], mothur [153], USEARCH [154] |
| Ribosomal RNA databases | Greengenes [155], SILVA [156], RDP [157] |
| **Shotgun Metagenomics** | |
| Taxonomic classification | MetaPhlAn4 [158], Kaiju [159], Kraken [160] |
| Assembly of metagenomic reads | metaSPAdes [86], Meta-IDBA [87] |
| Protein databases for taxonomic classification | NCBI non-redundant protein database [83] |
| Gene annotation | Prokka [88], MetaGeneMark [89] |
| Databases for functional annotation of genes | COG [161], KEGG [84], GO [85] |

Footnote: ANNOVAR: ANNOtate VARiation; BWA: Burrows-Wheeler Aligner; cn.MOPS: Copy Number Estimation by a Mixture Of PoissonS; COG: Clusters of Orthologous Groups of Proteins; DAVID: Database for Annotation, Visualization and Integrated Discovery; Ensembl VEP: Ensembl Variant Effect Predictor; fastp: FASTQ Preprocessor; GATK: Genome Analysis Tool Kit; GO: Gene Ontology; HISAT2: Hierarchical Indexing for Spliced Alignment of Transcripts; HOMER: Hypergeometric Optimization of Motif EnRichment; HTSeq-count: High-Throughput Sequence Analysis in Python; KEGG: Kyoto Encyclopedia of Genes and Genomes; NCBI: National Center for Biotechnology Information; MACS: Model-Based Analysis for ChIP-Seq; MEME: Multiple EM for Motif Elicitation; Meta-IDBA: Meta-Iterative De Bruijn Graph De Novo Short-Read Assembler; MetaPhlAn: Metagenomic Phylogenetic Analysis; metaSPAdes: meta St Petersburg Genome Assembler; QIIME: Quantitative Insights Into Microbial Ecology; RDP: Ribosomal Database Project; RSAT: Regulatory Sequence Analysis Tools; SICER: Spatial Clustering Approach for the Identification of ChIP-Enriched Regions; SPP: The Signaling Pathways Project; STAR: Spliced Transcripts Alignment to a Reference. * Available at: https://github.com/dpryan79/MethylDackel/ (accessed on 1 June 2023). Bold represents the categories of analysis and the commonly used bioinformatics tools for NGS data analysis.
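
To illustrate how the common and whole-genome rows of Table 3 chain together, the following is a minimal sketch that only assembles command lines, one tool per step, without running anything. The choice of tool for each step, the function name `build_wgs_commands`, all file names, the read-group string, and the `QUAL>=30` threshold are illustrative assumptions and not recommendations from the article. A real run would also need the output directory to exist, a BWA index of the reference, and the reference `.fai` and `.dict` files that GATK expects.

```python
import shlex


def build_wgs_commands(ref, r1, r2, sample, outdir="out", threads=4):
    """Return (step, argv) pairs for a paired-end short-read workflow.

    Nothing is executed here. File names are hypothetical placeholders.
    """
    t1 = f"{outdir}/{sample}_1.trimmed.fastq.gz"
    t2 = f"{outdir}/{sample}_2.trimmed.fastq.gz"
    sam = f"{outdir}/{sample}.sam"
    bam = f"{outdir}/{sample}.sorted.bam"
    dedup = f"{outdir}/{sample}.dedup.bam"
    metrics = f"{outdir}/{sample}.dup_metrics.txt"
    vcf = f"{outdir}/{sample}.vcf.gz"
    filtered = f"{outdir}/{sample}.filtered.vcf.gz"
    # GATK needs a read group; bwa expands the literal "\t" sequences itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    return [
        # Quality check of the raw reads (the output directory must exist).
        ("quality check", ["fastqc", r1, r2, "-o", outdir]),
        # Adaptor and low-quality base trimming with default settings.
        ("trimming", ["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2]),
        # Alignment of the trimmed reads to the reference genome.
        ("alignment", ["bwa", "mem", "-t", str(threads), "-R", read_group,
                       "-o", sam, ref, t1, t2]),
        # Coordinate sorting, required before duplicate marking.
        ("sorting", ["samtools", "sort", "-o", bam, sam]),
        # Duplicate marking (legacy Picard KEY=VALUE argument syntax).
        ("duplicate marking", ["picard", "MarkDuplicates",
                               f"I={bam}", f"O={dedup}", f"M={metrics}"]),
        ("indexing", ["samtools", "index", dedup]),
        # SNP and indel calling.
        ("variant calling", ["gatk", "HaplotypeCaller",
                             "-R", ref, "-I", dedup, "-O", vcf]),
        # Keep variants passing an arbitrary, assumed quality cutoff.
        ("variant filtering", ["bcftools", "view", "-i", "QUAL>=30",
                               "-Oz", "-o", filtered, vcf]),
    ]


# Dry run: print each step as a copy-pasteable shell command.
for step, argv in build_wgs_commands("ref.fa", "s_1.fastq.gz", "s_2.fastq.gz", "s"):
    print(f"# {step}")
    print(shlex.join(argv))
```

Returning argument lists instead of shell strings keeps quoting explicit, so each entry could later be passed to `subprocess.run(argv, check=True)` one step at a time. Annotation, structural variant calling, and CNV calling would follow as further entries using the tools listed in the table.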