Table 1.
Recommended tools for predicting pAs from DNA sequences, bulk RNA-seq, and scRNA-seq
Category | Tool | Year | Description | Refs. |
---|---|---|---|---|
Web servers for predicting pAs from DNA sequences | Dragon PolyA Spotter | 2012 | A web server for predicting 12 poly(A) motifs from human DNA sequences, using an artificial neural network and a random forest | [60] |
| | PolyApred | 2009 | An SVM-based web server for predicting 13 poly(A) motifs in human, using sequence features of different types of nucleotide frequencies and binary pattern | [58] |
| | Polyadq | 1999 | An early web server based on two quadratic discriminant functions for predicting AAUAAA/AUUAAA signals, using features encoded by position weight matrix | [57] |
DL-based tools for predicting pAs from DNA sequences | PASNet | 2021 | A hybrid DL framework for identifying 16 poly(A) motifs in different species, which integrates gated convolutional highway networks with self-attention mechanisms | [75] |
| | SANPolyA | 2020 | A self-attention DL model for predicting 18 poly(A) motifs in human and mouse | [74] |
| | HybPAS | 2019 | A hybrid model for predicting 12 poly(A) motifs in human, using eight neural networks and four logistic regression models | [73] |
| | APARENT | 2019 | A model trained on isoform expression data from more than three million synthetic APA reporters | [29] |
| | DeepPASTA | 2019 | A model based on CNN and RNN for predicting pAs from both sequence and RNA secondary structure | [28] |
| | DeeReCT-PolyA | 2018 | A transferable CNN model for recognition of 12 poly(A) motifs, which enables transfer learning across datasets and species | [26] |
| | DeepGSR | 2018 | An approach based on CNN and one-hot features to predict genome-wide and cross-organism genomic signals and regions | [27] |
| | DeepPolyA | 2018 | A model for predicting pAs in Arabidopsis with one-hot encoding features | [71] |
Traditional ML-based tools for predicting pAs from DNA sequences | PASS | 2007 | A GHMM-based model for predicting pAs in plants | [68] |
| | polya_svm | 2006 | An SVM-based tool for predicting pAs using position-specific scoring matrices to score 15 cis-regulatory elements | [24] |
Methods for bulk RNA-seq that rely on prior annotations of pAs | QAPA | 2018 | It compiles an expanded compendium of known pA annotations for identifying and quantifying pAs; Shah et al. [50] suggested using it in combination with pAs derived from 3′ seq or Iso-Seq | [38] |
| | PAQR | 2018 | It uses read coverage to segment 3′ UTRs at annotated pAs | [86] |
Methods for bulk RNA-seq that are based on detecting changes in read density | mountainClimber | 2019 | It runs on a single RNA-seq sample and can recognize multiple TSSs or pAs | [93] |
| | APAtrap | 2018 | It can detect all pAs along the 3′ UTR and can be used to improve 3′ end annotations | [39] |
| | TAPAS | 2018 | It adopts a change-point detection method originally used for time-series data; several benchmark studies [49], [50] suggested that it has overall high performance | [40] |
| | DaPars, DaPars2 | 2014, 2018 | DaPars is probably the first and the most widely used tool for bulk RNA-seq, and DaPars2 is its updated version | [17], [90], [91] |
Methods for bulk RNA-seq that are based on ML models | Aptardi | 2021 | A multi-omics DL-based approach for predicting pAs by leveraging DNA sequences, RNA-seq, and the predilection of transcriptome assemblers; however, its sensitivity may be low according to our preliminary test (Figure 3) | [98] |
| | Terminitor | 2020 | A DL-based model for a three-label classification problem, which classifies a site as a poly(A) cleavage site, a non-polyadenylated cleavage site, or a non-cleavage site | [97] |
| | TECtool | 2018 | It is based on transcriptome assembly and prior pA annotations, and can predict novel terminal exons | [95] |
Methods for predicting pAs from scRNA-seq | scDaPars | 2021 | It is applicable to both full-length and 3′ tag scRNA-seq; it uses DaPars to infer pAs and may be slow for large-scale scRNA-seq data | [45] |
| | MAAPER | 2021 | An annotation-assisted method for both bulk RNA-seq and 3′ tag scRNA-seq data, which incorporates prior pAs in the PolyA_DB for identifying pAs in 3′ UTRs and introns | [105] |
| | SCAPTURE | 2021 | An annotation-assisted pipeline that implements a DL model to evaluate called peaks from 3′ tag scRNA-seq, using prior pAs from four databases for model training | [106] |
| | scUTRquant | 2021 | An annotation-assisted method that incorporates a pA atlas established from a mouse full-length Microwell-seq dataset of 400,000 single cells [108] for filtering pAs predicted from 3′ tag scRNA-seq | [107] |
| | SCAPE | 2022 | A peak calling-based method that uses a probabilistic mixture model and insert size information to identify and quantify pAs in 3′ tag scRNA-seq | [103] |
| | ReadZS | 2021 | A statistical approach to characterize read distributions that bypasses parametric peak calling and identifies pAs from 3′ tag scRNA-seq | [104] |
| | scAPAtrap | 2020 | A peak calling-based method that incorporates poly(A) reads for genome-wide pA prediction from 3′ tag scRNA-seq | [44] |
| | Sierra | 2020 | A splice-aware peak calling-based method that can identify pAs in 3′ UTRs and introns from 3′ tag scRNA-seq | [43] |
Note: Tools are chosen based on criteria such as availability, function, ease of use, and popularity. pA, poly(A) site; ML, machine learning; SVM, support vector machine; GHMM, generalized hidden Markov model; CNN, convolutional neural network; RNN, recurrent neural network; scRNA-seq, single-cell RNA sequencing; 3′ seq, 3′ end sequencing; Iso-Seq, isoform-sequencing; RNA-seq, RNA sequencing; DL, deep learning; UTR, untranslated region; TSS, transcription start site.
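
Several sequence-based tools in Table 1 are described as taking one-hot encoded DNA as input (e.g., DeepGSR and DeepPolyA) or as targeting poly(A) signal motifs such as AAUAAA/AUUAAA. The sketch below is only an illustration of those two ideas and is not the implementation of any listed tool. The function names `one_hot` and `find_signals`, the A/C/G/T column order, the all-zero encoding of ambiguous bases, and the example sequence are all assumptions made for this example.

```python
# Illustrative sketch only; not taken from any tool in Table 1.
BASES = "ACGT"                   # assumed column order for the encoding
SIGNALS = ("AATAAA", "ATTAAA")   # DNA forms of the AAUAAA/AUUAAA signals


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Bases outside A/C/G/T (e.g., N) become all-zero rows (an assumption).
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        col = index.get(base)
        if col is not None:
            row[col] = 1
        rows.append(row)
    return rows


def find_signals(seq, signals=SIGNALS):
    """Return sorted (0-based position, motif) pairs, overlaps included."""
    seq = seq.upper()
    hits = []
    for motif in signals:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)


example = "GGCAATAAAGTCTNATTAAACC"  # made-up sequence
encoded = one_hot(example)
hits = find_signals(example)
print(len(encoded), hits)
```

In practice, the tools listed above combine such encodings with trained models (CNNs, RNNs, SVMs, and so on); exact motif matching alone would not distinguish true poly(A) signals from chance occurrences of the same hexamers.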