2022 Sep 24;21(1):67–83. doi: 10.1016/j.gpb.2022.09.005

Table 1.

Recommended tools for predicting pAs from DNA sequences, bulk RNA-seq, and scRNA-seq

| Category | Tool | Year | Description | Refs. |
| --- | --- | --- | --- | --- |
| Web servers for predicting pAs from DNA sequences | Dragon PolyA Spotter | 2012 | A web server for predicting 12 poly(A) motifs from human DNA sequences, using an artificial neural network and a random forest | [60] |
| | PolyApred | 2009 | An SVM-based web server for predicting 13 poly(A) motifs in human, using sequence features based on several types of nucleotide frequencies and binary patterns | [58] |
| | Polyadq | 1999 | An early web server based on two quadratic discriminant functions for predicting AAUAAA/AUUAAA signals, using features encoded by a position weight matrix | [57] |
| DL-based tools for predicting pAs from DNA sequences | PASNet | 2021 | A hybrid DL framework for identifying 16 poly(A) motifs in different species, which integrates gated convolutional highway networks with self-attention mechanisms | [75] |
| | SANPolyA | 2020 | A self-attention DL model for predicting 18 poly(A) motifs in human and mouse | [74] |
| | HybPAS | 2019 | A hybrid model for predicting 12 poly(A) motifs in human, using eight neural networks and four logistic regression models | [73] |
| | APARENT | 2019 | A model trained on isoform expression data from more than three million synthetic APA reporters | [29] |
| | DeepPASTA | 2019 | A model based on CNN and RNN for predicting pAs from both sequence and RNA secondary structure | [28] |
| | DeeReCT-PolyA | 2018 | A transferable CNN model for recognizing 12 poly(A) motifs, which enables transfer learning across datasets and species | [26] |
| | DeepGSR | 2018 | An approach based on CNN and one-hot features to predict genome-wide and cross-organism genomic signals and regions | [27] |
| | DeepPolyA | 2018 | A model for predicting pAs in Arabidopsis with one-hot encoding features | [71] |
| Traditional ML-based tools for predicting pAs from DNA sequences | PASS | 2007 | A GHMM-based model for predicting pAs in plants | [68] |
| | polya_svm | 2006 | An SVM-based tool for predicting pAs, using position-specific scoring matrices to score 15 cis-regulatory elements | [24] |
| Methods for bulk RNA-seq that rely on prior annotations of pAs | QAPA | 2018 | It compiles an expanded compendium of known pA annotations for identifying and quantifying pAs; Shah et al. [50] suggested using it in combination with pAs derived from 3′ seq or Iso-Seq | [38] |
| | PAQR | 2018 | It uses read coverage to segment 3′ UTRs at annotated pAs | [86] |
| Methods for bulk RNA-seq that are based on detecting changes in read density | mountainClimber | 2019 | It runs on a single RNA-seq sample and can recognize multiple TSSs or pAs | [93] |
| | APAtrap | 2018 | It can detect all pAs along the 3′ UTR and can be used to improve 3′ end annotations | [39] |
| | TAPAS | 2018 | It adopts a change-point detection method originally used for time-series data; several benchmark studies [49], [50] suggested that it has high overall performance | [40] |
| | DaPars, DaPars2 | 2014, 2018 | DaPars is probably the first and the most widely used tool for bulk RNA-seq, and DaPars2 is its updated version | [17], [90], [91] |
| Methods for bulk RNA-seq that are based on ML models | Aptardi | 2021 | A multi-omics DL-based approach for predicting pAs by leveraging DNA sequences, RNA-seq, and the predilection of transcriptome assemblers; however, its sensitivity may be low according to our preliminary test (Figure 3) | [98] |
| | Terminitor | 2020 | A DL-based model for a three-label classification problem, which determines whether a site is a poly(A) cleavage site, a non-polyadenylated cleavage site, or a non-cleavage site | [97] |
| | TECtool | 2018 | It is based on transcriptome assembly and prior pA annotations, and can predict novel terminal exons | [95] |
| Methods for predicting pAs from scRNA-seq | scDaPars | 2021 | It is applicable to both full-length and 3′ tag scRNA-seq; it uses DaPars to infer pAs and may be slow for large-scale scRNA-seq | [45] |
| | MAAPER | 2021 | An annotation-assisted method for both bulk RNA-seq and 3′ tag scRNA-seq data, which incorporates prior pAs in the PolyA_DB for identifying pAs in 3′ UTRs and introns | [105] |
| | SCAPTURE | 2021 | An annotation-assisted pipeline that implements a DL model to evaluate called peaks from 3′ tag scRNA-seq, using prior pAs from four databases for model training | [106] |
| | scUTRquant | 2021 | An annotation-assisted method that incorporates a pA atlas established from a mouse full-length Microwell-seq dataset of 400,000 single cells [108] for filtering pAs predicted from 3′ tag scRNA-seq | [107] |
| | SCAPE | 2022 | A peak-calling method built on a probabilistic mixture model for identifying and quantifying pAs in 3′ tag scRNA-seq by utilizing insert size information | [103] |
| | ReadZS | 2021 | A statistical approach to characterize read distributions that bypasses parametric peak calling and identifies pAs from 3′ tag scRNA-seq | [104] |
| | scAPAtrap | 2020 | A peak calling-based method that incorporates poly(A) reads for genome-wide pA prediction from 3′ tag scRNA-seq | [44] |
| | Sierra | 2020 | A splice-aware peak calling-based method that can identify pAs in 3′ UTRs and introns from 3′ tag scRNA-seq | [43] |

Note: Tools are chosen based on criteria such as availability, function, ease of use, and popularity. pA, poly(A) site; ML, machine learning; SVM, support vector machine; GHMM, generalized hidden Markov model; CNN, convolutional neural network; RNN, recurrent neural network; scRNA-seq, single-cell RNA sequencing; 3′ seq, 3′ end sequencing; Iso-Seq, isoform-sequencing; RNA-seq, RNA sequencing; DL, deep learning; UTR, untranslated region; TSS, transcription start site.
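
Several sequence-based entries in Table 1 mention one-hot encoding features and the AAUAAA/AUUAAA poly(A) signals. As a rough illustration of those two ideas only, the Python sketch below one-hot encodes a DNA sequence and scans it for exact matches to the DNA forms of the two hexamers (AATAAA and ATTAAA). The function and constant names are hypothetical and are not taken from any tool listed in the table. Exact hexamer matching is also far simpler than the listed predictors, which learn from broader sequence context.

```python
# Illustrative sketch only: all names here are hypothetical and do not
# come from any tool listed in Table 1.
BASES = "ACGT"
PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # DNA forms of AAUAAA / AUUAAA


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element rows ordered A, C, G, T.

    Any character outside ACGT (for example N) becomes an all-zero row.
    """
    seq = seq.upper()
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def find_pas_hexamers(seq, hexamers=PAS_HEXAMERS):
    """Return (0-based start, hexamer) for every exact hexamer match.

    RNA input is accepted: U is converted to T before scanning.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        if window in hexamers:
            hits.append((i, window))
    return hits


if __name__ == "__main__":
    example = "GGCAATAAAGTCTGATTAAACC"
    print(one_hot(example[:4]))
    print(find_pas_hexamers(example))
```

For the example sequence, the scan reports one AATAAA match and one ATTAAA match. A real predictor would treat such matches as candidate signals at most, since the hexamers also occur at positions that are not functional poly(A) sites.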