2024 Feb 22;25(2):bbae030. doi: 10.1093/bib/bbae030

Table 1.

A Summary of Feature Encoding Schemes, Experimental Datasets and Computational Approaches Proposed for the Enhancer Identification and Strength Prediction Tasks

| Author | Feature Encoding | Classifier | Dataset | Performance |
|---|---|---|---|---|
| Liu et al., 2016 [33] | PseKNC | SVM | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=78.09, SP=75.88, ACC=76.89, MCC=0.54, AU-ROC=0.85; Layer-2 SN=62.21, SP=61.82, ACC=61.93, MCC=0.24, AU-ROC=0.66. Independent: Layer-1 SN=71.0, SP=75.0, ACC=73.00, MCC=0.460, AU-ROC=80.62; Layer-2 SN=47.00, SP=74.00, ACC=60.50, MCC=0.218, AU-ROC=66.78 |
| Jia et al., 2016 [31] | Bi-profile Bayes + nucleotide composition + pseudo-nucleotide composition | SVM | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=71.97, SP=82.82, ACC=77.39, MCC=0.55, AU-ROC=N/A; Layer-2 SN=71.16, SP=65.23, ACC=68.19, MCC=0.36. Independent: Layer-1 SN=73.5, SP=74.5, ACC=74.00, MCC=0.480, AU-ROC=80.13; Layer-2 SN=45.00, SP=65.00, ACC=55.00, MCC=0.102, AU-ROC=57.90 |
| Liu et al., 2016 [59] | Pseudo degenerate kmer nucleotide composition | SVM | Benchmark dataset | Layer-1 SN=77.31, SP=76.30, ACC=76.78, MCC=0.54, AU-ROC=0.85; Layer-2 SN=62.62, SP=64.41, ACC=63.41, MCC=0.27, AU-ROC=0.69 |
| He et al., 2017 [32] | Position-specific trinucleotide propensity + EIIP of trinucleotides + F-score feature selection | SVM | Benchmark dataset | Layer-1 SN=87.94, SP=88.61, ACC=88.27, MCC=0.77; Layer-2 SN=97.98, SP=98.11, ACC=98.05, MCC=0.96 |
| Bin Liu et al., 2018 [43] | kmer + subsequence profile + PseKNC | SVM | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=75.67, SP=80.39, ACC=78.03, MCC=0.5613, AU-ROC=85.47; Layer-2 SN=69.00, SP=61.05, ACC=65.03, MCC=0.3149, AU-ROC=69.57. Independent: Layer-1 SN=71.0, SP=78.5, ACC=74.75, MCC=0.496, AU-ROC=81.73; Layer-2 SN=54.00, SP=68.00, ACC=61.00, MCC=0.222, AU-ROC=68.01 |
| Tan et al., 2019 [44] | One hot encoding + physicochemical properties | CNN+RNN Ensemble | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=73.25, SP=76.42, ACC=74.83, MCC=0.498, AU-ROC=76.94; Layer-2 SN=58.96, SP=38.28, ACC=79.65, MCC=0.197, AU-ROC=60.68. Independent: Layer-1 SN=75.5, SP=76.00, ACC=75.50, MCC=0.51, AU-ROC=77.04; Layer-2 SN=83.15, SP=45.61, ACC=68.49, MCC=0.312, AU-ROC=67.14 |
| Le et al., 2019 [37] | Neural word embeddings | SVM | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=81.1, SP=83.5, ACC=82.3, MCC=0.65; Layer-2 SN=75.3, SP=60.8, ACC=68.1, MCC=0.37. Independent: Layer-1 SN=82, SP=76, ACC=79, MCC=0.58; Layer-2 SN=74, SP=53, ACC=63.5, MCC=0.28 |
| Asim et al., 2020 [2] | K-mer representation by fusing k-mer positional information with sequence type | Precise Softmax Classifier | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=76.0, SP=76.0, ACC=76.0, MCC=0.52; Layer-2 SN=67.0, SP=67.0, ACC=63.0, MCC=0.26. Independent: Layer-1 SN=78.0, SP=77, ACC=78, MCC=0.56; Layer-2 SN=83.0, SP=67.0, ACC=83.0, MCC=0.70 |
| Le et al., 2021 [48] | BERT embeddings | CNN | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=79.5, SP=73, ACC=76.2, MCC=0.525. Independent: Layer-1 SN=80, SP=71.2, ACC=75.6, MCC=0.514 |
| Lim et al., 2021 [60] | binary + debinary + ANF + NCP + ENAC, KGAP | RF | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=73.64, SP=78.71, ACC=76.18, MCC=0.5264, AUC=84; Layer-2 SN=68.46, SP=56.61, ACC=62.53, MCC=0.2529, AUC=67. Independent: Layer-1 SN=78.50, SP=81, ACC=79.75, MCC=0.5952; Layer-2 SN=93, SP=77.0, ACC=85, MCC=0.7091 |
| Cai et al., 2021 [45] | Mismatch k-tuple + PSSM + Spectrum + Subsequence Profile + PseDNC | XGBoost | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=75.7, SP=86.5, ACC=58.55, MCC=0.62; Layer-2 SN=74.94, SP=58.55, ACC=66.74, MCC=0.33. Independent: Layer-1 SN=74.0, SP=77.5, ACC=75.75, MCC=0.514; Layer-2 SN=70.0, SP=57.0, ACC=63.5, MCC=0.272 |
| Kamran et al., 2022 [16] | One hot encoding | CNN | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=86.99, SP=88.54, ACC=87.77, MCC=0.75; Layer-2 SN=83.57, SP=78.16, ACC=80.86, MCC=0.62. Independent: Layer-1 SN=81.5, SP=67, ACC=74.02, MCC=0.4902; Layer-2 SN=73.0, SP=49.0, ACC=61, MCC=0.226 |
| Yang et al., 2021 [36] | Skip-gram | GAN | Independent test set | Layer-1 SN=81.1, SP=75.8, ACC=78.4, MCC=0.567; Layer-2 SN=96.1, SP=53.7, ACC=74.9, MCC=0.505 |
| Liao et al., 2022 [52] | Word2vec | CNN-LSTM-Attention | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=84.18, SP=82.45, ACC=83.32, MCC=0.666; Layer-2 SN=89.27, SP=77.33, ACC=83.3, MCC=0.673. Independent: Layer-1 SN=78, SP=78.50, ACC=78.25, MCC=0.565; Layer-2 SN=87, SP=69.0, ACC=78, MCC=0.569 |
| Luo et al., 2022 [47] | DNABERT | CNN | Benchmark dataset & independent test set | Benchmark: Layer-1 ACC=79.4, MCC=0.593, AUC=87.9; Layer-2 ACC=65.3, MCC=0.31, AUC=70.3. Independent: Layer-1 ACC=79.3, MCC=0.585, AUC=84.4; Layer-2 ACC=70.1, MCC=0.401, AUC=81.2 |
| Geng et al., 2022 [51] | GAN + FastText | LSTM-CNN | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=74.87, SP=75.63, ACC=75.25, MCC=0.5051; Layer-2 SN=70.68, SP=68.89, ACC=69.7, MCC=0.3954. Independent: Layer-1 SN=74.87, SP=75.63, ACC=75.25, MCC=0.5051; Layer-2 SN=70.68, SP=68.89, ACC=69.7, MCC=0.3954 |
| Xiao et al., 2023 [58] | kmer + ANF + NBP | RBF | Benchmark dataset & independent test set | Benchmark: Layer-1 SN=79.52, SP=82.95, ACC=81.23, MCC=0.6254, AUC=88.09; Layer-2 SN=77.23, SP=79.69, ACC=76.95, MCC=0.5419, AUC=84.09. Independent: Layer-1 SN=82, SP=77.50, ACC=79.75, MCC=0.5956; Layer-2 SN=100, SP=67.0, ACC=83.5, MCC=0.7098 |
| Li et al., 2023 [49] | BERT | Linear | Benchmark dataset | Layer-1 SN=93.73, SP=95.75, ACC=94.74, MCC=0.8951; Layer-2 SN=80, SP=86, ACC=83, MCC=0.6612 |
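
The table reports SN, SP, ACC and MCC without defining them. Assuming these abbreviations carry their usual meanings (sensitivity, specificity, accuracy and Matthews correlation coefficient), the following minimal sketch shows how such values are commonly derived from confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute SN, SP, ACC (as percentages) and MCC from confusion-matrix counts.

    Assumes the standard textbook definitions; the cited studies may differ
    in rounding or in how they average over cross-validation folds.
    """
    sn = 100.0 * tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = 100.0 * tn / (tn + fp)                    # specificity (recall on negatives)
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts for a balanced test set of 200 positives and 200 negatives.
example = binary_metrics(tp=150, fn=50, tn=160, fp=40)
print({name: round(value, 4) for name, value in example.items()})
```

On a test set with equal numbers of positives and negatives, ACC under these definitions equals the mean of SN and SP, which gives a quick sanity check when reading rows of the table.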