2019 Jul 3;25:104209. doi: 10.1016/j.dib.2019.104209

Table 2.

Features calculated for each sequence.

| Feature name | Description | Dimension |
|---|---|---|
| nt_proportion | Proportion of each base in the sequence (A, C, G and T) | 4 |
| dinucleotide_proportion | Proportion of each dinucleotide, giving 16 features, one per ordered pair of the 4 nucleotides | 16 |
| gc_content | Proportion of guanine and cytosine in the sequence | 1 |
| gc_ratio | Ratio between guanine and cytosine | 1 |
| sequence_length | Length of the sequence | 1 |
| stem_number | Number of stem-loops | 1 |
| avg_bp_stem | Average number of nucleotides per stem | 1 |
| longest_stem_length | Length of the longest region where the pairing is perfect | 1 |
| terminal_loop_length | Number of nucleotides in the terminal loop region | 1 |
| bp_number | Number of base pairs | 1 |
| dP | Number of base pairs divided by the number of nucleotides | 1 |
| bp_proportion | Number of each possible base pair, normalized by sequence length | 3 |
| bp_proportion_stem | Proportion of base pairs on stems | 3 |
| triplets | Frequencies of secondary-structure triplets: the 32 combinations of the pairing state (paired or unpaired) of 3 adjacent positions with the nucleotide at the middle position | 32 |
| MFE | Minimum free energy | 1 |
| EFE | Normalized ensemble free energy, calculated with RNAfold (-p option) | 1 |
| ensemble_frequency | Frequency of the minimum free energy structure in the ensemble | 1 |
| diversity | Structural diversity, calculated with RNAfold (-p option) | 1 |
| mfe_efe_difference | Calculated as $\lvert \mathrm{MFE} - \mathrm{EFE} \rvert / L$, where $L$ is the sequence length | 1 |
| dQ | Calculated as $\frac{1}{L}\sum_{i<j} p_{ij}\log_2 p_{ij}$, where $L$ is the sequence length and $p_{ij}$ is the probability of pairing between nucleotides $i$ and $j$ | 1 |
| dG | Minimum free energy divided by sequence length | 1 |
| MFEI1 | Ratio between the minimum free energy and the %G+C content | 1 |
| MFEI2 | dG/Ns, where Ns is the number of stems | 1 |
| MFEI4 | MFE/Nb, where Nb is the total number of base pairs in the secondary structure | 1 |
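
To make a few of the simpler definitions in Table 2 concrete, the following is a minimal Python sketch, not the authors' feature-extraction pipeline. It assumes the secondary structure is already available as a dot-bracket string and the MFE as a number (for instance, taken from RNAfold output; folding is not performed here). The function names `sequence_features` and `structure_features`, the toy hairpin, and the MFE value in the usage lines are hypothetical, and `gc_ratio` is assumed here to mean G/C.

```python
from itertools import product

BASES = "ACGT"


def sequence_features(seq):
    """Composition features of Table 2 for a sequence of at least 2 bases."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA alphabet
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least 2 nucleotides")
    counts = {b: seq.count(b) for b in BASES}
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    return {
        # nt_proportion: 4 values
        "nt_proportion": {b: counts[b] / n for b in BASES},
        # dinucleotide_proportion: 16 values, one per ordered pair of bases
        "dinucleotide_proportion": {
            a + b: pairs.count(a + b) / len(pairs)
            for a, b in product(BASES, repeat=2)
        },
        "gc_content": (counts["G"] + counts["C"]) / n,
        # Assumption: ratio taken as G/C; undefined (None) without any C.
        "gc_ratio": counts["G"] / counts["C"] if counts["C"] else None,
        "sequence_length": n,
    }


def structure_features(seq, dot_bracket, mfe):
    """Structure-derived features, given a dot-bracket string and an MFE."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    n = len(seq)
    bp_number = dot_bracket.count("(")  # each '(' opens one base pair
    gc_percent = 100 * sequence_features(seq)["gc_content"]
    return {
        "bp_number": bp_number,
        "dP": bp_number / n,   # base pairs per nucleotide
        "dG": mfe / n,         # MFE normalized by length
        # MFEI1 as described in the table: MFE over %G+C.
        "MFEI1": mfe / gc_percent if gc_percent else None,
        # MFEI4: MFE per base pair.
        "MFEI4": mfe / bp_number if bp_number else None,
    }


if __name__ == "__main__":
    # Hypothetical toy hairpin and MFE value, for illustration only.
    print(sequence_features("GGGAAACCC"))
    print(structure_features("GGGAAACCC", "(((...)))", -1.2))
```

Features that depend on stem segmentation (stem_number, avg_bp_stem, MFEI2) or on the base-pair probability matrix (EFE, diversity, dQ) are left out, since they need a structure parser or the partition-function output of RNAfold.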