Table 2.
Features calculated for each sequence.
Feature name | Description | Dimension |
---|---|---|
nt_proportion | Proportion of each base in the sequence (A, C, G and T) | 4 |
dinucleotide_proportion | Proportion of each dinucleotide, giving 16 features for the possible ordered pairs of the 4 nucleotides | 16 |
gc_content | Proportion of guanine and cytosine in the sequence | 1 |
gc_ratio | Ratio between guanine and cytosine | 1 |
sequence_length | The length of the sequence | 1 |
stem_number | The number of stem-loops | 1 |
avg_bp_stem | Average number of nucleotides per stem | 1 |
longest_stem_length | Longest region where the pairing is perfect | 1 |
terminal_loop_length | Number of nucleotides in the terminal loop | 1 |
bp_number | Number of base-pairs | 1 |
dP | Number of base pairs divided by the number of nucleotides | 1 |
bp_proportion | Number of each possible base pair normalized by sequence length | 3 |
bp_proportion_stem | Proportion of base pairs on stems | 3 |
triplets | Frequencies of secondary-structure triplets: the 32 combinations of the 4 possible middle nucleotides with the 8 paired/unpaired patterns of 3 adjacent positions | 32 |
MFE | Minimum free energy | 1 |
EFE | Normalized Ensemble Free Energy calculated with RNAfold (-p option) | 1 |
ensemble_frequency | Frequency of the minimum free energy structure in the ensemble | 1 |
diversity | Structural diversity calculated with RNAfold (-p option) | 1 |
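mfe_efe_difference | Calculated as $\lvert \mathrm{MFE} - \mathrm{EFE} \rvert / L$, where L is the sequence length | 1 |
dQ | Calculated as $-\frac{1}{L} \sum_{i<j} p_{ij} \log_2 p_{ij}$, where L is the sequence length and $p_{ij}$ is the probability that nucleotides i and j are paired | 1 |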
dG | Minimum free energy divided by sequence length | 1 |
MFEI1 | Ratio between the minimum free energy and the %C+G | 1 |
MFEI2 | dG/Ns, where Ns is the number of stems | 1 |
MFEI4 | MFE/Nb, where Nb is the total number of base pairs in the secondary structure | 1 |
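
Several of the simpler features in Table 2 can be illustrated with a minimal Python sketch. This is not the pipeline used to build the table: the function names, the output key names, the overlapping-window normalization for dinucleotides, and the reading of gc_ratio as G/C are all assumptions made for illustration. The structure is taken as a dot-bracket string and the MFE as a number already obtained from a folding tool.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # alphabet as listed for nt_proportion in Table 2


def sequence_features(seq):
    """Sequence-only features (hypothetical helper, not the original code)."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least 2 nucleotides")
    counts = Counter(seq)
    feats = {"sequence_length": n}
    for base in BASES:
        feats[f"nt_proportion_{base}"] = counts[base] / n
    # Assumption: dinucleotides are counted over the n - 1 overlapping windows.
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    for a, b in product(BASES, repeat=2):
        feats[f"dinucleotide_proportion_{a}{b}"] = pairs[a + b] / (n - 1)
    feats["gc_content"] = (counts["G"] + counts["C"]) / n
    # Assumption: gc_ratio is G/C; NaN is returned when there is no cytosine.
    feats["gc_ratio"] = counts["G"] / counts["C"] if counts["C"] else float("nan")
    return feats


def structure_features(structure, mfe):
    """Features from a dot-bracket structure and its MFE (hypothetical helper)."""
    n = len(structure)
    if n == 0:
        raise ValueError("structure must not be empty")
    bp_number = structure.count("(")  # one opening bracket per base pair
    feats = {
        "bp_number": bp_number,
        "dP": bp_number / n,  # base pairs per nucleotide
        "dG": mfe / n,        # MFE normalized by length
    }
    # MFEI4 = MFE / Nb; undefined when the structure has no base pairs.
    feats["MFEI4"] = mfe / bp_number if bp_number else float("nan")
    return feats


# Toy hairpin with a made-up MFE value, for illustration only.
seq_feats = sequence_features("GGGAAACCC")
struct_feats = structure_features("(((...)))", mfe=-1.8)
```

The thermodynamic features (EFE, ensemble_frequency, diversity) and the stem-based features are left out, because they depend on RNAfold output and on a stem definition that the table does not specify.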