| Feature type | Feature | Description | Software |
|---|---|---|---|
| Inherent feature | CDS size | Gene coding sequence length | Python 3.7 |
| | Protein size | Amino acid length | |
| CodonW | T3s, C3s, A3s, G3s | Frequency of T, C, A, and G at synonymous 3rd codon positions | CodonW [43] |
| | CAI | Codon adaptation index | |
| | CBI | Codon bias index | |
| | Fop | Frequency of optimal codons | |
| | Nc | Effective number of codons | |
| | GC3s | GC content at synonymous (silent) 3rd codon positions | |
| | GC | GC content of the gene | |
| | L_sym | Number of synonymous codons | |
| | Gravy | Hydrophobicity of the protein (grand average of hydropathicity) | |
| | Aromo | Aromaticity of the protein | |
| Amino acid usage frequency | Amino acid | Frequencies of A, R, D, C, Q, E, G, H, I, N, L, K, M, F, P, S, T, W, Y, V | Python 3.7 |
| | Rare_aa_ratio | Frequency of rare amino acids | |
| | Close_aa_ratio | Number of codons that can become a stop codon through a single mutation at the 3rd position | |
| Physicochemical properties of amino acids | M_weight | Molecular weight | Pepstats [44] |
| | I_Point | Isoelectric point | |
| | Tiny | (A + C + G + S + T) | |
| | Small | (A + B + C + D + G + N + P + S + T + V) | |
| | Aliphatic | (A + I + L + V) | |
| | Aromatic | (F + H + W + Y) | |
| | Nonpolar | (A + C + F + G + I + L + M + P + V + W + Y) | |
| | Polar | (D + E + H + K + N + Q + R + S + T + Z) | |
| | Charged | (B + D + E + H + K + R + Z) | |
| | Basic | (H + K + R) | |
| | Acidic | (B + D + E + Z) | |
| | A_R Weight | Average residue weight | |
| Transmembrane helix | ExpAA | Expected number of amino acids in transmembrane helices (TMHs) | TMHMM3 [45] |
| | First60 | Expected number of amino acids in TMHs within the first 60 residues | |
| | PredHel | Total probability of N-in | |
| Hurst | Hurst | Hurst index | R package [46] |
| Information entropy | Shannon entropy | Quantifies the average information content of the gene sequence from its symbol distribution | Python 3.7 |
| | Mutual information | Measures the information shared by two random variables | Python 3.7 |
| | Kullback–Leibler divergence | Measures how much one probability distribution diverges from another (0 when they are identical) | Python 3.7 |
| | Cross entropy | Measures the difference between two probability distributions | Python 3.7 |
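The inherent features and the simplest composition indices (GC, GC3s) can be computed directly from an in-frame CDS. The following is a minimal sketch, assuming the standard genetic code and treating every codon except ATG (Met), TGG (Trp) and the three stop codons as synonymous at the 3rd position; CodonW's own handling of edge cases such as ambiguous bases may differ. All function names are hypothetical.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Met (ATG), Trp (TGG) and stop codons have no synonymous 3rd-position alternative
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split an in-frame CDS into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def cds_size(cds):
    """CDS size: length of the coding sequence in nucleotides."""
    return len(cds)


def protein_size(cds):
    """Protein size: number of codons read before the first stop codon."""
    length = 0
    for codon in split_codons(cds):
        if CODON_TABLE.get(codon) == "*":
            break
        length += 1
    return length


def gc_content(cds):
    """GC: fraction of G + C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in cds.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def gc3s(cds):
    """GC3s: fraction of synonymous codons with G or C at the 3rd position."""
    synonymous = [c for c in split_codons(cds)
                  if c in CODON_TABLE and c not in NON_SYNONYMOUS]
    if not synonymous:
        return 0.0
    return sum(c[2] in "GC" for c in synonymous) / len(synonymous)
```

For `ATGGCCGCTTGGTAA` this gives a CDS size of 15, a protein size of 4, GC = 8/15 and GC3s = 0.5 (GCC ends in C, GCT does not; ATG, TGG and TAA are excluded).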
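CAI and Fop both depend on a reference. CAI scores each codon by its relative adaptiveness w (the codon's count in a set of highly expressed genes divided by the count of the most used synonymous codon for the same amino acid) and takes the geometric mean of w over the gene; Fop is the fraction of optimal codons among codons of amino acids that have synonymous choices. A minimal sketch, assuming a user-supplied reference set and optimal-codon list (both hypothetical inputs) and a pseudocount of 0.5 for codons missing from the reference; this is not CodonW's exact procedure.

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Met, Trp and stop codons carry no synonymous-codon information
EXCLUDED = {"M", "W", "*"}


def split_codons(cds):
    """Split an in-frame CDS into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def relative_adaptiveness(reference_cdss, pseudocount=0.5):
    """w = codon count / count of the most used synonymous codon in the reference.

    The pseudocount (an assumption) keeps codons absent from the reference
    from getting w = 0, which would force CAI to 0.
    """
    counts = Counter(c for cds in reference_cdss for c in split_codons(cds))
    best = {}
    for codon, aa in CODON_TABLE.items():
        best[aa] = max(best.get(aa, 0.0), counts[codon] + pseudocount)
    return {codon: (counts[codon] + pseudocount) / best[aa]
            for codon, aa in CODON_TABLE.items() if aa not in EXCLUDED}


def cai(cds, weights):
    """CAI: geometric mean of w over the gene's informative codons."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def fop(cds, optimal_codons):
    """Fop: optimal codons / all codons of amino acids with synonymous choices."""
    informative = [c for c in split_codons(cds)
                   if CODON_TABLE.get(c, "*") not in EXCLUDED]
    if not informative:
        return float("nan")
    return sum(c in optimal_codons for c in informative) / len(informative)
```

Working with counts rather than RSCU values is equivalent here, because RSCU ratios within one amino acid reduce to count ratios.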
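Amino acid usage is a per-residue frequency over the 20 residues listed in the table. The table does not define which residues count as rare, so this sketch takes the rare set as an explicit argument of a hypothetical `rare_aa_ratio` function; any set used with it is the caller's assumption.

```python
from collections import Counter

# Residue order as listed in the feature table
AA_ORDER = "ARDCQEGHINLKMFPSTWYV"


def aa_usage(protein):
    """Relative frequency of each of the 20 standard amino acids."""
    counts = Counter(aa for aa in protein.upper() if aa in AA_ORDER)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AA_ORDER}


def rare_aa_ratio(protein, rare):
    """Fraction of standard residues that belong to the caller-supplied rare set."""
    residues = [aa for aa in protein.upper() if aa in AA_ORDER]
    if not residues:
        return 0.0
    return sum(aa in rare for aa in residues) / len(residues)


print(aa_usage("MKWVTFISLLLLFSSAYS")["L"])
print(rare_aa_ratio("MKWVTFISLLLLFSSAYS", rare=set("CW")))  # hypothetical rare set
```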
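pepstats lists, for each residue class, the number and mole percentage of residues in the protein. The counting step can be sketched as follows, using exactly the class definitions in the table; B (D or N) and Z (E or Q) are IUPAC ambiguity codes and simply never match in an unambiguous sequence. This is an illustration of the arithmetic, not EMBOSS code.

```python
# Residue classes exactly as listed in the feature table
PROPERTY_GROUPS = {
    "Tiny": "ACGST",
    "Small": "ABCDGNPSTV",
    "Aliphatic": "AILV",
    "Aromatic": "FHWY",
    "Nonpolar": "ACFGILMPVWY",
    "Polar": "DEHKNQRSTZ",
    "Charged": "BDEHKRZ",
    "Basic": "HKR",
    "Acidic": "BDEZ",
}


def property_composition(protein):
    """Return {class: (residue count, mole %)} for each property class."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    if not protein:
        raise ValueError("empty protein sequence")
    result = {}
    for name, residues in PROPERTY_GROUPS.items():
        count = sum(aa in residues for aa in protein)
        result[name] = (count, 100.0 * count / len(protein))
    return result


for name, (count, mole_pct) in property_composition("MKHRDEAILV").items():
    print(f"{name:10s} {count:3d} {mole_pct:6.2f}")
```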
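Gravy is the grand average of hydropathicity, the mean Kyte-Doolittle hydropathy value over all residues. Aromo is a frequency of aromatic residues, taken here as F, W and Y (an assumption about CodonW's definition; note that the pepstats Aromatic class in the table also counts H). A minimal sketch with hypothetical function names:

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard residues in sequence")
    return sum(values) / len(values)


def aromaticity(protein):
    """Relative frequency of the aromatic residues F, W and Y."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard residues in sequence")
    return sum(aa in "FWY" for aa in residues) / len(residues)


print(gravy("MKWVTFISLLLLFSSAYS"), aromaticity("MKWVTFISLLLLFSSAYS"))
```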
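The table computes the Hurst index with an R package [46] whose name and estimator are not given. As an illustration only, here is rescaled-range (R/S) analysis in Python: split the series into windows, compute the range of mean-adjusted cumulative sums divided by the standard deviation, and fit the slope of log(mean R/S) against log(window size). The purine/pyrimidine mapping of DNA to a numeric series and the doubling window sizes are assumptions, and the R package may use a different method.

```python
import math
import random


def purine_pyrimidine_series(seq):
    """Map DNA to +/-1 values: purines (A, G) -> +1, pyrimidines (C, T) -> -1.

    This encoding is an assumption for illustration; other mappings exist.
    """
    return [1.0 if b in "AG" else -1.0 for b in seq.upper() if b in "ACGT"]


def rescaled_range(chunk):
    """R/S of one window: range of cumulative mean-adjusted sums / std. dev."""
    mean = sum(chunk) / len(chunk)
    cum = low = high = 0.0
    for value in chunk:
        cum += value - mean
        low = min(low, cum)
        high = max(high, cum)
    sd = math.sqrt(sum((v - mean) ** 2 for v in chunk) / len(chunk))
    return (high - low) / sd if sd > 0 else None


def hurst_exponent(series, min_window=8):
    """Least-squares slope of log(mean R/S) vs log(window), windows doubling."""
    log_n, log_rs = [], []
    size = min_window
    while size <= len(series) // 2:
        ratios = [rescaled_range(series[i:i + size])
                  for i in range(0, len(series) - size + 1, size)]
        ratios = [r for r in ratios if r is not None]  # skip constant windows
        if ratios:
            log_n.append(math.log(size))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
        size *= 2
    if len(log_n) < 2:
        raise ValueError("series too short for R/S analysis")
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    den = sum((x - mx) ** 2 for x in log_n)
    return num / den


random.seed(0)
demo = "".join(random.choice("ACGT") for _ in range(4096))
print(round(hurst_exponent(purine_pyrimidine_series(demo)), 3))
```

An uncorrelated random sequence should give a value near 0.5 (plain R/S tends to overestimate slightly on short windows); values above 0.5 suggest long-range persistence.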
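Shannon entropy is H = -Σ p_i log2 p_i over the symbol probabilities p_i of the sequence. A minimal sketch; the optional `k` parameter (counting overlapping k-mers instead of single nucleotides) is an assumption, since the table only mentions "symbols".

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy in bits of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("ATGGCCGCTTGGTAA"))
print(shannon_entropy("ATGGCCGCTTGGTAA", k=3))
```

With four nucleotides the single-symbol entropy ranges from 0 bits (one base only) to 2 bits (all four equally frequent).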
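Mutual information is I(X; Y) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))]. The table does not say which two variables are compared; one common choice for a single sequence, assumed here, is the symbol at position i (X) and the symbol at position i + d (Y), with `distance` a hypothetical parameter.

```python
import math
from collections import Counter


def mutual_information(seq, distance=1):
    """I(X; Y) in bits between symbols at positions i and i + distance."""
    seq = seq.upper()
    pairs = [(seq[i], seq[i + distance]) for i in range(len(seq) - distance)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    # p(x, y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
    return sum((c / n) * math.log2(c * n / (left[x] * right[y]))
               for (x, y), c in joint.items())


print(mutual_information("ATGGCCGCTTGGTAA", distance=3))
```

A periodic sequence such as `"ACGT" * 100` gives nearly 2 bits at distance 1, because each base fully determines the next; independent positions give values near 0.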
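Kullback–Leibler divergence D(P || Q) = Σ p log2(p / q) and cross entropy H(P, Q) = -Σ p log2 q are linked by H(P, Q) = H(P) + D(P || Q), so both can share one helper. A minimal sketch, assuming the two distributions are nucleotide compositions (for example a gene versus a background sequence, a hypothetical choice) smoothed with an additive pseudocount so that Q never contains a zero.

```python
import math


def composition(seq, alphabet="ACGT", pseudocount=1.0):
    """Symbol probabilities with an additive pseudocount (an assumption)."""
    seq = seq.upper()
    counts = {s: seq.count(s) + pseudocount for s in alphabet}
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; requires q[s] > 0 wherever p[s] > 0."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p if p[s] > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum p log2 q in bits; equals H(P) + D_KL(P || Q)."""
    return -sum(p[s] * math.log2(q[s]) for s in p if p[s] > 0)


gene = composition("ATGGCCGCTTGGTAA")
background = composition("ACGTACGTACGT")
print(kl_divergence(gene, background), cross_entropy(gene, background))
```

KL divergence is 0 only when the two distributions are identical and is not symmetric, so D(P || Q) and D(Q || P) generally differ.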