Table 2.
Model | Flu vaccines | mRFP expression | Fungal expression | E. coli proteins | mRNA stability | Tc-riboswitch | SARS-CoV-2 vaccine degradation |
---|---|---|---|---|---|---|---|
Nucleotide-based | | | | | | | |
Plain TextCNN | 0.72 | 0.62 | 0.53 | 0.39 | 0.01 | 0.41 | 0.55 |
RNABERT+TextCNN | 0.65 | 0.40 | 0.41 | 0.39 | 0.16 | 0.47 | 0.64 |
RNA-FM+TextCNN | 0.71 | 0.80 | 0.59 | 0.43 | 0.34 | **0.58** | 0.74 |
Codon-based | | | | | | | |
TF-IDF | 0.68 | 0.57 | 0.68 | 0.44 | **0.54** | 0.49 | 0.69 |
Plain TextCNN | 0.71 | 0.78 | 0.76 | 0.36 | 0.26 | 0.43 | **0.80** |
Codon2vec+TextCNN | 0.72 | 0.77 | 0.61 | 0.43 | 0.33 | 0.56 | 0.70 |
CodonBERT | **0.81** | **0.85** | **0.88** | **0.55** | 0.51 | 0.56 | 0.77 |
For the regression tasks, Spearman's rank correlation is reported. For the classification task (E. coli protein data set), classification accuracy is reported. The best correlation or accuracy value for each task is in bold. The corresponding loss values are listed in Supplemental Table S1.
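
For readers unfamiliar with the two metrics reported in Table 2, the following is a minimal sketch of how Spearman's rank correlation and classification accuracy can be computed. It is not the evaluation code used in this work; the function names (`average_ranks`, `pearson`, `spearman`, `accuracy`) and the toy inputs are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(measured), average_ranks(predicted))


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(a == b for a, b in zip(labels, predictions)) / len(labels)


# Toy values for illustration only (not data from any benchmark in Table 2).
rho = spearman([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```

Because Spearman's correlation depends only on the ordering of the values, a model that ranks sequences correctly scores well on the regression tasks even if its predicted values are offset or rescaled relative to the measurements.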