2024 Jul;34(7):1027–1035. doi: 10.1101/gr.278870.123

Table 2. Comparison of CodonBERT to prior methods on seven downstream tasks

| Model | Flu vaccines | mRFP expression | Fungal expression | E. coli proteins | mRNA stability | Tc-riboswitch | SARS-CoV-2 vaccine degradation |
|---|---|---|---|---|---|---|---|
| Nucleotide-based | | | | | | | |
| Plain TextCNN | 0.72 | 0.62 | 0.53 | 0.39 | 0.01 | 0.41 | 0.55 |
| RNABERT+TextCNN | 0.65 | 0.40 | 0.41 | 0.39 | 0.16 | 0.47 | 0.64 |
| RNA-FM+TextCNN | 0.71 | 0.80 | 0.59 | 0.43 | 0.34 | **0.58** | 0.74 |
| Codon-based | | | | | | | |
| TF-IDF | 0.68 | 0.57 | 0.68 | 0.44 | **0.54** | 0.49 | 0.69 |
| Plain TextCNN | 0.71 | 0.78 | 0.76 | 0.36 | 0.26 | 0.43 | **0.80** |
| Codon2vec+TextCNN | 0.72 | 0.77 | 0.61 | 0.43 | 0.33 | 0.56 | 0.70 |
| CodonBERT | **0.81** | **0.85** | **0.88** | **0.55** | 0.51 | 0.56 | 0.77 |

For the regression tasks, Spearman's rank correlation coefficients are listed; for the classification task (E. coli protein data set), classification accuracy is reported. The best correlation or accuracy value for each task is in bold. The corresponding loss values are listed in Supplemental Table S1.
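The table reports two different metrics, so it may help to see how each is computed. The following is a minimal, standard-library sketch assuming the textbook definitions: Spearman's rho as the Pearson correlation between rank vectors (tied values receive the average of their positions), and accuracy as the fraction of correctly predicted class labels. The helper names (`average_ranks`, `pearson`, `spearman`, `accuracy`) and the toy inputs are hypothetical illustrations, not the authors' evaluation code; in practice `scipy.stats.spearmanr` computes the same coefficient.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(y_true, y_pred):
    """Spearman's rho: Pearson correlation between the two rank vectors."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


def accuracy(labels, predictions):
    """Fraction of predicted class labels that match the true labels."""
    return sum(t == p for t, p in zip(labels, predictions)) / len(labels)


if __name__ == "__main__":
    # Hypothetical measured vs. predicted values for a regression task.
    measured = [1.2, 3.4, 2.2, 5.0]
    predicted = [0.9, 2.8, 3.1, 4.7]
    print(round(spearman(measured, predicted), 3))  # 0.8
    # Hypothetical true vs. predicted class labels for a classification task.
    print(accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
```

Because Spearman's rho depends only on ranks, it rewards a model that orders sequences correctly (for example, by expression level or stability) even when the predicted magnitudes are off, which suits regression targets measured on heterogeneous scales.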