Table 2.
| | | Gram-negative | | | | | | Gram-positive | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Metric | Model | S. typhimurium | | E. coli | | C. crescentus | | M. smegmatis | | B. subtilis | | S. coelicolor | | S. aureus | |
| | | MS | SS | MS | SS | MS | SS | MS | SS | MS | SS | MS | SS | MS | SS |
| ROC AUC | Full | 0.983 | 0.991 | 0.991 | 0.995 | 0.971 | 0.973 | 0.930 | 0.956 | 0.985 | 0.993 | 0.973 | 0.966 | 0.983 | 0.995 |
| ROC AUC | CNN | 0.943 | 0.962 | 0.969 | 0.976 | 0.918 | 0.946 | 0.877 | 0.929 | 0.956 | 0.974 | 0.935 | 0.949 | 0.969 | 0.987 |
| ROC AUC | RNN | 0.939 | 0.980 | 0.934 | 0.980 | 0.923 | 0.958 | 0.809 | 0.854 | 0.942 | 0.982 | 0.907 | 0.913 | 0.933 | 0.965 |
| PR AUC | Full | 0.804 | 0.910 | 0.860 | 0.943 | 0.710 | 0.842 | 0.522 | 0.717 | 0.796 | 0.922 | 0.777 | 0.863 | 0.874 | 0.965 |
| PR AUC | CNN | 0.574 | 0.706 | 0.640 | 0.763 | 0.562 | 0.730 | 0.419 | 0.627 | 0.639 | 0.779 | 0.622 | 0.760 | 0.812 | 0.910 |
| PR AUC | RNN | 0.533 | 0.777 | 0.531 | 0.812 | 0.576 | 0.781 | 0.114 | 0.175 | 0.508 | 0.768 | 0.478 | 0.637 | 0.485 | 0.707 |
| ROC AUC | REP | - | 0.916 | - | 0.916 | - | 0.838 | - | 0.821 | - | 0.933 | - | 0.838 | - | 0.944 |
| PR AUC | REP | - | 0.735 | - | 0.799 | - | 0.344 | - | 0.285 | - | 0.889 | - | 0.272 | - | 0.910 |
The performance metrics are given for the case in which multiple start sites are considered possible (MS) and for the case in which each stop codon can have only a single predicted start site (SS). The performance of DeepRibo using either the DNA sequences (CNN) or the ribo-seq data (RNN) as input highlights the improvement obtained when both features are combined in one model (Full). The performance of REPARATION (REP) is also given. Note that these models are both trained and evaluated on the listed dataset using cross-validation.
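As an illustration of how the two reported metrics and the MS/SS distinction could be computed from a list of scored candidate start sites, a minimal Python sketch follows. It is not the evaluation code behind the table. The function names, the `(stop, label, score)` tuple layout and the toy values are hypothetical. PR AUC is approximated here by average precision, and the SS setting is assumed to keep only the highest-scoring candidate per stop codon.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pr_auc(labels, scores):
    """Average precision, used here as an estimate of the PR AUC:
    the mean of the precision measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


def single_site(candidates):
    """Assumed SS setting: keep the top-scoring candidate per stop codon."""
    best = {}
    for stop, label, score in candidates:
        if stop not in best or score > best[stop][2]:
            best[stop] = (stop, label, score)
    return list(best.values())


# Hypothetical candidates: (stop codon id, true start? 1/0, model score)
candidates = [
    ("stopA", 1, 0.9), ("stopA", 0, 0.4),
    ("stopB", 0, 0.7), ("stopB", 1, 0.6),
    ("stopC", 0, 0.2),
]

for name, subset in (("MS", candidates), ("SS", single_site(candidates))):
    labels = [y for _, y, _ in subset]
    scores = [s for _, _, s in subset]
    print(name, round(roc_auc(labels, scores), 3), round(pr_auc(labels, scores), 3))
```

In this toy example the SS filter discards the true start site of `stopB` because a false candidate scores higher, so how such stop codons are counted is an evaluation choice that the sketch does not settle.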