Table 2.
Cohort and modality | Training (ENVIM) | Training (MelonnPan) | Testing (ENVIM) | Testing (MelonnPan) | Predictable metabolites (defined by MelonnPan) |
---|---|---|---|---|---|
ZOE 2.0 (NM = 503) | |||||
DNA only | 356 (71%) | 63 (13%) | 124 (25%) | 47 (9%) | 70 |
RNA only | 409 (81%) | 157 (31%) | 106 (21%) | 68 (14%) | 163 |
Both DNA and RNA | 423 (84%) | 146 (29%) | 110 (22%) | 73 (15%) | 154 |
Mallick cohort (NM = 466) | |||||
DNA only | 408 (88%) | 239 (51%) | 225 (48%) | 178 (38%) | 249 |
Lloyd-Price cohort (NM = 522) | |||||
DNA only | 501 (96%) | 271 (52%) | 322 (62%) | 193 (37%) | 305 |
RNA only | 521 (100%) | 298 (57%) | 393 (75%) | 236 (45%) | 318 |
Both DNA and RNA | 518 (99%) | 306 (59%) | 381 (73%) | 232 (44%) | 323 |
Numbers of well-predicted metabolites for MelonnPan and ENVIM across prediction methods, datasets, and modality levels (DNA, RNA, and both). A metabolite is considered “well-predicted” if the Spearman correlation between its observed and predicted values is ≥0.3. NM is the number of metabolites to be predicted in each study. Percentages in parentheses are the number of well-predicted metabolites divided by NM. The Mallick cohort has only metagenomics data available. The last column gives the number of “predictable metabolites” as defined by MelonnPan (see also the Figure 2 legend). Bold in the testing columns marks the highest number of well-predicted metabolites among the three modalities (DNA only, RNA only, both DNA and RNA) in the ZOE 2.0 cohort and the Lloyd-Price cohort.
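
The counting rule in the legend can be illustrated with a minimal sketch. It is not the ENVIM or MelonnPan code: the function names, the dictionary layout (metabolite name mapped to values across samples), and the toy data are assumptions made for illustration. Only the Spearman ≥0.3 threshold and the percentage definition come from the legend.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for a constant vector
    return cov / sqrt(vx * vy)


def count_well_predicted(observed, predicted, threshold=0.3):
    """Count metabolites with Spearman >= threshold and the percentage of NM.

    observed, predicted: dict mapping metabolite name -> list of values
    across samples (hypothetical layout). NM is len(observed).
    """
    n_well = sum(
        1 for m in observed
        if spearman(observed[m], predicted[m]) >= threshold  # NaN counts as not well-predicted
    )
    return n_well, round(100 * n_well / len(observed))


# Toy data (invented): three metabolites measured in five samples.
observed = {
    "met_a": [1, 2, 3, 4, 5],
    "met_b": [1, 2, 3, 4, 5],
    "met_c": [5, 3, 1, 4, 2],
}
predicted = {
    "met_a": [1.1, 1.9, 3.2, 3.8, 5.5],  # same ordering as observed
    "met_b": [5, 4, 3, 2, 1],            # reversed ordering
    "met_c": [2, 2, 2, 2, 2],            # constant prediction
}
n_well, pct = count_well_predicted(observed, predicted)
```

The same percentage arithmetic reproduces the table entries; for example, 356 well-predicted metabolites out of NM = 503 rounds to 71%.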