2023 Jul 18;15(7):evad129. doi: 10.1093/gbe/evad129

Table 1.

Summary of Top Performing v3 Models After Feature Selection Using RFE-CV

| AA seq | Optimal Features (ap) | Residue Positions (Zm) | ROC AUC | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| RbcL | 101, 143, 281, 309, 418, 468 | 101, 143, 281, 309, 418, 468 | 1.0 | 1.0 | 0.9997 | 0.9998 | 0.9998 |
| RpoC2 | 357, 368, 450, 774, 1050, 1132, 1290 | 357, 368, 450, 706, 942, 1011, 1151 | 0.9639 | 0.9728 | 0.9471 | 0.9591 | 0.955 |
| NdhI | 5, 25, 38, 49, 84, 88, 89, 147, 153 | 5, 25, 38, 49, 84, 88, 89, 147, 153 | 0.9583 | 0.9784 | 0.9197 | 0.9464 | 0.9423 |
| NdhA | 5, 29, 36, 98, 110, 187, 298, 301, 319, 320 | 5, 27, 34, 96, 108, 184, 293, 296, 314, 315 | 0.9805 | 0.9910 | 0.9031 | 0.9436 | 0.9401 |
| RpoA | 6, 14, 146, 163, 180, 243, 279, 326, 329, 336 | 6, 14, 146, 161, 176, 237, 270, 317, 327 | 0.969 | 0.9444 | 0.9305 | 0.9359 | 0.9295 |
| MatK | 49, 147, 159, 314, 378, 417, 436 | 16, 111, 123, 274, 338, 377, 396 | 0.9673 | 0.8491 | 0.9716 | 0.93 | 0.9191 |
| NdhD | 64, 76, 114, 334, 364, 376, 442, 451, 497, 501 | 62, 74, 112, 332, 362, 374, 440, 449, 495, 499 | 0.963 | 0.9214 | 0.9197 | 0.9183 | 0.9089 |
| NdhF | 89, 145, 287, 340, 400, 566, 568, 597, 659 | 89, 145, 287, 340, 400, 555, 557, 586, 646 | 0.9653 | 0.9860 | 0.8497 | 0.9114 | 0.9082 |

ap: positions numbered according to alignment position.

Zm: residue positions numbered according to positions in the Zea mays protein sequence.
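The two numbering schemes differ wherever the Zea mays row of the alignment contains gaps before the column in question. This is why, for example, MatK alignment position 49 corresponds to residue 16, while the RbcL positions are identical under both schemes. The conversion can be sketched as follows, assuming a 1-based alignment column and a gapped reference row. The function name `alignment_to_residue` and the toy sequence are illustrative only and are not taken from the paper.

```python
from typing import Optional

GAP_CHARS = "-."  # characters treated as alignment gaps (an assumption)


def alignment_to_residue(aligned_ref: str, ap: int) -> Optional[int]:
    """Map a 1-based alignment column to a 1-based residue number in the
    ungapped reference sequence. Returns None if the column is a gap."""
    if not 1 <= ap <= len(aligned_ref):
        raise ValueError("alignment position out of range")
    if aligned_ref[ap - 1] in GAP_CHARS:
        return None
    # Residue number = non-gap characters up to and including column ap.
    return sum(1 for ch in aligned_ref[:ap] if ch not in GAP_CHARS)


# Toy gapped reference row (hypothetical, not a real Zea mays sequence).
toy_ref = "--MK-TAY--LV"
print([alignment_to_residue(toy_ref, p) for p in (3, 4, 5, 8, 12)])
# → [1, 2, None, 5, 7]
```

Columns that fall on a gap in the reference row have no residue number, so the sketch returns `None` for them rather than guessing a neighbouring residue.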

Average scores of classification metrics (ROC AUC, Precision, Recall, F1, Accuracy) after cross validation (using repeated random subsampling, n = 500, 70/30 train/test split) are shown. Gray cells indicate models and model performances that are potential artifacts of corresponding sequence length/variation. ROC AUC, area under receiver operating characteristic curve.
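The evaluation scheme described above (repeated random subsampling, 500 repeats, 70/30 split, metrics averaged over repeats) can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the classifier `lookup_classifier`, the helper names, and the toy data are hypothetical stand-ins. The splits here are unstratified, which is an assumption, and ROC AUC is omitted because it requires continuous scores rather than hard labels.

```python
import random
from statistics import mean


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}


def repeated_subsampling(X, y, fit_predict, n_repeats=500, test_frac=0.3, seed=0):
    """Average metrics over n_repeats random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = round(test_frac * len(y))
    runs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)  # fresh random split on every repeat
        test, train = idx[:n_test], idx[n_test:]
        y_pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        runs.append(binary_metrics([y[i] for i in test], y_pred))
    return {name: mean(run[name] for run in runs) for name in runs[0]}


def lookup_classifier(X_train, y_train, X_test):
    """Toy stand-in model: majority training label per residue tuple."""
    votes = {}
    for x, label in zip(X_train, y_train):
        votes.setdefault(x, []).append(label)
    default = round(mean(y_train))
    return [round(mean(votes[x])) if x in votes else default for x in X_test]


# Toy data: residues at two selected positions; the class is fully determined by them.
X = [("A", "L")] * 10 + [("S", "I")] * 10
y = [1] * 10 + [0] * 10
scores = repeated_subsampling(X, y, lookup_classifier)
print(scores["accuracy"])  # → 1.0 on this separable toy set
```

Because each metric is averaged across repeats, the reported F1 need not equal the F1 computed from the averaged precision and recall.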