
Table 1.

AUC-ROC and AUC-PrecRec for identifying inconsistent positions

            | Average AUC-ROC      | Average AUC-PrecRec  | AUC-ROC All          | AUC-PrecRec All
Window size | 1    3    11   19    | 1    3    11   19    | 1    3    11   19    | 1    3    11   19
DAQ(AA)     | 0.76 0.84 0.90 0.92  | 0.41 0.55 0.70 0.73  | 0.78 0.88 0.95 0.96  | 0.44 0.63 0.81 0.85
Q-score     | 0.72 0.77 0.81 0.83  | 0.41 0.47 0.53 0.55  | 0.66 0.70 0.73 0.74  | 0.36 0.38 0.37 0.36
EMRinger    | 0.55 0.58 0.62 0.63  | 0.22 0.24 0.31 0.33  | 0.56 0.59 0.66 0.69  | 0.15 0.17 0.23 0.27
CaBLAM      | 0.66 0.69 0.72 0.73  | 0.33 0.38 0.48 0.48  | 0.63 0.65 0.68 0.70  | 0.28 0.30 0.37 0.39

Performance in detecting inconsistent residue positions in the 35 first-version models of the PDB2Ver dataset was evaluated for four validation scores: DAQ(AA), Q-score, EMRinger, and CaBLAM. Average AUC-ROC and Average AUC-PrecRec values were computed for each model separately and then averaged over the models. The latter two evaluations, AUC-ROC All and AUC-PrecRec All, considered the 35 models altogether (Methods). Four window sizes (1, 3, 11, and 19 residues) were used to average scores. The largest values in each column are indicated in bold. Supplementary Figure 7 shows the score distributions for inconsistent and consistent residue positions.
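To make the two evaluation protocols in the caption concrete (window-averaged per-residue scores, AUC computed per model and then averaged, versus AUC computed over all models pooled), the sketch below illustrates one possible implementation. It is not the authors' evaluation code: the names window_average and auc_per_model_and_pooled are illustrative, average_precision_score is used here as a stand-in for the area under the precision–recall curve, and it is assumed that lower raw validation scores indicate a worse fit, so scores are negated before ranking.

```python
# Minimal sketch of the windowed per-model vs. pooled AUC evaluation
# (illustrative only; assumes lower raw score = worse fit to the map).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def window_average(scores, window):
    """Average each residue's score over a centered window of odd size."""
    half = window // 2
    padded = np.pad(np.asarray(scores, float), half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def auc_per_model_and_pooled(per_model_scores, per_model_labels, window):
    """Return (avg AUC-ROC, avg AUC-PrecRec, pooled AUC-ROC, pooled AUC-PrecRec).

    per_model_labels: 1 for residues labeled inconsistent with the map, 0 otherwise.
    Scores are negated so that higher values flag inconsistency, as the sklearn
    ranking metrics expect.
    """
    rocs, prs, pooled_s, pooled_y = [], [], [], []
    for scores, labels in zip(per_model_scores, per_model_labels):
        s = -window_average(scores, window)
        y = np.asarray(labels, int)
        rocs.append(roc_auc_score(y, s))
        prs.append(average_precision_score(y, s))
        pooled_s.append(s)
        pooled_y.append(y)
    s_all, y_all = np.concatenate(pooled_s), np.concatenate(pooled_y)
    return (float(np.mean(rocs)), float(np.mean(prs)),
            roc_auc_score(y_all, s_all), average_precision_score(y_all, s_all))
```

In this sketch, the "Average" columns correspond to the first two returned values (AUC computed per model, then averaged), and the "All" columns correspond to the last two (all residues from the 35 models pooled before computing a single AUC).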