2021 Dec 24;38(6):1677–1684. doi: 10.1093/bioinformatics/btab859

Table 1.

Summary of the evaluation for predicting causative variants in the dbVar benchmark dataset (time-based split): results for 1503 newly added variants, along with results for 640 newly added variants associated with 175 new diseases that were not present in our training dataset

| Category | Model | Synthetic: Recall@1 | Synthetic: Recall@10 | Synthetic: Recall@30 | Synthetic: ROCAUC | Synthetic: PRAUC | Synthetic (novel diseases): Recall@1 | Synthetic (novel diseases): Recall@10 | Synthetic (novel diseases): Recall@30 | Synthetic (novel diseases): ROCAUC | Synthetic (novel diseases): PRAUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepSVP models using average score | GO | 435 (0.2894) | 626 (0.4165) | 811 (0.5396) | 0.9647 | 0.3303 | 114 (0.1781) | 192 (0.3000) | 259 (0.4047) | 0.9560 | 0.2163 |
| | MP | 634 (0.4218) | 1035 (0.6886) | 1217 (0.8097) | 0.9850 | 0.5123 | 208 (0.3250) | 330 (0.5156) | 436 (0.6813) | 0.9728 | 0.3873 |
| | HP | 545 (0.3626) | 977 (0.6500) | 1220 (0.8117) | 0.9828 | 0.4528 | 157 (0.2453) | 352 (0.5500) | 447 (0.6984) | 0.9760 | 0.3263 |
| | CL | 157 (0.1045) | 542 (0.3606) | 882 (0.5868) | 0.9740 | 0.1761 | 40 (0.0625) | 125 (0.1953) | 262 (0.4094) | 0.9659 | 0.1050 |
| | UBERON | 254 (0.1690) | 602 (0.4005) | 1097 (0.7299) | 0.9752 | 0.2377 | 32 (0.0500) | 147 (0.2297) | 347 (0.5422) | 0.9627 | 0.1070 |
| | Union | 678 (0.4511) | 1055 (0.7019) | 1248 (0.8303) | 0.9854 | 0.5424 | 221 (0.3453) | 436 (0.6813) | 545 (0.8516) | 0.9858 | 0.4578 |
| DeepSVP models using maximum score | GO | 325 (0.2162) | 536 (0.3566) | 725 (0.4824) | 0.9558 | 0.2670 | 97 (0.1516) | 174 (0.2719) | 245 (0.3828) | 0.9494 | 0.1917 |
| | MP | 237 (0.1577) | 630 (0.4192) | 855 (0.5689) | 0.9605 | 0.2492 | 102 (0.1594) | 156 (0.2437) | 233 (0.3641) | 0.9431 | 0.1949 |
| | HP | 445 (0.2961) | 1088 (0.7239) | 1348 (0.8969) | 0.9929 | 0.4364 | 122 (0.1906) | 370 (0.5781) | 528 (0.8250) | 0.9901 | 0.3194 |
| | CL | 272 (0.1810) | 835 (0.5556) | 1148 (0.7638) | 0.9801 | 0.2569 | 52 (0.0813) | 250 (0.3906) | 390 (0.6094) | 0.9756 | 0.1429 |
| | UBERON | 259 (0.1723) | 637 (0.4238) | 1049 (0.6979) | 0.9733 | 0.2417 | 69 (0.1078) | 161 (0.2516) | 369 (0.5766) | 0.9656 | 0.1550 |
| | Union | 328 (0.2182) | 948 (0.6307) | 1122 (0.7465) | 0.9750 | 0.3489 | 85 (0.1328) | 363 (0.5672) | 457 (0.7141) | 0.9758 | 0.2585 |
| SV pathogenicity prediction/ranking | StrVCTVRE | 72 (0.0479) | 223 (0.1484) | 405 (0.2695) | 0.9178 | 0.0952 | 34 (0.0531) | 120 (0.1875) | 210 (0.3281) | 0.9308 | 0.1142 |
| | CADD-SV | 38 (0.0253) | 620 (0.4125) | 1020 (0.6786) | 0.9816 | 0.1262 | 9 (0.0141) | 162 (0.2531) | 373 (0.5828) | 0.9871 | 0.0860 |
| | AnnotSV | 19 (0.0126) | 229 (0.1524) | 700 (0.4657) | 0.9605 | 0.2203 | 5 (0.0078) | 60 (0.0938) | 190 (0.2969) | 0.9424 | 0.2319 |

Note: The evaluation inserts one disease-associated SV into a whole genome and reports the rank at which the inserted variant is recovered. Some methods assign the same score to several variants; in those cases we break ties randomly. We report the absolute number of variants recovered at each rank together with the recall, as well as the areas under the ROC curve (using microaverages per genome) and the precision–recall curve. Best performing results (using maximum or average score) for each measure are indicated in bold.
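
The ranking procedure described in the note can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names (`rank_with_random_ties`, `recall_at_k`) and the toy scores are hypothetical, and the sketch assumes that a higher score means a more likely causative variant. It shows how a rank can be assigned under random tie-breaking and how a "count (recall)" entry such as 435 (0.2894) follows from the ranks.

```python
import random


def rank_with_random_ties(scores, target_index, rng):
    """Return the 1-based rank of scores[target_index], highest score first.

    Variants with a strictly higher score always rank ahead of the target;
    variants tied with the target are ordered relative to it at random.
    """
    target = scores[target_index]
    higher = sum(1 for s in scores if s > target)
    tied_others = sum(1 for s in scores if s == target) - 1
    # Picking a uniform position inside the tied block is equivalent to
    # shuffling the tied variants and reading off the target's position.
    return higher + rng.randint(0, tied_others) + 1


def recall_at_k(ranks, k):
    """Return (count, fraction) of genomes whose inserted variant ranks within k."""
    hits = sum(1 for r in ranks if r <= k)
    return hits, hits / len(ranks)


rng = random.Random(0)  # fixed seed so the tie-breaking is reproducible

# Hypothetical toy genomes: (scores of all candidate variants, index of inserted SV)
genomes = [
    ([0.9, 0.2, 0.1], 0),  # inserted SV scores highest
    ([0.5, 0.8, 0.3], 0),  # one other variant scores higher
    ([0.4, 0.4, 0.4], 2),  # three-way tie, resolved at random
]
ranks = [rank_with_random_ties(s, i, rng) for s, i in genomes]

for k in (1, 10, 30):
    count, recall = recall_at_k(ranks, k)
    print(f"Recall@{k}: {count} ({recall:.4f})")
```

Each recall in the table is the count divided by the number of evaluated variants, for example 435 / 1503 ≈ 0.2894 for the full set and 114 / 640 ≈ 0.1781 for the novel-disease set.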