Table 1.
| Method | Ontology / tool | Synthetic dataset: Recall@1 | Synthetic dataset: Recall@10 | Synthetic dataset: Recall@30 | Synthetic dataset: ROCAUC | Synthetic dataset: PRAUC | Synthetic dataset (novel diseases): Recall@1 | Synthetic dataset (novel diseases): Recall@10 | Synthetic dataset (novel diseases): Recall@30 | Synthetic dataset (novel diseases): ROCAUC | Synthetic dataset (novel diseases): PRAUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepSVP models using average score | GO | 435 (0.2894) | 626 (0.4165) | 811 (0.5396) | 0.9647 | 0.3303 | 114 (0.1781) | 192 (0.3000) | 259 (0.4047) | 0.9560 | 0.2163 |
| | MP | 634 (0.4218) | 1035 (0.6886) | 1217 (0.8097) | 0.9850 | 0.5123 | 208 (0.3250) | 330 (0.5156) | 436 (0.6813) | 0.9728 | 0.3873 |
| | HP | 545 (0.3626) | 977 (0.6500) | 1220 (0.8117) | 0.9828 | 0.4528 | 157 (0.2453) | 352 (0.5500) | 447 (0.6984) | 0.9760 | 0.3263 |
| | CL | 157 (0.1045) | 542 (0.3606) | 882 (0.5868) | 0.9740 | 0.1761 | 40 (0.0625) | 125 (0.1953) | 262 (0.4094) | 0.9659 | 0.1050 |
| | UBERON | 254 (0.1690) | 602 (0.4005) | 1097 (0.7299) | 0.9752 | 0.2377 | 32 (0.0500) | 147 (0.2297) | 347 (0.5422) | 0.9627 | 0.1070 |
| | Union | 678 (0.4511) | 1055 (0.7019) | 1248 (0.8303) | 0.9854 | 0.5424 | 221 (0.3453) | 436 (0.6813) | 545 (0.8516) | 0.9858 | 0.4578 |
| DeepSVP models using maximum score | GO | 325 (0.2162) | 536 (0.3566) | 725 (0.4824) | 0.9558 | 0.2670 | 97 (0.1516) | 174 (0.2719) | 245 (0.3828) | 0.9494 | 0.1917 |
| | MP | 237 (0.1577) | 630 (0.4192) | 855 (0.5689) | 0.9605 | 0.2492 | 102 (0.1594) | 156 (0.2437) | 233 (0.3641) | 0.9431 | 0.1949 |
| | HP | 445 (0.2961) | 1088 (0.7239) | 1348 (0.8969) | 0.9929 | 0.4364 | 122 (0.1906) | 370 (0.5781) | 528 (0.8250) | 0.9901 | 0.3194 |
| | CL | 272 (0.1810) | 835 (0.5556) | 1148 (0.7638) | 0.9801 | 0.2569 | 52 (0.0813) | 250 (0.3906) | 390 (0.6094) | 0.9756 | 0.1429 |
| | UBERON | 259 (0.1723) | 637 (0.4238) | 1049 (0.6979) | 0.9733 | 0.2417 | 69 (0.1078) | 161 (0.2516) | 369 (0.5766) | 0.9656 | 0.1550 |
| | Union | 328 (0.2182) | 948 (0.6307) | 1122 (0.7465) | 0.9750 | 0.3489 | 85 (0.1328) | 363 (0.5672) | 457 (0.7141) | 0.9758 | 0.2585 |
| SV pathogenicity prediction/ranking | StrVCTVRE | 72 (0.0479) | 223 (0.1484) | 405 (0.2695) | 0.9178 | 0.0952 | 34 (0.0531) | 120 (0.1875) | 210 (0.3281) | 0.9308 | 0.1142 |
| | CADD-SV | 38 (0.0253) | 620 (0.4125) | 1020 (0.6786) | 0.9816 | 0.1262 | 9 (0.0141) | 162 (0.2531) | 373 (0.5828) | 0.9871 | 0.0860 |
| | AnnotSV | 19 (0.0126) | 229 (0.1524) | 700 (0.4657) | 0.9605 | 0.2203 | 5 (0.0078) | 60 (0.0938) | 190 (0.2969) | 0.9424 | 0.2319 |
Note: In each evaluation, one disease-associated SV is inserted into a whole genome, and we report the rank at which the inserted variant is recovered. Some methods assign the same score to several variants; we break such ties randomly. For each rank cutoff we report the absolute number of inserted variants recovered, with the recall in parentheses, as well as the area under the ROC curve (using microaverages per genome) and the area under the precision–recall curve. Best performing results (using maximum or average score) for each measure are indicated in bold.
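
The rank-based evaluation described in the note can be illustrated with a minimal sketch. This is not the authors' code: the function names `rank_with_random_ties` and `recall_at_k` and the toy scores are hypothetical, and we assume that higher scores mean more likely pathogenic and that a tied inserted variant is placed uniformly at random within its tie group.

```python
import random


def rank_with_random_ties(scores, target_index, rng):
    """Return the 1-based rank of scores[target_index], highest score first.

    Variants with exactly the same score as the target are ordered randomly,
    so the target lands uniformly at random within its tie group.
    """
    target = scores[target_index]
    higher = sum(1 for s in scores if s > target)
    tied_others = sum(1 for s in scores if s == target) - 1
    return higher + 1 + rng.randint(0, tied_others)


def recall_at_k(ranks, k):
    """Return (count, fraction) of genomes whose inserted variant has rank <= k."""
    hits = sum(1 for r in ranks if r <= k)
    return hits, hits / len(ranks)


# Toy data (hypothetical): one list of variant scores per genome, plus the
# index of the inserted disease-associated SV in that list.
rng = random.Random(0)
genomes = [
    ([0.9, 0.2, 0.1], 0),       # inserted SV scores highest
    ([0.5, 0.8, 0.7, 0.1], 0),  # two variants score higher
    ([0.3, 0.3, 0.3], 1),       # three-way tie, broken randomly
]
ranks = [rank_with_random_ties(scores, idx, rng) for scores, idx in genomes]

for k in (1, 10, 30):
    count, fraction = recall_at_k(ranks, k)
    print(f"Recall@{k}: {count} ({fraction:.4f})")
```

Each table cell of the form "count (fraction)" corresponds to the pair returned by `recall_at_k`: the number of genomes in which the inserted variant is ranked at or above the cutoff, and that number divided by the total number of genomes.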