Table 3.
Precision analysis of the guilt-by-association algorithm
| Threshold for % disease genes among interactors (TD) | | | Threshold for number of connections (TC) | | | |
| | | | Q1: TC = 1 | Q2: TC = 4 | Q3: TC = 13 | Mean: TC = 12 |
| Q1 | TD = 12.8 | N Captured | 1,943 | 1,391 | 638 | 683 |
| | | % Known | 73.3 | 75.0 | 76.5 | 76.4 |
| Q2 | TD = 28.6 | N Captured | 1,024 | 563 | 195 | 219 |
| | | % Known | 74.8 | 78.9 | 85.1 | 84.9 |
| Q3 | TD = 50.0 | N Captured | 251 | 118 | 16 | 19 |
| | | % Known | 70.5 | 67.8 | 75.0 | 78.9 |
| Mean | TD = 35.0 | N Captured | 748 | 409 | 109 | 127 |
| | | % Known | 73.4 | 76.3 | 84.4 | 84.2 |
The optimality of various location parameters as thresholds in the guilt-by-association algorithm was explored by computing the proportion of known disease-associated genes (% Known) among the total number of captured genes (N Captured). For this analysis, only the 1,445 genes (of the initial 6,151) with a known disease phenotype were treated as truly disease-causing, and the remaining 4,706 were declared disease-associated. The three quartiles (Q1: 25th percentile; Q2: 50th percentile, or median; Q3: 75th percentile) plus the mean were used as thresholds.
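The calculation behind each cell can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Gene` record, the toy data, and the function `precision_at` are hypothetical, and the sketch assumes a gene is captured when both of its values are greater than or equal to the thresholds, a comparison rule the table does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one gene in the interaction network."""
    name: str
    n_connections: int   # number of interactors
    pct_disease: float   # % of interactors that are disease genes
    known: bool          # True if the gene has a known disease phenotype


def precision_at(genes: List[Gene], tc: float, td: float) -> Tuple[int, Optional[float]]:
    """Return (N Captured, % Known) for one (TC, TD) threshold pair.

    Assumption: a gene is captured when n_connections >= tc and
    pct_disease >= td. % Known is None when nothing is captured.
    """
    captured = [g for g in genes if g.n_connections >= tc and g.pct_disease >= td]
    if not captured:
        return 0, None
    n_known = sum(g.known for g in captured)
    return len(captured), 100.0 * n_known / len(captured)


# Toy data for illustration only; these are not genes from the study.
GENES = [
    Gene("g1", 1, 10.0, False),
    Gene("g2", 4, 30.0, True),
    Gene("g3", 13, 50.0, True),
    Gene("g4", 20, 60.0, False),
    Gene("g5", 2, 40.0, True),
    Gene("g6", 15, 29.0, True),
]

if __name__ == "__main__":
    # Threshold values taken from the table (quartiles and mean).
    for td in (12.8, 28.6, 50.0, 35.0):
        for tc in (1, 4, 13, 12):
            n_captured, pct_known = precision_at(GENES, tc, td)
            print(f"TD={td:>5} TC={tc:>2} N Captured={n_captured} % Known={pct_known}")
```

Raising either threshold can only shrink the captured set, which matches the table: N Captured falls from 1,943 at the most permissive pair (TC = 1, TD = 12.8) to 16 at the strictest (TC = 13, TD = 50.0), where % Known rests on very few genes.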