Table 1.
| Threshold of E-value | Structural database search: Recall | Structural database search: Precision | Structural database search: False positive rate | Superfamily assignment^a, 894 proteins: Correct assignment (%) | Superfamily assignment^a, 894 proteins: Coverage^c (%) | Superfamily assignment^a, sequence identity <25%^b: Correct assignment (%) | Superfamily assignment^a, sequence identity <25%^b: Coverage (%) |
|---|---|---|---|---|---|---|---|
| e−10 | 0.60 | 0.52 | 0.0091 | 96.68 | 97.76 | 86.32 | 92.58 |
| e−15 | 0.50 | 0.81 | 0.0020 | 98.53 | 91.61 | 92.77 | 72.49 |
| e−20 | 0.43 | 0.91 | 0.00056 | 99.60 | 84.23 | 97.60 | 54.59 |
| e−25 | 0.39 | 0.95 | 0.00016 | 99.86 | 77.96 | 98.94 | 41.05 |
SCOP-894 consists of 894 query proteins drawn from two subsets, SCOP95-1.67 and SCOP95-1.69. SCOP95-1.67 contains 378 query proteins that are in SCOP 1.67 but not in SCOP 1.65; its search database is SCOP 1.65. SCOP95-1.69 contains 516 query proteins that are in SCOP 1.69 but not in SCOP 1.67; its search database is SCOP 1.69.
^a The superfamily of the first-ranked hit in a query protein's hit list is assigned as the query's superfamily.
^b The prediction accuracy was calculated from the 229 query proteins with <25% sequence identity.
^c Coverage is defined as P/T, where P is the number of assigned structures and T is the total number of structures. For example, for the SCOP-894 query set at an E-value threshold of e−15, P = 819 and T = 894, giving a coverage of 819/894 = 91.61%.
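The first-rank assignment rule (note a) and the coverage definition (note c) can be sketched as follows. This is a minimal illustration, not the authors' code. The hit-list representation (a best-first list of `(superfamily_id, e_value)` pairs) and the helper names are hypothetical. The code also assumes that a query whose top hit does not pass the E-value threshold is left unassigned, which would explain why coverage falls as the threshold becomes stricter. `implied_assigned` back-computes P from a reported coverage percentage and T.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a hit is (superfamily_id, e_value),
# and the hit list is assumed to be sorted best-first.
Hit = Tuple[str, float]


def assign_superfamily(hits: Sequence[Hit], e_threshold: float) -> Optional[str]:
    """Assign the superfamily of the first-ranked hit (note a).

    Assumption: if the top hit's E-value exceeds the threshold,
    the query is left unassigned and None is returned.
    """
    if not hits:
        return None
    superfamily, e_value = hits[0]
    return superfamily if e_value <= e_threshold else None


def coverage_percent(assigned: int, total: int) -> float:
    """Coverage = P / T (note c), as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * assigned / total, 2)


def implied_assigned(coverage_pct: float, total: int) -> int:
    """Recover the integer count P implied by a reported coverage and T."""
    return round(coverage_pct / 100.0 * total)


# Footnote c example: P = 819, T = 894 at the e-15 threshold.
print(coverage_percent(819, 894))  # 91.61

# Made-up hit list for illustration only.
hits = [("b.1.1", 1e-18), ("b.1.2", 1e-9)]
print(assign_superfamily(hits, 1e-15))  # b.1.1
print(assign_superfamily(hits, 1e-20))  # None
```

Applied to the table, `implied_assigned(97.76, 894)` gives 874 assigned structures at e−10 for SCOP-894. `implied_assigned(92.58, 229)` gives 212 for the 229 low-identity queries. Both counts are inferred from the rounded percentages rather than reported directly.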