2008 Dec 4;37(2):452–462. doi: 10.1093/nar/gkn944

Table 2.

Effect of homolog availability for building profiles, the number of proteins in the RPS, and the maximum sequence identity among the sequences in the RPS on the performance of FIEFDom

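| Database | Alignment | One domain: A | Two domains: Sp | Two domains: Sn | Two domains: A | Three domains: Sp | Three domains: Sn | Three domains: A | Four domains: Sp | Four domains: Sn | Four domains: A |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Homolog availability | | | | | | | | | | | |
| SCOP 1.73 (30%) | PS | 97 | 88 | 60 | 55 | 95 | 61 | 59 | 90 | 63 | 59 |
| SCOP 1.73 (30%) | SS | 99 | 95 | 40 | 39 | 94 | 41 | 40 | 86 | 62 | 57 |
| Number of proteins in RPS | | | | | | | | | | | |
| SCOP 1.65 (30%) | PS | 97 | 86 | 54 | 50 | 96 | 58 | 57 | 93 | 45 | 44 |
| SCOP 1.69 (30%) | PS | 97 | 90 | 57 | 54 | 93 | 58 | 56 | 91 | 49 | 47 |
| SCOP 1.73 (30%) | PS | 97 | 88 | 60 | 55 | 95 | 61 | 59 | 90 | 63 | 59 |
| Maximum sequence identity in RPS | | | | | | | | | | | |
| SCOP 1.69 (20%) | PS | 97 | 86 | 43 | 41 | 90 | 42 | 40 | 71 | 19 | 17 |
| SCOP 1.69 (30%) | PS | 97 | 90 | 57 | 54 | 93 | 58 | 56 | 91 | 49 | 47 |
| SCOP 1.69 (40%) | PS | 97 | 91 | 67 | 63 | 92 | 66 | 62 | 93 | 56 | 54 |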

A, accuracy; Sp, specificity; Sn, sensitivity. Alignment: PS, profile-sequence alignment; SS, sequence-sequence alignment. All values are percentages.

Top (homolog availability): The availability of homology information for query sequences is simulated by searching the RPS for identical fragments with either the query profile (profile-sequence, corresponding to high availability) or the query sequence itself (sequence-sequence, corresponding to low availability). For multidomain proteins, profile-sequence alignment yields on average 13% higher overall accuracy than sequence-sequence alignment.

Middle (number of proteins in the RPS): Every other release of the SCOP database (1.65, 1.69 and 1.73), each with a maximum of 30% sequence identity among its proteins, is used to study the effect of the number of proteins in the RPS. The larger the RPS (see Table 1 for the detailed breakdown of protein numbers and domain compositions), the higher the average domain boundary prediction accuracy for multidomain proteins, presumably because of the additional structure/sequence information that becomes available as novel structures are added to the database.

Bottom (maximum sequence identity in the RPS): Three simulations were conducted with SCOP 1.69 databases whose maximum sequence identity among the reference proteins was 20%, 30% and 40%, respectively.
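The averaged comparisons in the caption can be checked against the table. The sketch below is an illustration only: it takes the multidomain accuracy (A) columns and computes an unweighted mean over the two-, three- and four-domain classes for each configuration. The names `MULTIDOMAIN_ACCURACY` and `mean_multidomain_accuracy` are hypothetical, and the unweighted mean is an assumption. The authors may have aggregated differently, for example by weighting each class by its number of proteins, so the figures need not match the caption exactly.

```python
from statistics import mean

# Accuracy (A) values from Table 2 for multidomain proteins, ordered as
# (two domains, three domains, four domains). One-domain accuracy is left
# out because the caption's comparisons concern multidomain proteins.
MULTIDOMAIN_ACCURACY = {
    ("SCOP 1.73 (30%)", "PS"): (55, 59, 59),
    ("SCOP 1.73 (30%)", "SS"): (39, 40, 57),
    ("SCOP 1.65 (30%)", "PS"): (50, 57, 44),
    ("SCOP 1.69 (30%)", "PS"): (54, 56, 47),
    ("SCOP 1.69 (20%)", "PS"): (41, 40, 17),
    ("SCOP 1.69 (40%)", "PS"): (63, 62, 54),
}


def mean_multidomain_accuracy(database, alignment):
    """Unweighted mean (assumed aggregation) of the 2-, 3- and 4-domain accuracies, in %."""
    return mean(MULTIDOMAIN_ACCURACY[(database, alignment)])


if __name__ == "__main__":
    # Homolog availability: profile-sequence vs sequence-sequence on SCOP 1.73.
    ps = mean_multidomain_accuracy("SCOP 1.73 (30%)", "PS")
    ss = mean_multidomain_accuracy("SCOP 1.73 (30%)", "SS")
    print(f"PS - SS gap: {ps - ss:.1f} points")

    # RPS size: successive SCOP releases at 30% maximum sequence identity.
    for db in ("SCOP 1.65 (30%)", "SCOP 1.69 (30%)", "SCOP 1.73 (30%)"):
        print(db, round(mean_multidomain_accuracy(db, "PS"), 1))

    # Maximum sequence identity: SCOP 1.69 at 20%, 30% and 40%.
    for db in ("SCOP 1.69 (20%)", "SCOP 1.69 (30%)", "SCOP 1.69 (40%)"):
        print(db, round(mean_multidomain_accuracy(db, "PS"), 1))
```

Run as a script, it reports an unweighted profile-sequence advantage of about 12.3 points, close to the 13% quoted above, and mean accuracies that rise with RPS size (50.3, 52.3, 57.7) and with the maximum sequence identity cutoff (32.7, 52.3, 59.7).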