Table 1. Evaluation scenarios defined by phase4, given a protein sequence database that is organized into families and superfamilies.
Scenario | Description |
---|---|
Distant relationship (distant family one model) | From a superfamily, each family in turn is chosen to provide the test sequences. The remaining families within that superfamily provide the training sequences. |
Close relationship (family halves one model) | For each superfamily, half of the sequences of each of its families are chosen as training sequences and the remaining ones are chosen as test sequences. |
Very close relationship (family half one model) | For each superfamily, each family is considered in turn: half of its sequences are chosen as test sequences and the remaining ones as training sequences. The sequences of the other families in the superfamily are ignored in the evaluation. |
Note that training sequences are always excluded from the evaluation and that the division into test and training sequences described above is performed for each superfamily in turn. For the last scenario, performance is averaged over an additional inner loop that considers each family in turn.
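The three scenarios can be read as train/test split rules applied to one superfamily at a time. The following is a minimal Python sketch under stated assumptions, not phase4's implementation: the dictionary layout (family name mapped to a list of sequence IDs), the function names, the choice of the first `len(seqs) // 2` sequences as "half", and the `score` callback are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical layout: family name -> sequence IDs, for ONE superfamily.
Superfamily = Dict[str, List[str]]
Split = Tuple[List[str], List[str]]  # (training IDs, test IDs)


def _split_point(seqs: List[str]) -> int:
    # Assumption: "half" means the first len(seqs) // 2 sequences in list order.
    return len(seqs) // 2


def distant_family(sf: Superfamily) -> Iterator[Split]:
    """Distant relationship: each family in turn is tested, the others train."""
    for held_out, test in sf.items():
        train = [s for fam, seqs in sf.items() if fam != held_out for s in seqs]
        yield train, list(test)


def family_halves(sf: Superfamily) -> Split:
    """Close relationship: half of every family trains, the rest is tested."""
    train: List[str] = []
    test: List[str] = []
    for seqs in sf.values():
        k = _split_point(seqs)
        train.extend(seqs[:k])
        test.extend(seqs[k:])
    return train, test


def family_half(sf: Superfamily) -> Iterator[Split]:
    """Very close relationship: per family, half is tested, the rest trains.

    Sequences of the other families in the superfamily are left out entirely.
    """
    for seqs in sf.values():
        k = _split_point(seqs)
        yield seqs[k:], seqs[:k]  # (training = remaining, test = first half)


def family_half_score(sf: Superfamily,
                      score: Callable[[List[str], List[str]], float]) -> float:
    """Average a per-split score over the inner loop across families."""
    return mean(score(train, test) for train, test in family_half(sf))
```

Calling these functions once per superfamily corresponds to the outer loop described in the note. Only the averaging for the last scenario is stated in the table, so the sketch does not prescribe how the per-family results of the distant-relationship scenario are combined.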