Table 1.
Benchmark tests<sup>a</sup> | Data | Classification tasks | Comparison methods<sup>b</sup> |
---|---|---|---|
Classification of protein domains in SCOP [PCB0001, PCB00003, PDB0005] | 11 944 protein sequences or protein structures from SCOP95 (6) | Superfamilies subdivided into families: 246; folds subdivided into superfamilies: 191; classes subdivided into folds: 377 | BLAST, Smith–Waterman, Needleman–Wunsch, LA–kernel, PRIDE2 |
Classification of protein domains in CATH [PCB00007, PCB00009, PCB00011, PCB00013] | 11 373 protein sequences or protein structures from CATH (7) | H groups subdivided into S groups: 165; T groups subdivided into H groups: 199; A groups subdivided into T groups: 297; classes subdivided into A groups: 33 | BLAST, Smith–Waterman, Needleman–Wunsch, LA–kernel, PRIDE2 |
Classification of phyla based on 3-phosphoglycerate kinase (3PGK) sequences [PCB00031, PCB00032] | 131 3PGK protein and DNA sequences (11,29) | Groups of kingdoms (Archaea, Bacteria, Eucarya) subdivided into phyla: 10 | BLAST, Smith–Waterman, Needleman–Wunsch, LA–kernel, LZW, PPMZ |
Functional annotation of unicellular eukaryotic sequences based on prokaryotic orthologs [PCB00031] | 17 973 sequences of prokaryotes and unicellular eukaryotes from the COG databases (5) | Orthologous groups subdivided into prokaryotes and eukaryotes: 119 | BLAST, Smith–Waterman, Needleman–Wunsch, LA–kernel, LZW, PPMZ |
<sup>a</sup>The collection contains 6405 benchmark tests in total, including 3297 protein sequence classification tests, 3095 3D classification tests and 10 DNA (coding region) classification tests. The accession numbers of the records are given in square brackets.
<sup>b</sup>See text for the references.