BMC Bioinformatics. 2012 Jun 22;13:144. doi: 10.1186/1471-2105-13-144

Table 1.

Comparison of curated and automatically-generated domain hierarchies

| CDD Ident. | Protein superfamily | Number of seqs | Length | Manually curated: nodes* | Manually curated: LLR | Automatically generated: nodes* | Automatically generated: LLR | Automatically generated: time§ |
|---|---|---|---|---|---|---|---|---|
| cd00030 | C2 | 23,452 | 102 | 106 (103) | 236574 | 78 (73) | 223857 | 19.4 |
| cd00138 | PLDc_SF | 16,765 | 119 | 105 (102) | 241766 | 36 (34) | 192876 | 10.0 |
| cd00142 | PI3Kc_like | 2,409 | 219 | 22 | 34129 | 16 | 34563 | 4.5 |
| cd00159 | RhoGAP | 4,815 | 169 | 39 (38) | 55604 | 32 | 53540 | 7.97 |
| cd00173 | SH2 | 5,917 | 79 | 111 (101) | 49274 | 39 | 40075 | 3.5 |
| cd00180 | Protein kinases | 104,912 | 215 | 280 (260) | 1378273 | 107 (104) | 1536991 | 241.0 |
| cd00229 | SGNH_hydrolase | 14,635 | 187 | 30 | 180667 | 29 | 183822 | 14.95 |
| cd00306 | S8/S53 peptidase | 10,960 | 241 | 36 | 161685 | 45 (44) | 173693 | 30.90 |
| cd00368 | Molybdopterin-Binding | 9,540 | 374 | 26 | 177569 | 44 | 209704 | 39.3 |
| cd00397 | DNA_BRE_C | 25,824 | 164 | 27 (26) | 187382 | 39 (37) | 211739 | 16.9 |
| cd00761 | Glycosyltransferase A (GT-A) | 66,260 | 156 | 71 (70) | 944727 | 123 (110) | 1048396 | 193.8 |
| cd00768 | Class II aaRS-like core | 37,160 | 211 | 17 | 674454 | 31 | 833691 | 54.3 |
| cd00838 | MPP_superfamily | 33,753 | 131 | 61 | 402297 | 55 (54) | 399553 | 65.1 |
| cd00900 | PH-like | 22,593 | 99 | 81 | 211812 | 99 (98) | 274945 | 52.3 |
| cd01067 | Globin_like | 9,933 | 117 | 4 (1) | 11133 | 26 (25) | 73808 | 4.3 |
| cd01391 | Periplasmic_Binding_Protein_1 | 36,330 | 269 | 142 (140) | 619713 | 68 (65) | 580753 | 169.1 |
| cd01494 | AAT_I (Pyridoxal-PO4-binding) | 114,781 | 170 | 16 | 1086328 | 92 (84) | 2027660 | 249.67 |
| cd01635 | Glycosyltransferase GTB | 44,366 | 229 | 45 | 723443 | 95 (93) | 881414 | 232.7 |
| cd02156 | Class I aaRS-like core √ | 53,605 | 105 | 34 | 522962 | 61 (57) | 698273 | 41.4 |
| cd02883 | Nudix_Hydrolase | 32,046 | 123 | 55 (54) | 321636 | 61 (60) | 367819 | 43.2 |
| cd03128 | GAT-1 (mcBPPS vs pmcBPPS) | 46,514 | 92 | 34 (32) | 319515 | 64 (62) | 388621 | 42.2 |
| cd03440 | hot_dog | 30,162 | 100 | 22 (18) | 141990 | 70 (69) | 345298 | 39.1 |
| cd03873 | Zinc peptidases | 24,455 | 237 | 81 | 596408 | 69 (66) | 590521 | 43.9 |
| cd05466 | Periplasmic_Binding_Protein_2 | 45,287 | 197 | 76 (73) | 523941 | 49 (41) | 411445 | 31.7 |
| cd06587 | Glo_EDI_BRP_like | 36,165 | 112 | 60 (58) | 335848 | 94 (91) | 479522 | 54.8 |
| cd06663 | Biotinyl-lipoyl | 25,013 | 73 | 4 | 53038 | 25 (18) | 66571 | 4.53 |
| cd06846 | Adenylation_DNA_ligase_like | 3,833 | 182 | 14 | 43276 | 20 | 48475 | 4.8 |
| cd08555 | PI-PLCc_GDPD_SF | 8,707 | 179 | 74 (73) | 143201 | 37 (32) | 123075 | 6.9 |
| cd08772 | GH43_62_32_68 (β propellers) | 6,760 | 286 | 28 | 111336 | 51 (50) | 176701 | 30.0 |
| cl09931 | Rossmann fold proteins | 424,764 | 93 | 361 (347) | 4110907 | 145 (130) | 4029120 | 757.2 |
| Average | | 44,057 | 167.7 | 66.4 | 486696 | 56.9 | 556884 | 83.6 |

Number of seqs: the count after removing identical sequences and sequences that fail to align with at least 75% of the domain.

* Numbers in parentheses indicate the nodes retained after insignificant nodes were removed by the mcBPPS program.

LLR: the log-likelihood ratio, in nats.
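
For readers more used to bits, the standard change of logarithm base converts an LLR in nats to bits. The cd00030 value below is the manually curated LLR from the table; the converted figure is a rounded illustration, not a value reported in the source.

```latex
\mathrm{LLR}_{\text{bits}} \;=\; \frac{\mathrm{LLR}_{\text{nats}}}{\ln 2}
\;\approx\; 1.4427 \,\mathrm{LLR}_{\text{nats}},
\qquad
\text{e.g. } \frac{236574}{\ln 2} \approx 3.413 \times 10^{5}\ \text{bits (cd00030, manually curated)}.
```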

§ The time (in minutes) is for Steps 2 and 3 of the algorithm only; Step 1 can be parallelized to run in less than 10% of the time shown.
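
The sequence filtering described in the footnote on sequence counts could be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_sequences`, the input layout, and the reading of "align with at least 75% of the domain" as "aligned columns divided by domain length is at least 0.75" are all assumptions made for the example.

```python
def filter_sequences(hits, domain_length, min_coverage=0.75):
    """Return the IDs of sequences kept after filtering.

    hits: iterable of (seq_id, sequence, n_aligned) tuples, where n_aligned
          is the number of domain columns the sequence aligns to
          (hypothetical input layout, assumed for this sketch).
    domain_length: number of columns in the domain model.
    min_coverage: minimum fraction of the domain that must be aligned.
    """
    seen = set()   # sequences already kept, used to drop exact duplicates
    kept = []
    for seq_id, sequence, n_aligned in hits:
        if sequence in seen:
            continue  # identical to a sequence that was already kept
        if n_aligned / domain_length < min_coverage:
            continue  # aligns to too little of the domain
        seen.add(sequence)
        kept.append(seq_id)
    return kept


# Usage: a toy domain of length 100 with one duplicate and one short hit.
example_hits = [
    ("a", "MKV", 90),
    ("b", "MKV", 90),  # duplicate of "a"
    ("c", "MKL", 70),  # covers only 70% of the domain
    ("d", "MKT", 75),  # exactly at the 75% threshold, so it is kept
]
print(filter_sequences(example_hits, domain_length=100))
```

The threshold is treated as inclusive here; whether the original analysis used an inclusive or exclusive cutoff is not stated in the table.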