2012 Aug 31;5:472. doi: 10.1186/1756-0500-5-472

Table 5.

The dataset

Protein  D1       D2       Sequences  Species
1A45     1-82     83-173   160        E(146)N(14)
1BIB     67-270   271-317  236        A(12)B(201)N(23)
1BKS     1-188    189-268  478        A(21)B(401)E(10)N(46)
1FNB     19-152   153-314  58         B(22)E(34)N(2)
1G8A     1-51     52-227   75         A(47)E(20)N(8)
1G8P     18-216   261-350  230        A(10)B(143)E(49)N(28)
1I39     1-158    159-200  688        A(32)B(538)E(7)V(1)U(1)N(109)
1J5X     2-169    170-319  252        A(9)B(183)E(5)N(55)
1LAP     1-147    148-484  454        A(2)B(331)E(84)N(37)
1LLD     7-148    149-319  709        A(33)B(389)E(221)N(66)
1MRI     1-162    163-246  68         B(2)E(65)N(1)
1PII     1-255    256-452  75         B(65)N(10)
1RHD     1-156    157-293  505        A(26)B(365)E(57)U(1)N(56)
1THM     1-127    128-208  106        A(1)B(62)E(34)N(9)
1W98     88-227   228-357  70         E(64)N(6)
1WRU     3-176    177-346  64         B(58)V(2)N(4)
1X2G     1-246    247-337  224        A(2)B(155)E(42)N(25)
2AAA     1-376    377-484  245        B(141)E(74)N(30)
2AHE     16-108   109-253  144        B(25)E(100)N(19)
2D3V     3-95     96-195   77         E(71)N(6)
2D8N     16-97    102-189  240        E(195)N(45)
2E64     1-188    189-235  294        A(9)B(231)E(4)U(1)N(49)
2I00     10-300   301-406  116        A(2)B(80)N(34)
2IU5     1-71     72-180   65         B(56)N(9)
2NPO     3-76     77-188   224        A(3)B(182)U(1)N(38)
2NRC     1-247    261-480  188        A(9)B(96)E(68)N(15)
2OF7     17-67    68-207   204        B(135)N(69)
2OI8     8-86     87-216   215        B(151)N(64)
2PGD     1-172    178-433  317        B(211)E(78)N(28)
2PGE     3-136    137-368  138        A(6)B(102)E(1)N(29)
2PGX     2-56     57-250   102        B(87)N(15)
2PHZ     20-142   143-296  420        A(4)B(343)N(73)
2QY9     201-284  285-495  471        A(32)B(344)E(15)N(80)
2REB     23-268   269-328  482        B(434)E(12)N(36)
2TS1     1-220    248-319  598        B(512)E(34)N(52)
4ENL     1-126    127-436  649        A(32)B(448)E(122)N(47)
4MDH     1-154    155-333  339        A(6)B(173)E(134)N(26)
5FBP     1-201    202-335  355        A(3)B(213)E(112)N(27)
6GST     1-82     90-217   374        B(10)E(312)N(52)
8TLN     1-135    136-316  44         A(1)B(36)E(2)N(5)

The “Protein” column lists the PDB identifiers [40]. The D1 and D2 columns give the start and end PDB residues of domains 1 and 2, respectively. For all PDB entries listed, these residues are located in chain A of the structure, except for 1W98, where the domains are in chain B, and 8TLN, where they are in chain E. The “Sequences” column gives the number of sequences in the multiple sequence alignment (MSA). The “Species” column gives the taxonomic distribution of the sequences in each MSA: eukaryotes (E); archaea (A); bacteria (B); viruses (V); unclassified (U); and not found (N), i.e. sequences that could not be found in the NCBI Taxonomy Database. This dataset was taken from Hamer et al. [12].
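The encoding of the final column can be checked mechanically: the per-category counts in each entry should add up to the number of sequences in the MSA. The following is a minimal sketch of such a check, not code from the original study; the function names `parse_species` and `counts_match` are hypothetical and chosen here for illustration only.

```python
import re

# Category codes as defined in the table footnote.
CATEGORIES = {
    "E": "eukaryotes",
    "A": "archaea",
    "B": "bacteria",
    "V": "viruses",
    "U": "unclassified",
    "N": "not found",
}

# A whole field is one or more tokens such as "B(201)".
FIELD = re.compile(r"(?:[A-Z]\(\d+\))+")
TOKEN = re.compile(r"([A-Z])\((\d+)\)")


def parse_species(field):
    """Turn a string like 'A(12)B(201)N(23)' into {'A': 12, 'B': 201, 'N': 23}."""
    if not FIELD.fullmatch(field):
        raise ValueError(f"malformed species field: {field!r}")
    counts = {}
    for code, number in TOKEN.findall(field):
        if code not in CATEGORIES:
            raise ValueError(f"unknown category code: {code!r}")
        counts[code] = counts.get(code, 0) + int(number)
    return counts


def counts_match(sequences, field):
    """Return True if the category counts sum to the stated number of sequences."""
    return sum(parse_species(field).values()) == sequences


# A few rows copied from the table: (protein, sequences, species).
rows = [
    ("1A45", 160, "E(146)N(14)"),
    ("1BIB", 236, "A(12)B(201)N(23)"),
    ("1I39", 688, "A(32)B(538)E(7)V(1)U(1)N(109)"),
    ("8TLN", 44, "A(1)B(36)E(2)N(5)"),
]

for protein, sequences, species in rows:
    status = "ok" if counts_match(sequences, species) else "MISMATCH"
    print(protein, status)
```

Rejecting malformed fields rather than skipping them is a deliberate choice in this sketch, so that a transcription error in the table surfaces as an exception instead of a silently wrong total.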