Table 5. The dataset
| Protein | D1 | D2 | Sequences | Species |
|---|---|---|---|---|
| 1A45 | 1 82 | 83 173 | 160 | E(146)N(14) |
| 1BIB | 67 270 | 271 317 | 236 | A(12)B(201)N(23) |
| 1BKS | 1 188 | 189 268 | 478 | A(21)B(401)E(10)N(46) |
| 1FNB | 19 152 | 153 314 | 58 | B(22)E(34)N(2) |
| 1G8A | 1 51 | 52 227 | 75 | A(47)E(20)N(8) |
| 1G8P | 18 216 | 261 350 | 230 | A(10)B(143)E(49)N(28) |
| 1I39 | 1 158 | 159 200 | 688 | A(32)B(538)E(7)V(1)U(1)N(109) |
| 1J5X | 2 169 | 170 319 | 252 | A(9)B(183)E(5)N(55) |
| 1LAP | 1 147 | 148 484 | 454 | A(2)B(331)E(84)N(37) |
| 1LLD | 7 148 | 149 319 | 709 | A(33)B(389)E(221)N(66) |
| 1MRI | 1 162 | 163 246 | 68 | B(2)E(65)N(1) |
| 1PII | 1 255 | 256 452 | 75 | B(65)N(10) |
| 1RHD | 1 156 | 157 293 | 505 | A(26)B(365)E(57)U(1)N(56) |
| 1THM | 1 127 | 128 208 | 106 | A(1)B(62)E(34)N(9) |
| 1W98 | 88 227 | 228 357 | 70 | E(64)N(6) |
| 1WRU | 3 176 | 177 346 | 64 | B(58)V(2)N(4) |
| 1X2G | 1 246 | 247 337 | 224 | A(2)B(155)E(42)N(25) |
| 2AAA | 1 376 | 377 484 | 245 | B(141)E(74)N(30) |
| 2AHE | 16 108 | 109 253 | 144 | B(25)E(100)N(19) |
| 2D3V | 3 95 | 96 195 | 77 | E(71)N(6) |
| 2D8N | 16 97 | 102 189 | 240 | E(195)N(45) |
| 2E64 | 1 188 | 189 235 | 294 | A(9)B(231)E(4)U(1)N(49) |
| 2I00 | 10 300 | 301 406 | 116 | A(2)B(80)N(34) |
| 2IU5 | 1 71 | 72 180 | 65 | B(56)N(9) |
| 2NPO | 3 76 | 77 188 | 224 | A(3)B(182)U(1)N(38) |
| 2NRC | 1 247 | 261 480 | 188 | A(9)B(96)E(68)N(15) |
| 2OF7 | 17 67 | 68 207 | 204 | B(135)N(69) |
| 2OI8 | 8 86 | 87 216 | 215 | B(151)N(64) |
| 2PGD | 1 172 | 178 433 | 317 | B(211)E(78)N(28) |
| 2PGE | 3 136 | 137 368 | 138 | A(6)B(102)E(1)N(29) |
| 2PGX | 2 56 | 57 250 | 102 | B(87)N(15) |
| 2PHZ | 20 142 | 143 296 | 420 | A(4)B(343)N(73) |
| 2QY9 | 201 284 | 285 495 | 471 | A(32)B(344)E(15)N(80) |
| 2REB | 23 268 | 269 328 | 482 | B(434)E(12)N(36) |
| 2TS1 | 1 220 | 248 319 | 598 | B(512)E(34)N(52) |
| 4ENL | 1 126 | 127 436 | 649 | A(32)B(448)E(122)N(47) |
| 4MDH | 1 154 | 155 333 | 339 | A(6)B(173)E(134)N(26) |
| 5FBP | 1 201 | 202 335 | 355 | A(3)B(213)E(112)N(27) |
| 6GST | 1 82 | 90 217 | 374 | B(10)E(312)N(52) |
| 8TLN | 1 135 | 136 316 | 44 | A(1)B(36)E(2)N(5) |
The “Protein” column lists PDB identifiers [40]. The D1 and D2 columns give the start and end PDB residue numbers of domains 1 and 2, respectively. For all entries, these residues are located in chain A of the structure, except for 1W98, where the domains are in chain B, and 8TLN, where they are in chain E. The “Sequences” column gives the number of sequences in the multiple sequence alignment (MSA). The final column gives the distribution of the sequences in each MSA across taxonomic groups: eukaryotes (E); archaea (A); bacteria (B); viruses (V); unclassified (U); and not found (N), i.e. sequences that could not be found in the NCBI Taxonomy Database. This dataset was taken from Hamer et al. [12].
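The compact notation in the final column can be read mechanically. The following is a minimal sketch, not part of the original work, of how an entry such as `A(12)B(201)N(23)` might be parsed and checked against the number of sequences in the same row. The names `GROUPS`, `parse_species` and `is_consistent` are hypothetical and introduced only for illustration; the one-letter codes are those defined in the caption.

```python
import re

# One-letter codes as defined in the table caption (names here are illustrative).
GROUPS = {
    "E": "eukaryotes",
    "A": "archaea",
    "B": "bacteria",
    "V": "viruses",
    "U": "unclassified",
    "N": "not found",
}


def parse_species(field):
    """Parse a string such as 'A(12)B(201)N(23)' into {'A': 12, 'B': 201, 'N': 23}."""
    field = field.strip()
    # The whole field must be a run of LETTER(count) items with nothing else in it.
    if not re.fullmatch(r"(?:[A-Z]\(\d+\))+", field):
        raise ValueError(f"unrecognised species field: {field!r}")
    counts = {}
    for code, number in re.findall(r"([A-Z])\((\d+)\)", field):
        if code not in GROUPS:
            raise ValueError(f"unknown group code: {code!r}")
        counts[code] = counts.get(code, 0) + int(number)
    return counts


def is_consistent(sequences, species_field):
    """Return True if the per-group counts add up to the MSA size."""
    return sum(parse_species(species_field).values()) == sequences


# Three rows copied from the table: (protein, sequences, species field).
rows = [
    ("1A45", 160, "E(146)N(14)"),
    ("1BIB", 236, "A(12)B(201)N(23)"),
    ("1I39", 688, "A(32)B(538)E(7)V(1)U(1)N(109)"),
]

for protein, sequences, species in rows:
    named = {GROUPS[code]: n for code, n in parse_species(species).items()}
    print(protein, named, is_consistent(sequences, species))
```

For the three rows shown, the per-group counts sum to the value in the “Sequences” column, which is the relationship the caption implies for every row.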