2012 Aug 31;5:472. doi: 10.1186/1756-0500-5-472

Table 5. The dataset

Protein	D1 (start end)	D2 (start end)	Sequences	Species
1A45	1 82	83 173	160	E(146)N(14)
1BIB	67 270	271 317	236	A(12)B(201)N(23)
1BKS	1 188	189 268	478	A(21)B(401)E(10)N(46)
1FNB	19 152	153 314	58	B(22)E(34)N(2)
1G8A	1 51	52 227	75	A(47)E(20)N(8)
1G8P	18 216	261 350	230	A(10)B(143)E(49)N(28)
1I39	1 158	159 200	688	A(32)B(538)E(7)V(1)U(1)N(109)
1J5X	2 169	170 319	252	A(9)B(183)E(5)N(55)
1LAP	1 147	148 484	454	A(2)B(331)E(84)N(37)
1LLD	7 148	149 319	709	A(33)B(389)E(221)N(66)
1MRI	1 162	163 246	68	B(2)E(65)N(1)
1PII	1 255	256 452	75	B(65)N(10)
1RHD	1 156	157 293	505	A(26)B(365)E(57)U(1)N(56)
1THM	1 127	128 208	106	A(1)B(62)E(34)N(9)
1W98	88 227	228 357	70	E(64)N(6)
1WRU	3 176	177 346	64	B(58)V(2)N(4)
1X2G	1 246	247 337	224	A(2)B(155)E(42)N(25)
2AAA	1 376	377 484	245	B(141)E(74)N(30)
2AHE	16 108	109 253	144	B(25)E(100)N(19)
2D3V	3 95	96 195	77	E(71)N(6)
2D8N	16 97	102 189	240	E(195)N(45)
2E64	1 188	189 235	294	A(9)B(231)E(4)U(1)N(49)
2I00	10 300	301 406	116	A(2)B(80)N(34)
2IU5	1 71	72 180	65	B(56)N(9)
2NPO	3 76	77 188	224	A(3)B(182)U(1)N(38)
2NRC	1 247	261 480	188	A(9)B(96)E(68)N(15)
2OF7	17 67	68 207	204	B(135)N(69)
2OI8	8 86	87 216	215	B(151)N(64)
2PGD	1 172	178 433	317	B(211)E(78)N(28)
2PGE	3 136	137 368	138	A(6)B(102)E(1)N(29)
2PGX	2 56	57 250	102	B(87)N(15)
2PHZ	20 142	143 296	420	A(4)B(343)N(73)
2QY9	201 284	285 495	471	A(32)B(344)E(15)N(80)
2REB	23 268	269 328	482	B(434)E(12)N(36)
2TS1	1 220	248 319	598	B(512)E(34)N(52)
4ENL	1 126	127 436	649	A(32)B(448)E(122)N(47)
4MDH	1 154	155 333	339	A(6)B(173)E(134)N(26)
5FBP	1 201	202 335	355	A(3)B(213)E(112)N(27)
6GST	1 82	90 217	374	B(10)E(312)N(52)
8TLN	1 135	136 316	44	A(1)B(36)E(2)N(5)

The “Protein” column lists PDB identifiers [40]. The D1 and D2 columns give the start and end PDB residue numbers of domains 1 and 2, respectively. For all PDB entries listed, these residues are located in chain A of the structure, except for 1W98, where the domains are in chain B, and 8TLN, where they are in chain E. The “Sequences” column gives the number of sequences in the multiple sequence alignment (MSA). The final column gives the distribution of the sequences in each MSA across the domains of life and related categories: eukaryotes (E); archaea (A); bacteria (B); viruses (V); unclassified (U); and not found (N), i.e. sequences that could not be found in the NCBI Taxonomy Database. This dataset was taken from Hamer et al. [12].
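The column layout described above can be read programmatically. The following is a minimal sketch, assuming each row is a tab-separated line with the two residue numbers of a domain separated by a space, as in the table; the names `DatasetRow`, `parse_species`, and `parse_row` are hypothetical and not part of the original work. It also checks that the per-category counts in the final column sum to the value in the “Sequences” column.

```python
import re
from typing import Dict, NamedTuple, Tuple


class DatasetRow(NamedTuple):
    """One table row (hypothetical container, for illustration only)."""
    pdb_id: str
    d1: Tuple[int, int]          # start and end PDB residue of domain 1
    d2: Tuple[int, int]          # start and end PDB residue of domain 2
    n_sequences: int             # number of sequences in the MSA
    species: Dict[str, int]      # category code (E, A, B, V, U, N) -> count


# Matches entries such as "B(201)": a one-letter code and a count in parentheses.
_SPECIES_RE = re.compile(r"([A-Z])\((\d+)\)")


def parse_species(field: str) -> Dict[str, int]:
    """Turn 'A(12)B(201)N(23)' into {'A': 12, 'B': 201, 'N': 23}."""
    return {code: int(count) for code, count in _SPECIES_RE.findall(field)}


def parse_row(line: str) -> DatasetRow:
    """Parse one tab-separated row of the table (assumed layout)."""
    pdb_id, d1, d2, n_seq, species = line.rstrip("\n").split("\t")
    d1_start, d1_end = map(int, d1.split())
    d2_start, d2_end = map(int, d2.split())
    return DatasetRow(
        pdb_id,
        (d1_start, d1_end),
        (d2_start, d2_end),
        int(n_seq),
        parse_species(species),
    )


row = parse_row("1BIB\t67 270\t271 317\t236\tA(12)B(201)N(23)")
# Consistency check: category counts should add up to the MSA size.
assert sum(row.species.values()) == row.n_sequences
```

The same check can be run over every row to catch transcription errors in the species column.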