Author manuscript; available in PMC: 2017 May 3.

Published in final edited form as: Structure. 2016 Apr 7;24(5):826–837. doi: 10.1016/j.str.2016.03.008

Table 1.

Statistics on the Surfaces of apo Structures within the Canonical Set of Proteins

PDB ID	Surf (SC Res)	Surf (LB Res)	SC-LB Overlap	No. of SC Sites	No. of LB Sites	No. of Overlapping Sites	Fraction of LB Sites Identified
3pfk	0.51	0.204	0.255 (0.155)	19	3	3	1
4ake	0.454	0.178	0.274 (0.154)	29	2	2	1
1cd5	0.589	0.1	0.153 (0.096)	24	2	1	0.5
1j3h	0.066	0.08	0.25 (0.041)	2	1	1	1
1bks	0.343	0.097	0.079 (0.079)	24	4	1	0.25
1e5x	0.207	0.093	0.139 (0.077)	17	3	2	0.667
1efk	0.055	0.086	0.03 (0.036)	10	10	0	0
1nr7	0.149	0.175	0.187 (0.102)	45	24	6	0.25
1xtt	0.298	0.196	0.295 (0.154)	31	5	5	1
2hnp	0.739	0.133	0.16 (0.134)	25	2	2	1
3d7s	0.267	0.137	0.054 (0.064)	26	9	0	0
3ju5	0.016	0.039	0 (0.013)	1	2	0	0
Mean	0.308	0.127	0.156 (0.092)	21.083	5.583	1.917	0.556

For each apo structure within the canonical set of proteins, statistics relating surface-critical sites to known ligand-binding sites are reported. The surface of a given structure is defined as the set of all residues with a relative solvent accessibility of at least 50%, where relative solvent accessibility is evaluated using all heavy atoms in both the main chain and side chain of a given residue and is calculated with NACCESS (Hubbard and Thornton, 1993). Mean values are given in the bottom row.
Column 1: PDB ID of each structure.
Column 2: the fraction of surface residues that are surface-critical residues (SC Res).
Column 3: the fraction of surface residues that are known ligand-binding residues (LB Res); known ligand-binding residues are taken to be those within 4.5 Å of the ligand in the holo structure (Table S1).
Column 4: the Jaccard similarity between the sets of residues represented in columns 2 and 3 (i.e., surface-critical and known ligand-binding residues). Values in parentheses give the expected Jaccard similarity under a null model in which surface-critical and ligand-binding residues are randomly distributed throughout the surface; for each structure, 10,000 simulations are performed to produce random distributions, and the expected value reported is the mean Jaccard similarity over those 10,000 simulations.
Column 5: the number of distinct surface-critical sites identified in each structure.
Column 6: the number of known ligand-binding sites in each structure.
Column 7: the number of known ligand-binding sites that are positively identified within the set of surface-critical sites, where a positive match occurs if a majority of the residues in a surface-critical site coincide with the known ligand-binding site.
Column 8: the fraction of ligand-binding sites captured, i.e., the ratio of the value in column 7 to that in column 6.
See also Figure S1; Tables S1 and S2.
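
The quantities in columns 4 and 8 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`jaccard`, `expected_jaccard`, `fraction_identified`) are hypothetical, and the null model is an assumption: surface-critical and ligand-binding residues are drawn independently and uniformly at random, without replacement, from the surface residues, with set sizes held fixed. The legend does not specify the sampling procedure, so values from this sketch need not match the parenthesized values in the table.

```python
import random


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two residue collections.

    Returns 0.0 when both collections are empty (a convention chosen here).
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expected_jaccard(n_surface, n_sc, n_lb, n_sim=10_000, seed=0):
    """Mean Jaccard similarity over n_sim random placements.

    ASSUMPTION: each simulation draws n_sc surface-critical and n_lb
    ligand-binding residues independently and uniformly, without
    replacement, from n_surface surface residues.
    """
    rng = random.Random(seed)      # seeded for reproducibility
    surface = range(n_surface)     # residues labelled 0 .. n_surface - 1
    total = 0.0
    for _ in range(n_sim):
        sc = rng.sample(surface, n_sc)
        lb = rng.sample(surface, n_lb)
        total += jaccard(sc, lb)
    return total / n_sim


def fraction_identified(n_overlapping_sites, n_lb_sites):
    """Column 8: ratio of column 7 to column 6 (n_lb_sites must be > 0)."""
    return n_overlapping_sites / n_lb_sites


if __name__ == "__main__":
    # Hypothetical toy surface: 100 residues, 51 surface-critical, 20 ligand-binding.
    print(expected_jaccard(100, 51, 20, n_sim=2_000))
    # Row 1e5x of the table: 2 of 3 ligand-binding sites identified.
    print(round(fraction_identified(2, 3), 3))
```

Comparing the observed Jaccard similarity with the simulated mean shows whether the two residue sets overlap more than chance alone would suggest, which is how the paired values in column 4 are meant to be read.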