2013 Dec 12;9(12):e1003382. doi: 10.1371/journal.pcbi.1003382

Table 1. The different datasets constructed and used in this study and their composition.

Data set	Protein chains	nsSNPs	Description
1 kG	19,058	106,311	A data set containing all the 1 kG variants filtered by population.
OMIM	19,058	10,151	A protein-sequence-based set containing OMIM variants for all reviewed UniProt human proteins.
Humsavar	19,058	23,846	A set based on human disease polymorphisms from UniProt.
3D	2,139	10,628	A protein 3D-structure-based set consisting of 1 kG variants for proteins that have a complete structure in the PDB.
Monomer	325	1,461	A subset of the 3D set containing only proteins identified as being monomeric.
Model	2,630	13,037	A set based on human ModBase homology models where sequence coverage and identity are between 90% and 100%.