Table 1. Data sets constructed and used in this study and their composition.
Data set | Protein chains | nsSNPs | Description |
1 kG | 19,058 | 106,311 | A data set containing all the 1 kG variants filtered by population. |
OMIM | 19,058 | 10,151 | A protein sequence-based set containing OMIM variants for all reviewed UniProt human proteins. |
Humsavar | 19,058 | 23,846 | A set based on human disease polymorphisms from UniProt. |
3D | 2,139 | 10,628 | A protein 3D structure-based set consisting of 1 kG variants for proteins that have a complete structure in the PDB. |
Monomer | 325 | 1,461 | A subset of the 3D set containing only proteins identified as being monomeric. |
Model | 2,630 | 13,037 | A set based on human ModBase homology models whose sequence coverage and identity are both between 90% and 100%. |