2020 Mar 3;11(12):3180–3191. doi: 10.1039/c9sc06561j

Total number of training and testing examples for chemical shift prediction for each atom type. The training set combines the SPARTA+ training set with the SHIFTX+ training and testing sets, with all redundant chains removed. We have developed a new test set of 200 high-resolution proteins with chemical shifts available from RefDB; duplicate chains and residues with no deposited chemical shift values are excluded from the test data. The LH-Test set is the subset of the test proteins with only low sequence homology to other proteins, so that sequence or structural homology cannot be exploited. We also created two curated test sets, which additionally exclude paramagnetic proteins, some Hα chemical shifts whose calculated ring current effect exceeds 1.5 ppm, and “outliers” detected by the PANAV program (13). Further information is provided in Methods and ESI.

| Set | # of PDBs | H | Hα | C′ | Cα | Cβ | N |
| Train | 647 | 72 894 | 56 149 | 58 228 | 79 611 | 70 621 | 74 896 |
| Test | 200 | 19 120 | 11 727 | 8231 | 13 140 | 10 139 | 15 374 |
| Test (curated) | 200 | 18 494 | 11 240 | 7861 | 12 533 | 9883 | 14 610 |
| LH-Test | 100 | 8634 | 4979 | 3332 | 5685 | 4278 | 6576 |
| LH-Test (curated) | 100 | 8606 | 4950 | 3331 | 5251 | 4201 | 6480 |
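The filtering rules described in the caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `ShiftRecord` layout, the atom-type strings, and the function names are hypothetical. Treating the 1.5 ppm ring current cutoff as a threshold on the absolute value is also an assumption. The paramagnetic and outlier flags are assumed to be computed elsewhere (for example by PANAV) and are simply read here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

# Cutoff quoted in the caption for Hα ring current effects (ppm).
RING_CURRENT_CUTOFF_PPM = 1.5


@dataclass(frozen=True)
class ShiftRecord:
    """One residue-level chemical shift entry (hypothetical layout)."""
    pdb_id: str
    atom_type: str                  # e.g. "H", "HA", "C", "CA", "CB", "N"
    shift_ppm: Optional[float]      # None when no shift was deposited
    ring_current_ppm: float = 0.0   # calculated ring current effect
    paramagnetic: bool = False      # protein flagged as paramagnetic
    outlier: bool = False           # flagged by an external outlier check


def in_test_set(rec: ShiftRecord) -> bool:
    """Keep only residues that have a deposited chemical shift."""
    return rec.shift_ppm is not None


def in_curated_set(rec: ShiftRecord) -> bool:
    """Apply the additional exclusions used for the curated sets."""
    if not in_test_set(rec):
        return False
    if rec.paramagnetic or rec.outlier:
        return False
    # Assumption: the cutoff applies to the magnitude of the effect,
    # and only to Hα shifts, as the caption states.
    if rec.atom_type == "HA" and abs(rec.ring_current_ppm) > RING_CURRENT_CUTOFF_PPM:
        return False
    return True


def count_by_atom_type(
    records: Iterable[ShiftRecord],
    keep: Callable[[ShiftRecord], bool],
) -> Dict[str, int]:
    """Count retained examples per atom type, as tabulated above."""
    return dict(Counter(r.atom_type for r in records if keep(r)))
```

Calling `count_by_atom_type(records, in_test_set)` and `count_by_atom_type(records, in_curated_set)` on the same records would give the uncurated and curated rows for one set. Deduplicating chains and selecting low-homology proteins are separate steps that this sketch does not cover.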