Author manuscript; available in PMC: 2021 Sep 23.
Published in final edited form as: J Chem Inf Model. 2020 Mar 10;60(3):1253–1275. doi: 10.1021/acs.jcim.9b01080

Table 7.

Standard and Nonstandard InChI Recapitulation across All Rules (InChI used: V.1.05)^a

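| InChI type | Database | Cases | Complete pass (for any applicable rule) | Partial pass (for any applicable rule) | Complete pass for at least one rule and partial pass for other | Tautomeric molecules count | Overall InChI recapitulation^b (%) | Overall strict InChI recapitulation^c (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| StdInChI | Drugbank | without failures | 1,042 | 100 | 375 | 7427 | 62.11 | 14.03 |
| | | failures included | 965 | 1431 | 700 | | | |
| StdInChI | PDB ligands | without failures | 3494 | 360 | 1354 | 22,939 | 69.83 | 15.23 |
| | | failures included | 3402 | 4794 | 2615 | | | |
| StdInChI | CSD organics | without failures | 16,807 | 3379 | 2351 | 153,091 | 35.28 | 10.98 |
| | | failures included | 16,469 | 11,127 | 3872 | | | |
| StdInChI | ChEMBL | without failures | 207,453 | 36,033 | 48,316 | 1,398,045 | 70.64 | 14.84 |
| | | failures included | 304,087 | 246,541 | 145,095 | | | |
| StdInChI | AMS | without failures | 1,126,213 | 289,808 | 116,649 | 6,358,861 | 73.38 | 17.71 |
| | | failures included | 1,657,392 | 1,030,261 | 445,996 | | | |
| StdInChI | SureChEMBL | without failures | 1,802,766 | 268,598 | 517,010 | 12,621,006 | 62.21 | 14.28 |
| | | failures included | 1,949,348 | 2,006,240 | 1,307,812 | | | |
| StdInChI | PubChem | without failures | 10,516,304 | 1,417,527 | 1,580,535 | 67,262,970 | 66.36 | 15.63 |
| | | failures included | 14,270,022 | 12,801,744 | 4,050,060 | | | |
| StdInChI | ChemNav | without failures | 17,418,383 | 4,447,222 | 1,500,175 | 105,565,942 | 80.30 | 16.50 |
| | | failures included | 33,623,754 | 22,336,554 | 5,438,461 | | | |
| StdInChI | CSDB | without failures | 17,154,105 | 4,534,817 | 1,694,508 | 115,696,900 | 79.08 | 14.83 |
| | | failures included | 36,928,720 | 23,633,538 | 7,547,799 | | | |
| NonStdInChI | Drugbank | without failures | 2016 | 157 | 582 | 7427 | 81.88 | 27.14 |
| | | failures included | 658 | 1909 | 759 | | | |
| NonStdInChI | PDB ligands | without failures | 5484 | 502 | 2169 | 22,939 | 83.47 | 23.91 |
| | | failures included | 2305 | 5841 | 2847 | | | |
| NonStdInChI | CSD organics | without failures | 45,556 | 5690 | 7982 | 153,091 | 65.10 | 29.76 |
| | | failures included | 12,143 | 20,702 | 7592 | | | |
| NonStdInChI | ChEMBL | without failures | 330,685 | 43,749 | 98,588 | 1,398,045 | 83.44 | 23.65 |
| | | failures included | 219,892 | 299,824 | 173,848 | | | |
| NonStdInChI | AMS | without failures | 1,534,982 | 306,656 | 307,735 | 6,358,861 | 81.57 | 24.14 |
| | | failures included | 1,263,143 | 1,126,724 | 647,711 | | | |
| NonStdInChI | SureChEMBL | without failures | 2,917,438 | 366,712 | 866,922 | 12,621,006 | 75.69 | 23.12 |
| | | failures included | 1,419,390 | 2,512,248 | 1,470,143 | | | |
| NonStdInChI | PubChem | without failures | 15,900,675 | 1,826,999 | 2,973,696 | 67,250,941 | 77.94 | 23.64 |
| | | failures included | 11,266,876 | 15,119,739 | 5,328,079 | | | |
| NonStdInChI | ChemNav | without failures | 22,942,776 | 4,617,529 | 3,328,166 | 105,565,942 | 86.64 | 21.73 |
| | | failures included | 25,734,204 | 25,121,432 | 9,719,674 | | | |
| NonStdInChI | CSDB | without failures | 23,447,796 | 4,921,978 | 3,883,624 | 115,679,596 | 86.76 | 20.27 |
| | | failures included | 28,119,041 | 27,699,416 | 12,295,971 | | | |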
^a For each database, the first row of the three columns “Complete pass”, “Partial pass”, and “Complete pass for at least one rule and partial pass for other” contains the counts without failure by any other rule, whereas the second row (set in italics in the published table) shows the results for the cases that include failures. For a more detailed explanation of these columns and of the added failure-containing data, please refer to the third spreadsheet in the SI.

^b “Overall InChI recapitulation” is the sum of all six entries for a database (the three columns “Complete pass”, “Partial pass”, and “Complete pass for at least one rule and partial pass for other”, each in its without-failure row and its failure-included row), expressed as a percentage of the tautomeric molecules of that database.

^c “Overall strict InChI recapitulation” is the percentage of molecules whose input InChI matches that of all enumerated tautomers generated by at least one rule (“Complete pass”, first row), relative to the tautomeric molecules of that database.
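
As a check on how the two percentage columns relate to the counts, the following minimal Python sketch recomputes them for the Drugbank StdInChI entry. The function and variable names are illustrative assumptions, not part of the original work; the formulas follow footnotes b and c as read here (the strict value uses the first-row “Complete pass” count only).

```python
def overall_recapitulation(without_failures, with_failures, tautomeric_count):
    """Footnote b (as read here): all six entries summed, as a percentage
    of the tautomeric molecules of the database."""
    total = sum(without_failures) + sum(with_failures)
    return 100.0 * total / tautomeric_count


def strict_recapitulation(complete_pass_without_failures, tautomeric_count):
    """Footnote c (as read here): first-row "Complete pass" count, as a
    percentage of the tautomeric molecules of the database."""
    return 100.0 * complete_pass_without_failures / tautomeric_count


# Drugbank, StdInChI, values copied from Table 7.
# Tuple order: (complete pass, partial pass, complete-and-partial pass).
without_failures = (1042, 100, 375)   # first row
with_failures = (965, 1431, 700)      # second row (failures included)
tautomeric_count = 7427

overall = overall_recapitulation(without_failures, with_failures, tautomeric_count)
strict = strict_recapitulation(without_failures[0], tautomeric_count)

print(f"overall: {overall:.2f}%  strict: {strict:.2f}%")
# prints "overall: 62.11%  strict: 14.03%"
```

The same two functions reproduce the Drugbank NonStdInChI values (81.88 and 27.14) when given that entry's counts.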