Table 7.
Standard and Nonstandard InChI Recapitulation across All Rules (InChI used: V.1.05)^a
| Database | Complete pass (for any applicable rule) | Partial pass (for any applicable rule) | Complete pass for at least one rule and partial pass for others | Tautomeric molecules count | Overall InChI recapitulation^b (%) | Overall strict InChI recapitulation^c (%) |
|---|---|---|---|---|---|---|
| **StdInChI** | | | | | | |
| Drugbank | 1042 | 100 | 375 | 7427 | 62.11 | 14.03 |
| | *965* | *1431* | *700* | | | |
| PDB ligands | 3494 | 360 | 1354 | 22,939 | 69.83 | 15.23 |
| | *3402* | *4794* | *2615* | | | |
| CSD organics | 16,807 | 3379 | 2351 | 153,091 | 35.28 | 10.98 |
| | *16,469* | *11,127* | *3872* | | | |
| ChEMBL | 207,453 | 36,033 | 48,316 | 1,398,045 | 70.64 | 14.84 |
| | *304,087* | *246,541* | *145,095* | | | |
| AMS | 1,126,213 | 289,808 | 116,649 | 6,358,861 | 73.38 | 17.71 |
| | *1,657,392* | *1,030,261* | *445,996* | | | |
| SureChEMBL | 1,802,766 | 268,598 | 517,010 | 12,621,006 | 62.21 | 14.28 |
| | *1,949,348* | *2,006,240* | *1,307,812* | | | |
| PubChem | 10,516,304 | 1,417,527 | 1,580,535 | 67,262,970 | 66.36 | 15.63 |
| | *14,270,022* | *12,801,744* | *4,050,060* | | | |
| ChemNav | 17,418,383 | 4,447,222 | 1,500,175 | 105,565,942 | 80.30 | 16.50 |
| | *33,623,754* | *22,336,554* | *5,438,461* | | | |
| CSDB | 17,154,105 | 4,534,817 | 1,694,508 | 115,696,900 | 79.08 | 14.83 |
| | *36,928,720* | *23,633,538* | *7,547,799* | | | |
| **NonStdInChI** | | | | | | |
| Drugbank | 2016 | 157 | 582 | 7427 | 81.88 | 27.14 |
| | *658* | *1909* | *759* | | | |
| PDB ligands | 5484 | 502 | 2169 | 22,939 | 83.47 | 23.91 |
| | *2305* | *5841* | *2847* | | | |
| CSD organics | 45,556 | 5690 | 7982 | 153,091 | 65.10 | 29.76 |
| | *12,143* | *20,702* | *7592* | | | |
| ChEMBL | 330,685 | 43,749 | 98,588 | 1,398,045 | 83.44 | 23.65 |
| | *219,892* | *299,824* | *173,848* | | | |
| AMS | 1,534,982 | 306,656 | 307,735 | 6,358,861 | 81.57 | 24.14 |
| | *1,263,143* | *1,126,724* | *647,711* | | | |
| SureChEMBL | 2,917,438 | 366,712 | 866,922 | 12,621,006 | 75.69 | 23.12 |
| | *1,419,390* | *2,512,248* | *1,470,143* | | | |
| PubChem | 15,900,675 | 1,826,999 | 2,973,696 | 67,250,941 | 77.94 | 23.64 |
| | *11,266,876* | *15,119,739* | *5,328,079* | | | |
| ChemNav | 22,942,776 | 4,617,529 | 3,328,166 | 105,565,942 | 86.64 | 21.73 |
| | *25,734,204* | *25,121,432* | *9,719,674* | | | |
| CSDB | 23,447,796 | 4,921,978 | 3,883,624 | 115,679,596 | 86.76 | 20.27 |
| | *28,119,041* | *27,699,416* | *12,295,971* | | | |
^a For each database, the first row of the three columns “Complete pass”, “Partial pass”, and “Complete pass for at least one rule and partial pass for others” gives counts for molecules without failure by any other rule, whereas the second row (in italics) gives counts for the cases in which at least one other rule failed. A more detailed explanation of these columns and of the failure-containing data is given in the third spreadsheet of the SI.
^b “Overall InChI recapitulation” is the sum of all six counts for a database (both rows of the three columns “Complete pass”, “Partial pass”, and “Complete pass for at least one rule and partial pass for others”, i.e., including the failure-containing second row), expressed as a percentage of that database’s tautomeric molecules count.
^c “Overall strict InChI recapitulation” is the percentage of tautomeric molecules for which the input InChI matches the InChIs of all tautomers enumerated by at least one rule (Complete pass) without failure by any other rule, i.e., the first-row “Complete pass” count divided by that database’s tautomeric molecules count.
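Footnotes b and c reduce to two ratios over the tabulated counts. For Drugbank (StdInChI), for example, (1042 + 100 + 375 + 965 + 1431 + 700)/7427 = 4613/7427 ≈ 62.11%, and 1042/7427 ≈ 14.03%. As an illustrative consistency check, the minimal Python sketch below recomputes both percentages for four rows transcribed from Table 7. The names `DatabaseRow`, `overall_recapitulation`, `strict_recapitulation`, and `matches_reported` are hypothetical and are not part of the SI or of any InChI software. The sketch assumes, as the reported percentages imply, that the two rows of counts are disjoint and that the strict metric uses only the first-row Complete pass count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseRow:
    """Counts for one database in Table 7 (hypothetical container)."""
    name: str
    no_failure: tuple[int, int, int]    # first row: complete, partial, complete+partial
    with_failure: tuple[int, int, int]  # second (italic) row: same three columns
    tautomeric: int                     # "Tautomeric molecules count"
    reported_overall: float             # "Overall InChI recapitulation (%)"
    reported_strict: float              # "Overall strict InChI recapitulation (%)"


def overall_recapitulation(row: DatabaseRow) -> float:
    """Sum of all six counts as a percentage of the tautomeric molecules count.

    Assumes the first and second rows are disjoint, as the reported values imply.
    """
    return 100.0 * (sum(row.no_failure) + sum(row.with_failure)) / row.tautomeric


def strict_recapitulation(row: DatabaseRow) -> float:
    """First-row (failure-free) Complete pass count as a percentage."""
    return 100.0 * row.no_failure[0] / row.tautomeric


def matches_reported(row: DatabaseRow, tol: float = 0.005) -> bool:
    """True if both recomputed percentages agree with the two-decimal table values."""
    return (abs(overall_recapitulation(row) - row.reported_overall) <= tol
            and abs(strict_recapitulation(row) - row.reported_strict) <= tol)


# A few rows transcribed from Table 7 (StdInChI and NonStdInChI sections).
ROWS = [
    DatabaseRow("StdInChI Drugbank", (1042, 100, 375), (965, 1431, 700),
                7427, 62.11, 14.03),
    DatabaseRow("StdInChI ChemNav", (17_418_383, 4_447_222, 1_500_175),
                (33_623_754, 22_336_554, 5_438_461), 105_565_942, 80.30, 16.50),
    DatabaseRow("NonStdInChI PubChem", (15_900_675, 1_826_999, 2_973_696),
                (11_266_876, 15_119_739, 5_328_079), 67_250_941, 77.94, 23.64),
    DatabaseRow("NonStdInChI CSDB", (23_447_796, 4_921_978, 3_883_624),
                (28_119_041, 27_699_416, 12_295_971), 115_679_596, 86.76, 20.27),
]

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.name:22s} overall={overall_recapitulation(row):6.2f}% "
              f"strict={strict_recapitulation(row):6.2f}% "
              f"matches={matches_reported(row)}")
```

Run as a script, it prints the recomputed percentages with `matches=True` for each of the four rows. Extending `ROWS` to the remaining databases is a matter of transcribing the other table entries.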