2022 Dec 23;13(1):29. doi: 10.3390/biom13010029

Table 2.

The data “cleaning” preprocessing workflow.

Step              Action
Organic filter    Inorganic molecules are removed.
Element filter    Molecules containing elements other than C, H, O, N, P, S, Cl, Br, I, and F are removed.
Connectivity      For molecules made up of disconnected fragments, all fragments except the largest are removed.
Standardizer      Directly bonded zwitterions are converted to the neutral representation. The charges on a molecule are set to a standard form.
Aromatizer        The molecules are converted into an aromatic form.
Canonicalize      The molecules are represented in canonical SMILES format.
Duplicate filter  Duplicate molecules are removed.
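
The table does not say which software carried out these steps, so the following is only a minimal sketch of the string-level steps (organic filter, element filter, connectivity, duplicate filter), written against the Python standard library. The names `elements`, `largest_fragment`, and `clean` are hypothetical. Two assumptions are made: "organic" is taken to mean "contains at least one carbon atom", and the "biggest" fragment is taken to be the one with the most non-hydrogen atoms. The standardizer, aromatizer, and canonicalize steps are left out because they need a cheminformatics toolkit; as a result, the duplicate filter here only catches molecules whose SMILES strings are identical.

```python
import re

# Elements permitted by the element filter in Table 2.
ALLOWED = {"C", "H", "O", "N", "P", "S", "Cl", "Br", "I", "F"}

# One atom per match: a bracket atom such as [Na+], [13CH4] or [nH],
# or an unbracketed atom such as Cl, C, or aromatic c.
ATOM_RE = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]"   # bracket atom, optional isotope
    r"|(Cl|Br|[BCNOPSFI]|[bcnops])"            # unbracketed atom
)


def elements(smiles):
    """Return the element symbols of the atoms written in a SMILES string."""
    symbols = []
    for match in ATOM_RE.finditer(smiles):
        symbol = match.group(1) or match.group(2)
        symbols.append(symbol.capitalize())  # aromatic 'c' becomes 'C'
    return symbols


def largest_fragment(smiles):
    """Keep the dot-separated fragment with the most non-hydrogen atoms.

    Assumption: "biggest" means heavy-atom count; ties keep the first fragment.
    """
    fragments = smiles.split(".")
    return max(fragments, key=lambda f: sum(e != "H" for e in elements(f)))


def clean(smiles_list):
    """Apply the filters in table order and return the surviving SMILES."""
    kept, seen = [], set()
    for smiles in smiles_list:
        found = elements(smiles)
        if "C" not in found:                 # organic filter (assumed rule)
            continue
        if not set(found).issubset(ALLOWED): # element filter
            continue
        smiles = largest_fragment(smiles)    # connectivity
        # Standardizer, aromatizer and canonicalize would run here with a
        # cheminformatics toolkit; they are deliberately omitted in this sketch.
        if smiles in seen:                   # duplicate filter (exact strings)
            continue
        seen.add(smiles)
        kept.append(smiles)
    return kept
```

Because the filters run in the order given in the table, a salt with a disallowed counter-ion (for example a sodium salt) is rejected by the element filter before the connectivity step could strip the counter-ion. Whether the original workflow behaves this way is not stated in the table.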