2021 Nov 29;11(12):1783. doi: 10.3390/biom11121783
Algorithm 1 Segmentation of substructures for molecule G=(V,E)

Input: SMILES strings of compounds

Output: Vocabulary of substructures C

  • Get molecule object from SMILES

  • Number the atoms in the compound molecule

  • Initialize: vocabulary of substructures C ← ∅

  • Construct V1 ← the set of bonds E

  • Construct V2 ← the set of simple rings of G

  • for each bond ei in V1 do

  •     if ei does not belong to any ring then

  •         add ei to the vocabulary of substructures C

  •     end if

  • end for

  • for each ring ri in V2 do

  •     for each ring rj in V2 do

  •         inter ← ri ∩ rj

  •         if inter contains at least 3 atoms then

  •            tmp ← merge ri, rj into one unique ring

  •            ri ← tmp

  •            rj ← tmp

  •         end if

  •     end for

  • end for

  • remove duplicate substructures from V2

  • add each substructure in V2 to the vocabulary of substructures C

  • return vocabulary of substructures C
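The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the molecule has already been parsed and its atoms numbered, so the bond set E and the simple rings of G are passed in directly as atom indices, each ring listed in ring order. In practice a cheminformatics toolkit such as RDKit could supply these (for example `Chem.MolFromSmiles` followed by `mol.GetRingInfo().AtomRings()`), but that step is an assumption, is not shown, and that ring perception may not enumerate every simple ring. The function name `segment_substructures` is hypothetical. "Belongs to a ring" is interpreted here as "is an edge of some ring", and the outer repeat-until-stable loop around the ring merging is our own safeguard, not part of Algorithm 1.

```python
from typing import FrozenSet, Iterable, List, Sequence, Tuple


def segment_substructures(
    bonds: Iterable[Tuple[int, int]],
    rings: Iterable[Sequence[int]],
) -> List[FrozenSet[int]]:
    """Split one molecule G=(V,E) into non-ring bonds and (merged) rings.

    bonds: the bond set E as (atom_i, atom_j) index pairs (assumed input).
    rings: the simple rings of G, each as atom indices in ring order.
    Returns the substructures C, each a frozenset of atom indices.
    """
    ring_lists = [list(r) for r in rings]

    # Ring edges: consecutive atoms in each ring, wrapping around to close it.
    ring_bonds = {
        frozenset((r[k], r[(k + 1) % len(r)]))
        for r in ring_lists
        for k in range(len(r))
    }

    vocab: List[FrozenSet[int]] = []

    # Loop over V1: keep every bond that is not an edge of any ring.
    for u, v in bonds:
        bond = frozenset((u, v))
        if bond not in ring_bonds:
            vocab.append(bond)

    # Loop over V2: rings sharing at least 3 atoms are merged into one ring.
    ring_sets = [frozenset(r) for r in ring_lists]
    changed = True
    while changed:  # our safeguard: repeat the double loop until nothing grows
        changed = False
        for i in range(len(ring_sets)):
            for j in range(len(ring_sets)):
                inter = ring_sets[i] & ring_sets[j]
                if len(inter) >= 3:
                    tmp = ring_sets[i] | ring_sets[j]
                    if tmp != ring_sets[i] or tmp != ring_sets[j]:
                        changed = True
                    ring_sets[i] = ring_sets[j] = tmp

    # Drop duplicate rings (first occurrence kept) and add them to C.
    vocab.extend(dict.fromkeys(ring_sets))
    return vocab


if __name__ == "__main__":
    # Norbornane (bicyclo[2.2.1]heptane) with a methyl carbon (atom 7) on atom 1.
    bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 3), (1, 7)]
    rings = [[0, 1, 2, 3, 6], [0, 6, 3, 4, 5]]
    # Expected: the methyl bond {1, 7} plus one merged ring over atoms 0-6.
    print(segment_substructures(bonds, rings))
```

In this example the two five-membered rings share three atoms (0, 3 and 6), so they collapse into a single bridged substructure. By contrast, the two rings of naphthalene share only two atoms, so under the "at least 3 atoms" rule they remain separate entries in C.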