Algorithm 1 Segmentation of substructures for molecule
Input: SMILES strings of compounds
Output: Vocabulary of substructures C
Get molecule object G from SMILES
Number the atoms in the compound molecule
Initialize: vocabulary of substructures C ← ∅
Construct V1 ← the set of bonds of G
Construct V2 ← the set of simple rings of G
for each bond b in V1 do
    if b does not belong to any ring then
        add b to the vocabulary of substructures C
    end if
end for
for each ring r_i in V2 do
    for each ring r_j in V2 do
        if the length of r_i ∩ r_j ≥ 3 then
            r ← merge r_i and r_j to one unique ring
            r_i ← r
            r_j ← r
        end if
    end for
end for
remove duplicate substructures from V2
add each substructure in V2 to the vocabulary of substructures C
return vocabulary of substructures C
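The two loops of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `segment_substructures` and its signature are hypothetical, substructures are represented as sets of atom indices rather than chemical fragments, and the bond and ring lists are assumed to be precomputed for one molecule (a cheminformatics toolkit such as RDKit, via `Chem.MolFromSmiles` and `Chem.GetSymmSSSR`, could supply them). The sketch also assumes that the merged ring replaces both original rings, which is what makes the later duplicate-removal step necessary.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Substructure = FrozenSet[int]  # atom indices that make up one substructure


def segment_substructures(
    bonds: Sequence[Tuple[int, int]],
    rings: Sequence[Sequence[int]],
) -> Set[Substructure]:
    """Collect non-ring bonds and (merged) rings of a single molecule.

    `bonds` lists atom-index pairs; `rings` lists simple rings as atom
    indices in cyclic order. Both are assumed to be precomputed.
    """
    vocabulary: Set[Substructure] = set()

    # Bonds that lie on some ring: consecutive atoms in a ring's cyclic order.
    ring_bonds = {
        frozenset((ring[k], ring[(k + 1) % len(ring)]))
        for ring in rings
        for k in range(len(ring))
    }

    # First loop: every bond outside all rings is its own substructure.
    for a, b in bonds:
        if frozenset((a, b)) not in ring_bonds:
            vocabulary.add(frozenset((a, b)))

    # Second loop: merge ring pairs sharing three or more atoms (bridged rings).
    # Assumption: both rings are replaced by the merged ring.
    ring_sets: List[Set[int]] = [set(ring) for ring in rings]
    for i in range(len(ring_sets)):
        for j in range(len(ring_sets)):
            if i != j and len(ring_sets[i] & ring_sets[j]) >= 3:
                merged = ring_sets[i] | ring_sets[j]
                ring_sets[i] = merged
                ring_sets[j] = merged

    # Storing frozensets in a set removes the duplicates created by merging.
    vocabulary.update(frozenset(ring) for ring in ring_sets)
    return vocabulary


# Hand-numbered norbornane skeleton (atoms 0-6) with one methyl group (atom 7).
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 2), (0, 7)]
rings = [[2, 3, 4, 5, 6], [0, 1, 2, 6, 5]]
result = segment_substructures(bonds, rings)
```

In this example the two five-membered rings share three atoms (2, 5 and 6), so they collapse into one seven-atom substructure, while the bond to the methyl group is kept as a separate entry. Two fused rings that share only a single bond (two atoms) would stay separate. To build a vocabulary shared across many compounds, the index sets would still need to be converted to a canonical fragment representation, which this sketch leaves out.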