Enumeration method for tree-like chemical compounds with benzene rings and naphthalene rings by breadth-first search order

Jira Jindalertudomdee; Morihiro Hayashida; Yang Zhao; Tatsuya Akutsu

doi:10.1186/s12859-016-0962-4

2016 Mar 1;17:113. doi: 10.1186/s12859-016-0962-4

PMCID: PMC4774041 PMID: 26932529

Abstract

Background

Drug discovery and design are important research fields in bioinformatics. Enumeration of chemical compounds is essential not only for this purpose, but also for the analysis of chemical space and for structure elucidation. In our previous study, we developed the enumeration methods BfsSimEnum and BfsMulEnum for tree-like chemical compounds, using a tree structure to represent a chemical compound; this representation limits them to acyclic chemical compounds.

Results

In this paper, we extend these methods and develop BfsBenNaphEnum, which can enumerate tree-like chemical compounds containing benzene rings and naphthalene rings, including benzene isomers and naphthalene isomers such as ortho, meta, and para. It does so by treating a benzene ring as an atom with valence six, instead of a ring of six carbon atoms, and by treating a naphthalene ring as two benzene rings joined by a special bond. We compare our method with MOLGEN 5.0, a well-known general-purpose structure generator, on enumerating chemical structures from a set of chemical formulas, in terms of the number of enumerated structures and the computational time. The results suggest that our proposed method substantially reduces the computational time.

Conclusions

We propose the enumeration method BfsBenNaphEnum for tree-like chemical compounds containing benzene rings and naphthalene rings as cyclic structures. BfsBenNaphEnum was from 50 times to 5,000,000 times faster than MOLGEN 5.0 for instances with 8 to 14 carbon atoms in our experiments.

Keywords: Benzene ring, Naphthalene ring, Enumeration, Breadth-first search

Background

Enumeration of chemical compounds is important in bioinformatics, and has been applied to several tasks such as drug discovery and design [1–3], structure elucidation [4–6], and analyses of chemical spaces [7–13]. It is defined as the problem of generating all non-redundant chemical structures satisfying some constraints. For example, a chemical formula, which consists of the number of each kind of atom included in the compound, is given as an input. There are several algorithms for enumerating chemical compounds from a chemical formula, and most of them use a molecular graph to represent a chemical compound, where the nodes and edges of the graph refer to atoms and bonds of the chemical compound, respectively. Some of those algorithms, such as MOLGEN [14] and Open Molecule Generator (OMG) [15], are claimed to be able to enumerate chemical structures without restriction on the structure. It was reported that OMG is able to deal with different valences for a kind of atom, and was not efficient for several instances compared with MOLGEN. The remaining ones, such as EnuMol [16, 17] as well as BfsSimEnum and BfsMulEnum [18], restrict the structure of enumerated compounds (acyclic compounds for BfsSimEnum and BfsMulEnum, and compounds with no cycle except for benzene rings for EnuMol), but consume significantly less computational time. There are also related software tools, e.g. SmiLib [19] and CLEVER [20], that generate chemical compounds from given fragments. The limitation of these tools is that they require a library of desired chemical fragments, which can be generated by an enumeration tool.

Our previous methods, BfsSimEnum and BfsMulEnum, use a tree structure, instead of a general graph, to represent a chemical compound and call it a molecular tree, so they can generate only tree-like chemical compounds. In this work, we develop BfsBenNaphEnum, which relaxes this limitation by extending the previous methods so that they can enumerate chemical compounds containing only benzene rings and naphthalene rings as cyclic structures, which are six-carbon cyclic structures and ten-carbon bicyclic structures, respectively. Pólya proposed a group-theoretic method for isomer counting of single cyclic structures such as a benzene ring, a naphthalene ring, and an anthracene ring using the cycle index, from which many studies followed [21]. However, structures enumerated by these methods are restricted to certain types. Indeed, Meringer wrote that up to now the only way to calculate the number of isomers belonging to an arbitrary molecular formula is to use structure generators [22]. Suzuki et al. considered the problem of enumerating structures having monocyclic graph structures, each of which has exactly one cycle [23]. An enumeration method for tree-like chemical compounds containing only benzene rings as cyclic structures has been implemented on the EnuMol web server (http://sunflower.kuicr.kyoto-u.ac.jp/tools/enumol/). In contrast, our method can enumerate compounds containing naphthalene rings in addition to benzene rings. Moreover, the proposed algorithm can calculate the number of benzene rings and naphthalene rings from the chemical formula, while users have to specify the number of benzene rings in EnuMol.

Chemical structures considered in this study can be represented by a molecular tree, where a benzene ring is converted to a node with valence six and a naphthalene ring is considered as two benzene nodes joined by a special bond. We call this special bond a merge bond. Since a merge bond merges two carbon atoms of two benzene rings together, it reduces the number of carbon atoms with a free valence electron in each of the two benzene rings by two, so we represent a merge bond by a double edge. Moreover, benzene nodes cannot have double bonds with other nodes because they bond with other non-benzene atoms by a single bond [24]. This means that a double edge represents a double bond if it connects two non-benzene nodes, while it represents a merge bond if it connects two benzene nodes. Therefore, bonds in a benzene ring and a naphthalene ring are considered as the same bond and Kekulé representation is not included in this work. Besides, this work uses a two-dimensional molecular tree to represent a chemical structure, so it cannot deal with stereoisomers. For tautomers, this work treats the two structures in a tautomeric pair as non-redundant compounds and generates both of them.

BfsSimEnum and BfsMulEnum are modified to return a set of molecular trees as the output, given a chemical formula, the number of benzene rings, and the number of naphthalene rings. After that, an attribute called carbon position list is added into benzene nodes in a molecular tree to represent the way that benzene nodes bond with their adjacent nodes. This attribute is important because bonding with different carbon atoms in a benzene ring may result in different chemical structures. Finally, for each molecular tree from BfsSimEnum and BfsMulEnum, we generate a set of molecular trees whose nodes adjacent to benzene nodes are labeled with a carbon position such that all chemical structures are enumerated without redundancy based on normal form rule.

For evaluating our proposed method, we perform computational experiments for several instances, and compare the execution time by our method with that by MOLGEN. We show that our proposed method is efficient for enumerating chemical compounds containing benzene rings and naphthalene rings, and is from 50 times to 5,000,000 times faster than MOLGEN for several instances in our experiments.

Preliminaries

Enumeration problem

Let Σ be a finite set of labels of atoms, for example, Σ={C,N,O,H }, where ‘C’, ‘N’, ‘O’, and ‘H’ denote carbon, nitrogen, oxygen, and hydrogen atoms, respectively. A molecular graph is defined as a multi-graph G(V, E), where V is a set of nodes and E is a set of multi-edges, also denoted by V(G) and E(G), respectively. Each node is labeled with an atom-label in Σ, while each edge represents the bond between two atoms and the multiplicity of edge represents the bond type. The degree of each node is equal to the valence of its atom. Let deg(v) and l(v) be the degree and the label of node v, respectively. Let val(l_i) be the valence of the atom represented by label l_i in Σ. It should be noted that there exist different valences for a kind of atom, for example, carbon atoms of CO₂ and CO. For this case, it is sufficient to put two distinct labels C and C⁽²⁾ in Σ, and to define val(C)=4 and val(C⁽²⁾)=2. Let num(G,l_i) be the total number of nodes labeled with label l_i in molecular graph G. Then, the enumeration problem is defined as follows.

Problem1.

Given the numbers $n_{l_{i}}$ of atoms for all labels l_i∈Σ, the number n_b of benzene rings, and the number n_n of naphthalene rings, enumerate all non-redundant molecular graphs G such that $num ({G,l}_{i}) = n_{l_{i}}$ for all l_i∈Σ, deg(v)=val(l(v)) for all nodes v∈V(G), and G includes exactly n_b benzene rings, n_n naphthalene rings, and no other cyclic structures. It must be noted that n_b and n_n can be zero.

In the case that the input chemical formula contains five or fewer carbon atoms, BfsStructEnum can enumerate only tree-like chemical compounds by specifying the number of benzene rings and the number of naphthalene rings to be zero. Because we enumerate molecular trees such that the degree of each node equals the valence of the atom label of that node, charged molecules cannot be enumerated automatically. However, they can still be enumerated by specifying a charged atom as a new kind of atom type with an appropriate valence value.

Since our enumeration methods deal with a chemical compound as a node-labeled rooted ordered tree for efficient enumeration, we contract cyclic structures appearing in a molecular graph to single nodes. Concretely, we contract a benzene ring to a node, called benzene node, labeled with a special label ‘b’, and contract a naphthalene ring to two benzene nodes connected by a special bond, called merge bond, represented by a double edge (see Fig. 1). Since six carbon atoms contained in a benzene ring are contracted into a benzene node, we need to remember which carbon atom in the benzene ring connects to its adjacent node in a molecular graph. Hence, we add an attribute called carbon position list to each benzene node. Figure 1 b shows examples of carbon position lists using numbers assigned to carbon atoms in benzene rings in Fig. 1 a. We call such a node-labeled rooted ordered tree whose benzene nodes are attributed with carbon position lists a carbon position-assigned molecular tree. We enumerate carbon position-assigned molecular trees instead of molecular graphs.

Fig. 1 — Example of a molecular graph including benzene rings and naphthalene rings. a A molecular graph including one benzene ring and one naphthalene ring. b A rooted tree contracted from the left graph. It is noted that hydrogen atoms are omitted
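The contraction described above can be checked with a little arithmetic. The following is a minimal sketch, not the authors' implementation: the dictionary `VALENCE`, the edge-triple representation, and both function names are assumptions made for illustration. It verifies that every node's degree equals its valence when a benzene node 'b' has valence six, and recovers the atom counts by charging six carbons per benzene node and subtracting two per merge bond.

```python
from collections import Counter

# Assumed valence table; 'b' is the contracted benzene node with valence six.
VALENCE = {"b": 6, "C": 4, "N": 3, "O": 2, "H": 1}

def degrees_match_valences(labels, edges):
    """labels: {node: label}; edges: [(u, v, multiplicity)] of a contracted tree."""
    degree = Counter()
    for u, v, mult in edges:
        degree[u] += mult
        degree[v] += mult
    return all(degree[node] == VALENCE[label] for node, label in labels.items())

def atom_counts(labels, edges):
    """Atom counts of the underlying molecular graph."""
    counts = Counter(labels.values())
    # A double edge between two benzene nodes is a merge bond.
    merge_bonds = sum(1 for u, v, mult in edges
                      if labels[u] == "b" and labels[v] == "b" and mult == 2)
    atoms = {lab: n for lab, n in counts.items() if lab != "b"}
    # Each benzene node contributes six carbons; a merge bond shares two of them.
    atoms["C"] = counts["C"] + 6 * counts["b"] - 2 * merge_bonds
    return atoms

# Naphthalene: two benzene nodes joined by a merge bond, plus eight hydrogens.
labels = {0: "b", 1: "b", **{h: "H" for h in range(2, 10)}}
edges = ([(0, 1, 2)]
         + [(0, h, 1) for h in range(2, 6)]
         + [(1, h, 1) for h in range(6, 10)])

print(degrees_match_valences(labels, edges))  # True
print(atom_counts(labels, edges))             # {'H': 8, 'C': 10}
```

The example reproduces the ten-carbon count of a naphthalene ring stated earlier in the text.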

Center-rooted and left-heavy

In our previous work, we defined the normal form for molecular trees without any cyclic structures using the center-rooted and left-heavy properties to avoid redundant generation. In this work, we also utilize center-rooted and left-heavy for carbon position-assigned molecular trees; these properties do not depend on carbon position lists.

A molecular tree T is called center-rooted if its root is the center node (see Fig. 2 a) or one endpoint of the center edge of the longest path in T (see Fig. 2 b). The center can be either a node or an edge depending on the length of the longest path.
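As an illustration of the center used in this definition, the sketch below finds the center of an unrooted tree by repeatedly stripping leaves, a standard technique that is not necessarily the one used in the paper. The function name `tree_center` and the adjacency-dictionary input are assumptions. One returned node corresponds to a center node, two returned nodes to the endpoints of a center edge.

```python
def tree_center(adj):
    """adj: {node: set of neighbours} of an unrooted tree.

    Returns a sorted list with one node (center node) or two nodes
    (endpoints of the center edge) of the longest path.
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = len(adj)
    leaves = [v for v, d in degree.items() if d <= 1]
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:   # nb has just become a leaf
                    next_leaves.append(nb)
            degree[leaf] = 0
        leaves = next_leaves
    return sorted(leaves)

path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(tree_center(path5))  # [2]     center node
print(tree_center(path4))  # [1, 2]  center edge; the root is one endpoint
```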

In order to define a left-heavy tree, atom-labels must be ordered so that they can be compared with each other, for example, b>C>N>O>H for Σ={b,C,N,O,H}, where ‘b’ denotes a special atom representing a benzene ring. Let T(u) be the ordered subtree rooted at u in T. Let u and v be two nodes in a molecular tree T, and let (u₁,u₂,…,u_h) and (v₁,v₂,…,v_k) be the lists of child nodes of u and v, respectively. It is defined that T(u)>_sT(v) if l(u)>l(v) (Fig. 3 a), or if l(u)=l(v) and there exists an integer i such that T(u_j)=_sT(v_j) for all j<i and either T(u_i)>_sT(v_i) (Fig. 3 b) or i=k+1≤h (Fig. 3 c). If neither T(u)>_sT(v) nor T(v)>_sT(u) holds, it is said that T(u)=_sT(v).

Fig. 3 — Illustration of three molecular trees such that T(u)>_s T(v) or T(u)>_m T(v). a l(u)>l(v). b l(u)=l(v), T(u₁)>_s T(v₁). c l(u)=l(v), T(u₁)=_s T(v₁), h=2>1=k. d T(u)=_s T(v), mul(e₁)>mul(e₁′)
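The structural comparison >_s can be sketched as a recursive three-way comparison. This is an illustrative reading of the definition, assuming that children are compared only when the two labels are equal (as in panels b and c of Fig. 3). The tuple representation `(label, [children])`, the table `RANK`, and the name `cmp_s` are hypothetical.

```python
# Assumed label ranking for b > C > N > O > H.
RANK = {"b": 5, "C": 4, "N": 3, "O": 2, "H": 1}

def cmp_s(tu, tv):
    """Return 1 if T(u) >_s T(v), -1 if T(v) >_s T(u), and 0 if T(u) =_s T(v).

    A tree is a pair (label, [child trees in order]).
    """
    (lu, children_u), (lv, children_v) = tu, tv
    if RANK[lu] != RANK[lv]:
        return 1 if RANK[lu] > RANK[lv] else -1
    # Labels are equal: the first differing child subtree decides.
    for cu, cv in zip(children_u, children_v):
        c = cmp_s(cu, cv)
        if c != 0:
            return c
    # All shared children are equal: the node with more children is larger.
    return (len(children_u) > len(children_v)) - (len(children_u) < len(children_v))

leaf = lambda label: (label, [])
print(cmp_s(leaf("C"), leaf("N")))                                  # 1, as in Fig. 3 a
print(cmp_s(("C", [leaf("C")]), ("C", [leaf("N")])))                # 1, as in Fig. 3 b
print(cmp_s(("C", [leaf("C"), leaf("O")]), ("C", [leaf("C")])))     # 1, as in Fig. 3 c
```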

Let mul(e) and mul(u,v) be the multiplicity of edge e=(u,v). Let (e₁,e₂,…,e_m) and (e₁′,e₂′,…,e_m′) be the lists of edges in T(u) and T(v) in breadth-first search (BFS) order (see Fig. 4), respectively. T(u)>_mT(v) if T(u)>_sT(v), or if T(u)=_sT(v) and there exists an integer i such that mul(e_j)=mul(e_j′) for all j<i, and mul(e_i)>mul(e_i′) (Fig. 3 d). If neither T(u)>_mT(v) nor T(v)>_mT(u) holds, it is said that T(u)=_mT(v).

Fig. 4 — Illustration of breadth-first search (BFS) order. *Numbers* indicate BFS order for this example
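The multiplicity tie-break in >_m can be sketched as follows, assuming the two trees are already equal under =_s so that their BFS edge lists have the same length. The representation `(label, [(multiplicity, child), ...])` and both function names are hypothetical; hydrogen atoms are omitted in the example.

```python
from collections import deque

def bfs_edge_multiplicities(tree):
    """List the edge multiplicities of a rooted ordered tree in BFS order.

    A tree is a pair (label, [(multiplicity, child tree), ...]).
    """
    mults = []
    queue = deque([tree])
    while queue:
        _, children = queue.popleft()
        for mult, child in children:
            mults.append(mult)
            queue.append(child)
    return mults

def cmp_multiplicity(tu, tv):
    """Three-way comparison of two trees that are equal under =_s.

    The first position where the BFS multiplicity lists differ decides.
    """
    mu, mv = bfs_edge_multiplicities(tu), bfs_edge_multiplicities(tv)
    return (mu > mv) - (mu < mv)

# Same shape and labels, different placement of the double bond.
t_u = ("C", [(2, ("C", [(1, ("C", []))])), (1, ("C", []))])
t_v = ("C", [(1, ("C", [(2, ("C", []))])), (1, ("C", []))])
print(bfs_edge_multiplicities(t_u))  # [2, 1, 1]
print(bfs_edge_multiplicities(t_v))  # [1, 1, 2]
print(cmp_multiplicity(t_u, t_v))    # 1
```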

Let child(v)=(v₁,v₂,…) be a list of all child nodes of node v in BFS order. It is defined that a molecular tree T is left-heavy if T(v_i)≥_mT(v_{i+1}) holds for all nodes v in T and all i=1,…,|child(v)|−1.

It should be noted that center-rooted and left-heavy are different from centroid-rooted and left-heavy defined by Fujiwara et al. [16], for example, the molecular tree in Fig. 1 b is center-rooted and is not centroid-rooted because the number of nodes in the left subtree by removing the root, 4, is more than (total number of nodes −1)/2=(7−1)/2=3. In addition, their left-heavy is defined using depth-first search order, not our breadth-first search order.

Carbon position list

Let s=(v₁,v₂,…,v_n) be a list of nodes, |s| and s[ i] denote the size and the i-th element of s, respectively. Let T^sub(v₁,v₂) be the left-heavy tree rooted at v₁ that consists of the connected component including v₁ when the edge (v₁,v₂) is deleted from T (see Fig. 5). T^sub(v₁,v₂)=_mT(v₁) if v₁ is a child of v₂ in T. Let index(v,T) be the order of v∈V(T) by traversing a center-rooted left-heavy molecular tree T with BFS order, which is also denoted by index(v) if T is clear.

Fig. 5 — Illustration of subtree T ^sub(v ₁,v ₂). a A molecular tree T and T ^sub(v ₁,v ₂), which is surrounded by a red rectangle. b T ^sub(v ₂,v ₁)

Proposition1.

For a node v that has the parent node v_p and a child node v_c in a center-rooted molecular tree T, T^sub(v_p,v)≠_mT^sub(v_c,v).

Proof.

The height of T^sub(v_p,v) is larger than that of T^sub(v_c,v) because T is center-rooted. Hence, T^sub(v_p,v) is always different from T^sub(v_c,v).

We define an equality T₁=_CT₂ for two rooted carbon position-assigned trees T₁ and T₂ if T₁=_mT₂ and $C_{v_{1}}^{T_{1}} = C_{v_{2}}^{T_{2}}$ for all benzene nodes v₁∈V(T₁), where v₂∈V(T₂) satisfies index(v₁,T₁)=index(v₂,T₂), and $C_{v}^{T}$ is a list of lists, called a carbon position list and explained later, for a benzene node v in T. For convenience, we define another equality $T_{1} =_{\underline{C}} T_{2}$ by removing the condition that $C_{r_{1}}^{T_{1}} = C_{r_{2}}^{T_{2}}$ for the roots r₁ and r₂ of T₁ and T₂, respectively, from the conditions of T₁=_CT₂, if r₁ and r₂ are benzene nodes.

For a node v having the parent v_p and a child v_c, T^sub(v_p,v)≠_CT^sub(v_c,v) if T^sub(v_p,v)≠_mT^sub(v_c,v). Hence, only carbon position lists of descendent benzene nodes are needed to determine whether or not $T^{sub} (v_{c_{1}}, v) =_{C} T^{sub} (v_{c_{2}}, v)$ for child nodes $v_{c_{1}}$ and $v_{c_{2}}$ of v.

Definition1.

An adjacent node list $A_{v}^{T}$ of a benzene node v in a carbon position-assigned molecular tree T is defined as a list of lists of nodes adjacent to v using carbon position lists of descendent benzene nodes such that

$| A_{v}^{T} [i] | \leq | A_{v}^{T} [i + 1] |$ for all i,
$index (A_{v}^{T} [i] [1]) < index (A_{v}^{T} [i + 1] [1])$ if $| A_{v}^{T} [i] | = | A_{v}^{T} [i + 1] |$ ,
$index (A_{v}^{T} [i] [j]) < index (A_{v}^{T} [i] [j + 1])$ for all i,j,
$A_{v}^{T} [i] = (v^{'})$ if (v, v^′) is a merge bond for some i,
$v^{'} \in A_{v}^{T} [i]$ if (v, v^′) is not a merge bond, and $T^{sub} (v^{'}, v) =_{C} T^{sub} (A_{v}^{T} [i] [1], v)$ .

Figure 6 shows examples of carbon position-assigned molecular trees, where benzene node v₁ in each tree has adjacent nodes v₂,v₃,v₄,v₅. Then, $T_{1}^{sub} (v_{2}, v_{1}) =_{C} T_{1}^{sub} (v_{3}, v_{1}) \neq_{C} T_{1}^{sub} (v_{4}, v_{1}) \neq_{C} T_{1}^{sub} (v_{5}, v_{1})$ and index(v₄)<index(v₅), so we have $A_{v_{1}}^{T_{1}} = ((v_{4}), (v_{5}), (v_{2}, v_{3}))$ . Also for T₂, $A_{v_{1}}^{T_{2}} = ((v_{4}), (v_{5}), (v_{2}, v_{3}))$ . For T₃, $A_{v_{1}}^{T_{3}} = ((v_{2}), (v_{3}), (v_{4}), (v_{5}))$ because (v₂,v₁) is a merge bond. If (v₂,v₁) is not a merge bond and $C_{v_{2}}^{T_{3}} = C_{v_{3}}^{T_{3}}$ , then $A_{v_{1}}^{T_{3}} = ((v_{4}), (v_{5}), (v_{2}, v_{3}))$ .

Fig. 6 — Examples of adjacent node lists and carbon position lists. a T ₁. b T ₂. c T ₃. d Molecular graph of T ₁. e Molecular graph of T ₂. f Molecular graph of T ₃. Red numbers represent carbon positions of node v ₁
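The grouping and ordering rules of Definition 1 can be sketched as below. This is only an illustration: nodes are represented by their BFS indices, and the string `subtree_class` is a hypothetical stand-in for the equivalence class of T^sub(v′,v) under =_C, which a real implementation would have to compute.

```python
def adjacent_node_list(neighbors):
    """Build an adjacent node list in the spirit of Definition 1.

    neighbors: list of (index, subtree_class, is_merge_bond) for the nodes
    adjacent to a benzene node. Nodes with the same subtree_class share a
    sublist; a merge-bond neighbour always forms its own sublist.
    """
    groups = {}
    for index, subtree_class, is_merge in neighbors:
        key = ("merge", index) if is_merge else ("class", subtree_class)
        groups.setdefault(key, []).append(index)
    sublists = [tuple(sorted(g)) for g in groups.values()]
    # Shorter sublists first; ties broken by the index of the first node.
    return tuple(sorted(sublists, key=lambda g: (len(g), g[0])))

# T1 of Fig. 6: v2 and v3 carry equal subtrees, v4 and v5 are distinct.
t1 = [(2, "X", False), (3, "X", False), (4, "Y", False), (5, "Z", False)]
# T3 of Fig. 6: (v2, v1) is a merge bond, so v2 stays alone.
t3 = [(2, "X", True), (3, "X", False), (4, "Y", False), (5, "Z", False)]
print(adjacent_node_list(t1))  # ((4,), (5,), (2, 3))
print(adjacent_node_list(t3))  # ((2,), (3,), (4,), (5,))
```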

Proposition2.

For a benzene node v that has the parent node v_p in a center-rooted molecular tree T, $A_{v}^{T} [1] = (v_{p})$ .

Proof.

If v has no child, it is clear because the adjacent node of v is only v_p. We assume that v has a child v_c. From Proposition 1 and index(v_p)<index(v_c), $A_{v}^{T} [1] = (v_{p})$ always holds.

A carbon position list $C_{v}^{T}$ of a benzene node v in T is a list of lists, where $C_{v}^{T} [i]$ is a list of carbon positions of the nodes in $A_{v}^{T} [i]$ . It is sufficient to enumerate $C_{v}^{T} [i]$ in ascending order because each node in $A_{v}^{T} [i]$ has the same subtree. If $(A_{v}^{T} [i] [1], v)$ is a merge bond, $C_{v}^{T} [i]$ has two carbon positions instead of one as usual. It should be noted that $C_{v}^{T} [i] \subseteq \{1, \dots, 6\}$ and two carbon positions are assigned for a merge bond because a naphthalene ring shares two carbon atoms between two benzene rings. In the examples of Fig. 6, $C_{v_{1}}^{T_{1}} = ((3), (4), (1, 2))$ for $A_{v_{1}}^{T_{1}} = ((v_{4}), (v_{5}), (v_{2}, v_{3}))$ , $C_{v_{1}}^{T_{2}} = ((1), (4), (2, 3))$ for $A_{v_{1}}^{T_{2}} = ((v_{4}), (v_{5}), (v_{2}, v_{3}))$ , $C_{v_{1}}^{T_{3}} = ((1, 2), (3), (5), (4))$ for $A_{v_{1}}^{T_{3}} = ((v_{2}), (v_{3}), (v_{4}), (v_{5}))$ .

Definition2.

An adjacent node list $A_{(v_{1}, v_{2})}^{T}$ for a naphthalene ring with two benzene nodes v₁, v₂, where (v₁,v₂) is a merge bond, is defined as a list of lists of nodes adjacent to v₁ or v₂ except v₁ and v₂ such that

$| A_{(v_{1}, v_{2})}^{T} [i] | \leq | A_{(v_{1}, v_{2})}^{T} [i + 1] |$ for all i,
$index (A_{(v_{1}, v_{2})}^{T} [i] [1]) < index (A_{(v_{1}, v_{2})}^{T} [i + 1] [1])$ if $| A_{(v_{1}, v_{2})}^{T} [i] | = | A_{(v_{1}, v_{2})}^{T} [i + 1] |$ ,
$index (A_{(v_{1}, v_{2})}^{T} [i] [j]) < index (A_{(v_{1}, v_{2})}^{T} [i] [j + 1])$ for all i,j,
$v^{'} \in A_{(v_{1}, v_{2})}^{T} [i]$ if $T^{sub} (v^{'}, bn (v^{'})) =_{C} T^{sub} (A_{(v_{1}, v_{2})}^{T} [i] [1], bn (A_{(v_{1}, v_{2})}^{T} [i] [1]))$ , where bn(v) is v₁ or v₂ that is adjacent to v.

For a benzene node v₂ that is connected by a merge bond with the parent node v₁, we suppose that the carbon atoms having positions 1,2 in v₂ are connected with the carbon atoms having positions $\overline{x + 1}, \bar{x}$ in v₁, respectively, where x takes an integer between 1 and 6, and $\bar{x} = (x \bmod 6) + 1$ (see Fig. 7 a). Here, consider the case that v₁ has the parent node v_p. If T is in normal form (Definition 6), position 1 is assigned to the carbon atom connected with v_p (Proposition 5). Then, from Proposition 1, T^sub(v_p,v₁)≠_CT^sub(v_c,v₂) for any child node v_c of v₂, T^sub(v_p,v₁)≠_CT^sub(v_c,v₁) for any child node v_c of v₁ except v₂, and the naphthalene ring is not symmetric. Consider the case that v₁ does not have a parent node, that is, v₁ is the root. If $T^{sub} (v_{1}, v_{2}) \neq_{\underline{C}} T^{sub} (v_{2}, v_{1})$ , the naphthalene ring can be symmetric only with respect to the axis denoted by the dashed red line in Fig. 7 a. Then, it is not necessary to consider the other symmetry for the naphthalene ring.

Fig. 7 — Correspondence between carbon positions in a naphthalene ring. a Correspondence between carbon positions involved with a merge bond in two benzene rings. b Correspondence between carbon positions of a naphthalene ring and two benzene rings in the case of $T^{sub} (v_{1}, v_{2}) =_{\underline{C}} T^{sub} (v_{2}, v_{1})$ . The upper benzene ring v₁ is the parent of the lower benzene ring v₂. $\bar{x}$ denotes $(x \bmod 6) + 1$. Blue, red, and green numbers are positions of $C_{v_{1}}^{T}$ , $C_{v_{2}}^{T}$ , and $C_{(v_{1}, v_{2})}^{T}$ , respectively. The dashed red line denotes the symmetric axis of ϕ_ref

Consider the case that $T^{sub} (v_{1}, v_{2}) =_{\underline{C}} T^{sub} (v_{2}, v_{1})$ . We can prove that x=1 if T is in normal form (see Proposition 4). Then, a carbon position list $C_{(v_{1}, v_{2})}^{T}$ of a naphthalene ring consisting of two benzene nodes v₁, v₂ is a list of lists determined from $C_{v_{1}}^{T}$ and $C_{v_{2}}^{T}$ according to the following rule, where $C_{(v_{1}, v_{2})}^{T} [i]$ is a list of carbon positions of nodes in $A_{(v_{1}, v_{2})}^{T} [i]$ in ascending order.

Definition3.

Carbon positions in a naphthalene ring correspond to carbon positions in two benzene nodes v₁,v₂, where v₁ is the parent node of v₂, if $T^{sub} (v_{1}, v_{2}) =_{\underline{C}} T^{sub} (v_{2}, v_{1})$ , as follows (see Fig. 7 b).

For the benzene ring of v₁, positions 1,2 are assigned to carbons of the merge bond in $C_{v_{1}}^{T}$ . Position i (i=3,…,6) in $C_{v_{1}}^{T}$ corresponds to i−2 in $C_{(v_{1}, v_{2})}^{T}$ .
For the benzene ring of v₂, positions 1,2 are assigned to carbons of the merge bond in $C_{v_{2}}^{T}$ . Position i (i=3,…,6) in $C_{v_{2}}^{T}$ corresponds to i+2 in $C_{(v_{1}, v_{2})}^{T}$ .

Figure 8 shows examples of carbon position lists for a naphthalene ring, where $T_{4}^{'}$ is T₄ with $C_{v_{1}}^{T_{4}^{'}} = ((1, 2), (4), (3))$ and $C_{v_{2}}^{T_{4}^{'}} = ((1, 2), (4), (5))$ , $T_{4}^{′′}$ is T₄ with $C_{v_{1}}^{T_{4}^{′′}} = ((1, 2), (4), (5))$ and $C_{v_{2}}^{T_{4}^{′′}} = ((1, 2), (4), (3))$ . Then, $A_{(v_{1}, v_{2})}^{T_{4}^{'}} = A_{(v_{1}, v_{2})}^{T_{4}^{′′}} = ((v_{3}, v_{5}), (v_{4}, v_{6}))$ , $C_{(v_{1}, v_{2})}^{T_{4}^{'}} = ((2, 6), (1, 7))$ , and $C_{(v_{1}, v_{2})}^{T_{4}^{′′}} = ((2, 6), (3, 5))$ .

Fig. 8 — Example of carbon position lists for a naphthalene ring. a T ₄. b Molecular graph of $T_{4}^{'}$ , which is T ₄ with $C_{v_{1}}^{T_{4}^{'}} = ((1, 2), (4), (3))$ , $C_{v_{2}}^{T_{4}^{'}} = ((1, 2), (4), (5))$ . c Molecular graph of $T_{4}^{′′}$ , which is T ₄ with $C_{v_{1}}^{T_{4}^{′′}} = ((1, 2), (4), (5))$ , and $C_{v_{2}}^{T_{4}^{′′}} = ((1, 2), (4), (3))$
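The correspondence of Definition 3 can be replayed on the Fig. 8 example. The sketch below is illustrative only: the function names and the dictionary inputs are hypothetical, and the assignment of each child (v₃ to v₆) to a ring position is inferred from the order of the adjacent node lists rather than stated explicitly in the text.

```python
def naphthalene_positions(ring_v1, ring_v2):
    """Apply Definition 3 to per-ring positions 3..6 (merge carbons excluded).

    ring_v1, ring_v2: {child name: carbon position in that benzene ring}.
    Position i of v1 maps to i - 2, position i of v2 maps to i + 2.
    """
    positions = {child: p - 2 for child, p in ring_v1.items()}
    positions.update({child: p + 2 for child, p in ring_v2.items()})
    return positions

def naphthalene_position_list(adjacent, positions):
    """Carbon position list of the naphthalene ring, sublists in ascending order."""
    return tuple(tuple(sorted(positions[v] for v in group)) for group in adjacent)

adjacent = (("v3", "v5"), ("v4", "v6"))

# T4': C_v1 = ((1,2),(4),(3)) and C_v2 = ((1,2),(4),(5)).
pos_a = naphthalene_positions({"v3": 4, "v4": 3}, {"v5": 4, "v6": 5})
# T4'': C_v1 = ((1,2),(4),(5)) and C_v2 = ((1,2),(4),(3)).
pos_b = naphthalene_positions({"v3": 4, "v4": 5}, {"v5": 4, "v6": 3})

print(naphthalene_position_list(adjacent, pos_a))  # ((2, 6), (1, 7))
print(naphthalene_position_list(adjacent, pos_b))  # ((2, 6), (3, 5))
```

Both results agree with the lists given for $T_{4}^{'}$ and $T_{4}^{′′}$ in the text.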

Definition4.

For carbon position lists $C_{v}^{T_{1}}$ , $C_{v}^{T_{2}}$ , where $A_{v}^{T_{1}} = A_{v}^{T_{2}}$ , it is defined that $C_{v}^{T_{1}} < C_{v}^{T_{2}}$ if there exist two integers i and j such that

$C_{v}^{T_{1}} [i^{'}] [j^{'}] = C_{v}^{T_{2}} [i^{'}] [j^{'}]$ for all i^′<i and all $j^{'} = 1, \dots, | C_{v}^{T_{1}} [i^{'}] |$ ,
$C_{v}^{T_{1}} [i] [j^{'}] = C_{v}^{T_{2}} [i] [j^{'}]$ for all j^′<j,
$C_{v}^{T_{1}} [i] [j] < C_{v}^{T_{2}} [i] [j]$ .

This definition is applied to comparison of $C_{(v_{1}, v_{2})}^{T_{1}}$ and $C_{(v_{1}, v_{2})}^{T_{2}}$ for a naphthalene ring with v₁ and v₂ in the same way.

In the example of Fig. 6, T₁ and T₂ have the same tree structure, and $C_{v_{1}}^{T_{2}} = ((1), (4), (2, 3)) < ((3), (4), (1, 2)) = C_{v_{1}}^{T_{1}}$ because $C_{v_{1}}^{T_{2}} [1] [1] = 1 < 3 = C_{v_{1}}^{T_{1}} [1] [1]$ .
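Definition 4 amounts to a lexicographic comparison over the positions read sublist by sublist. A minimal sketch, assuming both lists have the same shape (which holds when the adjacent node lists coincide); the function name `less_than` is hypothetical.

```python
def less_than(c1, c2):
    """Definition 4: the first differing carbon position decides.

    c1, c2: carbon position lists (tuples of tuples) of identical shape.
    """
    flat1 = [p for sublist in c1 for p in sublist]
    flat2 = [p for sublist in c2 for p in sublist]
    for a, b in zip(flat1, flat2):
        if a != b:
            return a < b
    return False  # the lists are equal

c_t1 = ((3,), (4,), (1, 2))
c_t2 = ((1,), (4,), (2, 3))
print(less_than(c_t2, c_t1))  # True, since 1 < 3 at the first position
print(less_than(c_t1, c_t2))  # False
```

For lists of identical shape, Python's built-in tuple comparison (`c_t2 < c_t1`) gives the same answer.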

Let Aut_b and Aut_n be the automorphism groups of a benzene ring and a naphthalene ring, respectively (see Fig. 9). Aut_b is generated from rotation of π/3 radians and reflection. For ϕ_b∈Aut_b, v₁ is adjacent to v₂ in a benzene ring if and only if ϕ_b(v₁) is adjacent to ϕ_b(v₂) in a benzene ring. Aut_n is generated from rotation of π radians and reflection. We suppose that a list $ϕ (C_{v}^{T} [i])$ of carbon positions for a map ϕ and $i = 1, \dots, | C_{v}^{T} |$ is in ascending order by sorting elements of the list because all nodes in $A_{v}^{T} [i]$ have the same subtree. For example, $ϕ_{b} (C_{v_{1}}^{T_{1}}) = ((6), (5), (1, 2))$ for $C_{v_{1}}^{T_{1}} = ((3), (4), (1, 2))$ and the reflection map ϕ_b by the perpendicular bisector between carbon atoms of 1 and 2.
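The action of Aut_b on a carbon position list can be sketched by listing the twelve symmetries of a hexagon as permutations of the positions 1 to 6. The function names below are hypothetical, and the sketch assumes positions are numbered consecutively around the ring.

```python
def benzene_automorphisms():
    """The 12 symmetries of a hexagon as dictionaries on positions 1..6."""
    maps = []
    for k in range(6):
        # Rotation by k * (pi / 3).
        maps.append({p: (p - 1 + k) % 6 + 1 for p in range(1, 7)})
        # Reflection composed with that rotation.
        maps.append({p: (k - (p - 1)) % 6 + 1 for p in range(1, 7)})
    return maps

def apply_map(phi, c):
    """Apply phi to a carbon position list and sort each sublist."""
    return tuple(tuple(sorted(phi[p] for p in sublist)) for sublist in c)

# Reflection by the perpendicular bisector between carbons 1 and 2:
# it swaps 1 with 2, 3 with 6, and 4 with 5.
reflect_12 = {1: 2, 2: 1, 3: 6, 4: 5, 5: 4, 6: 3}
print(reflect_12 in benzene_automorphisms())         # True
print(apply_map(reflect_12, ((3,), (4,), (1, 2))))   # ((6,), (5,), (1, 2))
```

The last line reproduces the example ϕ_b image ((6),(5),(1,2)) given in the text.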

Normal form of a carbon position-assigned molecular tree

In order to prevent generating redundant molecular trees in enumeration, we define a normal form of a carbon position-assigned molecular tree.

Definition5.

Let P be a path in T consisting of n nodes (v₁,v₂,…,v_n)(n≥2). P is called a symmetric path if the following conditions are satisfied.

$T^{sub} (v_{⌊ \frac{n}{2} ⌋}, v_{⌊ \frac{n}{2} ⌋ + 1}) =_{m} T^{sub} (v_{n - ⌊ \frac{n}{2} ⌋ + 1}, v_{n - ⌊ \frac{n}{2} ⌋})$ ,
$index (v_{i}, T^{sub} (v_{⌊ \frac{n}{2} ⌋}, v_{⌊ \frac{n}{2} ⌋ + 1})) = index (v_{n - i + 1}, T^{sub} (v_{n - ⌊ \frac{n}{2} ⌋ + 1}, v_{n - ⌊ \frac{n}{2} ⌋}))$ for all $i = 1, \dots, ⌊ \frac{n}{2} ⌋$ , where ⌊x⌋ is the largest integer less than or equal to x,
$C_{v}^{T} = C_{v^{'}}^{T}$ for all benzene nodes $v \in V (T^{sub} (v_{⌊ \frac{n}{2} ⌋}, v_{⌊ \frac{n}{2} ⌋ + 1})) ∖ V (T^{sub} (v_{1}, v_{2}))$ , where $v^{'} \in V (T^{sub} (v_{n - ⌊ \frac{n}{2} ⌋ + 1}, v_{n - ⌊ \frac{n}{2} ⌋}))$ satisfies $index (v^{'}, T^{sub} (v_{n - ⌊ \frac{n}{2} ⌋ + 1}, v_{n - ⌊ \frac{n}{2} ⌋})) = index (v, T^{sub} (v_{⌊ \frac{n}{2} ⌋}, v_{⌊ \frac{n}{2} ⌋ + 1}))$ , and v∈V₁∖V₂ means that v∈V₁ and v∉V₂.

Proposition3.

For a center-rooted molecular tree, either of $v_{\frac{n}{2}}$ and $v_{\frac{n}{2} + 1}$ is the root if the length of a symmetric path (v₁,⋯,v_n) is even. Otherwise, the depth of $v_{\frac{n + 1}{2}}$ is less than that of any node in the path.

Proof.

For a path (v₁,⋯,v_n), v_{i+1} and v_{n−i} must be the parent nodes of v_i and v_{n−i+1}, respectively, for $i = 1, \dots, \frac{n - 1}{2}$ if n is odd and for $i = 1, \dots, \frac{n}{2} - 1$ if n is even due to the center-rooted property. Therefore, if the length of the path is odd, $v_{\frac{n + 1}{2}}$ is the parent node of both $v_{\frac{n + 1}{2} - 1}$ and $v_{\frac{n + 1}{2} + 1}$ , which means that the depth of $v_{\frac{n + 1}{2}}$ is less than that of any node in the path.

In the case that n is even, either $v_{\frac{n}{2}}$ or $v_{\frac{n}{2} + 1}$ has the least depth among all nodes in the path and the other node is the child node of that node. Assume that between these two nodes the parent node is v_a and the child node is v_b. v_a cannot have a parent node because the height of T^sub(v_p,v_a), where v_p is the parent node of v_a, cannot be equal to the height of T^sub(v_c,v_b) for any node v_c that is adjacent to v_b due to the center-rooted condition, which means that T^sub(v_a,v_b)=_mT^sub(v_b,v_a) cannot hold and the first condition of a symmetric path is violated. In other words, v_a, which is either $v_{\frac{n}{2}}$ or $v_{\frac{n}{2} + 1}$ , is the root node of the tree if n is even.

We say that v₁ is left of v_n for a symmetric path (v₁,…,v_n) when $v_{n - ⌊ \frac{n}{2} ⌋ + 1}$ is the root, or index(v₁)<index(v_n).

Figure 10 shows examples of symmetric paths, (v₂,v₁,v₃) in T₅ and (v₅,v₂,v₁,v₃) in T₆, where $T_{5}^{sub} (v_{2}, v_{1}) =_{m} T_{5}^{sub} (v_{3}, v_{1})$ , $T_{6}^{sub} (v_{2}, v_{1}) =_{m} T_{6}^{sub} (v_{1}, v_{2})$ , and $C_{v_{4}}^{T_{6}} = C_{v_{6}}^{T_{6}}$ .

Fig. 10 — Examples of symmetric paths. The *red lines* denote symmetric paths. a T₅, where (v₂,v₁,v₃) is a symmetric path, and $T_{5}^{sub} (v_{2}, v_{1}) =_{m} T_{5}^{sub} (v_{3}, v_{1})$ . b T₆, where (v₅,v₂,v₁,v₃) is a symmetric path, $T_{6}^{sub} (v_{2}, v_{1}) =_{m} T_{6}^{sub} (v_{1}, v_{2})$ and $C_{v_{4}}^{T_{6}} = C_{v_{6}}^{T_{6}}$

We define an inequality T₁>_CT₂ for carbon position-assigned molecular trees T₁ and T₂ if T₁>_mT₂, or T₁=_mT₂, and there exists an integer i such that v_i is a benzene node, $C_{v_{i}}^{T_{1}} > C_{v_{i}^{'}}^{T_{2}}$ , and $C_{v_{j}}^{T_{1}} = C_{v_{j}^{'}}^{T_{2}}$ for all benzene nodes v_j with j>i, where index(v_k,T₁)=index(v_k′,T₂) for all k=1,…,|V(T₁)|.

Definition6.

Let ϕ_ref be the reflection map with the symmetric axis shown in Fig. 7 a. A carbon position-assigned molecular tree T that contains a carbon position list $C_{v}^{T}$ for each benzene node v is in normal form if the following conditions are satisfied.

T is center-rooted and left-heavy.
T(v)≥_mT^sub(r,v) if the center of the longest path in T with the root r is the edge (r,v).
Positions in each sublist of $C_{v}^{T}$ for each benzene node v are in ascending order.
$C_{v}^{T} \leq ϕ_{b} (C_{v}^{T})$ for all benzene nodes v that are not connected by a merge bond with the parent node and all ϕ_b∈Aut_b.
For benzene nodes v₁,v₂ connected by a merge bond such that v₁ is the root of T,
1. $C_{(v_{1}, v_{2})}^{T} \leq ϕ_{n} (C_{(v_{1}, v_{2})}^{T})$ for all ϕ_n∈Aut_n if $T^{sub} (v_{1}, v_{2}) =_{\underline{C}} T^{sub} (v_{2}, v_{1})$ , where $C_{(v_{1}, v_{2})}^{T}$ is related to $C_{v_{1}}^{T}$ and $C_{v_{2}}^{T}$ by Definition 3.
2. $C_{v_{2}}^{T} \leq ϕ_{ref} (C_{v_{2}}^{T})$ if $T^{sub} (v_{1}, v_{2}) \neq_{\underline{C}} T^{sub} (v_{2}, v_{1})$ and $C_{v_{1}}^{T} = ϕ_{ref} (C_{v_{1}}^{T})$ .
T^sub(v₁,v₂)≥_CT^sub(v_n,v_{n−1}) for all pairs v₁,v_n of nodes such that the path (v₁,…,v_n) is a symmetric path, v₁ and v_n(=v₂) are not connected by a merge bond, and v₁ is left of v_n.

We call a tree in normal form a normal tree.
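Condition 4 of Definition 6 can be sketched as a minimality test over all twelve benzene symmetries. This is a brute-force illustration under the assumption that positions are numbered consecutively around the ring; the function names are hypothetical. Python's tuple comparison is used for ≤, which matches Definition 4 because a symmetry preserves the shape of the list.

```python
def benzene_automorphisms():
    """The 12 symmetries of a hexagon as dictionaries on positions 1..6."""
    maps = []
    for k in range(6):
        maps.append({p: (p - 1 + k) % 6 + 1 for p in range(1, 7)})    # rotations
        maps.append({p: (k - (p - 1)) % 6 + 1 for p in range(1, 7)})  # reflections
    return maps

def apply_map(phi, c):
    """Apply phi to a carbon position list and sort each sublist."""
    return tuple(tuple(sorted(phi[p] for p in sublist)) for sublist in c)

def satisfies_condition_4(c):
    """True if c <= phi(c) for every symmetry phi of the benzene ring."""
    return all(c <= apply_map(phi, c) for phi in benzene_automorphisms())

print(satisfies_condition_4(((1, 2), (4,), (3,))))  # True
print(satisfies_condition_4(((1, 2), (4,), (5,))))  # True
print(satisfies_condition_4(((3,), (4,), (1, 2))))  # False
```

The first two lists are the root lists of the Fig. 8 trees; the third is the list of T₁ in Fig. 6, which a rotation maps to a smaller list starting with position 1.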

Figure 8 also shows molecular trees in normal form and not in normal form. For condition 4 of the definition, $C_{v_{1}}^{T_{4}^{'}} = ((1, 2), (4), (3)) \leq ϕ_{b} (C_{v_{1}}^{T_{4}^{'}})$ , $C_{v_{1}}^{T_{4}^{′′}} = ((1, 2), (4), (5)) \leq ϕ_{b} (C_{v_{1}}^{T_{4}^{′′}})$ . $T_{4}^{'}$ and $T_{4}^{′′}$ satisfy conditions 1, 2, 3, and 4. For condition 5, $C_{(v_{1}, v_{2})}^{T_{4}^{'}} = ((2, 6), (1, 7)) \leq ϕ_{n} (C_{(v_{1}, v_{2})}^{T_{4}^{'}})$ , whereas $C_{(v_{1}, v_{2})}^{T_{4}^{′′}} = ((2, 6), (3, 5)) > ((2, 6), (1, 7)) = ϕ_{rot} (C_{(v_{1}, v_{2})}^{T_{4}^{′′}})$ for rotation ϕ_rot of π radians, and $T_{4}^{′′}$ violates the condition. It is noted that $T_{4}^{′′}$ is rotated by π radians from $T_{4}^{'}$ . For condition 6, v₁ and v₂ are connected by a merge bond. Thus, $T_{4}^{'}$ is a normal tree, and $T_{4}^{′′}$ is not a normal tree.
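The check of condition 5 on the Fig. 8 example can be replayed as follows. The sketch relies on an assumption not spelled out in the text: the eight substitutable positions of the naphthalene ring are numbered consecutively around the perimeter, positions 1 to 4 on v₁ and 5 to 8 on v₂, so that rotation by π sends position p to p+4 (wrapping around). This numbering is consistent with the rotation example above, but the two reflections are derived from it rather than taken from the paper. All names are hypothetical.

```python
def naphthalene_automorphisms():
    """The four symmetries of a naphthalene ring on positions 1..8 (assumed numbering)."""
    identity = {p: p for p in range(1, 9)}
    rotation = {p: (p + 3) % 8 + 1 for p in range(1, 9)}    # rotation by pi
    swap_rings = {p: 9 - p for p in range(1, 9)}            # reflection exchanging the rings
    other = {p: rotation[swap_rings[p]] for p in range(1, 9)}  # the remaining reflection
    return [identity, rotation, swap_rings, other]

def apply_map(phi, c):
    """Apply phi to a carbon position list and sort each sublist."""
    return tuple(tuple(sorted(phi[p] for p in sublist)) for sublist in c)

def satisfies_condition_5(c):
    """True if c <= phi(c) for every symmetry phi of the naphthalene ring."""
    return all(c <= apply_map(phi, c) for phi in naphthalene_automorphisms())

rotation = naphthalene_automorphisms()[1]
print(apply_map(rotation, ((2, 6), (3, 5))))     # ((2, 6), (1, 7))
print(satisfies_condition_5(((2, 6), (1, 7))))   # True
print(satisfies_condition_5(((2, 6), (3, 5))))   # False
```

Under these assumptions the list of $T_{4}^{'}$ passes and that of $T_{4}^{′′}$ fails, in line with the discussion above.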

Proposition4.

For a normal tree T with a benzene node v₁ that is connected by a merge bond with its child node v₂ and satisfies $T^{sub} (v_{1}, v_{2}) =_{\underline{C}} T^{sub} (v_{2}, v_{1})$ , positions 1,2 are assigned to the merge bond in the benzene ring of v₁. Furthermore, if $C_{(v_{1}, v_{2})}^{T} \leq ϕ_{n} (C_{(v_{1}, v_{2})}^{T})$ for all ϕ_n∈Aut_n, then $C_{v_{1}}^{T} \leq ϕ_{b} (C_{v_{1}}^{T})$ for all ϕ_b∈Aut_b.

Proof.

We assume that there exists a node v_l as a left sibling of v₂, and v_l is the leftmost child of v₁. Since T is left-heavy, T(v_l)≥_mT(v₂), and l(v_l)=l(v₂)=‘b’ is needed. However, T(v_l)=_CT(v_c), where v_c is the leftmost child of v₂, because $T^{sub} (v_{1}, v_{2}) =_{\underline{C}} T^{sub} (v_{2}, v_{1}) =_{C} T (v_{2})$ . Hence, T(v_l)<_mT(v₂). This contradicts the assumption, so v₂ is the leftmost child of v₁. Therefore, $A_{v_{1}}^{T} [1] = (v_{2})$ . From condition 4 of Definition 6, $C_{v_{1}}^{T} [1] = (1, 2)$ , and positions 1,2 are assigned to the merge bond, that is, x=1 in Fig. 7 a.

For a map ϕ_b∈Aut_b other than the identity and reflection map ϕ_ref for a benzene ring, $C_{v_{1}}^{T} < ϕ_{b} (C_{v_{1}}^{T})$ because each of ϕ_b(1) and ϕ_b(2) is at least 2. From $C_{(v_{1}, v_{2})}^{T} \leq ϕ_{ref} (C_{(v_{1}, v_{2})}^{T})$ and the correspondence between $C_{v_{1}}^{T}$ and $C_{(v_{1}, v_{2})}^{T}$ , $C_{v_{1}}^{T} \leq ϕ_{ref} (C_{v_{1}}^{T})$ . Therefore, $C_{v_{1}}^{T} \leq ϕ_{b} (C_{v_{1}}^{T})$ for all ϕ_b∈Aut_b.

Proposition5.

For a benzene node v of a normal tree T, $C_{v}^{T} [1] [1]$ is always equal to 1.

Proof.

If v is not connected by a merge bond with the parent node, from condition 4, $C_{v}^{T}$ must be the least possible carbon position list. Hence, $C_{v}^{T} [1] [1] = 1$ . Otherwise, from Definition 3, $C_{v}^{T} [1] [1] = 1$ .

Lemma 1.

Given a molecular graph G without cyclic structures except benzene rings and naphthalene rings, G can be represented by a normal tree.

Proof.

We can assign numbers to carbons in benzene rings and naphthalene rings of G such that the conditions of Definition 6 are satisfied.

Lemma 2.

Given two different molecular graphs G₁ and G₂, they cannot be represented by the same normal tree.

Proof.

We can unambiguously obtain a molecular graph from a normal tree by replacing all benzene nodes with benzene rings according to its carbon position lists.
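The replacement step used in this proof can be sketched in Python. The function name `expand_benzene_node`, the flattened argument layout, and the atom naming scheme `v.C1` to `v.C6` are hypothetical choices made for illustration and are not taken from the authors' implementation. Merge bonds (naphthalene rings) are not handled in this minimal sketch.

```python
def expand_benzene_node(node_id, neighbours, positions):
    """Return the edges of a six-carbon ring that replaces one benzene node.

    neighbours: ids of the adjacent nodes, flattened in adjacent-node-list order.
    positions:  carbon positions (1..6) for those nodes, flattened in the same order.
    """
    if len(neighbours) != len(positions) or len(set(positions)) != len(positions):
        raise ValueError("each neighbour needs its own distinct carbon position")
    # Hypothetical atom names for the six ring carbons.
    ring = [f"{node_id}.C{p}" for p in range(1, 7)]
    # Ring bonds C1-C2, ..., C6-C1 (bond orders are omitted in this sketch).
    edges = [(ring[i], ring[(i + 1) % 6]) for i in range(6)]
    # Each neighbour is attached to the ring carbon named by its position.
    for neighbour, pos in zip(neighbours, positions):
        edges.append((ring[pos - 1], neighbour))
    return edges
```

For instance, `expand_benzene_node("v1", ["v4", "v5", "v2", "v3"], [1, 2, 3, 4])` yields six ring edges plus four substituent edges. Because the carbon position list fixes every attachment point, the resulting graph is determined without ambiguity.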

Proposition 6.

For a normal tree T with a path (v₁,…,v_n), let T^′ be the tree obtained from T by removing $T^{sub} (v_{1}, v_{2})$ and $T^{sub} (v_{n}, v_{n - 1})$ except v₁ and v_n, where v₁ is left of v_n, and let G^′ be the molecular graph obtained from T^′. If there is a non-identity map ϕ of the automorphism group of G^′ satisfying $ϕ (v_{i}) = v_{n - i + 1}$ for all i=1,…,n, then $T^{sub} (v_{1}, v_{2}) \geq_{C} T^{sub} (v_{n}, v_{n - 1})$ , where ϕ in G^′ is naturally extended to T.

Proof.

If $T^{sub} (v_{\lfloor \frac{n}{2} \rfloor}, v_{\lfloor \frac{n}{2} \rfloor + 1}) >_{m} T^{sub} (v_{n - \lfloor \frac{n}{2} \rfloor + 1}, v_{n - \lfloor \frac{n}{2} \rfloor})$ , then $T^{sub} (v_{1}, v_{2}) >_{m} T^{sub} (v_{n}, v_{n - 1})$ , and $T^{sub} (v_{1}, v_{2}) >_{C} T^{sub} (v_{n}, v_{n - 1})$ . We assume $T^{sub} (v_{\lfloor \frac{n}{2} \rfloor}, v_{\lfloor \frac{n}{2} \rfloor + 1}) =_{m} T^{sub} (v_{n - \lfloor \frac{n}{2} \rfloor + 1}, v_{n - \lfloor \frac{n}{2} \rfloor})$ . If the path (v₁,…,v_n) is a symmetric path, $T^{sub} (v_{1}, v_{2}) \geq_{C} T^{sub} (v_{n}, v_{n - 1})$ from condition 6. We assume that $(v_{i + 1}, \dots, v_{n - i})$ is a symmetric path for some i, and $index (v_{i}, T^{sub} (v_{i + 1}, v_{i + 2})) > index (v_{n - i + 1}, T^{sub} (v_{n - i}, v_{n - i - 1}))$ (see Fig. 11). Then,

$$\begin{aligned} T^{sub} (v_{i + 1}, v_{i + 2}) & =_{m} T^{sub} (v_{n - i}, v_{n - i - 1}), \\ T^{sub} (v_{i + 1}, v_{i + 2}) & \geq_{C} T^{sub} (v_{n - i}, v_{n - i - 1}) . \end{aligned} \tag{1}$$

Fig. 11 — Illustration of an automorphism ϕ in the proof. The *red path* indicates (v₁,…,v_n), where $ϕ (v_{i}) = v_{n - i + 1}$ for all i=1,…,n

Let u_j and w_j be child nodes of $v_{i + 1}$ and $v_{n - i}$ , respectively. Then, $v_{i} = u_{j_{2}}$ and $v_{n - i + 1} = w_{j_{1}}$ , where $j_{1} = index (v_{n - i + 1}, T^{sub} (v_{n - i}, v_{n - i - 1}))$ and $j_{2} = index (v_{i}, T^{sub} (v_{i + 1}, v_{i + 2}))$ . If $v_{i + 1}$ and $v_{n - i}$ are benzene nodes, $T (u_{j_{1}}) =_{C} T (v_{i})$ , $T (v_{n - i + 1}) =_{C} T (w_{j_{2}})$ , and $T (v_{i}) =_{C} T (v_{n - i + 1})$ because $C_{v_{i + 1}}^{T} = C_{v_{n - i}}^{T}$ and $ϕ (v_{i}) = v_{n - i + 1}$ .

We assume that $v_{i + 1}$ and $v_{n - i}$ are not benzene nodes. For child nodes u_j of $v_{i + 1}$ , $T (u_{j}) \geq_{C} T (u_{j + 1})$ because $(u_{j}, v_{i + 1}, u_{j + 1})$ is a symmetric path. Also for child nodes w_j of $v_{n - i}$ , $T (w_{j}) \geq_{C} T (w_{j + 1})$ . From the definition of ϕ, T(u_j)=_CT(ϕ(u_j)) for all u_j≠v_i. If $index (ϕ (u_{j + l})) < index (ϕ (u_{j}))$ for $u_{j}, u_{j + l} \neq v_{i}$ and l>0, $T (u_{j}) \geq_{C} T (u_{j + l}) =_{C} T (ϕ (u_{j + l})) \geq_{C} T (ϕ (u_{j})) =_{C} T (u_{j})$ . It means $T (u_{j}) =_{C} T (u_{j + l})$ . We assume that $index (ϕ (u_{j})) < index (ϕ (u_{j + l}))$ for all u_j≠v_i, that is, $ϕ (u_{j}) = w_{j + 1}$ for all j=j₁,…,j₂−1. Then,

$$\begin{aligned} T (u_{j}) & =_{C} T (w_{j + 1}) \leq_{C} T (w_{j}), \\ \text{and} \quad T (v_{i}) & \leq_{C} T (u_{j_{2} - 1}) =_{C} T (w_{j_{2}}) . \end{aligned} \tag{2}$$

If $T^{sub} (v_{i + 1}, v_{i + 2}) >_{C} T^{sub} (v_{n - i}, v_{n - i - 1})$ , then there is an integer j (j₁≤j≤j₂) such that T(u_j)>_CT(w_j), and it contradicts Eq. (2). Therefore, $T^{sub} (v_{i + 1}, v_{i + 2}) =_{C} T^{sub} (v_{n - i}, v_{n - i - 1})$ , and $T (v_{i}) =_{C} T (v_{n - i + 1})$ . Similarly, in the case that $(v_{i + 1}, \dots, v_{n - i})$ is a symmetric path for some i and $index (v_{i}, T^{sub} (v_{i + 1}, v_{i + 2})) < index (v_{n - i + 1}, T^{sub} (v_{n - i}, v_{n - i - 1}))$ , we obtain $T (v_{i}) =_{C} T (v_{n - i + 1})$ . Thus, $T^{sub} (v_{1}, v_{2}) \geq_{C} T^{sub} (v_{n}, v_{n - 1})$ .

Lemma 3.

Given two different normal trees T₁ and T₂, T₁ does not represent the same molecular graph as T₂.

Proof.

We assume that T₁ represents the same molecular graph as T₂. Let G₁ and G₂ be molecular graphs transformed from T₁ and T₂, respectively, where each carbon in benzene rings and naphthalene rings is connected with adjacent atoms according to carbon position lists of T₁ and T₂. From the assumption, there is an isomorphism ψ from G₁ to G₂. It means that l(v₁)=l(ψ(v₁)) for all v₁∈V(G₁), (ψ(v₁),ψ(v₂))∈E(G₂) if and only if (v₁,v₂)∈E(G₁), and mul(ψ(v₁),ψ(v₂))=mul(v₁,v₂).

Consider the case that the automorphism group Aut(G₁) of G₁ has only elements ϕ such that ϕ(v₁)≠v₂ for v₁ and v₂ belonging to distinct benzene rings. Let T(G) be the molecular tree without carbon position lists, obtained from G by contracting benzene rings and naphthalene rings to benzene nodes, and satisfying conditions 1, 2 of Definition 6. We suppose that maps ψ and ϕ in G₁ are naturally extended to T(G₁). Since T₁ is different from T₂, there is a benzene node v₁∈V(T₁) such that

$$C_{v_{1}}^{T_{1}} \neq C_{ψ (v_{1})}^{T_{2}} . \tag{3}$$

If v₁ is not connected by a merge bond with the parent node, there is a non-identity map ϕ_b∈Aut_b such that $C_{v_{1}}^{T_{1}} = ϕ_{b} (C_{ψ (v_{1})}^{T_{2}})$ because T₁ and T₂ represent the same molecular graph. It contradicts condition 4 of Definition 6. Suppose that v₁ is connected by a merge bond with the parent node v_p and $C_{v_{p}}^{T_{1}} = C_{ψ (v_{p})}^{T_{2}}$ . If $T^{sub} (v_{p}, v_{1}) =_{\underset{̲}{C}} T^{sub} (v_{1}, v_{p})$ , then v_p is the root, and there is a non-identity map ϕ_n∈Aut_n such that $C_{(v_{p}, v_{1})}^{T_{1}} = ϕ_{n} (C_{(ψ (v_{p}), ψ (v_{1}))}^{T_{2}})$ because T₁ and T₂ represent the same molecular graph. It contradicts condition 5a. Otherwise, $T^{sub} (v_{p}, v_{1}) \neq_{\underset{̲}{C}} T^{sub} (v_{1}, v_{p})$ . If v_p is not the root, then T₁ does not represent the same molecular graph as T₂ because $T^{sub} (v_{a}, v_{p})$ , where v_a is the parent of v_p, is different from other subtrees connected to the naphthalene ring. It contradicts the assumption. If v_p is the root, $C_{v_{p}}^{T_{1}} = ϕ_{ref} (C_{v_{p}}^{T_{1}})$ and $C_{v_{1}}^{T_{1}} = ϕ_{ref} (C_{ψ (v_{1})}^{T_{2}})$ because T₁ and T₂ represent the same molecular graph. It contradicts condition 5b.

Consider the case that there is an element ϕ∈Aut(G₁) such that ϕ(v₁)=v₂ for v₁ and v₂ belonging to distinct benzene rings. Since T₁ is different from T₂, there is a benzene node v₁∈V(T₁) such that

$$C_{v_{1}}^{T_{1}} \neq C_{ψ (v_{1})}^{T_{2}} . \tag{4}$$

Here, we suppose that conditions 3, 4, 5 are satisfied for all benzene nodes in T₁ and T₂. Then, there is a path from v₁ to ϕ(v₁)=v_n, (v₁,…,v_n), in T₁. Since T₁ and T₂ represent the same molecular graph,

$$\begin{aligned} T_{1}^{sub} (v_{1}, v_{2}) & =_{C} T_{2}^{sub} (ψ (v_{n}), ψ (v_{n - 1})) \quad \text{and} \\ T_{1}^{sub} (v_{n}, v_{n - 1}) & =_{C} T_{2}^{sub} (ψ (v_{1}), ψ (v_{2})) . \end{aligned} \tag{5}$$

Here, we can assume that v₁ is left of v_n and ψ(v₁) is left of ψ(v_n) without loss of generality. Then, from Proposition 6, for paths of (v₁,…,v_n) and (ψ(v₁),…,ψ(v_n)),

$$\begin{aligned} T_{1}^{sub} (v_{1}, v_{2}) & \geq_{C} T_{1}^{sub} (v_{n}, v_{n - 1}) \quad \text{and} \\ T_{2}^{sub} (ψ (v_{1}), ψ (v_{2})) & \geq_{C} T_{2}^{sub} (ψ (v_{n}), ψ (v_{n - 1})) \end{aligned} \tag{6}$$

because T₁ and T₂ are normal trees. There are no carbon position lists that satisfy Eqs. (4), (5) and (6).

Therefore, T₁ does not represent the same molecular graph as T₂.

Methods

We propose an algorithm BfsBenNaphEnum for enumerating chemical compounds containing benzene rings and naphthalene rings as cyclic structures. BfsBenNaphEnum utilizes our previously developed algorithms BfsSimEnum and BfsMulEnum [18], and then assigns carbon position lists.

Modification of BfsSimEnum and BfsMulEnum

Suppose that the numbers $n_{l_{i}}$ of atoms with label l_i for all l_i∈Σ, the numbers n_b, n_n of benzene rings and naphthalene rings are given. BfsBenNaphEnum introduces a special label ‘b’ representing a benzene node to Σ with b>l_i∈Σ and val(b)=6, and executes BfsSimEnum to generate all non-redundant molecular trees T such that $num (T, l_{i}) = n_{l_{i}}$ for l_i∈Σ except l_i=b,C and num(T,b)=n_b+2n_n, num(T,C)=n_C−6n_b−10n_n. At this time, all edges of enumerated trees are single because BfsSimEnum generates only simple trees. Then, we modify BfsMulEnum to assign n_n merge bonds to edges between benzene nodes in each tree enumerated by BfsSimEnum in addition to adding $1 + \sum_{l_{i} \in Σ, l_{i} \neq b} num (T, l_{i}) (val (l_{i}) - 2) / 2$ bonds to edges between usual nodes. It should be noted that multiple bonds cannot be assigned to edges connected to benzene nodes since a carbon atom in benzene rings and naphthalene rings is connected with another adjacent atom by a single bond.
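The label counts passed to the simple-tree enumeration follow directly from the formulas above. The sketch below is only an illustration of that bookkeeping; the function name `tree_node_counts` and the dictionary layout are assumptions and do not come from the authors' implementation.

```python
from typing import Dict


def tree_node_counts(formula: Dict[str, int], n_b: int, n_n: int) -> Dict[str, int]:
    """Return num(T, l) for every label l, including the benzene label 'b'.

    formula maps element symbols to atom counts, e.g. {"C": 12, "N": 1, "O": 1, "H": 11}.
    n_b and n_n are the requested numbers of benzene rings and naphthalene rings.
    """
    counts = dict(formula)
    # Each benzene ring absorbs 6 carbons and each naphthalene ring absorbs 10.
    counts["C"] = formula.get("C", 0) - 6 * n_b - 10 * n_n
    if counts["C"] < 0:
        raise ValueError("not enough carbon atoms for the requested rings")
    # One benzene node per benzene ring, two benzene nodes per naphthalene ring.
    counts["b"] = n_b + 2 * n_n
    return counts
```

For C₁₂N₁O₁H₁₁ with one naphthalene ring, this gives two benzene nodes and two remaining carbon nodes, which agrees with the corresponding row of Table 2.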

Assignment of carbon positions for molecular trees

In this algorithm, we traverse the tree T from the rightmost deepest benzene node to the root in reverse BFS order because an adjacent node list depends on carbon position lists of descendant nodes. For each benzene node v encountered, we assign a carbon position list that does not violate the conditions of normal form.

The pseudocode of the assignment part of BfsBenNaphEnum is given in Algorithms 1 and 2. Due to Proposition 5, we always assign carbon position 1 to the first node in $A_{v}^{T}$ (line 20 in ASSIGN function); this node is the parent node of v if v is not the root (Proposition 2). If v is the root and $| A_{v}^{T} [1] | \geq 3$ , we assign carbon position lists in Table 1 (see also Fig. 12) to v immediately for the sake of efficiency. Carbon position lists in Table 1 satisfy condition 4 of the normal form, and all the cases are included in the table.

Table 1.

Carbon position lists for $A_{v}^{T}$ , where v is the root, and $| A_{v}^{T} [1] | \geq 3$

$| A_{v}^{T} [1] |$	$| A_{v}^{T} [2] |$	$C_{v}^{T}$
3	0	((1,2,3)), ((1,2,4)), ((1,3,5))
3	3	((1,2,3),(4,5,6)), ((1,2,4),(3,5,6)), ((1,3,5),(2,4,6))
4	0	((1,2,3,4)), ((1,2,3,5)), ((1,2,4,5))
5	0	((1,2,3,4,5))
6	0	((1,2,3,4,5,6))


Fig. 12 — Illustration of benzene rings having each carbon position list in Table 1. a ((1,2,3)). b ((1,2,4)). c ((1,3,5)). d ((1,2,3),(4,5,6)). e ((1,2,4),(3,5,6)). f ((1,3,5),(2,4,6)). g ((1,2,3,4)). h ((1,2,3,5)). i ((1,2,4,5)). j ((1,2,3,4,5)). k ((1,2,3,4,5,6)). *Solid* and *dashed lines* correspond to $A_{v}^{T} [1]$ and $A_{v}^{T} [2]$ , respectively
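The entries of Table 1 can be reproduced by brute force over the twelve symmetries of a hexagon. The sketch below assumes that a map ϕ_b acts on a carbon position list by relabelling every position and re-sorting each group, and that lists are compared lexicographically; this matches the worked examples in the text (for instance ϕ_b(((1),(4,5)))=((1),(3,4))), but the formal definitions are given earlier in the paper. All function names are hypothetical.

```python
from itertools import combinations


def benzene_automorphisms():
    """The 12 symmetries of a hexagon as dicts on positions 1..6."""
    maps = []
    for k in range(6):
        maps.append({p: (p - 1 + k) % 6 + 1 for p in range(1, 7)})    # rotations
        maps.append({p: (k - (p - 1)) % 6 + 1 for p in range(1, 7)})  # reflections
    return maps


def apply_map(phi, cpl):
    """Relabel every position and re-sort each group in ascending order."""
    return tuple(tuple(sorted(phi[p] for p in group)) for group in cpl)


def is_least(cpl, maps=None):
    """True if cpl <= phi(cpl) for every hexagon symmetry phi."""
    maps = maps or benzene_automorphisms()
    return all(cpl <= apply_map(phi, cpl) for phi in maps)


def least_lists(sizes):
    """All least carbon position lists whose groups have the given sizes."""
    results = []

    def extend(prefix, free, remaining):
        if not remaining:
            cpl = tuple(prefix)
            if is_least(cpl):
                results.append(cpl)
            return
        for group in combinations(sorted(free), remaining[0]):
            extend(prefix + [group], free - set(group), remaining[1:])

    extend([], set(range(1, 7)), list(sizes))
    return results
```

Calling `least_lists((3,))`, `least_lists((3, 3))`, `least_lists((4,))`, `least_lists((5,))` and `least_lists((6,))` returns the lists of the five table rows in the same order.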

For other carbon positions from 2 to 6, we use ASSIGN_CHILD to assign such positions to the remaining adjacent nodes. For example, let T₁ in Fig. 6 be output without any carbon position list by BfsMulEnum. T₁ has a benzene node v₁, and $A_{v_{1}}^{T_{1}} = ((v_{4}), (v_{5}), (v_{2}, v_{3}))$ . First, carbon position 1 is assigned to $A_{v_{1}}^{T_{1}} [1] [1] = v_{4}$ , that is, $C_{v_{1}}^{T_{1}} [1] [1] = 1$ . Since v₁ is the root and $| A_{v_{1}}^{T_{1}} [1] | = 1 < 3$ , Table 1 is not used, and the other nodes v₅,v₂,v₃ are assigned by ASSIGN_CHILD. For v₅, each carbon position from 2 to 6 is examined (line 26 in ASSIGN_CHILD). For v₂, each position from 2 to 6 except the position assigned to v₅ is examined (line 27). For v₃, each position from 2 to 6 that is more than the position assigned to v₂ except the position assigned to v₅ is examined (line 27) because v₂ and v₃ have the same subtree and condition 3 must be satisfied. Thus, $C_{v_{1}}^{T_{1}} = ((1), (2), (3, 4)), ((1), (2), (3, 5)), ((1), (2), (3, 6)), \dots, ((1), (3), (2, 4)), ((1), (3), (2, 5)), ((1), (3), (2, 6)), \dots, ((1), (6), (4, 5))$ are examined, where ((1),(6),(2,3)),((1),(6),(2,4)),((1),(5),(2,3)) and so on are discarded in the next step.
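The candidate generation described for T₁ can be sketched as follows. This is an illustration of the enumeration order only, not the authors' ASSIGN_CHILD; the function name `candidate_lists` and the representation of the adjacent node list by its group sizes are assumptions. Nodes in one group have identical subtrees, so their positions are generated in ascending order.

```python
from itertools import combinations


def candidate_lists(group_sizes):
    """Yield carbon position lists with position 1 fixed for the first node."""

    def extend(prefix, free, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        # Positions inside a group are ascending because combinations() is sorted.
        for group in combinations(sorted(free), remaining[0]):
            yield from extend(prefix + [group], free - set(group), remaining[1:])

    first, rest = group_sizes[0], list(group_sizes[1:])
    free = set(range(2, 7))
    # The first node always receives position 1; the rest of its group follows.
    for tail in combinations(sorted(free), first - 1):
        yield from extend([(1,) + tail], free - set(tail), rest)
```

For the adjacent node list ((v₄),(v₅),(v₂,v₃)), `list(candidate_lists((1, 1, 2)))` contains 5·6=30 candidates, starting with ((1),(2),(3,4)) and ending with ((1),(6),(4,5)). Candidates such as ((1),(6),(2,3)) are still present at this stage and are only removed by the subsequent check of conditions 4 and 5.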

For each benzene node v, after assignment of a carbon position list to $A_{v}^{T}$ , whether or not $C_{v}^{T}$ violates conditions 4, 5 of the normal form is confirmed (lines 5, 11, 14 in ASSIGN_CHILD). After carbon position lists are assigned to all benzene nodes, condition 6 is confirmed (line 4 in ASSIGN).

Since an input of this part, that is, an output of BfsMulEnum, satisfies conditions 1, 2 of the normal form, BfsBenNaphEnum always outputs normal trees. In ASSIGN_CHILD, a distinct carbon position list is always assigned, and all patterns are assigned (line 28). Hence, BfsBenNaphEnum outputs all distinct normal trees.

Theorem 1.

BfsBenNaphEnum outputs all non-redundant molecular graphs that are solutions of Problem 1.

Figure 13 shows another example T₇ of molecular trees. T₇ includes four benzene nodes v₅, v₄, v₃, v₂ in reverse BFS order, and edges (v₂,v₄), (v₃,v₅) are merge bonds. First, our algorithm assigns carbon position lists for $A_{v_{5}}^{T_{7}} = ((v_{3}), (v_{7}))$ as $C_{v_{5}}^{T_{7}} = ((1, 2), (3)), ((1, 2), (4)), ((1, 2), (5)), ((1, 2), (6))$ . In a similar way, for $A_{v_{4}}^{T_{7}} = ((v_{2}), (v_{6}))$ , $C_{v_{4}}^{T_{7}} = ((1, 2), (3)), ((1, 2), (4)), ((1, 2), (5)), ((1, 2), (6))$ . For $A_{v_{3}}^{T_{7}} = ((v_{1}), (v_{5}))$ , $C_{v_{3}}^{T_{7}} = ((1), (2, 3)), ((1), (3, 4)), ((1), (4, 5)), ((1), (5, 6))$ are examined. In line 5 of ASSIGN_CHILD, ((1),(4,5)) and ((1),(5,6)) are discarded because ϕ_b(((1),(4,5)))=((1),(3,4)), ϕ_b(((1),(5,6)))=((1),(2,3)) for the reflection map ϕ_b with respect to the axis through positions 1 and 4, and these violate condition 4. In a similar way, for $A_{v_{2}}^{T_{7}} = ((v_{1}), (v_{4}))$ , $C_{v_{2}}^{T_{7}} = ((1), (2, 3)), ((1), (3, 4))$ are assigned. After carbon position lists are assigned to all benzene nodes, condition 6 is confirmed in line 4 of ASSIGN. If $C_{v_{2}}^{T_{7}} \neq C_{v_{3}}^{T_{7}}$ , then there is one symmetric path, $P = \{ (v_{2}, v_{3}) \}$ , and $T_{7} (v_{2}) \geq_{C} T_{7} (v_{3})$ must be satisfied. It means that $C_{v_{4}}^{T_{7}} = C_{v_{5}}^{T_{7}}$ is one of ((1,2),(3)), ((1,2),(4)), ((1,2),(5)), ((1,2),(6)) and $C_{v_{2}}^{T_{7}} = ((1), (3, 4)) > C_{v_{3}}^{T_{7}} = ((1), (2, 3))$ , or $C_{v_{4}}^{T_{7}} > C_{v_{5}}^{T_{7}}$ and $C_{v_{2}}^{T_{7}} \neq C_{v_{3}}^{T_{7}}$ . Hence, there are $4 + \binom{4}{2} \cdot 2 = 16$ structures. If $C_{v_{2}}^{T_{7}} = C_{v_{3}}^{T_{7}} = ((1), (2, 3))$ (or $C_{v_{2}}^{T_{7}} = C_{v_{3}}^{T_{7}} = ((1), (3, 4))$ ), then $P = \{ (v_{2}, v_{3}), (v_{4}, v_{5}) \}$ , and both $T_{7} (v_{2}) \geq_{C} T_{7} (v_{3})$ and $T_{7} (v_{4}) \geq_{C} T_{7} (v_{5})$ , that is, $C_{v_{4}}^{T_{7}} \geq C_{v_{5}}^{T_{7}}$ , must be satisfied. Hence, there are 4+3+2+1=10 structures. In total, 16+10·2=36 structures are generated by BfsBenNaphEnum for T₇.
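The total of 36 can be checked by brute force. The sketch below hard-codes the candidate lists named in this paragraph and encodes only the case analysis given here, not the general condition 6; the function and variable names are illustrative.

```python
from itertools import product

# Candidate lists for v4 and v5 (merged with their parents at positions 1,2).
CHILD_LISTS = [((1, 2), (3,)), ((1, 2), (4,)), ((1, 2), (5,)), ((1, 2), (6,))]
# Candidate lists for v2 and v3 that survive the condition 4 check.
PARENT_LISTS = [((1,), (2, 3)), ((1,), (3, 4))]


def passes_case_analysis(c2, c3, c4, c5):
    """Apply the two cases of the paragraph to one assignment of T7."""
    if c2 != c3:
        # One symmetric path (v2, v3): T7(v2) >= T7(v3) is required.
        return (c4 == c5 and c2 > c3) or c4 > c5
    # Two symmetric paths: additionally C_v4 >= C_v5 is required.
    return c4 >= c5


def count_t7_structures():
    """Count the assignments (C_v2, C_v3, C_v4, C_v5) that are kept."""
    return sum(
        1
        for c2, c3, c4, c5 in product(PARENT_LISTS, PARENT_LISTS, CHILD_LISTS, CHILD_LISTS)
        if passes_case_analysis(c2, c3, c4, c5)
    )
```

Running `count_t7_structures()` over all 2·2·4·4=64 raw assignments keeps 36 of them, in agreement with 16+10·2.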

Fig. 13 — Example of a molecular tree T₇

Results

In this section, we show that our proposed method can enumerate chemical compounds with benzene rings and naphthalene rings correctly and efficiently. MOLGEN 3.5 would be more suitable than MOLGEN 5.0 for enumerating tree-like compounds because it offered the possibility to define substructures such as benzene or naphthalene as macro atoms. However, MOLGEN 3.5 cannot handle all the cases provided in Table 2, so we compared the proposed tool with MOLGEN 5.0, a well-known general purpose structure generator. We implemented BfsBenNaphEnum and installed MOLGEN 5.0 on a computer with a 3.47 GHz Intel Xeon CPU and 23.5 GiB memory, and compared their computational time. The implementation of BfsBenNaphEnum is available on our supplementary web site, http://sunflower.kuicr.kyoto-u.ac.jp/jira/bfsenum/.

Table 2.

Execution time (sec), the number of structures enumerated by BfsBenNaphEnum and MOLGEN, and the number of chemical compounds that exist in the PubChem database for several instances

Chemical formula	#atoms						#all compounds in PubChem	#enumerated structures	Computational time (sec)
	n	b	C	N	O	H			BfsBenNaphEnum	MOLGEN
C₇O₂H₈	0	1	1	0	2	8	728	19	0.001	0.053
C₈O₃H₁₀	0	1	2	0	3	10	1602	307	0.002	0.124
C₉O₄H₁₀	0	1	3	0	4	10	1469	6406	0.010	1.699
C₁₀N₂O₄H₁₀	0	1	4	2	4	10	1592	8,333,991	12.260	957.53
	1	0	0	2	4	10		7980	0.031	69.51
C₁₁N₂H₁₀	0	1	5	2	0	10	790	9012	0.021	630.44
	1	0	1	2	0	10		56	0.005	24.061
C₁₂N₁O₁H₁₁	0	1	6	1	1	11	1582	80,883	0.155	2,611.57
	0	2	0	1	1	11		33	0.001	98.99
	1	0	2	1	1	11		888	0.009	560.98
C₁₃O₂H₁₂	0	1	7	0	2	12	1239	162,122	0.289	6,497.55
	0	2	1	0	2	12		190	0.002	2,069.3
	1	0	3	0	2	12		2458	0.013	1,731.92
C₁₄O₄H₁₂	0	1	8	0	4	12	1397	19,514,480	35.655	197,264.54
	0	2	2	0	4	12		15,581	0.021	107,509.42
	1	0	4	0	4	12		337,178	1.061	97,326.71


Since MOLGEN can enumerate chemical compounds without restriction on the structure, we must specify a benzene ring and a naphthalene ring as a substructure so that the enumerated structures contain only benzene rings and naphthalene rings as cyclic structures. As can be seen from Table 2, where ‘n’ and ‘b’ denote the numbers of naphthalene rings and benzene rings, respectively, BfsBenNaphEnum enumerated chemical compounds much faster than MOLGEN while giving the same number of enumerated structures. BfsBenNaphEnum was from 50 times to 5,000,000 times faster than MOLGEN for instances with 8 to 14 carbon atoms. Table 2 also compares the number of discovered compounds in PubChem, which are not limited to tree-like chemical compounds, with the number of compounds enumerated by the proposed algorithm for several chemical formulas. When the number of carbon atoms is large (greater than 8 in this case), the number of discovered compounds is much smaller than the number of enumerated compounds. This implies that numerous unknown compounds remain to be discovered, possibly including some essential compounds. In this study, we examined chemical formulas including up to two benzene rings and one naphthalene ring because MOLGEN was not able to output results in practical time for chemical formulas including more benzene rings and naphthalene rings.
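The quoted speed-up range can be recomputed from the two time columns of Table 2. The following is a minimal sketch with the sixteen time pairs copied from the table; the variable names are illustrative.

```python
# (BfsBenNaphEnum time, MOLGEN time) in seconds, row by row from Table 2.
TIMES = [
    (0.001, 0.053), (0.002, 0.124), (0.010, 1.699), (12.260, 957.53),
    (0.031, 69.51), (0.021, 630.44), (0.005, 24.061), (0.155, 2611.57),
    (0.001, 98.99), (0.009, 560.98), (0.289, 6497.55), (0.002, 2069.3),
    (0.013, 1731.92), (35.655, 197264.54), (0.021, 107509.42), (1.061, 97326.71),
]

# Speed-up factor of BfsBenNaphEnum over MOLGEN for each instance.
speedups = [molgen / bfs for bfs, molgen in TIMES]
smallest, largest = min(speedups), max(speedups)
print(f"speed-up ranges from about {smallest:.0f}x to about {largest:.0f}x")
```

The smallest ratio comes from the C₇O₂H₈ row and the largest from the C₁₄O₄H₁₂ row with two benzene rings, which is consistent with the rounded range of 50 to 5,000,000 stated above.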

We plotted the relation between the number of enumerated structures and the computational time for both methods in Fig. 14, where both the x-axis and the y-axis are in log scale. The figure shows that the execution time of BfsBenNaphEnum is much shorter than that of MOLGEN.

Discussion

Our algorithm is limited to tree-like chemical structures without any cyclic structures except benzene rings and naphthalene rings, whereas MOLGEN has no such limitation. Therefore, in the future, we would like to extend the algorithm such that it can enumerate more complex cyclic structures, such as polycyclic aromatic compounds and nucleotides. In addition, to make enumeration tools practical, we need to rank enumerated structures because a large number of structures are usually enumerated. For that purpose, it might be useful to employ drug-likeness filters such as Lipinski's rule of five (RO5) and the QED score. Incorporation of such filters into our system is also important future work.

Conclusions

We proposed a way to represent a benzene ring in a molecular tree by regarding it as a newly defined atom with valence six and introducing a new attribute, named the carbon position list, to benzene nodes. The carbon position of an atom specifies which carbon in a benzene ring the corresponding atom bonds with. We also proposed a new kind of bond, called a merge bond, that merges two benzene rings together to form a naphthalene ring. With merge bonds, a molecular tree can represent a structure containing naphthalene rings without defining a new kind of atom. Moreover, since a benzene ring and a naphthalene ring are symmetric structures, we defined a rule to assign carbon position lists such that no redundant structures due to the symmetry of a benzene ring and a naphthalene ring are enumerated.

The algorithm of this work consists of two main steps. Given the number of benzene rings, the number of naphthalene rings as well as a chemical formula, BfsSimEnum and BfsMulEnum are applied such that they can enumerate molecular trees with benzene nodes. Next, the new extension BfsBenNaphEnum assigns carbon position lists to benzene nodes in normal molecular trees.

To show the performance of our algorithm, all non-redundant chemical structures were enumerated for several chemical formulas by BfsBenNaphEnum and MOLGEN 5.0, a well-known general purpose structure generator. Our algorithm generated the same number of structures as MOLGEN, which supports its reliability, while expending much less computational time. BfsBenNaphEnum was from 50 times to 5,000,000 times faster than MOLGEN for instances with 8 to 14 carbon atoms in our experiments. This is mainly because the number of nodes decreases from six to one for each benzene ring and from ten to two for each naphthalene ring in a chemical structure and because we enumerate chemical structures in the form of trees instead of graphs.

Acknowledgements

This work was partially supported by Grants-in-Aid #26240034, #24500361, and #25-2920 from MEXT, Japan.

Footnotes

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JJ and MH developed and implemented the methods, and drafted the manuscript. YZ and TA participated in the discussions during the development of the methods and helped draft the manuscript. All authors read and approved the final manuscript.

Contributor Information

Morihiro Hayashida, Email: morihiro@kuicr.kyoto-u.ac.jp.

Tatsuya Akutsu, Email: takutsu@kuicr.kyoto-u.ac.jp.

References

1. Ward RA, Kettle JG. Systematic enumeration of heteroaromatic ring systems as reagents for use in medicinal chemistry. J Med Chem. 2011;54(13):4670–7. doi: 10.1021/jm200338a.
2. Blum LC, Reymond JL. 970 million druglike small molecules for virtual screening in the chemical universe database gdb-13. J Am Chem Soc. 2009;131(25):8732–3. doi: 10.1021/ja902302h.
3. Mishima K, Kaneko H, Funatsu K. Development of a new de novo design algorithm for exploring chemical space. Mol Inform. 2014;33(11-12):779–89. doi: 10.1002/minf.201400056.
4. Funatsu K, Sasaki S. Recent advances in the automated structure elucidation system, chemics. utilization of two-dimensional NMR spectral information and development of peripheral functions for examination of candidates. J Chem Inform Comput Sci. 1996;36(2):190–204. doi: 10.1021/ci950152r.
5. Meringer M, Schymanski EL. Small molecule identification with MOLGEN and mass spectrometry. Metabolites. 2013;3:440–62. doi: 10.3390/metabo3020440.
6. Koichi S, Arisaka M, Koshino H, Aoki A, Iwata S, Uno T, Satoh H. Chemical structure elucidation from 13C NMR chemical shifts: Efficient data processing using bipartite matching and maximal clique algorithms. J Chem Inform Model. 2014;54:1027–35. doi: 10.1021/ci400601c.
7. Bytautas L, Klein DJ, Schmalz TG. All acyclic hydrocarbons: Formula periodic table and property overlap plots via chemical combinatorics. New J Chem. 2000;24(5):329–36. doi: 10.1039/a906939i.
8. Faulon J, Visco DP, Roe D. Enumerating molecules. Rev Comput Chem. 2005;21:209.
9. Koch MA, Schuffenhauer A, Scheck M, Wetzel S, Casaulta M, Odermatt A, Ertl P, Waldmann H. Charting biologically relevant chemical space: A structural classification of natural products (sconp). Proc Natl Acad Sci U S A. 2005;102(48):17272–7. doi: 10.1073/pnas.0503647102.
10. Mauser H, Stahl M. Chemical fragment spaces for de novo design. J Chem Inf Model. 2007;47(2):318–24. doi: 10.1021/ci6003652.
11. Andricopulo AD, Guido RV, Oliva G. Virtual screening and its integration with modern drug design technologies. Curr Med Chem. 2008;15(1):37–46. doi: 10.2174/092986708783330683.
12. Reymond JL, van Deursen R, Blum LC, Ruddigkeit L. Chemical space as a source for new drugs. MedChemComm. 2010;1(1):30–8. doi: 10.1039/c0md00020e.
13. Bürgi JJ, Awale M, Boss SD, Schaer T, Marger F, Viveros-Paredes JM, Bertrand S, Gertsch J, Bertrand D, Reymond JL. Discovery of potent positive allosteric modulators of the α3β2 nicotinic acetylcholine receptor by a chemical space walk in chembl. ACS Chem Neurosci. 2014;5(5):346–59. doi: 10.1021/cn4002297.
14. Gugisch R, Kerber A, Kohnert A, Laue R, Meringer M, Rücker C, Wassermann A. MOLGEN 5.0, a molecular structure generator. Sharjah, United Arab Emirates: Bentham Science Publishers Ltd.; 2012.
15. Peironcely JE, Rojas-Chertó M, Fichera D, Reijmers T, Coulier L, Faulon JL, Hankemeier T. OMG: Open Molecule Generator. J Cheminformatics. 2012;4:21. doi: 10.1186/1758-2946-4-21.
16. Fujiwara H, Wang J, Zhao L, Nagamochi H, Akutsu T. Enumerating treelike chemical graphs with given path frequency. J Chem Inf Model. 2008;48(7):1345–57. doi: 10.1021/ci700385a.
17. Shimizu M, Nagamochi H, Akutsu T. Enumerating tree-like chemical graphs with given upper and lower bounds on path frequencies. BMC Bioinformatics. 2011;12:14–3. doi: 10.1186/1471-2105-12-S14-S3.
18. Zhao Y, Hayashida M, Jindalertudomdee J, Akutsu T. Breadth-first search approach to enumeration of tree-like chemical compounds. J Bioinformatics Comput Biol. 2013;11:1343007. doi: 10.1142/S0219720013430075.
19. Schüller A, Hähnke V, Schneider G. SmiLib v2.0: A Java-based tool for rapid combinatorial library enumeration. QSAR Comb Sci. 2007;26(3):407–10. doi: 10.1002/qsar.200630101.
20. Song CM, Bernardo PH, Chai CLL, Tong JC. CLEVER: Pipeline for designing in silico chemical libraries. J Mol Graph Model. 2009;27(5):578–83. doi: 10.1016/j.jmgm.2008.09.009.
21. Trinajstić N. Chemical Graph Theory. Boca Raton, Florida: CRC Press; 1992.
22. Meringer M. Handbook of Chemoinformatics Algorithms. Boca Raton, Florida: CRC Press; 2010.
23. Suzuki M, Nagamochi H, Akutsu T. Efficient enumeration of monocyclic chemical graphs with given path frequencies. J Cheminformatics. 2014;6:31. doi: 10.1186/1758-2946-6-31.
24. Hardinger SA, University of California LADoC. Biochemistry: Chemistry 14D: Organic Reactions and Pharmaceuticals: Course Thinkbook, Lecture Supplements, Concept Focus Questions, OWLS Problems, Practice Problems. Plymouth, MI 48170: Hayden-McNeil Pub; 2008.
