An Inverse QSAR Method Based on a Two-Layered Model and Integer Programming

Yu Shi; Jianshen Zhu; Naveed Ahmed Azam; Kazuya Haraguchi; Liang Zhao; Hiroshi Nagamochi; Tatsuya Akutsu

doi:10.3390/ijms22062847

. 2021 Mar 11;22(6):2847. doi: 10.3390/ijms22062847

Editor: Hanoch Senderowitz

PMCID: PMC8002091 PMID: 33799613

Abstract

A novel framework for inverse quantitative structure–activity relationships (inverse QSAR) has recently been proposed and developed using both artificial neural networks and mixed integer linear programming. However, the classes of chemical graphs treated by the framework are limited. In order to deal with an arbitrary graph in the framework, we introduce a new model, called a two-layered model, and develop a corresponding method. In this model, each chemical graph is regarded as two parts: the exterior and the interior. The exterior consists of maximal acyclic induced subgraphs with bounded height, the interior is the connected subgraph obtained by ignoring the exterior, and the feature vector consists of the frequency of adjacent atom pairs in the interior and the frequency of chemical acyclic graphs in the exterior. Our method is more flexible than the existing method in the sense that any type of graph can be inferred. We compared the proposed method with an existing method using several data sets obtained from the PubChem database. The new method could infer more general chemical graphs with up to 50 non-hydrogen atoms. The proposed inverse QSAR method can be applied to the inference of more general chemical graphs than before.

Keywords: QSAR, molecular design, artificial neural network, mixed integer linear programming, enumeration of graphs, cheminformatics, materials informatics

1. Introduction

Computer-aided design of chemical structures is one of the key topics in chemoinformatics. In particular, extensive studies have been done on inverse quantitative structure–activity relationships (inverse QSAR), which seek chemical structures having desired chemical activities under some constraints. In this framework, chemical compounds are usually represented as vectors of real or integer numbers, which are often called descriptors in chemoinformatics and correspond to feature vectors in machine learning. Using these chemical descriptors, various heuristic and statistical methods have been developed for inverse QSAR [1,2,3]. In many such methods, inference or enumeration of graph structures from a given set of descriptors is a crucial subtask. Although various methods have been developed for that purpose [4,5,6,7], enumeration remains a challenging task because the number of possible chemical graphs is huge; for example, the number of chemical graphs with up to 30 atoms (vertices) C, N, O, and S may exceed $10^{60}$ [8]. Furthermore, even inference is a challenging task because it is NP-hard (computationally difficult) except for some simple cases [9]. Due to this inherent difficulty, most existing methods for inverse QSAR do not guarantee optimal or exact solutions.

On the other hand, the design of novel graph structures has recently become a hot topic in artificial neural network (ANN) studies, and thus extensive studies have been done for inverse QSAR using ANNs, especially with graph convolutional networks [10]. For example, variational autoencoders [11], recurrent neural networks [12,13], grammar variational autoencoders [14], generative adversarial networks [15], and invertible flow models [16,17] have been applied. Note that QSAR using three-dimensional structures of chemical compounds (3D-QSAR) has also been studied [18]. Particularly, comparative molecular field analysis (CoMFA) has been extensively studied and applied to various molecular design problems [19,20]. In CoMFA, electrostatic potential interaction energies across superimposed molecular structures are used as descriptors and then regression is performed by using the partial least squares (PLS) fitting. Recently, deep neural networks have been applied to 3D-QSAR by combining potential interaction energies with convolutional neural networks [21]. However, in order to apply 3D-QSAR, we need to calculate accurate three-dimensional structures of chemical compounds, which is not a straightforward task.

A novel framework for inferring chemical graphs has recently been developed [22,23] based on ANNs and mixed integer linear programming (MILP), as illustrated in Figure 1. It constructs a prediction function in the first phase and infers a chemical graph in the second phase. The first phase of the framework consists of three stages. In Stage 1, we choose a chemical property $π$ and a class $\mathcal{G}$ of graphs, where a property function $a$ is defined so that $a (G)$ is the value of $π$ for $G \in \mathcal{G}$ , and collect a data set $D_{π}$ of chemical graphs in $\mathcal{G}$ such that $a (G)$ is available. In Stage 2, we introduce a feature function $f : \mathcal{G} \to R^{K}$ for a positive integer K. In Stage 3, we construct a prediction function $η_{N}$ with an ANN $N$ that, given a vector $x \in R^{K}$ , returns a value $y = η_{N} (x) \in R$ so that $η_{N} (f (G))$ serves as a predicted value of $a (G)$ for each $G \in D_{π}$ . Given a target chemical value $y^{*}$ , the second phase infers chemical graphs $G^{*}$ with $η_{N} (f (G^{*})) = y^{*}$ in the next two stages. In Stage 4, we formulate an MILP that simulates the construction of $f (G)$ from G and the computation process in the ANN so that, given a target value $y^{*}$ , solving the MILP yields a chemical graph $G^{†}$ and a feature vector $x^{*}$ such that $f (G^{†}) = x^{*}$ and $η_{N} (x^{*}) = y^{*}$ . In Stage 5, we generate other chemical graphs $G^{*}$ such that $η_{N} (f (G^{*})) = y^{*}$ based on the output chemical graph $G^{†}$ .

Figure 1. An illustration of a framework for inferring a set of chemical graphs $G^{*}$ .

MILP formulations required in Stage 4 have been designed for chemical compounds with cycle index 0 (i.e., acyclic) [23,24], cycle index 1 [25], and cycle index 2 [26]. In particular, Azam et al. [24] introduced a restricted class of acyclic graphs that is characterized by an integer $ρ$ , called a “branch-parameter”, such that the restricted class still covers most of the acyclic chemical compounds in the database.

Recently, Akutsu and Nagamochi [27] extended the idea to define a restricted class of cyclic graphs, called “ $ρ$ -lean cyclic graphs”, that covers most of the cyclic chemical compounds in the database. Based on this, they also defined a set of rules for specifying several topological substructures of a target chemical graph in a flexible way in Stage 4 before we solve an MILP. The method has been implemented by Zhu et al. [28], and computational results showed that chemical graphs with up to around 50 non-hydrogen atoms can be inferred. Although the method can infer the class of $ρ$ -lean cyclic graphs and specify topological structures of the cyclic part, we still need to introduce a new model to deal with an arbitrary graph and to include a prescribed structure in the acyclic part of a target chemical graph.

In this paper, we introduce a new model, called a two-layered model, for representing the feature of a chemical graph in order to deal with an arbitrary graph in the framework. In the two-layered model, a chemical graph G with a parameter $ρ \geq 1$ is regarded as two parts: the exterior and the interior. The exterior consists of maximal acyclic induced subgraphs with height at most $ρ$ and the interior is the connected subgraph obtained by ignoring the exterior. We define a feature vector $f (G)$ of a chemical graph G to be the frequency of adjacent atom pairs in the interior and the frequency of chemical acyclic graphs in the exterior. Figure 2 illustrates an example of a chemical graph G. For a branch-parameter $ρ = 2$ , the interior of the chemical graph G in Figure 2 is obtained by removing the set of vertices of degree 1 $ρ = 2$ times: first remove the set $V_{1} = \{w_{1}, w_{2}, \dots, w_{14}\}$ of vertices of degree 1 in G, and then remove the set $V_{2} = \{w_{15}, w_{16}, \dots, w_{19}\}$ of vertices of degree 1 in $G - V_{1}$ . The removed vertices become the exterior-vertices of G, and there are eight rooted trees $T_{1}, T_{2}, \dots, T_{8}$ in the exterior of G.

Figure 2. An illustration of a chemical graph G, where for $ρ = 2$ , the exterior-vertices are $w_{1}, w_{2}, \dots, w_{19}$ and the interior-vertices are $u_{1}, u_{2}, \dots, u_{28}$ .

We also introduce a new set of rules for specifying topological substructures of a target chemical graph G to be inferred so that a prescribed structure can be included in both the acyclic and cyclic parts of G. The set of rules contains (i) a seed graph $G_{C}$ as an abstract form of a target chemical graph G; (ii) a set $F$ of chemical rooted trees as candidates for trees in the exterior of G; and (iii) lower and upper bounds on the number of components in a target chemical graph, such as chemical elements, double/triple bonds and the interior-vertices in G. Figure 3a,b illustrates examples of a seed graph $G_{C}$ and a set $F$ of chemical rooted trees, respectively. Given a seed graph $G_{C}$ , the interior of a target chemical graph G is constructed from $G_{C}$ by replacing some edges $a = u v$ with paths $P_{a}$ between the end-vertices u and v, and by attaching new paths $Q_{v}$ to some vertices v. For example, the chemical graph G in Figure 2 is constructed from the seed graph $G_{C}$ in Figure 3a as follows. First replace five edges $a_{1} = u_{1} u_{2}, a_{2} = u_{1} u_{3}, a_{3} = u_{4} u_{7}, a_{4} = u_{10} u_{11}$ and $a_{5} = u_{11} u_{12}$ in $G_{C}$ with new paths $P_{a_{1}} = (u_{1}, u_{13}, u_{2})$ , $P_{a_{2}} = (u_{1}, u_{14}, u_{3})$ , $P_{a_{3}} = (u_{4}, u_{15}, u_{16}, u_{7})$ , $P_{a_{4}} = (u_{10}, u_{17}, u_{18}, u_{19}, u_{11})$ and $P_{a_{5}} = (u_{11}, u_{20}, u_{21}, u_{22}, u_{12})$ , respectively, to obtain the subgraph $G_{1}$ of G that consists of vertices depicted with squares. Next, attach to this graph $G_{1}$ three new paths, $Q_{u_{5}} = (u_{5}, u_{24})$ , $Q_{u_{18}} = (u_{18}, u_{25}, u_{26}, u_{27})$ , and $Q_{u_{22}} = (u_{22}, u_{28})$ , to obtain the interior of G in Figure 2. Finally, the chemical graph G in Figure 2 is obtained by attaching eight trees $T_{1}, T_{2}, \dots, T_{8}$ selected from the set $F$ and assigning chemical elements and bond-multiplicities in the interior. The frequency of chemical elements and the graph size are controlled with lower and upper bounds on the components in a target chemical graph G. See Section 2.2 for more details on the specification.

Figure 3. (a) An illustration of a seed graph $G_{C}$ where the vertices in $V_{C}$ are depicted with gray squares, the edges in $E_{(\geq 2)}$ are depicted with dotted lines, the edges in $E_{(\geq 1)}$ are depicted with dashed lines, the edges in $E_{(0 / 1)}$ are depicted with gray bold lines, and the edges in $E_{(= 1)}$ are depicted with black solid lines. (b) A set $F = \{ψ_{1}, ψ_{2}, \dots, ψ_{11}\} \subseteq F (D_{π})$ of 11 chemical rooted trees $ψ_{i}, i \in [1, 11]$ , where the root of each tree is depicted with a black circle.

We implemented the two-layered model, and the results of computational experiments suggest that the proposed method can infer chemical graphs with up to around 50 non-hydrogen atoms.

The paper is organized as follows. Section 2.1 introduces some notions on graphs, a modeling of chemical compounds, and a choice of descriptors. Section 2.2 introduces a method of specifying topological substructures of target chemical graphs in Stage 4. Section 3 reports the results of computational experiments conducted for chemical properties such as octanol/water partition coefficient, boiling point, melting point, flash point, lipophilicity, and solubility. Section 4 makes some concluding remarks. An MILP formulation used in Stage 4 and a review of the dynamic programming algorithm for generating isomers in Stage 5 can be found in Supplementary Materials. The proposed method/system is available on GitHub at https://github.com/ku-dml/mol-infer.

2. Materials and Methods

This section presents mathematical details of our developed method. Readers not interested in mathematical details can skip this section.

2.1. Preliminary

This section introduces some notions and terminology on graphs, a modeling of chemical compounds, and our choice of descriptors.

Let $R$ , $Z$ and $Z_{+}$ denote the sets of reals, integers and non-negative integers, respectively. For two integers a and b, let $[a, b]$ denote the set of integers i with $a \leq i \leq b$ .

Graphs. Given a graph G, let $V (G)$ and $E (G)$ denote the sets of vertices and edges, respectively. For a subset $V^{'} \subseteq V (G)$ (resp., $E^{'} \subseteq E (G))$ of a graph G, let $G - V^{'}$ (resp., $G - E^{'}$ ) denote the graph obtained from G by removing the vertices in $V^{'}$ (resp., the edges in $E^{'}$ ), where we remove all edges incident to a vertex in $V^{'}$ in $G - V^{'}$ . The rank $r (G)$ of a graph G is defined to be the minimum $| F |$ of an edge subset $F \subseteq E (G)$ such that $G - F$ contains no cycle. A path with two end-vertices u and v is called a $u, v$ -path. An edge $e = u_{1} u_{2}$ in a connected graph G is called a bridge if the graph $G - e$ obtained from G by removing edge e is not connected, i.e., $G - e$ consists of two connected graphs $G_{i}$ containing vertex $u_{i}$ , $i = 1, 2$ . For a cyclic graph G, an edge e is called a core-edge if it is in a cycle of G or is a bridge $e = u_{1} u_{2}$ such that each of the connected graphs $G_{i}$ , $i = 1, 2$ of $G - e$ contains a cycle. A vertex incident to a core-edge is called a core-vertex of G.
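The rank and bridge notions above can be checked on small graphs with a few lines of code. The sketch below is illustrative only and is not taken from the authors' implementation: it relies on the standard identity that the minimum number of edges whose removal leaves no cycle equals $| E (G) | - | V (G) | + c$, where c is the number of connected components, and on the fact that removing a bridge leaves this number unchanged while removing an edge on a cycle lowers it by one. The function names `cycle_rank` and `bridges` and the toy graph are assumptions made for this example.

```python
def cycle_rank(vertices, edges):
    """Return r(G) = |E| - |V| + c, with c the number of connected components."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - len(parent) + components


def bridges(vertices, edges):
    """Return the edges e for which G - e has the same rank as G (naive check)."""
    r = cycle_rank(vertices, edges)
    return [e for i, e in enumerate(edges)
            if cycle_rank(vertices, edges[:i] + edges[i + 1:]) == r]


# Toy graph: a triangle u1-u2-u3 with a pendant edge u3-u4.
V = ["u1", "u2", "u3", "u4"]
E = [("u1", "u2"), ("u2", "u3"), ("u3", "u1"), ("u3", "u4")]
print(cycle_rank(V, E))  # 1
print(bridges(V, E))     # [('u3', 'u4')]
```

The bridge test recomputes the rank once per edge, which is quadratic but keeps the example short; a linear-time depth-first search would be preferable for large graphs.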

A vertex designated in a graph G is called a root. In this paper, we designate at most two vertices as roots, and denote by $Rt (G)$ the set of roots of G. We call a graph G rooted (resp., bi-rooted) if $| Rt (G) | = 1$ (resp., $| Rt (G) | = 2$ ), and we call G unrooted if $Rt (G) = \emptyset$ .

For a graph G, possibly with roots, a leaf-vertex is defined to be a non-root vertex $v \in V (G) \setminus Rt (G)$ with degree 1. We call the edge $u v$ incident to a leaf-vertex v a leaf-edge, and denote by $V_{leaf} (G)$ and $E_{leaf} (G)$ the sets of leaf-vertices and leaf-edges in G, respectively. For a graph or a rooted graph G, we define graphs $G_{i}, i \in Z_{+}$ obtained from G by removing the set of leaf-vertices i times so that

$$G_{0} := G; \quad G_{i + 1} := G_{i} - V_{leaf} (G_{i}),$$

where we call a vertex $v \in V_{leaf} (G_{k})$ a leaf k-branch and we say that a vertex $v \in V_{leaf} (G_{k})$ has height $ht (v) = k$ in G. The height $ht (T)$ of a rooted tree T is defined to be the maximum of $ht (v)$ over the vertices $v \in V (T)$ . For an integer $k \geq 0$ , we call a rooted tree T k-lean if T has at most one leaf k-branch. For an unrooted cyclic graph G, we regard the set of non-core-edges in G as inducing a collection $\mathcal{T}$ of trees, each of which is rooted at a core-vertex, and we call G k-lean if each of the rooted trees in $\mathcal{T}$ is k-lean. Nearly 97% of cyclic chemical compounds with up to 100 non-hydrogen atoms in PubChem are 2-lean [24].
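The repeated leaf removal that defines the graphs $G_{i}$ and the heights can be sketched as follows. This is a minimal illustration, not the authors' code: the graph is assumed to be simple and given as an adjacency dictionary, the name `leaf_heights` is made up for this example, and vertices that are never removed (for instance, those on a cycle) simply receive no height in this sketch.

```python
def leaf_heights(adj, roots=()):
    """Peel leaf-vertices round by round; a vertex removed in round k gets height k.

    adj   : dict mapping each vertex to the set of its neighbours (simple graph)
    roots : vertices that are never treated as leaf-vertices
    """
    work = {v: set(ns) for v, ns in adj.items()}  # working copy of G_k
    height = {}
    k = 0
    while True:
        leaves = [v for v, ns in work.items() if len(ns) == 1 and v not in roots]
        if not leaves:
            return height
        for v in leaves:
            height[v] = k
        for v in leaves:  # G_{k+1} := G_k - V_leaf(G_k)
            for u in work.pop(v):
                if u in work:
                    work[u].discard(v)
        k += 1


# Toy graph: triangle c1-c2-c3 with the path c1 - x - y hanging off c1.
adj = {
    "c1": {"c2", "c3", "x"},
    "c2": {"c1", "c3"},
    "c3": {"c1", "c2"},
    "x": {"c1", "y"},
    "y": {"x"},
}
print(leaf_heights(adj))  # {'y': 0, 'x': 1}
```

In the toy graph, y is a leaf of $G_{0}$ and x becomes a leaf of $G_{1}$, so the pendant path contributes heights 0 and 1 while the triangle is left untouched.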

Two-layered Model. Let G be an unrooted graph. For an integer $ρ \geq 0$ , which we call a branch-parameter, a two-layered model of G is a partition of G into an “interior” and an “exterior” in the following way. We call a vertex $v \in V (G)$ (resp., an edge $e \in E (G)$ ) of G an exterior-vertex (resp., exterior-edge) if $ht (v) < ρ$ (resp., e is incident to an exterior-vertex), denote the sets of exterior-vertices and exterior-edges by $V^{ex} (G)$ and $E^{ex} (G)$ , respectively, and let $V^{int} (G) = V (G) \setminus V^{ex} (G)$ and $E^{int} (G) = E (G) \setminus E^{ex} (G)$ . We call a vertex in $V^{int} (G)$ (resp., an edge in $E^{int} (G)$ ) an interior-vertex (resp., interior-edge). The set $E^{ex} (G)$ of exterior-edges forms a collection of connected graphs, each of which is regarded as a rooted tree T rooted at the vertex $v \in V (T)$ with the maximum $ht (v)$ , where we call T a ρ-fringe-tree (or a fringe-tree). Let $T^{ex} (G)$ denote the set of fringe-trees in G. The interior of G is defined to be the subgraph $(V^{int} (G), E^{int} (G))$ of G. Note that every core-vertex (resp., core-edge) in G is an interior-vertex (resp., interior-edge) of G. Figure 2 illustrates an example of a graph G such that $V^{int} = \{u_{1}, u_{2}, \dots, u_{28}\}$ , $V^{ex} = \{w_{1}, w_{2}, \dots, w_{19}\}$ and $T^{ex} (G) = \{T_{1}, T_{2}, \dots, T_{8}\}$ for a branch-parameter $ρ = 2$ .
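The partition into interior and exterior can be illustrated with a short sketch that removes the degree-1 vertices $ρ$ times and then groups the removed vertices into fringe-trees. It is a simplified illustration under stated assumptions rather than the authors' implementation: the graph is simple and cyclic (so the interior is non-empty), it is given as an adjacency dictionary, and the function name `two_layered_partition` is hypothetical. Only fringe-trees with at least one exterior-vertex are reported, each keyed by the interior-vertex it hangs from.

```python
def two_layered_partition(adj, rho):
    """Split the vertices into interior and exterior for branch-parameter rho."""
    work = {v: set(ns) for v, ns in adj.items()}
    exterior = set()
    for _ in range(rho):  # remove the current degree-1 vertices, rho times
        leaves = [v for v, ns in work.items() if len(ns) == 1]
        exterior.update(leaves)
        for v in leaves:
            for u in work.pop(v):
                if u in work:
                    work[u].discard(v)
    interior = set(work)

    # Group exterior-vertices into fringe-trees, keyed by their interior root.
    fringe = {}
    for root in interior:
        stack = [w for w in adj[root] if w in exterior]
        seen = set()
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.add(w)
                stack.extend(x for x in adj[w] if x in exterior)
        if seen:
            fringe[root] = seen
    return interior, exterior, fringe


# Toy graph: triangle c1-c2-c3, a path c1 - x - y, and a pendant vertex z at c2.
adj = {
    "c1": {"c2", "c3", "x"},
    "c2": {"c1", "c3", "z"},
    "c3": {"c1", "c2"},
    "x": {"c1", "y"},
    "y": {"x"},
    "z": {"c2"},
}
interior, exterior, fringe = two_layered_partition(adj, rho=2)
print(sorted(interior))  # ['c1', 'c2', 'c3']
print(sorted(exterior))  # ['x', 'y', 'z']
```

For this toy graph with $ρ = 2$, the first round removes y and z and the second removes x, giving two fringe-trees rooted at c1 and c2.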

2.1.1. Modeling of Chemical Compounds

To represent a chemical compound, we assume that each chemical element $a$ has a unique valence $val (a) \in [1, 4]$ and we use a hydrogen-suppressed model, because hydrogen atoms can be added at the final stage under the assumption. In the hydrogen-suppressed model, a chemical compound C is represented by a tuple $G = (H, α, β)$ of a simple, connected undirected graph H and functions $α : V (H) \to Λ$ and $β : E (H) \to [1, 3]$ , where $Λ$ is a set of non-hydrogen chemical elements such as C (carbon), O (oxygen), N (nitrogen), and so on. The set of atoms and the set of bonds in the compound C are represented by the vertex set $V (H)$ and the edge set $E (H)$ , respectively. The chemical element assigned to a vertex $v \in V (H)$ is represented by $α (v)$ and the bond-multiplicity between two adjacent vertices $u, v \in V (H)$ is represented by $β (e)$ of the edge $e = u v \in E (H)$ . We say that two tuples $(H_{i}, α_{i}, β_{i}), i = 1, 2$ are isomorphic if they admit an isomorphism $ϕ$ , i.e., a bijection $ϕ : V (H_{1}) \to V (H_{2})$ such that $u v \in E (H_{1}), α_{1} (u) = a, α_{1} (v) = b, β_{1} (u v) = m$ ↔ $ϕ (u) ϕ (v) \in E (H_{2}), α_{2} (ϕ (u)) = a, α_{2} (ϕ (v)) = b, β_{2} (ϕ (u) ϕ (v)) = m$ . When $H_{i}$ is rooted at a vertex $r_{i}, i = 1, 2$ , $(H_{i}, α_{i}, β_{i}), i = 1, 2$ are rooted-isomorphic (r-isomorphic) if they admit an isomorphism $ϕ$ such that $ϕ (r_{1}) = r_{2}$ . Chemical rooted trees $T_{1}$ and $T_{5}$ in Figure 2 are r-isomorphic.

Associated with the two functions $α$ and $β$ in a tuple $G = (H, α, β)$ , we introduce the following functions: $β_{G} : V (H) \to [0, 12]$ , $ac : E (H) \to Λ \times Λ \times [1, 3]$ , $cs : V (H) \to Λ \times [1, 4]$ , and $ec : E (H) \to (Λ \times [1, 4]) \times (Λ \times [1, 4]) \times [1, 3]$ .

For notational convenience, we use a function $β_{G} : V (H) \to [0, 12]$ such that $β_{G} (u)$ means the sum of bond-multiplicities of edges incident to a vertex u, i.e.,

$$β_{G} (u) ≜ \sum_{u v \in E (H)} β (u v) \quad \text{for each vertex } u \in V (H) .$$

A chemical graph G is defined to be a tuple $(H, α, β)$ such that the valence condition at each vertex $v \in V (H)$ is satisfied, i.e.,

$$β_{G} (v) \leq val (α (v)),$$

where we define the hydro-degree ${deg}_{hyd} (v)$ of a vertex v to be $val (α (v)) - β_{G} (v)$ .
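As a worked example of the valence condition and the hydro-degree, the sketch below computes $β_{G}$ and ${deg}_{hyd}$ for a hydrogen-suppressed model of acetic acid. It is illustrative only: the dictionary-based encoding of $α$ and $β$, the function names, and the small valence table are assumptions made for this example and are not taken from the authors' code.

```python
# Valences for a few elements; the text assumes one valence per element.
VAL = {"C": 4, "N": 3, "O": 2}


def bond_sums(alpha, beta):
    """beta_G(u): sum of bond-multiplicities over the edges incident to u."""
    total = {v: 0 for v in alpha}
    for (u, v), m in beta.items():
        total[u] += m
        total[v] += m
    return total


def hydro_degrees(alpha, beta, val=VAL):
    """Return deg_hyd(v) for every vertex; raise if the valence condition fails."""
    total = bond_sums(alpha, beta)
    result = {}
    for v, element in alpha.items():
        if total[v] > val[element]:
            raise ValueError(f"valence condition violated at {v}")
        result[v] = val[element] - total[v]
    return result


# Hydrogen-suppressed acetic acid: c1 - c2 (= o1) - o2
alpha = {"c1": "C", "c2": "C", "o1": "O", "o2": "O"}
beta = {("c1", "c2"): 1, ("c2", "o1"): 2, ("c2", "o2"): 1}
print(bond_sums(alpha, beta))      # {'c1': 1, 'c2': 4, 'o1': 2, 'o2': 1}
print(hydro_degrees(alpha, beta))  # {'c1': 3, 'c2': 0, 'o1': 0, 'o2': 1}
```

The hydro-degrees sum to 4, which matches the four hydrogen atoms that are added back at the final stage for this molecule.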

Figure 2 illustrates an example of a chemical graph $G = (H, α, β)$ .

To represent a feature of an edge $e = u v \in E (H)$ such that $α (u) = a$ , $α (v) = b$ and $β (e) = m$ in a chemical graph $G = (H, α, β)$ , we use a tuple $(a, b, m) \in Λ \times Λ \times [1, 3]$ , which we call the adjacency-configuration $ac (e)$ of the edge e. We introduce a total order < over the elements in $Λ$ to distinguish between $(a, b, m)$ and $(b, a, m)$ $(a \neq b)$ notationally. For a tuple $ν = (a, b, m)$ , let $\bar{ν}$ denote the tuple $(b, a, m)$ .

To represent a feature of a vertex $v \in V (H)$ with $α (v) = a$ that has d neighboring atoms in a chemical graph $G = (H, α, β)$ , we use a pair $(a, d) \in Λ \times [1, 4]$ , which we call the chemical symbol $cs (v)$ of the vertex v. We treat $(a, d)$ as a single symbol $a d$ , and define $Λ_{dg}$ to be the set of all chemical symbols $μ = a d \in Λ \times [1, 4]$ .

To represent a feature of an edge $e = u v \in E (H)$ such that $cs (u) = μ$ , $cs (v) = ξ$ and $β (e) = m$ in a chemical graph $G = (H, α, β)$ , we use a tuple $(μ, ξ, m) \in Λ_{dg} \times Λ_{dg} \times [1, 3]$ , which we call the edge-configuration $ec (e)$ of the edge e. We introduce a total order < over the elements in $Λ_{dg}$ to distinguish between $(μ, ξ, m)$ and $(ξ, μ, m)$ $(μ \neq ξ)$ notationally. For a tuple $γ = (μ, ξ, m)$ , let $\bar{γ}$ denote the tuple $(ξ, μ, m)$ .
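The three configurations defined above can be computed directly from $α$ and $β$. The sketch below is a hedged illustration: the total order `ORDER` over the elements, the tie-breaking by degree for chemical symbols, and the choice to report each undirected edge in its smaller-first form are assumptions made for this example, since the text only states that some total order is fixed.

```python
ORDER = ["C", "O", "N"]  # hypothetical total order over the element set


def chemical_symbols(alpha, beta):
    """cs(v) = (element of v, number of neighbouring non-hydrogen atoms)."""
    deg = {v: 0 for v in alpha}
    for u, v in beta:
        deg[u] += 1
        deg[v] += 1
    return {v: (alpha[v], deg[v]) for v in alpha}


def adjacency_configuration(alpha, beta, e):
    """ac(e) = (a, b, m), with the two elements sorted by ORDER."""
    u, v = e
    a, b = sorted((alpha[u], alpha[v]), key=ORDER.index)
    return (a, b, beta[e])


def edge_configuration(alpha, beta, e):
    """ec(e) = (mu, xi, m), with the two chemical symbols sorted."""
    cs = chemical_symbols(alpha, beta)
    u, v = e
    mu, xi = sorted((cs[u], cs[v]), key=lambda s: (ORDER.index(s[0]), s[1]))
    return (mu, xi, beta[e])


# Hydrogen-suppressed acetic acid: c1 - c2 (= o1) - o2
alpha = {"c1": "C", "c2": "C", "o1": "O", "o2": "O"}
beta = {("c1", "c2"): 1, ("c2", "o1"): 2, ("c2", "o2"): 1}
e = ("c2", "o1")
print(adjacency_configuration(alpha, beta, e))  # ('C', 'O', 2)
print(edge_configuration(alpha, beta, e))       # (('C', 3), ('O', 1), 2)
```

Here the carbonyl carbon has three neighbours, so its chemical symbol is C3, and the double bond to the oxygen yields the edge-configuration (C3, O1, 2).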

To represent a feature of the exterior of a chemical graph $G = (H, α, β)$ , a $ρ$ -fringe-tree in $T^{ex} (G)$ is called a fringe-configuration in the exterior.

2.1.2. Introducing Descriptors of Feature Vectors

This section introduces descriptors to define our feature vectors. Let $π$ be a chemical property for which we will construct a prediction function $η_{N}$ from a feature vector $f (G)$ of a chemical graph to a predicted value $y \in R$ for the chemical property of G.

We first choose a set $Λ$ of non-hydrogen chemical elements and then collect a data set $D_{π}$ of chemical compounds C whose chemical elements belong to $Λ \cup \{H\}$ , where we regard $D_{π}$ as a set of chemical graphs that represent the chemical compounds C in $D_{π}$ . To define the interior/exterior of chemical graphs $G \in D_{π}$ , we next choose a branch-parameter $ρ$ , where we recommend $ρ = 2$ .

Let $Λ^{int} (D_{π})$ (resp., $Λ^{ex} (D_{π})$ ) denote the set of chemical elements used in the set of interior-vertices (resp., exterior-vertices) over all chemical graphs $G \in D_{π}$ , and $Γ^{int} (D_{π})$ denote the set of edge-configurations used in the set of interior-edges over all chemical graphs $G \in D_{π}$ . Let $F (D_{π})$ denote the set of chemical rooted trees $ψ$ r-isomorphic to a $ρ$ -fringe-tree $T \in T^{ex} (G)$ over all chemical graphs $G \in D_{π}$ .

We define an integer encoding of a finite set A of elements to be a bijection $σ : A \to [1, | A |]$ , where we denote by $[A]$ the set $[1, | A |]$ of integers. We introduce an integer encoding of each of the sets $Λ^{int} (D_{π})$ , $Λ^{ex} (D_{π})$ , $Γ^{int} (D_{π})$ and $F (D_{π})$ . Let ${[a]}^{int}$ (resp., ${[a]}^{ex}$ ) denote the coded integer of an element $a \in Λ^{int} (D_{π})$ (resp., $a \in Λ^{ex} (D_{π})$ ), $[γ]$ denote the coded integer of an element $γ$ in $Γ^{int} (D_{π})$ , and $[ψ]$ denote the coded integer of an element $ψ$ in $F (D_{π})$ .

For each chemical element $a \in Λ$ , let $mass (a)$ and $val (a)$ denote the mass and valence of $a$ , respectively. In our model, we use integers ${mass}^{*} (a) = ⌊ 10 \cdot mass (a) ⌋$ , $a \in Λ$ .
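A small sketch of the integer encoding and of the scaled mass ${mass}^{*}$ may help fix ideas. Any bijection onto $[1, | A |]$ qualifies as an encoding; sorting the elements alphabetically is just one convenient choice assumed here, and the atomic masses used in the example are standard values supplied for illustration rather than values quoted from the paper.

```python
import math


def integer_encoding(elements):
    """Return a bijection from a finite set onto [1, |A|] (alphabetical order)."""
    return {a: i for i, a in enumerate(sorted(elements), start=1)}


def mass_star(mass):
    """mass*(a) = floor(10 * mass(a)), an integer used in place of the real mass."""
    return math.floor(10 * mass)


code = integer_encoding({"C", "O", "N"})
print(code)               # {'C': 1, 'N': 2, 'O': 3}
print(mass_star(12.011))  # 120
print(mass_star(14.007))  # 140
```

Working with integers such as 120 instead of 12.011 keeps the mass-related quantities integral, which is convenient when they appear in an integer program.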

We define the feature vector $f (G)$ of a chemical graph $G = (H, α, β) \in D_{π}$ to be a vector that consists of the following non-negative integer descriptors ${dcp}_{i} (G)$ , $i \in [1, K]$ , where $K = 17 + | Λ^{int} (D_{π}) | + | Λ^{ex} (D_{π}) | + | Γ^{int} (D_{π}) | + | F (D_{π}) |$ .

${dcp}_{1} (G)$ : the number $n (G) = | V (G) |$ of vertices in G.
${dcp}_{2} (G)$ : the number $| V^{int} (G) |$ of interior-vertices in G.
${dcp}_{3} (G)$ : the average $\bar{ms} (G)$ of ${mass}^{*}$ over all non-hydrogen atoms in G, i.e., $\bar{ms} (G) ≜ \sum_{v \in V (G)} {mass}^{*} (α (v)) / n (G)$ .
${dcp}_{i} (G)$ , $i = 3 + d, d \in [1, 4]$ : the number ${dg}_{d} (G)$ of interior-vertices of degree d in G.
${dcp}_{i} (G)$ , $i = 7 + d, d \in [1, 4]$ : the number ${dg}_{d}^{int} (G)$ of interior-vertices of interior-degree ${deg}_{(V^{int}, E^{int})} (v) = d$ in the interior $(V^{int}, E^{int})$ of G.
${dcp}_{i} (G)$ , $i = 12 + d, d \in [0, 3]$ : the number ${hydg}_{d} (G)$ of vertices in G of hydro-degree ${deg}_{hyd} (v) = d$ .
${dcp}_{i} (G)$ , $i = 14 + m$ , $m \in [2, 3]$ : the number ${bd}_{m}^{int} (G)$ of interior-edges with bond multiplicity m in G, i.e., ${bd}_{m}^{int} (G) ≜ | \{e \in E^{int} ∣ β (e) = m\} |$ .
${dcp}_{i} (G)$ , $i = 17 + {[a]}^{int}$ , $a \in Λ^{int} (D_{π})$ : the frequency ${na}_{a}^{int} (G)$ of chemical element $a$ in the set of interior-vertices in G.
${dcp}_{i} (G)$ , $i = 17 + | Λ^{int} (D_{π}) | + {[a]}^{ex}$ , $a \in Λ^{ex} (D_{π})$ : the frequency ${na}_{a}^{ex} (G)$ of chemical element $a$ in the set of exterior-vertices in G.
${dcp}_{i} (G)$ , $i = 17 + | Λ^{int} (D_{π}) | + | Λ^{ex} (D_{π}) | + [γ]$ , $γ \in Γ^{int} (D_{π})$ : the frequency ${ec}_{γ} (G)$ of edge-configuration $γ$ in the set of interior-edges $e \in E^{int}$ in G.
${dcp}_{i} (G)$ , $i = 17 + | Λ^{int} (D_{π}) | + | Λ^{ex} (D_{π}) | + | Γ^{int} (D_{π}) | + [ψ]$ , $ψ \in F (D_{π})$ : the frequency ${fc}_{ψ} (G)$ of fringe-configuration $ψ$ in the set of $ρ$ -fringe-trees in G.
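The index bookkeeping of the feature vector can be made explicit with a short sketch. It assumes that the descriptor blocks are laid out consecutively in the order listed, so that the fixed part occupies indices 1 to 17 as implied by $K = 17 + | Λ^{int} (D_{π}) | + | Λ^{ex} (D_{π}) | + | Γ^{int} (D_{π}) | + | F (D_{π}) |$ ; the block names and the function `descriptor_layout` are hypothetical labels introduced for this example.

```python
def descriptor_layout(n_int_elems, n_ex_elems, n_edge_confs, n_fringe_confs):
    """Return ({block name: range of 1-based indices}, K) for the feature vector."""
    blocks = [
        ("n", 1),            # number of vertices
        ("n_int", 1),        # number of interior-vertices
        ("avg_mass", 1),     # average of mass*
        ("dg", 4),           # degree d in [1, 4]
        ("dg_int", 4),       # interior-degree d in [1, 4]
        ("hydg", 4),         # hydro-degree d in [0, 3]
        ("bd_int", 2),       # interior bond-multiplicity m in [2, 3]
        ("na_int", n_int_elems),
        ("na_ex", n_ex_elems),
        ("ec", n_edge_confs),
        ("fc", n_fringe_confs),
    ]
    layout, start = {}, 1
    for name, size in blocks:
        layout[name] = range(start, start + size)
        start += size
    return layout, start - 1


# Hypothetical sizes: 3 interior and 3 exterior elements,
# 24 edge-configurations and 109 fringe-configurations.
layout, K = descriptor_layout(3, 3, 24, 109)
print(K)                       # 156
print(list(layout["bd_int"]))  # [16, 17]
print(layout["na_int"][0])     # 18
```

With the assumed element counts of 3 and 3 and the values 24 and 109 reported for $K_{OW}$ with $Λ$ = {C, O, N}, the sketch gives K = 156, which coincides with the first entry of the architecture (156,10,10,1) listed for that row.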

2.2. Specifying Target Chemical Graphs

Given a prediction function $η_{N}$ and a target value $y^{*} \in R$ , we call a chemical graph $G^{*}$ such that $η_{N} (x^{*}) = y^{*}$ for the feature vector $x^{*} = f (G^{*})$ a target chemical graph. This section presents a set of rules for specifying topological substructure of a target chemical graph in a flexible way in Stage 4.

We first describe how to reduce a chemical graph $G = (H, α, β)$ into an abstract form based on which our specification rules will be defined. To illustrate the reduction process, we use the chemical graph $G = (H, α, β)$ in Figure 2.

R1. Removal of all $ρ$ -fringe-trees: The interior $H^{int} = (V^{int} (H), E^{int} (H))$ of G is obtained by removing the non-root vertices of each $ρ$ -fringe-tree $T \in T^{ex} (G)$ . Figure 4 illustrates the interior $H^{int}$ of chemical graph G with $ρ = 2$ in Figure 2.
R2. Removal of some leaf paths: We call a $u, v$ -path Q in $H^{int}$ a leaf path if vertex v is a leaf-vertex of $H^{int}$ and the degree of each internal vertex of Q in $H^{int}$ is 2, where we regard Q as rooted at vertex u. A connected subgraph S of the interior $H^{int}$ of G is called a cyclical-base if S is obtained from $H^{int}$ by removing the vertices in $V (Q_{u}) \setminus \{u\}, u \in X$ for a subset X of interior-vertices and a set $\{Q_{u} ∣ u \in X\}$ of leaf $u, v$ -paths $Q_{u}$ such that no two paths $Q_{u}$ and $Q_{u^{'}}$ share a vertex. Figure 5a illustrates a cyclical-base $S = H^{int} - ⋃_{u \in X} (V (Q_{u}) \setminus \{u\})$ of the interior $H^{int}$ for a set $\{Q_{u_{5}} = (u_{5}, u_{24}), Q_{u_{18}} = (u_{18}, u_{25}, u_{26}, u_{27}), Q_{u_{22}} = (u_{22}, u_{28})\}$ of leaf paths in Figure 4.
R3. Contraction of some pure paths: A path in S is called pure if each internal vertex of the path is of degree 2. Choose a set $P$ of several pure paths in S so that no two paths share vertices except for their end-vertices. A graph $S^{'}$ is called a contraction of a graph S (with respect to $P$ ) if $S^{'}$ is obtained from S by replacing each pure $u, v$ -path with a single edge $a = u v$ , where $S^{'}$ may contain multiple edges between the same pair of adjacent vertices. Figure 5b illustrates a contraction $S^{'}$ obtained from the chemical graph S by contracting each $u, v$ -path $P_{a} \in P$ into a new edge $a = u v$ , where $a_{1} = u_{1} u_{2}, a_{2} = u_{1} u_{3}, a_{3} = u_{4} u_{7}, a_{4} = u_{10} u_{11}$ , and $a_{5} = u_{11} u_{12}$ , and $P = \{P_{a_{1}} = (u_{1}, u_{13}, u_{2}), P_{a_{2}} = (u_{1}, u_{14}, u_{3}), P_{a_{3}} = (u_{4}, u_{15}, u_{16}, u_{7}), P_{a_{4}} = (u_{10}, u_{17}, u_{18}, u_{19}, u_{11}), P_{a_{5}} = (u_{11}, u_{20}, u_{21}, u_{22}, u_{12})\}$ is the set of pure paths in Figure 5a.
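The contraction step R3 can be sketched as an operation on a multiset of edges. The code below is a minimal illustration under stated assumptions and is not the authors' implementation: edges are undirected, a `Counter` records edge multiplicities so that parallel edges in $S^{'}$ are preserved, and the function name `contract_pure_paths` and the three-vertex toy cycle (which reuses the labels $u_{1}, u_{13}, u_{2}$ but is not the graph of Figure 5) are made up for this example.

```python
from collections import Counter


def contract_pure_paths(edges, paths):
    """Replace each pure u,v-path by a single edge uv; return a sorted edge list."""
    bag = Counter(frozenset(e) for e in edges)  # multiset of undirected edges
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    for path in paths:
        for w in path[1:-1]:  # a pure path has only degree-2 internal vertices
            if degree[w] != 2:
                raise ValueError(f"internal vertex {w} does not have degree 2")
        for a, b in zip(path, path[1:]):
            key = frozenset((a, b))
            if bag[key] == 0:
                raise ValueError(f"edge {a}-{b} is not in the graph")
            bag[key] -= 1
        bag[frozenset((path[0], path[-1]))] += 1  # the new edge a = uv
    return sorted(tuple(sorted(k)) for k, c in bag.items() for _ in range(c))


# Toy cycle u1 - u13 - u2 - u1; contracting (u1, u13, u2) yields a double edge.
S_edges = [("u1", "u13"), ("u13", "u2"), ("u1", "u2")]
print(contract_pure_paths(S_edges, [("u1", "u13", "u2")]))
# [('u1', 'u2'), ('u1', 'u2')]
```

The toy example shows why $S^{'}$ may contain multiple edges: the contracted path and the existing edge both join $u_{1}$ and $u_{2}$.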

Figure 4. The interior $H^{int}$ of chemical graph G with $ρ = 2$ in Figure 2.

Figure 5. (a) A cyclical-base $S = H^{int} - ⋃_{u \in \{u_{5}, u_{18}, u_{22}\}} (V (Q_{u}) \setminus \{u\})$ of the interior $H^{int}$ in Figure 4; (b) A contraction $S^{'}$ of S for a pure path set $P = \{P_{a_{1}}, P_{a_{2}}, \dots, P_{a_{5}}\}$ in (a), where a new edge obtained by contracting a pure path is depicted with a thick line.

We will define a set of rules so that a chemical graph can be obtained from a graph (called a seed graph in the next section) by applying processes R3 to R1 in a reverse way. We specify topological substructures of a target chemical graph with a tuple $(G_{C}, σ_{int}, σ_{ce})$ called a target specification defined under the set of the following rules.

Seed Graphs

A seed graph $G_{C} = (V_{C}, E_{C})$ is defined to be a graph (possibly with multiple edges) such that the edge set $E_{C}$ consists of four sets $E_{(\geq 2)}$ , $E_{(\geq 1)}$ , $E_{(0 / 1)}$ , and $E_{(= 1)}$ , where each of them can be empty. A seed graph plays the role of the most abstract form $S^{'}$ in R3. Figure 3a illustrates an example of a seed graph, where $V_{C} = \{u_{1}, u_{2}, \dots, u_{12}\}$ , $E_{(\geq 2)} = \{a_{1}, a_{2}, \dots, a_{5}\}$ , $E_{(\geq 1)} = \{a_{6}\}$ , $E_{(0 / 1)} = \{a_{7}\}$ , and $E_{(= 1)} = \{a_{8}, a_{9}, \dots, a_{17}\}$ .

A subdivision S of $G_{C}$ is a graph constructed from a seed graph $G_{C}$ according to the following rules:

- Each edge $e = u v \in E_{(\geq 2)}$ is replaced with a $u, v$ -path $P_{e}$ of length at least 2;
- Each edge $e = u v \in E_{(\geq 1)}$ is replaced with a $u, v$ -path $P_{e}$ of length at least 1 (equivalently, e is directly used or replaced with a $u, v$ -path $P_{e}$ of length at least 2);
- Each edge $e \in E_{(0 / 1)}$ is either used or discarded; and
- Each edge $e \in E_{(= 1)}$ is always used directly.

We allow a possible elimination of edges in $E_{(0 / 1)}$ as an optional rule in constructing a target chemical graph from a seed graph, even though such an operation has not been included in the process R3. A subdivision S plays a role of a cyclical-base in R2. A target chemical graph $G = (H, α, β)$ will contain S as a subgraph of the interior $H^{int}$ of G.
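The four subdivision rules amount to a length interval per edge class, which the following sketch makes explicit. The string labels for the edge classes, the convention that a discarded edge has length 0, and the function names are assumptions introduced for this illustration; the path lengths in the example are those of $P_{a_{1}}, \dots, P_{a_{5}}$ given in the Introduction.

```python
# Allowed length interval (lower, upper) of the path replacing an edge;
# None means "no upper bound". Length 0 encodes a discarded edge.
RULES = {
    ">=2": (2, None),
    ">=1": (1, None),
    "0/1": (0, 1),
    "=1": (1, 1),
}


def is_valid_length(edge_class, length):
    """Check one edge of the seed graph against its subdivision rule."""
    lower, upper = RULES[edge_class]
    return length >= lower and (upper is None or length <= upper)


def is_valid_subdivision(edge_classes, lengths):
    """edge_classes: edge -> class label; lengths: edge -> |E(P_e)|."""
    return all(is_valid_length(edge_classes[e], lengths[e]) for e in edge_classes)


# Edges a1..a5 belong to E_(>=2); their paths have 3, 3, 4, 5 and 5 vertices.
classes = {f"a{i}": ">=2" for i in range(1, 6)}
lengths = {"a1": 2, "a2": 2, "a3": 3, "a4": 4, "a5": 4}
print(is_valid_subdivision(classes, lengths))  # True
print(is_valid_length(">=2", 1))               # False
```

Encoding the rules as intervals also makes it easy to tighten them later with the bounds $ℓ_{LB} (e)$ and $ℓ_{UB} (e)$ of an interior-specification.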

Interior-Specification

A graph $H^{*}$ that serves as the interior $H^{int}$ of a target chemical graph G will be constructed as follows. First, construct a subdivision S of a seed graph $G_{C}$ by replacing each edge $e = u u^{'} \in E_{(\geq 2)} \cup E_{(\geq 1)}$ with a pure $u, u^{'}$ -path $P_{e}$ . Next, construct a supergraph $H^{*}$ of S by attaching a leaf path $Q_{v}$ at each vertex $v \in V_{C}$ or at an internal vertex $v \in V (P_{e}) \setminus \{u, u^{'}\}$ of each pure $u, u^{'}$ -path $P_{e}$ for some edge $e = u u^{'} \in E_{(\geq 2)} \cup E_{(\geq 1)}$ , where possibly $Q_{v} = v, E (Q_{v}) = \emptyset$ (i.e., we do not attach any new edges to v). We introduce the following rules for specifying the size of $H^{*}$ , the length $| E (P_{e}) |$ of a pure path $P_{e}$ , the length $| E (Q_{v}) |$ of a leaf path $Q_{v}$ , the number of leaf paths $Q_{v}$ , and a bond-multiplicity of each interior-edge, where we call the set of prescribed constants an interior-specification $σ_{int}$ :

- Lower and upper bounds $n_{LB}^{int}, n_{UB}^{int} \in Z_{+}$ on the number of interior-vertices of a target chemical graph G.
- For each edge $e = u u^{'} \in E_{(\geq 2)} \cup E_{(\geq 1)}$ ,
  - a lower bound $ℓ_{LB} (e)$ and an upper bound $ℓ_{UB} (e)$ on the length $| E (P_{e}) |$ of a pure $u, u^{'}$ -path $P_{e}$ . (For notational convenience, set $ℓ_{LB} (e) := 0$ , $ℓ_{UB} (e) := 1$ , $e \in E_{(0 / 1)}$ and $ℓ_{LB} (e) := 1$ , $ℓ_{UB} (e) := 1$ , $e \in E_{(= 1)}$ . )
  - a lower bound ${bl}_{LB} (e)$ and an upper bound ${bl}_{UB} (e)$ on the number of leaf paths $Q_{v}$ attached at internal vertices v of a pure $u, u^{'}$ -path $P_{e}$ .
  - a lower bound ${ch}_{LB} (e)$ and an upper bound ${ch}_{UB} (e)$ on the maximum length $| E (Q_{v}) |$ of a leaf path $Q_{v}$ attached at an internal vertex $v \in V (P_{e}) \setminus \{u, u^{'}\}$ of a pure $u, u^{'}$ -path $P_{e}$ .
- For each vertex $v \in V_{C}$ ,
  - a lower bound ${bl}_{LB} (v)$ and an upper bound ${bl}_{UB} (v)$ on the number of leaf paths $Q_{v}$ attached to v, where $0 \leq {bl}_{LB} (v) \leq {bl}_{UB} (v) \leq 1$ .
  - a lower bound ${ch}_{LB} (v)$ and an upper bound ${ch}_{UB} (v)$ on the length $| E (Q_{v}) |$ of a leaf path $Q_{v}$ attached to v.
- For each edge $e = u u^{'} \in E_{C}$ , a lower bound ${bd}_{m, LB} (e)$ and an upper bound ${bd}_{m, UB} (e)$ on the number of edges with bond-multiplicity $m \in [2, 3]$ in the $u, u^{'}$ -path $P_{e}$ , where we regard $P_{e}$ , $e \in E_{(0 / 1)} \cup E_{(= 1)}$ , as the single edge e.

We call a graph $H^{*}$ that satisfies an interior-specification $σ_{int}$ a $σ_{int}$ -extension of $G_{C}$ , where the bond-multiplicity of each edge has been determined.
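The construction and the bound checks above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation (which uses an MILP formulation): the names `Bounds`, `build_extension`, and `satisfies` are hypothetical, the bond-multiplicity bounds ${bd}_{m, LB}$, ${bd}_{m, UB}$ are omitted, every path length is assumed to be at least 1, and leaf-path positions are assumed to index internal vertices of $P_{e}$.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Hypothetical container for the bounds of one seed vertex or seed edge."""
    bl: tuple            # (bl_LB, bl_UB): number of leaf paths
    ch: tuple            # (ch_LB, ch_UB): (maximum) leaf-path length
    ell: tuple = (1, 1)  # (l_LB, l_UB): pure-path length, used for edges only


def within(value, bounds):
    return bounds[0] <= value <= bounds[1]


def build_extension(seed_edges, path_len, edge_leaves, vertex_leaf):
    """Build H*: subdivide each seed edge, then attach leaf paths.

    path_len[e]      -- |E(P_e)|, assumed >= 1
    edge_leaves[e]   -- {i: length}, a leaf path at the i-th internal vertex
    vertex_leaf[v]   -- length of the leaf path at seed vertex v (0 = none)
    """
    edges = []

    def attach_leaf_path(root, length):
        prev = root
        for k in range(1, length + 1):
            node = ("leaf", root, k)
            edges.append((prev, node))
            prev = node

    for e in seed_edges:
        u, v = e
        # Pure u,v-path with path_len[e] edges and path_len[e] - 1 new vertices.
        chain = [u] + [("path", e, i) for i in range(1, path_len[e])] + [v]
        edges.extend(zip(chain, chain[1:]))
        for i, length in edge_leaves.get(e, {}).items():
            attach_leaf_path(chain[i], length)
    for v, length in vertex_leaf.items():
        attach_leaf_path(v, length)
    vertices = {x for edge in edges for x in edge}
    return vertices, edges


def satisfies(n_int, edge_spec, vertex_spec,
              seed_edges, path_len, edge_leaves, vertex_leaf):
    """Check the size, path-length, and leaf-path bounds (no bond bounds)."""
    vertices, _ = build_extension(seed_edges, path_len, edge_leaves, vertex_leaf)
    if not within(len(vertices), n_int):
        return False
    for e in seed_edges:
        s = edge_spec[e]
        leaves = [l for l in edge_leaves.get(e, {}).values() if l > 0]
        if not (within(path_len[e], s.ell)
                and within(len(leaves), s.bl)
                and within(max(leaves, default=0), s.ch)):
            return False
    for v, s in vertex_spec.items():
        length = vertex_leaf.get(v, 0)
        if not (within(1 if length > 0 else 0, s.bl) and within(length, s.ch)):
            return False
    return True


# Toy seed graph a - b - c (made up for illustration, not Figure 3).
seed_edges = [("a", "b"), ("b", "c")]
path_len = {("a", "b"): 3, ("b", "c"): 1}
edge_leaves = {("a", "b"): {1: 2}}   # leaf path of length 2 on P_ab
vertex_leaf = {"c": 1}               # leaf path of length 1 at c
edge_spec = {
    ("a", "b"): Bounds(bl=(0, 1), ch=(0, 2), ell=(2, 4)),
    ("b", "c"): Bounds(bl=(0, 0), ch=(0, 0), ell=(1, 1)),
}
vertex_spec = {
    "a": Bounds(bl=(0, 0), ch=(0, 0)),
    "b": Bounds(bl=(0, 0), ch=(0, 0)),
    "c": Bounds(bl=(0, 1), ch=(0, 3)),
}
ok = satisfies((6, 10), edge_spec, vertex_spec,
               seed_edges, path_len, edge_leaves, vertex_leaf)
```

In this toy instance the resulting $H^{*}$ is a tree with 8 vertices and 7 edges, and `ok` is `True`; tightening the path-length bounds of the edge `("a", "b")` to `(1, 2)` makes the check fail.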

Table 1 shows an example of an interior-specification $σ_{int}$ for the seed graph $G_{C}$ in Figure 3.

Table 1.

Example 1 of an interior-specification $σ_{int}$ .

$π$	$Λ$	$\| D_{π} \|$	$\| Γ^{int} (D_{π}) \|$	$\| F (D_{π}) \|$	$[\underline{n}, \bar{n}]$	$[\underline{a}, \bar{a}]$
K_OW	C,O,N	644	24	109	[4, 58]	[−7.53, 13.45]
K_OW	C,O,N,S,Cl	837	31	142	[4, 73]	[−7.53, 13.45]
B_P	C,O,N	358	21	91	[4, 30]	[−11.70, 470.0]
B_P	C,O,N,S,Cl	425	23	114	[4, 30]	[−11.70, 470.0]
M_P	C,O,N	448	22	94	[4, 122]	[−185.3, 300.0]
M_P	C,O,N,S,Cl	548	26	118	[4, 122]	[−185.3, 300.0]
F_P	C,O,N	348	20	85	[4, 66]	[−82.99, 300.0]
F_P	C,O,N,S,Cl	399	24	107	[4, 66]	[−82.99, 300.0]
L_P	C,O,N	592	27	71	[6, 60]	[−3.62, 6.84]
L_P	C,O,N,S,Cl	779	32	78	[6, 74]	[−3.62, 6.84]
S_L	C,O,N	640	25	111	[4, 55]	[−9.33, 1.11]
S_L	C,O,N,S,Cl	847	31	144	[4, 55]	[−11.60, 1.11]

$π$	$Λ$	L-Time	t- $R_{cv}^{2}$ (Best)	t- $R_{\max}^{2}$	Arch.
K_OW	C,O,N	0.7	0.959	0.983	(156,10,10,1)
K_OW	C,O,N,S,Cl	0.7	0.947	0.968	(199,20,10,1)
B_P	C,O,N	3.5	0.858	0.923	(135,30,20,1)
B_P	C,O,N,S,Cl	3.3	0.821	0.899	(163,10,1)
M_P	C,O,N	3.8	0.784	0.893	(139,40,1)
M_P	C,O,N,S,Cl	4.1	0.796	0.880	(170,10,10,1)
F_P	C,O,N	1.1	0.750	0.874	(128,40,1)
F_P	C,O,N,S,Cl	1.8	0.707	0.853	(157,10,10,1)
L_P	C,O,N	0.5	0.868	0.908	(121,30,1)
L_P	C,O,N,S,Cl	0.7	0.861	0.892	(137,20,10,1)
S_L	C,O,N	0.7	0.870	0.913	(159,30,1)
S_L	C,O,N,S,Cl	0.9	0.870	0.903	(201,30,20,1)

	$f = f_{ec}$ , $D = {\tilde{D}}_{π}$			$f = f_{fc}$ , $D = {\tilde{D}}_{π}$		$f = f_{fc}$ , $D = D_{π}$
$π$	$\| {\tilde{D}}_{π} \|$	t- $R_{cv}^{2}$ (ave.)	t- $R_{cv}^{2}$ (Best)	t- $R_{cv}^{2}$ (ave.)	t- $R_{cv}^{2}$ (Best)	$\| D_{π} \|$	t- $R_{cv}^{2}$ (ave.)	t- $R_{cv}^{2}$ (Best)
K_OW	580	0.952	0.959	0.950	0.954	837	0.944	0.947
B_P	224	0.688	0.718	0.680	0.693	425	0.809	0.821
M_P	348	0.668	0.694	0.712	0.736	548	0.776	0.796
F_P	218	0.435	0.476	0.574	0.623	399	0.688	0.707
L_P	776	0.832	0.842	0.853	0.861	779	0.854	0.861
S_L	638	0.851	0.863	0.853	0.861	847	0.860	0.870

Instance	$Λ$	$\| Γ^{int} \|$	$\| F^{*} \|$	$[n_{LB}^{int}, n_{UB}^{int}]$	$[n_{LB}, n^{*}]$
$I_{a}$	C,O,N	10	11	[30,50]	[20,28]
$I_{b, 1}$	C,O,N	28	40	[38,38]	[6,6]
$I_{b, 2}$	C,O,N	28	35	[50,50]	[30,30]
$I_{b, 3}$	C,O,N	28	30	[50,50]	[30,30]
$I_{b, 4}$	C,O,N	28	25	[50,50]	[30,30]
$I_{c}$	C,O,N	8	12	[46,46]	[24,24]
$I_{d}$	C,O,N	7	8	[40,45]	[18,18]

Instance	$[\underline{a}, \bar{a}]$	$[\underline{y}, \bar{y}]$	$y^{*}$	#v	#c	IP-Time	n	$n^{int}$
$I_{a}$	[−7.53, 13.45]	[−7.0, 13.4]	3.2	7663	9162	3.9	35	24
$I_{b, 1}$	[−7.53, 13.45]	[−7.5, 13.4]	3.0	9894	6626	17.5	38	7
$I_{b, 2}$	[−7.53, 13.45]	[−7.5, 13.4]	3.0	11,514	8934	14.0	50	30
$I_{b, 3}$	[−7.53, 13.45]	[−7.5, 13.4]	3.0	11,318	8926	24.6	50	30
$I_{b, 4}$	[−7.53, 13.45]	[−7.5, 13.4]	3.0	11,122	8918	22.0	50	30
$I_{c}$	[−7.53, 13.45]	[−7.5, 13.4]	3.0	7867	8630	2.1	49	32
$I_{d}$	[−7.53, 13.45]	[−7.5, 13.4]	3.0	5395	6899	5.2	45	23

Instance	$[\underline{a}, \bar{a}]$	$[\underline{y}, \bar{y}]$	$y^{*}$	#v	#c	IP-Time	n	$n^{int}$
$I_{a}$	[−11.70, 470.0]	[352, 470]	411	7583	8982	2.7	42	25
$I_{b, 1}$	[−11.70, 470.0]	[−11, 470]	229	9816	6449	2.7	38	7
$I_{b, 2}$	[−11.70, 470.0]	[−11, 470]	229	11,436	8757	9.1	50	30
$I_{b, 3}$	[−11.70, 470.0]	[−11, 470]	229	11,240	8749	11.0	50	30
$I_{b, 4}$	[−11.70, 470.0]	[−11, 470]	229	11,044	8741	24.0	50	30
$I_{c}$	[−11.70, 470.0]	[170, 470]	320	7575	8450	25.9	49	33
$I_{d}$	[−11.70, 470.0]	[151, 470]	310	5315	6719	4.4	43	23

Instance	$[\underline{a}, \bar{a}]$	$[\underline{y}, \bar{y}]$	$y^{*}$	#v	#c	IP-Time	n	$n^{int}$
$I_{a}$	[−185.3, 300.0]	[55, 300]	177.5	7602	9023	16.1	41	24
$I_{b, 1}$	[−185.3, 300.0]	[−180, 300]	60	9833	6487	2.3	38	9
$I_{b, 2}$	[−185.3, 300.0]	[−185, 300]	57.4	11,453	8795	44.7	50	30
$I_{b, 3}$	[−185.3, 300.0]	[−185, 300]	57.4	11,257	8787	10.5	50	30
$I_{b, 4}$	[−185.3, 300.0]	[−185, 300]	57.4	11,061	8779	93.9	50	30
$I_{c}$	[−185.3, 300.0]	[253, 300]	260.0	7580	6172	24.0	41	33
$I_{d}$	[−185.3, 300.0]	[−75, 299]	58	5110	4050	104.6	45	23

Instance	$[\underline{a}, \bar{a}]$	$[\underline{y}, \bar{y}]$	$y^{*}$	#v	#c	IP-Time	n	$n^{int}$
$I_{a}$	[−82.99, 300.0]	[98, 300]	199	7459	8696	1.6	35	22
$I_{b, 1}$	[−82.99, 300.0]	[−82, 300]	109	9694	6166	1.4	38	8
$I_{b, 2}$	[−82.99, 300.0]	[−82, 300]	109	11,314	8474	8.7	50	30
$I_{b, 3}$	[−82.99, 300.0]	[−82, 300]	109	11,118	8466	25.8	50	30
$I_{b, 4}$	[−82.99, 300.0]	[−82, 300]	109	10,922	8458	8.5	50	30
$I_{c}$	[−82.99, 300.0]	[250, 300]	275	7667	8170	60.9	47	34
$I_{d}$	[−82.99, 300.0]	[54, 300]	177	5193	6436	2.0	45	23

Instance	$[\underline{a}, \bar{a}]$	$[\underline{y}, \bar{y}]$	$y^{*}$	#v	#c	IP-Time	n	$n^{int}$
$I_{a}$	[−3.6, 6.84]	[−3.6, 6.8]	1.6	7597	9008	1.9	39	23
$I_{b, 1}$	[−3.6, 6.84]	[−3.6, 6.8]	1.6	9836	6481	2.9	38	8
$I_{b, 2}$	[−3.6, 6.84]	[−3.6, 6.8]	1.6	11,456	8789	21.1	50	30
$I_{b, 3}$	[−3.6, 6.84]	[−3.6, 6.8]	1.6	11,260	8781	20.4	50	30
$I_{b, 4}$	[−3.6, 6.84]	[−3.6, 6.8]	1.6	11,064	8773	24.2	50	30
$I_{c}$	[−3.6, 6.84]	[−3.6, 6.8]	1.6	7801	8476	1.1	47	32
$I_{d}$	[−3.6, 6.84]	[−3.6, 6.8]	1.6	5335	6754	4.3	45	23

Instance	$[\underline{a}, \bar{a}]$	$[\underline{y}, \bar{y}]$	$y^{*}$	#v	#c	IP-Time	n	$n^{int}$
$I_{a}$	[−9.33, 1.11]	[−9.3, −2.0]	−5.6	7674	9186	2.4	41	23
$I_{b, 1}$	[−9.33, 1.11]	[−9.3, −2.0]	−5.6	9906	6650	22.3	38	12
$I_{b, 2}$	[−9.33, 1.11]	[−9.3, −2.0]	−5.6	11,526	8958	15.2	50	30
$I_{b, 3}$	[−9.33, 1.11]	[−9.3, −2.0]	−5.6	11,330	8950	16.2	50	30
$I_{b, 4}$	[−9.33, 1.11]	[−9.3, −2.0]	−5.6	11,134	8942	122.7	50	30
$I_{c}$	[−9.33, 1.11]	[−9.3, −2.0]	−5.6	7874	8648	1.2	54	33
$I_{d}$	[−9.33, 1.11]	[−9.3, −3.0]	−6.1	5402	6917	8.1	43	23

		K_OW			L_P			B_P
Instance	DP-Time	G-LB	#G	DP-Time	G-LB	#G	DP-Time	G-LB	#G
$I_{a}$	0.031	16	16	0.164	128	100	0.164	$1.4 \times 10^{5}$	100
$I_{b}^{1}$	0.149	$2.8 \times 10^{5}$	100	0.148	$2.0 \times 10^{10}$	100	0.162	$4.4 \times 10^{5}$	100
$I_{b}^{2}$	44.1	$3.9 \times 10^{10}$	100	118	900	100	171	6	6
$I_{b}^{3}$	27.2	20	20	80.2	6	6	28.6	7	7
$I_{b}^{4}$	0.166	6000	100	73	12	12	142	5	5
$I_{c}$	0.166	6000	100	0.168	288	100	0.168	$4.0 \times 10^{5}$	100
$I_{d}$	22.3	$8.3 \times 10^{10}$	100	1.44	$3.2 \times 10^{8}$	100	1.7	$9.7 \times 10^{9}$	100

		F_P			M_P			S_L
Instance	DP-Time	G-LB	#G	DP-Time	G-LB	#G	DP-Time	G-LB	#G
$I_{a}$	0.057	32	32	0.165	256	100	0.165	1024	100
$I_{b}^{1}$	0.164	$3.1 \times 10^{6}$	100	0.166	$1.4 \times 10^{6}$	100	0.163	$4.5 \times 10^{5}$	100
$I_{b}^{2}$	28.8	720	100	8.26	$2.4 \times 10^{10}$	100	1.07	$5.6 \times 10^{9}$	100
$I_{b}^{3}$	72.2	27	27	51.9	1	1	46.5	1680	100
$I_{b}^{4}$	40.3	20	20	125	$6.1 \times 10^{7}$	100	7.01	$1.1 \times 10^{8}$	100
$I_{c}$	0.169	$1.1 \times 10^{5}$	100	0.173	6048	100	0.168	120	100
$I_{d}$	0.057	32	32	0.17	$4.2 \times 10^{8}$	100	0.165	1024	100

ANN	artificial neural network
MILP	mixed integer linear programming
