Skip to main content
Entropy logoLink to Entropy
. 2020 Aug 22;22(9):923. doi: 10.3390/e22090923

An Efficient Algorithm to Count Tree-Like Graphs with a Given Number of Vertices and Self-Loops

Naveed Ahmed Azam 1,*, Aleksandar Shurbevski 1, Hiroshi Nagamochi 1
PMCID: PMC7597174  PMID: 33286692

Abstract

Graph enumeration with given constraints is an interesting problem considered to be one of the fundamental problems in graph theory, with many applications in natural sciences and engineering such as bio-informatics and computational chemistry. For any two integers n1 and Δ0, we propose a method to count all non-isomorphic trees with n vertices, Δ self-loops, and no multi-edges based on dynamic programming. To achieve this goal, we count the number of non-isomorphic rooted trees with n vertices, Δ self-loops and no multi-edges, in O(n2(n+Δ(n+Δ·min{n,Δ}))) time and O(n2(Δ2+1)) space, since every tree can be uniquely viewed as a rooted tree by either regarding its unicentroid as the root, or in the case of bicentroid, by introducing a virtual vertex on the bicentroid and assuming the virtual vertex to be the root. By this result, we get a lower bound and an upper bound on the number of tree-like polymer topologies of chemical compounds with any “cycle rank”.

Keywords: trees, chemical graphs, enumeration, dynamic programming, polymer topology

1. Introduction

Counting and generation of discrete objects are two fundamental problems in combinatorial mathematics and have many applications in the fields of natural science and engineering, such as computational chemistry and bioinformatics. The counting problem asks to count all possible objects under given constraints. On the other hand, the generation problem asks to list all possible objects under given constraints. One of the notable advantages of the counting problem is that we can know the size of the solution space before generating all solutions.

Different kinds of enumeration methods are used to solve counting and generation problems, where branching algorithms and Polya’s enumeration theorem are the two most commonly used methods for these problems. In branching algorithms, the computation is performed by following a computation tree, and the required solutions are attained at the leaves of the computation tree. It is important to mention that the branching algorithms can only count all solutions after generating each one of them, and therefore they are inefficient for the problem where we first want to know the size of the solution space before the generation of solutions.

The well-known Polya’s enumeration theorem [1,2] is used for counting all distinct objects. The idea of this method is to use the cyclic index of the group of symmetries of the underlying object to develop a generating function, which is then used to count all possible objects. Note that finding the group of symmetries and its cyclic index is a challenging task, which may make the use of Polya’s theorem harder for some problems.

The drawback of branching algorithms discussed above and the difficulty of using Polya’s theorem necessitate the exploration of new enumeration methods to solve counting problems efficiently. For an enumeration method, it is necessary to satisfy the following three conditions:

  • (i)

    Consider all solutions: The method does not miss any of the required objects;

  • (ii)

    Avoid duplication: The method does not count and generate isomorphic objects; and

  • (iii)

    Low computational complexity: The method can count and generate all solutions in low time and space complexity.

Designing such a method is not an easy task, because of the underlying symmetries and the computation difficulty for their detection.

Counting and generation of chemical compounds have a long history and numerous applications in designing novel drugs [3,4,5,6,7,8] and structure elucidation [9]. The problem of counting and generation of chemical compounds can be viewed as the problem of enumerating graphs with given constraints. There are several available chemical compound enumeration tools [10,11,12]. We can divide these tools into two classes. One class of enumeration tools treats general graph structures [10,12]. In the other class, the tools are focused on enumerating some restricted chemical compounds. One such tool is Enumol2 [11]. Enumeration of restricted chemical compounds with specialized tools is more efficient than with the tools which use general graph structures. This led to a new trend of developing efficient enumeration of restricted chemical compounds in the field of chemoinformatics [13].

A polymer is a large molecule with interesting chemical properties consisting of many sub-molecules. From a graph-theoretic perspective, we represent the structure of a polymer with a graph G called polymer topology, possibly with self-loops and multi-edges, such that G is connected and the degree of each vertex in G is at least three [14]. For a chemical graph, we get its polymer topology by repeatedly removing the vertices of degree one and two. For example, the polymer topology of Remdesivir C27H35N6O8P Figure 1a, a potential candidate of treatment for COVID-19, is illustrated in Figure 1b.

Figure 1.

Figure 1

The chemical compound Remdesivir C27H35N6O8P and its polymer topology: (a) chemical structure of Remdesivir C27H35N6O8P obtained from the PubChem database; (b) the polymer topology of Remdesivir with six vertices, two multi-edges of multiplicity 2, one self-loop and cycle rank 4.

Tezuka and Oike [15] pointed out that a classification of polymer topologies will lay a foundation for the elucidation of structural relationships between different macro-chemical molecules and their synthetic pathways. Different kinds of graph-theoretic approaches have been applied to classify and enumerate polymer topologies [16,17]. For a connected graph G, possibly with self-loops and multi-edges, the cycle rank is defined to be the number of edges that must be removed to get a simple spanning tree of G. Recently, Haruna et al. [14] proposed a method to enumerate all polymer topologies with cycle rank up to five.

Notice that trees with no multi-edges but with Δ0 self-loops have cycle rank Δ and include all polymer topologies with the said structure. Therefore, it is of interest to count and generate all trees with no multi-edges and a given number of vertices and self-loops.

We use dynamic programming (DP) to count all mutually non-isomorphic trees with n vertices, Δ self-loops and no multi-edges. The basic idea of DP is to partition the original problem into subproblems that satisfy some recursive relations, and the union of their solution sets is equal to the solution set of the original problem. Unlike branching algorithms and Polya’s theorem, the main advantage of using the DP is that we can count all non-isomorphic structures without their generation and calculation of their group of symmetries. As an application of our results, we get lower and upper bounds on the number of tree-like polymer topologies with self-loops of a given cycle rank.

The rest of the paper is organized as follows: Section 2 reviews some notions and results related to graph theory. Section 3 explains our tree counting method. Section 4 makes some concluding remarks.

2. Preliminaries

Throughout this draft, the term graph stands for an undirected graph with no multi-edges and possibly with self-loops unless stated otherwise. Let G be a graph. We denote an edge between two vertices u and v in G by uv(=vu). Let V(G) and E(G) denote the vertex set and edge set of G, respectively. Let s(G) denote the number of self-loops in G. For a vertex vV(G), we denote by s(v) the number of self-loops on the vertex v. For a vertex v in G, let NG(v) denote the set of vertices incident to v except v itself and the degree degG(v) of v in G is defined to be |NG(v)|. A graph H with the properties V(H)V(G) and E(G)E(G) is called a subgraph of G. A simple path between two distinct vertices u,vV(G) is defined to be a subgraph P of G with vertex set V(P)={u=w1,w2,,v=wk} and edge set E(P)={wiwi+11ik1}. A graph is called a connected graph if there is a path between any two distinct vertices in the graph. A connected component of a graph G is defined to be a maximal connected subgraph H of G, i.e., for any vertex vV(G)\V(H) it holds that every subgraph with the vertex set V(H){v} is disconnected.

By Jordan [18], any simple tree with n1 vertices has either a unique vertex or edge, the removal of which creates connected components with at most (n1)/2 or exactly n/2 vertices, respectively. Such a vertex is called the unicentroid, the edge is called the bicentroid, and collectively they are called the centroid of the tree. It is important to note that there exits a bicentroid only for trees with an even number of vertices. A tree with a fixed vertex r is called a rooted tree with root r. Note that any tree can be uniquely viewed as a rooted tree by either regarding its unicentroid as the root, or in the case of a bicentroid, by introducing a virtual vertex on the bicentroid and assuming the virtual vertex as the root.

Let H be a rooted tree. Let rH denote the root of H. For any two distinct vertices u,vV(H), let PH(u,v) denote the unique simple path between them in H. For a vertex vV(H)\{rH}, we define the ancestors of v to be the vertices on the path PH(v,rH) other than v. If u is an ancestor of v, then we call v a descendant of u. For a vertex vV(H)\{rH}, the parent p(v) of v is defined to be the ancestor u of v such that uNH(v). We call the vertex v a child of p(v). Two vertices with the same parent in H are called siblings. For a vertex vV(H), let Hv denote the subtree of H rooted at v induced by v and its descendants.

Two rooted trees T and H are called isomorphic if there exists a bijection σ:V(T)V(H) such that

  • (i)

    σ(rT)=rH;

  • (ii)

    for each vertex vV(T), it holds that s(v)=s(σ(v)); and

  • (iii)

    for any two vertices u,vV(T), it holds that uvE(T) if and only if σ(u)σ(v)E(H).

For any two integers n1 and Δ0, let H(n,Δ) denote a maximal set of mutually non-isomorphic rooted trees with n vertices and Δ self-loops, and we define h(n,Δ)H(n,Δ).

3. Counting Tree-Like Graphs with a Given Number of Vertices and Self-Loops

We develop a method to compute for any two integers n1 and Δ0, the size h(n,Δ) of a maximal set H(n,Δ) of mutually non-isomorphic rooted trees with n vertices and Δ self-loops; i.e., we are interested in the following problem:

Counting Problem

Input: Two integers n1 and Δ0.

Output: h(n,Δ).

We solve this problem by using dynamic programming based on the information of the number of vertices and self-loops in the subtrees rooted at the children of the root of each tree in H(n,Δ). We define the following notions.

Let n1 and Δ0 be any two integers. For each tree HH(n,Δ), we define

Maxv(H)max{{|V(Hv)|vNH(rH)}{0}},Maxs(H)max{{s(Hv)vNH(rH),|V(Hv)|=Maxv(H)}{0}}.

Note that for any tree HH(1,Δ), it holds that Maxv(H)=0 and Maxs(H)=0.

Let m,d0 be any two integers. We define

H(n,Δ,m,d){HH(n,Δ)Maxv(H)m,Maxs(H)d}.

Observe that by the definition of H(n,Δ,m,d) it holds that

  • (i)

    H(n,Δ,m,d)=H(n,Δ,n1,d) if mn;

  • (ii)

    H(n,Δ,m,d)=H(n,Δ,m,Δ) if dΔ+1; and

  • (iii)

    H(n,Δ)=H(n,Δ,n1,Δ).

Therefore, from now on, we assume that mn1 and dΔ. Further, by the definition of H(n,Δ,m,d) it holds that H(n,Δ,m,d) (resp., H(n,Δ,m,d)=) if “n=1” or “n1m1” (resp., otherwise (n2 and m=0)).

We define

H(n,Δ,m=,d){HH(n,Δ,m,d)Maxv(H)=m}.

It follows from the definition of H(n,Δ,m=,d) that H(n,Δ,m=,d) (resp., H(n,Δ,m=,d)=) if “n=1” or “n1m1” (resp., otherwise (n2 and m=0)). Further we have the following relation:

H(n,Δ,m,d)=H(n,Δ,0=,d)ifm=0, (1)
H(n,Δ,m,d)=H(n,Δ,m1,d)H(n,Δ,m=,d)ifm1, (2)

where H(n,Δ,m1,d)H(n,Δ,m=,d)= for m1.

Next we define

H(n,Δ,m=,d=){HH(n,Δ,m=,d)Maxs(H)=d}.

Note that if “n=1 and d=0” or “n1m1” (resp., otherwise (“n=1 and d1” or “n2 and m=0”)), then by the definition of H(n,Δ,m=,d=) it holds that H(n,Δ,m=,d=) (resp., H(n,Δ,m=,d=)=). Furthermore, we get the following relation for H(n,Δ,m=,d):

H(n,Δ,m=,d)=H(n,Δ,m=,0=)ifd=0, (3)
H(n,Δ,m=,d)=H(n,Δ,m=,d1)H(n,Δ,m=,d=)ifd1, (4)

where H(n,Δ,m=,d1)H(n,Δ,m=,d=)= for d1.

Let n1m0, and Δd0 be four integers. Let h(n,Δ,m,d), h(n,Δ,m=,d) and h(n,Δ,m=,d=) denote the number of elements in the families H(n,Δ,m,d), H(n,Δ,m=,d) and H(n,Δ,m=,d=), respectively. We discuss recursive relations for h(n,Δ,m,d) and h(n,Δ,m=,d) in Lemma 1.

Lemma 1.

For any four integers n1m0, and Δd0, it holds that

  • (i)

    h(n,Δ,m,d)=h(n,Δ,0=,d) if m=0;

  • (ii)

    h(n,Δ,m,d)=h(n,Δ,m1,d)+h(n,Δ,m=,d) if m1;

  • (iii)

    h(n,Δ,m=,d)=h(n,Δ,m=,0=) if d=0; and

  • (iv)

    h(n,Δ,m=,d)=h(n,Δ,m=,d1)+h(n,Δ,m=,d=) if d1.

Proof. 

The case (i) follows by Equation (1). The case (ii) follows by Equation (2) and the fact that for m1 it holds that H(n,Δ,m1,d)H(n,Δ,m=,d)=. By Equation (3) the case (iii) follows. The case (iv) follows by Equation (4) and the fact that for d1 it holds that H(n,Δ,m=,d1)H(n,Δ,m=,d=)=. □

Next we discuss some boundary conditions for our DP to compute h(n,Δ).

Lemma 2.

For any four integers n1m0, and Δd0, it holds that

  • (i)

    h(n,Δ,0=,d=)=1 (resp., h(n,Δ,0=,d=)=0) if n=1 and d=0 (resp., otherwise (“n=1 and d1” or “ n2”));

  • (ii)

    h(n,Δ,0=,d)=h(n,Δ,0,d)=1 (resp., h(n,Δ,0=,d)=h(n,Δ,0,d)=0) if n=1 (resp., otherwise (n2));

  • (iii)

    h(n,Δ,1=,d=)=1 if “ n=2” or “ n3 and d=0”; and

  • (iv)

    h(n,Δ,1=,d)=h(n,Δ,1,d)=d+1 if “ n=2” or “ n3 and d=0”.

Proof. 

  • (i)

    The result follows from the definition of H(n,Δ,0=,d=), since a tree H with maxv(H)=0 exists if and only if |V(H)|=1 and maxs(H)=0.

  • (ii)

    By Lemma 1(i), (ii) and (iv) it holds that h(n,Δ,0,d)=h(n,Δ,0=,d)=p=0dh(n,Δ,0=,p=). This and Lemma 2(i) imply the required result.

  • (iii)

    When n2, then for any tree HH(n,Δ,1=,d=) it holds that |NH(rH)|=n1. Thus for each vNH(rH) it holds that |V(Hv)|=1 and s(Hv)=d if “n=2” or “n3 and d=0”, i.e., HvH(1,d,0,d). But by Lemma 2(ii) it holds that h(1,d,0,d)=1. Hence we have the required result.

  • (iv)
    Let “n=2” or “n3 and d=0”. By Lemma 1(iii) and (iv) it holds that h(n,Δ,1=,d)=p=0dh(n,Δ,1=,p=). This and Lemma 2(iii) imply that
    h(n,Δ,1=,d)=d+1. (5)

    Furthermore, by Lemma 1(iii) it holds that h(n,Δ,1,d)=h(n,Δ,0=,d)+h(n,Δ,1=,d). By Lemma 2(ii), we have h(n,Δ,1,d)=h(n,Δ,1=,d). Hence the result follows by Equation (5).

By Lemma 2, we can get that h(1,Δ)=1 and h(2,Δ)=Δ+1. Furthermore, Lemma 1(i)–(iv) give recursive relations for h(n,Δ,m,d) and h(n,Δ,m=,d) which depend on h(n,Δ,m=,d=). Thus for n3, m1, and Δd0, our next goal is to develop a recursive relation for h(n,Δ,m=,d=). For any tree HH(n,Δ,m=,d=) and any vertex vNH(rH), the subtree Hv of H satisfies exactly one of the following three conditions:

  • (C-1)

    V(Hv)=m and s(Hv)=d.

  • (C-2)

    V(Hv)=m and 0s(Hv)<d.

  • (C-3)

    V(Hv)<m and 0s(Hv)Δ.

For any tree HH(n,Δ,m=,d=), we define the residual tree of H to be the subtree of H rooted at rH induced by the vertices V(H)\V(Hv)vNH(rH),HvH(m,d,m1,d). Note that the residual tree of a tree H has at least one vertex, i.e., the root of H. We give an illustration of a residual tree in Figure 2.

Figure 2.

Figure 2

An illustration of a residual tree, where HH(n,Δ,m=,d=) and the residual tree of H is shown by dashed lines.

Lemma 3.

For any four integers n3, m1, and Δd0, and a tree HH(n,Δ,m=,d=), let q=|{vNH(rH)HvH(m,d,m1,d)}|. Then it holds that

  • (i)

    1q(n1)/m with qΔ/d when d1.

  • (ii)

    The residual tree of H belongs to exactly one of the families H(nqm,Δdq,m=,min{Δdq,d1}) and H(nqm,Δdq,min{nqm1,m1},Δdq).

Proof. 

  • (i)

    Since HH(n,Δ,m=,d=), there exists at least one vertex vNH(rH) such that HvH(m,d,m1,d). This implies that q1. Also, it holds that n1mq and Δdq. This implies that q(n1)/m with qΔ/d when d1.

  • (ii)

    Let K denote the residual tree of H. By the definition of K it holds that KH(nmq,Δdq,nmq1,Δdq). Furthermore, for each vertex vNH(rH)V(K), the tree Hv satisfies exactly one of the conditions (C-2) and (C-3). Now, if there exists a vertex vNH(rH)V(K) such that Hv satisfies condition (C-2), then d10, and hence KH(nqm,Δdq,m=,min{Δdq,d1}). On the other hand, if condition (C-2) does not hold for any vNH(rH)V(K); i.e., either NH(rH)V(K)= or for each vNH(rH)V(K) it holds that |V(Hv)|min{nqm1,m1} and 0s(Hv)Δdq, then by the definition of K it holds that KH(nqm,Δdq,min{nqm1,m1},Δdq). This completes the proof.

For any five integers n3, m1, Δd0, and t0, let c(m,d;t)h(m,d,m1,d)+t1t denote the number of combinations with repetition of t trees from the family H(m,d,m1,d). In Lemma 4, we give a recursive relation for h(n,Δ,m=,d=).

Lemma 4.

For any five integers n3, m1, Δd0, and q, such that 1q(n1)/m with qΔ/d when d1, it holds that

  • (i)

    h(n,Δ,m=,d=)=qc(m,d;q)h(nqm,Δ,min{nqm1,m1},Δ) if d=0;

  • (ii)

    h(n,Δ,m=,d=)=qc(m,d;q)(h(nqm,Δdq,m=,min{Δdq,d1})+h(nqm,Δdq,min{nqm1,m1},Δdq)) if d1;

  • (iii)

    h(n,Δ,m=,d=)=qc(m,d;q1)((h(m,d,m1,d)+q1)/q)h(nqm,Δ,min{nqm1,m1},Δ) if d=0; and

  • (iv)

    h(n,Δ,m=,d=)=qc(m,d;q1)((h(m,d,m1,d)+q1)/q)(h(nqm,Δdq,m=,min{Δdq,d1})+h(nqm,Δdq,min{nqm1,m1},Δdq)) if d1.

Proof. 

Let H be a tree in the family H(n,Δ,m=,d=). By Lemma 3(i), there exists a unique integer q, 1q(n1)/m with qΔ/d when d1, such that there are exactly q subtrees Hv with vNH(rH) and HvH(m,d,m1,d). Further, by Lemma 3(ii) the residual tree of H belongs to the family H(nqm,Δ,min{nqm1,m1},Δ) (resp., H(nqm,Δdq,m=,min{Δdq,d1})H(nqm,Δdq,min{nqm1,m1},Δdq)) if d=0 (resp., otherwise). Note that H(nqm,Δdq,m=,min{Δdq,d1})H(nqm,Δdq,min{nqm1,m1},Δdq)=. This implies that for a fixed integer q in the range given in the lemma, the number of trees K in the family H(n,Δ,m=,d=) with exactly q subtrees KvH(m,d,m1,d), for vNK(rK), are

  • (a)

    c(m,d;q)h(nqm,Δ,min{nqm1,m1},Δ) if d=0; and

  • (b)

    c(m,d;q)(h(nqm,Δdq,m=,min{Δdq,d1})+h(nqm,Δdq,min{nqm1,m1},Δdq)) if d1.

Note that, for m=1 and d=0, we have 1qn1, and by Lemma 2(ii) it holds that h(nq,Δ,0,Δ)=0 (resp., h(nq,Δ,0,Δ)=1), if 1qn2 (resp., otherwise (if q=n1)). This implies that any tree HH(n,Δ,1=,0=) has exactly q=n1 subtrees HvH(1,0,0,0), for vNH(rH). However, observe that for each integer m2 or d1, and q satisfying the conditions given in the lemma, there exists at least one tree HH(n,Δ,m=,d=) such that H has exactly q subtrees HvH(m,d,m1,d), for vNH(rH). Hence, this and case (a) (resp., case (b)) imply Lemma 4(i) (resp., Lemma 4(ii)).

Furthermore, it holds that

c(m,d;q)=(h(m,d,m1,d)+q1)!(h(m,d,m1,d)1)!q!=(h(m,d,m1,d)+q2)!(h(m,d,m1,d)1)!(q1)!×(h(m,d,m1,d)+q1)q=c(m,d;q1)×(h(m,d,m1,d)+q1)q.

Hence, Lemma 4(iii) and (iv) follow from Lemma 4(i) and (ii), respectively. □

We design a DP algorithm to compute h(n,Δ) based on the recursive structures of h(n,Δ,m,d), h(n,Δ,m=,d) and h(n,Δ,m=,d=), 0mn1 and 0dΔ, as given in Lemmas 1 and 4, where h(n,Δ)=h(n,Δ,n1,Δ) for n1 and Δ0.

Lemma 5.

For any four integers n1m0, and Δd0, h(n,Δ,m,d) can be obtained in O(nm(n+Δ(n+d·min{n,Δ}))) time and O(nm(Δ(d+1)+1)) space.

The proof of Lemma 5 follows from Algorithm 1 and Lemma 6.

Corollary 1.

For any two integers n1 and Δ0,h(n,Δ,n1,Δ) can be obtained in O(n2(n+Δ(n+Δ·min{n,Δ}))) time and O(n2(Δ2+1)) space.

Next, for any four integers n1m0, and Δd0, we present Algorithm 1 for solving the problem of calculating h(n,Δ,m,d). In this algorithm, for each integers 1in,0jΔ,0kmin{i,m}, and 0pmin{j,d}, the variables hi,j,k,p, hi,j,k=,p, and hi,j,k=,p= store the values of h(i,j,k,p),h(i,j,k=,p), and h(i,j,k=,p=), respectively.

Lemma 6.

For any four integers n1m0, and Δd0, Algorithm 1 outputs h(n,Δ,m,d) in O(nm(n+Δ(n+d·min{n,Δ}))) time and O(nm(Δ(d+1)+1)) space.

Proof. 

Correctness: For each integer 1in,0jΔ,0kmin{i,m}, and 0pmin{j,d}, all the substitutions and if-conditions in Algorithm 1 follow from Lemmas 1, 2, 3 and 4. Furthermore, the values h[i,j,k,p], h[i,j,k=,p], and h[i,j,k=,p=] are computed by the recursive relations given in Lemmas 1 and 4. This implies that Algorithm 1 correctly computes the required value h[n,Δ,m,d].

Complexity analysis: There are three nested loops over the variables i,j, and p at line 4, which take O(n(Δ(d+1)+1)) time. Following there are five nested loops: over variables i,j,k,p, and q at lines 5, 6, 7, 8, and 31, respectively. The loop at line 5 is of size O(n), while the loop at line 6 is of size O(Δ). Similarly, the loops at lines 7 and 8 are of size O(m) and O(d), respectively. The fifth nested loop at line 18 is of size O(n) (resp., O(min{n,Δ})) if p=0 (resp., otherwise). Thus from line 5–36, Algorithm 1 takes O(n2m) (resp., O(nmΔ(n+d·min{n,Δ}))) time if Δ=0 (resp., otherwise). Therefore, Algorithm 1 takes O(nm(n+Δ(n+d·min{n,Δ}))) time.

The algorithm stores three four-dimensional arrays. When Δ=0, for each integer 1in, and 1kmin{i,m} we store h[i,0,k,0],h[i,0,k=,0] and h[i,0,k=,0=], taking O(nm) space. When Δ1, then for each integer 1in, 0jΔ, 1kmin{i,m} and 0pmin{j,d} we store h[i,j,k,p],h[i,j,k=,p] and h[i,j,k=,p=], taking O(nmΔ(d+1)) space. Hence, Algorithm 1 takes O(nm(Δ(d+1)+1)) space. □

Algorithm 1 DP based counting algorithm for h(n,Δ,m,d)
Input: Integers n1m0 and Δd0.
Output:h(n,Δ,m,d).
h[1,j,0=,0=]:=h[1,j,0=,p]:=h[1,j,0,p]:=1;
h[i,j,0=,p]:=h[i,j,0,p]:=0;
h[2,j,1=,p=]:=1; h[2,j,1=,p]:=h[2,j,1,p]:=p+1
 for each 2in,0jΔ, 0pmin{j,d};
for i:=3,4,,n do
   for j:=0,1,,Δ do
     for k:=1,2,,min{i,m} do
       for p:=0,1,,min{j,d} do
         if p=0 and k=1 then
           h[i,j,1=,0=]:=h[i,j,1=,0]:=h[i,j,1,0]:=1
         else /* p1 or k2 */
           c:=1; h[i,j,k=,p=]:=0; /* Initialization */
           if p=0 then
             :=(i1)/k
           else /* p1 */
             :=min{(i1)/k,j/p}
           end if;
           for q:=1,2,, do
             c:=c·(h[k,p,k1,p]+q1)/q;
             if p=0 then
               h[i,j,k=,p=]:=h[i,j,k=,p=]+c·h[iqk,j,min{ikq1,k1},j]
             else /* p1 */
               h[i,j,k=,p=]:=h[i,j,k=,p=]+c·h[ikq,jpq,k=,min{jpq,p1}]+h[ikq,jpq,min{ikq1,k1},jpq]
             end if
           end for;
           if p=0then /* k2 */
             h[i,j,k=,0]:=h[i,j,k=,0=]
           else /* p1 */
             h[i,j,k=,p]:=h[i,j,k=,p1]+h[i,j,k=,p=]
          end if;
          h[i,j,k,p]:=h[i,j,k1,p]+h[i,j,k=,p]
        end if
      end for
    end for
  end for
end for;
output h[n,Δ,m,d] as h(n,Δ,m,d).

Theorem 1.

For any two integers n1 and Δ0, the number of non-isomorphic trees with n vertices and Δ self-loops can be obtained in O(n2(n+Δ(n+Δ·min{n,Δ}))) time and O(n2(Δ2+1)) space.

Proof. 

By Jordan [18], we can uniquely consider any tree as a rooted tree by either regarding its unicentroid as the root, or in the case of a bicentroid, by introducing a virtual vertex on the bicentroid and assuming the virtual vertex as the root of the tree. By the definition of a unicentroid, the number of mutually non-isomorphic trees with n vertices, Δ self-loops and a unicentroid is h(n,Δ,(n1)/2,Δ). Further, if n is even, then there exist trees with n vertices and a bicentroid. This implies that the number of mutually non-isomorphic trees with n vertices and Δ self-loops is h(n,Δ,(n1)/2,Δ) when n is odd. Let n be an even integer. Then any tree H with n vertices, Δ self-loops and a bicentroid has two connected components, A and B obtained by the removal of the bicentroid such that AH(n/2,i,n/21,i) and BH(n/2,Δi,n/21,Δi) for some 0iΔ/2, where if Δ is even then for i=Δ/2, both of the components A and B belong to H(n/2,Δ/2,n/21,Δ/2).

Note that for any 0i(Δ1)/2, it holds that

H(n/2,i,n/21,i)H(n/2,Δi,n/21,Δi)=.

Therefore, when Δ is odd (resp., even), the number of mutually non-isomorphic trees with n vertices, Δ self-loops, and a bicentroid is

i=0(Δ1)/2h(n/2,i,n/21,i)h(n/2,Δi,n/21,Δi)+αh(n/2,Δ/2,n/21,Δ/2)+12,

such that α=0 (resp., α=1). Thus, the number of mutually non-isomorphic trees with n vertices and Δ self-loops is

h(n,Δ,(n1)/2,Δ)+i=0(Δ1)/2h(n/2,i,n/21,i)h(n/2,Δi,n/21,Δi)+αh(n/2,Δ/2,n/21,Δ/2)+12 (6)

such that α=0 (resp., α=1) when Δ is odd (resp., even). Moreover, for each 0iΔ, Algorithm 1 also computes and stores h(n/2,i,n/21,i) during the calculation of h(n,Δ,(n1)/2,Δ), and therefore the required result follows from Lemma 6. □

We implemented the proposed DP algorithm and counting trees with a given number of vertices and self-loops. The experimental results in Table 1 show that the proposed method efficiently counts trees with n vertices and Δ self-loops.

Table 1.

Experimental result of the counting method.

(n,Δ) Number of Trees Time [s]
(10,0) 106 0.000173
(20,0) 823,065 0.00048
(10,5) 91,037 0.001193
(10,30) 6,629,790,712 0.00881
(20,10) 5,143,681,226,004 0.006869
(30,10) 2,547,562,522,909,694,331 0.015901

We next give a lower bound and an upper bound on the number of tree-like polymer topologies with self-loops of a given rank. For this we prove the following results.

Lemma 7.

For an integer n2, there exists at least one tree-like polymer with n vertices and Δ self-loops if Δn2+1.

Proof. 

Consider a tree T of n vertices of diameter n2 such that T contains a path of length n2, in which each non-end vertex has degree at least 3. Observe that when n is even, the tree T has exactly n21 vertices of degree 3, and hence nn2+1=n2+1 vertices of degree less than 3. When n is odd, the tree T has n23 vertices of degree 3 and one vertex of degree 4. Thus, in this case, the number of vertices of degree less than 3 is nn2+2=2n21n2+2=n2+1. This implies that T can be transformed into a polymer with n2+1 self-loops by assigning a self-loop to each vertex of degree less than 3. Hence, n2+1 self-loops are sufficient to get a tree-like polymer with n vertices. □

For two integers n1 and Δ0, let t(n,Δ) denote the number of trees with n vertices and Δ self-loops. For r1, let p(r) denote the number of tree-like polymers with self-loops and no multi-edges of rank r. Observe that a tree with n vertices and k self-loops at each vertex is a polymer with n vertices of cycle rank kn. From this fact and Lemma 7 it holds that

n,kZ+:nk=rt(n,0)p(r)nZ+:n2+1rt(n,r).

4. Conclusions

This paper presented an efficient method to count the number of all mutually non-isomorphic trees with a given number of vertices and self-loops. The proposed method is based on dynamic programming where we count the number of all mutually non-isomorphic rooted trees with a given number n of vertices and Δ self-loops in O(n2(n+Δ(n+Δ·min{n,Δ}))) time and O(n2(Δ2+1)) space. As an application of our results, we gave lower and upper bounds on the number of tree-like polymer topologies with a given cycle rank. This is an interesting application of DP to objects such as trees, and offers the advantage of getting the size of the entire solution space at low computational complexity without explicitly generating each object.

An interesting direction for future research is to efficiently generate all mutually non-isomorphic trees with a given number of vertices and self-loops by using the result from the developed counting method. Further, another possible extension of this research is to count and generate all mutually non-isomorphic tree-like polymer topologies with a given number of vertices and self-loops.

Author Contributions

Conceptualization, N.A.A. and H.N.; funding acquisition, N.A.A.; methodology, N.A.A. and H.N.; software, N.A.A.; supervision, H.N.; validation, N.A.A., A.S. and H.N.; writing—original draft, N.A.A.; writing—review and editing, N.A.A. and A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partially funded by JSPS KAKENHI Grant Number 18J23484.

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1.Pólya G. Kombinatorische anzahlbestimmungen für gruppen, graphen und chemische verbindungen. Acta Math. 1937;68:145–254. doi: 10.1007/BF02546665. [DOI] [Google Scholar]
  • 2.Polya G., Read R.C. Combinatorial Enumeration of Groups, Graphs, and Chemical Compounds. Springer Science & Business Media; New York, NY, USA: 2012. [Google Scholar]
  • 3.Blum L.C., Reymond J.L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009;131:8732–8733. doi: 10.1021/ja902302h. [DOI] [PubMed] [Google Scholar]
  • 4.Azam N.A., Chiewvanichakorn R., Zhang F., Shurbevski A., Nagamochi H., Akutsu T. A method for the inverse QSAR/QSPR based on artificial neural networks and mixed integer linear programming; Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies—Volume 3: Bioinformatics; Valletta, Malta. 24–26 February 2020. [Google Scholar]
  • 5.Ito R., Azam N.A., Wang C., Shurbevski A., Nagamochi H., Akutsu T. Advances in Computer Vision and Computational Biology. Springer; Berlin/Heidelberg, Germany: 2020. A novel method for the inverse QSAR/QSPR to monocyclic chemical compounds based on artificial neural networks and integer programming. (Springer Nature-Research Book Series). [Google Scholar]
  • 6.Zhu J., Wang C., Shurbevski A., Nagamochi H., Akutsu T. A novel method for inference of chemical compounds of cycle index two with desired properties based on artificial neural networks and integer programming. Algorithms. 2020;13:124. doi: 10.3390/a13050124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Méndez-Lucio O., Baillif B., Clevert D.A., Rouquié D., Wichard J. De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nat. Commun. 2020;11:10. doi: 10.1038/s41467-019-13807-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Lim J., Hwang S.Y., Moon S., Kim S., Kim W.Y. Scaffold-based molecular design with a graph generative model. Chem. Sci. 2020;11:1153–1164. doi: 10.1039/C9SC04503A. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Meringer M., Schymanski E.L. Small molecule identification with MOLGEN and mass spectrometry. Metabolites. 2013;3:440–462. doi: 10.3390/metabo3020440. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Benecke C., Grund R., Hohberger R., Kerber A., Laue R., Wieland T. MOLGEN+, a generator of connectivity isomers and stereoisomers for molecular structure elucidation. Anal. Chim. Acta. 1995;314:141–147. doi: 10.1016/0003-2670(95)00291-7. [DOI] [Google Scholar]
  • 11. [(accessed on 4 July 2020)]; Available online: http://sunflower.kuicr.kyoto-u.ac.jp/tools/enumol2/
  • 12.Peironcely J.E., Rojas-Chertó M., Fichera D., Reijmers T., Coulier L., Faulon J.L., Hankemeier T. OMG: Open molecule generator. J. Cheminf. 2012;4:21. doi: 10.1186/1758-2946-4-21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Vogt M., Bajorath J. Chemoinformatics: A view of the field and current trends in method development. Bioorg. Med. Chem. 2012;20:5317–5323. doi: 10.1016/j.bmc.2012.03.030. [DOI] [PubMed] [Google Scholar]
  • 14.Haruna T., Horiyama T., Shimokawa K. On the enumeration of polymer topologies. IPSJ SIG Tech. Rep. 2017;2017-Al-162:1–5. [Google Scholar]
  • 15.Tezuka Y., Oike H. Topological polymer chemistry. Prog. Polym. Sci. 2002;27:1069–1122. doi: 10.1016/S0079-6700(02)00009-6. [DOI] [Google Scholar]
  • 16.Galina H., Sysło M.M. Some applications of graph theory to the study of polymer configuration. Discret. Appl. Math. 1988;19:167–176. doi: 10.1016/0166-218X(88)90012-1. [DOI] [Google Scholar]
  • 17.Zimm B.H., Stockmayer W.H. The dimensions of chain molecules containing branches and rings. J. Chem. Phys. 1949;17:1301–1314. doi: 10.1063/1.1747157. [DOI] [Google Scholar]
  • 18.Jordan C. Sur les assemblages de lignes. J. Reine Angew. Math. 1869;70:81. [Google Scholar]

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)

RESOURCES