Entropy. 2022 Jan 12;24(1):116. doi: 10.3390/e24010116

On the Depth of Decision Trees with Hypotheses

Mikhail Moshkov
Editors: Alessandra Palmigiano, Yiyu Yao, Willem Conradie, Friedhelm Schwenker
PMCID: PMC8774416  PMID: 35052142

Abstract

In this paper, based on the results of rough set theory, test theory, and exact learning, we investigate decision trees over infinite sets of binary attributes represented as infinite binary information systems. We define the notion of a problem over an information system and study three functions of the Shannon type, which characterize the dependence in the worst case of the minimum depth of a decision tree solving a problem on the number of attributes in the problem description. The three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses (an analog of equivalence queries from exact learning), and (iii) decision trees using both attributes and hypotheses. The first function has two possible types of behavior: logarithmic and linear (this result follows from more general results published by the author earlier). The second and the third functions have three possible types of behavior: constant, logarithmic, and linear (these results were published by the author earlier without proofs, which are given in the present paper). Based on the obtained results, we divide the set of all infinite binary information systems into four complexity classes. In each class, the type of behavior of each of the three functions does not change.

Keywords: test theory, rough set theory, exact learning, decision trees, complexity classes

1. Introduction

Decision trees are studied in different areas of computer science, in particular in exact learning [1], rough set theory [2,3,4], and test theory [5]. In some sense, these theories deal with dual objects: for example, membership queries from exact learning correspond to attributes from test theory and rough set theory. In contrast to test theory and rough set theory, in exact learning, besides membership queries, equivalence queries are also considered.

We extend the model considered in test theory and rough set theory by adding the notion of a hypothesis that is an analog of equivalence query. Papers [6,7,8,9,10] are related mainly to the experimental study of decision trees with hypotheses. The present paper contains a theoretical study of the depth of decision trees with hypotheses.

An infinite binary information system is a pair $U=(A,F)$, where $A$ is an infinite set of elements and $F$ is an infinite set of functions (attributes) from $A$ to $\{0,1\}$. A problem over $U$ is given by a finite number of attributes $f_1,\ldots,f_n$ from $F$: for $a\in A$, we should find the tuple $(f_1(a),\ldots,f_n(a))$. To solve this problem, we can use decision trees with two types of queries. We can ask about the value of an attribute $f_i\in\{f_1,\ldots,f_n\}$. As a result, we obtain an answer of the kind $f_i(x)=\delta$, where $\delta\in\{0,1\}$. We can also ask whether a hypothesis $f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n$ is true, where $\delta_1,\ldots,\delta_n\in\{0,1\}$. We obtain either a confirmation or a counterexample of the form $f_i(x)=\neg\delta_i$.

The depth of decision trees with hypotheses can be substantially less than the depth of decision trees using only attributes. As an example, we consider the problem of computing the disjunction $x_1\vee\cdots\vee x_n$. The minimum depth of a decision tree solving this problem using only the attributes $x_1,\ldots,x_n$ is equal to $n$. However, the minimum depth of a decision tree with hypotheses solving this problem is equal to one: it is enough to ask only about the hypothesis $x_1=0,\ldots,x_n=0$. If it is true, then the considered disjunction is equal to zero. Otherwise, it is equal to one.
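The disjunction example can be simulated in a few lines. The following is a minimal sketch rather than part of the paper: the names `ask_hypothesis` and `disjunction_with_one_hypothesis` are hypothetical, and the oracle is assumed to return the first disagreeing index as its counterexample, which is only one of the admissible answers.

```python
from itertools import product


def ask_hypothesis(hidden, hypothesis):
    """Answer a hypothesis query about the hidden tuple.

    Returns None if the hypothesis is confirmed; otherwise returns an index i
    such that hidden[i] differs from hypothesis[i] (a counterexample).
    """
    for i, (actual, guessed) in enumerate(zip(hidden, hypothesis)):
        if actual != guessed:
            return i
    return None


def disjunction_with_one_hypothesis(hidden):
    """Compute x_1 or ... or x_n using a single hypothesis query."""
    answer = ask_hypothesis(hidden, (0,) * len(hidden))
    # Confirmation means every input is 0; any counterexample x_i = 1 gives 1.
    return 0 if answer is None else 1


# Exhaustive check for n = 4: one query always suffices.
assert all(
    disjunction_with_one_hypothesis(bits) == int(any(bits))
    for bits in product((0, 1), repeat=4)
)
```

An attribute-only strategy has to read every input in the worst case (the all-zero input), which is where the gap between depth $n$ and depth one comes from.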

Based on the results of exact learning, rough set theory, and test theory [1,11,12,13,14,15,16], we study, for an arbitrary infinite binary information system, three functions of the Shannon type that characterize the growth in the worst case of the minimum depth of a decision tree solving a problem with the growth of the number of attributes in the problem description. The three functions correspond to the following three cases:

  • (i) Only attributes are used in decision trees;

  • (ii) Only hypotheses are used in decision trees;

  • (iii) Both attributes and hypotheses are used in decision trees.

We show that the first function has two possible types of behavior: logarithmic and linear. The second and third functions have three possible types of behavior: constant, logarithmic, and linear. Bounds for case (i) can be derived from more general results obtained in [15,16]. Results related to cases (ii) and (iii) were presented in the conference paper [17] without proofs. In the present paper, we give complete proofs for cases (ii) and (iii). We also investigate the joint behavior of these three functions and describe four complexity classes of infinite binary information systems; these results are completely new.

The obtained results allow us to understand the difference in time complexity between conventional decision trees, which use only queries based on one attribute each, and decision trees with hypotheses. Moreover, we now know which combinations of types of behavior of the three Shannon-type functions are possible for an arbitrary infinite binary information system, and we know the criteria for each combination.

This paper consists of six sections. In Section 2 and Section 3, we consider the basic notions and main results. Section 4 and Section 5 contain proofs of the main results, and Section 6 gives a short conclusion.

2. Basic Notions

Let A be a set of elements and F be a set of functions from A to {0,1}. Functions from F are called attributes, and the pair U=(A,F) is called a binary information system (this notion is close to the notion of information systems proposed by Pawlak [18]). If A and F are infinite sets, then the pair U=(A,F) is called an infinite binary information system.

A problem over $U$ is an arbitrary $n$-tuple $z=(f_1,\ldots,f_n)$, where $n\in\mathbb{N}$, $\mathbb{N}$ is the set of natural numbers $\{1,2,\ldots\}$, and $f_1,\ldots,f_n\in F$. The problem $z$ may be interpreted as a problem of searching for the tuple $z(a)=(f_1(a),\ldots,f_n(a))$ for an arbitrary $a\in A$. The number $\dim z=n$ is called the dimension of the problem $z$. Denote $F(z)=\{f_1,\ldots,f_n\}$. We denote by $P(U)$ the set of problems over $U$.

A system of equations over U is an arbitrary equation system of the kind:

$$\{g_1(x)=\delta_1,\ldots,g_m(x)=\delta_m\},$$

where $m\in\mathbb{N}\cup\{0\}$, $g_1,\ldots,g_m\in F$, and $\delta_1,\ldots,\delta_m\in\{0,1\}$ (if $m=0$, then the considered equation system is empty). This equation system is called a system of equations over $z$ if $g_1,\ldots,g_m\in F(z)$. The considered equation system is called consistent (on $A$) if its set of solutions on $A$ is nonempty. The set of solutions of the empty equation system coincides with $A$.

As algorithms for solving the problem $z$, we consider decision trees with two types of queries. We can choose an attribute $f_i\in F(z)$ and ask about its value. This query has two possible answers: $\{f_i(x)=0\}$ and $\{f_i(x)=1\}$. We can formulate a hypothesis over $z$ in the form $H=\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$, where $\delta_1,\ldots,\delta_n\in\{0,1\}$, and ask about this hypothesis. This query has $n+1$ possible answers: $H,\{f_1(x)=\neg\delta_1\},\ldots,\{f_n(x)=\neg\delta_n\}$, where $\neg 1=0$ and $\neg 0=1$. The first answer means that the hypothesis is true. The other answers are counterexamples.

A decision tree over z is a marked finite directed tree with the root in which:

  • Each terminal node is labeled with an $n$-tuple from the set $\{0,1\}^n$;

  • Each node, which is not terminal (such nodes are called working), is labeled with an attribute from the set F(z) or with a hypothesis over z;

  • If a working node is labeled with an attribute $f_i$ from $F(z)$, then there are two edges, which leave this node and are labeled with the systems of equations $\{f_i(x)=0\}$ and $\{f_i(x)=1\}$, respectively;

  • If a working node is labeled with a hypothesis $H=\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$ over $z$, then there are $n+1$ edges, which leave this node and are labeled with the systems of equations $H,\{f_1(x)=\neg\delta_1\},\ldots,\{f_n(x)=\neg\delta_n\}$, respectively.

Let Γ be a decision tree over z. A complete path in Γ is an arbitrary directed path from the root to a terminal node in Γ. We now define an equation system S(ξ) over U associated with the complete path ξ. If there are no working nodes in ξ, then S(ξ) is the empty system. Otherwise, S(ξ) is the union of equation systems assigned to the edges of the path ξ. We denote by A(ξ) the set of solutions on A of the system of equations S(ξ) (if this system is empty, then its solution set is equal to A).

We say that a decision tree $\Gamma$ over $z$ solves the problem $z$ relative to $U$ if, for each element $a\in A$ and for each complete path $\xi$ in $\Gamma$ such that $a\in A(\xi)$, the terminal node of the path $\xi$ is labeled with the tuple $z(a)$.

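We now consider an equivalent definition of a decision tree solving a problem. Denote by $\Delta_U(z)$ the set of tuples $(\delta_1,\ldots,\delta_n)\in\{0,1\}^n$ such that the system of equations $\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$ is consistent. The set $\Delta_U(z)$ is the set of all possible solutions to the problem $z$. Let $\Delta\subseteq\Delta_U(z)$, $f_{i_1},\ldots,f_{i_m}\in\{f_1,\ldots,f_n\}$, and $\sigma_1,\ldots,\sigma_m\in\{0,1\}$. Denote by

$$\Delta(f_{i_1},\sigma_1)\cdots(f_{i_m},\sigma_m)$$

the set of all $n$-tuples $(\delta_1,\ldots,\delta_n)\in\Delta$ for which $\delta_{i_1}=\sigma_1,\ldots,\delta_{i_m}=\sigma_m$.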

Let $\Gamma$ be a decision tree over the problem $z$. We associate with each complete path $\xi$ in the tree $\Gamma$ a word $\pi(\xi)$ over the alphabet $\{(f_i,\delta): f_i\in F(z),\ \delta\in\{0,1\}\}$. If the equation system $S(\xi)$ is empty, then $\pi(\xi)$ is the empty word. If $S(\xi)=\{f_{i_1}(x)=\sigma_1,\ldots,f_{i_m}(x)=\sigma_m\}$, then $\pi(\xi)=(f_{i_1},\sigma_1)\cdots(f_{i_m},\sigma_m)$. The decision tree $\Gamma$ over $z$ solves the problem $z$ relative to $U$ if, for each complete path $\xi$ in $\Gamma$, the set $\Delta_U(z)\pi(\xi)$ contains at most one tuple, and if this set contains exactly one tuple, then this tuple is assigned to the terminal node of the path $\xi$.

As the time complexity of a decision tree Γ, we consider its depth h(Γ), that is the maximum number of working nodes in a complete path in the tree Γ.

Let $z\in P(U)$. We denote by $h_U^{(1)}(z)$ the minimum depth of a decision tree over $z$, which solves $z$ relative to $U$ and uses only attributes from $F(z)$. We denote by $h_U^{(2)}(z)$ the minimum depth of a decision tree over $z$, which solves $z$ relative to $U$ and uses only hypotheses over $z$. We denote by $h_U^{(3)}(z)$ the minimum depth of a decision tree over $z$, which solves $z$ relative to $U$ and uses both attributes from $F(z)$ and hypotheses over $z$.

For $i=1,2,3$, we define a function of the Shannon type $h_U^{(i)}(n)$ that characterizes the dependence of $h_U^{(i)}(z)$ on $\dim z$ in the worst case. Let $i\in\{1,2,3\}$ and $n\in\mathbb{N}$. Then:

$$h_U^{(i)}(n)=\max\{h_U^{(i)}(z): z\in P(U),\ \dim z\le n\}.$$

3. Main Results

Let $U=(A,F)$ be an infinite binary information system and $r\in\mathbb{N}$. The information system $U$ is called $r$-reduced if, for each system of equations over $U$ that is consistent on $A$, there exists a subsystem of this system that has the same set of solutions and contains at most $r$ equations. We denote by $\mathcal{R}$ the set of infinite binary information systems each of which is $r$-reduced for some $r\in\mathbb{N}$.

The next theorem follows from the results obtained in [15], where we considered closed classes of test tables (decision tables). It also follows from the results obtained in [16], where we considered the weighted depth of decision trees.

Theorem 1.

Let U be an infinite binary information system. Then, the following statements hold:

  • (a) If $U\in\mathcal{R}$, then $h_U^{(1)}(n)=\Theta(\log n)$;

  • (b) If $U\notin\mathcal{R}$, then $h_U^{(1)}(n)=n$ for any $n\in\mathbb{N}$.

A subset $\{f_1,\ldots,f_m\}$ of $F$ is called independent if, for any $\delta_1,\ldots,\delta_m\in\{0,1\}$, the system of equations $\{f_1(x)=\delta_1,\ldots,f_m(x)=\delta_m\}$ is consistent on the set $A$. The empty set of attributes is independent by definition. We now define the independence dimension or I-dimension $I(U)$ of the information system $U$ (this notion is similar to the notion of the independence number of a family of sets considered by Naiman and Wynn in [19]). If, for each $m\in\mathbb{N}$, the set $F$ contains an independent subset of cardinality $m$, then $I(U)=\infty$. Otherwise, $I(U)$ is the maximum cardinality of an independent subset of the set $F$. We denote by $\mathcal{D}$ the set of infinite binary information systems with a finite independence dimension.
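For a finite set of elements and a finite list of attributes, independence can be checked by brute force. The sketch below is illustrative only and not part of the paper: the function names are hypothetical, attributes are modeled as Python callables, and both example systems (threshold attributes on $\{1,\ldots,8\}$ and all functions on a four-element set) are finite stand-ins chosen for the demonstration.

```python
from itertools import combinations, product


def is_independent(attrs, elements):
    """Check that every 0/1 assignment to attrs is realized by some element."""
    realized = {tuple(f(a) for f in attrs) for a in elements}
    return len(realized) == 2 ** len(attrs)


def independence_dimension(attrs, elements):
    """Largest cardinality of an independent subset of attrs (brute force)."""
    best = 0  # the empty set of attributes is independent by definition
    for size in range(1, len(attrs) + 1):
        if any(is_independent(c, elements) for c in combinations(attrs, size)):
            best = size
        else:
            break  # subsets of independent sets are independent, so stop here
    return best


# Threshold attributes: value 1 exactly when the element exceeds i.
thresholds = [lambda j, i=i: int(j > i) for i in range(1, 8)]
print(independence_dimension(thresholds, range(1, 9)))

# All 16 functions from a four-element set to {0, 1}.
all_functions = [lambda a, t=t: t[a] for t in product((0, 1), repeat=4)]
print(independence_dimension(all_functions, range(4)))
```

For the threshold attributes no pair is independent, because the answers "at most $i$" and "greater than $j$" are incompatible when $i<j$. With all functions on four elements, two attributes can realize all four answer pairs, but three would need eight distinct elements.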

Let $U=(A,F)$ be a binary information system, which is not necessarily infinite, $f\in F$, and $\delta\in\{0,1\}$. Denote:

$$A(f,\delta)=\{a: a\in A,\ f(a)=\delta\}.$$

We now define inductively the notion of a $k$-information system, $k\in\mathbb{N}\cup\{0\}$. The binary information system $U$ is called a 0-information system if all attributes from $F$ are constant on the set $A$. Let, for some $k\in\mathbb{N}\cup\{0\}$, the notion of an $m$-information system be defined for $m=0,\ldots,k$. The binary information system $U$ is called a $(k+1)$-information system if it is not an $m$-information system for $m=0,\ldots,k$ and, for any $f\in F$, there exist numbers $\delta\in\{0,1\}$ and $m\in\{0,\ldots,k\}$ such that the information system $(A(f,\delta),F)$ is an $m$-information system. It is easy to show by induction on $k$ that if $U=(A,F)$ is a $k$-information system, then $U'=(A',F)$, $A'\subseteq A$, is an $l$-information system for some $l\le k$. We denote by $\mathcal{C}$ the set of infinite binary information systems for each of which there exists $k\in\mathbb{N}$ such that the considered system is a $k$-information system. The following theorem was presented in [17] without proof.
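For a finite binary information system, the inductive definition can be unrolled into a recursion: a system that is not a 0-information system is a $(k+1)$-information system with $k$ equal to the maximum over attributes of the smaller of the two values obtained on $A(f,0)$ and $A(f,1)$. The sketch below rests on assumptions that are not in the paper: the name `information_rank` is hypothetical, the element set is finite, and the empty set is treated as a 0-information system because every attribute is vacuously constant on it.

```python
from functools import lru_cache


def information_rank(elements, attrs):
    """Return k such that (elements, attrs) is a k-information system."""
    attrs = tuple(attrs)

    @lru_cache(maxsize=None)
    def rank(subset):
        if all(len({f(a) for a in subset}) <= 1 for f in attrs):
            return 0  # every attribute is constant on this subset
        worst = 0
        for f in attrs:
            zeros = frozenset(a for a in subset if f(a) == 0)
            ones = subset - zeros
            if not zeros or not ones:
                continue  # constant attribute: its empty side has rank 0
            # For each attribute, the better of its two answers is enough.
            worst = max(worst, min(rank(zeros), rank(ones)))
        return worst + 1

    return rank(frozenset(elements))


# Point attributes (1 only at i) versus threshold attributes (1 above i).
points = [lambda j, i=i: int(j == i) for i in range(1, 7)]
thresholds = [lambda j, i=i: int(j > i) for i in range(1, 8)]
print(information_rank(range(1, 7), points))
print(information_rank(range(1, 9), thresholds))
```

With point attributes the answer "equal to $i$" isolates a single element, so the rank stays at one however many elements there are. With threshold attributes on $m$ elements the rank grows like $\log_2 m$, which is why an infinite system of thresholds is not a $k$-information system for any $k$.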

Theorem 2.

Let U be an infinite binary information system. Then, the following statements hold:

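  • (a) If $U\in\mathcal{C}$, then $h_U^{(2)}(n)=O(1)$ and $h_U^{(3)}(n)=O(1)$;

  • (b) If $U\in\mathcal{D}\setminus\mathcal{C}$, then $h_U^{(2)}(n)=\Theta(\log n)$, $h_U^{(3)}(n)=\Omega\!\left(\frac{\log n}{\log\log n}\right)$, and $h_U^{(3)}(n)=O(\log n)$;

  • (c) If $U\notin\mathcal{D}$, then $h_U^{(2)}(n)=n$ and $h_U^{(3)}(n)=n$ for any $n\in\mathbb{N}$.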

Let $U$ be an infinite binary information system. We now consider the joint behavior of the functions $h_U^{(1)}(n)$, $h_U^{(2)}(n)$, and $h_U^{(3)}(n)$. It depends on whether the information system $U$ belongs to the sets $\mathcal{R}$, $\mathcal{D}$, and $\mathcal{C}$. We associate with the information system $U$ its indicator vector $\mathrm{ind}(U)=(c_1,c_2,c_3)\in\{0,1\}^3$, in which $c_1=1$ if and only if $U\in\mathcal{R}$, $c_2=1$ if and only if $U\in\mathcal{D}$, and $c_3=1$ if and only if $U\in\mathcal{C}$.

Theorem 3.

For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Each row of Table 1 is the indicator vector of some infinite binary information system.

Table 1.

Possible indicator vectors of infinite binary information systems.

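| Row | $\mathcal{R}$ | $\mathcal{D}$ | $\mathcal{C}$ |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 3 | 0 | 1 | 1 |
| 4 | 1 | 1 | 0 |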

For $i=1,2,3,4$, we denote by $V_i$ the class of all infinite binary information systems whose indicator vector coincides with the $i$th row of Table 1. Table 2 summarizes Theorems 1–3. The first column contains the name of the complexity class $V_i$. The next three columns describe the indicator vector of information systems from this class. The last three columns, $h_U^{(1)}(n)$, $h_U^{(2)}(n)$, and $h_U^{(3)}(n)$, contain information about the behavior of these functions for information systems from the class $V_i$.

Table 2.

Summary of Theorems 1–3.

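| Class | $\mathcal{R}$ | $\mathcal{D}$ | $\mathcal{C}$ | $h_U^{(1)}(n)$ | $h_U^{(2)}(n)$ | $h_U^{(3)}(n)$ |
|---|---|---|---|---|---|---|
| $V_1$ | 0 | 0 | 0 | $n$ | $n$ | $n$ |
| $V_2$ | 0 | 1 | 0 | $n$ | $\Theta(\log n)$ | $\Omega\!\left(\frac{\log n}{\log\log n}\right)$, $O(\log n)$ |
| $V_3$ | 0 | 1 | 1 | $n$ | $O(1)$ | $O(1)$ |
| $V_4$ | 1 | 1 | 0 | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Omega\!\left(\frac{\log n}{\log\log n}\right)$, $O(\log n)$ |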

4. Proof of Theorem 2

We precede the proof of Theorem 2 with two lemmas.

Let $d\in\mathbb{N}$. A $d$-complete tree over the information system $U=(A,F)$ is a marked finite directed tree with the root in which:

  • Each terminal node is not labeled;

  • Each nonterminal node is labeled with an attribute $f\in F$. There are two edges leaving this node that are labeled with the systems of equations $\{f(x)=0\}$ and $\{f(x)=1\}$, respectively;

  • The length of each complete path (the path from the root to a terminal node) is equal to $d$;

  • For each complete path $\xi$, the equation system $S(\xi)$, which is the union of equation systems assigned to the edges of the path $\xi$, is consistent.

Let $G$ be a $d$-complete tree over $U$ and $F(G)$ be the set of all attributes attached to the nonterminal nodes of the tree $G$. The number of nonterminal nodes in $G$ is equal to $2^0+2^1+\cdots+2^{d-1}=2^d-1$. Therefore, $|F(G)|\le 2^d$.

The results mentioned in the following lemma are obtained by methods similar to those used by Littlestone [12], Maass and Turán [13], and Angluin [11].

Lemma 1.

Let $U=(A,F)$ be a binary information system, $d\in\mathbb{N}$, $G$ be a $d$-complete tree over $U$, and $z$ be a problem over $U$ such that $F(G)\subseteq F(z)$. Then:

  • (a) $h_U^{(2)}(z)\ge d$;

  • (b) $h_U^{(3)}(z)\ge \dfrac{d}{\log_2(2d)}$.

Proof. 

(a) We prove the inequality $h_U^{(2)}(z)\ge d$ by induction on $d$. Let $d=1$. Then, the tree $G$ has exactly one nonterminal node, which is labeled with an attribute $f$ that is not constant on $A$. Therefore, $|\Delta_U(z)|\ge 2$ and $h_U^{(2)}(z)\ge 1$. Let, for $t\in\mathbb{N}$ and for any natural $d$, $1\le d\le t$, the considered statement hold. Assume now that $d=t+1$, $G$ is a $d$-complete tree over $U$, $z$ is a problem over $U$ such that $F(G)\subseteq F(z)$, and $\Gamma$ is a decision tree over $z$ with the minimum depth, which solves the problem $z$ and uses only hypotheses. Let $f$ be the attribute attached to the root of the tree $G$ and $H$ be the hypothesis attached to the root of the decision tree $\Gamma$. Then, there is an edge that leaves the root of $\Gamma$ and is labeled with the equation system $\{f(x)=\delta\}$, where the equation $f(x)=\neg\delta$ belongs to the hypothesis $H$. This edge enters the root of the subtree of $\Gamma$, which is denoted by $\Gamma_f$. There is an edge that leaves the root of $G$ and is labeled with the equation system $\{f(x)=\delta\}$. This edge enters the root of the subtree of $G$, which is denoted by $G_\delta$. One can show that the decision tree $\Gamma_f$ solves the problem $z$ relative to the information system $U'=(A(f,\delta),F)$ and that $G_\delta$ is a $t$-complete tree over $U'$. It is clear that $F(G_\delta)\subseteq F(z)$. Using the inductive hypothesis, we obtain $h(\Gamma_f)\ge t$. Therefore, $h(\Gamma)\ge t+1=d$ and $h_U^{(2)}(z)\ge d$.

(b) We now prove the inequality $h_U^{(3)}(z)\ge d/\log_2(2d)$. Let $z=(f_1,\ldots,f_n)$ and $\Gamma$ be a decision tree over $z$ with the minimum depth, which solves the problem $z$ and uses both attributes and hypotheses. The $d$-complete tree $G$ has $2^d$ complete paths $\xi_1,\ldots,\xi_{2^d}$. For $i=1,\ldots,2^d$, we denote by $a_i$ a solution of the equation system $S(\xi_i)$. Denote $B=\{a_1,\ldots,a_{2^d}\}$. We now show that the decision tree $\Gamma$ contains a complete path, the length of which is at least $d/\log_2(2d)$. We describe the construction of this path beginning with the root of $\Gamma$.

Let the root of $\Gamma$ be labeled with an attribute $f_{i_0}$. For $\delta\in\{0,1\}$, we denote by $B^\delta$ the set of solutions on $B$ of the equation system $\{f_{i_0}(x)=\delta\}$ and choose $\sigma\in\{0,1\}$ for which $|B^\sigma|=\max\{|B^0|,|B^1|\}$. It is clear that $|B^\sigma|\ge\frac{|B|}{2}\ge\frac{|B|}{2d}$. In the considered case, the beginning of the constructed path in $\Gamma$ is the root of $\Gamma$, the edge that leaves the root and is labeled with the equation system $\{f_{i_0}(x)=\sigma\}$, and the node that this edge enters.

Let us assume now that the root of $\Gamma$ is labeled with a hypothesis $H=\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$. We denote by $\xi_H$ the complete path in $G$ for which the system of equations $S(\xi_H)$ is a subsystem of $H$. Let the nonterminal nodes of the complete path $\xi_H$ be labeled with the attributes $f_{i_1},\ldots,f_{i_d}$. For $j=1,\ldots,d$, we denote by $B_j$ the set of solutions on $B$ of the equation system $\{f_{i_j}(x)=\neg\delta_{i_j}\}$. It is clear that $|B_1|+\cdots+|B_d|\ge|B|-1$. Therefore, there exists $l\in\{1,\ldots,d\}$ such that $|B_l|\ge\frac{|B|-1}{d}\ge\frac{|B|}{2d}$ (the last inequality holds whenever $|B|\ge 2$). In the considered case, the beginning of the constructed path in $\Gamma$ is the root of $\Gamma$, the edge that leaves the root and is labeled with the equation system $\{f_{i_l}(x)=\neg\delta_{i_l}\}$, and the node that this edge enters.

We continue the construction of the complete path in $\Gamma$ in the same way, so that after the $t$th query we have at least $\frac{|B|}{(2d)^t}$ elements from $B$. The process of path construction continues at least until $\frac{|B|}{(2d)^t}\le 1$, i.e., at least until $\log_2|B|\le t\log_2(2d)$. Since $|B|=2^d$, we have $h(\Gamma)\ge t\ge\frac{d}{\log_2(2d)}$ and $h_U^{(3)}(z)\ge\frac{d}{\log_2(2d)}$. □

Lemma 2.

Let $U=(A,F)$ be a binary information system, $k\in\mathbb{N}\cup\{0\}$, and $U$ not be an $m$-information system for $m=0,\ldots,k$. Then, there exists a $(k+1)$-complete tree over $U$.

Proof. 

We prove the considered statement by induction on $k$. Let $k=0$. In this case, $U$ is not a 0-information system. Then, there exists an attribute $f\in F$, which is not constant on $A$. Using this attribute, it is easy to construct a 1-complete tree over $U$.

Let the considered statement hold for some $k$, $k\ge 0$. We now show that it also holds for $k+1$. Let $U=(A,F)$ be a binary information system, which is not an $m$-information system for $m=0,\ldots,k+1$. Then, there exists an attribute $f\in F$ such that, for any $\delta\in\{0,1\}$, the information system $U_\delta=(A(f,\delta),F)$ is not an $m$-information system for $m=0,\ldots,k$. Using the inductive hypothesis, we conclude that, for any $\delta\in\{0,1\}$, there exists a $(k+1)$-complete tree $G_\delta$ over $U_\delta$. Denote by $G$ a directed tree with root in which the root is labeled with the attribute $f$, and for any $\delta\in\{0,1\}$, there is an edge that leaves the root, is labeled with the equation system $\{f(x)=\delta\}$, and enters the root of the tree $G_\delta$. One can show that the tree $G$ is a $(k+2)$-complete tree over $U$. □

Proof of Theorem 2.

It is clear that $h_U^{(3)}(z)\le h_U^{(2)}(z)$ for any problem $z$ over $U$. Therefore, $h_U^{(3)}(n)\le h_U^{(2)}(n)$ for any $n\in\mathbb{N}$.

(a) Let $k\in\mathbb{N}\cup\{0\}$. We now show by induction on $k$ that, for each binary $k$-information system $U$ (not necessarily infinite) and for each problem $z$ over $U$, the inequality $h_U^{(2)}(z)\le k$ holds. Let $U=(A,F)$ be a binary 0-information system and $z$ be a problem over $U$. Since all attributes from $F(z)$ are constant on $A$, the set $\Delta_U(z)$ contains only one tuple. Therefore, the decision tree containing only one node labeled with this tuple solves the problem $z$ relative to $U$, and $h_U^{(2)}(z)=0$.

Let $k\in\mathbb{N}\cup\{0\}$ and, for each $m$, $0\le m\le k$, the considered statement hold. Let us show that it holds for $k+1$. Let $U=(A,F)$ be a binary $(k+1)$-information system and $z=(f_1,\ldots,f_n)$ be a problem over $U$. For $i=1,\ldots,n$, choose a number $\delta_i\in\{0,1\}$ such that the information system $(A(f_i,\neg\delta_i),F)$ is an $m_i$-information system, where $0\le m_i\le k$. Using the inductive hypothesis, we conclude that, for $i=1,\ldots,n$, there is a decision tree $\Gamma_i$ over $z$, which uses only hypotheses, solves the problem $z$ over $(A(f_i,\neg\delta_i),F)$, and has depth at most $m_i$. We denote by $\Gamma$ a decision tree in which the root is labeled with the hypothesis $H=\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$, the edge leaving the root and labeled with $H$ enters the terminal node labeled with the tuple $(\delta_1,\ldots,\delta_n)$, and for $i=1,\ldots,n$, the edge leaving the root and labeled with $\{f_i(x)=\neg\delta_i\}$ enters the root of the tree $\Gamma_i$. One can show that $\Gamma$ solves the problem $z$ relative to $U$ and $h(\Gamma)\le k+1$. Therefore, $h_U^{(2)}(z)\le k+1$ for any problem $z$ over $U$.

Let $U\in\mathcal{C}$. Then, $U$ is a $k$-information system for some natural $k$, and for each problem $z$ over $U$, we have $h_U^{(3)}(z)\le h_U^{(2)}(z)\le k$. Therefore, $h_U^{(2)}(n)=O(1)$ and $h_U^{(3)}(n)=O(1)$.

(b) Let $U=(A,F)\in\mathcal{D}\setminus\mathcal{C}$. First, we show that $h_U^{(2)}(n)=O(\log n)$. Let $z=(f_1,\ldots,f_n)$ be an arbitrary problem over $U$. From Lemma 5.1 [16], it follows that $|\Delta_U(z)|\le(4n)^{I(U)}$. The proof of this lemma is based on results similar to the ones obtained by Sauer [20] and Shelah [21]. We consider a decision tree $\Gamma$ over $z$, which solves $z$ relative to $U$ and uses only hypotheses. This tree is constructed by the halving algorithm [1,12]. We describe the work of this tree for an arbitrary element $a$ from $A$. Set $\Delta=\Delta_U(z)$. If $|\Delta|=1$, then the only $n$-tuple from $\Delta$ is the solution $z(a)$ of the problem $z$ for the element $a$. Let $|\Delta|\ge 2$. For $i=1,\ldots,n$, we denote by $\delta_i$ a number from $\{0,1\}$ such that $|\Delta(f_i,\delta_i)|\ge|\Delta(f_i,\neg\delta_i)|$. The root of $\Gamma$ is labeled with the hypothesis $H=\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$. After this query, either the problem $z$ is solved (if the answer is $H$) or we at least halve the number of tuples in the set $\Delta$ (if the answer is a counterexample $\{f_i(x)=\neg\delta_i\}$). In the latter case, set $\Delta=\Delta_U(z)(f_i,\neg\delta_i)$. The decision tree $\Gamma$ continues to work with the element $a$ and the set of $n$-tuples $\Delta$ in the same way. Let, during the work with the element $a$, the considered decision tree make $q$ queries. After the $(q-1)$th query, the number of remaining $n$-tuples in the set $\Delta$ is at least two and at most $(4n)^{I(U)}/2^{q-1}$. Therefore, $2^q\le(4n)^{I(U)}$ and $q\le I(U)\log_2(4n)$. Thus, during the processing of the element $a$, the decision tree $\Gamma$ makes at most $I(U)\log_2(4n)$ queries. Since $a$ is an arbitrary element from $A$, the depth of $\Gamma$ is at most $I(U)\log_2(4n)$. Since $z$ is an arbitrary problem over $U$, we obtain $h_U^{(2)}(n)=O(\log n)$. Therefore, $h_U^{(3)}(n)=O(\log n)$.
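The halving step can be made concrete on a finite set of candidate tuples. The sketch below is illustrative and not the paper's construction verbatim: `halving_queries` is a hypothetical name, the hidden tuple is assumed to belong to the candidate set, and the simulated oracle always reports the first disagreeing coordinate, although any counterexample would preserve the bound.

```python
from math import floor, log2


def halving_queries(candidates, hidden):
    """Identify `hidden` among `candidates` using only hypothesis queries.

    Returns the tuple found and the number of hypothesis queries asked.
    """
    remaining = set(candidates)
    n = len(hidden)
    queries = 0
    while len(remaining) > 1:
        # Majority value per coordinate, so that the tuples agreeing with the
        # hypothesis on coordinate i are at least as many as those disagreeing.
        hypothesis = tuple(
            int(2 * sum(t[i] for t in remaining) >= len(remaining))
            for i in range(n)
        )
        queries += 1
        wrong = [i for i in range(n) if hidden[i] != hypothesis[i]]
        if not wrong:
            return hypothesis, queries  # the hypothesis is confirmed
        i = wrong[0]  # assumed oracle behavior: first counterexample
        # Keep only tuples consistent with the counterexample (at most half).
        remaining = {t for t in remaining if t[i] != hypothesis[i]}
    (found,) = remaining
    return found, queries


# Candidate tuples of the form 1...10...0, as produced by threshold attributes.
n = 7
candidates = [(1,) * k + (0,) * (n - k) for k in range(n + 1)]
worst = max(halving_queries(candidates, t)[1] for t in candidates)
print(worst, floor(log2(len(candidates))))
```

Each counterexample leaves at most half of the remaining tuples, so the number of queries never exceeds $\log_2$ of the initial number of candidates; combined with $|\Delta_U(z)|\le(4n)^{I(U)}$, this gives the bound $I(U)\log_2(4n)$ used in the proof.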

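Using Lemma 2 and the relation $U\notin\mathcal{C}$, we obtain that, for any $d\in\mathbb{N}$, there exists a $d$-complete tree $G_d$ over $U$. Let $F(G_d)=\{f_1,\ldots,f_{n_d}\}$. We know that $n_d\le 2^d$. Denote $z_d=(f_1,\ldots,f_{n_d})$. From Lemma 1, it follows that $h_U^{(2)}(z_d)\ge d$ and $h_U^{(3)}(z_d)\ge d/\log_2(2d)$. As a result, we have $h_U^{(2)}(2^d)\ge d$ and $h_U^{(3)}(2^d)\ge d/\log_2(2d)$. Let $n\in\mathbb{N}$ and $n\ge 8$. Then, there exists $d\in\mathbb{N}$ such that $2^d\le n<2^{d+1}$. We have $d>\log_2 n-1$, $h_U^{(2)}(n)\ge\log_2 n-1$, $h_U^{(2)}(n)=\Omega(\log n)$, and $h_U^{(2)}(n)=\Theta(\log n)$. It is easy to show that the function $x/\log_2(2x)$ is nondecreasing for $x\ge 2$. Therefore, $h_U^{(3)}(n)\ge\frac{\log_2 n-1}{\log_2(2(\log_2 n-1))}$ and $h_U^{(3)}(n)=\Omega\!\left(\frac{\log n}{\log\log n}\right)$.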

(c) Let $U=(A,F)\notin\mathcal{D}$. We now consider an arbitrary problem $z=(f_1,\ldots,f_n)$ over $U$ and a decision tree over $z$, which uses only hypotheses and solves the problem $z$ over $U$ in the following way. For a given element $a\in A$, the first query is about the hypothesis $H_1=\{f_1(x)=1,\ldots,f_n(x)=1\}$. If the answer is $H_1$, then the problem $z$ is solved for the element $a$. If, for some $i\in\{1,\ldots,n\}$, the answer is $\{f_i(x)=0\}$, then the second query is about the hypothesis $H_2$ obtained from $H_1$ by replacing the equality $f_i(x)=1$ with the equality $f_i(x)=0$, etc. It is clear that after at most $n$ queries, the problem $z$ for the element $a$ will be solved. Thus, $h_U^{(2)}(z)\le n$ and $h_U^{(3)}(z)\le n$. Since $z$ is an arbitrary problem over $U$, we have $h_U^{(2)}(n)\le n$ and $h_U^{(3)}(n)\le n$ for any $n\in\mathbb{N}$.
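The upper-bound strategy of part (c) is easy to simulate. In the sketch below, which is an illustration under stated assumptions rather than the paper's formal tree, `solve_by_flipping` is a hypothetical name and the simulated oracle reports the first disagreeing coordinate.

```python
from itertools import product


def solve_by_flipping(hidden):
    """Find `hidden` with at most len(hidden) hypothesis queries.

    Starts from the all-ones hypothesis and, after each counterexample
    f_i(x) = 0, replaces f_i(x) = 1 by f_i(x) = 0 in the next hypothesis.
    """
    n = len(hidden)
    hypothesis = [1] * n
    for queries in range(1, n + 1):
        wrong = [i for i in range(n) if hidden[i] != hypothesis[i]]
        if not wrong:
            return tuple(hypothesis), queries  # the hypothesis is confirmed
        hypothesis[wrong[0]] = 0  # coordinate i is now known to be 0
    # After n counterexamples every coordinate has been corrected to 0,
    # so the tuple is known without asking another query.
    return tuple(hypothesis), n


results = {bits: solve_by_flipping(bits) for bits in product((0, 1), repeat=5)}
print(max(q for _, q in results.values()))
```

A corrected coordinate can never be reported again, so each counterexample fixes a new coordinate and the worst case (the all-zero tuple) uses exactly $n$ queries.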

Let $n\in\mathbb{N}$. Since $U\notin\mathcal{D}$, there exist attributes $f_1,\ldots,f_n\in F$ such that, for any $(\delta_1,\ldots,\delta_n)\in\{0,1\}^n$, the equation system $\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$ is consistent on $A$. We now consider the problem $z=(f_1,\ldots,f_n)$ and an arbitrary decision tree $\Gamma$ over $z$, which solves the problem $z$ over $U$ and uses both attributes and hypotheses. Let us show that $h(\Gamma)\ge n$. If $n=1$, then the considered inequality holds since $|\Delta_U(z)|\ge 2$. Let $n\ge 2$. It is easy to show that an equation system over $z$ is inconsistent if and only if it contains the equations $f_i(x)=0$ and $f_i(x)=1$ for some $i\in\{1,\ldots,n\}$. For each node $v$ of the decision tree $\Gamma$, we denote by $S_v$ the union of systems of equations attached to edges in the path from the root of $\Gamma$ to $v$. A node $v$ of $\Gamma$ will be called consistent if the equation system $S_v$ is consistent.

We now construct a complete path $\xi$ in the decision tree $\Gamma$, all nodes of which are consistent. We start from the root, which is a consistent node. Let the path reach a consistent node $v$ of $\Gamma$. If $v$ is a terminal node, then the path $\xi$ is constructed. Let $v$ be a working node labeled with an attribute $f_i\in F(z)$. Then, there exists $\delta\in\{0,1\}$ for which the system of equations $S_v\cup\{f_i(x)=\delta\}$ is consistent. Then, the path $\xi$ will pass through the edge leaving $v$ and labeled with the system of equations $\{f_i(x)=\delta\}$. Let $v$ be labeled with a hypothesis $H=\{f_1(x)=\delta_1,\ldots,f_n(x)=\delta_n\}$. If there exists $i\in\{1,\ldots,n\}$ such that the system of equations $S_v\cup\{f_i(x)=\neg\delta_i\}$ is consistent, then the path $\xi$ will pass through the edge leaving $v$ and labeled with the system of equations $\{f_i(x)=\neg\delta_i\}$. Otherwise, $S_v=H$, and the path $\xi$ will pass through the edge leaving $v$ and labeled with the system of equations $H$.

Let all edges in the path $\xi$ be labeled with systems of equations containing one equation each. Since all nodes of $\xi$ are consistent, the equation system $S(\xi)$ is consistent. We now show that $S(\xi)$ contains at least $n$ equations. Let us assume that this system contains fewer than $n$ equations. Then, the set $\Delta_U(z)\pi(\xi)$ contains more than one $n$-tuple, which is impossible. Therefore, the length of the path $\xi$ is at least $n$. Now let there be edges in $\xi$ that are labeled with hypotheses, and let the first edge in $\xi$ labeled with a hypothesis $H$ leave the node $v$. Then, $S_v=H$, and the length of $\xi$ is at least $n$. Therefore, $h(\Gamma)\ge n$, $h_U^{(3)}(z)\ge n$, and $h_U^{(2)}(z)\ge n$. As a result, we obtain $h_U^{(3)}(n)\ge n$ and $h_U^{(2)}(n)\ge n$. Thus, $h_U^{(2)}(n)=n$ and $h_U^{(3)}(n)=n$ for any $n\in\mathbb{N}$. □

5. Proof of Theorem 3

First, we prove several auxiliary statements.

Proposition 1.

$\mathcal{R}\subseteq\mathcal{D}$.

Proof. 

Let $U\in\mathcal{R}$. By Theorem 1, $h_U^{(1)}(n)=\Theta(\log n)$. Let us assume that $U\notin\mathcal{D}$. Then, for any $n\in\mathbb{N}$, there exists a problem $z=(f_1,\ldots,f_n)$ over $U$ such that $|\Delta_U(z)|=2^n$. Let $\Gamma$ be a decision tree over $z$, which solves the problem $z$ relative to $U$ and uses only attributes. Then, $\Gamma$ should have at least $2^n$ terminal nodes. One can show that the number of terminal nodes in the tree $\Gamma$ is at most $2^{h(\Gamma)}$. Then, $2^n\le 2^{h(\Gamma)}$, $h(\Gamma)\ge n$, and $h_U^{(1)}(z)\ge n$. Therefore, $h_U^{(1)}(n)\ge n$ for any $n\in\mathbb{N}$, which is impossible. Thus, $\mathcal{R}\subseteq\mathcal{D}$. □

Proposition 2.

$\mathcal{C}\subseteq\mathcal{D}$.

Proof. 

Let $U\in\mathcal{C}$. By Theorem 2, $h_U^{(2)}(n)=O(1)$. Let us assume that $U\notin\mathcal{D}$. Then, by Theorem 2, $h_U^{(2)}(n)=n$ for any $n\in\mathbb{N}$, which is impossible. Therefore, $\mathcal{C}\subseteq\mathcal{D}$. □

Proposition 3.

$\mathcal{R}\cap\mathcal{C}=\emptyset$.

Proof. 

Assume the contrary: $\mathcal{R}\cap\mathcal{C}\ne\emptyset$ and $U=(A,F)\in\mathcal{R}\cap\mathcal{C}$. Let $r,k\in\mathbb{N}$, $U$ be an $r$-reduced information system, and $U$ be a $k$-information system. We now consider an arbitrary problem $z=(f_1,\ldots,f_n)$ over $U$ and describe a decision tree $\Gamma$ over $z$, which uses only attributes, solves the problem $z$ over $U$, and has depth at most $kr$.

For $i=1,\ldots,n$, let $\delta_i$ be a number from $\{0,1\}$ such that $(A(f_i,\neg\delta_i),F)$ is an $m_i$-information system with $0\le m_i<k$. Let $t$ be the maximum number from the set $\{1,\ldots,n\}$ such that the system of equations $S=\{f_1(x)=\delta_1,\ldots,f_t(x)=\delta_t\}$ is consistent. Then, there exists a subsystem $\{f_{i_1}(x)=\delta_{i_1},\ldots,f_{i_p}(x)=\delta_{i_p}\}$ of the system $S$, which has the same set of solutions as $S$ and for which $p\le r$. For a given $a\in A$, the decision tree $\Gamma$ sequentially computes the values $f_{i_1}(a),\ldots,f_{i_p}(a)$.

If, for some $q\in\{1,\ldots,p\}$, $f_{i_1}(a)=\delta_{i_1},\ldots,f_{i_{q-1}}(a)=\delta_{i_{q-1}}$, and $f_{i_q}(a)=\neg\delta_{i_q}$, then the decision tree $\Gamma$ continues to work with the problem $z$ and the information system $U'=(A',F)$, where $A'$ is the set of solutions on $A$ of the equation system $\{f_{i_1}(x)=\delta_{i_1},\ldots,f_{i_{q-1}}(x)=\delta_{i_{q-1}},f_{i_q}(x)=\neg\delta_{i_q}\}$. We have that $U'$ is an $l$-information system for some $l\le m_{i_q}<k$.

Let $f_{i_1}(a)=\delta_{i_1},\ldots,f_{i_p}(a)=\delta_{i_p}$. If $t=n$, then $(\delta_1,\ldots,\delta_n)$ is the solution of the problem $z$ for the considered element $a$. Let $t<n$. Then, the decision tree $\Gamma$ continues to work with the problem $z$ and the information system $U'=(A',F)$, where $A'$ is the set of solutions on $A$ of the equation system $\{f_{i_1}(x)=\delta_{i_1},\ldots,f_{i_p}(x)=\delta_{i_p}\}$. We know that the equation system $\{f_1(x)=\delta_1,\ldots,f_t(x)=\delta_t,f_{t+1}(x)=\delta_{t+1}\}$ is inconsistent. Therefore, the system $\{f_{i_1}(x)=\delta_{i_1},\ldots,f_{i_p}(x)=\delta_{i_p},f_{t+1}(x)=\delta_{t+1}\}$ is inconsistent. Hence, $A'\subseteq A(f_{t+1},\neg\delta_{t+1})$ and $U'$ is an $l$-information system for some $l\le m_{t+1}<k$.

As a result, after the computation of the values of at most $r$ attributes, we either solve the problem $z$ or reduce the consideration of the problem $z$ over the $k$-information system $U$ to the consideration of the problem $z$ over some $l$-information system, where $l<k$. After the computation of the values of at most $rk$ attributes, we solve the problem $z$, since each problem over a 0-information system has exactly one possible solution. Therefore, $h_U^{(1)}(z)\le rk$ and $h_U^{(1)}(n)=O(1)$. By Theorem 1, $h_U^{(1)}(n)=\Theta(\log n)$. The obtained contradiction shows that $\mathcal{R}\cap\mathcal{C}=\emptyset$. □

Proposition 4.

For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1.

Proof. 

Table 3 contains as rows all three-tuples from the set $\{0,1\}^3$. We now show that the rows with the numbers 5–8 cannot be indicator vectors of infinite binary information systems. Assume the contrary: there is $i\in\{5,6,7,8\}$ such that the row with the number $i$ is the indicator vector of an infinite binary information system $U$. If $i=5$, then $U\in\mathcal{R}$ and $U\notin\mathcal{D}$, but this is impossible, since, by Proposition 1, $\mathcal{R}\subseteq\mathcal{D}$. If $i=6$, then $U\in\mathcal{C}$ and $U\notin\mathcal{D}$, but this is impossible, since, by Proposition 2, $\mathcal{C}\subseteq\mathcal{D}$. If $i=7$, then $U\in\mathcal{R}$ and $U\notin\mathcal{D}$, but this is impossible, since, by Proposition 1, $\mathcal{R}\subseteq\mathcal{D}$. If $i=8$, then $U\in\mathcal{R}$ and $U\in\mathcal{C}$, but this is impossible, since, by Proposition 3, $\mathcal{R}\cap\mathcal{C}=\emptyset$. Therefore, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 3 with the numbers 1–4. Thus, it coincides with one of the rows of Table 1. □

Table 3.

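All 3-tuples from the set $\{0,1\}^3$.

| Row | $\mathcal{R}$ | $\mathcal{D}$ | $\mathcal{C}$ |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 3 | 0 | 1 | 1 |
| 4 | 1 | 1 | 0 |
| 5 | 1 | 0 | 0 |
| 6 | 0 | 0 | 1 |
| 7 | 1 | 0 | 1 |
| 8 | 1 | 1 | 1 |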

Define an infinite binary information system $U_1=(A_1,F_1)$ as follows: $A_1=\mathbb{N}$ and $F_1$ is the set of all functions from $\mathbb{N}$ to $\{0,1\}$.

Lemma 3.

The information system $U_1$ belongs to the class $V_1$.

Proof. 

It is easy to show that the information system $U_1$ has an infinite I-dimension. Therefore, $U_1\notin\mathcal{D}$. Using Proposition 4, we obtain $\mathrm{ind}(U_1)=(0,0,0)$, i.e., $U_1\in V_1$. □

For any $i\in\mathbb{N}$, we define two functions $p_i:\mathbb{N}\to\{0,1\}$ and $l_i:\mathbb{N}\to\{0,1\}$. Let $j\in\mathbb{N}$. Then, $p_i(j)=1$ if and only if $j=i$, and $l_i(j)=1$ if and only if $j>i$.

Define an infinite binary information system $U_2=(A_2,F_2)$ as follows: $A_2=\mathbb{N}$ and $F_2=\{p_i: i\in\mathbb{N}\}\cup\{l_i: i\in\mathbb{N}\}$.

Lemma 4.

The information system $U_2$ belongs to the class $V_2$.

Proof. 

For $n\in\mathbb{N}$, denote $S_n=\{p_1(x)=0,\ldots,p_n(x)=0\}$. One can show that the equation system $S_n$ is consistent and each proper subsystem of $S_n$ has a set of solutions different from the set of solutions of $S_n$. Therefore, $U_2\notin\mathcal{R}$. Using attributes from the set $\{l_i: i\in\mathbb{N}\}$, we can construct a $d$-complete tree over $U_2$ for each $d\in\mathbb{N}$. By Lemma 1 and Theorem 2, $U_2\notin\mathcal{C}$. One can show that $I(U_2)=1$. Therefore, $U_2\in\mathcal{D}$. Thus, $\mathrm{ind}(U_2)=(0,1,0)$, i.e., $U_2\in V_2$. □
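The claim about $S_n$ can be checked mechanically on a finite truncation of $\mathbb{N}$. The sketch below is only an illustration: the helper names are hypothetical, and the universe $\{1,\ldots,20\}$ is an assumed stand-in for $\mathbb{N}$ that is large enough for the values of $n$ tested.

```python
def p(i):
    """Attribute p_i: equals 1 exactly at the element i."""
    return lambda j: int(j == i)


def solutions(system, universe):
    """Elements of `universe` satisfying every equation (attribute, value)."""
    return {a for a in universe if all(f(a) == v for f, v in system)}


def is_irredundant(system, universe):
    """True if dropping any single equation enlarges the solution set.

    Checking single equations is enough: every proper subsystem omits some
    equation, and its solution set contains the one obtained by dropping
    only that equation.
    """
    full = solutions(system, universe)
    return all(
        solutions(system[:k] + system[k + 1:], universe) != full
        for k in range(len(system))
    )


universe = range(1, 21)
for n in range(1, 11):
    s_n = [(p(i), 0) for i in range(1, n + 1)]
    assert solutions(s_n, universe)  # S_n is consistent
    assert is_irredundant(s_n, universe)  # no equation can be dropped
```

Dropping the equation $p_i(x)=0$ readmits the element $i$, so no subsystem with fewer than $n$ equations has the same solutions as $S_n$, and no fixed $r$ can work for all $n$.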

Define an infinite binary information system $U_3=(A_3,F_3)$ as follows: $A_3=\mathbb{N}$ and $F_3=\{p_i : i \in \mathbb{N}\}$.

Lemma 5.

The information system $U_3$ belongs to the class $V_3$.

Proof. 

It is easy to show that $U_3$ is a 1-information system. Therefore, $U_3 \in C$. Using Proposition 4, we obtain $\mathrm{ind}(U_3)=(0,1,1)$, i.e., $U_3 \in V_3$. □

Define an infinite binary information system $U_4=(A_4,F_4)$ as follows: $A_4=\mathbb{N}$ and $F_4=\{l_i : i \in \mathbb{N}\}$.

Lemma 6.

The information system $U_4$ belongs to the class $V_4$.

Proof. 

Let us consider an arbitrary consistent system of equations $S$ over $U_4$. We now show that there is a subsystem of $S$ that has at most two equations and the same set of solutions as $S$. Let $S$ contain both equations of the kind $l_i(x)=1$ and equations of the kind $l_j(x)=0$. Denote $i_0=\max\{i : l_i(x)=1 \in S\}$ and $j_0=\min\{j : l_j(x)=0 \in S\}$. One can show that the system of equations $S'=\{l_{i_0}(x)=1,\ l_{j_0}(x)=0\}$ has the same set of solutions as $S$. The case when $S$ contains, for some $\delta \in \{0,1\}$, only equations of the kind $l_p(x)=\delta$ can be considered in a similar way. In this case, the equation system $S'$ contains only one equation. Therefore, the information system $U_4$ is 2-reduced and $U_4 \in R$. Using Proposition 4, we obtain $\mathrm{ind}(U_4)=(1,1,0)$, i.e., $U_4 \in V_4$. □
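The reduction used in this proof can be sketched in a few lines of Python. An equation $l_i(x)=\delta$ is encoded as the pair `(i, delta)`; the function names, the cutoff `K`, and the assumption $\mathbb{N}=\{1,2,\ldots\}$ are illustrative choices made here, not part of the paper. The check at the end exhaustively compares solution sets for all systems over $l_1,\ldots,l_K$ on the window $\{1,\ldots,K+1\}$, which suffices because these attributes are constant on elements greater than `K`.

```python
from itertools import product

def solutions(system, domain):
    """Elements x of domain satisfying l_i(x) = delta for every (i, delta) in system."""
    return {x for x in domain
            if all((1 if x > i else 0) == delta for i, delta in system)}

def reduce_system(system):
    """Keep the strongest equation of each kind, as in the proof of Lemma 6.

    l_i(x) = 1 means x > i, so the largest such i dominates;
    l_j(x) = 0 means x <= j, so the smallest such j dominates.
    """
    ones = [i for i, delta in system if delta == 1]
    zeros = [j for j, delta in system if delta == 0]
    reduced = []
    if ones:
        reduced.append((max(ones), 1))
    if zeros:
        reduced.append((min(zeros), 0))
    return reduced

K = 4                      # hypothetical cutoff on attribute indices
domain = range(1, K + 2)   # the elements 1..K+1

ok = True
for choice in product((None, 0, 1), repeat=K):
    system = [(i + 1, d) for i, d in enumerate(choice) if d is not None]
    full = solutions(system, domain)
    if not full:           # the lemma concerns consistent systems only
        continue
    reduced = reduce_system(system)
    ok = ok and len(reduced) <= 2 and solutions(reduced, domain) == full

print(ok)
```

For example, `reduce_system([(1, 1), (3, 1), (5, 0), (7, 0)])` keeps only the equations `(3, 1)` and `(5, 0)`, which describe the same elements $3 < x \le 5$.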

Proof of Theorem 3.

From Proposition 4, it follows that, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Using Lemmas 3–6, we conclude that each row of Table 1 is the indicator vector of some infinite binary information system. □

6. Conclusions

Based on the results of exact learning, test theory, and rough set theory, for an arbitrary infinite binary information system, we studied three functions of the Shannon type, which characterize the dependence in the worst case of the minimum depth of a decision tree solving a problem on the number of attributes in the problem description. These three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses, and (iii) decision trees using both attributes and hypotheses. We described possible types of behavior for each of these three functions. We also studied the joint behavior of these functions and distinguished four corresponding complexity classes of infinite binary information systems. In the future, we plan to translate the obtained results into the language of exact learning.

The problems studied in this paper allow us to confine ourselves to considering only the crisp (conventional) sets that are completely defined by attributes. However, in the future, when we investigate approximately defined problems or approximate decision trees, it will be necessary to work with rough sets given by their lower and upper approximations. This will require a wider range of rough set theory techniques than those used in the present paper.

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). The author is greatly indebted to the anonymous reviewers for their useful comments and suggestions.

Funding

Research funded by King Abdullah University of Science and Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.


References

  1. Angluin D. Queries and concept learning. Mach. Learn. 1988;2:319–342. doi: 10.1007/BF00116828.
  2. Pawlak Z. Rough sets. Int. J. Parallel Program. 1982;11:341–356. doi: 10.1007/BF01001956.
  3. Pawlak Z. Rough Sets—Theoretical Aspects of Reasoning about Data. Volume 9. Kluwer; Dordrecht, The Netherlands: 1991. (Theory and Decision Library: Series D).
  4. Pawlak Z., Skowron A. Rudiments of rough sets. Inf. Sci. 2007;177:3–27. doi: 10.1016/j.ins.2006.06.003.
  5. Chegis I.A., Yablonskii S.V. Logical methods of control of work of electric schemes. Trudy Mat. Inst. Steklov. 1958;51:270–360. (In Russian)
  6. Azad M., Chikalov I., Hussain S., Moshkov M. Minimizing depth of decision trees with hypotheses. In: Ramanna S., Cornelis C., Ciucci D., editors. Rough Sets–International Joint Conference, Proceedings of the IJCRS 2021, Bratislava, Slovakia, 19–24 September 2021. Volume 12872. Springer; Cham, Switzerland: 2021. pp. 123–133. Lecture Notes in Computer Science.
  7. Azad M., Chikalov I., Hussain S., Moshkov M. Minimizing number of nodes in decision trees with hypotheses. In: Watrobski J., Salabun W., Toro C., Zanni-Merk C., Howlett R.J., Jain L.C., editors. Proceedings of the 25th International Conference on Knowledge—Based and Intelligent Information & Engineering Systems (KES 2021); Szczecin, Poland. 8–10 September 2021; Amsterdam, The Netherlands: Elsevier; 2021. pp. 232–240.
  8. Azad M., Chikalov I., Hussain S., Moshkov M. Sorting by decision trees with hypotheses (extended abstract). In: Schlingloff H., Vogel T., editors. Proceedings of the 29th International Workshop on Concurrency, Specification and Programming, CS&P 2021; Berlin, Germany. 27–28 September 2021; Aachen, Germany: CEUR-WS.org; 2021. pp. 126–130. CEUR Workshop Proceedings.
  9. Azad M., Chikalov I., Hussain S., Moshkov M. Optimization of decision trees with hypotheses for knowledge representation. Electronics. 2021;10:1580. doi: 10.3390/electronics10131580.
  10. Azad M., Chikalov I., Hussain S., Moshkov M. Entropy-based greedy algorithm for decision trees using hypotheses. Entropy. 2021;23:808. doi: 10.3390/e23070808.
  11. Angluin D. Queries revisited. Theor. Comput. Sci. 2004;313:175–194. doi: 10.1016/j.tcs.2003.11.004.
  12. Littlestone N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Mach. Learn. 1988;2:285–318. doi: 10.1007/BF00116827.
  13. Maass W., Turán G. Lower bound methods and separation results for on-line learning models. Mach. Learn. 1992;9:107–145. doi: 10.1007/BF00992674.
  14. Moshkov M. Conditional tests. In: Yablonskii S.V., editor. Problemy Kibernetiki. Volume 40. Nauka Publishers; Moscow, Russia: 1983. pp. 131–170. (In Russian)
  15. Moshkov M. On depth of conditional tests for tables from closed classes. In: Markov A.A., editor. Combinatorial-Algebraic and Probabilistic Methods of Discrete Analysis. Gorky University Press; Gorky, Russia: 1989. pp. 78–86. (In Russian)
  16. Moshkov M. Time complexity of decision trees. In: Peters J.F., Skowron A., editors. Transactions on Rough Sets III. Volume 3400. Springer; Berlin/Heidelberg, Germany: 2005. pp. 244–459.
  17. Moshkov M. Test theory and problems of machine learning; Proceedings of the International School-Seminar on Discrete Mathematics and Mathematical Cybernetics; Ratmino, Russia. 31 May–3 June 2001; Moscow, Russia: MAX Press; 2001. pp. 6–10.
  18. Pawlak Z. Information systems theoretical foundations. Inf. Syst. 1981;6:205–218. doi: 10.1016/0306-4379(81)90023-5.
  19. Naiman D.Q., Wynn H.P. Independence number and the complexity of families of sets. Discr. Math. 1996;154:203–216. doi: 10.1016/0012-365X(94)00318-D.
  20. Sauer N. On the density of families of sets. J. Comb. Theory A. 1972;13:145–147. doi: 10.1016/0097-3165(72)90019-2.
  21. Shelah S. A combinatorial problem; stability and order for models and theories in infinitary languages. Pac. J. Math. 1972;41:247–261. doi: 10.2140/pjm.1972.41.247.
