. 2020 Jan 7;12038:89–112. doi: 10.1007/978-3-030-40608-0_6

Deciding Classes of Regular Languages: The Covering Approach

Thomas Place
Editors: Alberto Leporati, Carlos Martín-Vide, Dana Shapira, Claudio Zandron
PMCID: PMC7206641

Abstract

We investigate the membership problem that one may associate with every class of languages $\mathcal{C}$. The problem takes a regular language as input and asks whether it belongs to $\mathcal{C}$. In practice, finding an algorithm provides deep insight into the class $\mathcal{C}$. While this problem has a long history, many famous open questions in automata theory are tied to membership. Recently, a breakthrough was made on several of these open questions. This was achieved by considering a more general decision problem than membership: covering. In the paper, we investigate how the new ideas and techniques brought about by the introduction of this problem can be applied to get new insight on earlier results. In particular, we use them to give new proofs for two of the most famous membership results: Schützenberger’s theorem and Simon’s theorem.

Keywords: Regular languages, Automata, Covering, Membership, Star-free languages, Piecewise testable languages

Introduction

Historical Context. A prominent question in formal language theory is to solve the membership problem for classes of regular languages. Given a fixed class $\mathcal{C}$, one must find an algorithm which decides whether an input regular language belongs to $\mathcal{C}$. Such a procedure is called a $\mathcal{C}$-membership algorithm. What motivates this question is the deep insight into the class $\mathcal{C}$ that is usually provided by a solution. Intuitively, being able to formulate an algorithm requires a solid understanding of all languages contained in the class $\mathcal{C}$. In other words, membership is used as a mathematical tool whose purpose is to analyze classes.

This research effort started with a famous theorem of Schützenberger [36] which describes the class of star-free languages (SF). These are the languages that can be expressed by a regular expression using union, concatenation and complement, but not Kleene star. This is a prominent class which admits natural alternate definitions. For example, the star-free languages are those which can be defined in first-order logic [15] or equivalently in linear temporal logic [11]. Schützenberger’s theorem yields an algorithm which decides whether an input regular language is star-free (i.e. an SF-membership algorithm). This provides insight on SF not because of the algorithm itself, but rather because of its proof. Indeed, it includes a generic construction which builds an expression witnessing membership in SF for every input language on which the algorithm answers positively. This result was highly influential and pioneered a very successful line of research. The theorem itself was often revisited [5, 7, 8, 10, 14, 16, 17, 21, 23, 41] and researchers successfully obtained similar results for other prominent classes of languages. Famous examples include the locally testable languages [4, 42] or the piecewise testable languages [38]. However, membership is a difficult question and despite years of investigation, there are still many open problems.

Among these open problems, a famous one is the dot-depth problem. Brzozowski and Cohen [2] defined a natural classification of the star-free languages: the dot-depth hierarchy. Each star-free language is assigned a “complexity level” (called dot-depth) according to the number of alternations between concatenations and complements that are required to define it with an expression. It is known that this hierarchy is strict [3]. Hence, a natural question is whether membership is decidable for each level. This has been a very active research topic since the 70s (see [20, 28, 32] for surveys). Yet, only the first two levels are known to be decidable so far. An algorithm for dot-depth one was published by Knast in 1983 [13]. Despite a lot of partial results along the way, it took thirty more years to solve the next level: the decidability of dot-depth two was shown in 2014 [26, 33]. This situation is easily explained: in practice, getting new membership results always required new conceptual ideas and techniques. In the paper, we are interested in the ideas that led to a solution for dot-depth two. The key ingredient was a new more general decision problem called covering.

Covering. The problem was first considered implicitly in [26] and properly defined later in [31]. Given a class $\mathcal{C}$, the $\mathcal{C}$-covering problem is as follows. The input consists of two objects: a regular language L and a finite set of regular languages $\mathbf{L}$. One must decide whether there exists a $\mathcal{C}$-cover $\mathbf{K}$ of L (a finite set of languages in $\mathcal{C}$ whose union includes L) such that no language in $\mathbf{K}$ intersects all languages in $\mathbf{L}$. Naturally, this definition is more involved than the one of membership and it is more difficult to find an algorithm for $\mathcal{C}$-covering than for $\mathcal{C}$-membership. Yet, covering was recently shown to be decidable for many natural classes (see for example [6, 24, 25, 30, 34, 35]) including the star-free languages [29].

At the time of its introduction, there were two motivations for investigating this new question. First, while harder, covering is also more rewarding than membership: it yields a more robust understanding of the classes. Indeed, a $\mathcal{C}$-membership algorithm only yields benefits for the languages of $\mathcal{C}$: we manage to detect them and to build a description witnessing this membership. On the other hand, a $\mathcal{C}$-covering algorithm applies to arbitrary languages. One may view $\mathcal{C}$-covering as an approximation problem: on inputs L and $\mathbf{L}$, we want to over-approximate L with a $\mathcal{C}$-cover while $\mathbf{L}$ specifies what an acceptable approximation is. A second key motivation was the application to the dot-depth hierarchy. It turns out that all recent membership results for this hierarchy rely heavily on covering arguments. More precisely, they are based on techniques that allow one to lift covering results for a level in the hierarchy to membership results for a higher level (see [32] for a detailed explanation).

Contribution. In the paper, we are not looking to provide new covering algorithms. Instead, we look at a slightly different question. As we explained, finding an algorithm for $\mathcal{C}$-covering is even harder than for $\mathcal{C}$-membership. Consequently, the recent breakthroughs that were made on this question required developing new ideas, new techniques and new ways to formulate intricate proof arguments. In the paper, we look back at the original membership problem and investigate how these new developments can be applied to get new insight on earlier results. We prove that even if one is only interested in membership, reasoning in terms of “covers” is quite natural and rather intuitive when presenting proof arguments. In particular, $\mathcal{C}$-covers are a very powerful tool for presenting generic constructions which build descriptions of languages in the class $\mathcal{C}$. We illustrate this point by using covers to give new intuitive proofs for two of the most important membership results in the literature: Schützenberger’s theorem [36] for the star-free languages and Simon’s theorem [38] for the piecewise testable languages.

Organization of the Paper. We first recall standard terminology about regular languages and define membership in Sect. 2. We introduce covering in Sect. 3 and explain why reasoning in terms of covers is intuitive and relevant even if one is only interested in membership. We illustrate this point in Sect. 4 with a new proof of Schützenberger’s theorem. Finally, we present a second example in Sect. 5 with a new proof of Simon’s theorem.

Preliminaries

In this section, we briefly recall standard terminology about finite words and classes of regular languages. Moreover, we introduce the membership problem.

Regular Languages. An alphabet is a finite set A. As usual, $A^*$ denotes the set of all words over A, including the empty word $\varepsilon$. For $w \in A^*$, we write $|w|$ for the length of w (i.e. the number of letters in w). Moreover, for $u, v \in A^*$, we denote by uv the word obtained by concatenating u and v.

Given an alphabet A, a language (over A) is a subset of $A^*$. Abusing terminology, we shall often denote by u the singleton language $\{u\}$. We lift concatenation to languages: for $K, L \subseteq A^*$, we let $KL = \{uv \mid u \in K \text{ and } v \in L\}$. Finally, we use Kleene star: if $K \subseteq A^*$, $K^+$ denotes the union of all languages $K^n$ for $n \geq 1$ and $K^* = K^+ \cup \{\varepsilon\}$. In the paper, we only consider regular languages. These are the languages that can be equivalently defined by regular expressions, monadic second-order logic, finite automata or finite monoids. We shall use the definition based on monoids which we briefly recall now (see [21] for details).

A monoid is a set M endowed with an associative multiplication $(s, t) \mapsto s \cdot t$ (also denoted by st) having a neutral element $1_M$. An idempotent of a monoid M is an element $e \in M$ such that $ee = e$. It is folklore that for any finite monoid M, there exists a natural number $\omega(M)$ (denoted by $\omega$ when M is understood) such that $s^\omega$ is an idempotent for every $s \in M$. Observe that $A^*$ is a monoid whose multiplication is concatenation (the neutral element is $\varepsilon$). Thus, we may consider monoid morphisms $\alpha: A^* \to M$ where M is an arbitrary monoid. Given such a morphism and $L \subseteq A^*$, we say that L is recognized by $\alpha$ when there exists a set $F \subseteq M$ such that $L = \alpha^{-1}(F)$. A language L is regular if and only if it is recognized by a morphism into a finite monoid.
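The two notions just recalled (idempotent power and recognition by a morphism) can be made concrete on a small case. The following is only a minimal sketch under stated assumptions: the monoid is given as a list of elements plus a multiplication function, the example monoid is the integers modulo 2 under addition, and the morphism `alpha` sends every letter to 1. All names (`idempotent_power`, `alpha`, `recognized`) are hypothetical and chosen for illustration; the function returns the least suitable exponent, whereas the text only asks for some exponent.

```python
def idempotent_power(elements, mul):
    """Return the least n >= 1 such that s^n is idempotent for every s.

    Assumes `elements` lists a finite monoid and `mul` is its multiplication;
    finiteness guarantees that the search terminates.
    """
    def power(s, n):
        result = s
        for _ in range(n - 1):
            result = mul(result, s)
        return result

    n = 1
    while True:
        if all(mul(power(s, n), power(s, n)) == power(s, n) for s in elements):
            return n
        n += 1


# Hypothetical example: the two-element monoid Z/2Z under addition.
Z2 = [0, 1]

def add_mod2(s, t):
    return (s + t) % 2

def alpha(word):
    """Morphism from A* to Z/2Z mapping every letter to 1 (length modulo 2)."""
    return len(word) % 2

def recognized(word, accepting):
    """Membership of `word` in the language alpha^{-1}(accepting)."""
    return alpha(word) in accepting
```

With the accepting set `{0}`, this morphism recognizes the words of even length: `recognized("abab", {0})` holds while `recognized("aba", {0})` does not.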

Classes. We investigate classes of languages. Mathematically speaking, a class of languages $\mathcal{C}$ is a correspondence $A \mapsto \mathcal{C}(A)$ which associates a (possibly infinite) set of languages $\mathcal{C}(A)$ over A to every alphabet A. For the sake of avoiding clutter, we shall often abuse terminology and omit the alphabet when manipulating classes. That is, whenever A is fixed and understood, we directly write $L \in \mathcal{C}$ to indicate that some language $L \subseteq A^*$ belongs to $\mathcal{C}(A)$.

While this is the mathematical definition, in practice, the term “class” is used to indicate that $\mathcal{C}$ is presented in a specific way. Typically, classes are tied to a particular syntax used to describe all the languages they contain. For example, the regular languages are tied to regular expressions and monadic second-order logic. Consequently, the classes that we consider in practice are natural and have robust properties that we present now.

A lattice is a class $\mathcal{C}$ which is closed under finite union and intersection: for every alphabet A, we have $\emptyset, A^* \in \mathcal{C}(A)$ and for every $K, L \in \mathcal{C}(A)$, we have $K \cup L, K \cap L \in \mathcal{C}(A)$. Moreover, a Boolean algebra is a lattice $\mathcal{C}$ which is additionally closed under complement: for every alphabet A and $L \in \mathcal{C}(A)$, we have $A^* \setminus L \in \mathcal{C}(A)$. Finally, we say that a class $\mathcal{C}$ is quotient-closed when for every alphabet A, every $L \in \mathcal{C}(A)$ and every $w \in A^*$, the following two languages belong to $\mathcal{C}(A)$ as well:

$$w^{-1}L = \{u \in A^* \mid wu \in L\} \quad\text{and}\quad Lw^{-1} = \{u \in A^* \mid uw \in L\}.$$

The techniques that we discuss in the paper are meant to be applied for classes that are quotient-closed lattices and contain only regular languages. The two examples that we detail are quotient-closed Boolean algebras of regular languages.

Membership. When encountering a new class $\mathcal{C}$, a natural objective is to precisely understand the languages it contains. In other words, we want to understand what properties can be expressed with the syntax defining $\mathcal{C}$. Of course, this is an informal objective. In practice, we rely on a decision problem called membership which we use as a mathematical tool to approach this question.

The problem is parameterized by an arbitrary class of languages $\mathcal{C}$: we speak of $\mathcal{C}$-membership. It takes as input a regular language L and asks whether L belongs to $\mathcal{C}$. The key idea is that obtaining an algorithm for $\mathcal{C}$-membership is not possible without a solid understanding of $\mathcal{C}$. In the literature, such an algorithm is also called a decidable characterization of $\mathcal{C}$.

Remark 1

We are not only interested in $\mathcal{C}$-membership algorithms themselves but also in their correctness proofs. In practice, the deep insight that we obtain on the class $\mathcal{C}$ comes from these proofs. Typically, the difficult part in such an argument is to prove that a membership algorithm is sound: when it answers positively, one must prove that the input language does belong to $\mathcal{C}$. This usually requires a generic construction for building a syntactic description of the language witnessing its membership in $\mathcal{C}$.    □

Finding membership algorithms has been an important quest for a long time in formal language theory. The solutions that were obtained for important classes are milestones in the theory of regular languages [13, 22, 33, 36, 38, 40]. In the paper, we prove two of them: Schützenberger’s theorem [36] and Simon’s theorem [38]. We frame these proofs using a new formalism based on a more general problem which was recently introduced [31]: covering.

The Covering Problem

The covering problem generalizes membership. It was first considered implicitly in [26, 27] and was later formalized in [31] (along with a detailed framework designed for handling it). At the time, its introduction was motivated by two reasons. First, an algorithm for covering is usually more rewarding than an algorithm for membership as the former provides more insight on the investigated class of languages. Second, covering was introduced as a key ingredient for handling difficult membership questions. For several important classes, membership is effectively reducible to covering for another simpler class. Recently, this idea was applied to prominent hierarchies of classes called “concatenation hierarchies” (see the surveys [28, 32] for details on these results).

In the paper, we are interested in covering for a slightly different reason. In particular, we do not present any covering algorithm. Instead, we look at how the new ideas that were recently introduced with covering in mind can be applied in the simpler membership setting. It turns out that even for the early membership results, reasoning in terms of covers is quite natural and allows one to present arguments in a very intuitive way. We manage to formulate new proof arguments for two famous membership algorithms.

We first define covering and explain why it generalizes membership as a decision problem. Then, we come back to membership and briefly recall the general approach that is usually followed in order to handle it. We show that this approach can actually be formulated in a convenient and natural way with covering. For the sake of avoiding clutter, we fix an arbitrary alphabet A for the presentation: all languages that we consider are over A.

Definition

Similarly to membership, covering is parameterized by an arbitrary class of languages $\mathcal{C}$: we speak of $\mathcal{C}$-covering. It is designed with the same objective in mind: it serves as a mathematical tool for investigating the class $\mathcal{C}$.

For a class $\mathcal{C}$, the $\mathcal{C}$-covering problem takes a language L and a finite set of languages $\mathbf{L}$ as input. It asks whether there exists a $\mathcal{C}$-cover of L which is separating for $\mathbf{L}$. Let us first define these two notions.

Given a language L, a cover of L is a finite set of languages $\mathbf{K}$ such that $L \subseteq \bigcup_{K \in \mathbf{K}} K$. Additionally, given some class $\mathcal{C}$, a $\mathcal{C}$-cover of L is a cover $\mathbf{K}$ of L such that every $K \in \mathbf{K}$ belongs to $\mathcal{C}$.

Moreover, given two finite sets of languages $\mathbf{K}$ and $\mathbf{L}$, we say that $\mathbf{K}$ is separating for $\mathbf{L}$ if for every $K \in \mathbf{K}$, there exists $L \in \mathbf{L}$ which satisfies $K \cap L = \emptyset$. In other words, there exists no language in $\mathbf{K}$ which intersects all languages in $\mathbf{L}$. Given a class $\mathcal{C}$, the $\mathcal{C}$-covering problem is now defined as follows:

INPUT: A regular language L and a finite set of regular languages $\mathbf{L}$.

OUTPUT: Does there exist a $\mathcal{C}$-cover of L which is separating for $\mathbf{L}$?
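The two conditions in this definition (being a cover, and being separating) are plain set-theoretic checks. The sketch below only illustrates them on finite languages represented as Python sets of strings, which is an assumption made for the example: actual inputs are regular languages, typically infinite, and the sketch says nothing about how to decide covering for a class. The function names are hypothetical.

```python
def is_cover(language, family):
    """True when the union of the languages in `family` includes `language`."""
    return language <= set().union(*family)


def is_separating(family_k, family_l):
    """True when no language of `family_k` intersects all languages of `family_l`.

    Equivalently: every K in family_k is disjoint from at least one L in family_l.
    """
    return all(any(k.isdisjoint(l) for l in family_l) for k in family_k)


# Hypothetical finite example.
L = {"a", "ab", "b"}
constraints = [{"a"}, {"b"}]

fine_cover = [{"a", "ab"}, {"b"}]    # each piece avoids one constraint
coarse_cover = [{"a", "ab", "b"}]    # the single piece meets both constraints
```

Here both families cover `L`, but only `fine_cover` is separating for `constraints`: the single language of `coarse_cover` intersects both `{"a"}` and `{"b"}`.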

A simple observation is that covering generalizes another well-known decision problem called separation. Given a class $\mathcal{C}$ and two languages $L_1$ and $L_2$, we say that $L_1$ is $\mathcal{C}$-separable from $L_2$ when there exists a third language $K \in \mathcal{C}$ such that $L_1 \subseteq K$ and $K \cap L_2 = \emptyset$. We have the following lemma (see [31] for a proof).

Lemma 2

Let $\mathcal{C}$ be a lattice and $L_1, L_2$ two languages. Then $L_1$ is $\mathcal{C}$-separable from $L_2$ if and only if there exists a $\mathcal{C}$-cover of $L_1$ which is separating for $\{L_2\}$.

Lemma 2 proves that $\mathcal{C}$-covering generalizes $\mathcal{C}$-membership as a decision problem. Indeed, given as input a regular language L, it is immediate that L belongs to $\mathcal{C}$ if and only if L is $\mathcal{C}$-separable from $A^* \setminus L$ (which is also regular). Thus, there exists an effective reduction from $\mathcal{C}$-membership to $\mathcal{C}$-covering.

Yet, this is not the only connection between membership and covering. More importantly, this is not how we use covering in the paper. While each membership algorithm existing in the literature is based on unique ideas (specific to the class under investigation), most of them are formulated and proved within a standard common framework. It turns out that this framework boils down to a particular kind of covering question: this is the property that we shall exploit in the paper.

Application to Membership

We first summarize the standard general approach that is commonly used to handle membership questions and formulate solutions. Historically, this approach was initiated by Schützenberger who applied it to obtain the first known membership algorithm [36] (for the class of star-free languages). We shall detail and prove this result in Sect. 4.

The syntactic approach. Obtaining a membership algorithm for a given class $\mathcal{C}$ is intuitively hard, as it requires deciding a semantic property which may not be apparent on the piece of syntax that defines the input regular language L (be it a regular expression, an automaton or a monoid morphism). To mitigate this issue, the syntactic approach relies on the existence of a canonical recognizer for any given regular language. The idea is that while belonging to $\mathcal{C}$ may not be apparent on an arbitrary syntax for L, it should be apparent on a canonical representation of L. Typically, the syntactic morphism of L serves as this canonical representation. As the name suggests, this object is a canonical morphism into a finite monoid which recognizes L (and can be computed from any representation of L).

Let us first define the syntactic morphism properly. Consider a language L. One may associate a canonical equivalence relation $\equiv_L$ over $A^*$ to L. Given two words $u, v \in A^*$, we write,

$$u \equiv_L v \quad\text{if and only if}\quad xuy \in L \Leftrightarrow xvy \in L \text{ for every } x, y \in A^*.$$

Clearly, $\equiv_L$ is an equivalence relation and one may verify that it is a congruence for word concatenation: for every $u, u', v, v' \in A^*$, if $u \equiv_L v$ and $u' \equiv_L v'$, then $uu' \equiv_L vv'$. Consequently, the quotient set $A^*/{\equiv_L}$ is a monoid called the syntactic monoid of L. Moreover, the map $\alpha: A^* \to A^*/{\equiv_L}$ which maps each word to its $\equiv_L$-class is a monoid morphism called the syntactic morphism of L. In particular, this morphism recognizes the language L: $L = \alpha^{-1}(F)$ where F is the set of all $\equiv_L$-classes which intersect L. It is well-known and simple to verify that L is regular if and only if its syntactic monoid is finite. Moreover, in that case, one may compute the syntactic morphism of L from any representation of L (such as an automaton or an arbitrary monoid morphism recognizing L).
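One common way to carry out the computation mentioned above relies on a standard fact that is not stated in the text: the transition monoid of the minimal complete deterministic automaton of L is isomorphic to the syntactic monoid of L. The sketch below assumes that the input automaton is already minimal and complete, represents each monoid element as a tuple giving the image of every state, and uses hypothetical names throughout (`transition_monoid`, `AB_STAR`).

```python
def transition_monoid(n_states, alphabet, delta):
    """Enumerate the transformations of states induced by all words.

    `delta[(q, a)]` is the successor of state q on letter a (complete DFA).
    Returns a dict mapping each transformation (a tuple) to one shortest
    word inducing it; the empty word induces the identity.
    """
    identity = tuple(range(n_states))
    letters = {a: tuple(delta[(q, a)] for q in range(n_states)) for a in alphabet}
    seen = {identity: ""}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for t in frontier:
            for a in alphabet:
                # Read the word for t first, then the letter a.
                u = tuple(letters[a][q] for q in t)
                if u not in seen:
                    seen[u] = seen[t] + a
                    next_frontier.append(u)
        frontier = next_frontier
    return seen


# Assumed minimal complete DFA for (ab)* over {a, b}:
# state 0 is initial and accepting, state 1 means "just read a", state 2 is a sink.
AB_STAR = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
monoid = transition_monoid(3, "ab", AB_STAR)
```

On this automaton the enumeration stops with six elements, represented by the words ε, a, b, ab, ba and aa (the last one acting as a zero). The syntactic morphism then maps a word to the transformation it induces.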

We are ready to present the key result behind the syntactic approach: for every quotient-closed Boolean algebra $\mathcal{C}$, membership of an arbitrary regular language in $\mathcal{C}$ depends only on its syntactic morphism. This claim is formalized with the following standard result.

Proposition 3

Let $\mathcal{C}$ be a quotient-closed Boolean algebra, L a regular language and $\alpha$ its syntactic morphism. Then L belongs to $\mathcal{C}$ if and only if every language recognized by $\alpha$ belongs to $\mathcal{C}$.

Proof

The right to left implication is immediate since L is recognized by its syntactic morphism. We concentrate on the converse one. Assume that $L \in \mathcal{C}$. We show that every language recognized by $\alpha$ belongs to $\mathcal{C}$ as well. By definition, these languages are exactly the unions of $\equiv_L$-classes. Thus, since $\mathcal{C}$ is closed under union, it suffices to show that every $\equiv_L$-class belongs to $\mathcal{C}$. Observe that the definition of $\equiv_L$ can be reformulated as follows. Given $u, v \in A^*$, we have,

$$u \equiv_L v \quad\text{if and only if}\quad u \in x^{-1}Ly^{-1} \Leftrightarrow v \in x^{-1}Ly^{-1} \text{ for every } x, y \in A^*.$$

Let $x, y \in A^*$. Since L is recognized by $\alpha$, it is clear that whether some word $w \in A^*$ belongs to $x^{-1}Ly^{-1}$ depends only on its image $\alpha(w)$. In other words, $x^{-1}Ly^{-1}$ is recognized by $\alpha$. Moreover, since L is regular, its syntactic monoid is finite which implies that $\alpha$ recognizes finitely many languages. Thus, while there are infinitely many words $x, y \in A^*$, there are finitely many languages $x^{-1}Ly^{-1}$.

Altogether, we obtain that every $\equiv_L$-class is a finite Boolean combination of languages $x^{-1}Ly^{-1}$ where $x, y \in A^*$. Since $L \in \mathcal{C}$ and $\mathcal{C}$ is quotient-closed, every such language belongs to $\mathcal{C}$. Hence, since $\mathcal{C}$ is a Boolean algebra, we conclude that every $\equiv_L$-class belongs to $\mathcal{C}$, completing the proof.   □

Proposition 3 implies that membership of a regular language L in some fixed quotient-closed Boolean algebra is equivalent to some property of an algebraic abstraction of L: its syntactic morphism. In particular, this is independent from the accepting set $F \subseteq M$. By itself, this is a simple result. Yet, it captures the gist of the syntactic approach.

Naturally, the proposition says nothing about the actual property of the syntactic morphism that one should look for. This question is specific to each particular class $\mathcal{C}$: one has to find the right decidable property characterizing $\mathcal{C}$.

Remark 4

This may seem counterintuitive. We replaced the question of deciding whether a single language belongs to the class $\mathcal{C}$ by an intuitively harder one: deciding whether all languages recognized by a given monoid morphism belong to $\mathcal{C}$. The idea is that the set of languages recognized by a morphism has a structure which can be exploited in membership arguments.    □

Remark 5

Proposition 3 is restricted to quotient-closed Boolean algebras. This excludes quotient-closed lattices that are not closed under complement. One may generalize the syntactic approach to such classes (as done by Pin [19]). We do not discuss this as our two examples are quotient-closed Boolean algebras.    □

Back to Covering. We proved that for every quotient-closed Boolean algebra $\mathcal{C}$, the associated membership problem boils down to deciding whether all languages recognized by an input morphism belong to $\mathcal{C}$. It turns out that this new question is a particular instance of $\mathcal{C}$-covering. In order to explain this properly, we require a last definition.

Consider a morphism $\alpha: A^* \to M$ into a finite monoid M and a finite set of languages $\mathbf{K}$. We say that $\mathbf{K}$ is confined by $\alpha$ if it is separating for the set $\{\alpha^{-1}(M \setminus \{s\}) \mid s \in M\}$. The following fact can be verified from the definitions and reformulates this property in a way that is easier to manipulate.

Fact 6

Let $\alpha: A^* \to M$ be a morphism into a finite monoid and $\mathbf{K}$ a finite set of languages. Then $\mathbf{K}$ is confined by $\alpha$ if and only if for every $K \in \mathbf{K}$, there exists $s \in M$ such that $K \subseteq \alpha^{-1}(s)$.

Proof

By definition $\mathbf{K}$ is confined by $\alpha$ if and only if for every $K \in \mathbf{K}$, there exists $s \in M$ such that $K \cap \alpha^{-1}(M \setminus \{s\}) = \emptyset$. Since $\alpha^{-1}(s) = A^* \setminus \alpha^{-1}(M \setminus \{s\})$, the fact follows.    □
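The reformulation in Fact 6 is easy to test on samples: a family is confined exactly when each of its languages has at most one image under the morphism. The sketch below assumes finite languages given as Python sets and a morphism given as a Python function on words; both are simplifications for illustration, and the names (`is_confined`, `length_parity`) are hypothetical.

```python
def is_confined(family, alpha):
    """Check Fact 6 on finite languages: every K lies inside a single alpha-class.

    An empty language is trivially confined (it has no image at all).
    """
    return all(len({alpha(w) for w in K}) <= 1 for K in family)


def length_parity(word):
    """Hypothetical morphism into Z/2Z: the length of the word modulo 2."""
    return len(word) % 2


by_parity = [{"", "ab", "ba"}, {"a", "aba"}]   # each piece has one parity
mixed = [{"a", "ab"}]                          # mixes odd and even lengths
```

The family `by_parity` is confined by `length_parity` while `mixed` is not, since `"a"` and `"ab"` have distinct images.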

We show that given a lattice $\mathcal{C}$ and a morphism $\alpha: A^* \to M$ into a finite monoid, all languages recognized by $\alpha$ belong to $\mathcal{C}$ if and only if there exists a $\mathcal{C}$-cover of $A^*$ which is confined by $\alpha$. The latter question is a particular case of $\mathcal{C}$-covering. In fact, we prove a slightly more general result that we shall need later when dealing with our two examples.

Proposition 7

Let $\mathcal{C}$ be a lattice, $\alpha: A^* \to M$ a morphism into a finite monoid and $H \in \mathcal{C}$ a language. The two following properties are equivalent:

  1. For every language L recognized by $\alpha$, we have $L \cap H \in \mathcal{C}$.

  2. There exists a $\mathcal{C}$-cover of H which is confined by $\alpha$.

Proof

Assume first that $L \cap H \in \mathcal{C}$ for every language L recognized by $\alpha$. We define $\mathbf{K} = \{\alpha^{-1}(s) \cap H \mid s \in M\}$. Clearly, $\mathbf{K}$ is a cover of H and it is a $\mathcal{C}$-cover by hypothesis. Moreover, it is clear from Fact 6 that $\mathbf{K}$ is confined by $\alpha$.

For the converse direction, assume that there exists a $\mathcal{C}$-cover $\mathbf{K}$ of H which is confined by $\alpha$. Let L be a language recognized by $\alpha$, we show that,

$$L \cap H = H \cap \Bigl(\bigcup_{\{K \in \mathbf{K} \mid K \cap L \neq \emptyset\}} K\Bigr).$$

This implies that $L \cap H \in \mathcal{C}$ since $H \in \mathcal{C}$, every language in $\mathbf{K}$ belongs to $\mathcal{C}$ and $\mathcal{C}$ is a lattice. The left to right inclusion is immediate since $\mathbf{K}$ is a cover of H. We prove the converse one. Let $K \in \mathbf{K}$ such that $K \cap L \neq \emptyset$, we show that $K \subseteq L$. Let $w \in K$. Consider $w' \in K \cap L$ (which is nonempty by definition of K). Since $w, w' \in K$ and $\mathbf{K}$ is confined by $\alpha$, we have $\alpha(w) = \alpha(w')$ by Fact 6. Thus, since $w' \in L$ and L is recognized by $\alpha$, it follows that $w \in L$, concluding the proof: we obtain $K \subseteq L$.    □

Let us combine Propositions 3 and 7. When put together, they imply that for every quotient-closed Boolean algebra $\mathcal{C}$, a regular language L belongs to $\mathcal{C}$ if and only if there exists a $\mathcal{C}$-cover of $A^*$ which is confined by the syntactic morphism of L.

The key point is that this formulation is very convenient when writing proof arguments. As we explained in Remark 1, the technical core of membership proofs consists in generic constructions which build descriptions of languages in $\mathcal{C}$. It turns out that building a $\mathcal{C}$-cover which is confined by some input morphism is an objective that is much easier to manipulate than directly proving that all languages recognized by the morphism belong to $\mathcal{C}$. We illustrate this point in the next sections with new proofs for two well-known membership algorithms: the star-free languages and the piecewise testable languages.

Star-Free Languages and Schützenberger’s Theorem

We now illustrate the discussion of the previous section with a first example: Schützenberger’s theorem [36]. This result is important as it started the quest for membership algorithms. It provides such an algorithm for a very famous class: the star-free languages (SF). Informally, these are the languages which can be defined by a regular expression in which the Kleene star is disallowed (hence the name “star-free”) but a new operator for the complement operation is allowed instead. This class is important as it admits several natural alternate definitions. For example, the star-free languages are those which can be defined in first-order logic [15] or equivalently in linear temporal logic [11].

Schützenberger’s theorem states an algebraic characterization of SF: a regular language is star-free if and only if its syntactic monoid is aperiodic. This yields an algorithm for SF-membership as aperiodicity is a decidable property of finite monoids. Historically, Schützenberger’s theorem was the first result of its kind. It motivated the systematic investigation of the membership problem for important classes of languages. It is often viewed as one of the most important results of automata theory. This claim is supported by the number of times this theorem has been revisited over the years and the wealth of existing proofs [5, 7, 8, 10, 14, 16, 17, 21, 23, 41].

In this section, we present our own proof, based on SF-covers. Let us point out that while the formulation is new, the original ideas behind the argument can be traced back to the proof of Wilke [41]. We first recall the definition of the star-free languages. Then, we state the theorem properly and present the proof.

Definition

Let us define the class of star-free languages (SF). For every alphabet A, $\mathrm{SF}(A)$ is the least set containing $\emptyset$ and all singletons $\{a\}$ for $a \in A$, which is closed under union, complement and concatenation. That is, for every $K, L \in \mathrm{SF}(A)$, the languages $K \cup L$, $A^* \setminus K$ and KL belong to $\mathrm{SF}(A)$ as well.

Example 8

For every sub-alphabet $B \subseteq A$, we have $B^* \in \mathrm{SF}(A)$. Indeed, by closure under complement, $A^* = A^* \setminus \emptyset \in \mathrm{SF}(A)$. We then get $A^*aA^* \in \mathrm{SF}(A)$ for every $a \in A$ by closure under concatenation. Finally, this yields,

$$B^* = A^* \setminus \Bigl(\bigcup_{a \in A \setminus B} A^*aA^*\Bigr) \in \mathrm{SF}(A).$$

Another standard example is $(ab)^*$ (where a, b are two distinct letters of A). Indeed, $(ab)^*$ is the complement of $bA^* \cup A^*a \cup A^*aaA^* \cup A^*bbA^*$ (provided that $A = \{a, b\}$) which is clearly star-free.    □
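The second identity of Example 8 can be sanity-checked by brute force on short words. The sketch below assumes the alphabet {a, b} and compares, for all words up to a chosen length, membership in (ab)* with membership in the union bA* ∪ A*a ∪ A*aaA* ∪ A*bbA*; a word should belong to exactly one of the two. This is only a bounded check, not a proof, and the function names are hypothetical.

```python
from itertools import product


def words(alphabet, max_len):
    """Yield every word over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def in_ab_star(w):
    """Direct test for membership in (ab)*."""
    return len(w) % 2 == 0 and all(w[i:i + 2] == "ab" for i in range(0, len(w), 2))


def in_forbidden_union(w):
    """Membership in bA* ∪ A*a ∪ A*aaA* ∪ A*bbA* over the alphabet {a, b}."""
    return w.startswith("b") or w.endswith("a") or "aa" in w or "bb" in w


def complement_identity_holds(max_len):
    """True when each word up to `max_len` lies in exactly one of the two languages."""
    return all(in_ab_star(w) != in_forbidden_union(w) for w in words("ab", max_len))
```

Calling `complement_identity_holds(8)` examines the 511 words of length at most 8 over {a, b}.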

By definition, SF is a Boolean algebra and one may verify that it is quotient-closed (the details are left to the reader). We complete the definition with a standard property that we require to prove the “easy” direction of Schützenberger’s theorem (every star-free language has an aperiodic syntactic monoid). Another typical application of this property is to show that examples of languages are not star-free. For example, $(AA)^*$ (words with even length) is not star-free since it does not satisfy the following lemma.

Lemma 9

Let A be an alphabet and $L \in \mathrm{SF}(A)$. There exists a number $k \geq 1$ such that for every $\ell \geq k$ and $w \in A^*$, we have $w^{\ell} \equiv_L w^{\ell+1}$.

Proof

We proceed by structural induction on the definition of L as a star-free language. When $L = \emptyset$, it is clear that the lemma holds for $k = 1$. When $L = \{a\}$ for $a \in A$, one may verify that the lemma holds for $k = 2$. We turn to the inductive cases. Assume first that $L = L_1 \cup L_2$ where $L_1, L_2$ are simpler languages. Induction yields $k_1, k_2 \geq 1$ such that for $i = 1, 2$, if $\ell \geq k_i$ and $w \in A^*$, we have $w^{\ell} \equiv_{L_i} w^{\ell+1}$. Hence, the lemma holds for $k = \max(k_1, k_2)$ in that case. We turn to complement: $L = A^* \setminus H$ where $H$ is a simpler language. By induction, we get $h \geq 1$ such that for every $\ell \geq h$ and $w \in A^*$, we have $w^{\ell} \equiv_H w^{\ell+1}$. Clearly, the lemma holds for $k = h$.

We now consider concatenation: Inline graphic where Inline graphic are simpler languages. Induction yields Inline graphic such that for Inline graphic, if Inline graphic and Inline graphic, we have Inline graphic. Let m be the maximum between Inline graphic and Inline graphic. We prove that the lemma holds for Inline graphic. Let Inline graphic and Inline graphic, we have to show that Inline graphic, i.e. Inline graphic for every Inline graphic. We concentrate on the right to left implication (the converse one is symmetrical). Assume that Inline graphic. Since Inline graphic, we get Inline graphic and Inline graphic such that Inline graphic. Since Inline graphic, it follows that either Inline graphic is a prefix of Inline graphic or Inline graphic is a suffix of Inline graphic. By symmetry, we assume that the former property holds: we have Inline graphic for some Inline graphic. Observe that since Inline graphic, it follows that Inline graphic. Moreover, we have Inline graphic by definition of m. Since Inline graphic, we know therefore that Inline graphic by definition of Inline graphic. Thus, Inline graphic. Since Inline graphic, this yields Inline graphic, concluding the proof.    Inline graphic
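The way Lemma 9 rules out a language can be illustrated on small inputs. The Python sketch below assumes the usual reading of the lemma: beyond some threshold k, the word u v^m w belongs to L if and only if u v^(m+1) w does. The function names, the two sample languages and the bounded `horizon` are illustrative choices, not part of the paper, and a finite check of this kind can refute the property but never prove it.

```python
from itertools import product


def in_even_length(word):
    """Membership in the language of words with even length."""
    return len(word) % 2 == 0


def contains_a(word):
    """Membership in the language of words containing the letter a."""
    return "a" in word


def stabilizes(member, u, v, w, k, horizon=20):
    """Check that u v^m w and u v^(m+1) w agree on membership for k <= m < k + horizon."""
    return all(
        member(u + v * m + w) == member(u + v * (m + 1) + w)
        for m in range(k, k + horizon)
    )


def small_words(alphabet, max_len):
    """All words over the alphabet of length at most max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]


# Even length: with u and w empty and v = "a", no threshold k works.
print(any(stabilizes(in_even_length, "", "a", "", k) for k in range(1, 10)))

# Words containing a: the threshold k = 1 passes on all short words.
words = small_words("ab", 2)
print(all(stabilizes(contains_a, u, v, w, 1) for u in words for v in words for w in words))
```

For the even-length language, adding one more copy of the letter a always flips membership, which is exactly the behaviour the lemma forbids for star-free languages.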

Schützenberger’s Theorem

We may now present and prove Schützenberger’s theorem. Let us first define aperiodic monoids. There are several equivalent definitions in the literature. We use an equational one based on the idempotent power Inline graphic available in finite monoids. A finite monoid M is aperiodic when it satisfies the following property:

$$\text{For every } s \in M, \quad s^{\omega+1} = s^{\omega}. \tag{1}$$

We are ready to state Schützenberger’s theorem.

Theorem 10

(Schützenberger [36]). A regular language is star-free if and only if its syntactic monoid is aperiodic.

Theorem 10 illustrates the syntactic approach presented in Sect. 3. It validates Proposition 3: the star-free languages are characterized by a property of their syntactic morphism. In fact, for this particular class, one does not even need the full morphism, the syntactic monoid suffices.

The main application is a membership algorithm for the class of star-free languages. Given as input a regular language L, one may compute its syntactic monoid and check whether it satisfies Eq. (1): this boils down to testing all elements in the monoid. By Theorem 10, this decides whether L is star-free. However, as we explained in Remark 1 when we first introduced membership, this theorem is also important for the arguments that are required to prove it. Indeed, providing these arguments requires a deep insight on Inline graphic. The right to left implication is of particular interest: “given a regular language whose syntactic monoid is aperiodic, prove that it is star-free”. This involves devising a generic way to construct a star-free description for every regular language recognized by a monoid satisfying a syntactic property. This is the implication that we handle with covers. On the other hand, the converse implication is simple and standard (essentially, we already proved it with Lemma 9).
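The membership test described above can be sketched in Python. This is a minimal illustration under stated assumptions: the input is the minimal complete DFA of L (its transition monoid is then isomorphic to the syntactic monoid), Eq. (1) is read as the usual identity s^(ω+1) = s^ω, and the names `compose`, `transition_monoid` and `is_aperiodic` are hypothetical helpers introduced here. Transformations are stored as tuples mapping each state to its image.

```python
def compose(f, g):
    """Transformation 'f then g' on states, matching the product of words uv."""
    return tuple(g[q] for q in f)


def transition_monoid(n_states, delta):
    """All state transformations induced by words (the empty word gives the identity)."""
    identity = tuple(range(n_states))
    generators = [tuple(image) for image in delta.values()]
    monoid = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in generators:
            h = compose(f, g)
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid


def is_aperiodic(monoid):
    """True when every element s satisfies s^(n+1) = s^n for some n."""
    for s in monoid:
        powers = []
        p = s
        while p not in powers:
            powers.append(p)
            p = compose(p, s)
        # p is the first repeated power; the cycle it closes must have length 1.
        if len(powers) - powers.index(p) != 1:
            return False
    return True


# Minimal DFA of the even-length words over {a}: the letter a swaps the two states.
EVEN_LENGTH = transition_monoid(2, {"a": (1, 0)})
# Minimal DFA of (ab)* over {a, b}; state 2 is the sink.
AB_STAR = transition_monoid(3, {"a": (1, 2, 2), "b": (2, 0, 2)})

print(is_aperiodic(EVEN_LENGTH))
print(is_aperiodic(AB_STAR))
```

The first monoid contains a swap of order two, so the test fails; the second monoid has six elements, all of which satisfy the identity. Testing the powers of each element separately is enough because the identity in Eq. (1) involves a single element at a time.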

Proof

We fix an alphabet A and a regular language Inline graphic for the proof. Let Inline graphic be the syntactic morphism of L. We prove that Inline graphic if and only if M is aperiodic. Let us first handle the left to right implication.

From star-free languages to aperiodicity. Assume that Inline graphic. We prove that M is aperiodic, i.e. that (1) is satisfied. Let Inline graphic, we have to show that Inline graphic.

Since Inline graphic is a syntactic morphism, it is surjective and there exists Inline graphic such that Inline graphic. Moreover, since Inline graphic, Lemma 9 yields Inline graphic such that Inline graphic. By definition of the syntactic morphism, this implies that Inline graphic. Since Inline graphic, this yields Inline graphic as desired.

From aperiodicity to star-free languages. Assume that M is aperiodic. We show that L is star-free. We rely on the notions introduced in Sect. 3 and directly prove that every language recognized by Inline graphic is star-free.

Remark 11

Intuitively, this property is stronger than L being star-free. Yet, since Inline graphic is a quotient-closed Boolean algebra, it is equivalent by Proposition 3.    Inline graphic

The argument is based on Proposition 7: we use induction to construct an SF-cover Inline graphic of Inline graphic which is confined by Inline graphic. By the proposition, this implies that every language recognized by Inline graphic belongs to Inline graphic. We start with a preliminary definition that we require to formulate the induction.

Let B be an arbitrary alphabet, Inline graphic a morphism and Inline graphic. We say that a finite set of languages Inline graphic (over B) is Inline graphic-safe if for every Inline graphic and every Inline graphic, we have Inline graphic.

Lemma 12

Let B be an alphabet. Consider a morphism Inline graphic, Inline graphic and Inline graphic. There exists an SF-cover of Inline graphic which is Inline graphic-safe.

We first use Lemma 12 to conclude the main argument. We apply the lemma for Inline graphic, Inline graphic and Inline graphic. This yields an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe. By definition, it follows that for every Inline graphic, we have Inline graphic for all Inline graphic. By Fact 6, this implies that Inline graphic is confined by Inline graphic, completing the main argument.

It remains to prove Lemma 12. Let B be an alphabet, Inline graphic a morphism, Inline graphic and Inline graphic. We build an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe using induction on the three following parameters listed by order of importance:

  1. The size of Inline graphic.

  2. The size of C.

  3. The size of Inline graphic.

Remark 13

The aperiodic monoid M remains fixed throughout the whole proof. On the other hand, the alphabets B and C, the morphism Inline graphic and Inline graphic may change when applying induction.   Inline graphic

We distinguish two cases depending on the following property of Inline graphic, C and s. We say that s is Inline graphic-stable when the following holds:

graphic file with name M429.gif 2

We first consider the case when s is Inline graphic-stable. This is the base case which we handle using the hypothesis that M is aperiodic.

Base case: s is Inline graphic-stable. In that case, we define Inline graphic which is clearly an SF-cover of Inline graphic (we have Inline graphic as seen in Example 8). It remains to show that Inline graphic is Inline graphic-safe. For Inline graphic, we have to show that Inline graphic. We actually prove that Inline graphic for every Inline graphic which implies the desired result. Since s is Inline graphic-stable, we have the following fact.

Fact 14

For every Inline graphic, there exists Inline graphic such that Inline graphic.

Proof

We use induction on the length of Inline graphic. If Inline graphic, the fact holds for Inline graphic. Assume now that Inline graphic. We have Inline graphic for Inline graphic and Inline graphic. Induction yields Inline graphic such that Inline graphic. Moreover, since s is Inline graphic-stable, (2) yields Inline graphic such that Inline graphic. Altogether, we obtain that Inline graphic which concludes the proof.    Inline graphic

Consider the word Inline graphic (with Inline graphic as the idempotent power of M). We apply Fact 14 for Inline graphic. This yields Inline graphic such that Inline graphic. Since M is aperiodic, we have Inline graphic by Eq. (1). This yields Inline graphic, concluding the base case.

Inductive case: s is not Inline graphic-stable. By hypothesis, there exists a letter Inline graphic such that the following strict inclusion holds Inline graphic. We fix Inline graphic for the remainder of the argument.

Let D be the sub-alphabet Inline graphic. By definition, Inline graphic. Hence, induction on our second parameter in Lemma 12 (i.e., the size of C) yields an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe. Note that it is clear that our first induction parameter (the size of Inline graphic) has not increased since Inline graphic.

We distinguish two independent sub-cases. Clearly, we have Inline graphic. The argument differs depending on whether this inclusion is strict or not.

Sub-case 1: Inline graphic. Consider a language Inline graphic. Since Inline graphic is a cover of Inline graphic which is Inline graphic-safe by definition, there exists some element Inline graphic such that Inline graphic for every Inline graphic. The construction of the desired SF-cover Inline graphic of Inline graphic is based on the following fact which we prove using induction on our third parameter (the size of Inline graphic).

Fact 15

For every language Inline graphic, there exists an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe.

Proof

Since Inline graphic, it is immediate that Inline graphic. Hence, Inline graphic. Moreover, Inline graphic by hypothesis in Sub-case 1. Thus, Inline graphic. Finally, recall that the letter c satisfies Inline graphic by definition. Consequently, we have the strict inclusion Inline graphic. Hence, we may apply induction on our third parameter in Lemma 12 (i.e. the size of Inline graphic) to obtain the desired SF-cover Inline graphic of Inline graphic which is Inline graphic-safe. Note that here, our first two parameters have not increased (they only depend on Inline graphic and C which remain unchanged).   Inline graphic

We may now use Fact 15 to build the desired cover Inline graphic of Inline graphic. We define Inline graphic. Clearly, Inline graphic is an SF-cover of Inline graphic by hypothesis on Inline graphic and Inline graphic since Inline graphic and Inline graphic is closed under concatenation. We need to show that Inline graphic is Inline graphic-safe. Let Inline graphic and Inline graphic, we need to show that Inline graphic. By definition of Inline graphic, there are two cases. When Inline graphic, the result is immediate since Inline graphic is Inline graphic-safe by definition. Otherwise, Inline graphic for Inline graphic and Inline graphic. Thus, we get Inline graphic and Inline graphic such that Inline graphic and Inline graphic. By definition, Inline graphic. Moreover, since Inline graphic is Inline graphic-safe by definition in Fact 15, we have Inline graphic. Altogether, this yields Inline graphic, i.e. Inline graphic as desired.

Sub-case 2: Inline graphic. Let us first explain informally how the cover Inline graphic of Inline graphic is built in this case. Let Inline graphic. Since Inline graphic, w admits a unique decomposition Inline graphic such that Inline graphic and Inline graphic (i.e., v is the largest suffix of w in Inline graphic and u is the corresponding prefix). Using induction, we construct SF-covers of the possible prefixes and suffixes. Then, we combine them to construct a cover of the whole set Inline graphic. Actually, we already covered the suffixes: we have an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe. It remains to cover the prefixes. We do this in the following lemma, which we prove using induction on our first parameter (the size of Inline graphic).

Lemma 16

There exists an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe.

Proof

Let Inline graphic. Using E as a new alphabet, we apply induction on the first parameter in Lemma 12 (i.e., the size of Inline graphic) to build an auxiliary SF-cover of Inline graphic which we then use to construct Inline graphic.

Since Inline graphic, there exists a natural morphism Inline graphic defined by Inline graphic for every Inline graphic. Clearly, Inline graphic. Since Inline graphic by hypothesis of Sub-case 2, this implies Inline graphic and induction on the first parameter in Lemma 12 yields an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe. We use Inline graphic to construct Inline graphic. First, we define a map Inline graphic.

We let Inline graphic. Otherwise, let Inline graphic be a nonempty word. Since Inline graphic, w admits a unique decomposition Inline graphic with Inline graphic. Hence, we may define Inline graphic with Inline graphic for every Inline graphic (recall that Inline graphic by definition). We are ready to define Inline graphic. We let,

graphic file with name M581.gif

It remains to show that Inline graphic is an SF-cover of Inline graphic which is Inline graphic-safe. It is immediate that Inline graphic is a cover of Inline graphic since Inline graphic was a cover of Inline graphic.

Let us prove that Inline graphic is Inline graphic-safe. Let Inline graphic and Inline graphic. We prove that Inline graphic. By definition, there exists Inline graphic such that Inline graphic. Thus, Inline graphic which implies that Inline graphic since Inline graphic is Inline graphic-safe by definition. One may now verify from the definitions that Inline graphic and Inline graphic. Thus, we obtain Inline graphic as desired.

It remains to show that every Inline graphic is star-free. By definition of Inline graphic, it suffices to show that for every Inline graphic, we have Inline graphic. We proceed by induction on the definition of W as a star-free language. When Inline graphic, it is clear that Inline graphic. Assume now that Inline graphic for some Inline graphic. By definition, Inline graphic. This may be reformulated as follows: Inline graphic with Inline graphic. Clearly, U is the intersection of Inline graphic with a language recognized by Inline graphic. Recall that we have an SF-cover Inline graphic of Inline graphic which is Inline graphic-safe (and therefore confined by Inline graphic). Hence, Proposition 7 implies that Inline graphic. It follows that Inline graphic as desired. We turn to the inductive cases.

First, assume that there are simpler languages Inline graphic such that either Inline graphic or Inline graphic. By induction, Inline graphic for Inline graphic. Moreover, the definition of Inline graphic implies that Inline graphic and Inline graphic. Hence, we obtain Inline graphic. Finally, assume that Inline graphic for a simpler language Inline graphic. By induction, Inline graphic. Moreover, Inline graphic. Clearly, Inline graphic. Thus, we get Inline graphic as desired.    Inline graphic

We are ready to construct the desired SF-cover Inline graphic of Inline graphic. Let Inline graphic be the Inline graphic-safe SF-cover of Inline graphic given by Lemma 16 and consider our Inline graphic-safe SF-cover Inline graphic of Inline graphic. We define Inline graphic. It is immediate by definition that Inline graphic is an SF-cover of Inline graphic since Inline graphic and Inline graphic is closed under concatenation. It remains to verify that Inline graphic is Inline graphic-safe (it is in fact Inline graphic-safe). Let Inline graphic and Inline graphic, we show that Inline graphic (which implies Inline graphic). By definition, Inline graphic with Inline graphic and Inline graphic. Therefore, Inline graphic and Inline graphic with Inline graphic and Inline graphic. Since U and V are both Inline graphic-safe by definition, we have Inline graphic and Inline graphic. It follows that Inline graphic. This concludes the proof of Lemma 12.   Inline graphic

Piecewise Testable Languages and Simon’s Theorem

We turn to our second example: Simon’s theorem [38]. This result states an algebraic characterization of another prominent class of regular languages: the piecewise testable languages (Inline graphic). It is quite important in the literature as it was among the first results of this kind after Schützenberger’s theorem (which we proved in Sect. 4). Over the years, many different proofs have been found (examples include [1, 9, 12, 18, 38, 39]). We present a new proof, based on Inline graphic-covers and entirely independent from previously known arguments. It relies on a concatenation principle for the piecewise testable languages that can only be formulated with Inline graphic-covers.

We first recall the definition of piecewise testable languages. Then, we state the theorem properly and present the proof.

Definition

Let us define the class of piecewise testable languages (Inline graphic). Given an alphabet A and Inline graphic, we say that u is a piece of v and write Inline graphic when u can be obtained from v by removing letters and gluing the remaining ones together. More precisely, Inline graphic when there exist Inline graphic and Inline graphic such that,

$$u = a_1 a_2 \cdots a_n \quad \text{and} \quad v = v_0 a_1 v_1 a_2 v_2 \cdots a_n v_n.$$

For instance, acb is a piece of Inline graphic. Note that by definition, the empty word “Inline graphic” is a piece of every word (this is the case Inline graphic). Furthermore, it is clear that the relation Inline graphic is a preorder on Inline graphic.
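The piece relation is the familiar scattered-subword test, which can be sketched in a few lines of Python. The function name `is_piece` and the sample words are illustrative choices, not taken from the paper.

```python
def is_piece(u, v):
    """True when u can be obtained from v by deleting letters (u is a piece of v)."""
    remaining = iter(v)
    # Each letter of u must be found in v strictly after the previous match.
    return all(letter in remaining for letter in u)


print(is_piece("acb", "abcabc"))
print(is_piece("ca", "abc"))
print(is_piece("", "abc"))
```

The shared iterator is what enforces the left-to-right order: every successful lookup consumes the matched letter and everything before it. The empty word is a piece of every word because `all` over an empty sequence is true, which mirrors the case n = 0 of the definition.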

For every word Inline graphic, we write Inline graphic for the language consisting of all words v such that u is a piece of v. If Inline graphic, we have by definition:

$$\uparrow u = A^* a_1 A^* a_2 A^* \cdots a_n A^*.$$

We may now define Inline graphic. A language Inline graphic is piecewise testable (i.e. Inline graphic) when L is a (finite) Boolean combination of languages Inline graphic for Inline graphic.

Example 17

We let Inline graphic as the alphabet. Then Inline graphic. Indeed, Inline graphic. Moreover, observe that every finite language is piecewise testable. Since Inline graphic is closed under union, it suffices to show that every singleton is piecewise testable. Consider a word Inline graphic. By definition, w is the only word belonging to Inline graphic but not to Inline graphic, where Inline graphic denotes any sequence of Inline graphic letters. Hence, Inline graphic is piecewise testable.   Inline graphic

Clearly Inline graphic is a Boolean algebra and one may verify that it is quotient-closed (the details are left to the reader). We complete the definition with two properties of Inline graphic. The first one is standard and we shall need it to prove the “easy” direction of Simon’s theorem (every piecewise testable language satisfies the characterization).

Lemma 18

Let A be an alphabet and Inline graphic. There exists Inline graphic such that for every Inline graphic and Inline graphic, we have Inline graphic.

Proof

Since Inline graphic, there exists Inline graphic such that L is a Boolean combination of languages Inline graphic with Inline graphic such that Inline graphic (i.e. w has length at most k). We prove that the lemma holds for this number k. Let Inline graphic and Inline graphic. We show that Inline graphic. By symmetry, we concentrate on Inline graphic: given Inline graphic, we show that Inline graphic. Since Inline graphic, one may verify that for every Inline graphic such that Inline graphic, we have Inline graphic. In other words, Inline graphic. Since L is a Boolean combination of such languages, this implies the equivalence Inline graphic as desired.    Inline graphic

The second result is specific to our covering-based approach for proving Simon’s theorem. It turns out that elegant proof arguments for membership algorithms often apply to classes that are closed under concatenation (or some weak variant thereof). As seen in the previous section, the star-free languages are an example. Unfortunately, Inline graphic is not closed under concatenation. For example, consider the alphabet Inline graphic. We have Inline graphic and Inline graphic as seen in Example 17. Yet, one may verify with Lemma 18 that Inline graphic.

We solve this issue with a “weak concatenation principle” for piecewise testable languages. This result can only be formulated using Inline graphic-covers. While its proof is rather technical, an interesting observation is that it characterizes the piecewise testable languages. In the proof of Simon’s theorem, we only use this concatenation principle and the hypothesis that Inline graphic is a Boolean algebra (we never come back to the original definition of Inline graphic).

Proposition 19

Let Inline graphic and Inline graphic. Moreover, let Inline graphic and Inline graphic be Inline graphic-covers of Inline graphic and Inline graphic respectively. There exists a Inline graphic-cover Inline graphic of Inline graphic such that for every Inline graphic we have Inline graphic and Inline graphic satisfying Inline graphic.

Proof

We start with standard definitions that we need to describe Inline graphic. For every Inline graphic, we associate a preorder Inline graphic over Inline graphic. For Inline graphic, we write Inline graphic to indicate that for every Inline graphic such that Inline graphic, we have Inline graphic. Clearly, Inline graphic is a preorder which is coarser than Inline graphic: for every Inline graphic such that Inline graphic, we have Inline graphic. Moreover, we write Inline graphic for the equivalence generated by this preorder: Inline graphic if and only if Inline graphic for every Inline graphic such that Inline graphic. Clearly, Inline graphic has finite index.
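The preorder and the equivalence just defined compare words through their pieces of length at most k. A brute-force Python sketch makes this concrete; it enumerates all short pieces, so it is only meant for small words, and the names `pieces_up_to`, `leq_k` and `equiv_k` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations


def pieces_up_to(w, k):
    """Set of all pieces of w whose length is at most k."""
    return {
        "".join(choice)
        for n in range(min(k, len(w)) + 1)
        for choice in combinations(w, n)  # positions are kept in increasing order
    }


def leq_k(w1, w2, k):
    """Preorder: every piece of w1 of length at most k is also a piece of w2."""
    return pieces_up_to(w1, k) <= pieces_up_to(w2, k)


def equiv_k(w1, w2, k):
    """Equivalence: w1 and w2 have the same pieces of length at most k."""
    return pieces_up_to(w1, k) == pieces_up_to(w2, k)


print(equiv_k("ab", "ba", 1))
print(equiv_k("ab", "ba", 2))
print(leq_k("ab", "aabb", 2))
```

The words ab and ba share the pieces of length at most 1 but are separated at k = 2 by the piece ab. Over a fixed alphabet there are finitely many words of length at most k, hence finitely many possible sets of pieces, which is why the equivalence has finite index.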

Since Inline graphic and Inline graphic are Inline graphic-covers, there exists some number Inline graphic such that every language Inline graphic is a finite Boolean combination of languages Inline graphic for Inline graphic such that Inline graphic. In other words, every such language K is a union of Inline graphic-classes. Moreover, we may choose k so that Inline graphic and Inline graphic. We shall define the cover Inline graphic as a set of Inline graphic-classes for an appropriate number h that we choose using the following technical lemma.

Lemma 20

Let Inline graphic, Inline graphic and Inline graphic such that Inline graphic. There exist Inline graphic such that Inline graphic, Inline graphic and Inline graphic.

Proof

We claim that there exist Inline graphic with length at most Inline graphic such that Inline graphic and Inline graphic. We first use this claim to prove the lemma. Clearly, Inline graphic and Inline graphic. Therefore, since Inline graphic, it follows that Inline graphic. This yields a decomposition Inline graphic such that Inline graphic and Inline graphic. Since Inline graphic and Inline graphic, this implies Inline graphic and Inline graphic as desired.

It remains to prove the claim. We only construct a piece Inline graphic such that Inline graphic and Inline graphic, as the construction of z is analogous. Let F be the set of all pieces of Inline graphic of size at most k, that is,

graphic file with name M812.gif

Clearly, Inline graphic. For Inline graphic, let Inline graphic be the set of words of F that are pieces of x. Let Inline graphic be some decomposition of Inline graphic. Note that Inline graphic. We say that the occurrence of a given by the decomposition Inline graphic is bad if Inline graphic. Let y be the word obtained from Inline graphic by deleting all bad letters (and keeping the other ones). By construction, Inline graphic and Inline graphic. The latter property implies that Inline graphic for every Inline graphic. By definition of F, this means that Inline graphic. Furthermore, letters of y are not bad, and one may verify that there are at most Inline graphic such letters. Therefore, Inline graphic, which concludes the proof.    Inline graphic

We define Inline graphic. It is immediate that every Inline graphic-class is a language of Inline graphic (it is a Boolean combination of languages Inline graphic for Inline graphic such that Inline graphic). Hence, the set Inline graphic containing all Inline graphic-classes which intersect Inline graphic is a Inline graphic-cover of Inline graphic. It remains to show that for every Inline graphic, there exist Inline graphic and Inline graphic such that Inline graphic. We fix the language Inline graphic for the proof. We need the following result.

Lemma 21

Let Inline graphic be a finite language. There exist Inline graphic and Inline graphic such that Inline graphic.

Proof

Let Inline graphic be the words in H, i.e., Inline graphic. Our goal is to find Inline graphic and Inline graphic such that Inline graphic for all Inline graphic. Therefore, we first have to find a suitable decomposition of each word Inline graphic as Inline graphic, and then to show that all Inline graphic’s belong to some Inline graphic and all Inline graphic’s belong to some Inline graphic.

By definition, K is a Inline graphic-class and it intersects Inline graphic. This yields a word Inline graphic such that Inline graphic. Since Inline graphic, there exist Inline graphic and Inline graphic such that Inline graphic. Let Inline graphic. We may write the relations Inline graphic as follows:

graphic file with name M872.gif

Since Inline graphic by definition, we may apply Lemma 20 Inline graphic times to get Inline graphic and Inline graphic such that,

  • for every Inline graphic and Inline graphic, we have Inline graphic, and,

  • Inline graphic, and,

  • Inline graphic.

Since Inline graphic, the first property and the pigeonhole principle yield Inline graphic such that Inline graphic and Inline graphic. For every Inline graphic, we let Inline graphic and Inline graphic. Therefore, for all Inline graphic, we have Inline graphic.

The second and third properties now yield Inline graphic and Inline graphic, whence:

graphic file with name M893.gif

Recall that Inline graphic by definition of k. Since Inline graphic and Inline graphic, it follows that Inline graphic. Since Inline graphic is a cover of Inline graphic, this yields Inline graphic such that Inline graphic. Since Inline graphic is a union of Inline graphic-classes by choice of k and since Inline graphic, we deduce that Inline graphic. Symmetrically, we obtain Inline graphic such that Inline graphic. Finally, since Inline graphic for every Inline graphic, this yields Inline graphic, as desired.    Inline graphic

We may now finish the proof. For every Inline graphic, we let Inline graphic be the (finite) language containing all words of length at most n in K. Clearly, Inline graphic and Inline graphic for every Inline graphic. Moreover, Lemma 21 implies that for every Inline graphic, we have Inline graphic and Inline graphic such that Inline graphic. Since Inline graphic and Inline graphic are finite sets, there exist Inline graphic and Inline graphic such that Inline graphic and Inline graphic for infinitely many n. Since Inline graphic for every Inline graphic, it then follows that Inline graphic for every Inline graphic. Finally, since Inline graphic, this implies Inline graphic which concludes the proof.    Inline graphic

Simon’s Theorem

We may now present and prove Simon’s theorem. It characterizes the piecewise testable languages as those whose syntactic monoid is Inline graphic-trivial. The original definition of this notion is based on the Green relation Inline graphic defined on every finite monoid. Here, we do not consider this relation. Instead, we use an equational definition. A finite monoid M is Inline graphic-trivial when it satisfies the following property:

$$\text{For every } s, t \in M, \quad (st)^{\omega} s = (st)^{\omega} = t (st)^{\omega}. \tag{3}$$

Theorem 22

(Simon [38]). A regular language is piecewise testable if and only if its syntactic monoid is Inline graphic-trivial.

As expected, the main application of Simon’s theorem is the decidability of Inline graphic-membership. Given a regular language L as input, one may compute its syntactic monoid and check whether it satisfies Eq. (3) by testing all possible combinations. By Theorem 22, this decides whether L is piecewise testable. Yet, as for the star-free languages in Sect. 4, this theorem is also important for the arguments that are required to prove it. We present such a proof now.
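The check "by testing all possible combinations" can be sketched in Python as follows. The sketch rests on assumptions: the input is the minimal complete DFA of L, so that its transition monoid stands in for the syntactic monoid, and Eq. (3) is read as the identity (st)^ω s = (st)^ω = t (st)^ω for all s, t in M. All function names are hypothetical helpers, and transformations are tuples mapping each state to its image.

```python
def compose(f, g):
    """Transformation 'f then g' on states, matching the product of words uv."""
    return tuple(g[q] for q in f)


def transition_monoid(n_states, delta):
    """All state transformations induced by words (the empty word gives the identity)."""
    identity = tuple(range(n_states))
    generators = [tuple(image) for image in delta.values()]
    monoid = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in generators:
            h = compose(f, g)
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid


def omega_power(s):
    """The idempotent power of s: the unique idempotent among s, s^2, s^3, ..."""
    p = s
    while compose(p, p) != p:
        p = compose(p, s)
    return p


def satisfies_eq3(monoid):
    """Test (st)^omega s = (st)^omega = t (st)^omega on every pair (s, t)."""
    for s in monoid:
        for t in monoid:
            e = omega_power(compose(s, t))
            if not (compose(e, s) == e == compose(t, e)):
                return False
    return True


# Minimal DFA of the words containing the letter a, over {a, b}.
CONTAINS_A = transition_monoid(2, {"a": (1, 1), "b": (0, 1)})
# Minimal DFA of (ab)* over {a, b}; state 2 is the sink.
AB_STAR = transition_monoid(3, {"a": (1, 2, 2), "b": (2, 0, 2)})

print(satisfies_eq3(CONTAINS_A))
print(satisfies_eq3(AB_STAR))
```

The first language passes the test. The second fails on the pair s = a, t = b: the element ab is idempotent, yet multiplying it by a on the right gives back a. This is consistent with (ab)* being star-free without being piecewise testable. The loop over pairs is quadratic in the size of the monoid, which itself may be exponential in the number of states.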

Proof

We fix an alphabet A and a regular language Inline graphic for the proof. Let Inline graphic be the syntactic morphism of L. We prove that Inline graphic if and only if M is Inline graphic-trivial. We start with the left to right implication which is essentially immediate from Lemma 18. As expected, the difficult and most interesting part of the proof is the converse implication.

From piecewise testable languages to Inline graphic-triviality. Assume that we have Inline graphic. We prove that M is Inline graphic-trivial: (3) holds. Let Inline graphic, we have to show that Inline graphic.

Since Inline graphic is a syntactic morphism, it is surjective and there exists Inline graphic such that Inline graphic and Inline graphic. Moreover, since Inline graphic, Lemma 18 yields Inline graphic such that Inline graphic. By definition of the syntactic morphism, this implies that Inline graphic. Since Inline graphic and Inline graphic, this yields Inline graphic as desired.

From Inline graphic-triviality to piecewise testable languages. Assume that M is Inline graphic-trivial. We show that L is piecewise testable. We rely on the notions introduced in Sect. 3 and directly prove that every language recognized by Inline graphic is piecewise testable. The argument is based on Proposition 7: we use induction to construct a Inline graphic-cover Inline graphic of Inline graphic which is confined by Inline graphic. By the proposition, this implies that every language recognized by Inline graphic belongs to Inline graphic. We start with a preliminary definition that we require to formulate the induction.

Given a finite set of languages Inline graphic, and Inline graphic, we say that Inline graphic is (s, t)-safe if for every Inline graphic and Inline graphic, we have Inline graphic. The argument is based on the following lemma.

Lemma 23

Let Inline graphic and Inline graphic. There exists a Inline graphic-cover of Inline graphic which is (s, t)-safe.

We first use Lemma 23 to complete the main argument. We apply the lemma for Inline graphic and Inline graphic. Since Inline graphic, this yields a Inline graphic-cover Inline graphic of Inline graphic which is Inline graphic-safe. Thus, for every Inline graphic and Inline graphic, we have Inline graphic. By Fact 6, this implies that Inline graphic is confined by Inline graphic, concluding the proof.

It remains to prove Lemma 23. Let Inline graphic and Inline graphic. We construct a Inline graphic-cover Inline graphic of Inline graphic which is (s, t)-safe. We write Inline graphic for the following set:

graphic file with name M997.gif

We proceed by induction on the two following parameters, listed by order of importance:

  1. The size of P[s, w, t].

  2. The length of w.

We consider two cases depending on whether w is empty or not. We first assume that w is empty.

First case: Inline graphic. We handle this case using induction on our first parameter. Let Inline graphic be the language of all words Inline graphic such that Inline graphic. We use induction to build a Inline graphic-cover of H (note that it may happen that H is empty in which case we do not need induction).

Fact 24

There exists a Inline graphic-cover Inline graphic of H which is (s, t)-safe.

Proof

One may verify with a pumping argument that there exists a finite set Inline graphic such that Inline graphic (this is also an immediate consequence of Higman’s lemma). Hence, it suffices to prove that for every Inline graphic, there exists a Inline graphic-cover Inline graphic of Inline graphic which is (s, t)-safe. Indeed, one may then choose Inline graphic to be the union of all covers Inline graphic for Inline graphic. We fix Inline graphic for the proof.

Since Inline graphic, we have Inline graphic. Since Inline graphic is surjective (it is a syntactic morphism), it follows that Inline graphic. Therefore, we have Inline graphic and Inline graphic. Since Inline graphic by definition of H, we get Inline graphic. Hence, induction on the first parameter in Lemma 23 (the size of P[s, w, t]) yields a Inline graphic-cover Inline graphic of Inline graphic which is (s, t)-safe, as desired.    Inline graphic

We let Inline graphic be the Inline graphic-cover Inline graphic of H given by Fact 24. We define,

graphic file with name M1030.gif

Finally, we let Inline graphic. It is immediate that Inline graphic is a Inline graphic-cover of Inline graphic since Inline graphic is a Boolean algebra. It remains to verify that Inline graphic is (s, t)-safe. Consider Inline graphic and let Inline graphic. We prove that Inline graphic. If Inline graphic, this is immediate since Inline graphic is (s, t)-safe by construction. Hence, it suffices to show that Inline graphic is (s, t)-safe. This is a direct consequence of the following fact. Note that this is the only place in the proof where we use the hypothesis that M satisfies (3).

Fact 25

For every word Inline graphic, we have Inline graphic.

Proof

Let Inline graphic. By definition of Inline graphic, Inline graphic for every Inline graphic. Since Inline graphic is a cover of H, it follows that Inline graphic. By definition of H, it follows that Inline graphic. By definition, this yields Inline graphic such that Inline graphic, Inline graphic and Inline graphic. The latter property yields Inline graphic such that Inline graphic, Inline graphic and Inline graphic. We prove that Inline graphic and Inline graphic, which yields as desired that Inline graphic. By symmetry, we only show that Inline graphic.

Since Inline graphic, we have Inline graphic. Moreover, since Inline graphic, we have Inline graphic and Inline graphic such that Inline graphic and Inline graphic. It follows from (3) that for every Inline graphic, we have:

graphic file with name M1072.gif

This yields Inline graphic. Therefore, since we know that Inline graphic, we obtain Inline graphic. Finally, this yields,

graphic file with name M1076.gif

This concludes the proof.    Inline graphic

Second case: Inline graphic. In that case, we have Inline graphic and Inline graphic such that Inline graphic (the choice of u, v and a is arbitrary). Consider the two following subsets of M:

graphic file with name M1082.gif

Moreover, we say that a cover Inline graphic of some language H is tight when Inline graphic for every Inline graphic. We use induction to prove the following fact.

Fact 26

There exist tight Inline graphic-covers Inline graphic and Inline graphic of Inline graphic and Inline graphic which satisfy the following properties:

  • for every Inline graphic, the cover Inline graphic of Inline graphic is (srt)-safe.

  • for every Inline graphic, the cover Inline graphic of Inline graphic is (srt)-safe.

Proof

We construct Inline graphic (the construction of Inline graphic is symmetrical). Let Inline graphic. For every Inline graphic, assume that we already have a Inline graphic-cover Inline graphic of Inline graphic which is Inline graphic-safe. We define,

graphic file with name M1105.gif

Since Inline graphic is a Boolean algebra, it is immediate that Inline graphic is a tight Inline graphic-cover of Inline graphic which is (srt)-safe for every Inline graphic. Thus, it remains to build for every Inline graphic such a Inline graphic-cover Inline graphic.

We fix Inline graphic for the proof. By definition of Inline graphic, we have Inline graphic for some word Inline graphic. Observe that since Inline graphic, we have Inline graphic by definition: our first induction parameter (i.e., the size of P[s, w, t]) has not increased. Hence, since Inline graphic, it follows by induction on our second parameter in Lemma 23 (the length of w) that there exists a Inline graphic-cover Inline graphic of Inline graphic which is Inline graphic-safe. This concludes the proof.   Inline graphic
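The notions of cover and tight cover used in Fact 26 can be checked mechanically on finite stand-ins for languages. The sketch below rests on two assumptions: a cover of H is a finite set of languages whose union contains H, and a cover is tight when each of its members is included in H. Finite sets of words replace regular languages, and the names `is_cover` and `is_tight_cover` are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

# Finite stand-in for a language (assumption made for illustration only).
Language = FrozenSet[str]


def is_cover(blocks: Iterable[Language], target: Set[str]) -> bool:
    """Return True when the union of the blocks contains the target."""
    union: Set[str] = set()
    for block in blocks:
        union |= block
    return target <= union


def is_tight_cover(blocks: Iterable[Language], target: Set[str]) -> bool:
    """Return True when the blocks cover the target and none exceeds it.

    Assumed reading of "tight": every block is a subset of the target.
    """
    block_list = list(blocks)
    return is_cover(block_list, target) and all(
        block <= target for block in block_list
    )


if __name__ == "__main__":
    target = {"a", "ab", "b"}
    tight = [frozenset({"a", "ab"}), frozenset({"b"})]
    loose = [frozenset({"a", "ab", "b", "c"})]
    print(is_tight_cover(tight, target))
    print(is_cover(loose, target), is_tight_cover(loose, target))
```

Under this reading, tightness ensures that every word of a block of the cover still belongs to the covered language, so any property defined from that language applies to all words of the block.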

We are ready to construct the desired Inline graphic-cover Inline graphic of Inline graphic. Consider the tight Inline graphic-covers Inline graphic and Inline graphic of Inline graphic and Inline graphic described in Fact 26. Since Inline graphic, Proposition 19 yields a Inline graphic-cover Inline graphic of Inline graphic such that for every Inline graphic, there exist Inline graphic and Inline graphic satisfying Inline graphic. It remains to prove that Inline graphic is (st)-safe. Let Inline graphic and Inline graphic. We prove that Inline graphic.

By definition, Inline graphic for Inline graphic and Inline graphic. Hence, there exist Inline graphic and Inline graphic such that Inline graphic and Inline graphic. Since Inline graphic is a tight cover of Inline graphic, we know that Inline graphic, which implies that Inline graphic by definition. It follows that Inline graphic is Inline graphic-safe by Fact 26. Therefore, since Inline graphic and Inline graphic, we obtain Inline graphic. Symmetrically, one may verify that Inline graphic. Altogether, it follows that Inline graphic, meaning that Inline graphic. This concludes the proof of Lemma 23.   Inline graphic

Conclusion

We explained how covering provides a natural and convenient framework for handling membership questions. We illustrated this point by using covers to formulate new proofs for Schützenberger’s theorem and Simon’s theorem. We chose these two examples as they are arguably the two most famous characterization theorems of this kind. However, this approach is also relevant for other prominent characterization theorems. A first promising example is the class of unambiguous languages. It was also characterized by Schützenberger [37] and it is also famous as the class of languages that can be defined in two-variable first-order logic (this was shown by Thérien and Wilke [40]). Another interesting example is Knast’s theorem [13], which characterizes the languages of dot-depth one. This class is a natural generalization of the piecewise testable languages.

Contributor Information

Alberto Leporati, Email: alberto.leporati@unimib.it.

Carlos Martín-Vide, Email: carlos.martin@urv.cat.

Dana Shapira, Email: shapird@g.ariel.ac.il.

Claudio Zandron, Email: zandron@disco.unimib.it.

Thomas Place, Email: tplace@labri.fr.

References

  • 1. Almeida J. Implicit operations on finite J-trivial semigroups and a conjecture of I. Simon. J. Pure Appl. Algebra. 1990;69:205–218. doi: 10.1016/0022-4049(91)90019-X.
  • 2. Brzozowski JA, Cohen RS. Dot-depth of star-free events. J. Comput. Syst. Sci. 1971;5(1):1–16. doi: 10.1016/S0022-0000(71)80006-5.
  • 3. Brzozowski JA, Knast R. The dot-depth hierarchy of star-free languages is infinite. J. Comput. Syst. Sci. 1978;16(1):37–55. doi: 10.1016/0022-0000(78)90049-1.
  • 4. Brzozowski JA, Simon I. Characterizations of locally testable events. Discrete Math. 1973;4(3):243–271. doi: 10.1016/S0012-365X(73)80005-6.
  • 5. Colcombet T. Green’s relations and their use in automata theory. In: Dediu A-H, Inenaga S, Martín-Vide C, editors. Language and Automata Theory and Applications; Heidelberg: Springer; 2011. pp. 1–21.
  • 6. Czerwiński W, Martens W, Masopust T. Efficient separability of regular languages by subsequences and suffixes. In: Fomin FV, Freivalds R, Kwiatkowska M, Peleg D, editors. Automata, Languages, and Programming; Heidelberg: Springer; 2013. pp. 150–161.
  • 7. Diekert V, Gastin P. First-order definable languages. In: Flum J, Grädel E, Wilke T, editors. Logic and Automata: History and Perspectives, Texts in Logic and Games. Amsterdam: Amsterdam University Press; 2008. pp. 261–306.
  • 8. Eilenberg S. Automata, Languages, and Machines. Orlando: Academic Press Inc.; 1976.
  • 9. Higgins P. A proof of Simon’s theorem on piecewise testable languages. Theor. Comput. Sci. 1997;178(1):257–264. doi: 10.1016/S0304-3975(96)00230-7.
  • 10. Higgins PM. A new proof of Schützenberger’s theorem. Int. J. Algebra Comput. 2000;10(02):217–220. doi: 10.1142/S0218196700000066.
  • 11. Kamp, H.W.: Tense logic and the theory of linear order. Ph.D. thesis, Computer Science Department, University of California at Los Angeles, USA (1968)
  • 12. Klima O. Piecewise testable languages via combinatorics on words. Discrete Math. 2011;311(20):2124–2127. doi: 10.1016/j.disc.2011.06.013.
  • 13. Knast R. A semigroup characterization of dot-depth one languages. RAIRO - Theor. Inform. Appl. 1983;17(4):321–330. doi: 10.1051/ita/1983170403211.
  • 14. Lucchesi CL, Simon I, Simon I, Simon J, Kowaltowski T. Aspectos teóricos da computação. Sao Paulo: IMPA; 1979.
  • 15. McNaughton R, Papert SA. Counter-Free Automata. Cambridge: MIT Press; 1971.
  • 16. Meyer AR. A note on star-free events. J. ACM. 1969;16(2):220–225. doi: 10.1145/321510.321513.
  • 17. Perrin, D.: Finite automata. In: Formal Models and Semantics. Elsevier (1990)
  • 18. Pin JE. Varieties of Formal Languages. New York: Plenum Publishing Co.; 1986.
  • 19. Pin JE. A variety theorem without complementation. Russ. Math. (Izvestija vuzov. Matematika) 1995;39:80–90.
  • 20. Pin, J.E.: The dot-depth hierarchy, 45 years later, pp. 177–202. World Scientific (2017). (chap. 8)
  • 21. Pin, J.E.: Mathematical foundations of automata theory (2019, in preparation). https://www.irif.fr/~jep/PDF/MPRI/MPRI.pdf
  • 22. Pin JE, Weil P. Polynomial closure and unambiguous product. Theory Comput. Syst. 1997;30(4):383–422. doi: 10.1007/BF02679467.
  • 23. Pippenger N. Theories of Computability. Cambridge: Cambridge University Press; 1997.
  • 24. Place, T.: Separating regular languages with two quantifier alternations. Log. Methods Comput. Sci. 14(4) (2018)
  • 25. Place T, van Rooijen L, Zeitoun M. Separating regular languages by piecewise testable and unambiguous languages. In: Chatterjee K, Sgall J, editors. Mathematical Foundations of Computer Science 2013; Heidelberg: Springer; 2013. pp. 729–740.
  • 26. Place T, Zeitoun M. Going higher in the first-order quantifier alternation hierarchy on words. In: Esparza J, Fraigniaud P, Husfeldt T, Koutsoupias E, editors. Automata, Languages, and Programming; Heidelberg: Springer; 2014. pp. 342–353.
  • 27. Place, T., Zeitoun, M.: Separating regular languages with first-order logic. In: Proceedings of the Joint Meeting of the 23rd EACSL Annual Conference on Computer Science Logic (CSL 2014) and the 29th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2014), pp. 75:1–75:10. ACM, New York (2014)
  • 28. Place T, Zeitoun M. The tale of the quantifier alternation hierarchy of first-order logic over words. SIGLOG News. 2015;2(3):4–17. doi: 10.1145/2815493.2815495.
  • 29. Place, T., Zeitoun, M.: Separating regular languages with first-order logic. Log. Methods Comput. Sci. 12(1) (2016)
  • 30. Place, T., Zeitoun, M.: Separation for dot-depth two. In: Proceedings of the 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2017), pp. 202–213. IEEE Computer Society (2017)
  • 31. Place, T., Zeitoun, M.: The covering problem. Log. Methods Comput. Sci. 14(3) (2018)
  • 32. Place T, Zeitoun M. Generic results for concatenation hierarchies. Theory Comput. Syst. (ToCS) 2019;63(4):849–901. doi: 10.1007/s00224-018-9867-0.
  • 33. Place T, Zeitoun M. Going higher in first-order quantifier alternation hierarchies on words. J. ACM. 2019;66(2):12:1–12:65. doi: 10.1145/3303991.
  • 34. Place, T., Zeitoun, M.: On all things star-free. In: Proceedings of the 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), pp. 126:1–126:14 (2019)
  • 35. Place, T., Zeitoun, M.: Separation and covering for group based concatenation hierarchies. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2019), pp. 1–13 (2019)
  • 36. Schützenberger MP. On finite monoids having only trivial subgroups. Inf. Control. 1965;8(2):190–194. doi: 10.1016/S0019-9958(65)90108-7.
  • 37. Schützenberger MP. Sur le produit de concaténation non ambigu. Semigroup Forum. 1976;13:47–75. doi: 10.1007/BF02194921.
  • 38. Simon I. Piecewise testable events. In: Brakhage H, editor. Automata Theory and Formal Languages; Heidelberg: Springer; 1975. pp. 214–222.
  • 39. Straubing H, Thérien D. Partially ordered finite monoids and a theorem of I. Simon. J. Algebra. 1988;119(2):393–399. doi: 10.1016/0021-8693(88)90067-1.
  • 40. Thérien, D., Wilke, T.: Over words, two variables are as powerful as one quantifier alternation. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC 1998), pp. 234–240. ACM, New York (1998)
  • 41. Wilke T. Classifying discrete temporal properties. In: Meinel C, Tison S, editors. STACS 99; Heidelberg: Springer; 1999. pp. 32–46.
  • 42. Zalcstein Y. Locally testable languages. J. Comput. Syst. Sci. 1972;6(2):151–167. doi: 10.1016/S0022-0000(72)80020-5.

Articles from Language and Automata Theory and Applications are provided here courtesy of Nature Publishing Group
