2020 Jan 7;12038:68–88. doi: 10.1007/978-3-030-40608-0_5

How to Prove that a Language Is Regular or Star-Free?

Jean-Éric Pin
Editors: Alberto Leporati, Carlos Martín-Vide, Dana Shapira, Claudio Zandron
PMCID: PMC7206632

Abstract

This survey article presents some standard and less standard methods used to prove that a language is regular or star-free.


Most books on automata theory [9, 23, 29, 45, 49] offer exercises on regular languages, including some difficult ones. Further examples can be found on the web sites math.stackexchange.com and cs.stackexchange.com. Another good source of tough questions is the recent book 200 Problems in Formal Languages and Automata Theory [36]. Surprisingly, there are very few exercises related to star-free languages. In this paper, we present various methods to prove that a language is regular or star-free.

Background

Regular and Star-Free Languages

Let us start by recalling what a regular language and a star-free language are.

Definition 1

The class of regular languages is the smallest class of languages containing the finite languages that is closed under finite union, finite product and star.

The definition of star-free languages follows the same pattern, with the difference that the star operation is replaced by the complement:

Definition 2

The class of star-free languages is the smallest class of languages containing the finite languages that is closed under finite union, finite product and complement.

For instance, the language Inline graphic is star-free, since Inline graphic. More generally, if B is a subset of A, then Inline graphic is star-free since

graphic file with name 492458_1_En_5_Equ22_HTML.gif

On the alphabet Inline graphic, the language Inline graphic is star-free since

graphic file with name M6.gif
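Identities of this kind can be sanity-checked by brute force. The sketch below is only an illustration and makes its own assumptions: it takes the alphabet {a, b} and the language (ab)^*, and compares it with the classical star-free description "complement of bA^* + A^*a + A^*aaA^* + A^*bbA^*", where A^* itself is the complement of the empty set. The function names are hypothetical, and a finite test is of course not a proof.

```python
from itertools import product


def in_ab_star(w: str) -> bool:
    """Membership in (ab)^*: w is the block 'ab' repeated zero or more times."""
    return len(w) % 2 == 0 and all(w[i:i + 2] == "ab" for i in range(0, len(w), 2))


def in_star_free_form(w: str) -> bool:
    """Membership in the complement of bA* + A*a + A*aaA* + A*bbA*."""
    forbidden = (
        w.startswith("b")   # w in bA*
        or w.endswith("a")  # w in A*a
        or "aa" in w        # w in A*aaA*
        or "bb" in w        # w in A*bbA*
    )
    return not forbidden


def agree_up_to(n: int) -> bool:
    """Compare both descriptions on every word over {a, b} of length <= n."""
    for k in range(n + 1):
        for letters in product("ab", repeat=k):
            w = "".join(letters)
            if in_ab_star(w) != in_star_free_form(w):
                return False
    return True
```

Calling `agree_up_to(10)` compares the two descriptions on all 2047 words of length at most 10.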

Since regular languages are closed under complement, every star-free language is regular, but the converse is not true: one can show that the language Inline graphic is not star-free.

Early Results and Their Consequences

Kleene’s theorem [26] states that regular languages are accepted by finite automata.

Theorem 1

Let L be a language. The following conditions are equivalent:

  1. L is regular,

  2. L is accepted by a finite deterministic automaton,

  3. L is accepted by a finite non-deterministic automaton.

Given a language L and a word u, the left and right quotients of L by u are defined by Inline graphic and Inline graphic, respectively. The quotients of a regular [star-free] language are also regular [star-free].

Here is another standard result, due to Nerode.

Theorem 2

A language is regular if and only if it has finitely many left (respectively right) quotients.
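For a language given by a complete deterministic automaton, the quotients can be counted mechanically. The sketch below assumes the usual convention that the left quotient of L by u is the set of words v such that uv is in L; this quotient is then the language accepted from the state reached by reading u, so the number of distinct quotients is the number of classes of reachable states under Moore's partition refinement. The function name, the dictionary encoding of the transition function and the two sample automata are illustrative assumptions.

```python
def count_quotients(alphabet, delta, initial, finals):
    """Number of distinct left quotients of the language of a complete DFA.

    delta maps (state, letter) to a state.
    """
    # 1. Keep only the states reachable from the initial state.
    reachable, stack = {initial}, [initial]
    while stack:
        p = stack.pop()
        for a in alphabet:
            q = delta[(p, a)]
            if q not in reachable:
                reachable.add(q)
                stack.append(q)

    # 2. Moore refinement: start from {final, non-final} and split blocks
    #    until two states share a block iff they accept the same language.
    block = {p: int(p in finals) for p in reachable}
    while True:
        ids, new_block = {}, {}
        for p in reachable:
            signature = (block[p],) + tuple(block[delta[(p, a)]] for a in alphabet)
            new_block[p] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)  # the partition is stable
        block = new_block


# Sample automaton for (ab)^*: state 0 is initial and final, state 2 is a sink.
AB_STAR = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 0,
           (2, "a"): 2, (2, "b"): 2}

# A redundant 4-state automaton for "even number of a": it counts a's modulo 4.
EVEN_A = {}
for i in range(4):
    EVEN_A[(i, "a")] = (i + 1) % 4
    EVEN_A[(i, "b")] = i
```

With these samples, `count_quotients("ab", AB_STAR, 0, {0})` counts the quotients of (ab)^*, and the second automaton shows that redundant states do not inflate the count.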

Example 1

Nerode’s theorem suffices to show that if Inline graphic and Inline graphic are regular [star-free], then the language

graphic file with name M12.gif

is also regular [star-free]. Indeed Inline graphic and since Inline graphic and Inline graphic are regular [star-free], this apparently infinite union can be rewritten as a finite union. Thus L is regular [star-free].

Recognition by a Monoid and Syntactic Monoid

It is often useful to have a more algebraic definition of regular languages, based on the following result.

Proposition 1

Let L be a language. The following conditions are equivalent:

  1. L is regular,

  2. L is recognised by a finite monoid,

  3. the syntactic monoid of L is finite.

For readers who may have forgotten the definitions used in this proposition, here are some reminders. A language L of Inline graphic is recognised by a monoid M if there is a surjective monoid morphism Inline graphic and a subset P of M such that Inline graphic.

The syntactic congruence of a language L of Inline graphic is the equivalence relation Inline graphic on Inline graphic defined as follows: Inline graphic if and only if, for every Inline graphic, xuy and xvy are either both in L or both outside of L. The syntactic monoid of L is the quotient monoid Inline graphic.

Moreover, the syntactic monoid of a regular language is the transition monoid of its minimal automaton, which gives a convenient algorithm to compute it. It is also the minimal monoid (in size, but also for the division ordering1) that recognises the language.
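The algorithm alluded to here can be sketched as follows: each letter induces a transformation of the state set, and the transition monoid is the set of all transformations obtained by composing them. The code assumes a complete deterministic automaton with states numbered 0, ..., n-1, and the result is the syntactic monoid only if that automaton is minimal. The function name and the sample automaton (a 3-state automaton for (ab)^* with sink state 2) are illustrative choices.

```python
from collections import deque


def transition_monoid(n_states, alphabet, delta):
    """Transition monoid of a complete DFA.

    Returns a dict mapping each transformation (a tuple t, where t[p] is the
    image of state p) to a shortest word inducing it.
    """
    identity = tuple(range(n_states))
    generators = {a: tuple(delta[(p, a)] for p in range(n_states)) for a in alphabet}
    elements = {identity: ""}
    queue = deque([identity])
    while queue:  # breadth-first search over words, by increasing length
        t = queue.popleft()
        for a in alphabet:
            g = generators[a]
            s = tuple(g[t[p]] for p in range(n_states))  # read t's word, then a
            if s not in elements:
                elements[s] = elements[t] + a
                queue.append(s)
    return elements


# 3-state automaton for (ab)^*: state 0 is initial and final, state 2 is a sink.
AB_STAR = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 0,
           (2, "a"): 2, (2, "b"): 2}
MONOID = transition_monoid(3, "ab", AB_STAR)
```

For this automaton the search stops after finding six transformations, represented by the empty word, a, b, aa (the zero), ab and ba.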

Syntactic monoids are particularly useful to show that a language is star-free. Recall that a finite monoid M is aperiodic if, for every Inline graphic, there exists Inline graphic such that Inline graphic.

Theorem 3

(Schützenberger [46]). For a language L, the following conditions are equivalent:

  1. L is star-free,

  2. L is recognised by a finite aperiodic monoid,

  3. the syntactic monoid of L is a finite aperiodic monoid.

Schützenberger’s theorem is considered, right after Kleene’s theorem, as the most important result of the algebraic theory of automata.

Example 2

The languages Inline graphic and Inline graphic are star-free, but the languages Inline graphic and Inline graphic are not. This is easy to prove by computing the syntactic monoid of these languages.
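Such a computation is easy to automate. The sketch below builds the transition monoid of a complete deterministic automaton and tests aperiodicity by checking that the powers of every element eventually become constant, that is, that they enter a cycle of length 1. It decides star-freeness only when the automaton is minimal. The two sample automata, for (ab)^* and (aa)^* over {a, b}, are illustrative choices and not necessarily the languages of Example 2; all names are hypothetical.

```python
def compose(s, t):
    """Transformation obtained by applying s first, then t."""
    return tuple(t[p] for p in s)


def transition_monoid(n_states, generators):
    """All transformations generated by the given letter actions."""
    identity = tuple(range(n_states))
    elements, frontier = {identity}, [identity]
    while frontier:
        t = frontier.pop()
        for g in generators:
            s = compose(t, g)
            if s not in elements:
                elements.add(s)
                frontier.append(s)
    return elements


def is_aperiodic(elements):
    """True iff every x satisfies x^(n+1) = x^n for some n."""
    for x in elements:
        seen, power = [], x
        while power not in seen:  # list x, x^2, x^3, ... until a repetition
            seen.append(power)
            power = compose(power, x)
        cycle_length = len(seen) - seen.index(power)
        if cycle_length != 1:
            return False
    return True


# Letter actions on states 0, 1, 2 (state 2 is a sink in both automata).
AB_STAR = [(1, 2, 2), (2, 0, 2)]  # a and b in the minimal automaton of (ab)^*
AA_STAR = [(1, 0, 2), (2, 2, 2)]  # a and b in the minimal automaton of (aa)^*
```

In the second automaton the letter a swaps states 0 and 1, so its powers alternate forever between two transformations, which is exactly what the cycle-length test detects.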

The following classic example illustrates the usefulness of the monoid approach. For each language L, let Inline graphic.

Proposition 2

If L is regular [star-free], then so is Inline graphic.

Proof

Let Inline graphic be the syntactic morphism of L, let Inline graphic and let Inline graphic. Then

graphic file with name M37.gif

Thus M recognises Inline graphic and the result follows.

Although the star operation is prohibited in the definition of a star-free language, some languages of the form Inline graphic are star-free. A submonoid M of Inline graphic is pure if, for all Inline graphic and Inline graphic, the condition Inline graphic implies Inline graphic. The following result is due to Restivo [43] for finite languages and to Straubing [52] for the general case.

Theorem 4

If L is star-free and Inline graphic is pure, then Inline graphic is star-free.

Here is another example, based on [51, Theorem 5]. For each language L, let

graphic file with name M47.gif

Proposition 3

If L is regular [star-free], then so is Inline graphic.

Proof

Let Inline graphic be the syntactic morphism of L and let Inline graphic. Note that the conditions Inline graphic and Inline graphic are equivalent, for any Inline graphic. Setting Inline graphic and Inline graphic one gets

graphic file with name M56.gif

and the result now follows easily.

Iteration Properties

The bible on this topic is the book of de Luca and Varricchio [13]. I only present here a selection of their numerous results.

Pumping

The standard pumping lemma is designed to prove that a language is non-regular, although some students try to use it to prove the opposite. In a commendable effort to comfort these poor students, several authors have proposed extensions of the pumping lemma that characterise regular languages. The first is due to Jaffe [24]:

Theorem 5

A language L is regular if and only if there is an integer m such that every word x of length Inline graphic can be written as Inline graphic, with Inline graphic, and for all words z and for all Inline graphic, Inline graphic if and only if Inline graphic.

Stronger versions were proposed by Stanat and Weiss [50] and Ehrenfeucht, Parikh and Rozenberg [15], but the most powerful version was given by Varricchio [54].

Theorem 6

A language L is regular if and only if there is an integer Inline graphic such that, for all words x, Inline graphic and y, there exist i, j with Inline graphic such that for all Inline graphic,

graphic file with name M67.gif

Periodicity and Permutation

Definition 3

Let L be a language of Inline graphic.

  1. L is periodic if, for any Inline graphic, there exist integers Inline graphic such that, for all Inline graphic, Inline graphic.

  2. L is n-permutable if, for any sequence Inline graphic of n words of Inline graphic, there exists a nontrivial permutation Inline graphic of Inline graphic such that, for all Inline graphic, Inline graphic.

  3. L is permutable if it is permutable for some Inline graphic.

These definitions were introduced by Restivo and Reutenauer [44], who proved the following result.

Proposition 4

A language is regular if and only if it is periodic and permutable.

Iteration Properties

The book of de Luca and Varricchio [13] also contains many results about iteration properties. Here is an example of this type of result.

Proposition 5

A language L is regular if and only if there exist integers m and s such that for any Inline graphic, there exist integers h, k with Inline graphic, such that for all Inline graphic,

graphic file with name M83.gif (1)

for all Inline graphic.

Rewriting Systems and Well Quasi-orders

Rewriting systems and well quasi-orders are two powerful methods to prove the regularity of a language. We follow the terminology of Otto’s survey [37].

Rewriting Systems

A rewriting system is a binary relation R on Inline graphic. A pair Inline graphic from R is usually referred to as the rewrite rule or simply the rule Inline graphic. A rule is special if Inline graphic, context-free if Inline graphic, inverse context-free if Inline graphic, length-reducing if Inline graphic. It is monadic if it is length-reducing and inverse context-free. A rewriting system is special (context-free, inverse context-free, length-reducing, monadic) if its rules have the corresponding properties.

The reduction relation Inline graphic is the reflexive and transitive closure of the single-step reduction relation Inline graphic defined as follows: Inline graphic if Inline graphic and Inline graphic for some Inline graphic and some Inline graphic. For each language L, we set

graphic file with name M99.gif

A rewriting system R is said to preserve regularity if, for each regular language L, the language Inline graphic is regular. The following result is well-known.

Theorem 7

Inverse context-free rewriting systems preserve regularity.

Proof

Let R be an inverse context-free rewriting system and let L be a regular language. Starting from the minimal deterministic automaton of L, construct an automaton with the same set of states, but with 1-transitions, by iterating the following process: for each rule Inline graphic and for each path Inline graphic, create a new transition Inline graphic; for each rule Inline graphic with Inline graphic and for each path Inline graphic, create a new transition Inline graphic. The automaton obtained at the end of the iteration process will accept Inline graphic.
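The saturation process of this proof can be sketched in a few lines. The code below makes several assumptions that are not taken from the text: the language associated with L is read as the set of words derivable from words of L, rules are pairs (lhs, rhs) with rhs of length at most 1, the empty string plays the role of the empty word 1, and the starting automaton may be partial. All names and the sample data are hypothetical.

```python
def eps_closure(states, trans):
    """States reachable from `states` through transitions labelled ''."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in trans.get((p, ""), ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


def reach(p, word, trans):
    """Set of states q such that there is a path from p to q labelled `word`."""
    current = eps_closure({p}, trans)
    for a in word:
        step = set()
        for r in current:
            step |= trans.get((r, a), set())
        current = eps_closure(step, trans)
    return current


def saturate(n_states, trans, rules):
    """For each rule lhs -> rhs and each path p --lhs--> q, add p --rhs--> q,
    repeating until no new transition appears."""
    trans = {key: set(targets) for key, targets in trans.items()}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            assert len(rhs) <= 1, "inverse context-free rules only"
            for p in range(n_states):
                for q in reach(p, lhs, trans):
                    targets = trans.setdefault((p, rhs), set())
                    if q not in targets:
                        targets.add(q)
                        changed = True
    return trans


def accepts(word, trans, initial, finals):
    return bool(reach(initial, word, trans) & set(finals))


# Sample data: L = {aabb}, with the single rule ab -> 1.
TRANS = {(0, "a"): {1}, (1, "a"): {2}, (2, "b"): {3}, (3, "b"): {4}}
SAT = saturate(5, TRANS, [("ab", "")])
```

On this sample, the saturated automaton accepts aabb, ab and the empty word, which are exactly the words obtained from aabb by repeatedly deleting a factor ab. The loop terminates because the state set is fixed and only finitely many labelled transitions can be added.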

A similar technique can be used to prove the following result [38]. If K is a regular language, then the smallest language L containing K and such that Inline graphic is regular.

Suffix Rewriting Systems

A suffix rewriting system is a binary relation S on Inline graphic. Its elements are called suffix rules. The suffix-reduction relation Inline graphic defined by S is the reflexive transitive closure of the single-step suffix-reduction relation defined as follows: Inline graphic if Inline graphic and Inline graphic for some Inline graphic and some Inline graphic. Prefix rewriting systems are defined symmetrically. For each language L, we set

graphic file with name M115.gif

The following early result is due to Büchi [8].

Theorem 8

Suffix (prefix) rewriting systems preserve regularity.

Deleting Rewriting Systems

We follow Hofbauer and Waldmann [22] for the definition of deleting systems. If u is a word, the content of u is the set c(u) of all letters occurring in u. A precedence relation is an irreflexive and transitive binary relation. A precedence relation < on an alphabet A can be extended to a precedence relation on Inline graphic, by setting Inline graphic if Inline graphic and, for each Inline graphic, there exists Inline graphic such that Inline graphic. A rewriting system R is <-deleting if for each rule Inline graphic of R, Inline graphic.

Hofbauer and Waldmann [22] proved the following result.

Theorem 9

Every deleting string rewriting system preserves regularity.

Rules of the Form Inline graphic

Rules of the form Inline graphic were studied in several papers, for instance [5, 16, 34]. The following result is due to Bovet and Varricchio [5].

Proposition 6

The rewriting systems Inline graphic and Inline graphic both preserve regularity.

This result can be used to solve the following exercise. Let L be a language such that, for all Inline graphic, Inline graphic is a semigroup. Prove that L is regular. Indeed, this condition implies that Inline graphic implies Inline graphic.

Several results were obtained by Leupold [33, 34]. Let us say that a rewriting system is k-period-expanding [k-period-reducing] if its rules are of the form Inline graphic, with Inline graphic [Inline graphic] and Inline graphic. Any union of finitely many k-period-expanding and k-period-reducing rewriting systems is called a k-periodic rewriting system.

Proposition 7

(Leupold).

  1. Every k-periodic rewriting system preserves regularity.

  2. For each Inline graphic, the rewriting system Inline graphic preserves regularity.

  3. For each k and for Inline graphic, the rewriting system Inline graphic preserves regularity.

Well Quasi-orders

A quasi-order (or preorder) on Inline graphic is a reflexive and transitive relation. A quasi-order Inline graphic is stable (or monotone) if, for all words u, v, x, y, the condition Inline graphic implies Inline graphic. A language U is an upper set with respect to a quasi-order Inline graphic if the conditions Inline graphic and Inline graphic imply Inline graphic. The upper set generated by a language L is the language Inline graphic.

A quasi-order Inline graphic on Inline graphic is a well quasi-order (wqo) if every upper set is generated by some finite language. The connection with regular languages was first established in [14] (see also [13, Theorem 6.3.1, p. 203] and [12]).

Theorem 10

A language is regular if and only if it is an upper set with respect to some stable well quasi-order on Inline graphic.

It follows that if the reduction relation defined by a rewriting system is a well quasi-order, then this rewriting system preserves regularity. Actually, a stronger property holds. Following Conway [11], let us say that a rewriting system R is a total regulator if for any language L, the language Inline graphic is regular.

Theorem 11

Any rewriting system whose reduction relation is a well quasi-order is a total regulator.

The most famous example is the rewriting system Inline graphic, which defines the subword ordering. A word Inline graphic is a subword of a word v if Inline graphic. Higman’s theorem states that if A is finite, the subword relation is a well quasi-order on Inline graphic. It follows that for any language L (regular or not), the shuffle product Inline graphic is regular.
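To make the subword ordering concrete, here is a small sketch. It assumes that "subword" means scattered subword, so that u is a subword of v when u is obtained from v by deleting letters, and that the upper set generated by L consists of the words having some word of L as a subword. The function names are hypothetical.

```python
def is_subword(u: str, v: str) -> bool:
    """True iff u is a scattered subword of v (u is v with letters deleted)."""
    remaining = iter(v)
    # `letter in remaining` advances the iterator past the first match, so the
    # letters of u are matched from left to right in v.
    return all(letter in remaining for letter in u)


def minimal_elements(words):
    """Words of a finite set that have no other word of the set as a subword."""
    words = set(words)
    return {u for u in words
            if not any(w != u and is_subword(w, u) for w in words)}


def in_upper_set(v: str, generators) -> bool:
    """True iff v lies in the upper set generated by the given words."""
    return any(is_subword(u, v) for u in generators)
```

For a finite sample such as {ab, aab, ba, abba}, `minimal_elements` keeps only ab and ba, and these two words generate the same upper set as the whole sample. Higman's theorem guarantees that a finite set of generators exists even when L is infinite, although it does not by itself provide a way to compute them.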

The following result extends Higman’s theorem on the subword order. Let us say that a set H of words of Inline graphic is unavoidable if the language Inline graphic is finite.

Theorem 12

(Ehrenfeucht, Haussler, Rozenberg [14, Theorem 4.8]). If H is an unavoidable finite set of words of Inline graphic, then the reduction relation of the rewriting system Inline graphic is a well quasi-order on Inline graphic.

A similar result holds for rewriting systems with rules of the form Inline graphic, where a is a letter.

Theorem 13

(Bucher, Ehrenfeucht and Haussler [6, Theorem 2.3]). Let R be a finite rewriting system with rules of the form Inline graphic with Inline graphic and Inline graphic. The following conditions are equivalent:

  1. the relation Inline graphic is a well quasi-order,

  2. the set Inline graphic is unavoidable,

  3. the set Inline graphic is unavoidable.

It follows for instance that the following rewriting systems are total regulators:

graphic file with name M170.gif

Bucher, Ehrenfeucht and Haussler [6] considered context-free rewriting systems related to semigroup morphisms. Recall that an ordered semigroup is a semigroup equipped with a stable partial order. Let Inline graphic be a finite ordered semigroup and let Inline graphic be a semigroup morphism. Consider the rewriting system

graphic file with name M173.gif

Let Inline graphic be a finite set of languages of Inline graphic. Consider a (possibly infinite) system of inequations of the form

graphic file with name M176.gif (2)

where each Inline graphic is a product built from the variables Inline graphic and arbitrary constant languages and each Inline graphic is an expression built from the variables Inline graphic and constant languages belonging to the set Inline graphic, using concatenation, possibly infinite union and possibly infinite intersection. Note that the expressions Inline graphic can also use Kleene star, since it can be rewritten as an infinite union of products.

Theorem 14

(Kunc [30]). Let Inline graphic be a semigroup morphism that recognises all languages in Inline graphic. If Inline graphic is a well quasi-order on Inline graphic, then the components of every maximal solution of (2) are regular, and they are star-free if S is aperiodic.

Characterising the semigroup morphisms for which Inline graphic is a well quasi-order is an open problem. However, Kunc found a complete answer for finite semigroups Inline graphic ordered by the equality relation.

Theorem 15

(Kunc [30]). Let Inline graphic be a finite semigroup ordered by the equality relation and let Inline graphic be a surjective semigroup morphism. Then the relation Inline graphic is a well quasi-order on Inline graphic if and only if S is a chain of simple semigroups.

In particular any finite group is a simple semigroup. It follows that if L is a language recognised by a finite group, then, for any subset S of Inline graphic, the language Inline graphic is regular.

Example 3

The following example is given by Kunc [30, Example 19]. Let L be the language consisting of those words Inline graphic which contain some occurrence of b and where the difference between the length of u and the number of blocks of occurrences of b in u is even. Here is the minimal automaton of this language.

graphic file with name 492458_1_En_5_Figa_HTML.jpg

The syntactic semigroup of L is defined by the relations Inline graphic, Inline graphic, Inline graphic, Inline graphic and Inline graphic. It is a chain of two simple semigroups whose elements are represented by the words a, Inline graphic and b, Inline graphic, ab, Inline graphic, ba, Inline graphic, aba, Inline graphic, respectively.

Let us consider the inequality Inline graphic with one variable X. It is easy to verify that this inequality has a largest solution, namely the regular language Inline graphic.

Equations and Inequalities

Inequations in languages in which the right hand side is a constant language were first considered by Conway [11], see also Bala [1]. In Chap. 21 of the forthcoming Handbook of Automata Theory, Kunc and Okhotin [32] give the following remarkable result. Consider a finite system of inequations of the form

graphic file with name M208.gif (3)

where each Inline graphic is a product of arbitrary constant languages and variables, each Inline graphic is a constant regular language and each index set Inline graphic is possibly infinite.

Theorem 16

(Kunc and Okhotin [32]). Every system of the form (3) has only finitely many maximal solutions and every maximal solution has all components regular. If all Inline graphic are star-free, then the maximal solutions are star-free. Furthermore, the result still holds if any inequalities are replaced by equations.

Proof

Let Inline graphic be the simultaneous syntactic monoid of the languages Inline graphic. If Inline graphic is a solution, then so is Inline graphic. It follows that every solution is contained in a solution in which all components are recognised by h and the result follows.

Inequations of the form Inline graphic were considered by Kunc [30].

Theorem 17

(Kunc [30]). Let K be an arbitrary language and let L be a regular language. Then the greatest solution of the inequality Inline graphic is regular.

The situation is totally different for equations of the type Inline graphic. Indeed Kunc [31] has shown that there exists a finite language L such that the greatest solution of the equation Inline graphic is co-recursively enumerable complete.

Logic

Logic can be used in various ways to characterise regular languages. We consider successively logic on words, linear temporal logic and logic on trees.

Logic on Words

Let Inline graphic be a nonempty word on the alphabet A. The domain of u, denoted by Inline graphic, is the set Inline graphic. For each letter Inline graphic, let Inline graphic be a unary predicate symbol, where Inline graphic is interpreted as “the letter in position x is an a”. We also use the binary predicate symbols < and S, interpreted as the usual order relation and the successor relation on Inline graphic, respectively. The language defined by a sentence Inline graphic is the set

graphic file with name M229.gif

We let Inline graphic and Inline graphic denote the set of first-order and monadic second-order formulas of signature Inline graphic, respectively. Similarly, we let Inline graphic and Inline graphic denote the same sets of formulas of signature Inline graphic.

Let us say that a syntactic fragment of logic F captures a class of languages Inline graphic if every sentence of the fragment F defines a language of Inline graphic and every language of Inline graphic can be defined by a sentence of F.

Two famous results are a natural ingredient of this survey. The first one is due to Büchi [7] and was independently obtained by Elgot [20] and Trakhtenbrot [53].

Theorem 18

(Büchi [7]). Inline graphic captures the class of regular languages.

The second one relates first order logic and star-free languages.

Theorem 19

(McNaughton [35]). Inline graphic captures the class of star-free languages.

Second order logic Inline graphic is much more expressive than monadic second order, but two successive results led to a complete characterisation of the syntactic fragments of Inline graphic — in the signature Inline graphic — that capture the regular languages.

A quantifier prefix is any word on the alphabet Inline graphic. A quantifier prefix class is any set of quantifier prefixes. For any quantifier prefix Q, let Inline graphic (resp. Inline graphic) be the set of all formulas of the shape Inline graphic (resp. Inline graphic) where Inline graphic is a list of relations and Inline graphic is quantifier free. For every Inline graphic, let Inline graphic (resp. Inline graphic) be the set of all formulas of the form Inline graphic (resp. Inline graphic) where Inline graphic is a Inline graphic (resp. Inline graphic) formula. Finally, for every quantifier prefix class Inline graphic, let Inline graphic.

The fragment Inline graphic, also known as existential second order and frequently denoted by Inline graphic, was first explored by Eiter, Gottlob and Gurevich [17].

Theorem 20

(Eiter, Gottlob and Gurevich [17]). A syntactic fragment Inline graphic captures the regular languages if and only if Inline graphic is a quantifier prefix class contained in Inline graphic whose intersection with Inline graphic is nonempty.

The proof of this result is very difficult. It relies on combinatorial methods related to hypergraph transversals for the fragment Inline graphic and on more logical techniques for the fragment Inline graphic. Eiter, Gottlob and Gurevich further proved the following dichotomy theorem: a class Inline graphic either expresses only regular languages or it expresses some NP-complete languages.

The fragments Inline graphic, with Inline graphic, were explored by Eiter, Gottlob and Schwentick [18].

Theorem 21

(Eiter, Gottlob and Schwentick [18]). The fragments Inline graphic and Inline graphic capture the class of regular languages. Furthermore, for each Inline graphic, the fragments Inline graphic and Inline graphic only define regular languages.

For more information on this topic, the reader is invited to read the beautiful survey of Eiter, Gottlob and Schwentick [19].

Linear Temporal Logic

Linear temporal logic (LTL for short) on an alphabet A is defined as follows. The vocabulary consists of an atomic proposition Inline graphic (for each letter Inline graphic), the usual connectives Inline graphic, Inline graphic and Inline graphic and the temporal operators Inline graphic (next), Inline graphic (eventually) and Inline graphic (until). The formulas are constructed according to the following rules:

  1. for every Inline graphic, Inline graphic is a formula,

  2. if Inline graphic and Inline graphic are formulas, so are Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic and Inline graphic.

Semantics are defined by induction on the formation rules. Given a word Inline graphic, and Inline graphic, we define the expression “w satisfies Inline graphic at the instant n” (denoted Inline graphic) as follows:

  1. Inline graphic if the n-th letter of w is an a.

  2. Inline graphic (resp. Inline graphic, Inline graphic) if Inline graphic or Inline graphic (resp. if Inline graphic and Inline graphic, if (w, n) does not satisfy Inline graphic).

  3. Inline graphic if Inline graphic satisfies Inline graphic.

  4. Inline graphic if there exists m such that Inline graphic and Inline graphic.

  5. Inline graphic if there exists m such that Inline graphic, Inline graphic and, for every k such that Inline graphic, Inline graphic.

Note that, if Inline graphic, Inline graphic only depends on the word Inline graphic.

Example 4

Let Inline graphic. Then Inline graphic since the fourth letter of w is an a, Inline graphic since the fifth letter of w is a b and Inline graphic since cb is a factor of babcba.

If Inline graphic is a temporal formula, we say that w satisfies Inline graphic if Inline graphic. The language defined by an LTL formula Inline graphic is the set Inline graphic of all words of Inline graphic that satisfy Inline graphic.
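The inductive semantics translates directly into a recursive evaluator on finite words. The sketch below fixes several conventions that should be read as assumptions: positions are numbered from 1, "next" fails at the last position, and "eventually" and "until" are non-strict, so that the witness position m may be n itself. Formulas are encoded as nested tuples, an encoding chosen here for illustration only.

```python
def holds(w: str, n: int, f) -> bool:
    """Does the word w satisfy the formula f at instant n (1 <= n <= len(w))?

    Formulas: ("p", a), ("or", f, g), ("and", f, g), ("not", f),
              ("X", f), ("F", f), ("U", f, g).
    """
    op = f[0]
    if op == "p":    # the n-th letter of w is the letter f[1]
        return w[n - 1] == f[1]
    if op == "or":
        return holds(w, n, f[1]) or holds(w, n, f[2])
    if op == "and":
        return holds(w, n, f[1]) and holds(w, n, f[2])
    if op == "not":
        return not holds(w, n, f[1])
    if op == "X":    # next: assumed false at the last position
        return n < len(w) and holds(w, n + 1, f[1])
    if op == "F":    # eventually: some m with n <= m <= len(w)
        return any(holds(w, m, f[1]) for m in range(n, len(w) + 1))
    if op == "U":    # f[1] until f[2], non-strict
        for m in range(n, len(w) + 1):
            if holds(w, m, f[2]):
                return True   # f[1] held at every position n, ..., m - 1
            if not holds(w, m, f[1]):
                return False  # f[1] failed before f[2] became true
        return False
    raise ValueError(f"unknown operator: {op!r}")


# A sample word and formula, unrelated to Example 4: "some c is followed by b".
W = "abcb"
C_THEN_B = ("F", ("and", ("p", "c"), ("X", ("p", "b"))))
```

For instance, `holds(W, 1, C_THEN_B)` is true because the third letter of W is a c and the fourth is a b. Under other conventions (strict until, for example) only the ranges in the "F" and "U" cases would change.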

A famous result of Kamp [25] states that LTL is equivalent to the first-order logic of order. As a consequence, one gets the following result.

Theorem 22

A language of Inline graphic is star-free if and only if it is LTL-definable.

We just defined future temporal formulas but one can define in the same way past temporal formulas by reversing time: it suffices to replace next by previous, eventually by sometimes and until by since. The expressive power of this extended temporal logic remains the same: it still captures the class of star-free languages.

Rabin’s Tree Theorem

We now consider the structure Inline graphic, where each Inline graphic is a binary relation symbol, interpreted on Inline graphic as follows: Inline graphic if and only if Inline graphic. Let Inline graphic be a monadic second order formula with a free set-variable X. We write Inline graphic as a shorthand for the formula Inline graphic. A language L is said to be definable in Inline graphic if there exists a monadic second order formula Inline graphic such that L satisfies Inline graphic.

The following result is a consequence of Rabin’s tree theorem [42].

Theorem 23

A language of Inline graphic is regular if and only if it is definable in Inline graphic.

Transductions

Transductions have proved to be a powerful tool to study regular languages. Let us first recall some useful facts about rational and recognisable sets.

Rational and Recognisable Sets

Let M be a monoid. A subset P of M is recognisable if there exists a finite monoid F, and a monoid morphism Inline graphic such that Inline graphic. It is well known that the class Inline graphic of recognisable subsets of M is closed under Boolean operations, left and right quotients and under inverses of monoid morphisms. The recognisable subsets of a product of monoids were described by Mezei (unpublished).

Theorem 24

Let Inline graphic be monoids. A subset of Inline graphic is recognisable if and only if it is a finite union of subsets of the form Inline graphic, where Inline graphic.

Furthermore, the following property holds:

Proposition 8

Let Inline graphic be finite alphabets. Then Inline graphic is closed under product.

The class Inline graphic of rational subsets of M is the smallest set Inline graphic of subsets of M containing the finite subsets and closed under finite union, product and star (where Inline graphic is the submonoid of M generated by X). Rational sets are closed under monoid morphisms. Kleene’s theorem shows that Inline graphic, but this result does not extend to arbitrary monoids.

Matrix Representations of Transductions

Let M be a monoid. We denote by Inline graphic the semiring of subsets of M with union as addition and the usual product of subsets as multiplication. Note that both Inline graphic and Inline graphic are subsemirings of Inline graphic. Let also Inline graphic denote the semiring of Inline graphic-matrices with entries in Inline graphic.

Let M and N be two monoids. A transduction Inline graphic is a relation on M and N, viewed as a function from M to Inline graphic. One extends Inline graphic to a function Inline graphic by setting Inline graphic. The inverse transduction Inline graphic is defined by Inline graphic. The transduction is rational if the set Inline graphic is a rational subset of Inline graphic.

A transduction Inline graphic admits a linear matrix representation Inline graphic of degree n if there exist Inline graphic, a monoid morphism Inline graphic, a row vector Inline graphic and a column vector Inline graphic such that, for all Inline graphic, Inline graphic.

A substitution from Inline graphic to a monoid M is a monoid morphism from Inline graphic to Inline graphic. Thus a substitution has linear matrix representation of degree 1.

Kleene-Schützenberger’s theorem (see [2]) states that a transduction Inline graphic is rational if and only if it admits a linear matrix representation with entries in Inline graphic.

The following result already suffices for most of the applications we have in mind. It relies on the fact that every monoid morphism Inline graphic can be extended to a semiring morphism Inline graphic and, for each Inline graphic, to a semiring morphism Inline graphic.

Theorem 25

Let Inline graphic be a transduction that admits a linear matrix representation Inline graphic of degree n and let P be a subset of M recognised by a morphism Inline graphic. Then the language Inline graphic is recognised by the submonoid Inline graphic of the monoid of matrices Inline graphic.

This result was generalised in [39, 40]. Let us say that a transduction Inline graphic admits a matrix representation Inline graphic of degree n if there exist a morphism Inline graphic and an expression Inline graphic, where S is a possibly infinite union of products involving arbitrary languages and the variables Inline graphic, such that, for all Inline graphic, Inline graphic. Theorem 25 can now be generalized as follows.

Theorem 26

Let Inline graphic be a transduction that admits a matrix representation Inline graphic of degree n and let P be a subset of M recognised by a morphism Inline graphic. Then the language Inline graphic is recognised by the submonoid Inline graphic of the monoid of matrices Inline graphic.

Example 5

Let us come back to the example Inline graphic. Observe that Inline graphic where Inline graphic. Clearly Inline graphic admits the matrix representation Inline graphic where Inline graphic and Inline graphic.

Example 6

Let us show that if L is a regular language and S is a subset of Inline graphic then the language

graphic file with name M420.gif

is also regular. It suffices to observe that Inline graphic where the transduction Inline graphic admits the matrix representation Inline graphic, where

graphic file with name M424.gif

Example 7

Finally, the reader who likes more complicated examples may prove by the same method that if Inline graphic is regular, then the following language is also regular (Inline graphic is the Dyck language):

graphic file with name M427.gif

Many more examples can be found in [39, 40].

Decompositions of Languages

For each Inline graphic, consider the transduction Inline graphic defined by

graphic file with name M430.gif

Theorem 27

Let L be a language of Inline graphic. The following conditions are equivalent:

  1. L is rational,

  2. for some Inline graphic, Inline graphic is a recognisable subset of Inline graphic,

  3. for all Inline graphic, Inline graphic is a recognisable subset of Inline graphic.

Proof

(1) implies (3). Let Inline graphic be the minimal automaton of L. For each state p, q of Inline graphic, let Inline graphic be the language accepted by Inline graphic with p as initial state and q as unique final state. Let Inline graphic. We claim that

graphic file with name M443.gif (4)

Let R be the right hand side of (4). Let Inline graphic. Let Inline graphic, Inline graphic, ..., Inline graphic. Since Inline graphic, one has Inline graphic and hence Inline graphic. Moreover, by construction, Inline graphic and hence Inline graphic.

Let now Inline graphic. Then, for some Inline graphic, one has Inline graphic, ..., Inline graphic. It follows that Inline graphic, ..., Inline graphic and thus Inline graphic and hence Inline graphic.
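The proof tracks a factorisation of a word through the intermediate states of the automaton. The following is a minimal sketch of that bookkeeping, assuming a complete deterministic automaton stored as a Python dictionary. The automaton (words over {a, b} with an even number of a's), its state names and the helper names `run`, `in_L_pq` and `state_path` are hypothetical choices made for illustration; they do not come from the text.

```python
def run(delta, state, word):
    """Follow the transitions of a complete DFA from `state` along `word`."""
    for letter in word:
        state = delta[(state, letter)]
    return state


def in_L_pq(delta, p, q, word):
    """Membership in the language accepted with p as initial state
    and q as unique final state."""
    return run(delta, p, word) == q


def state_path(delta, initial, factors):
    """States reached after reading each factor in turn, starting from `initial`."""
    path = [initial]
    for factor in factors:
        path.append(run(delta, path[-1], factor))
    return path


# Hypothetical DFA: state 0 = even number of a's, state 1 = odd number of a's.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
INITIAL, FINALS = 0, {0}

factors = ("ab", "a", "bb")
path = state_path(DELTA, INITIAL, factors)

# The product of the factors is accepted exactly when the last state is final,
# and each factor labels a path between consecutive states of `path`.
accepted = path[-1] in FINALS
each_factor_fits = all(
    in_L_pq(DELTA, path[i], path[i + 1], factors[i]) for i in range(len(factors))
)
```

In a deterministic automaton the intermediate states are determined by the factors, which is why membership of a tuple of factors can be decided one component at a time.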

Profinite Topology

Let M be a monoid. A monoid morphism Inline graphic separates two elements u and v of M if Inline graphic. By extension, we say that a monoid N separates two elements of M if there exists a morphism Inline graphic which separates them. A monoid M is residually finite if any pair of distinct elements of M can be separated by a finite monoid.

Let us consider the class Inline graphic of monoids that are finitely generated and residually finite. This class includes finite monoids, free monoids, free groups, free commutative monoids and many others. It is closed under direct products and thus monoids of the form Inline graphic are also in Inline graphic.
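To see concretely why a free monoid is residually finite, here is a small sketch of one standard construction: send every word of length at most n to itself and every longer word to an absorbing zero. The names `ZERO` and `make_truncation` are hypothetical, and this particular quotient is our choice of illustration, not a construction taken from the text.

```python
ZERO = None  # absorbing element standing for "any word longer than n"


def make_truncation(n):
    """Return (phi, mul): a morphism from the free monoid onto the finite
    monoid of words of length <= n together with ZERO, and its product."""

    def phi(word):
        return word if len(word) <= n else ZERO

    def mul(x, y):
        if x is ZERO or y is ZERO:
            return ZERO
        return phi(x + y)

    return phi, mul


# Two distinct words are separated as soon as n is at least their lengths.
phi, mul = make_truncation(2)
separated = phi("ab") != phi("ba")
```

The target monoid is finite because there are finitely many words of length at most n over a finite alphabet, and `phi` is a morphism because a product of words exceeds length n as soon as one of its factors does.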

Each monoid M of Inline graphic can be equipped with the profinite metric, defined as follows. Let, for each Inline graphic,

graphic file with name 492458_1_En_5_Equ23_HTML.gif

Then we set Inline graphic, with the usual conventions Inline graphic and Inline graphic. One can show that d is an ultrametric and that the product on M is uniformly continuous for this metric.

Uniformly Continuous Functions and Recognisable Sets

The connection with recognisable sets is given by the following result:

Proposition 9

Let Inline graphic and let Inline graphic be a function. Then the following conditions are equivalent:

  1. for every Inline graphic, one has Inline graphic,

  2. the function f is uniformly continuous for the profinite metric.

Here is an interesting example [41].

Proposition 10

The function Inline graphic defined by Inline graphic is uniformly continuous.

Example 8

As an application, let us show that if L is a regular language of Inline graphic, then the language

graphic file with name M479.gif

is also regular. Indeed, Inline graphic, where h is the function defined by Inline graphic. Observe that Inline graphic, where Inline graphic is the monoid morphism defined by Inline graphic and g is the function defined in Proposition 10. Now since Inline graphic, one gets Inline graphic by Proposition 10 and Inline graphic since f is a monoid morphism. Thus K is regular.

Uniformly continuous functions from Inline graphic to Inline graphic are of special interest. A function Inline graphic is residually ultimately periodic (rup) if, for each monoid morphism h from Inline graphic to a finite monoid F, the sequence h(f(n)) is ultimately periodic. It is cyclically ultimately periodic if, for every Inline graphic, there exist two integers Inline graphic and Inline graphic such that, for each Inline graphic, Inline graphic. It is ultimately periodic threshold t if the function Inline graphic is ultimately periodic.

For instance, the functions Inline graphic and n! are residually ultimately periodic. The function Inline graphic is not cyclically ultimately periodic. Indeed, it is known that Inline graphic if and only if n is a power of 2. It is shown in [48] that the sequence Inline graphic is not cyclically ultimately periodic.
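The definition can be explored numerically. The image of a morphism from the additive monoid of natural numbers into a finite monoid is generated by a single element, so it is described by a threshold t and a period p. The sketch below reduces f(n) accordingly and searches a finite window for an eventual period. It is only a heuristic check on finitely many terms, not a proof. The helper names `reduce_mod` and `eventual_period`, the window length and the sample function 2^n are assumptions made for illustration; n! is the example named in the text.

```python
from math import factorial


def reduce_mod(n, t, p):
    """Class of n in the monogenic monoid with threshold t and period p."""
    return n if n < t else t + (n - t) % p


def eventual_period(seq, max_period):
    """Search a finite window for (start, period) such that
    seq[i] == seq[i + period] for every i >= start inside the window.
    Returns None when no period up to max_period fits."""
    n = len(seq)
    for period in range(1, min(max_period, n // 2) + 1):
        start = n - period
        while start > 0 and seq[start - 1] == seq[start - 1 + period]:
            start -= 1
        # Require the periodic tail to span at least two full periods.
        if n - start >= 2 * period:
            return start, period
    return None


WINDOW = 40  # number of terms inspected; an arbitrary choice

factorial_seq = [reduce_mod(factorial(n), 3, 5) for n in range(WINDOW)]
power_seq = [reduce_mod(2 ** n, 1, 7) for n in range(WINDOW)]

factorial_result = eventual_period(factorial_seq, 10)
power_result = eventual_period(power_seq, 10)
```

For n! the reduced sequence becomes constant once n reaches the period, since n! is then a multiple of p; this is the behaviour the search is expected to detect.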

Let us mention a last example, first given in [10]. Let Inline graphic be a non-ultimately periodic sequence of 0 and 1. The function Inline graphic is residually ultimately periodic. It follows that the function Inline graphic is not residually ultimately periodic since Inline graphic.

The following result was proved in [3].

Proposition 11

For a function Inline graphic, the following conditions are equivalent:

  1. f is uniformly continuous,

  2. f is residually ultimately periodic,

  3. f is cyclically ultimately periodic and ultimately periodic threshold t for all Inline graphic.

The class of cyclically ultimately periodic functions has been studied by Siefkes [48], who gave in particular a recursion scheme for producing such functions. The class of residually ultimately periodic sequences was also thoroughly studied in [10, 55] (see also [27, 28, 47]). Their properties are summarised in the next theorem.

Theorem 28

Let f and g be rup functions. Then the following functions are also rup: Inline graphic, Inline graphic, fg, Inline graphic, Inline graphic, Inline graphic. Furthermore, if Inline graphic for all n and Inline graphic, then Inline graphic is also rup.

In particular, the functions Inline graphic and Inline graphic (for a fixed k) are rup. The tetration function Inline graphic (exponential stack of 2’s of height n), considered in [47], is also rup, according to the following result: if k is a positive integer, then the function f(n) defined by Inline graphic and Inline graphic is rup.

The existence of non-recursive rup functions was established in [47]: if f is a strictly increasing, non-recursive function, then the function Inline graphic is non-recursive but is rup.

Coming back to regular languages, Seiferas and McNaughton [47] proved the following result.

Theorem 29

Let Inline graphic be a rup function. If L is regular, then so is the language

graphic file with name M523.gif

Here is another application of rup functions. A filter is a strictly increasing function Inline graphic. Filtering a word Inline graphic by f consists in deleting the letters Inline graphic such that i is not in the range of f. For each language L, let L[f] denote the set of all words of L filtered by f. A filter is said to preserve regular languages if, for every regular language L, the language L[f] is also regular. The following result was proved in [3].

Theorem 30

A filter f preserves regular languages if and only if the function Inline graphic defined by Inline graphic is rup.
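The filtering operation itself is easy to make concrete. The sketch below assumes that the letters of a word are indexed from 0 (the indexing in the displayed word is not recoverable here, so this is an assumption) and that f is given as a Python function on the natural numbers. The names `filter_word` and `filter_language` are hypothetical.

```python
def filter_word(word, f):
    """Keep the letter at position i exactly when i is in the range of the
    strictly increasing function f (positions counted from 0, by assumption)."""
    kept = []
    n = 0
    # f is strictly increasing, so f(n) >= n and the loop terminates.
    while f(n) < len(word):
        kept.append(word[f(n)])
        n += 1
    return "".join(kept)


def filter_language(words, f):
    """Filter every word of a finite sample of a language."""
    return {filter_word(w, f) for w in words}


even_positions = filter_word("abcdefgh", lambda n: 2 * n)
square_positions = filter_word("abcdefgh", lambda n: n * n)
sample = filter_language({"abab", "aabb"}, lambda n: 2 * n)
```

Only finite samples can be filtered this way; deciding whether L[f] is regular for every regular L is the content of the theorem and is not addressed by the sketch.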

Transductions and Recognisable Sets

Some further topological results are required to extend Proposition 9 to transductions.

The completion of the metric space (M, d), denoted by Inline graphic, is called the profinite completion of M. Since multiplication on M is uniformly continuous, it extends, in a unique way, to a multiplication on Inline graphic, which is again uniformly continuous. One can show that Inline graphic is a metric compact monoid.

Let Inline graphic be the monoid of compact subsets of Inline graphic. The Hausdorff metric on Inline graphic is defined as follows. For Inline graphic, let

graphic file with name 492458_1_En_5_Equ24_HTML.gif

By a standard result of topology, Inline graphic, equipped with this metric, is compact.

Let now Inline graphic be a transduction. Define a map Inline graphic by setting, for each Inline graphic, Inline graphic, the topological closure of Inline graphic. The following extension of Proposition 9 was proved in [41].

Theorem 31

Let Inline graphic and let Inline graphic be a transduction. Then the following conditions are equivalent:

  1. for every Inline graphic, one has Inline graphic,

  2. the function Inline graphic is uniformly continuous.

Let us say that a transduction Inline graphic is uniformly continuous if Inline graphic is uniformly continuous. Uniformly continuous transductions are closed under composition and they are also closed under direct product.

Proposition 12

Let Inline graphic and Inline graphic be uniformly continuous transductions. Then the transduction Inline graphic defined by Inline graphic is uniformly continuous.

Proposition 13

For every Inline graphic, the transduction Inline graphic defined by Inline graphic is uniformly continuous.

Further Examples and Conclusion

Here are a few results relating regular languages and Turing machines.

Theorem 32

([9, Theorem 3.84, p. 185]). The language accepted by a one-tape Turing machine that never writes on its input is regular.

Theorem 33

(Hartmanis [21]). The language accepted by a one-tape Turing machine that works in time Inline graphic is regular.

The following result is proposed as an exercise in [9, Exercise 4.16, p. 243].

Theorem 34

The language accepted by a Turing machine that works in space Inline graphic is regular.

Let me also mention a result related to formal power series.

Theorem 35

(Restivo and Reutenauer [44]). If a language and its complement are both supports of rational series, then the language is regular.

Many other examples could not be included in this survey, notably the work of Bertoni, Mereghetti and Palano [4, Theorem 3, p. 8] on 1-way quantum automata and the large literature on splicing systems.

I would be very grateful to any reader providing me with new interesting examples to enrich this survey.

Acknowledgements

I would like to thank Olivier Carton for his useful suggestions.

Footnotes

1. Let M and N be monoids. We say that M divides N if there is a submonoid R of N and a monoid morphism that maps R onto M.

J.-É. Pin—Work supported by the DeLTA project (ANR-16-CE40-0007).

Contributor Information

Alberto Leporati, Email: alberto.leporati@unimib.it.

Carlos Martín-Vide, Email: carlos.martin@urv.cat.

Dana Shapira, Email: shapird@g.ariel.ac.il.

Claudio Zandron, Email: zandron@disco.unimib.it.

Jean-Éric Pin, Email: Jean-Eric.Pin@irif.fr.

References

  • 1. Bala S. Complexity of regular language matching and other decidable cases of the satisfiability problem for constraints between regular open terms. Theory Comput. Syst. 2006;39(1):137–163. doi: 10.1007/s00224-005-1262-y.
  • 2. Berstel, J.: Transductions and Context-Free Languages. Teubner (1979)
  • 3. Berstel J, Boasson L, Carton O, Petazzoni B, Pin J-É. Operations preserving recognizable languages. Theor. Comput. Sci. 2006;354:405–420. doi: 10.1016/j.tcs.2005.11.034.
  • 4. Bertoni A, Mereghetti C, Palano B. Quantum computing: 1-way quantum automata. In: Ésik Z, Fülöp Z, editors. Developments in Language Theory; Heidelberg: Springer; 2003. pp. 1–20.
  • 5. Bovet DP, Varricchio S. On the regularity of languages on a binary alphabet generated by copying systems. Inform. Process. Lett. 1992;44(3):119–123. doi: 10.1016/0020-0190(92)90050-6.
  • 6. Bucher W, Ehrenfeucht A, Haussler D. On total regulators generated by derivation relations. Theor. Comput. Sci. 1985;40(2–3):131–148. doi: 10.1016/0304-3975(85)90162-8.
  • 7. Büchi JR. Weak second-order arithmetic and finite automata. Z. Math. Logik und Grundl. Math. 1960;6:66–92. doi: 10.1002/malq.19600060105.
  • 8. Büchi, J.R.: Regular canonical systems. Arch. Math. Logik Grundlagenforsch. 6, 91–111 (1964)
  • 9. Carton, O.: Langages formels, calculabilité et complexité. Vuibert (2008)
  • 10. Carton O, Thomas W. The monadic theory of morphic infinite words and generalizations. Inform. Comput. 2002;176:51–76. doi: 10.1006/inco.2001.3139.
  • 11. Conway JH. Regular Algebra and Finite Machines. London: Chapman and Hall; 1971.
  • 12. D’Alessandro F, Varricchio S. Well quasi-orders in formal language theory. In: Ito M, Toyama M, editors. Developments in Language Theory; Heidelberg: Springer; 2008. pp. 84–95.
  • 13. De Luca A, Varricchio S. Finiteness and Regularity in Semigroups and Formal Languages. Monographs in Theoretical Computer Science. An EATCS Series. Heidelberg: Springer; 1999.
  • 14. Ehrenfeucht A, Haussler D, Rozenberg G. On regularity of context-free languages. Theor. Comput. Sci. 1983;27(3):311–332. doi: 10.1016/0304-3975(82)90124-4.
  • 15. Ehrenfeucht A, Parikh R, Rozenberg G. Pumping lemmas for regular sets. SIAM J. Comput. 1981;10(3):536–541. doi: 10.1137/0210039.
  • 16. Ehrenfeucht A, Rozenberg G. On regularity of languages generated by copying systems. Discrete Appl. Math. 1984;8(3):313–317. doi: 10.1016/0166-218X(84)90129-X.
  • 17. Eiter T, Gottlob G, Gurevich Y. Existential second-order logic over strings. J. ACM. 2000;47(1):77–131. doi: 10.1145/331605.331609.
  • 18. Eiter T, Gottlob G, Schwentick T. Second-order logic over strings: regular and non-regular fragments. In: Kuich W, Rozenberg G, Salomaa A, editors. Developments in Language Theory; Heidelberg: Springer; 2002. pp. 37–56.
  • 19. Eiter T, Gottlob G, Schwentick T. The model checking problem for prefix classes of second-order logic: a survey. In: Blass A, Dershowitz N, Reisig W, editors. Fields of Logic and Computation; Heidelberg: Springer; 2010. pp. 227–250.
  • 20. Elgot CC. Decision problems of finite automata design and related arithmetics. Trans. Am. Math. Soc. 1961;98:21–51. doi: 10.1090/S0002-9947-1961-0139530-9.
  • 21. Hartmanis J. Computational complexity of one-tape Turing machine computations. J. Assoc. Comput. Mach. 1968;15:325–339. doi: 10.1145/321450.321464.
  • 22. Hofbauer D, Waldmann J. Deleting string rewriting systems preserve regularity. Theor. Comput. Sci. 2004;327(3):301–317. doi: 10.1016/j.tcs.2004.04.009.
  • 23. Hopcroft, J.E., Ullman, J.D.: Introduction To Automata Theory, Languages, And Computation. Addison-Wesley Publishing Co., Reading (1979). Addison-Wesley Series in Computer Science
  • 24. Jaffe J. A necessary and sufficient pumping lemma for regular languages. SIGACT News. 1978;10(2):48–49. doi: 10.1145/990524.990528.
  • 25. Kamp, J.: Tense Logic and the Theory of Linear Order. Ph.D. thesis, University of California, Los Angeles (1968)
  • 26. Kleene, S.C.: Representation of events in nerve nets and finite automata. In: Automata Studies, pp. 3–41. Princeton University Press, Princeton (1956). Ann. Math. Stud. 34
  • 27. Kosaraju SR. Regularity preserving functions. SIGACT News. 1974;6(2):16–17. doi: 10.1145/1008304.1008306.
  • 28. Kozen D. On regularity-preserving functions. Bull. Europ. Assoc. Theor. Comput. Sci. 1996;58:131–138.
  • 29. Kozen DC. Automata and computability. Undergraduate Texts in Computer Science. New York: Springer; 1997.
  • 30. Kunc M. Regular solutions of language inequalities and well quasi-orders. Theor. Comput. Sci. 2005;348(2–3):277–293. doi: 10.1016/j.tcs.2005.09.018.
  • 31. Kunc M. The power of commuting with finite sets of words. Theory Comput. Syst. 2007;40(4):521–551. doi: 10.1007/s00224-006-1321-z.
  • 32. Kunc, M., Okhotin, A.: Language equations. In: Pin, J.E. (ed.) Handbook of Automata Theory, vol. II, chap. 21. European Mathematical Society, Zürich (2020, To appear)
  • 33. Leupold P. Languages generated by iterated idempotency. Theor. Comput. Sci. 2007;370(1–3):170–185. doi: 10.1016/j.tcs.2006.10.021.
  • 34. Leupold P. On regularity-preservation by string-rewriting systems. In: Martín-Vide C, Otto F, Fernau H, editors. Language and Automata Theory and Applications; Heidelberg: Springer; 2008. pp. 345–356.
  • 35. McNaughton R, Papert S. Counter-Free Automata. Cambridge: The M.I.T. Press; 1971.
  • 36. Niwiński, D., Rytter, W.: 200 Problems in Formal Languages and Automata Theory. University of Warsaw (2017)
  • 37. Otto F. On the connections between rewriting and formal language theory. In: Narendran P, Rusinowitch M, editors. Rewriting Techniques and Applications; Heidelberg: Springer; 1999. pp. 332–355.
  • 38. Pin J-É. Topologies for the free monoid. J. Algebra. 1991;137:297–337. doi: 10.1016/0021-8693(91)90094-O.
  • 39. Pin J-É, Sakarovitch J. Some operations and transductions that preserve rationality. In: Cremers AB, Kriegel H-P, editors. Theoretical Computer Science; Heidelberg: Springer; 1982. pp. 277–288.
  • 40. Pin J-É, Sakarovitch J. Une application de la représentation matricielle des transductions. Theor. Comput. Sci. 1985;35:271–293. doi: 10.1016/0304-3975(85)90019-2.
  • 41. Pin J-É, Silva PV. A topological approach to transductions. Theor. Comput. Sci. 2005;340:443–456. doi: 10.1016/j.tcs.2005.03.029.
  • 42. Rabin MO. Decidability of second-order theories and automata on infinite trees. Trans. Am. Math. Soc. 1969;141:1–35.
  • 43. Restivo, A.: Codes and aperiodic languages. In: Erste Fachtagung der Gesellschaft für Informatik über Automatentheorie und Formale Sprachen (Bonn, 1973), LNCS, vol. 2, pp. 175–181. Springer, Berlin (1973)
  • 44. Restivo A, Reutenauer C. On cancellation properties of languages which are supports of rational power series. J. Comput. Syst. Sci. 1984;29(2):153–159. doi: 10.1016/0022-0000(84)90026-6.
  • 45. Sakarovitch J. Elements of Automata Theory. Cambridge: Cambridge University Press; 2009.
  • 46. Schützenberger MP. On finite monoids having only trivial subgroups. Inf. Control. 1965;8:190–194. doi: 10.1016/S0019-9958(65)90108-7.
  • 47. Seiferas JI, McNaughton R. Regularity-preserving relations. Theor. Comput. Sci. 1976;2(2):147–154. doi: 10.1016/0304-3975(76)90030-X.
  • 48. Siefkes, D.: Decidable extensions of monadic second order successor arithmetic. In: Automatentheorie und formale Sprachen (Tagung, Math. Forschungsinst., Oberwolfach, 1969), pp. 441–472. Bibliographisches Inst., Mannheim (1970)
  • 49. Sipser, M.: Introduction to the Theory of Computation. 3rd edn. Cengage Learning (2012)
  • 50. Stanat DF, Weiss SF. A pumping theorem for regular languages. SIGACT News. 1982;14(1):36–37. doi: 10.1145/1008892.1008895.
  • 51. Stearns RE, Hartmanis J. Regularity preserving modifications of regular expressions. Inf. Control. 1963;6:55–69. doi: 10.1016/S0019-9958(63)90110-4.
  • 52. Straubing H. Relational morphisms and operations on recognizable sets. RAIRO Inf. Theor. 1981;15:149–159. doi: 10.1051/ita/1981150201491.
  • 53. Trakhtenbrot, B.A., Barzdin', Y.M.: Finite Automata, Behavior and Synthesis. North-Holland Publishing Co., Amsterdam (1973). Translated from the Russian by D. Louvish, English translation edited by E. Shamir and L. H. Landweber, Fundamental Studies in Computer Science, vol. 1
  • 54. Varricchio S. A pumping condition for regular sets. SIAM J. Comput. 1997;26(3):764–771. doi: 10.1137/S0097539790179944.
  • 55. Zhang GQ. Automata, boolean matrices, and ultimate periodicity. Inform. Comput. 1999;152(1):138–154. doi: 10.1006/inco.1998.2787.

Articles from Language and Automata Theory and Applications are provided here courtesy of Nature Publishing Group
