2020 May 26;12086:83–95. doi: 10.1007/978-3-030-48516-0_7

The State Complexity of Lexicographically Smallest Words and Computing Successors

Lukas Fleischer, Jeffrey Shallit
Editors: Nataša Jonoska, Dmytro Savchuk
PMCID: PMC7247871

Abstract

Given a regular language L over an ordered alphabet Inline graphic, the set of lexicographically smallest (resp., largest) words of each length is itself regular. Moreover, there exists an unambiguous finite-state transducer that, on a given word Inline graphic, outputs the length-lexicographically smallest word larger than w (henceforth called the L-successor of w). In both cases, naïve constructions result in an exponential blowup in the number of states. We prove that if L is recognized by a DFA with n states, then Inline graphic states are sufficient for a DFA to recognize the subset S(L) of L composed of its lexicographically smallest words. We give a matching lower bound that holds even if S(L) is represented as an NFA. We then show that the same upper and lower bounds hold for an unambiguous finite-state transducer that computes L-successors.

Introduction

One of the most basic problems in formal language theory is the problem of enumerating the words of a language L. Since, in general, L is infinite, language enumeration is often formalized in one of the following two ways:

  1. A function that maps an integer Inline graphic to the n-th word of L.

  2. A function that takes a word and maps it to the next word in L.

Both descriptions require some linear ordering of the words in order for them to be well-defined. Usually, radix order (also known as length-lexicographical order) is used. Throughout this work, we focus on the second formalization.

While enumeration is non-computable in general, there are many interesting special cases. In this paper, we investigate the case of fixed regular languages, where successors can be computed in linear time [1, 2, 9]. Moreover, Frougny [7] showed that for every regular language L, the mapping of words to their successors in L can be realized by a finite-state transducer. Later, Angrand and Sakarovitch refined this result [3], showing that the successor function of any regular language is a finite union of functions computed by sequential transducers that operate from right to left. However, to the best of our knowledge, no upper bound on the size of the smallest transducer computing the successor function was known.

In this work, we consider transducers operating from left to right, and prove that the optimal upper bound for the size of transducers computing successors in L is in Inline graphic, where n is the size of the smallest DFA for L.

The construction used to prove the upper bound relies heavily on another closely related result. Many years before Frougny published her proof, it had already been shown that if L is a regular language, the set of all lexicographically smallest (resp., largest) words of each length is itself regular; see, e.g., [11, 12]. This fact is used both in [3] and in our construction. In [12], it was shown that if L is recognized by a DFA with n states, then the set of all lexicographically smallest words is recognized by a DFA with Inline graphic states. While it is easy to improve this upper bound to Inline graphic, the exact state complexity of this operation remained open. We prove that Inline graphic states are sufficient and that this upper bound is optimal. We also prove that nondeterminism does not help with recognizing lexicographically smallest words, i.e., the corresponding lower bound still holds if the constructed automaton is allowed to be nondeterministic.

The key component to our results is a careful investigation of the structure of lexicographically smallest words. This is broken down into a series of technical lemmas in Sect. 3, which are interesting in their own right. Some of the other techniques are similar to those already found in [3], but need to be carried out more carefully to achieve the desired upper bound.

For some related results, see [5, 10].

Preliminaries

We assume familiarity with basic concepts of formal language theory and automata theory; see [8, 13] for a comprehensive introduction. Below, we introduce concepts and notation specific to this work.

Ordered Words and Languages. Let Inline graphic be a finite ordered alphabet. Throughout the paper, we consider words ordered by radix order, which is defined by Inline graphic if either Inline graphic or there exist factorizations Inline graphic, Inline graphic with Inline graphic and Inline graphic such that Inline graphic. We write Inline graphic if Inline graphic or Inline graphic. In this case, the word u is smaller than v and the word v is larger than u.
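As a small illustration of radix order, the following Python sketch compares words first by length and then letter by letter. The function names and the convention of passing the ordered alphabet as a string are assumptions made for this example only.

```python
def radix_key(word, alphabet):
    """Sort key for radix (length-lexicographic) order.

    `alphabet` lists the letters from smallest to largest.
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}
    return (len(word), [rank[c] for c in word])


def radix_less(u, v, alphabet):
    """True if u is strictly smaller than v in radix order."""
    return radix_key(u, alphabet) < radix_key(v, alphabet)


words = ["b", "ab", "", "a", "ba", "aa"]
print(sorted(words, key=lambda w: radix_key(w, "ab")))
# prints ['', 'a', 'b', 'aa', 'ab', 'ba']
```

Shorter words always come first, so "b" precedes "aa" even though "aa" would come first in plain dictionary order.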

For a language Inline graphic and two words Inline graphic, we say that v is the L-successor of u if Inline graphic and Inline graphic for all Inline graphic with Inline graphic. Similarly, u is the L-predecessor of v if Inline graphic and Inline graphic for all Inline graphic with Inline graphic. A word is L-minimal if it has no L-predecessor. A word is L-maximal if it has no L-successor. Note that every nonempty language contains exactly one L-minimal word. It contains a (unique) L-maximal word if and only if L is finite. A word Inline graphic is L-length-preserving if it is not L-maximal and the L-successor of u has length Inline graphic. Words that are not L-length-preserving are called L-length-increasing. Note that by definition, an L-maximal word is always L-length-increasing. For convenience, we sometimes use the terms successor (resp., predecessor) instead of Inline graphic-successor (resp., Inline graphic-predecessor).
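The definition of an L-successor can be checked by brute force on small examples. The sketch below is only a reference implementation, not the construction of this paper: the language is given as a hypothetical membership predicate `in_language`, the alphabet is a string listing the letters in increasing order, and the search stops at an assumed length bound `max_len`.

```python
from itertools import product


def words_in_radix_order(alphabet, max_len):
    """Yield all words of length at most max_len in radix order."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def successor(u, in_language, alphabet, max_len):
    """Smallest word of the language strictly larger than u.

    Only words of length at most max_len are searched; None is
    returned if no successor is found within that bound.
    """
    passed_u = False
    for w in words_in_radix_order(alphabet, max_len):
        if passed_u and in_language(w):
            return w
        if w == u:
            passed_u = True
    return None


def is_length_preserving(u, in_language, alphabet, max_len):
    """True if u has a successor (within the bound) of the same length."""
    v = successor(u, in_language, alphabet, max_len)
    return v is not None and len(v) == len(u)


# Example language: words over {a, b} with an even number of a's.
def even_as(w):
    return w.count("a") % 2 == 0


print(successor("ab", even_as, "ab", 4))  # prints bb
print(successor("bb", even_as, "ab", 4))  # prints aab
```

In this example "ab" is L-length-preserving, whereas "bb" is L-length-increasing because no larger word of length 2 has an even number of a's.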

For a given language Inline graphic, the set of all smallest words of each length in L is denoted by S(L). It is formally defined as follows:

$$S(L) = \{\, u \in L \mid u \le v \text{ for all } v \in L \text{ with } |v| = |u| \,\}$$

Similarly, we define B(L) to be the set of all L-length-increasing words:

$$B(L) = \{\, u \in L \mid v \le u \text{ for all } v \in L \text{ with } |v| = |u| \,\}$$

A language Inline graphic is thin if it contains at most one word of each length, i.e., Inline graphic for all Inline graphic. It is easy to see that for every language Inline graphic, the languages S(L) and B(L) are thin.
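For a finite sample of words, S(L) and B(L) can be computed directly. The following sketch assumes that the alphabet order agrees with Python's character order and reads B(L) as the set of largest words of each length, which is how Corollary 2 uses it. All function names are illustrative.

```python
def smallest_words(language):
    """Lexicographically smallest word of each length in a finite language."""
    best = {}
    for w in language:
        if len(w) not in best or w < best[len(w)]:
            best[len(w)] = w
    return set(best.values())


def largest_words(language):
    """Lexicographically largest word of each length in a finite language."""
    best = {}
    for w in language:
        if len(w) not in best or w > best[len(w)]:
            best[len(w)] = w
    return set(best.values())


def is_thin(language):
    """True if the language has at most one word of each length."""
    lengths = [len(w) for w in language]
    return len(lengths) == len(set(lengths))


sample = {"a", "ab", "ba", "bb", "bab", "aab"}
print(sorted(smallest_words(sample), key=len))  # prints ['a', 'ab', 'aab']
print(sorted(largest_words(sample), key=len))   # prints ['a', 'bb', 'bab']
```

Both results are thin by construction, since each keeps exactly one word per occurring length.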

Finite Automata and Transducers. A nondeterministic finite automaton (NFA for short) is a 5-tuple Inline graphic where Q is a finite set of states, Inline graphic is a finite alphabet, Inline graphic is the initial state, Inline graphic is the set of accepting states and Inline graphic is the transition function. We usually use the notation Inline graphic instead of Inline graphic, and we extend the transition function to Inline graphic by letting Inline graphic and Inline graphic for all Inline graphic, Inline graphic, and Inline graphic. For a state Inline graphic and a word Inline graphic, we also use the notation Inline graphic instead of Inline graphic for convenience. A word Inline graphic is accepted by the NFA if Inline graphic. We sometimes use the notation Inline graphic to indicate that Inline graphic. An NFA is unambiguous if for every input, there exists at most one accepting run. Unambiguous NFA are also called unambiguous finite state automata (UFA). A deterministic finite automaton (DFA for short) is an NFA Inline graphic with Inline graphic for all Inline graphic and Inline graphic. Since this implies Inline graphic for all Inline graphic, we sometimes identify the singleton Inline graphic with the only element it contains.
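Unambiguity can be tested on individual inputs by counting accepting runs. The sketch below represents an NFA by a dictionary `delta` that maps (state, letter) pairs to sets of successor states. This encoding and the two example automata are assumptions chosen for illustration.

```python
def count_accepting_runs(delta, start, finals, word):
    """Number of accepting runs of an NFA on `word`.

    counts[q] holds the number of runs from `start` to q on the
    prefix processed so far.
    """
    counts = {start: 1}
    for letter in word:
        next_counts = {}
        for q, c in counts.items():
            for r in delta.get((q, letter), ()):
                next_counts[r] = next_counts.get(r, 0) + c
        counts = next_counts
    return sum(c for q, c in counts.items() if q in finals)


# Words over {a, b} whose second-to-last letter is a: unambiguous.
ufa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
print(count_accepting_runs(ufa, 0, {2}, "ab"))  # prints 1

# Words containing an a: the position of the guess is not unique.
ambiguous = {(0, "a"): {0, 1}, (1, "a"): {1}}
print(count_accepting_runs(ambiguous, 0, {1}, "aa"))  # prints 2
```

An NFA is unambiguous exactly when this count never exceeds 1. The function checks one input at a time and does not decide unambiguity for the whole automaton.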

A finite-state transducer is a nondeterministic finite automaton that additionally produces some output that depends on the current state, the current letter and the successor state. For each transition, we allow both the input and the output letter to be empty. Formally, it is a 6-tuple Inline graphic where Q is a finite set of states, Inline graphic and Inline graphic are finite alphabets, Inline graphic is the initial state and Inline graphic is the set of accepting states, and Inline graphic is the transition function. One can extend this transition function to the product Inline graphic. To this end, we first define the Inline graphic-closure of a set Inline graphic as the smallest superset C of T with Inline graphic. We then define Inline graphic to be the Inline graphic-closure of Inline graphic and Inline graphic to be the Inline graphic-closure of Inline graphic for all Inline graphic, Inline graphic and Inline graphic. We sometimes use the notation Inline graphic to indicate that Inline graphic. A finite-state transducer is unambiguous if, for every input, there exists at most one accepting run.

The State Complexity of S(L)

It is known that if L is a regular language, then both S(L) and B(L) are also regular [11, 12]. In this section, we investigate the state complexity of the operations Inline graphic and Inline graphic for regular languages. Since the operations are symmetric, we focus on the former. To this end, we first prove some technical lemmas. The first lemma is a simple observation that helps us investigate the structure of words in S(L).

Lemma 1

Let Inline graphic with Inline graphic. Then Inline graphic or Inline graphic or Inline graphic.

Proof

Note that uy and yv are words of the same length. If Inline graphic, then Inline graphic. Similarly, Inline graphic immediately yields Inline graphic. The last case is Inline graphic, which implies Inline graphic.   Inline graphic

Using this observation, we can generalize a well-known factorization technique for regular languages to minimal words. For a DFA with state set Q, a state Inline graphic and a word Inline graphic, we define

graphic file with name M106.gif

to be the sequence of all states that are visited when starting in state q and following the transitions labeled by the letters from w.
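The sequence of visited states is easy to compute for a concrete DFA. In the sketch below, the DFA is a dictionary from (state, letter) pairs to states, and the sequence is assumed to include the starting state q itself. The helper `first_repeat_prefix` is a hypothetical name for the shortest prefix on which some state occurs twice.

```python
def visited_states(delta, q, w):
    """States visited from q while reading w, including q itself."""
    states = [q]
    for letter in w:
        q = delta[(q, letter)]
        states.append(q)
    return states


def first_repeat_prefix(delta, q, w):
    """Shortest prefix of w whose state sequence repeats a state, or None."""
    seen = {q}
    for i, letter in enumerate(w):
        q = delta[(q, letter)]
        if q in seen:
            return w[: i + 1]
        seen.add(q)
    return None


# Example DFA: a cycles through the states 0, 1, 2; b is a self-loop.
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0,
         (0, "b"): 0, (1, "b"): 1, (2, "b"): 2}
print(visited_states(delta, 0, "aab"))       # prints [0, 1, 2, 2]
print(first_repeat_prefix(delta, 0, "aab"))  # prints aab
```

Any word of length at least n must repeat a state in an n-state DFA, so `first_repeat_prefix` returns a prefix of length at most n for such words.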

Lemma 2

Let Inline graphic be a DFA over Inline graphic with n states and with initial state Inline graphic. Then for every word Inline graphic, there exists a factorization Inline graphic with Inline graphic and Inline graphic such that, for all Inline graphic, the following hold:

  (a) Inline graphic,

  (b) Inline graphic, and

  (c) Inline graphic is not a prefix of Inline graphic.

Additionally, if Inline graphic, this factorization can be chosen such that

  (d) the lengths Inline graphic are pairwise distinct (i.e., Inline graphic) and

  (e) there exists at most one Inline graphic with Inline graphic.

Proof

To construct the desired factorization, initialize Inline graphic and Inline graphic and follow these steps:

  1. If Inline graphic, we are done. If Inline graphic and the states in Inline graphic are pairwise distinct, let Inline graphic and Inline graphic and we are done. Otherwise, factorize Inline graphic with Inline graphic minimal such that Inline graphic contains exactly one state twice, i.e., Inline graphic distinct states in total.

  2. Choose the unique factorization Inline graphic such that Inline graphic and Inline graphic.

  3. Let Inline graphic and Inline graphic.

  4. If Inline graphic and Inline graphic and Inline graphic, increment Inline graphic and go back to step 1. Otherwise, let Inline graphic, Inline graphic and Inline graphic; then go back to step 1.

This factorization satisfies the first three properties by construction. It remains to show that if Inline graphic, then Properties (d) and (e) are satisfied as well.

Let us begin with Property (d). For the sake of contradiction, assume that there exist two indices a, b with Inline graphic and Inline graphic. Note that by construction, Inline graphic and Inline graphic must be nonempty. Moreover, by Property (a), the words

graphic file with name M152.gif

both belong to Inline graphic. However, since Inline graphic, neither Inline graphic nor Inline graphic can be strictly smaller than w. Using Lemma 1, we obtain that Inline graphic. This contradicts Property (c).

Property (e) can be proved by using the same argument: Assume that there exist indices a, b with Inline graphic and Inline graphic. The words Inline graphic and Inline graphic have the same length. We define

graphic file with name M162.gif

and obtain Inline graphic, which is a contradiction as above.   Inline graphic

The existence of such a factorization almost immediately yields our next technical ingredient.

Lemma 3

Let Inline graphic be a DFA with Inline graphic states. Let Inline graphic be the initial state of Inline graphic and let Inline graphic. Then there exists a factorization Inline graphic with Inline graphic, Inline graphic and Inline graphic such that Inline graphic. In particular, Inline graphic.

Proof

Let Inline graphic be a factorization that satisfies all properties in the statement of Lemma 2. Suppose first that all exponents Inline graphic are at most n. Using Properties (b) and (d), we obtain Inline graphic and the maximum length of w is achieved when all lengths Inline graphic are present among the factors Inline graphic and the corresponding Inline graphic have lengths Inline graphic. This yields

graphic file with name M183.gif

where the last inequality uses Inline graphic. Therefore, we may set Inline graphic, Inline graphic and Inline graphic.

If not all exponents are at most n, by Property (e), there exists a unique index j with Inline graphic. In this case, let Inline graphic, Inline graphic and Inline graphic. The upper bound Inline graphic still follows by the argument above, and Inline graphic is a direct consequence of Property (b). Moreover, Inline graphic and Property (a) together imply that Inline graphic.   Inline graphic

For the next lemma, we need one more definition. Let Inline graphic be a DFA with initial state Inline graphic. Two tuples (x, y, z) and Inline graphic are cycle-disjoint with respect to Inline graphic if the sets of states in Inline graphic and Inline graphic are either equal or disjoint.

Lemma 4

Let Inline graphic be a DFA with Inline graphic states and initial state Inline graphic. Let (x, y, z) and Inline graphic be tuples that are not cycle-disjoint with respect to Inline graphic such that

graphic file with name M208.gif

Then either Inline graphic or Inline graphic only contains words of length at most Inline graphic.

Proof

Since the tuples are not cycle-disjoint with respect to Inline graphic, we can factorize Inline graphic and Inline graphic such that Inline graphic.

Note that since Inline graphic, the sets of states in Inline graphic and Inline graphic coincide for all Inline graphic. By the same argument, the sets of states in Inline graphic and Inline graphic coincide for all Inline graphic.

If the powers Inline graphic and Inline graphic were equal, then Inline graphic and Inline graphic would coincide. By the previous observation, this would imply that the tuples (x, y, z) and Inline graphic are cycle-disjoint, a contradiction. We conclude Inline graphic.

By symmetry, we may assume that Inline graphic. But then, for every word of the form Inline graphic with Inline graphic, there exists a strictly smaller word Inline graphic in Inline graphic. To see that this word indeed belongs to Inline graphic, note that Inline graphic. This means that all words in Inline graphic are of the form Inline graphic with Inline graphic.   Inline graphic

The previous lemmas now allow us to replace any language L by another language that has a simple structure and approximates L with respect to S(L).

Lemma 5

Let Inline graphic be a DFA over Inline graphic with Inline graphic states. Then there exist an integer Inline graphic and tuples Inline graphic such that the following properties hold:

  (i) Inline graphic,

  (ii) Inline graphic for all Inline graphic, and

  (iii) Inline graphic where Inline graphic.

Proof

If we ignore the required upper bound Inline graphic and Property (iii) for now, the statement follows immediately from Lemma 3 and the fact that there are only finitely many different tuples (x, y, z) with Inline graphic and Inline graphic. We start with such a finite set of tuples Inline graphic and show that we can repeatedly eliminate tuples until at most Inline graphic cycle-disjoint tuples remain. The desired upper bound Inline graphic then follows automatically.

In each step of this elimination process, we handle one of the following cases:

  • If there are two distinct tuples Inline graphic and Inline graphic with Inline graphic and Inline graphic, there are two possible scenarios. If Inline graphic, then for every word in Inline graphic there exists a smaller word in Inline graphic and we can remove Inline graphic from the set of tuples. By the same argument, we can remove the tuple Inline graphic if Inline graphic and Inline graphic.

  • Now consider the case that there are two distinct tuples Inline graphic and Inline graphic with Inline graphic and Inline graphic but Inline graphic. We first check whether Inline graphic. If true, we add the tuple Inline graphic, otherwise we add Inline graphic. If Inline graphic, we know that each word in Inline graphic has a smaller word in Inline graphic, and we remove the tuple Inline graphic. Otherwise, we can remove Inline graphic by the same argument.

  • The last case is that there exist two tuples Inline graphic and Inline graphic that are not cycle-disjoint. By Lemma 4, we can remove at least one of these tuples and replace it by multiple tuples of the form Inline graphic. Note that the newly introduced tuples might be of the form Inline graphic with Inline graphic but Lemma 4 asserts that they still satisfy Inline graphic.

Note that we introduce new tuples of the form Inline graphic during this elimination process. These new tuples are readily eliminated using the first rule.

After iterating this elimination process, the remaining tuples are pairwise cycle-disjoint and the pairs Inline graphic assigned to these tuples Inline graphic are pairwise disjoint. Properties (ii) and (iii) yield the desired upper bound on k.    Inline graphic

Remark 1

While S(L) can be approximated by a language of the simple form given in Lemma 5, the language S(L) itself does not necessarily have such a simple description. An example of a regular language L where S(L) does not have such a simple form is given in the proof of Theorem 2.

The last step is to investigate languages L of the simple structure described in the previous lemma and show how to construct a small DFA for S(L).

Lemma 6

Let Inline graphic. Let Inline graphic with Inline graphic and Inline graphic for all Inline graphic and Inline graphic where Inline graphic. Then S(L) is recognized by a DFA with Inline graphic states.

Proof

We describe how to construct a DFA of the desired size that recognizes the language S(L). This DFA is the product automaton of multiple components.

In one component (henceforth called the counter component), we keep track of the length of the processed input as long as at most Inline graphic letters have been consumed. If more than Inline graphic letters have been consumed, we only keep track of the length of the processed input modulo all numbers Inline graphic for Inline graphic.

For each Inline graphic, there is an additional component (henceforth called the i-th activity component). In this component, we keep track of whether the currently processed prefix u of the input is a prefix of a word in Inline graphic, whether u is a prefix of a word in Inline graphic and whether Inline graphic. Note that if some prefix of the input is not a prefix of a word in Inline graphic, no longer prefix of the input can be a prefix of a word in Inline graphic. The information stored in the counter component suffices to compute the possible letters of Inline graphic allowed to be read in each step to maintain the prefix invariants.

It remains to describe how to determine whether a state is final. To this end, we use the following procedure. First, we determine which sets of the form Inline graphic the input word leading to the considered state belongs to. These languages are called the active languages of the state. They can be obtained from the activity components of the state. If there are no active languages, the state is immediately marked as not final. If the length of the input word w leading to the considered state is Inline graphic or less, we can obtain Inline graphic from the counter component and reconstruct w from the set of active languages. If the length of the input is larger than Inline graphic, we cannot fully recover the input from the information stored in the state. However, we can determine the shortest word w with Inline graphic such that Inline graphic is consistent with the length information stored in the counter component and w itself is consistent with the set of active languages. In either case, we then compute the set A of all words of length Inline graphic that belong to any (possibly not active) language Inline graphic with Inline graphic. If w is the smallest word in A, the state is final, otherwise it is not final.

The desired upper bound on the number of states follows from known estimates on the least common multiple of a set of natural numbers with a given sum; see e.g., [6].   Inline graphic
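The quantity behind these estimates is the largest least common multiple of natural numbers with a bounded sum, commonly known as Landau's function g(n), whose logarithm is known to grow like the square root of n ln n. The following sketch computes g(n) with a standard dynamic program over prime powers. It only illustrates the growth of this function and is not part of the construction above; the function names are assumptions.

```python
def primes_up_to(n):
    """All primes p <= n by trial division (fine for small n)."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def landau(n):
    """Largest lcm of positive integers whose sum is at most n.

    An optimal choice consists of powers of distinct primes, so it
    suffices to pick at most one power per prime.
    """
    best = [1] * (n + 1)  # best[s]: largest lcm using sum at most s
    for p in primes_up_to(n):
        # Descending s guarantees best[s - q] does not use p yet.
        for s in range(n, 0, -1):
            q = p
            while q <= s:
                best[s] = max(best[s], best[s - q] * q)
                q *= p
    return best[n]


print([landau(n) for n in range(1, 11)])
# prints [1, 2, 3, 4, 6, 6, 12, 15, 20, 30]
```

For instance, g(10) = 30 is attained by the cycle lengths 2, 3 and 5, whose sum is exactly 10.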

We can now combine the previous lemmas to obtain an upper bound on the state complexity of S(L).

Theorem 1

Let L be a regular language that is recognized by a DFA with n states. Then S(L) is recognized by a DFA with Inline graphic states.

Proof

By Lemma 5, we know that there exists a language Inline graphic of the form described in the statement of Lemma 6 with Inline graphic. Since Inline graphic implies Inline graphic and since Inline graphic, this also means that Inline graphic. Lemma 6 now shows that there exists a DFA of the desired size.   Inline graphic

To show that the result is optimal, we provide a matching lower bound.

Theorem 2

There exists a family of DFA Inline graphic over a binary alphabet such that Inline graphic has n states and every NFA for Inline graphic has Inline graphic states.

Proof

For Inline graphic, let Inline graphic be the i-th prime number and let Inline graphic. We define a language

graphic file with name M334.gif

It is easy to see that L is recognized by a DFA with Inline graphic states. We show that S(L) is not recognized by any NFA with fewer than p states. From known estimates on the prime numbers (e.g., [4, Sec. 2.7]), this suffices to prove our claim.

Let Inline graphic be an NFA for S(L) and assume, for the sake of contradiction, that Inline graphic has fewer than p states. Note that since for each Inline graphic, the integer p is a multiple of Inline graphic, the language L does not contain any word of the form Inline graphic. Therefore, the word Inline graphic belongs to S(L) and by assumption, an accepting path for this word in Inline graphic must contain a loop of some length Inline graphic. But then Inline graphic is accepted by Inline graphic, too. However, since Inline graphic, there exists some Inline graphic such that Inline graphic does not divide Inline graphic. This means that Inline graphic also does not divide Inline graphic. Thus, Inline graphic, contradicting the fact that Inline graphic belongs to S(L).   Inline graphic
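The arithmetic behind this lower bound can be explored numerically. The sketch below assumes that p is the product of the first k primes and uses the sum of these primes as a rough proxy for the size of the DFA; the exact state count depends on the definition of L in the display above, which is not reproduced here. It also checks the divisibility fact used in the proof: no positive integer smaller than p is divisible by all of the primes.

```python
def first_primes(k):
    """The k smallest prime numbers, by trial division."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % q for q in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def lower_bound_parameters(k):
    """Return (sum, product) of the first k primes."""
    primes = first_primes(k)
    product = 1
    for q in primes:
        product *= q
    return sum(primes), product


def some_prime_fails(primes, length):
    """True if at least one of the primes does not divide `length`."""
    return any(length % q != 0 for q in primes)


print(lower_bound_parameters(4))  # prints (17, 210)

primes = first_primes(4)
print(all(some_prime_fails(primes, l) for l in range(1, 210)))  # prints True
```

The product grows much faster than the sum as k increases, which is the source of the gap between the size of the DFA for L and the size of any NFA for S(L).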

Combining the previous two theorems, we obtain the following corollary.

Corollary 1

Let L be a language that is recognized by a DFA with n states. Then, in general, Inline graphic states are necessary and sufficient for a DFA or NFA to recognize S(L).

By reversing the alphabet ordering, we immediately obtain similar results for largest words.

Corollary 2

Let L be a language that is recognized by a DFA with n states. Then, in general, Inline graphic states are necessary and sufficient for a DFA or NFA to recognize B(L).

The State Complexity of Computing Successors

One approach to efficient enumeration of a regular language L is constructing a transducer that reads a word and outputs its L-successor [3, 7]. We consider transducers that operate from left to right. Since the output letter in each step might depend on letters that have not yet been read, this transducer needs to be nondeterministic. However, the construction can be made unambiguous, meaning that for any given input, at most one computation path is accepting and yields the desired output word. In this paper, we prove that, in general, Inline graphic states are necessary and sufficient for a transducer that performs this computation.

Our proof is split into two parts. First, we construct a transducer that only maps L-length-preserving words to their corresponding L-successors. All other words are rejected. This construction heavily relies on results from the previous section. Then we extend this transducer to L-length-increasing words by using a technique called padding. For the first part, we also need the following result.

Theorem 3

Let Inline graphic be a thin language that is recognized by a DFA with n states. Then the languages

graphic file with name M359.gif

are recognized by UFA with 2n states.

Proof

Let Inline graphic be a DFA for L and let Inline graphic. We construct a UFA with 2n states for Inline graphic. The statement for Inline graphic follows by symmetry.

The state set of the UFA is Inline graphic, the initial state is Inline graphic and the set of final states is Inline graphic. The transitions are graphic file with name 492976_1_En_7_Figa_HTML.jpg

It is easy to verify that this automaton indeed recognizes Inline graphic. To see that this automaton is unambiguous, consider an accepting run of a word w of length Inline graphic. Note that the sequence of first components of the states in this run yields an accepting path of length Inline graphic in Inline graphic. Since Inline graphic is thin, this path is unique. Therefore, the sequence of first components is uniquely defined. The second components are then uniquely defined, too: they are 0 up to the first position where w differs from the unique word of length Inline graphic in L, and 1 afterwards.   Inline graphic

For a language Inline graphic, we denote by Inline graphic the language of all words from Inline graphic such that there exists no strictly larger word of the same length in L. Combining Theorem 1 and Theorem 3, the following corollary is immediate.

Corollary 3

Let L be a language that is recognized by a DFA with n states. Then there exists a UFA with Inline graphic states that recognizes the language Inline graphic.

For a language Inline graphic, we define

graphic file with name M380.gif

If L is regular, it is easy to construct an NFA for the complement of X(L), henceforth denoted as Inline graphic. To this end, we take a DFA for L and replace the label of each transition with all letters from Inline graphic. This NFA can also be viewed as an NFA over the unary alphabet Inline graphic; here, Inline graphic is interpreted as a letter, not a set. It can be converted to a DFA for Inline graphic by using Chrobak’s efficient determinization procedure for unary NFA [6]. The resulting DFA can then be complemented to obtain a DFA for X(L):

Corollary 4

Let L be a language that is recognized by a DFA with n states. Then there exists a DFA with Inline graphic states that recognizes the language X(L).
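The idea of forgetting the letters and tracking only lengths can be sketched as follows. The code reads X(L) as the set of words whose length is not the length of any word in L, which is how X(L) is used in the proof of Theorem 4; this reading, the dictionary encoding of the DFA and the bound `max_len` are assumptions of the example. The code simulates the unary NFA by a plain subset construction and does not implement Chrobak's procedure.

```python
def lengths_in_language(delta, start, finals, alphabet, max_len):
    """Lengths n <= max_len such that the DFA accepts a word of length n.

    Replacing every transition label by all letters turns the DFA into
    a unary NFA; `current` is its set of reachable states after n steps.
    """
    current = {start}
    lengths = set()
    for n in range(max_len + 1):
        if current & finals:
            lengths.add(n)
        current = {delta[(q, a)] for q in current
                   for a in alphabet if (q, a) in delta}
    return lengths


def in_x(word, lengths):
    """True if no word of the language has the same length as `word`."""
    return len(word) not in lengths


# Example DFA for (aa)* over {a, b}; state 2 is a sink.
delta = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 2,
         (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
lengths = lengths_in_language(delta, 0, {0}, "ab", 6)
print(sorted(lengths))        # prints [0, 2, 4, 6]
print(in_x("bab", lengths))   # prints True
print(in_x("bb", lengths))    # prints False
```

The word "bb" is not in X(L) even though it is not in L, because "aa" is a word of L with the same length.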

We now use the previous results to prove an upper bound on the size of a transducer performing a variant of the L-successor computation that only works for L-length-preserving words.

Theorem 4

Let L be a language that is recognized by a DFA with n states. Then there exists an unambiguous finite-state transducer with Inline graphic states that rejects all L-length-increasing words and maps every L-length-preserving word to its L-successor.

Proof

Let Inline graphic be a DFA for L and let Inline graphic. For every Inline graphic, we denote by Inline graphic the DFA that is obtained by making q the new initial state of Inline graphic. We use Inline graphic to denote a DFA with Inline graphic states that recognizes the language Inline graphic. These DFA exist by Theorem 1. Moreover, by Corollary 3, there exist UFA with Inline graphic states that recognize the languages Inline graphic. We denote these UFA by Inline graphic. Similarly, we use Inline graphic to denote a DFA with Inline graphic states that recognizes Inline graphic. These DFA exist by Corollary 4.

In the finite-state transducer, we first simulate Inline graphic on a prefix u of the input, copying the input letters in each step, i.e., producing the output u. At some position, after having read a prefix u leading up to the state Inline graphic, we nondeterministically decide to output a letter b that is strictly larger than the current input letter a. From then on, we guess an output letter in each step and start simulating multiple automata in different components. In one component, we simulate Inline graphic on the remaining input. In another component, we simulate Inline graphic on the guessed output. In additional components, for each Inline graphic with Inline graphic, we simulate Inline graphic on the input. The automata in all components must accept in order for the transducer to accept the input.

The automaton Inline graphic verifies that there is no word in L that starts with the prefix ua, has the same length as the input word and is strictly larger than the input word. The automaton Inline graphic verifies that there is no word in L that starts with the prefix ub, has the same length as the input word and is strictly smaller than the output word. It also certifies that the output word belongs to L. For each letter c, the automaton Inline graphic verifies that there is no word in L that starts with the prefix uc and has the same length as the input word.

Together, the components ensure that the guessed output is the unique successor of the input word, given that it is L-length-preserving. It is also clear that L-length-increasing words are rejected, since the Inline graphic-component does not accept for any sequence of nondeterministic choices.   Inline graphic

The construction given in the previous proof can be extended to also compute L-successors of L-length-increasing words. However, this requires some quite technical adjustments to the transducer. Instead, we use a technique called padding. A very similar approach appears in [3, Prop. 5.1].

We call the smallest letter of an ordered alphabet Inline graphic the padding symbol of Inline graphic. A language Inline graphic is Inline graphic-padded if Inline graphic is the padding symbol of Inline graphic and Inline graphic for some Inline graphic. The key property of padded languages is that all words prefixed by a sufficiently long block of padding symbols are L-length-preserving.

Lemma 7

Let Inline graphic be a DFA over Inline graphic with n states such that Inline graphic is a Inline graphic-padded language. Let Inline graphic and let Inline graphic. Let Inline graphic be a word that is not K-maximal. Then the Inline graphic-successor of Inline graphic has length Inline graphic.

Proof

Let v be the K-successor of u. By a standard pumping argument, we have Inline graphic. This means that Inline graphic is well-defined and belongs to Inline graphic. Note that this word is strictly greater than Inline graphic and has length Inline graphic. Thus, the Inline graphic-successor of Inline graphic has length Inline graphic, too.   Inline graphic

We now state the main result of this section.

Theorem 5

Let Inline graphic be a deterministic finite automaton over Inline graphic with n states. Then there exists an unambiguous finite-state transducer with Inline graphic states that maps every word to its Inline graphic-successor.

Proof

We extend the alphabet by adding a new padding symbol Inline graphic and convert Inline graphic to a DFA for Inline graphic by adding a new initial state. The language Inline graphic accepted by this new DFA is Inline graphic-padded. By Theorem 4 and Lemma 7, there exists an unambiguous transducer of the desired size that maps every word from Inline graphic to its successor in Inline graphic. It is easy to modify this transducer such that all words that do not belong to Inline graphic are rejected. We then replace every transition that reads a Inline graphic by a corresponding transition that reads the empty word instead. Similarly, we replace every transition that outputs a Inline graphic by a transition that outputs the empty word instead. Clearly, this yields the desired construction for the original language L. A careful analysis of the construction shows that the transducer remains unambiguous after each step.   Inline graphic
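The effect of padding can be illustrated by brute force: after prefixing the input with enough padding symbols, the successor can be found among words of the same length, and stripping the padding yields the successor in the original language. This sketch is not the transducer construction. The padding symbol `#`, the membership predicate, and the parameter `pad_len` are assumptions, and `pad_len` must be at least the length increase of the successor, otherwise the function returns None.

```python
from itertools import product

PAD = "#"  # hypothetical padding symbol; it sorts before "a" and "b"


def same_length_successor(word, in_language, alphabet):
    """Smallest word of the same length in the language that is
    strictly larger than `word`, or None if there is none.

    `alphabet` must list its letters in increasing character order.
    """
    for letters in product(alphabet, repeat=len(word)):
        w = "".join(letters)
        if w > word and in_language(w):
            return w
    return None


def successor_via_padding(u, in_language, alphabet, pad_len):
    """Successor of u in the language, found in the padded language."""
    def in_padded(w):
        # Padded language: a block of PAD symbols followed by a word of L.
        stripped = w.lstrip(PAD)
        return PAD not in stripped and in_language(stripped)

    padded = PAD * pad_len + u
    v = same_length_successor(padded, in_padded, PAD + alphabet)
    return None if v is None else v.lstrip(PAD)


# Example language: words over {a, b} with an even number of a's.
def even_as(w):
    return w.count("a") % 2 == 0


print(successor_via_padding("ab", even_as, "ab", 2))  # prints bb
print(successor_via_padding("bb", even_as, "ab", 2))  # prints aab
```

In the second call the padded input is "##bb" and its same-length successor in the padded language is "#aab"; removing the padding gives "aab", a word that is one letter longer than the input.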

We now show that this construction is optimal up to constants in the exponent. The idea is similar to the construction used in Theorem 2.

Theorem 6

There exists a family of deterministic finite automata Inline graphic such that Inline graphic has n states whereas the smallest unambiguous transducer that maps every word to its Inline graphic-successor has Inline graphic states.

Proof

Let Inline graphic. Let Inline graphic be the k smallest prime numbers such that Inline graphic and let Inline graphic. We construct a deterministic finite automaton Inline graphic with Inline graphic states such that the smallest transducer computing the desired mapping has at least p states. From known estimates on the prime numbers (e.g., [4, Sec. 2.7]), this suffices to prove our claim.
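To get a feeling for the gap between the size of the automaton and the bound p, the following sketch compares two quantities for small k. It rests on assumptions that are not stated in the text above: that p is the product of the k smallest primes, and that the automaton has two more states than the sum of these primes (one initial state, one error state, and one cycle of length equal to each prime). The function names are hypothetical.

```python
from math import prod


def first_primes(k):
    # Return the k smallest primes by trial division against earlier primes.
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % q != 0 for q in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def sizes(k):
    # Assumed sizes: (number of DFA states, lower bound p on transducer states).
    ps = first_primes(k)
    return sum(ps) + 2, prod(ps)


for k in range(1, 7):
    print(k, sizes(k))
```

For k = 6 this gives 43 states against p = 30030, which illustrates how quickly a product of primes outgrows their sum.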

The automaton is defined over the alphabet Inline graphic. It consists of an initial state Inline graphic, an error state Inline graphic, and states (i, j) for Inline graphic and Inline graphic with transitions defined as follows:

[Displayed equation M471: definition of the transition function]

The set of accepting states is Inline graphic. The language Inline graphic is the set of all words of the form Inline graphic with Inline graphic such that j is a multiple of Inline graphic.
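An automaton of this shape can be simulated directly. The sketch below is one possible reading, under loudly stated assumptions: a word consists of a digit i selecting the i-th prime, followed by j copies of the letter `a`, and it is accepted when j is a multiple of the i-th prime. The concrete alphabet, the state names `"init"` and `"err"`, and the functions `build_dfa` and `accepts` are hypothetical. Digits are encoded as single characters, so the sketch only works for k ≤ 9.

```python
def build_dfa(primes):
    # States: "init", "err", and pairs (i, j) with 1 <= i <= k, 0 <= j < primes[i-1].
    delta = {}
    states = {"init", "err"}
    for i, p in enumerate(primes, start=1):
        delta[("init", str(i))] = (i, 0)           # the digit i selects cycle i
        for j in range(p):
            states.add((i, j))
            delta[((i, j), "a")] = (i, (j + 1) % p)  # count letters modulo p
    accepting = {(i, 0) for i in range(1, len(primes) + 1)}
    return delta, states, accepting


def accepts(dfa, word):
    delta, _, accepting = dfa
    state = "init"
    for symbol in word:
        # Every transition not listed in delta leads to the error state.
        state = delta.get((state, symbol), "err")
    return state in accepting


dfa = build_dfa([2, 3, 5])
print(len(dfa[1]))            # number of states: 2 + (2 + 3 + 5)
print(accepts(dfa, "2aaa"))   # three letters on the cycle of length 3
print(accepts(dfa, "3aaa"))   # three letters on the cycle of length 5
```

Under these assumptions the automaton has one cycle per prime, so its size grows with the sum of the primes rather than their product.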

Assume, for the sake of contradiction, that there exists an unambiguous transducer with fewer than p states that maps w to the smallest word in Inline graphic strictly greater than w. Consider an accepting run of this transducer on some input of the form Inline graphic with Inline graphic large enough such that the run contains a cycle. Clearly, since Inline graphic and p are coprime, the output of the transducer has to be Inline graphic. We fix one cycle in this run.

If the number of Inline graphic read in this cycle does not equal the number of Inline graphic output in this cycle, by using a pumping argument, we can construct a word of the form Inline graphic that is mapped to a word of the form Inline graphic with Inline graphic. This contradicts the fact that Inline graphic is a subset of Inline graphic. Therefore, we may assume that the number of letters read and the number of letters output on the cycle are both equal to Inline graphic.

Again, by a pumping argument, this implies that Inline graphic is mapped to Inline graphic for every Inline graphic. Since Inline graphic, at least one of the prime numbers Inline graphic is coprime to r. Therefore, we can choose j such that Inline graphic. However, this means that Inline graphic belongs to Inline graphic, contradicting the fact that the transducer maps Inline graphic to Inline graphic. □
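The number-theoretic step at the end of the proof can be made concrete. If 0 < r < p, where p is a product of distinct primes, then some prime in the product does not divide r, and for that prime q every congruence m + j·r ≡ 0 (mod q) has a solution j. The sketch below assumes this reading of the argument. The name `m` for the exponent before pumping and the function `witness` are hypothetical. The call `pow(r, -1, q)` computes a modular inverse and requires Python 3.8 or later.

```python
from math import prod


def witness(primes, r, m):
    # Pick a prime q from the list that does not divide r. One must exist,
    # because otherwise the product of all primes would divide r, forcing r >= p.
    assert 0 < r < prod(primes)
    q = next(x for x in primes if r % x != 0)
    # r is invertible modulo q, so m + j*r = 0 (mod q) can be solved for j.
    j = (-m * pow(r, -1, q)) % q
    return q, j


q, j = witness([2, 3, 5], r=6, m=4)
print(q, j)  # prints: 5 1
```

Here r = 6 is divisible by 2 and 3 but not by 5, and 4 + 1·6 = 10 is a multiple of 5.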

Combining the two previous theorems, we obtain the following corollary.

Corollary 5

Let L be a language that is recognized by a DFA with n states. Then, in general, Inline graphic states are necessary and sufficient for an unambiguous finite-state transducer that maps words to their L-successors.

Contributor Information

Nataša Jonoska, Email: jonoska@mail.usf.edu.

Dmytro Savchuk, Email: savchuk@usf.edu.

Lukas Fleischer, Email: lukas.fleischer@uwaterloo.ca.

Jeffrey Shallit, Email: shallit@uwaterloo.ca.

References

  • 1. Ackerman M, Mäkinen E. Three new algorithms for regular language enumeration. In: Ngo HQ, editor. Computing and Combinatorics. Heidelberg: Springer; 2009. pp. 178–191.
  • 2. Ackerman M, Shallit J. Efficient enumeration of words in regular languages. Theoret. Comput. Sci. 2009;410(37):3461–3470. doi: 10.1016/j.tcs.2009.03.018.
  • 3. Angrand P-Y, Sakarovitch J. Radix enumeration of rational languages. RAIRO - Theoret. Inform. Appl. 2010;44(1):19–36. doi: 10.1051/ita/2010003.
  • 4. Bach E, Shallit J. Algorithmic Number Theory. Cambridge: MIT Press; 1996.
  • 5. Berthé V, Frougny C, Rigo M, Sakarovitch J. On the cost and complexity of the successor function. In: Arnoux P, Bédaride N, Cassaigne J, editors. Proceedings of WORDS 2007, Technical Report, Institut de mathématiques de Luminy; 2007. pp. 43–56.
  • 6. Chrobak M. Finite automata and unary languages. Theoret. Comput. Sci. 1986;47:149–158. doi: 10.1016/0304-3975(86)90142-8.
  • 7. Frougny C. On the sequentiality of the successor function. Inf. Comput. 1997;139(1):17–38. doi: 10.1006/inco.1997.2650.
  • 8. Hopcroft JE, Motwani R, Ullman JD. Introduction to Automata Theory, Languages, and Computation. 3rd ed. New York: Addison-Wesley Longman Publishing Co., Inc.; 2006.
  • 9. Mäkinen E. On lexicographic enumeration of regular and context-free languages. Acta Cybern. 1997;13(1):55–61.
  • 10. Okhotin AS. On the complexity of the string generation problem. Discret. Math. Appl. 2003;13:467–482. doi: 10.1515/156939203322694745.
  • 11. Sakarovitch J. Deux remarques sur un théorème de S. Eilenberg. RAIRO - Theoret. Inform. Appl. 1983;17(1):23–48. doi: 10.1051/ita/1983170100231.
  • 12. Shallit J. Numeration systems, linear recurrences, and regular sets. Inf. Comput. 1994;113(2):331–347. doi: 10.1006/inco.1994.1076.
  • 13. Shallit J. A Second Course in Formal Languages and Automata Theory. Cambridge: Cambridge University Press; 2008.

Articles from Developments in Language Theory are provided here courtesy of Nature Publishing Group
