. 2020 Jan 7;12038:437–448. doi: 10.1007/978-3-030-40608-0_31

Complete Variable-Length Codes: An Excursion into Word Edit Operations

Jean Néraud
Editors: Alberto Leporati, Carlos Martín-Vide, Dana Shapira, Claudio Zandron
PMCID: PMC7206637

Abstract

Given an alphabet A and a binary relation Inline graphic, a language Inline graphic is Inline graphic-independent if Inline graphic; X is Inline graphic-closed if Inline graphic. The language X is complete if any word over A is a factor of some concatenation of words in X. Given a family of languages Inline graphic containing X, X is maximal in Inline graphic if no other set of Inline graphic can strictly contain X. A language Inline graphic is a variable-length code if any equation among the words of X is necessarily trivial. The study discusses the relationship between maximality and completeness in the case of Inline graphic-independent or Inline graphic-closed variable-length codes. We focus on the binary relations by which the images of words are computed by deleting, inserting, or substituting some characters.

Keywords: Closed, Code, Complete, Deletion, Detection, Dependent, Distribution, Edition, Embedding, Independent, Insertion, Levenshtein, Maximal, String, Substitution, Substring, Subword, Variable-length, Word

Introduction

In formal language theory, given a property Inline graphic, the embedding problem with respect to Inline graphic consists in examining whether a language X satisfying Inline graphic can be included in some language Inline graphic that is maximal with respect to Inline graphic, in the sense that no language satisfying Inline graphic can strictly contain Inline graphic. In the literature, maximality is often connected to completeness: a language X over the alphabet A is complete if any string in the free monoid Inline graphic (the set of the words over A) is a factor of some word of Inline graphic (the submonoid of all concatenations of words in X). Such a connection takes on special importance for codes: a language X over the alphabet A is a variable-length code (for short, a code) if every equation among the words (i.e. strings) of X is necessarily trivial.

A famous result due to M.P. Schützenberger states that, for the family of the so-called thin codes (which contains regular codes and therefore also finite ones), being maximal is equivalent to being complete. Many challenging theoretical questions have been raised in connection with these two concepts. For instance, to this day the problem of the existence of a finite maximal code containing a given finite one is not known to be decidable. From this point of view, in [16] the author raised the question of the existence of a regular complete code containing a given finite one: a positive answer was given in [4], which provided a now classical formula for embedding a given regular code into some complete regular one. Famous families of codes have also been the subject of such studies: we mention prefix and bifix codes [2, Theorem 3.3.8, Proposition 6.2.1], codes with a finite deciphering delay [3], infix [10], solid [11], or circular [13] codes.

Actually, with each of those families, a so-called dependence system can be associated. Formally, such a system is a family Inline graphic of languages constituted by those sets X that contain a non-empty finite subset in Inline graphic. Languages in Inline graphic are Inline graphic-dependent, the other ones being Inline graphic-independent. A special case corresponds to binary word relations Inline graphic, where a dependence system is constituted by those sets X satisfying Inline graphic: X is Inline graphic-independent if we have Inline graphic (with Inline graphic). Prefix codes are certainly the best-known example: they are the codes that are independent with respect to the relation obtained by removing each pair (x, x) from the famous prefix order. Bifix, infix or solid codes can be similarly characterized.

As regards dependence, some extremal condition corresponds to the so-called closed sets: given a word relation Inline graphic, a language X is closed under Inline graphic (Inline graphic-closed, for short) if we have Inline graphic. Many topics involve this notion. We mention the framework of prefix order where a one-to-one correspondence between independent and closed sets is provided in [2, Proposition 3.1.3] (cf. also [1, 18]). Congruences in the free monoid are also concerned [15], as well as their connections to DNA computing [7]. With respect to morphisms, related topics are also provided by the famous L-systems [17] and, in the case of one-to-one (anti)-automorphisms, the so-called invariant sets [14].
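For finite languages, independence and closedness can be tested directly. The sketch below is only an illustration: it assumes the usual reading of the two definitions (the image of X under the relation is disjoint from X, respectively included in X), and it represents a relation by a hypothetical Python function mapping a word to the finite set of its images. The helper `proper_prefixes` is the converse of the antireflexive prefix relation; using the converse does not affect independence.

```python
from typing import Callable, Iterable, Set

# Assumption: a relation is modelled as a function word -> finite set of images.
Relation = Callable[[str], Set[str]]


def image(words: Iterable[str], tau: Relation) -> Set[str]:
    """Union of tau(x) over all words x of a finite set."""
    result: Set[str] = set()
    for x in words:
        result |= tau(x)
    return result


def is_independent(words: Set[str], tau: Relation) -> bool:
    """True if no word of the set is the image of a word of the set."""
    return image(words, tau).isdisjoint(words)


def is_closed(words: Set[str], tau: Relation) -> bool:
    """True if every image of a word of the set belongs to the set."""
    return image(words, tau) <= words


def proper_prefixes(w: str) -> Set[str]:
    """All prefixes of w other than w itself (the empty word included)."""
    return {w[:i] for i in range(len(w))}


if __name__ == "__main__":
    print(is_independent({"a", "ba", "bb"}, proper_prefixes))  # a prefix code
    print(is_independent({"a", "ab"}, proper_prefixes))        # "a" is a prefix of "ab"
    print(is_closed({"", "a", "ab"}, proper_prefixes))         # prefix-closed set
```

Both tests enumerate the full image of the set, so they are only practical when every word has finitely many images.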

As commented in [6], maximality and completeness concern the economy of a code. If X is a complete code then every word occurs as part of a message, hence no part of Inline graphic is potentially useless. The present paper emphasizes the following questions: given a regular binary relation Inline graphic, in the family of regular Inline graphic-independent (-closed) codes, are maximality and completeness equivalent notions? Given a non-complete regular Inline graphic-independent (-closed) code, is it embeddable into some complete one?

Independence has some peculiar importance in the framework of coding theory. Informally, given some concatenation of words in X, each codeword Inline graphic is transmitted via a channel into a corresponding Inline graphic. According to the combinatorial structure of X, and the type of channel, one has to make use of codes with prescribed error-detecting constraints: some minimum-distance constraint is generally applied. In this paper, where we consider variable-length codewords, we use the Levenshtein metric [12]: given two different words x, y, their distance is the minimal total number of elementary edit operations that can transform x into y, each such operation being a one-character deletion, insertion, or substitution. Formally, it is the smallest integer p such that we have Inline graphic, with Inline graphic, where Inline graphic, Inline graphic, Inline graphic are further defined below. From the point of view of error detection, X being Inline graphic-independent guarantees that Inline graphic implies Inline graphic. In addition, a code satisfies the property of error correction if its elements are such that Inline graphic unless Inline graphic: according to [9, chap. 6], the existence of such codes is decidable. Denote by Subw(x) the set of the subsequences of x:

  • Inline graphic, the k-character deletion, associates with every word Inline graphic, all the words Inline graphic whose length is Inline graphic. The at most p-character deletion is Inline graphic;

  • Inline graphic, the k-character insertion, is the converse relation of Inline graphic and we set Inline graphic (at most p-character insertion);

  • Inline graphic, the k-character substitution, associates with every Inline graphic, all Inline graphic with length |x| such that Inline graphic (the letter of position i in y), differs from Inline graphic in exactly k positions Inline graphic; we set Inline graphic;

  • We denote by Inline graphic the antireflexive relation obtained by removing all pairs (x, x) from Inline graphic (we have Inline graphic).
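The three elementary relations can be made concrete on short words. The following is a minimal sketch, assuming the reading suggested by the text: deletion keeps the subsequences of length |w| minus k, insertion is its converse, and substitution changes exactly k positions. The function names and the explicit `alphabet` parameter are illustrative choices, not notation from the paper.

```python
from itertools import combinations, product
from typing import Set


def delete_k(w: str, k: int) -> Set[str]:
    """Words obtained from w by deleting exactly k characters."""
    n = len(w)
    if k > n:
        return set()
    return {"".join(w[i] for i in keep) for keep in combinations(range(n), n - k)}


def insert_k(w: str, k: int, alphabet: str) -> Set[str]:
    """Words obtained from w by inserting exactly k characters of `alphabet`."""
    current = {w}
    for _ in range(k):
        # One more insertion at every possible position of every word so far.
        current = {
            x[:i] + a + x[i:]
            for x in current
            for i in range(len(x) + 1)
            for a in alphabet
        }
    return current


def substitute_k(w: str, k: int, alphabet: str) -> Set[str]:
    """Words of length |w| that differ from w in exactly k positions."""
    result: Set[str] = set()
    for positions in combinations(range(len(w)), k):
        # At each chosen position, any letter other than the current one.
        choices = [[a for a in alphabet if a != w[i]] for i in positions]
        for letters in product(*choices):
            y = list(w)
            for i, a in zip(positions, letters):
                y[i] = a
            result.add("".join(y))
    return result


if __name__ == "__main__":
    print(sorted(delete_k("abc", 1)))
    print(sorted(insert_k("a", 1, "ab")))
    print(sorted(substitute_k("aa", 2, "abc")))
```

The "at most p" variants are then unions of these sets for k ranging from 1 to p. All three functions enumerate exhaustively and are meant for small examples only.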

For short, we will refer to the preceding relations as edit relations. For reasons of consistency, in the whole paper we assume Inline graphic and Inline graphic. In what follows, we outline the main contributions of the study:

Firstly, we prove that, given a positive integer k, the two families of languages that are independent with respect to Inline graphic or Inline graphic are identical. In addition, for Inline graphic, no set can be Inline graphic-independent. We establish the following result:

Theorem A

Let A be a finite alphabet, Inline graphic, and Inline graphicInline graphic. Given a regular Inline graphic-independent code Inline graphic, X is complete if, and only if, it is maximal in the family of Inline graphic-independent codes.

A code X is Inline graphic-independent if the Levenshtein distance between two distinct words of X is always larger than k: from this point of view, Theorem A states some noticeable characterization of maximal k-error detecting codes in the framework of the Levenshtein metric.
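The distance condition above is easy to test on a finite set of words. The sketch below uses the standard dynamic-programming recurrence for the Levenshtein distance; the function `detects_k_errors` is a hypothetical name for the pairwise test "distance larger than k". It checks only the distance condition, not whether the set is a code.

```python
from itertools import combinations
from typing import Iterable


def levenshtein(x: str, y: str) -> int:
    """Minimal number of one-character deletions, insertions and substitutions
    transforming x into y, computed row by row."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def detects_k_errors(words: Iterable[str], k: int) -> bool:
    """True if any two distinct words are at Levenshtein distance > k."""
    return all(levenshtein(x, y) > k for x, y in combinations(set(words), 2))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(detects_k_errors({"aaa", "bbb"}, 2))
    print(detects_k_errors({"aaa", "aab"}, 1))
```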

Secondly, we explore the domain of closed codes. A noticeable fact is that for any k, there are only finitely many Inline graphic-closed codes and they have finite cardinality. Furthermore, one can decide whether a given non-complete Inline graphic-closed code can be embedded into some complete one. We also prove that no closed code can exist with respect to the relations Inline graphic, Inline graphic, Inline graphic.

As regards substitutions, we first focus on the structure of the set Inline graphic. Actually, except for two special cases (that is, Inline graphic [5, 19], or Inline graphic with Inline graphic [8, ex. 8, p.77]), to the best of our knowledge, no general description is provided in the literature. In any event we provide such a description; furthermore we establish the following result:

Theorem B

Let A be a finite alphabet and Inline graphic. Given a complete Inline graphic-closed code Inline graphic, either every word in X has length not greater than k, or a unique integer Inline graphic exists such that Inline graphic. In addition for every Inline graphic(Inline graphic)-closed code X, some positive integer n exists such that Inline graphic.

In other words, no Inline graphic-closed code can simultaneously possess words in Inline graphic and words in Inline graphic. As a consequence, one can decide whether a given non-complete Inline graphic-closed code Inline graphic is embeddable into some complete one.

Preliminaries

We adopt the notation of the free monoid theory. Given a word w, we denote by |w| its length; for Inline graphic, Inline graphic denotes the number of occurrences of the letter a in w. The set of the words whose length is not greater (not smaller) than n is denoted by Inline graphic (Inline graphic). Given Inline graphic and Inline graphic, we say that x is a factor of w if words u, v exist such that Inline graphic; a subword of w is any (perhaps empty) subsequence Inline graphic of Inline graphic. We denote by Inline graphic (Inline graphic) the set of the words that are factors (subwords) of some word in X (we have Inline graphic). A pair of words Inline graphic is overlapping-free if no pair u, v exists such that Inline graphic or Inline graphic, with Inline graphic and Inline graphic; if Inline graphic, we say that w itself is overlapping-free.
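The notions of factor, subword and overlapping-free word can be illustrated as follows. This is a sketch under one assumption: for a single word w, overlapping-freeness is read as "no non-empty proper prefix of w is also a suffix of w" (w is unbordered), which is the usual meaning of the condition above when the two words of the pair coincide.

```python
def is_factor(x: str, w: str) -> bool:
    """True if w = u x v for some (possibly empty) words u, v."""
    return x in w


def is_subword(x: str, w: str) -> bool:
    """True if x is a (possibly empty) subsequence of w."""
    remaining = iter(w)
    # Each letter of x must be found in what is left of w, in order.
    return all(letter in remaining for letter in x)


def is_overlapping_free(w: str) -> bool:
    """True if no non-empty proper prefix of w is also a suffix of w."""
    return not any(w[:i] == w[-i:] for i in range(1, len(w)))


if __name__ == "__main__":
    print(is_factor("ba", "abab"), is_subword("aa", "abab"))
    print(is_overlapping_free("aab"), is_overlapping_free("aba"))
```

Every factor is a subword, but not conversely: "aa" is a subword of "abab" without being one of its factors.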

It is assumed that the reader has a fundamental understanding of the main concepts of the theory of variable-length codes: if necessary, we suggest that the reader refer to [2]. A set X is a variable-length code (a code for short) if for any pair of sequences of words in X, say Inline graphic, Inline graphic, the equation Inline graphic implies Inline graphic, and Inline graphic for each integer i (equivalently the submonoid Inline graphic is free). The two following results are famous ones from the theory of variable-length codes:

Theorem 1

Schützenberger [2, Theorem 2.5.16] Let Inline graphic be a regular code. Then the following properties are equivalent:

  • (i) X is complete;

  • (ii) X is a maximal code;

  • (iii) a positive Bernoulli distribution Inline graphic exists such that Inline graphic;

  • (iv) for every positive Bernoulli distribution Inline graphic we have Inline graphic.
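For a finite set X, both the code property and the measure appearing in Theorem 1 can be computed. The sketch below relies on two assumptions that are not spelled out in the text: the code test is the classical Sardinas–Patterson procedure (a standard technique, not one introduced in this paper), and conditions (iii) and (iv) are read as "the measure of X equals 1", as in the classical statement of the theorem. The names `is_code`, `bernoulli_measure` and `uniform` are illustrative.

```python
from fractions import Fraction
from math import prod
from typing import Dict, Iterable, Set


def _residuals(left: Set[str], right: Set[str]) -> Set[str]:
    """Words v such that u v belongs to `right` for some u in `left`."""
    return {q[len(p):] for p in left for q in right if q.startswith(p)}


def is_code(words: Iterable[str]) -> bool:
    """Sardinas-Patterson test for a finite set of words."""
    x = set(words)
    if "" in x:
        return False
    current = _residuals(x, x) - {""}  # dangling suffixes between distinct words
    seen = set()
    while current:
        if "" in current:              # two distinct factorizations meet
            return False
        key = frozenset(current)
        if key in seen:                # the sequence of sets cycles without the empty word
            return True
        seen.add(key)
        current = _residuals(x, current) | _residuals(current, x)
    return True


def bernoulli_measure(words: Iterable[str], pi: Dict[str, Fraction]) -> Fraction:
    """Sum over the words of the product of their letter probabilities."""
    return sum((prod(pi[c] for c in w) for w in words), Fraction(0))


# Assumption for the example: uniform distribution on the alphabet {a, b}.
uniform = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

if __name__ == "__main__":
    for x in ({"a", "ba", "bb"}, {"a", "ba"}, {"a", "ab", "ba"}):
        print(sorted(x), is_code(x), bernoulli_measure(x, uniform))
```

Under the stated reading of Theorem 1, the code {a, ba, bb} has measure 1 and is therefore complete, whereas the code {a, ba} has measure 3/4 and is not.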

Theorem 2

[4] Given a non-complete code X, let Inline graphic be an overlapping-free word and Inline graphic. Then Inline graphic is a complete code.

With regard to word relations, the following statement comes from the definitions:

Lemma 3

Let Inline graphic and Inline graphic. Each of the following properties holds:

  • (i) X is Inline graphic-independent if, and only if, it is Inline graphic-independent (Inline graphic denotes the converse relation of Inline graphic).

  • (ii) X is Inline graphic(Inline graphic)-independent if, and only if, it is Inline graphic(Inline graphic)-independent.

  • (iii) X is Inline graphic-closed if, and only if, it is Inline graphic-closed.

Complete Independent Codes

We start by providing a few examples:

Example 4

For Inline graphic, Inline graphic, the prefix code Inline graphic is not Inline graphic-independent (we have Inline graphic), whereas the following codes are Inline graphic-independent:

  • the regular code: Inline graphic. Note that since it contains Inline graphic, Inline graphic is not a code.

  • the non-complete finite bifix code Inline graphic: actually, Inline graphic is the complete uniform code Inline graphic.

  • for every pair of different integers Inline graphic, the prefix code Inline graphic. We have Inline graphic, which is not a code, although it is complete.

In view of establishing the main result of Sect. 3, we will construct some peculiar word:

Lemma 5

Let Inline graphic, Inline graphic, Inline graphic. Given a non-complete code Inline graphic, some overlapping-free word Inline graphic exists such that Inline graphic does not intersect X and Inline graphic.

Proof

Let X be a non-complete code, and let Inline graphic. Trivially, we have Inline graphic. Moreover, in a classical way a word Inline graphic exists such that Inline graphic is overlapping-free (e.g. [2, Proposition 1.3.6]). Since we assume Inline graphic, each word in Inline graphic is constructed by deleting (inserting, substituting) at most k letters from y, hence by construction it contains at least one occurrence of w as a factor. This implies Inline graphic, thus Inline graphic does not intersect X.

By contradiction, assume that a word Inline graphic exists such that Inline graphic. It follows from Inline graphic and Inline graphic that Inline graphic is obtained by deleting (inserting, substituting) at most k letters from x: consequently at least one occurrence of w appears as a factor of Inline graphic: this contradicts Inline graphic, therefore we obtain Inline graphic (cf. Fig. 1).    Inline graphic

Fig. 1.


Proof of Lemma 5: Inline graphic implies Inline graphic; for Inline graphic and Inline graphic, the action of the substitution Inline graphic is represented in some extremal condition.

As a consequence, we obtain the following result:

Theorem 6

Let Inline graphic and Inline graphic. Given a regular Inline graphic-independent code Inline graphic, X is complete if, and only if, it is maximal as a Inline graphic-independent code.

Proof

According to Theorem 1, every complete Inline graphic-independent code is a maximal code, hence it is maximal in the family of Inline graphic-independent codes. For proving the converse, we make use of the contrapositive. Let X be a non-complete Inline graphic-independent code, and let Inline graphic satisfying the conditions of Lemma 5. With the notation of Theorem 2, necessarily Inline graphic, which is a subset of Inline graphic, is a code. According to Lemma 5, we have Inline graphic. Since X is Inline graphic-independent and Inline graphic is antireflexive, this implies Inline graphic, thus X is not maximal as a Inline graphic-independent code.    Inline graphic

We notice that for Inline graphic no Inline graphic-independent set can exist (indeed, we have Inline graphic). However, the following result holds:

Corollary 7

Let Inline graphic. Given a regular Inline graphic-independent code Inline graphic, X is complete if, and only if, it is maximal as a Inline graphic-independent code.

Proof

As indicated above, if X is complete, it is maximal as a Inline graphic-independent code. For the converse, once more we argue by contrapositive; that is, with the notation of Lemma 5, we prove that Inline graphic remains independent. By definition, for each Inline graphic, we have Inline graphic, with Inline graphic. According to Lemma 5, since Inline graphic is antireflexive, for each Inline graphic we have Inline graphic: this implies Inline graphic, thus Inline graphic is Inline graphic-independent.    Inline graphic

With regard to the relation Inline graphic, Corollary 7 expresses some interesting property in terms of error detection. Indeed, as indicated in Sect. 1, a code is Inline graphic-independent if the Levenshtein distance between its (distinct) elements is always larger than k. From this point of view, Corollary 7 states some characterization of the maximality in the family of such codes.

It remains to develop a method for embedding a given non-complete Inline graphic-code into a complete one. Since the construction from the proof of Theorem 2 does not preserve independence, this question remains open.

Complete Closed Codes with Respect to Deletion or Insertion

We start with the relation Inline graphic. A noticeable fact is that corresponding closed codes are necessarily finite, as attested by the following result:

Proposition 8

Given a Inline graphic-closed code X, and Inline graphic, we have Inline graphic.

Proof

It follows from Inline graphic and X being Inline graphic-closed that Inline graphic. By contradiction, assume Inline graphic and let q, r be the unique pair of integers such that Inline graphic, with Inline graphic. Since we have Inline graphic, an integer Inline graphic exists such that Inline graphic, thus words Inline graphic exist such that Inline graphic, with Inline graphic and Inline graphic. By construction, every word Inline graphic with Inline graphic belongs to Inline graphic (indeed, we have Inline graphic and Inline graphic). This implies Inline graphic, thus Inline graphic: a contradiction with X being a code.    Inline graphic

Example 9

  1. According to Proposition 8, no code can be Inline graphic-closed. This can also be deduced from the fact that, for every set Inline graphic we have Inline graphic.

  2. Let Inline graphic and Inline graphic. According to Proposition 8, every word in any Inline graphic-closed code has length not greater than 5. It is straightforward to verify that Inline graphic is a Inline graphic-closed code. In addition, a finite number of checks suffices to verify that X is maximal as a Inline graphic-closed code. Taking for Inline graphic the uniform distribution we have Inline graphic: thus X is non-complete.

According to Example 9(2), no result similar to Theorem 6 can be stated in the framework of Inline graphic-closed codes. We also notice that, in Proposition 8, the bound does not depend on the size of the alphabet, but only on k.

Corollary 10

Given a finite alphabet A and a positive integer k, one can decide whether a non-complete Inline graphic-closed code Inline graphic is included in some complete one. In addition there are a finite number of such complete codes, all of them being computable, if any.

Proof

According to Proposition 8 only a finite number of Inline graphic-closed codes over A can exist, each of them being a subset of Inline graphic.    Inline graphic

We close the section by considering the relations Inline graphic, Inline graphic and Inline graphic:

Proposition 11

No code can be Inline graphic-closed, Inline graphic-closed, nor Inline graphic-closed.

Proof

By contradiction assume that some Inline graphic-closed code Inline graphic exists. Let Inline graphic, Inline graphic and Inline graphic such that Inline graphic. It follows from Inline graphic, that Inline graphic. According to Lemma 3(iii), we have Inline graphic, thus Inline graphic. Since Inline graphic, we have Inline graphic: a contradiction with X being a code. Consequently no Inline graphic-closed code can exist. According to Example 9(1), given a code Inline graphic, we have Inline graphic: this implies Inline graphic, thus X is not Inline graphic-closed.    Inline graphic

Complete Codes Closed Under Substitutions

Beforehand, given a word Inline graphic, we need a thorough description of the set Inline graphic. Actually, it is well known that, over a binary alphabet, all n-bit words can be computed by making use of some Gray sequence [5]. With our notation, we have Inline graphic. Furthermore, for every finite alphabet A, the so-called |A|-arity Gray sequences make it possible to generate Inline graphic [8, 19]: once more we have Inline graphic. In addition, in the special case where Inline graphic and Inline graphic, it can be proved that we have Inline graphic [8, Exercise 8, p. 28]. However, except in these special cases, to the best of our knowledge no general description of the structure of Inline graphic appears in the literature. In any event, in the next paragraph we provide an exhaustive description of Inline graphic. Strictly speaking, the proofs, which we have placed in Sect. 5.2, do not involve Inline graphic-closed codes: we suggest that, on a first reading, the reader jump directly from Sect. 5.1 to Sect. 5.3.

Basic Results Concerning Inline graphic

Proposition 12

Assume Inline graphic. For each Inline graphic, we have Inline graphic.

In the case where A is a binary alphabet, we set Inline graphic: this allows a well-known algebraic interpretation of Inline graphic. Indeed, denote by Inline graphic the addition in the group Inline graphic with identity 0, and fix a positive integer n; given Inline graphic, define Inline graphic as the unique word of Inline graphic such that, for each Inline graphic, the letter of position i in Inline graphic is Inline graphic. With this notation the sets Inline graphic and Inline graphic are in one-to-one correspondence. Classically, we have Inline graphic if, and only if, some Inline graphic exists such that Inline graphic with Inline graphic (thus Inline graphic). From the fact that Inline graphic, the following property holds:

graphic file with name M331.gif 1

In addition Inline graphic is equivalent to Inline graphic. Let Inline graphic. The following property follows from Inline graphic and Inline graphic:

graphic file with name M337.gif 2

Finally, for Inline graphic we denote by Inline graphic its complementary letter, that is, Inline graphic; for Inline graphic we set Inline graphic.
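The algebraic interpretation over a binary alphabet can be illustrated with a short sketch. It assumes, as the text suggests, that the alphabet is {0, 1}, that the operation on words is letterwise addition modulo 2, and that a k-character substitution amounts to adding a word containing exactly k occurrences of 1. The function names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, Set


def xor_words(w: str, u: str) -> str:
    """Letterwise addition modulo 2 of two binary words of equal length."""
    if len(w) != len(u):
        raise ValueError("words must have the same length")
    return "".join(str(int(a) ^ int(b)) for a, b in zip(w, u))


def masks_of_weight(n: int, k: int) -> Iterator[str]:
    """All binary words of length n with exactly k occurrences of 1."""
    for ones in combinations(range(n), k):
        yield "".join("1" if i in ones else "0" for i in range(n))


def substitute_k_binary(w: str, k: int) -> Set[str]:
    """Binary words differing from w in exactly k positions."""
    return {xor_words(w, u) for u in masks_of_weight(len(w), k)}


def complement(w: str) -> str:
    """Word obtained by replacing every letter by its complementary letter."""
    return xor_words(w, "1" * len(w))


if __name__ == "__main__":
    print(sorted(substitute_k_binary("000", 2)))
    print(complement("0101"))
```

Adding the same word twice gives back the original word, which is why the k-character substitution is a symmetric relation in this setting.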

Lemma 13

Let Inline graphic, Inline graphic. Given Inline graphic the two following properties hold:

  • (i) If k is even and Inline graphic then Inline graphic is an even integer;

  • (ii) If Inline graphic is even then we have Inline graphic, for every Inline graphic.

Given a positive integer n, we denote by Inline graphic (Inline graphic) the set of the words Inline graphic such that Inline graphic is even (odd).

Proposition 14

Assume Inline graphic. Given Inline graphic exactly one of the following conditions holds:

  • (i) Inline graphic, k is even, and Inline graphic;

  • (ii) Inline graphic, k is odd, and Inline graphic;

  • (iii) Inline graphic and Inline graphic.
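For small parameters, the set of words reachable from a given word by repeated k-character substitutions can be computed exhaustively and compared with the cases listed above. The following breadth-first sketch is an illustration only: it assumes that the closure contains the starting word itself, and the names `substitute_k` and `substitution_closure` are hypothetical.

```python
from collections import deque
from itertools import combinations, product
from typing import Set


def substitute_k(w: str, k: int, alphabet: str) -> Set[str]:
    """Words of length |w| that differ from w in exactly k positions."""
    result: Set[str] = set()
    for positions in combinations(range(len(w)), k):
        choices = [[a for a in alphabet if a != w[i]] for i in positions]
        for letters in product(*choices):
            y = list(w)
            for i, a in zip(positions, letters):
                y[i] = a
            result.add("".join(y))
    return result


def substitution_closure(w: str, k: int, alphabet: str) -> Set[str]:
    """Words reachable from w by any number of k-character substitutions."""
    seen = {w}
    queue = deque([w])
    while queue:
        x = queue.popleft()
        for y in substitute_k(x, k, alphabet):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


if __name__ == "__main__":
    print(len(substitution_closure("0000", 2, "01")))  # binary alphabet, k even
    print(len(substitution_closure("0000", 3, "01")))  # binary alphabet, k odd
    print(len(substitution_closure("aa", 2, "abc")))   # three-letter alphabet
```

On these spot checks, the binary word 0000 with k = 2 reaches only the 8 words with an even number of occurrences of 1, whereas with k = 3 it reaches all 16 words of length 4; over a three-letter alphabet, the word aa with k = 2 reaches all 9 words of length 2.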

Proofs of the Statements 12, 13 and 14

Actually, Proposition 12 is a consequence of the following property:

Lemma 15

Assume Inline graphic. For every word Inline graphic we have Inline graphic.

Proof

Let Inline graphic and Inline graphic. We prove that Inline graphic exists with Inline graphic and Inline graphic. By construction, Inline graphic exists such that:

  • (a) Inline graphic if, and only if, Inline graphic.

It follows from Inline graphic that some Inline graphic-element subset Inline graphic exists. Since we have Inline graphic, some letter Inline graphic exists. Let Inline graphic such that:

  • (b) Inline graphic and, for each Inline graphic: Inline graphic if, and only if, Inline graphic.

By construction we have Inline graphic, moreover Inline graphic implies Inline graphic. According to (a) and (b), we obtain:

  • (c) Inline graphic,

  • (d) Inline graphic if Inline graphic, and:

  • (e) Inline graphic if Inline graphic.

Since we have Inline graphic, this implies Inline graphic.    Inline graphic

Proof of Proposition 12. Let Inline graphic: we prove that Inline graphic. Let Inline graphic and let Inline graphic be a sequence of words such that Inline graphic, Inline graphic and, for each Inline graphic: Inline graphic if, and only if, Inline graphic. Since we have Inline graphic (Inline graphic), by induction over j we obtain Inline graphic thus, according to Lemma 15: Inline graphic.

   Inline graphic

In view of proving Lemma 13 and Proposition 14, we need some new lemma:

Lemma 16

Assume Inline graphic. For every Inline graphic, we have Inline graphic.

Proof

Set Inline graphic. It follows from Inline graphic that the result holds for Inline graphic. Assume Inline graphic and let Inline graphic, Inline graphic. By construction, there are distinct integers Inline graphic such that the following holds:

  • (a) Inline graphic if, and only if, Inline graphic.

Since some Inline graphic-element set Inline graphic exists, Inline graphic exist with:

  • (b) Inline graphic if, and only if, Inline graphic, and:

  • (c) Inline graphic if, and only if, Inline graphic.

By construction, we have Inline graphic and Inline graphic, thus Inline graphic. Moreover, the fact that we have Inline graphic is attested by the following equations:

  • (d) Inline graphic,

  • (e) Inline graphic, and:

  • (f) for Inline graphic: Inline graphic if, and only if, Inline graphic.

   Inline graphic

Proof of Lemma 13. Assume k even. According to Property (1) we have Inline graphic with Inline graphic. According to (2), Inline graphic is even: hence (i) follows. Conversely, assume Inline graphic even and let Inline graphic. According to (2), Inline graphic is also even, moreover according to (1) we obtain Inline graphic: this implies Inline graphic. According to Lemma 16, we have Inline graphic: this establishes (ii).

   Inline graphic

Proof of Proposition 14. Let Inline graphic and Inline graphic. (iii) is trivial and (i) follows from Lemma 13(i): indeed, since k is even, Inline graphic is the set of the words Inline graphic such that Inline graphic is even. Assume k odd, and let Inline graphic; we will prove that Inline graphic. If Inline graphic is even, the result comes from Lemma 13(ii). Assume Inline graphic odd and let Inline graphic, thus Inline graphic that is, Inline graphic for some Inline graphic. It follows from Inline graphic that Inline graphic is odd, whence Inline graphic is even: according to Lemma 13(ii), this implies Inline graphic. But since Inline graphic is even, we have Inline graphic: according to Lemma 16, this implies Inline graphic (we have Inline graphic). We obtain Inline graphic: this completes the proof.    Inline graphic

The Consequences for Inline graphic-Closed Codes

Given a Inline graphic-closed code Inline graphic, we say that the tuple (k, A, X) satisfies Condition (3) if each of the three following properties holds:

graphic file with name M474.gif

We start by proving the following technical result:

Lemma 17

Assume Inline graphic and k even. Given a pair of words Inline graphic, if Inline graphic then the set Inline graphic cannot be a code.

Proof

Let Inline graphic, and Inline graphic (hence Inline graphic). By contradiction, we assume that Inline graphic is a code. We are in Condition (i) of Proposition 14, that is, we have Inline graphic. On the one hand, since Inline graphic is a right-complete prefix code [2, Theorem 3.3.8], it follows from Inline graphic that a (perhaps empty) word s exists such that Inline graphic. On the other hand, it follows from Inline graphic that, for each Inline graphic, a unique pair of letters Inline graphic, exists such that Inline graphic, Inline graphic with Inline graphic that is, Inline graphic exists with Inline graphic. According to Lemma 13(i), Inline graphic is even; according to Lemma 13(ii), this implies Inline graphic. Since we have Inline graphic, the set Inline graphic cannot be a code.    Inline graphic

As a consequence of Lemma 17, we obtain the following result:

Lemma 18

Given a Inline graphic-closed code Inline graphic, if (k, A, X) satisfies Condition (3) then either we have Inline graphic, or we have Inline graphic for some Inline graphic.

Proof

Firstly, consider two words Inline graphic and by contradiction, assume Inline graphic that is, without loss of generality Inline graphic. Since X is Inline graphic-closed, we have Inline graphic, whence the set Inline graphic, which is a subset of X, is a code: this contradicts the result of Lemma 17. Consequently, we have Inline graphic, with Inline graphic. Secondly, once more by contradiction assume that words Inline graphic, Inline graphic exist. As indicated above, since X is Inline graphic-closed, Inline graphic is a code: since we have Inline graphic and Inline graphic, once more this contradicts the result of Lemma 17. As a consequence, necessarily we have Inline graphic, for some Inline graphic. With such a condition, according to Proposition 14 for each pair of words Inline graphic, we have Inline graphic, Inline graphic: this implies Inline graphic.    Inline graphic

According to Lemma 18, with Condition (3) no Inline graphic-closed code can simultaneously possess words in Inline graphic and words in Inline graphic.

Lemma 19

Given a Inline graphic-closed code Inline graphic, if (k, A, X) does not satisfy Condition (3) then either we have Inline graphic, or we have Inline graphic, with Inline graphic.

Proof

If Condition (3) does not hold then exactly one of the three following conditions holds:

  1. Inline graphic;

  2. Inline graphic and Inline graphic;

  3. Inline graphic with Inline graphic and k odd.

With each of the last two conditions, let Inline graphic. Since X is Inline graphic-closed, according to Propositions 12 and 14(ii), we have Inline graphic. Since Inline graphic is a maximal code, it follows from Lemma 3(iii) that Inline graphic.

   Inline graphic

As a consequence, every Inline graphic-closed code is finite. In addition, we state:

Theorem 20

Given a complete Inline graphic (Inline graphic, Inline graphic)-closed code X, exactly one of the following conditions holds:

  • (i)

    X is a subset of Inline graphic;

  • (ii)

    a unique integer Inline graphic exists such that Inline graphic.

In addition, every Inline graphic(Inline graphic)-closed code is equal to Inline graphic, for some Inline graphic.

Proof

Let X be a complete Inline graphic-closed code. If Condition (3) does not hold, the result is expressed by Lemma 19. Assume that Condition (3) holds. According to Lemma 18, in any case some integer Inline graphic exists such that Inline graphic. Taking for Inline graphic the uniform distribution, we have Inline graphic and Inline graphic thus, according to Theorem 1: Inline graphic. Recall that we have Inline graphic (e.g. [8]). Assume X Inline graphic-closed, and let Inline graphic, Inline graphic: we have Inline graphic thus Inline graphic (indeed, Inline graphic is a maximal code). Since Inline graphic, if X is Inline graphic-closed then it is Inline graphic-closed, thus we have Inline graphic.    Inline graphic

As a corollary, in the family of Inline graphic(Inline graphic)-closed codes, maximality and completeness are equivalent notions. With regard to Inline graphic-closed codes, things are otherwise: indeed, as shown in [16], there are finite codes that have no finite completion. Let X be one of them, and Inline graphic. By definition X is Inline graphic-closed. Since every Inline graphic-closed code is finite, no complete Inline graphic-closed code can contain X.

Proposition 21

Let X be a (finite) non-complete Inline graphic-closed code. Then one can decide whether some complete Inline graphic-closed code containing X exists. More precisely, there is only a finite number of such codes, each of them being computable, if any.

Proof Sketch. We outline an algorithm that computes every complete Inline graphic-closed code Inline graphic containing X. In a first step, we compute Inline graphic. If Inline graphic, according to Theorem 20, we have Inline graphic: Inline graphic, if any, can be computed in a finite number of steps. Otherwise, Inline graphic exists if, and only if, for some Inline graphic we have Inline graphic: this can be straightforwardly checked.    Inline graphic

Acknowledgment

We would like to thank the anonymous reviewers for their fruitful suggestions and comments.

Contributor Information

Alberto Leporati, Email: alberto.leporati@unimib.it.

Carlos Martín-Vide, Email: carlos.martin@urv.cat.

Dana Shapira, Email: shapird@g.ariel.ac.il.

Claudio Zandron, Email: zandron@disco.unimib.it.

Jean Néraud, Email: jean.neraud@univ-rouen.fr, Email: neraud.jean@gmail.com, http://neraud.jean.free.fr.

References

  • 1. Berstel J, Felice CD, Perrin D, Reutenauer C, Rindonne G. Bifix codes and Sturmian words. J. Algebra. 2012;369:146–202. doi: 10.1016/j.jalgebra.2012.07.013.
  • 2. Berstel J, Perrin D, Reutenauer C. Codes and Automata. New York: Cambridge University Press; 2010.
  • 3. Bruyère V, Wang L, Zhang L. On completion of codes with finite deciphering delay. Eur. J. Comb. 1990;11:513–521. doi: 10.1016/S0195-6698(13)80036-4.
  • 4. Ehrenfeucht A, Rozenberg S. Each regular code is included in a regular maximal one. RAIRO Theoret. Inf. Appl. 1986;20:89–96. doi: 10.1051/ita/1986200100891.
  • 5. Ehrlich G. Loopless algorithms for generating permutations, combinations, and other combinatorial configurations. J. ACM. 1973;20:500–513. doi: 10.1145/321765.321781.
  • 6. Jürgensen H, Konstantinidis S. Codes. In: Rozenberg G, Salomaa A, editors. Handbook of Formal Languages. Heidelberg: Springer; 1997. pp. 511–607.
  • 7. Kari, L., Păun, G., Thierrin, G., Yu, S.: At the crossroads of linguistic, DNA computing and formal languages: characterizing RE using insertion-deletion systems. In: Proceedings of Third DIMACS Workshop on DNA Based Computing, pp. 318–333 (1997)
  • 8. Knuth D. The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations. Boston: Addison Wesley; 2005.
  • 9. Konstantinidis, S.: Error correction and decodability. Ph.D. thesis, The University of Western Ontario, London, Canada (1996)
  • 10. Lam N. Finite maximal infix codes. Semigroup Forum. 2000;61:346–356. doi: 10.1007/PL00006033.
  • 11. Lam N. Finite maximal solid codes. Theoret. Comput. Sci. 2001;262:333–347. doi: 10.1016/S0304-3975(00)00277-2.
  • 12. Levenshtein V. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 1965;163:845–848.
  • 13. Néraud J. Completing circular codes in regular submonoids. Theoret. Comp. Sci. 2008;391:90–98. doi: 10.1016/j.tcs.2007.10.033.
  • 14. Néraud J, Selmi C. Embedding a θ-invariant code into a complete one. Theoret. Comput. Sci. 2020;806:28–41. doi: 10.1016/j.tcs.2018.08.022.
  • 15. Nivat M, et al. Congruences parfaites et semi-parfaites. Séminaire Dubreil. Algèbre et théorie des nombres. 1971;25:1–9.
  • 16. Restivo A. On codes having no finite completion. Discrete Math. 1977;17:309–316. doi: 10.1016/0012-365X(77)90164-9.
  • 17. Rozenberg G, Salomaa A. The Mathematical Theory of L-Systems. New York: Academic Press; 1980.
  • 18. Rudi K, Wonham WM. The infimal prefix-closed and observable superlanguage of a given language. Syst. Control Lett. 1990;15:361–371. doi: 10.1016/0167-6911(90)90059-4.
  • 19. Savage C. A survey of combinatorial gray codes. SIAM Rev. 1997;39(4):605–629. doi: 10.1137/S0036144595295272.

Articles from Language and Automata Theory and Applications are provided here courtesy of Nature Publishing Group
