Abstract
Given an alphabet A and a binary relation $\tau \subseteq A^* \times A^*$, a language $X \subseteq A^*$ is $\tau$-independent if $\tau(X) \cap X = \emptyset$; X is $\tau$-closed if $\tau(X) \subseteq X$. The language X is complete if any word over A is a factor of some concatenation of words in X. Given a family of languages $\mathcal{F}$ containing X, X is maximal in $\mathcal{F}$ if no other set of $\mathcal{F}$ can strictly contain X. A language $X \subseteq A^*$ is a variable-length code if any equation among the words of X is necessarily trivial. The study discusses the relationship between maximality and completeness in the case of $\tau$-independent or $\tau$-closed variable-length codes. We focus on the binary relations by which the images of words are computed by deleting, inserting, or substituting some characters.
Keywords: Closed, Code, Complete, Deletion, Detection, Dependent, Distribution, Edition, Embedding, Independent, Insertion, Levenshtein, Maximal, String, Substitution, Substring, Subword, Variable-length, Word
Introduction
In formal language theory, given a property $\mathcal{F}$, the embedding problem with respect to $\mathcal{F}$ consists in examining whether a language X satisfying $\mathcal{F}$ can be included in some language $\hat{X}$ that is maximal with respect to $\mathcal{F}$, in the sense that no language satisfying $\mathcal{F}$ can strictly contain $\hat{X}$. In the literature, maximality is often connected to completeness: a language X over the alphabet A is complete if any string in the free monoid $A^*$ (the set of the words over A) is a factor of some word of $X^*$ (the submonoid of all concatenations of words in X). This connection takes on special importance for codes: a language X over the alphabet A is a variable-length code (for short, a code) if every equation among the words (i.e. strings) of X is necessarily trivial.
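For a finite set of words, the code property can be tested mechanically. The sketch below uses the classical Sardinas-Patterson procedure, which is not discussed in this paper and is introduced here only as an illustration; the function name `is_code` is hypothetical.

```python
def is_code(words):
    """Sardinas-Patterson test: True iff the finite set `words` is a code."""
    X = set(words)
    if "" in X:
        return False  # the empty word always yields a non-trivial equation

    def residuals(S, T):
        # All w such that s + w == t for some s in S and t in T.
        return {t[len(s):] for s in S for t in T if t.startswith(s)}

    # First set of dangling suffixes: X^{-1}X without the empty word.
    current = residuals(X, X) - {""}
    seen = set()
    while current:
        if "" in current:
            return False  # two distinct factorizations of one word exist
        key = frozenset(current)
        if key in seen:
            return True  # the sequence of sets cycles without the empty word
        seen.add(key)
        current = residuals(X, current) | residuals(current, X)
    return True  # no dangling suffix left


print(is_code({"a", "ba", "bb"}))   # a prefix code
print(is_code({"a", "ab", "ba"}))   # a.ba == ab.a, so not a code
```

Termination follows from the fact that every dangling suffix is a suffix of some word of X, so only finitely many sets can occur.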
A famous result due to M.P. Schützenberger states that, for the family of the so-called thin codes (which contains regular codes and therefore also finite ones), being maximal is equivalent to being complete. In connection with these two concepts, many challenging theoretical questions have been raised. For instance, to this day the problem of the existence of a finite maximal code containing a given finite one is not known to be decidable. From this latter point of view, the author of [16] asked whether a regular complete code containing a given finite one always exists: a positive answer was given in [4], which provides a now-classical formula for embedding a given regular code into some complete regular one. Well-known families of codes have also been the subject of such studies: we mention prefix and bifix codes [2, Theorem 3.3.8, Proposition 6.2.1], codes with a finite deciphering delay [3], infix [10], solid [11], or circular [13] codes.
Actually, with each of those families, a so-called dependence system can be associated. Formally, such a system is a family $\mathcal{F}$ of languages constituted by those sets X that contain a non-empty finite subset in $\mathcal{F}$. Languages in $\mathcal{F}$ are $\mathcal{F}$-dependent, the other ones being $\mathcal{F}$-independent. A special case corresponds to binary word relations $\tau \subseteq A^* \times A^*$, where a dependence system is constituted by those sets X satisfying $\tau(X) \cap X \neq \emptyset$: X is $\tau$-independent if we have $\tau(X) \cap X = \emptyset$ (with $\tau(X) = \{y : \exists x \in X, (x, y) \in \tau\}$). Prefix codes are certainly the best-known example: they are exactly the codes that are independent with respect to the relation obtained by removing each pair (x, x) from the prefix order. Bifix, infix or solid codes can be similarly characterized.
As regards dependence, an extremal condition corresponds to the so-called closed sets: given a word relation $\tau \subseteq A^* \times A^*$, a language X is closed under $\tau$ ($\tau$-closed, for short) if we have $\tau(X) \subseteq X$. Many topics involve this notion. We mention the framework of the prefix order, where a one-to-one correspondence between independent and closed sets is provided in [2, Proposition 3.1.3] (cf. also [1, 18]). Congruences in the free monoid are also concerned [15], as well as their connections to DNA computing [7]. With respect to morphisms, related topics include the well-known L-systems [17] and, in the case of one-to-one (anti-)automorphisms, the so-called invariant sets [14].
As commented in [6], maximality and completeness concern the economy of a code. If X is a complete code then every word occurs as part of a message, hence no part of $A^*$ is potentially useless. The present paper emphasizes the following questions: given a regular binary relation $\tau \subseteq A^* \times A^*$, in the family of regular $\tau$-independent (-closed) codes, are maximality and completeness equivalent notions? Given a non-complete regular $\tau$-independent (-closed) code, is it embeddable into some complete one?
Independence has some peculiar importance in the framework of coding theory. Informally, given some concatenation of words in X, each codeword
is transmitted via a channel into a corresponding
. According to the combinatorial structure of X, and the type of channel, one has to make use of codes with prescribed error-detecting constraints: some minimum-distance constraint is generally applied. In this paper, where we consider variable-length codewords, we use the Levenshtein metric [12]: given two different words x, y, their distance is the minimal total number of elementary edit operations that can transform x into y, each such operation being a one-character deletion, insertion, or substitution. Formally, it is the smallest integer p such that we have
, with
, where
,
,
are further defined below. From the point of view of error detection, X being
-independent guarantees that
implies
. In addition, a code satisfies the property of error correction if its elements are such that
unless
: according to [9, chap. 6], the existence of such codes is decidable. Denote by Subw(x) the set of the subsequences of x:
, the k-character deletion, associates with every word
, all the words
whose length is
. The at most p-character deletion is
;
, the k-character insertion, is the converse relation of
and we set
(at most p-character insertion);
, the k-character substitution, associates with every
, all
with length |x| such that
(the letter at position i in y) differs from
in exactly k positions
; we set
. We denote by
the antireflexive relation obtained by removing all pairs (x, x) from
(we have
).
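The three elementary edit relations described above can be explored on small examples. The following Python sketch enumerates the images of a word under exact k-character deletion, insertion and substitution; the function names `delete_k`, `insert_k`, `substitute_k` and the explicit `alphabet` argument are illustrative choices and not notation from the paper.

```python
from itertools import combinations, product


def delete_k(w, k):
    """Words obtained from w by deleting exactly k characters."""
    n = len(w)
    if k > n:
        return set()
    return {"".join(w[i] for i in keep)
            for keep in combinations(range(n), n - k)}


def insert_k(w, k, alphabet):
    """Words obtained from w by inserting exactly k characters."""
    words = {w}
    for _ in range(k):
        words = {u[:i] + a + u[i:]
                 for u in words
                 for i in range(len(u) + 1)
                 for a in alphabet}
    return words


def substitute_k(w, k, alphabet):
    """Words of length |w| that differ from w in exactly k positions."""
    out = set()
    for positions in combinations(range(len(w)), k):
        # At each chosen position, any letter other than the current one.
        choices = [[a for a in alphabet if a != w[i]] for i in positions]
        for letters in product(*choices):
            u = list(w)
            for i, a in zip(positions, letters):
                u[i] = a
            out.add("".join(u))
    return out


print(sorted(delete_k("abc", 1)))
print(sorted(insert_k("a", 1, "ab")))
print(sorted(substitute_k("ab", 1, "ab")))
```

Insertion is computed one character at a time, which yields exactly the words of length |w| + k having w as a subsequence, in agreement with insertion being the converse of deletion.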
For short, we refer to the preceding relations as edit relations. For reasons of consistency, throughout the paper we assume
and
. In what follows, we outline the main contributions of the study:
Firstly, we prove that, given a positive integer k, the two families of languages that are independent with respect to
or
are identical. In addition, for
, no set can be
-independent. We establish the following result:
Theorem A
Let A be a finite alphabet,
, and 
. Given a regular
-independent code
, X is complete if, and only if, it is maximal in the family of
-independent codes.
A code X is
-independent if the Levenshtein distance between two distinct words of X is always larger than k: from this point of view, Theorem A provides a notable characterization of maximal k-error-detecting codes in the framework of the Levenshtein metric.
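The distance condition stated above can be checked directly on a finite set of words. The sketch below computes the Levenshtein distance by the standard dynamic-programming recurrence and tests whether all pairwise distances exceed k; the names `levenshtein` and `detects_k_errors` are hypothetical, and the check only applies to finite sets.

```python
from itertools import combinations


def levenshtein(x, y):
    """Minimal number of one-character deletions, insertions or
    substitutions needed to transform x into y."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def detects_k_errors(X, k):
    """True iff any two distinct words of X are at distance larger than k."""
    return all(levenshtein(x, y) > k for x, y in combinations(sorted(X), 2))


print(levenshtein("kitten", "sitting"))
print(detects_k_errors({"000", "111"}, 2))
```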
Secondly, we explore the domain of closed codes. A noticeable fact is that for any k, there are only finitely many
-closed codes and they have finite cardinality. Furthermore, one can decide whether a given non-complete
-closed code can be embedded into some complete one. We also prove that no closed code can exist with respect to the relations
,
,
.
As regards substitutions, we first focus on the structure of the set
. Actually, except for two special cases (that is,
[5, 19], or
with
[8, ex. 8, p.77]), to the best of our knowledge no general description is provided in the literature. We provide such a description; furthermore, we establish the following result:
Theorem B
Let A be a finite alphabet and
. Given a complete
-closed code
, either every word in X has length not greater than k, or a unique integer
exists such that
. In addition for every
(
)-closed code X, some positive integer n exists such that
.
In other words, no
-closed code can simultaneously possess words in
and words in
. As a consequence, one can decide whether a given non-complete
-closed code
is embeddable into some complete one.
Preliminaries
We adopt the notation of the free monoid theory. Given a word w, we denote by |w| its length; for
,
denotes the number of occurrences of the letter a in w. The set of the words whose length is not greater (not smaller) than n is denoted by
(
). Given
and
, we say that x is a factor of w if words u, v exist such that
; a subword of w is any (possibly empty) subsequence
of
. We denote by
(
) the set of the words that are factors (subwords) of some word in X (we have
). A pair of words
is overlapping-free if no pair u, v exists such that
or
, with
and
; if
, we say that w itself is overlapping-free.
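The three notions just introduced are straightforward to test on concrete words. The sketch below is illustrative only: the function names are hypothetical, and an overlapping-free word is read here as a word none of whose non-empty proper prefixes is also a suffix (an unbordered word), which is our interpretation of the definition above.

```python
def is_factor(x, w):
    """True iff w = u x v for some words u, v."""
    return x in w


def is_subword(x, w):
    """True iff x is a (possibly empty) subsequence of w."""
    it = iter(w)
    # Each membership test consumes the iterator up to the matched letter,
    # so the letters of x are found in order.
    return all(c in it for c in x)


def is_overlapping_free(w):
    """True iff no non-empty proper prefix of w is also a suffix of w."""
    return all(w[:i] != w[-i:] for i in range(1, len(w)))


print(is_factor("bc", "abcd"), is_subword("bd", "abcd"))
print(is_overlapping_free("aab"), is_overlapping_free("aba"))
```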
We assume that the reader has a basic understanding of the main concepts of the theory of variable-length codes; if necessary, we suggest referring to [2]. A set X is a variable-length code (a code for short) if for any pair of sequences of words in X, say
,
, the equation
implies
, and
for each integer i (equivalently the submonoid
is free). The following two results are classical in the theory of variable-length codes:
Theorem 1
Schützenberger [2, Theorem 2.5.16] Let $X \subseteq A^*$ be a regular code. Then the following properties are equivalent:
- (i) X is complete;
- (ii) X is a maximal code;
- (iii) a positive Bernoulli distribution $\mu$ exists such that $\mu(X) = 1$;
- (iv) for every positive Bernoulli distribution $\mu$ we have $\mu(X) = 1$.
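Conditions (iii) and (iv) of Theorem 1 involve the measure of X under a Bernoulli distribution; in the classical statement of [2], the condition is that this measure equals 1. For a finite code and the uniform distribution, the measure is the sum of $|A|^{-|x|}$ over the words x of X, which the following sketch computes exactly. The function name `uniform_measure` is hypothetical, and the restriction to the uniform distribution is a simplifying assumption.

```python
from fractions import Fraction


def uniform_measure(X, alphabet_size):
    """Sum of |A|^(-|x|) over x in X (uniform Bernoulli distribution)."""
    return sum(Fraction(1, alphabet_size ** len(x)) for x in X)


# A complete prefix code over {a, b}: the measure is exactly 1.
print(uniform_measure({"a", "ba", "bb"}, 2))
# Removing a word makes the measure drop below 1.
print(uniform_measure({"a", "ba"}, 2))
```

By Theorem 1, a finite code whose measure is strictly less than 1 is neither complete nor maximal.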
Theorem 2
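[4] Given a non-complete code X, let $y \in A^*$ be an overlapping-free word that is a factor of no word in $X^*$, and let $U = A^* \setminus (X^* \cup A^* y A^*)$. Then $Y = X \cup y(Uy)^*$ is a complete code.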
With regard to word relations, the following statement comes from the definitions:
Lemma 3
Let
and
. Each of the following properties holds:
- (i) X is
-independent if, and only if, it is
-independent (
denotes the converse relation of
).
- (ii) X is
(
)-independent if, and only if, it is
(
)-independent.
- (iii) X is
-closed if, and only if, it is
-closed.
Complete Independent Codes
We start by providing a few examples:
Example 4
For
,
, the prefix code
is not
-independent (we have
), whereas the following codes are
-independent:
- the regular code:
. Note that since it contains
,
is not a code.
- the non-complete finite bifix code
: actually,
is the complete uniform code
.
- for every pair of different integers
, the prefix code
. We have
, which is not a code, although it is complete.
To establish the main result of Sect. 3, we construct a particular word:
Lemma 5
Let
,
,
. Given a non-complete code
some overlapping-free word
exists such that
does not intersect X and
.
Proof
Let X be a non-complete code, and let
. Trivially, we have
. Moreover, in a classical way a word
exists such that
is overlapping-free (e.g. [2, Proposition 1.3.6]). Since we assume
, each word in
is constructed by deleting (inserting, substituting) at most k letters from y, hence by construction it contains at least one occurrence of w as a factor. This implies
, thus
does not intersect X.
By contradiction, assume that a word
exists such that
. It follows from
and
that
is obtained by deleting (inserting, substituting) at most k letters from x: consequently at least one occurrence of w appears as a factor of
: this contradicts
, therefore we obtain
(cf. Fig. 1). 
Fig. 1.
Proof of Lemma 5:
implies
; for
and
, the action of the substitution
is represented in some extremal condition.
As a consequence, we obtain the following result:
Theorem 6
Let
and
. Given a regular
-independent code
, X is complete if, and only if, it is maximal as a
-independent code.
Proof
According to Theorem 1, every complete
-independent code is a maximal code, hence it is maximal in the family of
-independent codes. To prove the converse, we argue by contraposition. Let X be a non-complete
-independent code, and let
satisfying the conditions of Lemma 5. With the notation of Theorem 2, necessarily
, which is a subset of
, is a code. According to Lemma 5, we have
. Since X is
-independent and
antireflexive, this implies
, thus X is not maximal as a
-independent code. 
We notice that for
no
-independent set can exist (indeed, we have
). However, the following result holds:
Corollary 7
Let
. Given a regular
-independent code
, X is complete if, and only if, it is maximal as a
-independent code.
Proof
As indicated above, if X is complete, it is maximal as a
-independent code. For the converse, once more we argue by contraposition: with the notation of Lemma 5, we prove that
remains independent. By definition, for each
, we have
, with
. According to Lemma 5, since
is antireflexive, for each
we have
: this implies
, thus
is
-independent. 
With regard to the relation
, Corollary 7 expresses an interesting property in terms of error detection. Indeed, as indicated in Sect. 1, a code is
-independent if the Levenshtein distance between its (distinct) elements is always larger than k. From this point of view, Corollary 7 provides a characterization of maximality in the family of such codes.
It remains to develop a method for embedding a given non-complete
-code into a complete one. Since the construction from the proof of Theorem 2 does not preserve independence, this question remains open.
Complete Closed Codes with Respect to Deletion or Insertion
We start with the relation
. A notable fact is that the corresponding closed codes are necessarily finite, as shown by the following result:
Proposition 8
Given a
-closed code X, and
, we have
.
Proof
It follows from
and X being
-closed that
. By contradiction, assume
and let q, r be the unique pair of integers such that
, with
. Since we have
, an integer
exists such that
, thus words
exist such that
, with
and
. By construction, every word
with
belongs to
(indeed, we have
and
). This implies
, thus
: a contradiction with X being a code. 
Example 9
- (1) According to Proposition 8, no code can be
-closed. This can also be deduced from the fact that, for every set
we have
.
- (2) Let
and
. According to Proposition 8, every word in any
-closed code has length not greater than 5. It is straightforward to verify that
is a
-closed code. In addition, a finite case analysis shows that X is maximal as a
-closed code. Taking for
the uniform distribution we have
: thus X is non-complete.
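Closedness under k-character deletion is easy to verify on a finite set, which is the kind of finite examination mentioned in the example. The sketch below assumes deletion of exactly k characters; the names `delete_k` and `is_closed_under_deletion` are hypothetical, and the sample sets are illustrative rather than taken from the paper.

```python
from itertools import combinations


def delete_k(w, k):
    """Words obtained from w by deleting exactly k characters."""
    n = len(w)
    if k > n:
        return set()
    return {"".join(w[i] for i in keep)
            for keep in combinations(range(n), n - k)}


def is_closed_under_deletion(X, k):
    """True iff every k-character deletion of a word of X stays in X."""
    X = set(X)
    return all(delete_k(x, k) <= X for x in X)


# Words shorter than k have no image, so they never break closedness.
print(is_closed_under_deletion({"a", "b"}, 2))
# Deleting one letter from "ab" gives "a" and "b", which are missing here.
print(is_closed_under_deletion({"ab"}, 1))
```

Note that a word of length exactly k is mapped to the empty word, which cannot belong to a code; this is one reason why closed codes are so constrained.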
According to Example 9(2), no result similar to Theorem 6 can be stated in the framework of
-closed codes. We also notice that, in Proposition 8, the bound does not depend on the size of the alphabet, but only on k.
Corollary 10
Given a finite alphabet A and a positive integer k, one can decide whether a non-complete
-closed code
is included in some complete one. In addition, there are only finitely many such complete codes (possibly none), all of them computable.
Proof
According to Proposition 8 only a finite number of
-closed codes over A can exist, each of them being a subset of
. 
We close the section by considering the relations
,
and
:
Proposition 11
No code can be
-closed,
-closed, nor
-closed.
Proof
By contradiction assume that some
-closed code
exists. Let
,
and
such that
. It follows from
, that
. According to Lemma 3(iii), we have
, thus
. Since
, we have
: a contradiction with X being a code. Consequently no
-closed code can exist. According to Example 9(1), given a code
, we have
: this implies
, thus X is not
-closed. 
Complete Codes Closed Under Substitutions
Beforehand, given a word
, we need a thorough description of the set
. Actually, it is well known that, over a binary alphabet, all n-bit words can be generated by means of a Gray sequence [5]. With our notation, we have
. Furthermore, for every finite alphabet A, the so-called |A|-ary Gray sequences allow one to generate
[8, 19]: once more we have
. In addition, in the special case where
and
, it can be proved that we have
[8, Exercise 8, p. 28]. However, except in these special cases, to the best of our knowledge no general description of the structure of
appears in the literature. In any event, in the next paragraph we provide an exhaustive description of
. Strictly speaking, the proofs, which we have deferred to Sect. 5.2, do not involve
-closed codes: on a first reading, we suggest that the reader skip directly from Sect. 5.1 to Sect. 5.3.
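As a reminder of the binary case mentioned above, the following sketch generates the standard reflected Gray sequence, in which consecutive n-bit words differ in exactly one position, so that every n-bit word is reached from the all-zero word by successive one-character substitutions. The function name `gray_sequence` is hypothetical, and the reflected construction is only one of several Gray sequences.

```python
def gray_sequence(n):
    """Reflected binary Gray sequence over n-bit words (n >= 1)."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]


seq = gray_sequence(3)
print(seq)

# Consecutive words differ in exactly one position.
for u, v in zip(seq, seq[1:]):
    assert sum(a != b for a, b in zip(u, v)) == 1
```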
Basic Results Concerning
Proposition 12
Assume
. For each
, we have
.
In the case where A is a binary alphabet, we set
: this allows a well-known algebraic interpretation of
. Indeed, denote by
the addition in the group
with identity 0, and fix a positive integer n; given
, define
as the unique word of
such that, for each
, the letter of position i in
is
. With this notation the sets
and
are in one-to-one correspondence. Classically, we have
if, and only if, some
exists such that
with
(thus
). From the fact that
, the following property holds:
(1) [displayed formula not recoverable from the source]
In addition
is equivalent to
. Let
. The following property follows from
and
:
(2) [displayed formula not recoverable from the source]
Finally, for
we denote by
its complementary letter that is,
; for
we set
.
Lemma 13
Let
,
. Given
the two following properties hold:
- (i) If k is even and
then
is an even integer;
- (ii) If
is even then we have
, for every
.
Given a positive integer n, we denote by
(
) the set of the words
such that
is even (odd).
Proposition 14
Assume
. Given
exactly one of the following conditions holds:
- (i)
, k is even, and
;
- (ii)
, k is odd, and
;
- (iii)
and
.
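The sets of words reachable from a given word by iterated k-character substitutions can be observed experimentally on small cases. The sketch below runs a simple graph search; it assumes that the word w itself is counted as reachable, and the names `substitute_k` and `closure` are hypothetical. The printed examples illustrate the role played by the parity of k over a binary alphabet and the different behaviour over a three-letter alphabet.

```python
from itertools import combinations, product


def substitute_k(w, k, alphabet):
    """Words of length |w| that differ from w in exactly k positions."""
    out = set()
    for positions in combinations(range(len(w)), k):
        choices = [[a for a in alphabet if a != w[i]] for i in positions]
        for letters in product(*choices):
            u = list(w)
            for i, a in zip(positions, letters):
                u[i] = a
            out.add("".join(u))
    return out


def closure(w, k, alphabet):
    """Words reachable from w by repeated k-substitutions (w included)."""
    seen, stack = {w}, [w]
    while stack:
        u = stack.pop()
        for v in substitute_k(u, k, alphabet):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


print(sorted(closure("000", 2, "01")))   # k even, binary alphabet
print(len(closure("000", 1, "01")))      # k odd, binary alphabet
print(sorted(closure("00", 2, "01")))    # word of length exactly k
print(len(closure("00", 2, "012")))      # three-letter alphabet
```

In the first example only the words with an even number of occurrences of 1 are reached, whereas an odd k or a third letter allows every word of the same length to be reached.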
Proofs of the Statements 12, 13 and 14
Actually, Proposition 12 is a consequence of the following property:
Lemma 15
Assume
. For every word
we have
.
Proof
Let
and
. We prove that
exists with
and
. By construction,
exists such that:
-
if, and only if,
. It follows from
that some
-element subset
exists. Since we have
, some letter
exists. Let
such that: -
and, for each
:
if, and only if,
. By construction we have
, moreover
implies
. According to (a) and (b), we obtain:
,
if
, and:
if
.
Since we have
, this implies
. 
Proof of Proposition 12. Let
: we prove that
. Let
and let
be a sequence of words such that
,
and, for each
:
if, and only if,
. Since we have
(
), by induction over j we obtain
thus, according to Lemma 15:
.

To prove Lemma 13 and Proposition 14, we need a further lemma:
Lemma 16
Assume
. For every
, we have
.
Proof
Set
. It follows from
that the result holds for
. Assume
and let
,
. By construction, there are distinct integers
such that the following holds:
-
if, and only if,
. Since some
-element set
exists,
exist with:
if, and only if,
, and:-
if, and only if,
. By construction, we have
and
, thus
. Moreover, the fact that we have
is attested by the following equations:
,
, and: for
:
if, and only if,
.

Proof of Lemma 13. Assume k even. According to Property (1) we have
with
. According to (2),
is even: hence (i) follows. Conversely, assume
even and let
. According to (2),
is also even, moreover according to (1) we obtain
: this implies
. According to Lemma 16, we have
: this establishes (ii).

Proof of Proposition 14. Let
and
. (iii) is trivial and (i) follows from Lemma 13(i): indeed, since k is even,
is the set of the words
such that
is even. Assume k odd, and let
; we will prove that
. If
is even, the result comes from Lemma 13(ii). Assume
odd and let
, thus
that is,
for some
. It follows from
that
is odd, whence
is even: according to Lemma 13(ii), this implies
. But since
is even, we have
: according to Lemma 16, this implies
(we have
). We obtain
: this completes the proof. 
The Consequences for
-Closed Codes
Given a
-closed code
, we say that the tuple (k, A, X) satisfies Condition (3) if each of the three following properties holds:
(3) [the three displayed properties are not recoverable from the source]
We start by proving the following technical result:
Lemma 17
Assume
and k even. Given a pair of words
, if
then the set
cannot be a code.
Proof
Let
, and
(hence
). By contradiction, we assume that
is a code. We are in Condition (i) of Proposition 14 that is, we have
. On the one hand, since
is a right-complete prefix code [2, Theorem 3.3.8], it follows from
that a (perhaps empty) word s exists such that
. On the other hand, it follows from
that, for each
, a unique pair of letters
, exists such that
,
with
that is,
exists with
. According to Lemma 13(i),
is even; according to Lemma 13(ii), this implies
. Since we have
, the set
cannot be a code. 
As a consequence of Lemma 17, we obtain the following result:
Lemma 18
Given a
-closed code
, if (k, A, X) satisfies Condition (3) then either we have
, or we have
for some
.
Proof
Firstly, consider two words
and by contradiction, assume
that is, without loss of generality
. Since X is
-closed, we have
, whence the set
, which is a subset of X, is a code: this contradicts the result of Lemma 17. Consequently, we have
, with
. Secondly, once more by contradiction assume that words
,
exist. As indicated above, since X is
-closed,
is a code: since we have
and
, once more this contradicts the result of Lemma 17. As a consequence, necessarily we have
, for some
. With such a condition, according to Proposition 14 for each pair of words
, we have
,
: this implies
. 
According to Lemma 18, with Condition (3) no
-closed code can simultaneously possess words in
and words in
.
Lemma 19
Given a
-closed code
, if (k, A, X) does not satisfy Condition (3) then either we have
, or we have
, with
.
Proof
If Condition (3) does not hold then exactly one of the three following conditions holds:
;
and
;
with
and k odd.
With each of the two last conditions, let
. Since X is
-closed, according to Propositions 12 and 14(ii), we have
. Since
is a maximal code, it follows from Lemma 3(iii) that
.

As a consequence, every
-closed code is finite. In addition, we state:
Theorem 20
Given a complete
(
,
)-closed code X, exactly one of the following conditions holds:
- (i) X is a subset of
;
- (ii) a unique integer
exists such that
.
In addition, every
(
)-closed code is equal to
, for some
.
Proof
Let X be a complete
-closed code. If Condition (3) does not hold, the result is expressed by Lemma 19. Assume that Condition (3) holds. According to Lemma 18, in any case some integer
exists such that
. Taking for
the uniform distribution, we have
and
thus, according to Theorem 1:
. Recall that we have
(e.g. [8]). Assume X
-closed, and let
,
: we have
thus
(indeed,
is a maximal code). Since
, if X is
-closed then it is
-closed, thus we have
. 
As a corollary, in the family of
(
)-closed codes, maximality and completeness are equivalent notions. With regard to
-closed codes, the situation is different: indeed, as shown in [16], there are finite codes that have no finite completion. Let X be one of them, and
. By definition X is
-closed. Since every
-closed code is finite, no complete
-closed code can contain X.
Proposition 21
Let X be a (finite) non-complete
-closed code. Then one can decide whether some complete
-closed code containing X exists. More precisely, there is only a finite number of such codes, each of them being computable, if any.
Proof sketch. We outline an algorithm that computes every complete
-closed code
containing X. In a first step, we compute
. If
, according to Theorem 20, we have
:
, if any, can be computed in a finite number of steps. Otherwise,
exists if, and only if, for some
we have
: this can be straightforwardly checked. 
Acknowledgment
We would like to thank the anonymous reviewers for their fruitful suggestions and comments.
Contributor Information
Alberto Leporati, Email: alberto.leporati@unimib.it.
Carlos Martín-Vide, Email: carlos.martin@urv.cat.
Dana Shapira, Email: shapird@g.ariel.ac.il.
Claudio Zandron, Email: zandron@disco.unimib.it.
Jean Néraud, Email: jean.neraud@univ-rouen.fr, Email: neraud.jean@gmail.com, http://neraud.jean.free.fr.
References
- 1. Berstel J, Felice CD, Perrin D, Reutenauer C, Rindonne G. Bifix codes and Sturmian words. J. Algebra. 2012;369:146–202. doi: 10.1016/j.jalgebra.2012.07.013.
- 2. Berstel J, Perrin D, Reutenauer C. Codes and Automata. New York: Cambridge University Press; 2010.
- 3. Bruyère V, Wang L, Zhang L. On completion of codes with finite deciphering delay. Eur. J. Comb. 1990;11:513–521. doi: 10.1016/S0195-6698(13)80036-4.
- 4. Ehrenfeucht A, Rozenberg S. Each regular code is included in a regular maximal one. RAIRO Theoret. Inf. Appl. 1986;20:89–96. doi: 10.1051/ita/1986200100891.
- 5. Ehrlich G. Loopless algorithms for generating permutations, combinations, and other combinatorial configurations. J. ACM. 1973;20:500–513. doi: 10.1145/321765.321781.
- 6. Jürgensen H, Konstantinidis S. Codes. In: Rozenberg G, Salomaa A, editors. Handbook of Formal Languages. Heidelberg: Springer; 1997. pp. 511–607.
- 7. Kari L, Păun G, Thierrin G, Yu S. At the crossroads of linguistic, DNA computing and formal languages: characterizing RE using insertion-deletion systems. In: Proceedings of Third DIMACS Workshop on DNA Based Computing; 1997. pp. 318–333.
- 8. Knuth D. The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations. Boston: Addison Wesley; 2005.
- 9. Konstantinidis S. Error correction and decodability. Ph.D. thesis. London, Canada: The University of Western Ontario; 1996.
- 10. Lam N. Finite maximal infix codes. Semigroup Forum. 2000;61:346–356. doi: 10.1007/PL00006033.
- 11. Lam N. Finite maximal solid codes. Theoret. Comput. Sci. 2001;262:333–347. doi: 10.1016/S0304-3975(00)00277-2.
- 12. Levenshtein V. Binary codes capable of correcting deletions, insertion and reversals. Sov. Phys. Dokl. 1965;163:845–848.
- 13. Néraud J. Completing circular codes in regular submonoids. Theoret. Comp. Sci. 2008;391:90–98. doi: 10.1016/j.tcs.2007.10.033.
- 14. Néraud J, Selmi C. Embedding a θ-invariant code into a complete one. Theoret. Comput. Sci. 2020;806:28–41. doi: 10.1016/j.tcs.2018.08.022.
- 15. Nivat M, et al. Congruences parfaites et semi-parfaites. Séminaire Dubreil. Algèbre et théorie des nombres. 1971;25:1–9.
- 16. Restivo A. On codes having no finite completion. Discrete Math. 1977;17:309–316. doi: 10.1016/0012-365X(77)90164-9.
- 17. Rozenberg G, Salomaa A. The Mathematical Theory of L-Systems. New York: Academic Press; 1980.
- 18. Rudi K, Wonham WM. The infimal prefix-closed and observable superlanguage of a given language. Syst. Control Lett. 1990;15:361–371. doi: 10.1016/0167-6911(90)90059-4.
- 19. Savage C. A survey of combinatorial gray codes. SIAM Rev. 1997;39(4):605–629. doi: 10.1137/S0036144595295272.




