2020 Jan 7;12038:397–411. doi: 10.1007/978-3-030-40608-0_28

Reducing the Ambiguity of Parikh Matrices

Jeffery Dick, Laura K Hutchinson, Robert Mercaş, Daniel Reidenbach
Editors: Alberto Leporati, Carlos Martín-Vide, Dana Shapira, Claudio Zandron
PMCID: PMC7206623

Abstract

The Parikh matrix mapping allows us to describe words using matrices. Although compact, this description comes with a level of ambiguity since a single matrix may describe multiple words. This work looks at how considering the Parikh matrices of various transformations of a given word can decrease that ambiguity. More specifically, for any word, we study the Parikh matrix of its Lyndon conjugate as well as that of its projection to a smaller alphabet. Our results demonstrate that ambiguity can often be reduced using these concepts, and we give conditions on when they succeed.

Keywords: Combinatorics, Parikh matrix, Ambiguity, Lyndon conjugate

Introduction

An approach for a more compact representation of data can be provided by histograms, which are also a well established statistical tool used in a wide range of applications. The concept of a Parikh vector [15] represents a type of such histograms that is specific to the analysis of sequences of symbols (or: words), considering the number of occurrences of each letter that exists in a word.

Parikh vectors can be easily computed and are guaranteed to be logarithmic in the size of the word they represent, but they are ambiguous; that is, multiple words typically share the same Parikh vector. Following this, in [14] the authors look at a refinement of the vector notion which is meant to reduce this ambiguity, and introduce an extension for it in the form of a Parikh matrix. A Parikh matrix not only contains the Parikh vector of the word, but also information regarding some of the word’s (scattered) subwords. Such a matrix has the same asymptotic compactness as a Parikh vector and is associated to a significantly smaller number of words. However, it does not normally remove ambiguity entirely.

The bulk of the work done on the Parikh matrix mapping concerns the ambiguity that Parikh matrices exhibit. A lot of effort is spent on identifying an alternative to the Parikh matrix concept that would make a mapping from a word injective, or less ambiguous in general [1, 2, 8–11, 18]. These include even more refined versions of the matrices by inclusion of polynomials, various extensions on the mappings, or both. For Parikh matrices explicitly, due to the difficulty arising from this ambiguity, the primary focus was on investigating this property on binary [4–7, 17] and ternary [3, 13, 16, 19] alphabets, leaving alphabets of size greater than three relatively unexplored.

In terms of reducing the ambiguity of a word, the investigation was based on either gathering more information about the specific word by altering the order of the alphabet, known as the dual order [6, 14], or by considering the reverse image of the word [6]. Hence an under-studied aspect that may reduce the ambiguity of a matrix concerns the information acquired by altering the word itself, or considering other alterations of the alphabet. In this work we present and investigate two different methods that reduce the ambiguity of the original Parikh matrices in the form of $\mathbb{P}$-Parikh matrices and $\mathbb{L}$-Parikh matrices.

The first of the two transformations, the $\mathbb{P}$-Parikh matrix mapping, considers the Parikh matrices associated to a projection morphism of the initial word, where the considered alphabet is reduced to the subset of the alphabet used within the defined transformation. These represent a particular case of the extended mapping presented in [18], where we only consider a subset of the original alphabet. For example, consider the words abcaabaac and abacabcaa. It is easy to see that both contain the same number of occurrences of each letter and of each of the subwords ab, bc and abc, making their Parikh matrices equal and therefore ambiguous. The $\mathbb{P}$-Parikh matrices associated to them with respect to $\{a, c\}$ consider the number of subwords ac, which is 6 in the former, but only 5 in the latter of the words. Hence, there exist $\mathbb{P}$-Parikh matrices not shared by the words.
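The counts quoted in this example can be checked mechanically. The following is a minimal sketch, not taken from the paper: the helper name `count_subword` is an illustrative assumption, and it counts scattered subword occurrences with a standard dynamic programme.

```python
def count_subword(word: str, sub: str) -> int:
    """Count occurrences of `sub` as a scattered subword of `word`."""
    # ways[k] = number of ways to embed sub[:k] into the part of `word` read so far
    ways = [1] + [0] * len(sub)
    for letter in word:
        # Go right to left so one letter of `word` extends each embedding at most once.
        for k in range(len(sub), 0, -1):
            if sub[k - 1] == letter:
                ways[k] += ways[k - 1]
    return ways[len(sub)]


u, v = "abcaabaac", "abacabcaa"
for sub in ["a", "b", "c", "ab", "bc", "abc", "ac"]:
    print(sub, count_subword(u, sub), count_subword(v, sub))
```

Only the last row (the subword ac, which is not built from consecutive letters of the alphabet) differs between the two words, which is exactly the information that the projection to the letters a and c exposes.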

We show that, using $\mathbb{P}$-Parikh matrices, we can reduce the ambiguity of the vast majority of words. We also explore when $\mathbb{P}$-Parikh matrices do not reduce ambiguity, as well as provide some insight into the types of words that cannot be uniquely described by a $\mathbb{P}$-Parikh matrix.

However, since $\mathbb{P}$-Parikh matrices are defined for a subset of the initial alphabet, they prove useless when dealing with binary sequences. We therefore consider an alternative transformation of words: the Lyndon conjugate, first introduced in [20], which is defined as the lexicographically smallest circular rotation of a word. Lyndon conjugates were used previously as a tool for ambiguity reduction. In [17], the authors define the Lyndon image of a Parikh matrix as the lexicographically smallest word describing such a matrix. Hence every Parikh matrix has exactly one distinct Lyndon image, which therefore allows each Parikh matrix to be described uniquely. In the context of this paper, we use the Lyndon conjugate differently, i.e., we consider the Parikh matrix of the Lyndon conjugate of a word, and we call the resulting matrix the $\mathbb{L}$-Parikh matrix of the original word.

Consider the Parikh matrices of the Lyndon conjugates of the two previously given words. Observe that aabaacabc has 7 occurrences of ab, whereas aaabacabc has 8, making their Parikh matrices different. Hence, the ambiguity of their Parikh matrix can be reduced using $\mathbb{L}$-Parikh matrices.
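As a quick check of this example, the sketch below computes the Lyndon conjugate by brute force over all rotations and counts the subword ab. The function names are illustrative assumptions, and the brute-force search is chosen for clarity rather than efficiency; it relies on Python's string comparison agreeing with the alphabet order a < b < c.

```python
def lyndon_conjugate(word: str) -> str:
    """Lexicographically smallest rotation of `word` (brute force)."""
    if not word:
        return word
    return min(word[i:] + word[:i] for i in range(len(word)))


def count_ab(word: str) -> int:
    """Number of occurrences of ab as a scattered subword."""
    seen_a = total = 0
    for letter in word:
        if letter == "a":
            seen_a += 1
        elif letter == "b":
            total += seen_a  # every earlier a pairs with this b
    return total


for w in ["abcaabaac", "abacabcaa"]:
    conj = lyndon_conjugate(w)
    print(w, "->", conj, "with", count_ab(conj), "occurrences of ab")
```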

While $\mathbb{L}$-Parikh matrices are a useful concept for any alphabet size, we focus on the cases where they reduce ambiguity in the binary alphabet and show that this happens in most cases. We give specific conditions of when $\mathbb{L}$-Parikh matrices do not help reduce the ambiguity of the given word, and investigate the words for which these criteria apply. This leads us to our main result of the paper, a characterisation of words whose ambiguity can be reduced using $\mathbb{L}$-Parikh matrices.

We end this section with a brief breakdown of our paper. In Sect. 2 we present some basic definitions and notions. Section 3 examines the first of the two notions we introduce, the $\mathbb{P}$-Parikh matrix, establishing conditions for when they can or cannot achieve ambiguity reduction. In Sect. 4, we study equivalent questions for $\mathbb{L}$-Parikh matrices, largely focusing on binary alphabets. We end our paper with conclusions as well as directions for future work.

Preliminaries

It is assumed the reader is familiar with the basics of combinatorics on words. If needed, [12] can be consulted. Throughout this paper, $\mathbb{N}$ refers to the set of natural numbers starting with 1.

We refer to a string of arbitrary letters as a word which is formed by concatenation of letters. The set of all letters used to create our words is called an alphabet. We represent an ordered alphabet as $\Sigma_n = \{a_1 < a_2 < \dots < a_n\}$, where $n \in \mathbb{N}$ is the size of the alphabet, and by convention $a_i$ is the $i$th letter in the Latin alphabet. Whenever the alphabet size is irrelevant or understood, we omit this from notation using only $\Sigma$. All alphabets referred to in this paper have an order imposed on them.

We define the concatenation of two words u and v as uv. The length of a word is the total number of not necessarily distinct letters it contains and the empty word of length zero is denoted Inline graphic. The Kleene star, denoted Inline graphic, is the operation that, once applied to a given alphabet, generates the set of all finite words that result from concatenating any words in that alphabet. Further, we denote the Inline graphic letter in a word w as w[i].

The reversal of a word, denoted Inline graphic, is defined as Inline graphic, where Inline graphic is a word with Inline graphic. We say that a factor v is in w if and only if w can be written as Inline graphic, where Inline graphic. We say that Inline graphic is a subword of v if we have a factorisation Inline graphic where Inline graphic. We use Inline graphic to denote the number of distinct occurrences of u as a subword in v.

The Parikh vector [15] Inline graphic associated with a word w is obtained through a mapping Inline graphic, defined as Inline graphic. For a matrix M of size Inline graphic, the j-diagonal is defined as all elements of M that are in the position Inline graphic for Inline graphic. A word is associated with a matrix, called its Parikh matrix, if the matrix is obtained from that word following the process detailed in the following explanatory definition. For a technical version of the definition we refer to [14].

Definition 1 (Explanatory)

Let Inline graphic. The Parikh matrix, denoted Inline graphic, that w is associated with has size Inline graphic. The diagonal of the matrix is populated with 1’s and all elements below it are 0. The count of all subwords that consist of consecutive letters in Inline graphic and are of length n in the word are found on the n-diagonal, for Inline graphic.

One notion we introduce in this paper relies on a change in alphabet size. As such, to emphasise the size n of the alphabet used for a Parikh matrix, we write Inline graphic. We say that a Parikh matrix describes a word if the word is associated to the matrix. Notice that due to the associativity of matrix multiplication, the Parikh matrix of a word can be constructed from the Parikh matrices of its factors. For a word Inline graphic, we have Inline graphic.
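The construction by matrix multiplication can be illustrated as follows. This is a minimal sketch under the standard definition from [14], in which the $q$th letter of the alphabet is mapped to the identity matrix with one additional 1 directly above the diagonal in row $q$; the function names and the list-of-lists representation are illustrative assumptions.

```python
def identity(size: int) -> list:
    return [[int(r == c) for c in range(size)] for r in range(size)]


def letter_matrix(q: int, n: int) -> list:
    """Matrix of the letter with 0-based index q over an alphabet of size n."""
    m = identity(n + 1)
    m[q][q + 1] = 1
    return m


def mat_mul(x: list, y: list) -> list:
    size = len(x)
    return [[sum(x[r][k] * y[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]


def parikh_matrix(word: str, alphabet: str) -> list:
    """Product of the letter matrices, taken in the order the letters occur."""
    result = identity(len(alphabet) + 1)
    for letter in word:
        result = mat_mul(result, letter_matrix(alphabet.index(letter), len(alphabet)))
    return result


for row in parikh_matrix("abcaabaac", "abc"):
    print(row)
```

Row $i$, column $j+1$ of the result holds the number of occurrences of the subword formed by the $i$th through $j$th letters of the alphabet, so the first row here lists the counts of a, ab and abc.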

Example 1

Consider the word Inline graphic defined over the alphabet Inline graphic. Then by definition our Parikh matrix is of size Inline graphic and we have

graphic file with name M53.gif

For the rest of this work we refine our notation for a Parikh matrix where we remove the elements not depending on the associated word. By definition a Parikh matrix is an upper triangular matrix with 1’s on the diagonal regardless of the word described. For aesthetics, removing the redundant part leaves us with a triangular structure that holds the same information as the original matrix,

graphic file with name M54.gif

   Inline graphic

Two words w and Inline graphic are conjugates if we can write Inline graphic and Inline graphic. For a word w, we say that the conjugacy class of w, denoted C(w) is the class of all of its possible conjugates. A conjugacy class is associated to a Parikh matrix if at least one word belonging to that class is associated to the matrix.

Example 2

The matrix Inline graphic has only the words aabbaa, abaaba and baaaab associated to it. The words aabbaa and baaaab are members of the same conjugacy class, while abaaba belongs to a different conjugacy class. Hence this matrix has two conjugacy classes associated to it.   Inline graphic
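For short binary words, the claims of this example can be confirmed by exhaustive search. The sketch below is an illustration only: it identifies the Parikh matrix of a binary word with the triple (number of a's, number of b's, number of ab subwords), and the function names are assumptions made for this example.

```python
from itertools import product


def binary_parikh_key(word: str) -> tuple:
    """(|w|_a, |w|_b, |w|_ab): the entries that determine a binary Parikh matrix."""
    a_count = ab_count = 0
    for letter in word:
        if letter == "a":
            a_count += 1
        else:
            ab_count += a_count
    return (a_count, len(word) - a_count, ab_count)


def amiable_words(word: str) -> list:
    """All binary words of the same length that share the Parikh matrix of `word`."""
    key = binary_parikh_key(word)
    candidates = ("".join(t) for t in product("ab", repeat=len(word)))
    return sorted(x for x in candidates if binary_parikh_key(x) == key)


def conjugacy_classes(words: list) -> set:
    """Represent each conjugacy class by its smallest rotation."""
    return {min(w[i:] + w[:i] for i in range(len(w))) for w in words}


words = amiable_words("aabbaa")
print(words, conjugacy_classes(words))
```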

A Parikh matrix can be associated to multiple words, as seen above, although cases exist where a matrix describes a single word, e.g., aabb is the unique word associated to Inline graphic. We say that two words are amiable if they are associated to the same Parikh matrix. If two or more words are associated to a single Parikh matrix, we say that the matrix is ambiguous. Later in this paper, we reduce the ambiguity of a word using both its Parikh matrix and the Parikh matrix of an altered form of that word to describe it. As such, we introduce a formal definition of the ambiguity that multiple functions may have based on the set of all words that satisfy all functions. We are then able to use this when considering the ambiguity of the notions we introduce later.

Definition 2

For a word w and functions Inline graphic we define Inline graphic Inline graphic for Inline graphic. If Inline graphic, then we call w unambiguous on Inline graphic, and say that Inline graphic uniquely define w. However, if Inline graphic for Inline graphic and functions Inline graphic, then we say that Inline graphic reduce the ambiguity of w on Inline graphic.

Observe that we always have Inline graphic. Furthermore, if Inline graphic, then Inline graphic is unambiguous and it is not possible to further reduce ambiguity.

First we introduce the $\mathbb{P}$-Parikh matrix. This matrix is in essence the Parikh matrix of a projection of a word, and represents a particular case of the extension of the Parikh matrix mapping presented in [18]. For Inline graphic, Inline graphic and Inline graphic, the $\mathbb{P}$-Parikh matrix of w with respect to S is defined as follows.

Definition 3

For Inline graphic with Inline graphic, let Inline graphic such that Inline graphic, where Inline graphic. We define the Inline graphic-Parikh matrix of the word w with respect to S as Inline graphic, where the morphism Inline graphic is defined as

graphic file with name M90.gif

To gain some intuition about the above definition, consider an example.

Example 3

Let Inline graphic, Inline graphic, and Inline graphic. For the index sequence of S, since a is the lexicographically smallest letter in S, we obtain Inline graphic, Inline graphic and Inline graphic. Hence Inline graphic, Inline graphic and Inline graphic.

With the transformation defined we apply this to the word, and calculate the corresponding Inline graphic-Parikh matrix as the Parikh matrix of the transformed word,

graphic file with name M101.gif

   Inline graphic
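The projection idea can also be sketched in code. The sketch assumes, in line with the informal description in the introduction, that the morphism erases every letter outside S and that the remaining letters keep their relative order in the alphabet; the function names and the dictionary output (one entry per block of consecutive letters of S) are illustrative choices, not the paper's notation.

```python
def count_subword(word: str, sub: str) -> int:
    """Count occurrences of `sub` as a scattered subword of `word`."""
    ways = [1] + [0] * len(sub)
    for letter in word:
        for k in range(len(sub), 0, -1):
            if sub[k - 1] == letter:
                ways[k] += ways[k - 1]
    return ways[len(sub)]


def p_parikh_entries(word: str, subset: set, alphabet: str) -> dict:
    """Upper-triangle entries of the Parikh matrix of `word` projected onto `subset`."""
    ordered = [x for x in alphabet if x in subset]       # S with the inherited order
    projected = "".join(x for x in word if x in subset)  # erase letters outside S
    entries = {}
    for i in range(len(ordered)):
        for j in range(i, len(ordered)):
            factor = "".join(ordered[i:j + 1])
            entries[factor] = count_subword(projected, factor)
    return entries


print(p_parikh_entries("abcaabaac", {"a", "c"}, "abc"))
print(p_parikh_entries("abacabcaa", {"a", "c"}, "abc"))
```

For the two amiable words from the introduction, the projections onto the letters a and c differ in the entry for ac, whereas projecting onto the consecutive letters a and b yields identical entries.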

The Lyndon conjugate of a word is the conjugate that is lexicographically smallest based on the order on the alphabet. The Lyndon conjugate of a word w is denoted L(w). In an attempt to reduce the ambiguity of Parikh matrices, we modify the original Parikh matrix mapping to gain more information about a given word. Next, we introduce the $\mathbb{L}$-Parikh matrix associated to a word.

Definition 4

Given a word w, we define its $\mathbb{L}$-Parikh matrix, Inline graphic, as the Parikh matrix associated with its Lyndon conjugate, L(w).

It was shown in [4] that there exist transformations that, when applied to a word, create a new word that is amiable with the original. For non-binary alphabets, a Inline graphic transformation is given.

Lemma 1

([4]). Let Inline graphic with Inline graphic. Then w transforms into Inline graphic using a Inline graphic transformation if Inline graphic and Inline graphic, where Inline graphic, Inline graphic, and Inline graphic.

For binary alphabets, a second type of transformation is described, referred to as a Inline graphic, that allows us to check if certain words are amiable without constructing their matrices.

Lemma 2

([4]). Let Inline graphic. Then w transforms into Inline graphic through a Inline graphic transformation if Inline graphic and Inline graphic, or vice-versa, where Inline graphic and Inline graphic.

$\mathbb{P}$-Parikh Matrices

In this section, we examine when and how much $\mathbb{P}$-Parikh matrices reduce the ambiguity of a given word. When we refer to a reduction in ambiguity using $\mathbb{P}$-Parikh matrices, we mean that the number of words described by the original Parikh matrix and their respective $\mathbb{P}$-Parikh matrices is strictly less than the total number of words described by the original Parikh matrix alone, i.e., Inline graphic, for some Inline graphic. First we present an example of $\mathbb{P}$-Parikh matrices removing the ambiguity of a Parikh matrix entirely.

Example 4

Consider the word Inline graphic from Example 1, which is amiable with the word Inline graphic and no others. Then we choose our set Inline graphic, and get that: Inline graphic and Inline graphic Thus w and Inline graphic have different Inline graphic-Parikh matrices and we can uniquely describe them.   Inline graphic

We first introduce some terms that are useful when describing how effective a given $\mathbb{P}$-Parikh matrix is at reducing ambiguity.

Definition 5

Given a word Inline graphic, we call Inline graphic Inline graphic-distinguishable if either Inline graphic or there exists a word Inline graphic and a set Inline graphic such that Inline graphic and Inline graphic. In the latter case we say that w and u are Inline graphic-distinct. Furthermore, we call w Inline graphic-unique if there exist sets Inline graphic such that Inline graphic.

Now we use these terms to examine words whose ambiguity can be reduced using $\mathbb{P}$-Parikh matrices, namely those that contain a factor of length two whose letters are neither equal nor consecutive in the alphabet.

Proposition 1

For any word Inline graphic with a factor Inline graphic where Inline graphic, we have that Inline graphic is Inline graphic-distinguishable.

Proof

Since Inline graphic, if Inline graphic where Inline graphic, then Inline graphic is also associated to w, following Lemma 1. Without loss of generality, take Inline graphic. Then Inline graphic, since Inline graphic and Inline graphic are elements in Inline graphic and Inline graphic, respectively, and Inline graphic.   Inline graphic

It is simple to identify words that have such factors by comparing adjacent positions in the word. We can use this to find a lower bound for the proportion of words that are uniquely identified for a given alphabet and word length.
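The check on adjacent positions can be sketched as follows. The sketch assumes that the factors in question are pairs of neighbouring letters whose positions in the alphabet differ by at least two, as described before Proposition 1. It counts such words by brute force and does not reproduce the closed-form bound of Proposition 2; the function names are illustrative.

```python
from itertools import product


def has_distant_adjacent_pair(word, alphabet: str) -> bool:
    """True if two neighbouring letters are neither equal nor consecutive in `alphabet`."""
    position = {letter: i for i, letter in enumerate(alphabet)}
    return any(abs(position[x] - position[y]) >= 2 for x, y in zip(word, word[1:]))


def count_words_with_distant_pair(alphabet: str, length: int) -> int:
    """Brute-force count over all words of the given length."""
    return sum(has_distant_adjacent_pair(w, alphabet)
               for w in product(alphabet, repeat=length))


for m in range(2, 7):
    hits = count_words_with_distant_pair("abc", m)
    print(m, hits, 3 ** m, round(hits / 3 ** m, 3))
```

Over a binary alphabet the count is always zero, since any two distinct letters are consecutive.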

Proposition 2

The number of words of length m in Inline graphic that are reduced in ambiguity by Inline graphic-Parikh matrices is bounded below by Inline graphic.

Notice especially that as n and m get larger, the proportion of words which are reduced in ambiguity by $\mathbb{P}$-Parikh matrices also gets larger. We therefore conclude that the use of $\mathbb{P}$-Parikh matrices reduces ambiguity for a larger proportion of words over bigger alphabets than over smaller ones.

There also exist words for which $\mathbb{P}$-Parikh matrices do not reduce ambiguity. Our following result says that if our choice of a subset consists of only consecutive letters of the initial alphabet, the $\mathbb{P}$-Parikh matrices are not $\mathbb{P}$-distinguishable.

Remark 6

If all elements of the set Inline graphic are consecutive in the alphabet Inline graphic, then Inline graphic.

The result of Remark 6 strengthens the one of Proposition 1 by telling us that the ambiguity of words defined over binary alphabets is not reducible by $\mathbb{P}$-Parikh matrices.

Corollary 1

There does not exist a Parikh matrix that describes binary words whose ambiguity can be reduced by $\mathbb{P}$-Parikh matrices.

Furthermore, there exist non-binary words for which $\mathbb{P}$-Parikh matrices do not remove ambiguity, namely those that are not $\mathbb{P}$-unique. Finally, we end this section by giving two classes of words which are not uniquely described by $\mathbb{P}$-Parikh matrices, no matter how we choose the set S.

Proposition 3

Take two words Inline graphic with the form Inline graphic and Inline graphic, where Inline graphic and Inline graphic. If Inline graphic, then for all Inline graphic, we have Inline graphic.

Proof

Firstly, if Inline graphic, equivalence follows, as Inline graphic. Now, let Inline graphic.

In the case where S contains either Inline graphic or Inline graphic, then Inline graphic since Inline graphic and Inline graphic are the only letters that swap places in Inline graphic compared to w. Since Inline graphic, clearly Inline graphic follows.

If Inline graphic, then, Inline graphic is a binary word and can be transformed via a Inline graphic transformation, from Lemma 2, into Inline graphic, so Inline graphic.

Next consider that Inline graphic, Inline graphic, and S has no elements between Inline graphic and Inline graphic. Then Inline graphic and Inline graphic. Using an extension from [3] of the Inline graphic transformations we can transform Inline graphic into Inline graphic, and get that Inline graphic.

Finally, consider the case where S contains Inline graphic, and at least one letter that comes lexicographically between Inline graphic and Inline graphic. Then, Inline graphic can be transformed into Inline graphic via two Inline graphic transformations on Inline graphic and Inline graphic, since Inline graphic and Inline graphic are not lexicographically adjacent in S (see Lemma 1).   Inline graphic

The ideas from Proposition 3 give rise to another class of words that are not $\mathbb{P}$-unique, by loosening the condition on v and extending the length of the word.

Proposition 4

Take two words of the form Inline graphic, and Inline graphic, where Inline graphic and Inline graphic. Let Inline graphic and Inline graphic. Then, w and Inline graphic are not Inline graphic-distinct if and only if Inline graphic for all Inline graphic, and at least one of the following conditions is true:

  1. Inline graphic, and for Inline graphic, if Inline graphic, then Inline graphic, and if Inline graphic, then Inline graphic;

  2. Inline graphic, and for Inline graphic, if Inline graphic, then Inline graphic, and if Inline graphic, then Inline graphic.

In other words, the above statement says that two words are not $\mathbb{P}$-distinct if both Inline graphic and Inline graphic are defined on the subset of the alphabet which is either lexicographically bigger than Inline graphic or smaller than Inline graphic, and they share the same Parikh vector for the subset of letters which are not in between Inline graphic and Inline graphic. Furthermore, if Inline graphic, then all the letters which are lexicographically greater than Inline graphic must occur in Inline graphic in decreasing lexicographical order and in Inline graphic in increasing order. On the other hand, if Inline graphic, then all the letters which are lexicographically smaller than Inline graphic must occur in Inline graphic in increasing lexicographical order and in Inline graphic in decreasing lexicographical order.

$\mathbb{L}$-Parikh Matrices

Proposition 2 shows that in many cases, the set of words that share both a Parikh matrix and a $\mathbb{P}$-Parikh matrix is smaller than the set of those that share only a Parikh matrix. However, following Corollary 1 we also know that this never happens for binary alphabets. Hence we now study $\mathbb{L}$-Parikh matrices as an alternative method of ambiguity reduction. While they can be effective for any non-unary alphabet, we focus on binary alphabets specifically. We begin this section by explaining the motivation for choosing the Lyndon conjugate of a word and then build to our main result where we characterise words whose ambiguity is reduced by the use of $\mathbb{L}$-Parikh matrices.

As indicated by Definition 4, the concept of $\mathbb{L}$-Parikh matrices is based on a modification to a word that results in a change in the order of letters. The following theorem implies that the strategy of altering a word is not always a successful method of ambiguity reduction. Note that Inline graphic refers to the Parikh matrix of the reversal of a word.

Theorem 1

([4]). For a word w, we have that Inline graphic.

Unlike the reversal considered in Theorem 1, $\mathbb{L}$-Parikh matrices use a conjugate of a word. The next proposition implies that such conjugates need to be chosen wisely.

Proposition 5.

Given words Inline graphic with Inline graphic, for any factorisations Inline graphic and Inline graphic such that Inline graphic, we have that Inline graphic implies Inline graphic. For Inline graphic, the reverse direction also stands, namely Inline graphic implies Inline graphic.

Proof Outline.

We can prove the statement that holds for every size alphabet by contradiction, by assuming that Inline graphic and Inline graphic. We examine the total number of ab subwords in v, w, Inline graphic and Inline graphic to obtain a set of equations. We then consider the total number of b’s in Inline graphic and Inline graphic to find a contradiction within these equations.

For the statement that holds just for the binary alphabet we examine the total number of ab subwords in Inline graphic and Inline graphic and get a contradiction in the equations we obtain by initially assuming that Inline graphic, Inline graphic and Inline graphic.   Inline graphic

The example below shows that Inline graphic is necessary for Proposition 5.

Example 5.

Consider the words Inline graphic with Inline graphic and Inline graphic with Inline graphic. One can easily find that Inline graphic. Furthermore, we have that Inline graphic, Inline graphic and Inline graphic. However Inline graphic, since Inline graphic and Inline graphic, and therefore Inline graphic is a necessary condition in the context of Proposition 5.   Inline graphic

An example for the ternary alphabet where Inline graphic even though we have that Inline graphic and Inline graphic is given below. Note that if Inline graphic, then we must also have Inline graphic. Since any alphabet of size greater than 3 would rely on the result of the ternary alphabet always being true, we can deduce that the backwards direction from Proposition 5 only holds for the binary alphabet.

Example 6.

Let Inline graphic and Inline graphic. We have that Inline graphic. Now let Inline graphic and Inline graphic. Then we have that Inline graphic and Inline graphic. Note that Inline graphic, since Inline graphic and Inline graphic. But this gives us Inline graphic.   Inline graphic

Proposition 5 shows that when looking for a modification that we can apply to a word to find a new and different Parikh matrix, we need to consider conjugates of amiable words where it is less likely that the Parikh vectors of their right factors are the same, i.e., conjugates obtained by shifting the original words by different numbers of positions.

Let us now consider how using $\mathbb{L}$-Parikh matrices reduces ambiguity. The rest of this section ignores any word w where Inline graphic, since there is no ambiguity to be reduced here. For a word w, we calculate Inline graphic and Inline graphic and use both of these matrices to describe the original word. The ambiguity of a word w, with respect to its Parikh and $\mathbb{L}$-Parikh matrices, according to Definition 2, is the total number of words that share a Parikh matrix and an $\mathbb{L}$-Parikh matrix with w, namely Inline graphic. We use the next definitions and propositions to build to our main result where we characterise binary words whose ambiguity is reduced using $\mathbb{L}$-Parikh matrices. In line with Definition 5 we introduce the following definitions.

Definition 7.

Given a word Inline graphic, we call Inline graphic Inline graphic-distinguishable if either Inline graphic or there exists a word Inline graphic with Inline graphic, such that Inline graphic. In the latter case we say that w and u are Inline graphic-distinct. A word w is Inline graphic-unique if Inline graphic.
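To make the notion concrete for short binary words, the following sketch enumerates the set of words that agree with a given word on the Parikh matrix alone, or on both the Parikh matrix and the Parikh matrix of the Lyndon conjugate. It is a brute-force illustration with assumed function names; the binary Parikh matrix is again represented by the triple (number of a's, number of b's, number of ab subwords).

```python
from itertools import product


def parikh_key(word: str) -> tuple:
    a_count = ab_count = 0
    for letter in word:
        if letter == "a":
            a_count += 1
        else:
            ab_count += a_count
    return (a_count, len(word) - a_count, ab_count)


def lyndon_conjugate(word: str) -> str:
    if not word:
        return word
    return min(word[i:] + word[:i] for i in range(len(word)))


def ambiguity_set(word: str, with_lyndon: bool = False) -> list:
    """Binary words sharing the chosen matrices with `word`."""
    def signature(x: str) -> tuple:
        sig = (parikh_key(x),)
        if with_lyndon:
            sig += (parikh_key(lyndon_conjugate(x)),)
        return sig

    target = signature(word)
    candidates = ("".join(t) for t in product("ab", repeat=len(word)))
    return sorted(x for x in candidates if signature(x) == target)


print(ambiguity_set("aabbaa"))                    # Parikh matrix only
print(ambiguity_set("aabbaa", with_lyndon=True))  # Parikh matrix and Lyndon conjugate
print(ambiguity_set("abaaba", with_lyndon=True))
```

For the three amiable words aabbaa, abaaba and baaaab, adding the Lyndon conjugate separates abaaba from the other two, while aabbaa and baaaab remain indistinguishable because they are conjugates of each other.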

Note that if w and v are $\mathbb{L}$-distinct, then Inline graphic and Inline graphic. The example below demonstrates the effectiveness of $\mathbb{L}$-Parikh matrices for ambiguity reduction.

Example 7.

Consider the words Inline graphic, Inline graphic and Inline graphic with Inline graphic. However, for the conjugates Inline graphic, Inline graphic and Inline graphic we have that Inline graphic, Inline graphic, and Inline graphic. Thus their Inline graphic-Parikh matrices are all different and we can uniquely describe each of the words by using Inline graphic-Parikh matrices.   Inline graphic

$\mathbb{L}$-distinguishability is necessary for ambiguity reduction in this case.

Corollary 2.

For Inline graphic, Inline graphic if and only if Inline graphic is Inline graphic-distinguishable.

The above characterisation of ambiguity reduction leads us to investigate sufficient conditions for a matrix to be ambiguous, and therefore for any pair of words it describes not to be $\mathbb{L}$-distinct. Our next results consider the situations when the Parikh matrix of a word is not $\mathbb{L}$-distinguishable. For the binary alphabet, we also show that words meeting the criteria of each proposition are rare; this is done either directly after the next proposition or later in the paper.

Proposition 6.

For a word Inline graphic, if all words in Inline graphic belong to the same conjugacy class, then Inline graphic is not Inline graphic-distinguishable.

Example 8.

Let Inline graphic and Inline graphic. These two words are amiable with each other and nothing else. Furthermore, Inline graphic, and since both words share a Lyndon conjugate, both words also share an Inline graphic-Parikh matrix. Therefore Inline graphic is not Inline graphic-distinguishable.   Inline graphic

Now we move on to explore, for binary alphabets, the case where all words in Inline graphic belong to the same conjugacy class in more detail. Recall that C(w) refers to the conjugacy class of w.

Proposition 7.

Let Inline graphic. Then Inline graphic, for all Inline graphic, if and only if Inline graphic.

Proof Outline.

The forwards direction is proven by examining every element of the conjugacy class of w. We can first prove that if Inline graphic, for all Inline graphic, then words in the conjugacy class of w are only amiable with other conjugates of w. We then show that this is only true when L(w) is in the set Inline graphic. For this we define a block of a letter to be a unary factor of a word which is not extendable to the right or left and argue that applying a Inline graphic transformation to any Lyndon conjugate that is not in the above set either alters the size of the block of a’s at the start of the word, or changes the total number of blocks of a’s in the word altogether. This therefore gives us a word that is amiable to, but not a conjugate of, the original.

The backwards direction is proven by finding the Parikh matrices of all conjugates of words in the set Inline graphic. We then find that the only words described by these matrices are these conjugates.   Inline graphic

We now look at the case where all words associated to a Parikh matrix are the Lyndon representatives of their respective conjugacy classes, which again makes this matrix not $\mathbb{L}$-distinguishable.

Proposition 8.

For a word Inline graphic, if Inline graphic and Inline graphic, then Inline graphic is not Inline graphic-distinguishable.

Example 9.

The words Inline graphic and Inline graphic are only amiable with each other, Inline graphic, and both are the Lyndon representatives of their respective conjugacy classes. Therefore, Inline graphic and Inline graphic is not Inline graphic-distinguishable.   Inline graphic

For binary alphabets, we examine in greater detail when all words in Inline graphic are the Lyndon representatives of their conjugacy classes. The next result provides a necessary and sufficient condition, and therefore the complete characterisation, for this case to occur for the binary alphabet.

Proposition 9.

Let Inline graphic. Then the following statements are equivalent.

  • For all Inline graphic, we have that Inline graphic.

  • Inline graphic and for Inline graphic we have that Inline graphic and Inline graphic.

Proof Outline.

To show that these two statements are equivalent, we begin by showing that the second statement implies the former. We do this by first showing that if a word is of the form Inline graphic and, for Inline graphic, we have that Inline graphic and Inline graphic, then Inline graphic, and next move on to prove that only words of this form are described by Inline graphic. We prove that Inline graphic by observing that Inline graphic. Adding more a’s to the start of v and more b’s to the end means that the Lyndon conjugate is still the word itself, and hence obtain Inline graphic. We prove that words of the form described in the second point are only amiable with each other by calculating the total number of ab subwords in v and extrapolating this to w.

To prove that the first statement implies the second, we use the fact that our words share a Parikh matrix and that they must begin with the largest number of consecutive a’s in the word and end with at least one b. We also rewrite Inline graphic where Inline graphic begins with the first occurrence of a b and ends with the last occurrence of an a in w, and examine the form that this must take given the fixed number of ab subwords we must have in w. This gives us the total number of a’s and b’s in a word relative to the total number of ba subwords.   Inline graphic
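The counting steps in this outline rest on a standard identity for binary words: every pair consisting of one occurrence of a and one occurrence of b is counted either as an ab subword or as a ba subword. It is written below with the assumed symbols m and n for the numbers of leading a’s and trailing b’s, so that w = a^m v b^n as described in the outline; this notation is an illustration, not necessarily the paper’s own.

```latex
\[
  |w|_{ab} + |w|_{ba} = |w|_{a}\,|w|_{b},
  \qquad
  |w|_{ba} = |v|_{ba} \quad \text{for } w = a^{m} v\, b^{n}.
\]
```

The second equality holds because leading a’s and trailing b’s cannot take part in any ba subword, which is why fixing the number of ab subwords in w constrains the number of ba subwords in v.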

The next example shows how the above result can be used to identify the form of the words that always share a Parikh matrix with other Lyndon conjugates.

Example 10.

Following Proposition 9, Lyndon representatives of different conjugacy classes share a Parikh matrix only if they are of the form Inline graphic, where for Inline graphic we have that Inline graphic and Inline graphic. Let us find all words of this form where Inline graphic. We begin by finding all binary words that begin with b, end with a and contain exactly 3 subwords ba. These are baaa, baba and bbba. Next add a’s to the front and b’s to the end of each word, respectively, so that we have a total of 6 a’s and 4 b’s per word: aaabaaabbb, aaaabababb and aaaaabbbab. Finally, any number of a’s and b’s can be added to the front and end of each word, respectively: Inline graphic. Hence we know that any word of this form is the Lyndon representative of its conjugacy class and shares a Parikh matrix with the two other words stated above. For example, Inline graphic.   Inline graphic
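The claims of Example 10 can be checked by brute force. The following Python sketch assumes the usual ordering a < b and represents the Parikh matrix of a binary word by its three entries above the diagonal, |w|_a, |w|_b and |w|_ab. The function names are illustrative and do not come from the paper.

```python
from itertools import combinations


def parikh_entries(w):
    """Return (|w|_a, |w|_b, |w|_ab) for a word w over {a, b}."""
    a_seen = 0
    ab = 0
    for ch in w:
        if ch == "a":
            a_seen += 1
        else:
            # Every a read so far forms an ab subword with this b.
            ab += a_seen
    return (a_seen, len(w) - a_seen, ab)


def lyndon_conjugate(w):
    """Lexicographically least rotation of w, with a < b."""
    return min(w[i:] + w[:i] for i in range(len(w)))


def amiable_class(w):
    """All binary words with the same Parikh entries as w (brute force)."""
    target = parikh_entries(w)
    n = len(w)
    words = []
    for b_positions in combinations(range(n), target[1]):
        chars = ["a"] * n
        for p in b_positions:
            chars[p] = "b"
        candidate = "".join(chars)
        if parikh_entries(candidate) == target:
            words.append(candidate)
    return sorted(words)


example_words = ["aaabaaabbb", "aaaabababb", "aaaaabbbab"]
cls = amiable_class(example_words[0])
# True when every word in the class is its own Lyndon conjugate.
all_lyndon = all(lyndon_conjugate(w) == w for w in cls)
```

Under these assumptions, `cls` contains exactly the three words of length 10 listed in the example and `all_lyndon` is true, so taking Lyndon conjugates leaves each word unchanged and cannot separate them.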

Thus far, we have presented sufficient conditions for two amiable words not to be Inline graphic-distinct. Our main result shows that these conditions are in fact also necessary. The following lemmas are used in the proof of the final result, but are included here as they are also interesting results in their own right. The first lemma tells us that if the Parikh vectors of the proper right factors of two amiable words are different, then the sizes of these factors must also be unequal.

Lemma 3.

Consider the words Inline graphic and Inline graphic with Inline graphic, such that Inline graphic and Inline graphic. If Inline graphic, then Inline graphic.

Furthermore, if two amiable binary words are not the Lyndon representatives of their conjugacy classes, then to either of them we can apply a Type 2 transformation to obtain an amiable word whose Lyndon conjugate begins in a different position from the original one.

Lemma 4.

Let Inline graphic with Inline graphic. If Inline graphic, then there exists Inline graphic, where Inline graphic, such that Inline graphic.

Proof Outline.

The statement can be proven by contradiction, by first assuming that the Lyndon conjugate of every word associated to Inline graphic begins in the same position within those words. We then show that, wherever the Lyndon conjugate begins within a given binary word, a Inline graphic transformation can be applied to obtain a new word whose Lyndon conjugate begins in a different position.   Inline graphic
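The claim of Lemma 4 can be illustrated on a small case. The sketch below assumes that the Type 2 transformation is the swap x·ab·y·ba·z → x·ba·y·ab·z commonly used for binary words in the amiability literature. The helper names and the word abbaa are illustrative choices and are not taken from the paper.

```python
def parikh_entries(w):
    """Return (|w|_a, |w|_b, |w|_ab) for a word w over {a, b}."""
    a_seen = 0
    ab = 0
    for ch in w:
        if ch == "a":
            a_seen += 1
        else:
            ab += a_seen
    return (a_seen, len(w) - a_seen, ab)


def lyndon_start(w):
    """Index at which the lexicographically least rotation of w begins."""
    return min(range(len(w)), key=lambda i: w[i:] + w[:i])


def swap_ab_ba(w, i, j):
    """Assumed Type 2 move: replace the factor ab at index i by ba and the
    non-overlapping factor ba at index j by ab (requires i + 2 <= j)."""
    if i + 2 > j or w[i:i + 2] != "ab" or w[j:j + 2] != "ba":
        raise ValueError("expected ab at i and a later, non-overlapping ba at j")
    return w[:i] + "ba" + w[i + 2:j] + "ab" + w[j + 2:]


w = "abbaa"              # not the Lyndon representative of its class
u = swap_ab_ba(w, 0, 2)  # swaps the leading ab with the following ba

same_matrix = parikh_entries(w) == parikh_entries(u)
positions = (lyndon_start(w), lyndon_start(u))
```

In this instance the swap preserves the three Parikh entries, so `same_matrix` is true, while the two values in `positions` differ. The Lyndon conjugate therefore begins at a different index after the move.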

Next we show that all words that are conjugates of any word w such that Inline graphic are also amiable with a word that is not a conjugate of any of the words in Inline graphic.

Lemma 5.

Let Inline graphic, where Inline graphic. For any Inline graphic there exists Inline graphic such that Inline graphic.

Proof Outline.

This statement can be proven by considering every form that a word w can take, such that Inline graphic, from Proposition 9 and then examining all conjugates of these words. We show that a Inline graphic transformation can be applied to every conjugate to obtain a word that is not a conjugate of any word in our original set Inline graphic.   Inline graphic

We end this section by giving our main result that characterises all binary words whose Parikh matrix is not Inline graphic-distinguishable.

Theorem 2.

For Inline graphic, a Parikh matrix is not Inline graphic-distinguishable if and only if any of the words it describes meet at least one of the following criteria:

  • Inline graphic

  • Inline graphic and for Inline graphic we have that Inline graphic and Inline graphic

Proof Outline.

For the set of words Inline graphic, the forward direction is easily proven by finding these words’ Parikh and Inline graphic-Parikh matrices, respectively. The backward direction is proved using the fact that, for words Inline graphic such that w is the reverse of Inline graphic and Inline graphic, we have Inline graphic if and only if Inline graphic.

For the rest of the words, the ‘if’ direction was mostly proven earlier when Propositions 6, 7, 8 and 9, describing these situations, were introduced.

The ‘only if’ direction is proven by first examining the consequences of Proposition 5, which tells us that two words are Inline graphic-distinct if their Lyndon conjugates begin in different positions. We use Lemmas 3 and 4 to conclude that no set of amiable binary words exists in which the Lyndon conjugates of all words begin in the same position of each word. Hence all Parikh matrices would be Inline graphic-distinguishable were it not for some cases that arise from our use of the Lyndon conjugate. Namely, these are the cases where the amiable words are all Lyndon conjugates, are all members of the same conjugacy class, or are all conjugates of words whose Lyndon conjugates share a Parikh matrix. We showed in Propositions 7 and 9 that the first two cases are characterised by words of the form Inline graphic where for Inline graphic we have that Inline graphic and Inline graphic, and by words whose Lyndon conjugate is in the set Inline graphic, respectively. We use Lemma 5 to conclude that no words exist such that the third case is true.   Inline graphic

Conclusion and Future Work

In this paper, we have shown that using Inline graphic-Parikh matrices and Inline graphic-Parikh matrices reduces the ambiguity of a word in most cases. From Corollary 1, we learn that Inline graphic-Parikh matrices cannot reduce the ambiguity of a Parikh matrix that describes words in a binary alphabet, but are very powerful when it comes to reducing the ambiguity of words in larger alphabets (Proposition 2). On the other hand, we find that Inline graphic-Parikh matrices reduce the ambiguity of most binary words, with the few exceptions from Theorem 2, which have all been shown to be rare occurrences within the binary alphabet. Thus, using both tools together leads to a reduction in ambiguity in most cases.

Going forward, we wish to characterise words that are described uniquely by each of the two types of matrices, and to quantify the ambiguity reduction permitted by both notions. Theorem 2 tells us that there are very few binary words whose Parikh matrix ambiguity cannot be reduced by Inline graphic-Parikh matrices. Future research on Inline graphic-Parikh matrices could also include an analysis similar to the one done in Proposition 2.

Finally we present a conjecture on the types of words that might be described by a Parikh matrix that is Inline graphic-distinguishable. We know that the presence of a certain type of factor, described in Proposition 1, in a word means that its Parikh matrix is Inline graphic-distinguishable. This conjecture implies that the presence of this factor is the only way that the ambiguity of a word could be reduced by Inline graphic-Parikh matrices.

Conjecture 8.

For any word Inline graphic, if Inline graphic is Inline graphic-distinguishable, then there exists a word amiable with w which contains a factor Inline graphic, where Inline graphic.

Contributor Information

Alberto Leporati, Email: alberto.leporati@unimib.it.

Carlos Martín-Vide, Email: carlos.martin@urv.cat.

Dana Shapira, Email: shapird@g.ariel.ac.il.

Claudio Zandron, Email: zandron@disco.unimib.it.

Jeffery Dick, Email: J.Dick@lboro.ac.uk.

Laura K. Hutchinson, Email: L.Hutchinson@lboro.ac.uk.

Robert Mercaş, Email: R.G.Mercas@lboro.ac.uk.

Daniel Reidenbach, Email: D.Reidenbach@lboro.ac.uk.

References

  • 1. Alazemi HMK, Černý A. Counting subwords using a trie automaton. Int. J. Found. Comput. Sci. 2011;22(6):1457–1469. doi: 10.1142/S0129054111008817.
  • 2. Alazemi HMK, Černý A. Several extensions of the Parikh matrix L-morphism. J. Comput. Syst. Sci. 2013;79(5):658–668. doi: 10.1016/j.jcss.2013.01.018.
  • 3. Atanasiu, A.: Parikh matrix mapping and amiability over a ternary alphabet. In: Discrete Mathematics and Computer Science, pp. 1–12 (2014)
  • 4. Atanasiu A, Atanasiu R, Petre I. Parikh matrices and amiable words. Theoret. Comput. Sci. 2008;390(1):102–109. doi: 10.1016/j.tcs.2007.10.022.
  • 5. Atanasiu A, Martín-Vide C, Mateescu A. Codifiable languages and the Parikh matrix mapping. J. UCS. 2001;7:783–793.
  • 6. Atanasiu A, Martín-Vide C, Mateescu A. On the injectivity of the Parikh matrix mapping. Fund. Inform. 2002;49(4):289–299.
  • 7. Atanasiu A, Teh WC. A new operator over Parikh languages. Int. J. Found. Comput. Sci. 2016;27(06):757–769. doi: 10.1142/S0129054116500271.
  • 8. Bera S, Mahalingam K. Some algebraic aspects of Parikh q-matrices. Int. J. Found. Comput. Sci. 2016;27(4):479–500. doi: 10.1142/S0129054116500118.
  • 9. Egecioglu, Ö.: A q-matrix encoding extending the Parikh matrix mapping. Technical report 14, Department of Computer Science at UC Santa Barbara (2004)
  • 10. Egecioglu O, Ibarra OH. A matrix q-analogue of the Parikh map. In: Levy J-J, Mayr EW, Mitchell JC, editors. Exploring New Frontiers of Theoretical Informatics; Boston: Springer; 2004. pp. 125–138.
  • 11. Egecioglu, Ö., Ibarra, O.H.: A q-analogue of the Parikh matrix mapping. In: Formal Models, Languages and Applications [this volume commemorates the 75th birthday of Prof. Rani Siromoney]. In: Series in Machine Perception and Artificial Intelligence, vol. 66, pp. 97–111 (2007)
  • 12. Lothaire M. Combinatorics on Words. Cambridge: Cambridge University Press; 1997.
  • 13. Mahalingam K, Subramanian KG. Product of Parikh matrices and commutativity. Int. J. Found. Comput. Sci. 2012;23(01):207–223. doi: 10.1142/S0129054112500049.
  • 14. Mateescu, A., Salomaa, A., Salomaa, K., Yu, S.: On an extension of the Parikh mapping. Turku Centre for Computer Science (2000)
  • 15. Parikh RJ. On context-free languages. J. ACM. 1966;13(4):570–581. doi: 10.1145/321356.321364.
  • 16. Poovanandran G, Teh WC. Strong (2·t) and strong (3·t) transformations for strong M-equivalence. Int. J. Found. Comput. Sci. 2019;30(05):719–733. doi: 10.1142/S0129054119500187.
  • 17. Salomaa A, Yu S. Subword occurrences, Parikh matrices and Lyndon images. Int. J. Found. Comput. Sci. 2010;21:91–111. doi: 10.1142/S0129054110007155.
  • 18. Şerbănuţă TF. Extending Parikh matrices. Theor. Comput. Sci. 2004;310(1–3):233–246. doi: 10.1016/S0304-3975(03)00396-7.
  • 19. Şerbănuţă VN. On Parikh matrices, ambiguity, and prints. Int. J. Found. Comput. Sci. 2009;20(01):151–165. doi: 10.1142/S0129054109006498.
  • 20. Širšov, A.I.: Subalgebras of free Lie algebras. Mat. Sbornik N.S. 33(75), 441–452 (1953)

Articles from Language and Automata Theory and Applications are provided here courtesy of Nature Publishing Group
