Reducing the Ambiguity of Parikh Matrices

Jeffery Dick; Laura K Hutchinson; Robert Mercaş; Daniel Reidenbach

doi:10.1007/978-3-030-40608-0_28

2020 Jan 7;12038:397–411.

Editors: Alberto Leporati⁸, Carlos Martín-Vide⁹, Dana Shapira¹⁰, Claudio Zandron¹¹

PMCID: PMC7206623

Abstract

The Parikh matrix mapping allows us to describe words using matrices. Although compact, this description comes with a level of ambiguity since a single matrix may describe multiple words. This work looks at how considering the Parikh matrices of various transformations of a given word can decrease that ambiguity. More specifically, for any word, we study the Parikh matrix of its Lyndon conjugate as well as that of its projection to a smaller alphabet. Our results demonstrate that ambiguity can often be reduced using these concepts, and we give conditions on when they succeed.

Keywords: Combinatorics, Parikh matrix, Ambiguity, Lyndon conjugate

Introduction

An approach for a more compact representation of data can be provided by histograms, which are also a well established statistical tool used in a wide range of applications. The concept of a Parikh vector [15] represents a type of such histograms that is specific to the analysis of sequences of symbols (or: words), considering the number of occurrences of each letter that exists in a word.

Parikh vectors can be easily computed and are guaranteed to be logarithmic in the size of the word they represent, but they are ambiguous; that is, multiple words typically share the same Parikh vector. Following this, in [14] the authors look at a refinement of the vector notion which is meant to reduce this ambiguity, and introduce an extension for it in the form of a Parikh matrix. A Parikh matrix not only contains the Parikh vector of the word, but also information regarding some of the word’s (scattered) subwords. Such a matrix has the same asymptotic compactness as a Parikh vector and is associated to a significantly smaller number of words. However, it does not normally remove ambiguity entirely.

The bulk of the work done on the Parikh matrix mapping concerns the ambiguity that Parikh matrices exhibit. A lot of effort is spent on identifying an alternative to the Parikh matrix concept that would make a mapping from a word injective, or less ambiguous in general [1, 2, 8–11, 18]. These include even more refined versions of the matrices by inclusion of polynomials, various extensions on the mappings, or both. For Parikh matrices explicitly, due to the difficulty arising from this ambiguity, the primary focus was on investigating this property on binary [4–7, 17] and ternary [3, 13, 16, 19] alphabets, leaving alphabets of size greater than three relatively unexplored.

In terms of reducing the ambiguity of a word, previous investigations either gathered more information about the specific word by altering the order of the alphabet, known as the dual order [6, 14], or considered the reverse image of the word [6]. Hence an under-studied aspect that may reduce the ambiguity of a matrix concerns the information acquired by altering the word itself, or considering other alterations of the alphabet. In this work we present and investigate two different methods that reduce the ambiguity of the original Parikh matrices, in the form of P-Parikh matrices and L-Parikh matrices.

The first of the two transformations, the P-Parikh matrix mapping, considers the Parikh matrices associated to a projection morphism of the initial word, where the considered alphabet is reduced to the subset of the alphabet used within the defined transformation. These represent a particular case of the extended mapping presented in [18], where we only consider a subset of the original alphabet. For example, consider the words abcaabaac and abacabcaa. Both contain the same number of occurrences of each letter and of each of the subwords ab, bc and abc, so their Parikh matrices are equal and therefore ambiguous. The P-Parikh matrices associated to them with respect to {a, c} consider the number of subwords ac, which is 6 in the former, but only 5 in the latter of the words. Hence, there exist P-Parikh matrices not shared by the words.
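The counts in this example can be checked mechanically. The following Python sketch counts occurrences of a scattered subword with a standard dynamic programme; the helper name `count_subword` is introduced here purely for illustration and is not notation from the paper.

```python
def count_subword(u: str, v: str) -> int:
    """Number of occurrences of u as a scattered subword of v."""
    # ways[j] = number of embeddings of u[:j] into the part of v read so far
    ways = [1] + [0] * len(u)
    for letter in v:
        # Go right to left so that one letter of v extends each embedding at most once.
        for j in range(len(u), 0, -1):
            if u[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(u)]


first, second = "abcaabaac", "abacabcaa"
for u in ("a", "b", "c", "ab", "bc", "abc", "ac"):
    print(u, count_subword(u, first), count_subword(u, second))
```

Only the last row, for ac, differs between the two words (6 versus 5).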

We show that, using P-Parikh matrices, we can reduce the ambiguity of the vast majority of words. We also explore when P-Parikh matrices do not reduce ambiguity, as well as provide some insight into the types of words that cannot be uniquely described by a P-Parikh matrix.

However, since P-Parikh matrices are defined for a subset of the initial alphabet, they prove useless when dealing with binary sequences. We therefore consider an alternative transformation of words: the Lyndon conjugate, first introduced in [20], which is defined as the lexicographically smallest circular rotation of a word. Lyndon conjugates were used previously as a tool for ambiguity reduction. In [17], the authors define the Lyndon image of a Parikh matrix as the lexicographically smallest word describing such a matrix. Hence every Parikh matrix has exactly one distinct Lyndon image, which therefore allows each Parikh matrix to be described uniquely. In the context of this paper, we use the Lyndon conjugate differently, i.e., we consider the Parikh matrix of the Lyndon conjugate of a word, and we call the resulting matrix the L-Parikh matrix of the original word.

Consider the Parikh matrices of the Lyndon conjugates of the two previously given words. Observe that aabaacabc has 7 occurrences of ab, whereas aaabacabc has 8, making their Parikh matrices different. Hence, the ambiguity of their Parikh matrix can be reduced using L-Parikh matrices.
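A minimal sketch of this computation, assuming the letters a < b < c are compared with Python's built-in string order. The function names are hypothetical, and the brute-force search over all rotations is only meant for short words.

```python
def count_subword(u: str, v: str) -> int:
    """Number of occurrences of u as a scattered subword of v."""
    ways = [1] + [0] * len(u)
    for letter in v:
        for j in range(len(u), 0, -1):
            if u[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def lyndon_conjugate(w: str) -> str:
    """Lexicographically smallest rotation of w, found by trying every rotation."""
    if not w:
        return w
    return min(w[i:] + w[:i] for i in range(len(w)))


for w in ("abcaabaac", "abacabcaa"):
    conjugate = lyndon_conjugate(w)
    print(w, "->", conjugate, "with", count_subword("ab", conjugate), "occurrences of ab")
```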

While L-Parikh matrices are a useful concept for any alphabet size, we focus on the cases where they reduce ambiguity in the binary alphabet and show that this happens in most cases. We give specific conditions for when L-Parikh matrices do not help reduce the ambiguity of the given word, and investigate the words for which these criteria apply. This leads us to the main result of the paper, a characterisation of words whose ambiguity can be reduced using L-Parikh matrices.

We end this section with a brief breakdown of our paper. In Sect. 2 we present some basic definitions and notions. Section 3 examines the first of the two notions we introduce, the P-Parikh matrix, establishing conditions for when it can or cannot achieve ambiguity reduction. In Sect. 4, we study equivalent questions for L-Parikh matrices, focusing largely on binary alphabets. We end our paper with conclusions as well as directions for future work.

Preliminaries

It is assumed the reader is familiar with the basics of combinatorics on words. If needed, [12] can be consulted. Throughout this paper, $\mathbb{N}$ refers to the set of natural numbers starting with 1.

We refer to a string of arbitrary letters as a word which is formed by concatenation of letters. The set of all letters used to create our words is called an alphabet. We represent an ordered alphabet as $\Sigma_k = \{a_1 < a_2 < \cdots < a_k\}$, where $k \in \mathbb{N}$ is the size of the alphabet, and by convention $a_i$ is the ith letter in the Latin alphabet. Whenever the alphabet size is irrelevant or understood, we omit this from notation using only $\Sigma$. All alphabets referred to in this paper have an order imposed on them.

We define the concatenation of two words u and v as uv. The length of a word is the total number of not necessarily distinct letters it contains and the empty word of length zero is denoted $\varepsilon$. The Kleene star, denoted $*$, is the operation that, once applied to a given alphabet, generates the set of all finite words that result from concatenating any letters of that alphabet. Further, we denote the ith letter in a word w as w[i].

The reversal of a word w with $|w| = n$ is the word $w[n]\,w[n-1] \cdots w[1]$. We say that a factor v is in w if and only if w can be written as $w = xvy$, where $x, y \in \Sigma^*$. We say that $u = u_1 u_2 \cdots u_n$ is a subword of v if we have a factorisation $v = v_0 u_1 v_1 \cdots u_n v_n$ where $u_i \in \Sigma$ and $v_i \in \Sigma^*$. We use $|v|_u$ to denote the number of distinct occurrences of u as a subword in v.

The Parikh vector [15] Inline graphic associated with a word w is obtained through a mapping , defined as . For a matrix M of size , the j-diagonal is defined as all elements of M that are in the position for . A word is associated with a matrix, called its Parikh matrix, if the matrix is obtained from that word following the process detailed in the following explanatory definition. For a technical version of the definition we refer to [14].

Definition 1 (Explanatory)

Let $w \in \Sigma_k^*$. The Parikh matrix, denoted $\Psi(w)$, that w is associated with has size $(k+1) \times (k+1)$. The diagonal of the matrix is populated with 1’s and all elements below it are 0. The counts of all subwords that consist of consecutive letters in $\Sigma_k$ and are of length n in the word are found on the n-diagonal, for $1 \le n \le k$.

One notion we introduce in this paper relies on a change in alphabet size. As such, to emphasise the size n of the alphabet used for a Parikh matrix, we write Inline graphic . We say that a Parikh matrix describes a word if the word is associated to the matrix. Notice that due to the associativity of matrix multiplication, the Parikh matrix of a word can be constructed from the Parikh matrices of its factors. For a word , we have .
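The construction from factors can be sketched in code. The example below assumes the usual definition from [14], in which each letter is mapped to an identity matrix with one additional 1 directly to the right of the diagonal, and the Parikh matrix of a word is the product of the matrices of its letters. All function names are illustrative.

```python
def identity(size):
    return [[int(r == c) for c in range(size)] for r in range(size)]


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[r][t] * y[t][c] for t in range(n)) for c in range(n)] for r in range(n)]


def letter_matrix(index, k):
    """Matrix of the letter with 0-based position `index` in an alphabet of size k."""
    m = identity(k + 1)
    m[index][index + 1] = 1
    return m


def parikh_matrix(w, alphabet):
    """Product of the letter matrices of w, read left to right."""
    k = len(alphabet)
    result = identity(k + 1)
    for letter in w:
        result = mat_mul(result, letter_matrix(alphabet.index(letter), k))
    return result


whole = parikh_matrix("abcaabaac", "abc")
left, right = parikh_matrix("abca", "abc"), parikh_matrix("abaac", "abc")
for row in whole:
    print(row)
print(mat_mul(left, right) == whole)  # the matrix of a word from the matrices of two factors
```

In the first row of the printed matrix, the entries to the right of the diagonal are the numbers of occurrences of a, ab and abc.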

Example 1

Consider the word Inline graphic defined over the alphabet . Then by definition our Parikh matrix is of size and we have

For the rest of this work we refine our notation for a Parikh matrix where we remove the elements not depending on the associated word. By definition a Parikh matrix is an upper triangular matrix with 1’s on the diagonal regardless of the word described. For aesthetics, removing the redundant part leaves us with a triangular structure that holds the same information as the original matrix,

Inline graphic

Two words w and Inline graphic are conjugates if we can write and . For a word w, we say that the conjugacy class of w, denoted C(w) is the class of all of its possible conjugates. A conjugacy class is associated to a Parikh matrix if at least one word belonging to that class is associated to the matrix.

Example 2

The matrix Inline graphic has only the words aabbaa, abaaba, baaaab associated to it. The words aabbaa and baaaab are members of the same conjugacy class, while abaaba belongs to a different conjugacy class. Hence this matrix has two conjugacy classes associated to it.
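For short words, the claim of this example can be verified by exhaustive search. The sketch below compares the entries above the diagonal of the Parikh matrix, i.e. the counts of all subwords made of consecutive alphabet letters, over every rearrangement of the letters of aabbaa. The helper names are hypothetical.

```python
from itertools import permutations


def count_subword(u, v):
    ways = [1] + [0] * len(u)
    for letter in v:
        for j in range(len(u), 0, -1):
            if u[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def parikh_signature(w, alphabet):
    """Counts of all subwords of w that consist of consecutive alphabet letters."""
    k = len(alphabet)
    return tuple(count_subword(alphabet[i:j], w) for i in range(k) for j in range(i + 1, k + 1))


def words_with_same_matrix(w, alphabet):
    # Words sharing a Parikh matrix share a Parikh vector, so rearrangements of w suffice.
    target = parikh_signature(w, alphabet)
    rearrangements = {"".join(p) for p in permutations(w)}
    return sorted(u for u in rearrangements if parikh_signature(u, alphabet) == target)


def conjugacy_class(w):
    return frozenset(w[i:] + w[:i] for i in range(len(w)))


words = words_with_same_matrix("aabbaa", "ab")
classes = {conjugacy_class(u) for u in words}
print(words, len(classes))
```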

A Parikh matrix can be associated to multiple words, as seen above, although cases exist where a matrix describes a single word, e. g., aabb is the unique word associated to Inline graphic . We say that two words are amiable if they are associated to the same Parikh matrix. If two or more words are associated to a single Parikh matrix, we say that the matrix is ambiguous. Later in this paper, we reduce the ambiguity of a word using both its Parikh matrix and the Parikh matrix of an altered form of that word to describe it. As such, we introduce a formal definition of the ambiguity that multiple functions may have based on the set of all words that satisfy all functions. We are then able to use this when considering the ambiguity of the notions we introduce later.

Definition 2

For a word w and functions Inline graphic we define for . If , then we call w unambiguous on , and say that uniquely define w. However, if for and functions , then we say that reduce the ambiguity of w on .

Observe that we always have Inline graphic . Furthermore, if , then is unambiguous and it is not possible to further reduce ambiguity.

First we introduce the P-Parikh matrix. This matrix is in essence the Parikh matrix of a projection of a word, and represents a particular case of the extension of the Parikh matrix mapping presented in [18]. The P-Parikh matrix of a word w with respect to a set S of letters is defined as follows.

Definition 3

For Inline graphic with , let such that , where . We define the -Parikh matrix of the word w with respect to S as , where the morphism is defined as

To gain some intuition about the above definition, consider an example.

Example 3

Let Inline graphic , , and . For the index sequence of S, since a is the lexicographically smallest letter in S, we obtain , and . Hence , and .

With the transformation defined, we apply it to the word and calculate the corresponding P-Parikh matrix as the Parikh matrix of the transformed word,

Inline graphic

The Lyndon conjugate of a word is the conjugate that is lexicographically smallest based on the order on the alphabet. The Lyndon conjugate of a word w is denoted L(w). In an attempt to reduce the ambiguity of Parikh matrices, we modify the original Parikh matrix mapping to gain more information about a given word. Next, we introduce the L-Parikh matrix associated to a word.

Definition 4

Given a word w, we define its L-Parikh matrix as the Parikh matrix associated with its Lyndon conjugate, L(w).

It was shown in [4] that there exist transformations that, when applied to a word, create a new word that is amiable with the original. For non-binary alphabets, a Type 1 transformation is given.

Lemma 1

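([4]). Let $w, w' \in \Sigma_k^*$ with $k \ge 3$. Then w transforms into $w'$ using a Type 1 transformation if $w = x a_i a_j y$ and $w' = x a_j a_i y$, where $x, y \in \Sigma_k^*$, $a_i, a_j \in \Sigma_k$, and $|i - j| > 1$.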

For binary alphabets, a second type of transformation is described, referred to as a Type 2 transformation, that allows us to check if certain words are amiable without constructing their matrices.

Lemma 2

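([4]). Let $w, w' \in \Sigma_2^*$. Then w transforms into $w'$ through a Type 2 transformation if $w = x\,ab\,y\,ba\,z$ and $w' = x\,ba\,y\,ab\,z$, or vice-versa, where $x, y, z \in \Sigma_2^*$ and $\Sigma_2 = \{a < b\}$.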
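Both transformations are easy to experiment with. The sketch below rests on two assumptions about how the transformations of [4] are commonly stated: the first one swaps two adjacent letters that are neither equal nor neighbours in the alphabet, and the second one, for binary words over a < b, rewrites x ab y ba z as x ba y ab z or the other way round. Function names are hypothetical.

```python
def count_subword(u, v):
    ways = [1] + [0] * len(u)
    for letter in v:
        for j in range(len(u), 0, -1):
            if u[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def parikh_signature(w, alphabet):
    """Entries above the diagonal of the Parikh matrix of w."""
    k = len(alphabet)
    return tuple(count_subword(alphabet[i:j], w) for i in range(k) for j in range(i + 1, k + 1))


def type1_results(w, alphabet):
    """All words obtained by one swap of adjacent letters that are far apart in the alphabet."""
    results = set()
    for i in range(len(w) - 1):
        if abs(alphabet.index(w[i]) - alphabet.index(w[i + 1])) > 1:
            results.add(w[:i] + w[i + 1] + w[i] + w[i + 2:])
    return results


def type2_results(w):
    """All binary words obtained by exchanging one factor ab with one disjoint factor ba."""
    results = set()
    for i in range(len(w) - 1):
        for j in range(i + 2, len(w) - 1):  # j >= i + 2 keeps the two factors disjoint
            first, second = w[i:i + 2], w[j:j + 2]
            if {first, second} == {"ab", "ba"}:
                results.add(w[:i] + second + w[i + 2:j] + first + w[j + 2:])
    return results


print(sorted(type1_results("abcaabaac", "abc")))
print(sorted(type2_results("abaaba")))
```

Every word produced this way has the same entries above the diagonal as the word it was derived from, which is what makes the two words amiable.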

P-Parikh Matrices

In this section, we examine when and how much P-Parikh matrices reduce the ambiguity of a given word. When we refer to a reduction in ambiguity using P-Parikh matrices, we mean that the number of words described by the original Parikh matrix and their respective P-Parikh matrices is strictly less than the total number of words described by the original Parikh matrix alone. First we present an example of P-Parikh matrices removing the ambiguity of a Parikh matrix entirely.

Example 4

Consider the word Inline graphic from Example 1, which is amiable with the word and no others. Then we choose our set , and get that: and Thus w and have different -Parikh matrices and we can uniquely describe them.

We first introduce some terms that are useful when describing how effective a given P-Parikh matrix is at reducing ambiguity.

Definition 5

Given a word Inline graphic , we call -distinguishable if either or there exists a word and a set such that and . In the latter case we say that w and u are -distinct. Furthermore, we call w -unique if there exist sets such that .

Now we use these terms to examine words whose ambiguity can be reduced using P-Parikh matrices, namely those that contain any length two factor where those two letters are not equal or consecutive in the alphabet.

Proposition 1

For any word $w \in \Sigma_k^*$ with a factor $a_i a_j$ where $|i - j| > 1$, the Parikh matrix of w is P-distinguishable.

Proof

Since Inline graphic , if where , then is also associated to w, following Lemma 1. Without loss of generality, take . Then , since and are elements in and , respectively, and .
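The argument can be illustrated on a concrete word. The sketch below assumes that the projection keeps only the letters of S and renames them, in alphabetical order, to the first |S| letters of the alphabet; the names `project` and `swap_first_far_pair` are hypothetical helpers, not notation from the paper.

```python
import string


def count_subword(u, v):
    ways = [1] + [0] * len(u)
    for letter in v:
        for j in range(len(u), 0, -1):
            if u[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def parikh_signature(w, alphabet):
    """Entries above the diagonal of the Parikh matrix of w."""
    k = len(alphabet)
    return tuple(count_subword(alphabet[i:j], w) for i in range(k) for j in range(i + 1, k + 1))


def project(w, alphabet, subset):
    """Erase letters outside `subset` and rename the rest to a, b, ... in alphabet order."""
    kept = [c for c in alphabet if c in subset]
    rename = {c: string.ascii_lowercase[i] for i, c in enumerate(kept)}
    return "".join(rename[c] for c in w if c in rename)


def swap_first_far_pair(w, alphabet):
    """Swap the first adjacent pair of letters that are not equal or neighbours in the alphabet."""
    for i in range(len(w) - 1):
        if abs(alphabet.index(w[i]) - alphabet.index(w[i + 1])) > 1:
            return w[:i] + w[i + 1] + w[i] + w[i + 2:], {w[i], w[i + 1]}
    return None, set()


w = "abcaabaac"
swapped, pair = swap_first_far_pair(w, "abc")
print(swapped, sorted(pair))
print(parikh_signature(w, "abc") == parikh_signature(swapped, "abc"))  # same Parikh matrix
print(parikh_signature(project(w, "abc", pair), "ab"),
      parikh_signature(project(swapped, "abc", pair), "ab"))          # projections differ
```

The two words share a Parikh matrix, but after projecting onto the two swapped letters the count of the length-two subword changes by one, so the projected matrices differ.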

It is simple to identify words that have such factors by comparing adjacent positions in the word. We can use this to find a lower bound for the proportion of words that are uniquely identified for a given alphabet and word length.

Proposition 2

The number of words of length m in Inline graphic that are reduced in ambiguity by -Parikh matrices is bounded below by .

Notice especially that as n and m get larger, the proportion of words which are reduced in ambiguity by P-Parikh matrices also gets larger. We therefore conclude that the use of P-Parikh matrices reduces ambiguity for a larger ratio of words for bigger alphabets rather than smaller.
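One way to get a feeling for this growth is to count, for small n and m, the words that contain a factor of two letters which are neither equal nor consecutive in the alphabet; by Proposition 1 these words have a distinguishable Parikh matrix. The sketch below is an illustrative count of exactly those words and is not claimed to coincide with the bound of Proposition 2. The function name is hypothetical.

```python
def words_with_far_factor(n: int, m: int) -> int:
    """Number of length-m words over n ordered letters with an adjacent pair at distance > 1."""
    if m == 0:
        return 0
    # close[i] = number of words of the current length ending in letter i
    # in which every adjacent pair of letters is equal or consecutive.
    close = [1] * n
    for _ in range(m - 1):
        close = [sum(close[j] for j in range(max(0, i - 1), min(n, i + 2))) for i in range(n)]
    return n ** m - sum(close)


for n in (2, 3, 4, 5):
    for m in (2, 4, 8):
        total = n ** m
        print(n, m, words_with_far_factor(n, m), "of", total)
```

For n = 2 the count is always zero, since the only two letters are consecutive.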

There also exist words for which P-Parikh matrices do not reduce ambiguity. The following remark says that if our choice of a subset consists of only consecutive letters of the initial alphabet, the corresponding P-Parikh matrices cannot distinguish amiable words.

Remark 6

If all elements of the set Inline graphic are consecutive in the alphabet , then .

The result of Remark 6 strengthens the one of Proposition 1 by telling us that the ambiguity of words defined over binary alphabets is not reducible by P-Parikh matrices.

Corollary 1

There does not exist a Parikh matrix that describes binary words whose ambiguity can be reduced by P-Parikh matrices.

Furthermore, there exist non-binary words for which P-Parikh matrices do not remove ambiguity, namely those that are not P-unique. Finally, we end this section by giving two classes of words which are not uniquely described by P-Parikh matrices, no matter how we choose the set S.

Proposition 3

Take two words Inline graphic with the form and , where and . If , then for all , we have .

Proof

Firstly, if Inline graphic , equivalence follows, as . Now, let .

In the case where S contains either Inline graphic or , then since and are the only letters that swap places in compared to w. Since , clearly follows.

If Inline graphic , then, is a binary word and can be transformed via a transformation, from Lemma 2, into , so .

Next consider that Inline graphic , , and S has no elements between and . Then and . Using an extension from [3] of the transformations we can transform into , and get that .

Finally, consider the case where S contains Inline graphic , and at least one letter that comes lexicographically between and . Then, can be transformed into via two transformations on and , since and are not lexicographically adjacent in S (see Lemma 1).

The ideas from Proposition 3 give rise to another class of words that are not P-unique, by loosening the condition on v and extending the length of the word.

Proposition 4

Take two words of the form Inline graphic , and , where and . Let and . Then, w and are not -distinct if and only if for all , and at least one of the following conditions is true:

, and for , if , then , and if , then ;
, and for , if , then , and if , then .

In other words, the above statement says that two words are not Inline graphic -distinct if both and are defined on the subset of the alphabet which is either lexicographically bigger than or smaller than , and they share the same Parikh vector for the subset of letters which are not in between and . Furthermore, if , then all the letters which are lexicographically greater than Inline graphic must occur in in decreasing lexicographical order and in in increasing order. On the other hand, if , then all the letters which are lexicographically smaller than must occur in in increasing lexicographically order and in in decreasing lexicographical order.

L-Parikh Matrices

Proposition 2 shows that in many cases, the set of words that share both a Parikh matrix and a P-Parikh matrix is smaller than the set of those that share only a Parikh matrix. However, following Corollary 1 we also know that this never happens for binary alphabets. Hence we now study L-Parikh matrices as an alternative method of ambiguity reduction. While they can be effective for any non-unary alphabet, we focus on binary alphabets specifically. We begin this section by explaining the motivation for choosing the Lyndon conjugate of a word and then build to our main result where we characterise words whose ambiguity is reduced by the use of L-Parikh matrices.

As indicated by Definition 4, the concept of L-Parikh matrices is based on a modification to a word that results in a change in the order of letters. The following theorem implies that the strategy of altering a word is not always a successful method of ambiguity reduction. The theorem concerns the Parikh matrix of the reversal of a word.

Theorem 1

([4]). For a word w, we have that Inline graphic .

Unlike Theorem 1, L-Parikh matrices use the conjugate of a word. The next proposition implies that such conjugates need to be chosen wisely.

Proposition 5.

Given words Inline graphic with , for any factorisations and such that , we have that implies . For , the reverse direction also stands, namely implies .

Proof Outline.

We can prove the statement that holds for every size alphabet by contradiction, by assuming that Inline graphic and . We examine the total number of ab subwords in v, w, and to obtain a set of equations. We then consider the total number of b’s in and to find a contradiction within these equations.

For the statement that holds just for the binary alphabet we examine the total number of ab subwords in Inline graphic and and get a contradiction in the equations we obtain by initially assuming that , and .

Below example shows that Inline graphic is necessary for Proposition 5.

Example 5.

Consider the words Inline graphic with and with . One can easily find that . Furthermore, we have that , and . However , since and , and therefore is a necessary condition in the context of Proposition 5.

An example for the ternary alphabet where Inline graphic even though we have that and is given below. Note that if , then we must also have . Since any alphabet of size greater than 3 would rely on the result of the ternary alphabet always being true, we can deduce that the backwards direction from Proposition 5 only holds for the binary alphabet.

Example 6.

Let Inline graphic and . We have that . Now let and . Then we have that and . Note that , since and . But this gives us .

Proposition 5 shows that when looking for a modification that we can apply to a word to find a new and different Parikh matrix, we need to consider conjugates of amiable words where it is less likely that the Parikh vectors of their right factors are the same, i.e., conjugates obtained by shifting the original words a different number of times, respectively.

Let us now consider how using Inline graphic -Parikh matrices reduces ambiguity. The rest of this section ignores any word w where , since there is no ambiguity to be reduced here. For a word w, we calculate and and use both of these matrices to describe the original word. The ambiguity of a word w, with respect to its Parikh and Inline graphic -Parikh matrices, according to Definition 2, is the total number of words that share a Parikh matrix and an -Parikh matrix with w, namely . We use the next definitions and propositions to build to our main result where we characterise binary words whose ambiguity is reduced using -Parikh matrices. In line with Definition 5 we introduce the following definitions.
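The ambiguity described in words above, namely the number of words that share both the Parikh matrix and the Parikh matrix of the Lyndon conjugate with w, can be computed by exhaustive search for short words. The following sketch does this for the three binary words of Example 2; the function names and the `use_lyndon` switch are hypothetical.

```python
from itertools import permutations


def count_subword(u, v):
    ways = [1] + [0] * len(u)
    for letter in v:
        for j in range(len(u), 0, -1):
            if u[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def parikh_signature(w, alphabet):
    """Entries above the diagonal of the Parikh matrix of w."""
    k = len(alphabet)
    return tuple(count_subword(alphabet[i:j], w) for i in range(k) for j in range(i + 1, k + 1))


def lyndon_conjugate(w):
    return min(w[i:] + w[:i] for i in range(len(w))) if w else w


def ambiguity(w, alphabet, use_lyndon):
    """Number of words agreeing with w on the Parikh matrix and, optionally,
    on the Parikh matrix of the Lyndon conjugate."""
    def key(u):
        extra = parikh_signature(lyndon_conjugate(u), alphabet) if use_lyndon else None
        return parikh_signature(u, alphabet), extra

    target = key(w)
    rearrangements = {"".join(p) for p in permutations(w)}
    return sum(1 for u in rearrangements if key(u) == target)


for w in ("aabbaa", "abaaba", "baaaab"):
    print(w, ambiguity(w, "ab", use_lyndon=False), ambiguity(w, "ab", use_lyndon=True))
```

The words aabbaa and baaaab are conjugates and therefore remain indistinguishable, while abaaba becomes uniquely determined once the second matrix is taken into account.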

Definition 7.

Given a word Inline graphic , we call -distinguishable if either or there exists a word with , such that . In the latter case we say that w and u are -distinct. A word w is -unique if .

Note that if w and v are Inline graphic -distinct, then and . The example below demonstrates the effectiveness of -Parikh matrices for ambiguity reduction.

Example 7.

Consider the words Inline graphic , and with . However, for the conjugates , and we have that , , and . Thus their -Parikh matrices are all different and we can uniquely describe each of the words by using -Parikh matrices.

L-distinguishability is necessary for ambiguity reduction in this case.

Corollary 2.

For Inline graphic , if and only if is -distinguishable.

The above characterisation of ambiguity reduction leads us to investigate sufficient conditions for a matrix to be ambiguous, and therefore for any pair of words it describes not to be L-distinct. Our next results consider the situations when the Parikh matrix of a word is not L-distinguishable. We show, either directly after the next proposition or later in the paper, that binary words meeting the criteria of each proposition are rare.

Proposition 6.

For a word Inline graphic , if all words in belong to the same conjugacy class, then is not -distinguishable.

Example 8.

Let Inline graphic and . These two words are amiable with each other and nothing else. Furthermore, , and since both words share a Lyndon conjugate, both words also share an -Parikh matrix. Therefore is not -distinguishable.

Now we move on to explore, for binary alphabets, the case where all words in Inline graphic belong to the same conjugacy class in more detail. Recall that C(w) refers to the conjugacy class of w.

Proposition 7.

Let Inline graphic . Then , for all , if and only if .

Proof Outline.

The forwards direction is proven by examining every element of the conjugacy class of w. We can first prove that if Inline graphic , for all , then words in the conjugacy class of w are only amiable with other conjugates of w. We then show that this is only true when L(w) is in the set . For this we define a block of a letter to be a unary factor of a word which is not extendable to the right or left and argue that applying a Inline graphic transformation to any Lyndon conjugate that is not in the above set either alters the size of the block of a’s at the start of the word, or changes the total number of blocks of a’s in the word altogether. This therefore gives us a word that is amiable to, but not a conjugate of, the original.

The backwards direction is proven by finding the Parikh matrices of all conjugates of words in the set Inline graphic . We then find that the only words described by these matrices are these conjugates.

We now look at the case where all words associated to a Parikh matrix are the Lyndon representatives of their respective conjugacy classes, which again makes this matrix not L-distinguishable.

Proposition 8.

For a word Inline graphic , if and , then is not -distinguishable.

Example 9.

The words Inline graphic and are only amiable with each other, , and both are the Lyndon representatives of their respective conjugacy classes. Therefore, and is not -distinguishable.

For binary alphabets, we examine in greater detail when all words in Inline graphic are the Lyndon representatives of their conjugacy classes. The next result provides a necessary and sufficient condition, and therefore the complete characterisation, for this case to occur for the binary alphabet.

Proposition 9.

Let Inline graphic . Then the following statements are equivalent.

For all , we have that .
and for we have that and .

Proof Outline.

To show that these two statements are equivalent, we begin by showing that the second statement implies the former. We do this by first showing that if a word is of the form Inline graphic and, for , we have that and , then , and next move on to prove that only words of this form are described by . We prove that by observing that . Adding more a’s to the start of v and more b’s to the end means that the Lyndon conjugate is still the word itself, and hence obtain . We prove that words of the form described in the second point are only amiable with each other by calculating the total number of ab subwords in v and extrapolating this to w.

To prove that the first statement implies the second, we use the fact that our words share a Parikh matrix and that they must begin with the largest number of consecutive a’s in the word and end with at least one b. We also rewrite Inline graphic where begins with the first occurrence of a b and ends with the last occurrence of an a in w, and examine the form that this must take given the fixed number of ab subwords we must have in w. This gives us the total number of a’s and b’s in a word relative to the total number of ba subwords. Inline graphic

The next example shows how the above result can be used to identify the form of the words that always share a Parikh matrix with other Lyndon conjugates.

Example 10.

Following Proposition 9, Lyndon representatives of different conjugacy classes share a Parikh matrix only if they are of the form Inline graphic , where for we have that and . Let us find all words of this form where . We begin by finding all binary words that contain 3 subwords ba. These are baaa, baba and bbba. Next add a’s to the front and b’s to the end of each word, respectively, so that we have a total of 6 a’s and 4 b’s per word: aaabaaabbb, aaaabababb, aaaaabbbab. Finally, any number of a’s and b’s can be added to the front and end of each word, respectively: Inline graphic . Hence we know that any word of this form is the Lyndon representative of its conjugacy class and shares a Parikh matrix with the two other words stated above. For example, .
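The three words of length ten found in this example can be double-checked by brute force. The sketch relies on the fact that, over a binary alphabet, the Parikh matrix of a word is determined by its numbers of a's, b's and ab subwords; the helper names are hypothetical.

```python
from itertools import combinations


def binary_words(num_a, num_b):
    """All words with the given numbers of a's and b's."""
    length = num_a + num_b
    for b_positions in combinations(range(length), num_b):
        letters = ["a"] * length
        for p in b_positions:
            letters[p] = "b"
        yield "".join(letters)


def count_ab(w):
    """Number of occurrences of ab as a scattered subword of a binary word."""
    seen_a = total = 0
    for letter in w:
        if letter == "a":
            seen_a += 1
        else:
            total += seen_a
    return total


def is_own_lyndon_conjugate(w):
    return w == min(w[i:] + w[:i] for i in range(len(w)))


target = "aaabaaabbb"
same_matrix = sorted(
    u for u in binary_words(target.count("a"), target.count("b"))
    if count_ab(u) == count_ab(target)
)
print(same_matrix)
print(all(is_own_lyndon_conjugate(u) for u in same_matrix))
```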

Thus far, we presented sufficient conditions for two amiable words not to be L-distinct. Our main result shows that these conditions are in fact also the necessary ones. The following lemmas are used in the proof of the final result, but are included here as they are also interesting results on their own. The first lemma tells us that if the Parikh vectors of the proper right factors of two amiable words are different, then the size of these factors must also be unequal.

Lemma 3.

Consider the words Inline graphic and with , such that and . If , then .

Furthermore, if two amiable binary words are not the Lyndon representatives of their conjugacy classes, then to either of them we can apply a Type 2 transformation to obtain an amiable word whose Lyndon conjugate begins in a different position from the original one.

Lemma 4.

Let Inline graphic with . If , then there exists , where , such that .

Proof Outline.

The statement can be proven by contradiction, by first assuming that the Lyndon conjugate of every word associated to Inline graphic begins in the same position within those words. We then show that for the Lyndon conjugate to begin at any position within a given binary word, it is possible to apply a transformation to obtain a new word whose Lyndon conjugate begins in a different position.

Next we show that all words that are conjugates of any word w such that Inline graphic are also amiable with a word that is not a conjugate of any of the words in .

Lemma 5.

Let Inline graphic , where . For any there exists such that .

Proof Outline.

This statement can be proven by considering every form that a word w can take, such that Inline graphic , from Proposition 9 and then examining all conjugates of these words. We show that a transformation can be applied to every conjugate to obtain a word that is not a conjugate of any word in our original set .

We end this section by giving our main result that characterises all binary words whose Parikh matrix is not L-distinguishable.

Theorem 2.

For Inline graphic , a Parikh matrix is not -distinguishable if and only if any of the words it describes meet at least one of the following criteria:

and for we have that and

Proof Outline.

For the set of words Inline graphic , the forward direction is easily proven by finding these words’ Parikh and -Parikh matrices, respectively. The backward direction is proved using the fact that for words such that w is the reverse of and , then if and only if .

For the rest of the words, the ‘if’ direction was mostly proven earlier when Propositions 6, 7, 8 and 9, describing these situations, were introduced.

The ‘only if’ direction is proven by first examining the consequences of Proposition 5, which tells us that two words are Inline graphic -distinct if their Lyndon conjugates begin in different positions, respectively. We use Lemmas 3 and 4 to conclude that no set of amiable binary words exists where the Lyndon conjugates of all words in the set begin in the same position of each word, respectively. Hence all Parikh matrices would be Inline graphic -distinguishable if it were not for some cases that arise as a result of us using the Lyndon conjugate. These cases are namely the ones where the set of amiable words are all Lyndon conjugates, are all members of the same conjugacy class, or are all conjugates of words whose Lyndon conjugates share a Parikh matrix. We showed in Propositions 7 and 9 that the first two cases are characterised by words of the form Inline graphic where for we have that and , and by words where their Lyndon conjugate is in the set , respectively. We use Lemma 5 to conclude that no words exist such that the third case is true.

Conclusion and Future Work

In this paper, we have shown that using P-Parikh matrices and L-Parikh matrices reduces the ambiguity of a word in most cases. From Corollary 1, we learn that P-Parikh matrices cannot reduce the ambiguity of a Parikh matrix that describes words in a binary alphabet, but are very powerful when it comes to reducing the ambiguity of words in larger alphabets (Proposition 2). On the other hand, we find that L-Parikh matrices reduce the ambiguity of most binary words, with the few exceptions from Theorem 2, which have all been shown to be rare occurrences within the binary alphabet. Thus, using both tools together leads to a reduction in ambiguity in most cases.

Going forward, we wish to characterise the words that are described uniquely by each of the two types of matrices, and to quantify the ambiguity reduction permitted by both notions. Theorem 2 tells us that there are very few binary words whose Parikh matrix ambiguity cannot be reduced by L-Parikh matrices. Future research on L-Parikh matrices could also include an analysis similar to the one done in Proposition 2.
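
One way to explore the quantification mentioned above is a brute-force count over short binary words. The following Python sketch is not taken from the paper; it assumes that the Lyndon conjugate of a word is its lexicographically least rotation (with a < b), that the L-Parikh matrix is the Parikh matrix of that conjugate, and that a Parikh matrix counts as L-distinguishable when at least two of the words it describes have different L-Parikh matrices. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def parikh_entries(word):
    """Return (|w|_a, |w|_b, |w|_ab), the above-diagonal entries of the
    3x3 Parikh matrix of a word over the ordered alphabet a < b."""
    a = b = ab = 0
    for letter in word:
        if letter == "a":
            a += 1
        elif letter == "b":
            b += 1
            ab += a  # each earlier 'a' pairs with this 'b' as a scattered "ab"
        else:
            raise ValueError(f"not a binary letter: {letter!r}")
    return (a, b, ab)


def lyndon_conjugate(word):
    """Lexicographically least rotation of the word (assumed definition)."""
    if not word:
        return word
    return min(word[i:] + word[:i] for i in range(len(word)))


def amiable_classes(length):
    """Group all binary words of the given length by their Parikh matrix."""
    classes = defaultdict(list)
    for letters in product("ab", repeat=length):
        word = "".join(letters)
        classes[parikh_entries(word)].append(word)
    return classes


def is_l_distinguishable(words):
    """Assumed reading: some two of the words differ in their L-Parikh matrix."""
    return len({parikh_entries(lyndon_conjugate(w)) for w in words}) > 1


def ambiguity_summary(length):
    """Return (ambiguous, undistinguished): the number of Parikh matrices
    describing more than one word of this length, and how many of those
    are not split by the L-Parikh matrix."""
    ambiguous = [ws for ws in amiable_classes(length).values() if len(ws) > 1]
    undistinguished = [ws for ws in ambiguous if not is_l_distinguishable(ws)]
    return len(ambiguous), len(undistinguished)


if __name__ == "__main__":
    # Exhaustive enumeration is exponential in the length, so keep it small.
    for n in range(1, 11):
        print(n, ambiguity_summary(n))
```

For length 4, for instance, the only ambiguous Parikh matrix is the one shared by abba and baab, and these two words have the same Lyndon conjugate aabb. The enumeration is only practical for short words; it is meant as an exploratory aid rather than a replacement for the characterisation in Theorem 2.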

Finally, we present a conjecture on the types of words that might be described by a Parikh matrix that is P-distinguishable. We know that the presence of a certain type of factor, described in Proposition 1, in a word means that its Parikh matrix is P-distinguishable. This conjecture implies that the presence of this factor is the only way that the ambiguity of a word could be reduced by P-Parikh matrices.

Conjecture 8.

For any word w, if the Parikh matrix of w is P-distinguishable, then there exists a word amiable with w which contains a factor of the type described in Proposition 1.

Contributor Information

Alberto Leporati, Email: alberto.leporati@unimib.it.

Carlos Martín-Vide, Email: carlos.martin@urv.cat.

Dana Shapira, Email: shapird@g.ariel.ac.il.

Claudio Zandron, Email: zandron@disco.unimib.it.

Jeffery Dick, Email: J.Dick@lboro.ac.uk.

Laura K. Hutchinson, Email: L.Hutchinson@lboro.ac.uk

Robert Mercaş, Email: R.G.Mercas@lboro.ac.uk.

Daniel Reidenbach, Email: D.Reidenbach@lboro.ac.uk.

References

1. Alazemi HMK, Černý A. Counting subwords using a trie automaton. Int. J. Found. Comput. Sci. 2011;22(6):1457–1469. doi: 10.1142/S0129054111008817.
2. Alazemi HMK, Černý A. Several extensions of the Parikh matrix L-morphism. J. Comput. Syst. Sci. 2013;79(5):658–668. doi: 10.1016/j.jcss.2013.01.018.
3. Atanasiu, A.: Parikh matrix mapping and amiability over a ternary alphabet. In: Discrete Mathematics and Computer Science, pp. 1–12 (2014)
4. Atanasiu A, Atanasiu R, Petre I. Parikh matrices and amiable words. Theoret. Comput. Sci. 2008;390(1):102–109. doi: 10.1016/j.tcs.2007.10.022.
5. Atanasiu A, Martín-Vide C, Mateescu A. Codifiable languages and the Parikh matrix mapping. J. UCS. 2001;7:783–793.
6. Atanasiu A, Martín-Vide C, Mateescu A. On the injectivity of the Parikh matrix mapping. Fund. Inform. 2002;49(4):289–299.
7. Atanasiu A, Teh WC. A new operator over Parikh languages. Int. J. Found. Comput. Sci. 2016;27(06):757–769. doi: 10.1142/S0129054116500271.
8. Bera S, Mahalingam K. Some algebraic aspects of Parikh q-matrices. Int. J. Found. Comput. Sci. 2016;27(4):479–500. doi: 10.1142/S0129054116500118.
9. Egecioglu, Ö.: A q-matrix encoding extending the Parikh matrix mapping. Technical report 14, Department of Computer Science at UC Santa Barbara (2004)
10. Egecioglu O, Ibarra OH. A matrix q-analogue of the Parikh map. In: Levy J-J, Mayr EW, Mitchell JC, editors. Exploring New Frontiers of Theoretical Informatics; Boston: Springer; 2004. pp. 125–138.
11. Egecioglu, Ö., Ibarra, O.H.: A q-analogue of the Parikh matrix mapping. In: Formal Models, Languages and Applications [this volume commemorates the 75th birthday of Prof. Rani Siromoney]. In: Series in Machine Perception and Artificial Intelligence, vol. 66, pp. 97–111 (2007)
12. Lothaire M. Combinatorics on Words. Cambridge: Cambridge University Press; 1997.
13. Mahalingam K, Subramanian KG. Product of Parikh matrices and commutativity. Int. J. Found. Comput. Sci. 2012;23(01):207–223. doi: 10.1142/S0129054112500049.
14. Mateescu, A., Salomaa, A., Salomaa, K., Yu, S.: On an extension of the Parikh mapping. Turku Centre for Computer Science (2000)
15. Parikh RJ. On context-free languages. J. ACM. 1966;13(4):570–581. doi: 10.1145/321356.321364.
16. Poovanandran G, Teh WC. Strong (2t) and strong (3t) transformations for strong M-equivalence. Int. J. Found. Comput. Sci. 2019;30(05):719–733. doi: 10.1142/S0129054119500187.
17. Salomaa A, Yu S. Subword occurrences, Parikh matrices and Lyndon images. Int. J. Found. Comput. Sci. 2010;21:91–111. doi: 10.1142/S0129054110007155.
18. Şerbănuţă TF. Extending Parikh matrices. Theor. Comput. Sci. 2004;310(1–3):233–246. doi: 10.1016/S0304-3975(03)00396-7.
19. Şerbănuţă VN. On Parikh matrices, ambiguity, and prints. Int. J. Found. Comput. Sci. 2009;20(01):151–165. doi: 10.1142/S0129054109006498.
20. Širšov, A.I.: Subalgebras of free Lie algebras. Mat. Sbornik N.S. 33(75), 441–452 (1953)

