Abstract
The Parikh matrix mapping allows us to describe words using matrices. Although compact, this description comes with a level of ambiguity since a single matrix may describe multiple words. This work looks at how considering the Parikh matrices of various transformations of a given word can decrease that ambiguity. More specifically, for any word, we study the Parikh matrix of its Lyndon conjugate as well as that of its projection to a smaller alphabet. Our results demonstrate that ambiguity can often be reduced using these concepts, and we give conditions on when they succeed.
Keywords: Combinatorics, Parikh matrix, Ambiguity, Lyndon conjugate
Introduction
Histograms, a well-established statistical tool used in a wide range of applications, offer a compact representation of data. A Parikh vector [15] is a histogram of this kind that is specific to the analysis of sequences of symbols (words): it records the number of occurrences of each letter in a word.
Parikh vectors can be easily computed and are guaranteed to be logarithmic in the size of the word they represent, but they are ambiguous; that is, multiple words typically share the same Parikh vector. Following this, in [14] the authors look at a refinement of the vector notion which is meant to reduce this ambiguity, and introduce an extension for it in the form of a Parikh matrix. A Parikh matrix not only contains the Parikh vector of the word, but also information regarding some of the word’s (scattered) subwords. Such a matrix has the same asymptotic compactness as a Parikh vector and is associated to a significantly smaller number of words. However, it does not normally remove ambiguity entirely.
The bulk of the work done on the Parikh matrix mapping concerns the ambiguity that Parikh matrices exhibit. A lot of effort is spent on identifying an alternative to the Parikh matrix concept that would make a mapping from a word injective, or less ambiguous in general [1, 2, 8–11, 18]. These include even more refined versions of the matrices by inclusion of polynomials, various extensions on the mappings, or both. For Parikh matrices explicitly, due to the difficulty arising from this ambiguity, the primary focus was on investigating this property on binary [4–7, 17] and ternary [3, 13, 16, 19] alphabets, leaving alphabets of size greater than three relatively unexplored.
In terms of reducing the ambiguity of a word, previous investigations gathered more information about the specific word either by altering the order of the alphabet, known as the dual order [6, 14], or by considering the reverse image of the word [6]. Hence an under-studied aspect that may reduce the ambiguity of a matrix concerns the information acquired by altering the word itself, or by considering other alterations of the alphabet. In this work we present and investigate two different methods that reduce the ambiguity of the original Parikh matrices, in the form of $\mathbb{P}$-Parikh matrices and $\mathbb{L}$-Parikh matrices.
The first of the two transformations, the $\mathbb{P}$-Parikh matrix mapping, considers the Parikh matrices associated to a projection morphism of the initial word, where the considered alphabet is reduced to the subset of the alphabet used within the defined transformation. These represent a particular case of the extended mapping presented in [18], where we only consider a subset of the original alphabet. For example, consider the words abcaabaac and abacabcaa. It is easy to see that both contain the same number of occurrences of each letter and of each of the subwords ab, bc and abc, making their Parikh matrices equal and therefore ambiguous. The $\mathbb{P}$-Parikh matrices associated to them with respect to $\{a, c\}$ consider the number of subwords ac, which is 6 in the former, but only 5 in the latter of the words. Hence, there exist $\mathbb{P}$-Parikh matrices not shared by the words.
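The comparison above can be checked mechanically. The following is a minimal Python sketch, not taken from the paper: `parikh_matrix` and `project` are hypothetical helper names, the ordered alphabet is passed as a string, and the matrix is built with the usual column-update rule, in which reading the letter of index q adds column q to column q + 1.

```python
def parikh_matrix(word, alphabet):
    """Parikh matrix of `word` over the ordered alphabet (hypothetical helper)."""
    n = len(alphabet)
    matrix = [[int(i == j) for j in range(n + 1)] for i in range(n + 1)]
    for letter in word:
        q = alphabet.index(letter)
        # Right-multiplying by the matrix of this letter adds column q to column q + 1.
        for row in matrix:
            row[q + 1] += row[q]
    return matrix


def project(word, subset):
    """Erase every letter of `word` that does not belong to `subset`."""
    return "".join(letter for letter in word if letter in subset)


w1, w2 = "abcaabaac", "abacabcaa"
same_parikh = parikh_matrix(w1, "abc") == parikh_matrix(w2, "abc")

# Matrices of the projections to {a, c}; entry [0][2] counts the subword ac.
p1 = parikh_matrix(project(w1, "ac"), "ac")
p2 = parikh_matrix(project(w2, "ac"), "ac")
print(same_parikh, p1[0][2], p2[0][2])  # True 6 5
```

The two words agree on the full matrix over {a, b, c} but disagree on the projected one, which is exactly the kind of extra information the projection is meant to expose.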
We show that, using $\mathbb{P}$-Parikh matrices, we can reduce the ambiguity of the vast majority of words. We also explore when $\mathbb{P}$-Parikh matrices do not reduce ambiguity, and provide some insight into the types of words that cannot be uniquely described by a $\mathbb{P}$-Parikh matrix.
However, since $\mathbb{P}$-Parikh matrices are defined for a subset of the initial alphabet, they prove useless when dealing with binary sequences. We therefore consider an alternative transformation of words: the Lyndon conjugate, first introduced in [20], which is defined as the lexicographically smallest circular rotation of a word. Lyndon conjugates were used previously as a tool for ambiguity reduction. In [17], the authors define the Lyndon image of a Parikh matrix as the lexicographically smallest word describing such a matrix. Hence every Parikh matrix has exactly one distinct Lyndon image, which therefore allows each Parikh matrix to be described uniquely. In the context of this paper, we use the Lyndon conjugate differently, i.e., we consider the Parikh matrix of the Lyndon conjugate of a word, and we call the resulting matrix the $\mathbb{L}$-Parikh matrix of the original word.
Consider the Parikh matrices of the Lyndon conjugates of the two previously given words. Observe that aabaacabc has 7 occurrences of ab, whereas aaabacabc has 8, making their Parikh matrices different. Hence, the ambiguity of their Parikh matrix can be reduced using $\mathbb{L}$-Parikh matrices.
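The Lyndon conjugates and the two counts quoted above can be reproduced with a few lines of Python. This is a minimal sketch under simple assumptions: `lyndon_conjugate` and `count_ab` are hypothetical names, the rotation is found by brute force over all conjugates, and Python's string order on lowercase letters is taken to coincide with the order a < b < c on the alphabet.

```python
def lyndon_conjugate(word):
    """Lexicographically smallest rotation of `word` (brute force over all rotations)."""
    if not word:
        return word
    return min(word[i:] + word[:i] for i in range(len(word)))


def count_ab(word):
    """Number of occurrences of ab as a scattered subword."""
    seen_a = total = 0
    for letter in word:
        if letter == "a":
            seen_a += 1
        elif letter == "b":
            total += seen_a  # every earlier a pairs with this b
    return total


for w in ("abcaabaac", "abacabcaa"):
    conjugate = lyndon_conjugate(w)
    print(w, conjugate, count_ab(conjugate))
# abcaabaac aabaacabc 7
# abacabcaa aaabacabc 8
```

A linear-time algorithm for the smallest rotation exists, but the quadratic version keeps the sketch short and is sufficient for words of this size.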
While $\mathbb{L}$-Parikh matrices are a useful concept for any alphabet size, we focus on the cases where they reduce ambiguity in the binary alphabet and show that this happens in most cases. We give specific conditions of when $\mathbb{L}$-Parikh matrices do not help reduce the ambiguity of the given word, and investigate the words for which these criteria apply. This leads us to our main result of the paper, a characterisation of words whose ambiguity can be reduced using $\mathbb{L}$-Parikh matrices.
We end this section with a brief outline of the paper. In Sect. 2 we present some basic definitions and notions. Section 3 examines the first of the two notions we introduce, the $\mathbb{P}$-Parikh matrix, establishing conditions for when it can or cannot achieve ambiguity reduction. In Sect. 4, we study equivalent questions for $\mathbb{L}$-Parikh matrices, focusing largely on binary alphabets. We end our paper with conclusions as well as directions for future work.
Preliminaries
It is assumed the reader is familiar with the basics of combinatorics on words. If needed, [12] can be consulted. Throughout this paper, $\mathbb{N}$ refers to the set of natural numbers starting with 1.
We refer to a string of arbitrary letters as a word which is formed by concatenation of letters. The set of all letters used to create our words is called an alphabet. We represent an ordered alphabet as $\Sigma_n = \{a_1 < a_2 < \dots < a_n\}$, where $n \in \mathbb{N}$ is the size of the alphabet, and by convention $a_i$ is the $i$th letter in the Latin alphabet. Whenever the alphabet size is irrelevant or understood, we omit this from notation using only $\Sigma$. All alphabets referred to in this paper have an order imposed on them.
We define the concatenation of two words u and v as uv. The length of a word is the total number of not necessarily distinct letters it contains, and the empty word of length zero is denoted $\varepsilon$. The Kleene star, denoted $^*$, is the operation that, once applied to a given alphabet, generates the set of all finite words that result from concatenating letters of that alphabet. Further, we denote the $i$th letter in a word w as w[i].
The reversal of a word, denoted
, is defined as
, where
is a word with
. We say that a factor
v is in w if and only if w can be written as
, where
. We say that
is a subword of v if we have a factorisation
where
. We use
to denote the number of distinct occurrences of u as a subword in v.
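Counting occurrences of a scattered subword is a standard dynamic-programming exercise. The sketch below is illustrative only; `count_subword` is a hypothetical name, and an occurrence is identified with the set of positions of v that spell out u.

```python
def count_subword(v, u):
    """Number of occurrences of u as a (scattered) subword of v."""
    # ways[k] = number of ways to embed the prefix u[:k] into the part of v read so far.
    ways = [1] + [0] * len(u)
    for letter in v:
        # Go right to left so that one letter of v extends each embedding at most once.
        for k in range(len(u), 0, -1):
            if u[k - 1] == letter:
                ways[k] += ways[k - 1]
    return ways[len(u)]


print(count_subword("abcaabaac", "ac"))  # 6
print(count_subword("abacabcaa", "ac"))  # 5
```

The running time is proportional to the product of the two word lengths, which is why Parikh-style subword counts are cheap to compute even for long words.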
The Parikh vector [15]
associated with a word w is obtained through a mapping
, defined as
. For a matrix M of size
, the j-diagonal is defined as all elements of M that are in the position
for
. A word is associated with a matrix, called its Parikh matrix, if the matrix is obtained from that word following the process detailed in the following explanatory definition. For a technical version of the definition we refer to [14].
Definition 1 (Explanatory)
Let
. The Parikh matrix, denoted
, that w is associated with has size
. The diagonal of the matrix is populated with 1’s and all elements below it are 0. The count of all subwords that consist of consecutive letters in
and are of length n in the word are found on the n-diagonal, for
.
One notion we introduce in this paper relies on a change in alphabet size. As such, to emphasise the size n of the alphabet used for a Parikh matrix, we write
. We say that a Parikh matrix describes a word if the word is associated to the matrix. Notice that due to the associativity of matrix multiplication, the Parikh matrix of a word can be constructed from the Parikh matrices of its factors. For a word
, we have
.
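The product property can be illustrated with a small Python sketch. It assumes the standard construction of the Parikh matrix mapping from [14], in which the matrix of a single letter is the identity with one extra 1 directly above the diagonal, and the matrix of a word is the product of the matrices of its letters. The names `letter_matrix`, `multiply` and `parikh_matrix` are hypothetical.

```python
def letter_matrix(index, n):
    """Matrix of the letter with 0-based `index` over an alphabet of size n."""
    m = [[int(i == j) for j in range(n + 1)] for i in range(n + 1)]
    m[index][index + 1] = 1
    return m


def multiply(x, y):
    """Product of two square integer matrices of the same size."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def parikh_matrix(word, alphabet):
    """Product of the letter matrices of `word`, read left to right."""
    n = len(alphabet)
    result = [[int(i == j) for j in range(n + 1)] for i in range(n + 1)]
    for letter in word:
        result = multiply(result, letter_matrix(alphabet.index(letter), n))
    return result


u, v = "abca", "abaac"
print(parikh_matrix("aabb", "ab"))  # [[1, 2, 4], [0, 1, 2], [0, 0, 1]]
print(multiply(parikh_matrix(u, "abc"), parikh_matrix(v, "abc"))
      == parikh_matrix(u + v, "abc"))  # True
```

Because matrix multiplication is associative, any factorisation of a word gives the same result, so the matrix of a long word can be assembled from the matrices of its factors.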
Example 1
Consider the word
defined over the alphabet
. Then by definition our Parikh matrix is of size
and we have
![]() |
For the rest of this work we refine our notation for a Parikh matrix where we remove the elements not depending on the associated word. By definition a Parikh matrix is an upper triangular matrix with 1’s on the diagonal regardless of the word described. For aesthetics, removing the redundant part leaves us with a triangular structure that holds the same information as the original matrix,
![]() |

Two words $w$ and $w'$ are conjugates if we can write $w = uv$ and $w' = vu$ for some words $u$ and $v$. For a word w, we say that the conjugacy class of w, denoted C(w), is the class of all of its possible conjugates. A conjugacy class is associated to a Parikh matrix if at least one word belonging to that class is associated to the matrix.
Example 2
The matrix
has only the words aabbaa, abaaba, baaaab associated to it. The words aabbaa and baaaab are members of the same conjugacy class, while abaaba belongs to a different conjugacy class. Hence this matrix has two conjugacy classes associated to it. 
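For short words, the set of words associated to a Parikh matrix can be found by exhaustive search. The following Python sketch is illustrative only: `parikh_matrix`, `amiable_words` and `conjugacy_classes` are hypothetical names, and each conjugacy class is represented by its lexicographically smallest rotation.

```python
from itertools import product


def parikh_matrix(word, alphabet):
    """Parikh matrix via the column-update rule (hypothetical helper)."""
    n = len(alphabet)
    matrix = [[int(i == j) for j in range(n + 1)] for i in range(n + 1)]
    for letter in word:
        q = alphabet.index(letter)
        for row in matrix:
            row[q + 1] += row[q]
    return matrix


def amiable_words(word, alphabet):
    """All words of the same length that share the Parikh matrix of `word`."""
    target = parikh_matrix(word, alphabet)
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return [c for c in candidates if parikh_matrix(c, alphabet) == target]


def conjugacy_classes(words):
    """One representative (the smallest rotation) per conjugacy class."""
    return {min(w[i:] + w[:i] for i in range(len(w))) for w in words}


words = amiable_words("aabbaa", "ab")
print(words)                           # ['aabbaa', 'abaaba', 'baaaab']
print(len(conjugacy_classes(words)))   # 2
print(amiable_words("aabb", "ab"))     # ['aabb']
```

The search is exponential in the word length, so it is only meant for checking small examples by hand.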
A Parikh matrix can be associated to multiple words, as seen above, although cases exist where a matrix describes a single word, e.g., aabb is the unique word associated to
. We say that two words are amiable if they are associated to the same Parikh matrix. If two or more words are associated to a single Parikh matrix, we say that the matrix is ambiguous. Later in this paper, we reduce the ambiguity of a word using both its Parikh matrix and the Parikh matrix of an altered form of that word to describe it. As such, we introduce a formal definition of the ambiguity that multiple functions may have based on the set of all words that satisfy all functions. We are then able to use this when considering the ambiguity of the notions we introduce later.
Definition 2
For a word w and functions
we define
for
. If
, then we call w
unambiguous on
, and say that
uniquely define
w. However, if
for
and functions
, then we say that
reduce the ambiguity of w on
.
Observe that we always have
. Furthermore, if
, then
is unambiguous and it is not possible to further reduce ambiguity.
First we introduce the $\mathbb{P}$-Parikh matrix. This matrix is in essence the Parikh matrix of a projection of a word, and represents a particular case of the extension of the Parikh matrix mapping presented in [18]. For
,
and
, the
$\mathbb{P}$-Parikh matrix of w with respect to S is defined as follows.
Definition 3
For
with
, let
such that
, where
. We define the
$\mathbb{P}$-Parikh matrix of the word w with respect to S as
, where the morphism
is defined as
![]() |
To gain some intuition about the above definition, consider an example.
Example 3
Let
,
, and
. For the index sequence of S, since a is the lexicographically smallest letter in S, we obtain
,
and
. Hence
,
and
.
With the transformation defined, we apply it to the word and calculate the corresponding $\mathbb{P}$-Parikh matrix as the Parikh matrix of the transformed word,
![]() |

The Lyndon conjugate of a word is the conjugate that is lexicographically smallest based on the order on the alphabet. The Lyndon conjugate of a word w is denoted L(w). In an attempt to reduce the ambiguity of Parikh matrices, we modify the original Parikh matrix mapping to gain more information about a given word. Next, we introduce the $\mathbb{L}$-Parikh matrix associated to a word.
Definition 4
Given a word w, we define its $\mathbb{L}$-Parikh matrix,
, as the Parikh matrix associated with its Lyndon conjugate, L(w).
It was shown in [4] that there exist transformations that, when applied to a word, create a new word that is amiable with the original. For non-binary alphabets, a Type 1 transformation is given.
Lemma 1
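([4]). Let $w \in \Sigma_n^*$ with $n \geq 3$. Then w transforms into $w'$ using a Type 1 transformation if $w = x a_i a_j y$ and $w' = x a_j a_i y$, where $x, y \in \Sigma_n^*$, $a_i, a_j \in \Sigma_n$, and $|i - j| \geq 2$.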
For binary alphabets, a second type of transformation is described, referred to as a Type 2 transformation, that allows us to check if certain words are amiable without constructing their matrices.
Lemma 2
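([4]). Let $w \in \Sigma_2^*$. Then w transforms into $w'$ through a Type 2 transformation if $w = x\,ab\,y\,ba\,z$ and $w' = x\,ba\,y\,ab\,z$, or vice-versa, where $x, y, z \in \Sigma_2^*$ and $\Sigma_2 = \{a, b\}$.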
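For binary words, the second kind of transformation can be sketched in Python as follows. The sketch assumes the usual formulation from [4], in which a word of the form x·ab·y·ba·z is rewritten to x·ba·y·ab·z or vice versa; `type2_neighbours` and `signature` are hypothetical names. One swap removes an occurrence of the subword ab at one place and creates one at the other, so the three non-trivial entries of the binary Parikh matrix are unchanged.

```python
def type2_neighbours(word):
    """All words reachable by one swap  x ab y ba z  <->  x ba y ab z."""
    results = set()
    for first, second in (("ab", "ba"), ("ba", "ab")):
        for i in range(len(word) - 3):
            if word[i:i + 2] != first:
                continue
            # The second factor must start after the first one ends (no overlap).
            for j in range(i + 2, len(word) - 1):
                if word[j:j + 2] == second:
                    results.add(word[:i] + second + word[i + 2:j] + first + word[j + 2:])
    return results


def signature(word):
    """(|w|_a, |w|_b, |w|_ab): the non-trivial entries of a binary Parikh matrix."""
    a = b = ab = 0
    for letter in word:
        if letter == "a":
            a += 1
        else:
            b += 1
            ab += a
    return (a, b, ab)


print(type2_neighbours("abba"))    # {'baab'}
print(type2_neighbours("aabbaa"))  # {'abaaba'}
print(all(signature(n) == signature("aabbaa") for n in type2_neighbours("aabbaa")))  # True
```

Iterating `type2_neighbours` from a starting word explores words that are amiable with it without ever constructing a matrix.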
$\mathbb{P}$-Parikh Matrices
In this section, we examine when and by how much $\mathbb{P}$-Parikh matrices reduce the ambiguity of a given word. When we refer to a reduction in ambiguity using $\mathbb{P}$-Parikh matrices, we mean that the number of words described by the original Parikh matrix and their respective $\mathbb{P}$-Parikh matrices is strictly less than the total number of words described by the original Parikh matrix alone, i.e.,
, for some
. First we present an example of
$\mathbb{P}$-Parikh matrices removing the ambiguity of a Parikh matrix entirely.
Example 4
Consider the word
from Example 1, which is amiable with the word
and no others. Then we choose our set
, and get that:
and
Thus w and
have different
$\mathbb{P}$-Parikh matrices and we can uniquely describe them.
We first introduce some terms that are useful when describing how effective a given $\mathbb{P}$-Parikh matrix is at reducing ambiguity.
Definition 5
Given a word
, we call
$\mathbb{P}$-distinguishable if either
or there exists a word
and a set
such that
and
. In the latter case we say that w and u are
$\mathbb{P}$-distinct. Furthermore, we call w $\mathbb{P}$-unique if there exist sets
such that
.
Now we use these terms to examine words whose ambiguity can be reduced using $\mathbb{P}$-Parikh matrices, namely those that contain a factor of length two whose letters are neither equal nor consecutive in the alphabet.
Proposition 1
For any word
with a factor
where
, we have that
is
$\mathbb{P}$-distinguishable.
Proof
Since
, if
where
, then
is also associated to w, following Lemma 1. Without loss of generality, take
. Then
, since
and
are elements in
and
, respectively, and
. 
It is simple to identify words that have such factors by comparing adjacent positions in the word. We can use this to find a lower bound for the proportion of words that are uniquely identified for a given alphabet and word length.
Proposition 2
The number of words of length m in
that are reduced in ambiguity by
$\mathbb{P}$-Parikh matrices is bounded below by
.
Notice especially that as n and m get larger, the proportion of words which are reduced in ambiguity by $\mathbb{P}$-Parikh matrices also gets larger. We therefore conclude that the use of $\mathbb{P}$-Parikh matrices reduces ambiguity for a larger proportion of words over bigger alphabets than over smaller ones.
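The words covered by Proposition 1 can be counted by brute force for small parameters. The Python sketch below is illustrative only: `has_distant_factor`, `count_reducible` and `simple_lower_bound` are hypothetical names, and the crude bound $n^m - n \cdot 3^{m-1}$ comes from an elementary counting argument added here for illustration (a word avoiding every such factor has n choices for its first letter and at most 3 for each later letter). It is not claimed to be the bound stated in Proposition 2.

```python
from itertools import product


def has_distant_factor(word, alphabet):
    """True if two adjacent letters are neither equal nor consecutive in the alphabet."""
    indices = [alphabet.index(letter) for letter in word]
    return any(abs(x - y) >= 2 for x, y in zip(indices, indices[1:]))


def count_reducible(n, m):
    """Number of words of length m over the first n Latin letters with such a factor."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"[:n]
    return sum(has_distant_factor(w, alphabet) for w in product(alphabet, repeat=m))


def simple_lower_bound(n, m):
    """n^m minus an upper bound on the words that avoid every such factor."""
    return n ** m - n * 3 ** (m - 1)


for n, m in [(3, 2), (4, 2), (4, 3), (5, 4)]:
    print(n, m, count_reducible(n, m), simple_lower_bound(n, m))
```

Since $n \cdot 3^{m-1} / n^m = (3/n)^{m-1}$, the proportion of words left uncovered by this argument shrinks as n and m grow whenever n > 3, in line with the observation above.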
There also exist words for which $\mathbb{P}$-Parikh matrices do not reduce ambiguity. Our following result says that if our choice of a subset consists of only consecutive letters of the initial alphabet, the $\mathbb{P}$-Parikh matrices are not $\mathbb{P}$-distinguishable.
Remark 6
If all elements of the set
are consecutive in the alphabet
, then
.
The result of Remark 6 strengthens the one of Proposition 1 by telling us that the ambiguity of words defined over binary alphabets is not reducible by $\mathbb{P}$-Parikh matrices.
Corollary 1
There does not exist a Parikh matrix that describes binary words whose ambiguity can be reduced by $\mathbb{P}$-Parikh matrices.
Furthermore, there exist non-binary words for which $\mathbb{P}$-Parikh matrices do not remove ambiguity, namely those that are not $\mathbb{P}$-unique. Finally, we end this section by giving two classes of words which are not uniquely described by $\mathbb{P}$-Parikh matrices, no matter how we choose the set S.
Proposition 3
Take two words
with the form
and
, where
and
. If
, then for all
, we have
.
Proof
Firstly, if
, equivalence follows, as
. Now, let
.
In the case where S contains either
or
, then
since
and
are the only letters that swap places in
compared to w. Since
, clearly
follows.
If
, then,
is a binary word and can be transformed via a
transformation, from Lemma 2, into
, so
.
Next consider that
,
, and S has no elements between
and
. Then
and
. Using an extension from [3] of the
transformations we can transform
into
, and get that
.
Finally, consider the case where S contains
, and at least one letter that comes lexicographically between
and
. Then,
can be transformed into
via two
transformations on
and
, since
and
are not lexicographically adjacent in S (see Lemma 1). 
The ideas from Proposition 3 give rise to another class of words that are not $\mathbb{P}$-unique, by loosening the condition on v and extending the length of the word.
Proposition 4
Take two words of the form
, and
, where
and
. Let
and
. Then, w and
are not
$\mathbb{P}$-distinct if and only if
for all
, and at least one of the following conditions is true:
, and for
, if
, then
, and if
, then
;
, and for
, if
, then
, and if
, then
.
In other words, the above statement says that two words are not $\mathbb{P}$-distinct if both
and
are defined on the subset of the alphabet which is either lexicographically bigger than
or smaller than
, and they share the same Parikh vector for the subset of letters which are not in between
and
. Furthermore, if
, then all the letters which are lexicographically greater than
must occur in
in decreasing lexicographical order and in
in increasing order. On the other hand, if
, then all the letters which are lexicographically smaller than
must occur in
in increasing lexicographical order and in
in decreasing lexicographical order.
$\mathbb{L}$-Parikh Matrices
Proposition 2 shows that in many cases, the set of words that share both a Parikh matrix and a $\mathbb{P}$-Parikh matrix is smaller than the set of those that share only a Parikh matrix. However, following Corollary 1 we also know that this never happens for binary alphabets. Hence we now study $\mathbb{L}$-Parikh matrices as an alternative method of ambiguity reduction. While they can be effective for any non-unary alphabet, we focus on binary alphabets specifically. We begin this section by explaining the motivation for choosing the Lyndon conjugate of a word and then build to our main result where we characterise words whose ambiguity is reduced by the use of $\mathbb{L}$-Parikh matrices.
As indicated by Definition 4, the concept of $\mathbb{L}$-Parikh matrices is based on a modification to a word that results in a change in the order of letters. The following theorem implies that the strategy of altering a word is not always a successful method of ambiguity reduction. Note that
refers to the Parikh matrix of the reversal of a word.
Theorem 1
([4]). For a word w, we have that
.
In contrast to the reversal considered in Theorem 1, $\mathbb{L}$-Parikh matrices use a conjugate of a word. The next proposition implies that such conjugates need to be chosen wisely.
Proposition 5.
Given words
with
, for any factorisations
and
such that
, we have that
implies
. For
, the reverse direction also holds, namely
implies
.
Proof Outline.
We can prove the statement that holds for every size alphabet by contradiction, by assuming that
and
. We examine the total number of ab subwords in v, w,
and
to obtain a set of equations. We then consider the total number of b’s in
and
to find a contradiction within these equations.
For the statement that holds just for the binary alphabet we examine the total number of ab subwords in
and
and get a contradiction in the equations we obtain by initially assuming that
,
and
. 
The example below shows that
is necessary for Proposition 5.
Example 5.
Consider the words
with
and
with
. One can easily find that
. Furthermore, we have that
,
and
. However
, since
and
, and therefore
is a necessary condition in the context of Proposition 5. 
An example for the ternary alphabet where
even though we have that
and
is given below. Note that if
, then we must also have
. Since any alphabet of size greater than 3 would rely on the result of the ternary alphabet always being true, we can deduce that the backwards direction from Proposition 5 only holds for the binary alphabet.
Example 6.
Let
and
. We have that
. Now let
and
. Then we have that
and
. Note that
, since
and
. But this gives us
. 
Proposition 5 shows that, when looking for a modification that we can apply to a word to find a new and different Parikh matrix, we need to consider conjugates of amiable words for which it is less likely that the Parikh vectors of their right factors are the same, i.e., conjugates obtained by shifting the original words a different number of times.
Let us now consider how using $\mathbb{L}$-Parikh matrices reduces ambiguity. The rest of this section ignores any word w where
, since there is no ambiguity to be reduced here. For a word w, we calculate
and
and use both of these matrices to describe the original word. The ambiguity of a word w with respect to its Parikh and $\mathbb{L}$-Parikh matrices, according to Definition 2, is the total number of words that share a Parikh matrix and an $\mathbb{L}$-Parikh matrix with w, namely
. We use the next definitions and propositions to build to our main result where we characterise binary words whose ambiguity is reduced using
$\mathbb{L}$-Parikh matrices. In line with Definition 5 we introduce the following definitions.
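For short binary words, the ambiguity with respect to both matrices can be explored by exhaustive search. In the Python sketch below, `signature`, `lyndon_conjugate`, `share_parikh` and `share_both` are hypothetical names; the triple returned by `signature` stands in for a binary Parikh matrix, since these are its only entries that depend on the word.

```python
from itertools import product


def signature(word):
    """(|w|_a, |w|_b, |w|_ab): the word-dependent entries of a binary Parikh matrix."""
    a = b = ab = 0
    for letter in word:
        if letter == "a":
            a += 1
        else:
            b += 1
            ab += a
    return (a, b, ab)


def lyndon_conjugate(word):
    """Lexicographically smallest rotation of a non-empty word."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def share_parikh(word):
    """Binary words sharing the Parikh matrix of `word`."""
    candidates = ("".join(p) for p in product("ab", repeat=len(word)))
    return [c for c in candidates if signature(c) == signature(word)]


def share_both(word):
    """Words that also share the Parikh matrix of the Lyndon conjugate."""
    target = signature(lyndon_conjugate(word))
    return [c for c in share_parikh(word) if signature(lyndon_conjugate(c)) == target]


print(share_parikh("aabbaa"))  # ['aabbaa', 'abaaba', 'baaaab']
print(share_both("aabbaa"))    # ['aabbaa', 'baaaab']
print(share_both("abaaba"))    # ['abaaba']
```

Here the second matrix separates abaaba from the other two words, while aabbaa and baaaab stay together because they are conjugates and therefore have the same Lyndon conjugate.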
Definition 7.
Given a word
, we call
$\mathbb{L}$-distinguishable if either
or there exists a word
with
, such that
. In the latter case we say that w and u are
$\mathbb{L}$-distinct. A word w is $\mathbb{L}$-unique if
.
Note that if w and v are
$\mathbb{L}$-distinct, then
and
. The example below demonstrates the effectiveness of
$\mathbb{L}$-Parikh matrices for ambiguity reduction.
Example 7.
Consider the words
,
and
with
. However, for the conjugates
,
and
we have that
,
, and
. Thus their
$\mathbb{L}$-Parikh matrices are all different and we can uniquely describe each of the words by using $\mathbb{L}$-Parikh matrices.
$\mathbb{L}$-distinguishability is necessary for ambiguity reduction in this case.
Corollary 2.
For
,
if and only if
is
$\mathbb{L}$-distinguishable.
The above characterisation of ambiguity reduction leads us to investigate sufficient conditions for a matrix to be ambiguous, and therefore for any pair of words it describes not to be $\mathbb{L}$-distinct. Our next results consider the situations when the Parikh matrix of a word is not $\mathbb{L}$-distinguishable. For the binary alphabet, we show, either directly after the next proposition or later in the paper, that words meeting the criteria outlined in each proposition are rare.
Proposition 6.
For a word
, if all words in
belong to the same conjugacy class, then
is not
$\mathbb{L}$-distinguishable.
Example 8.
Let
and
. These two words are amiable with each other and nothing else. Furthermore,
, and since both words share a Lyndon conjugate, both words also share an
$\mathbb{L}$-Parikh matrix. Therefore
is not
$\mathbb{L}$-distinguishable.
Now we move on to explore, for binary alphabets, the case where all words in
belong to the same conjugacy class in more detail. Recall that C(w) refers to the conjugacy class of w.
Proposition 7.
Let
. Then
, for all
, if and only if
.
Proof Outline.
The forwards direction is proven by examining every element of the conjugacy class of w. We can first prove that if
, for all
, then words in the conjugacy class of w are only amiable with other conjugates of w. We then show that this is only true when L(w) is in the set
. For this we define a block of a letter to be a unary factor of a word which is not extendable to the right or left and argue that applying a
transformation to any Lyndon conjugate that is not in the above set either alters the size of the block of a’s at the start of the word, or changes the total number of blocks of a’s in the word altogether. This therefore gives us a word that is amiable to, but not a conjugate of, the original.
The backwards direction is proven by finding the Parikh matrices of all conjugates of words in the set
. We then find that the only words described by these matrices are these conjugates. 
We now look at the case where all words associated to a Parikh matrix are the Lyndon representatives of their respective conjugacy classes, which again makes this matrix not $\mathbb{L}$-distinguishable.
Proposition 8.
For a word
, if
and
, then
is not
$\mathbb{L}$-distinguishable.
Example 9.
The words
and
are only amiable with each other,
, and both are the Lyndon representatives of their respective conjugacy classes. Therefore,
and
is not
$\mathbb{L}$-distinguishable.
For binary alphabets, we examine in greater detail when all words in
are the Lyndon representatives of their conjugacy classes. The next result provides a necessary and sufficient condition, and therefore the complete characterisation, for this case to occur for the binary alphabet.
Proposition 9.
Let
. Then the following statements are equivalent.
For all
, we have that
.
and for
we have that
and
.
Proof Outline.
To show that these two statements are equivalent, we begin by showing that the second statement implies the former. We do this by first showing that if a word is of the form
and, for
, we have that
and
, then
, and next move on to prove that only words of this form are described by
. We prove that
by observing that
. Adding more a’s to the start of v and more b’s to the end means that the Lyndon conjugate is still the word itself, and hence obtain
. We prove that words of the form described in the second point are only amiable with each other by calculating the total number of ab subwords in v and extrapolating this to w.
To prove that the first statement implies the second, we use the fact that our words share a Parikh matrix and that they must begin with the largest number of consecutive a’s in the word and end with at least one b. We also rewrite
where
begins with the first occurrence of a b and ends with the last occurrence of an a in w, and examine the form that this must take given the fixed number of ab subwords we must have in w. This gives us the total number of a’s and b’s in a word relative to the total number of ba subwords. 
The next example shows how the above result can be used to identify the form of the words that always share a Parikh matrix with other Lyndon conjugates.
Example 10.
Following Proposition 9, Lyndon representatives of different conjugacy classes share a Parikh matrix only if they are of the form
, where for
we have that
and
. Let us find all words of this form where
. We begin by finding all binary words that contain 3 subwords ba. These are baaa, baba and bbba. Next add a’s to the front and b’s to the end of each word, respectively, so that we have a total of 6 a’s and 4 b’s per word: aaabaaabbb, aaaabababb, aaaaabbbab. Finally, any number of a’s and b’s can be added to the front and end of each word, respectively:
. Hence we know that any word of this form is the Lyndon representative of its conjugacy class and shares a Parikh matrix with the two other words stated above. For example,
. 
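The three words with 6 a's and 4 b's listed in this example can be checked by exhaustive search. The Python sketch below is illustrative only: `signature`, `lyndon_conjugate` and `words_with` are hypothetical names, and the triple returned by `signature` stands in for the binary Parikh matrix.

```python
from itertools import combinations


def signature(word):
    """(|w|_a, |w|_b, |w|_ab) for a binary word."""
    a = b = ab = 0
    for letter in word:
        if letter == "a":
            a += 1
        else:
            b += 1
            ab += a
    return (a, b, ab)


def lyndon_conjugate(word):
    """Lexicographically smallest rotation of a non-empty word."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def words_with(num_a, num_b):
    """All binary words with the given numbers of a's and b's."""
    length = num_a + num_b
    for b_positions in combinations(range(length), num_b):
        chosen = set(b_positions)
        yield "".join("b" if i in chosen else "a" for i in range(length))


target = signature("aaabaaabbb")
amiable = sorted(w for w in words_with(6, 4) if signature(w) == target)
all_lyndon = all(lyndon_conjugate(w) == w for w in amiable)
print(amiable)     # ['aaaaabbbab', 'aaaabababb', 'aaabaaabbb']
print(all_lyndon)  # True
```

The search confirms that exactly these three words share the matrix and that each of them is its own Lyndon conjugate, so taking Lyndon conjugates gives no additional information for them.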
Thus far, we presented sufficient conditions for two amiable words not to be $\mathbb{L}$-distinct. Our main result shows that these conditions are in fact also the necessary ones. The following lemmas are used in the proof of the final result, but are included here as they are also interesting results on their own. The first lemma tells us that if the Parikh vectors of the proper right factors of two amiable words are different, then the size of these factors must also be unequal.
Lemma 3.
Consider the words
and
with
, such that
and
. If
, then
.
Furthermore, if two amiable binary words are not the Lyndon representatives of their conjugacy classes, then to either of them we can apply a Type 2 transformation to obtain an amiable word whose Lyndon conjugate begins in a different position from the original one.
Lemma 4.
Let
with
. If
, then there exists
, where
, such that
.
Proof Outline.
The statement can be proven by contradiction, by first assuming that the Lyndon conjugate of every word associated to
begins in the same position within those words. We then show that for the Lyndon conjugate to begin at any position within a given binary word, it is possible to apply a
transformation to obtain a new word whose Lyndon conjugate begins in a different position. 
Next we show that all words that are conjugates of any word w such that
are also amiable with a word that is not a conjugate of any of the words in
.
Lemma 5.
Let
, where
. For any
there exists
such that
.
Proof Outline.
This statement can be proven by considering every form that a word w can take, such that
, from Proposition 9 and then examining all conjugates of these words. We show that a
transformation can be applied to every conjugate to obtain a word that is not a conjugate of any word in our original set
. 
We end this section by giving our main result that characterises all binary words whose Parikh matrix is not $\mathbb{L}$-distinguishable.
Theorem 2.
For
, a Parikh matrix is not
$\mathbb{L}$-distinguishable if and only if any of the words it describes meet at least one of the following criteria:

and for
we have that
and 
Proof Outline.
For the set of words
, the forward direction is easily proven by finding these words’ Parikh and
$\mathbb{L}$-Parikh matrices, respectively. The backward direction is proved using the fact that for words
such that w is the reverse of
and
, then
if and only if
.
For the rest of the words, the ‘if’ direction was mostly proven earlier when Propositions 6, 7, 8 and 9, describing these situations, were introduced.
The ‘only if’ direction is proven by first examining the consequences of Proposition 5, which tells us that two words are
$\mathbb{L}$-distinct if their Lyndon conjugates begin in different positions, respectively. We use Lemmas 3 and 4 to conclude that no set of amiable binary words exists where the Lyndon conjugates of all words in the set begin in the same position of each word, respectively. Hence all Parikh matrices would be $\mathbb{L}$-distinguishable if it were not for some cases that arise as a result of us using the Lyndon conjugate. These cases are namely the ones where the set of amiable words are all Lyndon conjugates, are all members of the same conjugacy class, or are all conjugates of words whose Lyndon conjugates share a Parikh matrix. We showed in Propositions 7 and 9 that the first two cases are characterised by words of the form
where for
we have that
and
, and by words where their Lyndon conjugate is in the set
, respectively. We use Lemma 5 to conclude that no words exist such that the third case is true. 
Conclusion and Future Work
In this paper, we have shown that using $\mathbb{P}$-Parikh matrices and $\mathbb{L}$-Parikh matrices reduces the ambiguity of a word in most cases. From Corollary 1, we learn that $\mathbb{P}$-Parikh matrices cannot reduce the ambiguity of a Parikh matrix that describes words in a binary alphabet, but are very powerful when it comes to reducing the ambiguity of words in larger alphabets (Proposition 2). On the other hand, we find that $\mathbb{L}$-Parikh matrices reduce the ambiguity of most binary words, with the few exceptions from Theorem 2, which have all been shown to be rare occurrences within the binary alphabet. Thus, using both tools together leads to a reduction in ambiguity in most cases.
Going forward, we wish to characterise words that are described uniquely by each of the two types of matrices, and to quantify the ambiguity reduction permitted by both notions. Theorem 2 tells us that there are very few binary words whose Parikh matrix ambiguity cannot be reduced by $\mathbb{L}$-Parikh matrices. Future research on $\mathbb{L}$-Parikh matrices could also include an analysis similar to the one done in Proposition 2.
Finally we present a conjecture on the types of words that might be described by a Parikh matrix that is $\mathbb{P}$-distinguishable. We know that the presence of a certain type of factor, described in Proposition 1, in a word means that its Parikh matrix is $\mathbb{P}$-distinguishable. This conjecture implies that the presence of this factor is the only way that the ambiguity of a word could be reduced by $\mathbb{P}$-Parikh matrices.
Conjecture 8.
For any word
, if
is
$\mathbb{P}$-distinguishable, then there exists a word amiable with w which contains a factor
, where
.
Contributor Information
Alberto Leporati, Email: alberto.leporati@unimib.it.
Carlos Martín-Vide, Email: carlos.martin@urv.cat.
Dana Shapira, Email: shapird@g.ariel.ac.il.
Claudio Zandron, Email: zandron@disco.unimib.it.
Jeffery Dick, Email: J.Dick@lboro.ac.uk.
Laura K. Hutchinson, Email: L.Hutchinson@lboro.ac.uk
Robert Mercaş, Email: R.G.Mercas@lboro.ac.uk.
Daniel Reidenbach, Email: D.Reidenbach@lboro.ac.uk.
References
- 1. Alazemi HMK, Černý A. Counting subwords using a trie automaton. Int. J. Found. Comput. Sci. 2011;22(6):1457–1469. doi: 10.1142/S0129054111008817.
- 2. Alazemi HMK, Černý A. Several extensions of the Parikh matrix L-morphism. J. Comput. Syst. Sci. 2013;79(5):658–668. doi: 10.1016/j.jcss.2013.01.018.
- 3. Atanasiu, A.: Parikh matrix mapping and amiability over a ternary alphabet. In: Discrete Mathematics and Computer Science, pp. 1–12 (2014)
- 4. Atanasiu A, Atanasiu R, Petre I. Parikh matrices and amiable words. Theoret. Comput. Sci. 2008;390(1):102–109. doi: 10.1016/j.tcs.2007.10.022.
- 5. Atanasiu A, Martín-Vide C, Mateescu A. Codifiable languages and the Parikh matrix mapping. J. UCS. 2001;7:783–793.
- 6. Atanasiu A, Martín-Vide C, Mateescu A. On the injectivity of the Parikh matrix mapping. Fund. Inform. 2002;49(4):289–299.
- 7. Atanasiu A, Teh WC. A new operator over Parikh languages. Int. J. Found. Comput. Sci. 2016;27(06):757–769. doi: 10.1142/S0129054116500271.
- 8. Bera S, Mahalingam K. Some algebraic aspects of Parikh q-matrices. Int. J. Found. Comput. Sci. 2016;27(4):479–500. doi: 10.1142/S0129054116500118.
- 9. Egecioglu, Ö.: A q-matrix encoding extending the Parikh matrix mapping. Technical report 14, Department of Computer Science at UC Santa Barbara (2004)
- 10. Egecioglu O, Ibarra OH. A matrix q-analogue of the Parikh map. In: Levy J-J, Mayr EW, Mitchell JC, editors. Exploring New Frontiers of Theoretical Informatics; Boston: Springer; 2004. pp. 125–138.
- 11. Egecioglu, Ö., Ibarra, O.H.: A q-analogue of the Parikh matrix mapping. In: Formal Models, Languages and Applications [this volume commemorates the 75th birthday of Prof. Rani Siromoney]. In: Series in Machine Perception and Artificial Intelligence, vol. 66, pp. 97–111 (2007)
- 12. Lothaire M. Combinatorics on Words. Cambridge: Cambridge University Press; 1997.
- 13. Mahalingam K, Subramanian KG. Product of Parikh matrices and commutativity. Int. J. Found. Comput. Sci. 2012;23(01):207–223. doi: 10.1142/S0129054112500049.
- 14. Mateescu, A., Salomaa, A., Salomaa, K., Yu, S.: On an extension of the Parikh mapping. Turku Centre for Computer Science (2000)
- 15. Parikh RJ. On context-free languages. J. ACM. 1966;13(4):570–581. doi: 10.1145/321356.321364.
- 16. Poovanandran G, Teh WC. Strong $(2 \cdot t)$ and strong $(3 \cdot t)$ transformations for strong M-equivalence. Int. J. Found. Comput. Sci. 2019;30(05):719–733. doi: 10.1142/S0129054119500187.
- 17. Salomaa A, Yu S. Subword occurrences, Parikh matrices and Lyndon images. Int. J. Found. Comput. Sci. 2010;21:91–111. doi: 10.1142/S0129054110007155.
- 18. Şerbănuţă TF. Extending Parikh matrices. Theor. Comput. Sci. 2004;310(1–3):233–246. doi: 10.1016/S0304-3975(03)00396-7.
- 19. Şerbănuţă VN. On Parikh matrices, ambiguity, and prints. Int. J. Found. Comput. Sci. 2009;20(01):151–165. doi: 10.1142/S0129054109006498.
- 20. Širšov, A.I.: Subalgebras of free Lie algebras. Mat. Sbornik N.S. 33(75), 441–452 (1953)




