On Infinite Prefix Normal Words

Ferdinando Cicalese; Zsuzsanna Lipták; Massimiliano Rossi

doi:10.1016/j.tcs.2021.01.015

Author manuscript; available in PMC: 2021 Jun 22.

Published in final edited form as: Theor Comput Sci. 2021 Jan 11;859:134–148. doi: 10.1016/j.tcs.2021.01.015

PMCID: PMC8219218 NIHMSID: NIHMS1709524 PMID: 34163096

Abstract

Prefix normal words are binary words with the property that no factor has more 1s than the prefix of the same length. Finite prefix normal words were introduced in [Fici and Lipták, DLT 2011]. In this paper, we study infinite prefix normal words and explore their relationship to some known classes of infinite binary words. In particular, we establish a connection between prefix normal words and Sturmian words, between prefix normal words and abelian complexity, and between prefix normality and lexicographic order.¹

Keywords: combinatorics on words, prefix normal words, infinite words, Sturmian words, abelian complexity, paperfolding word, Thue-Morse sequence, lexicographic order

1. Introduction

Prefix normal words are binary words where no factor has more 1s than the prefix of the same length. As an example, the word 11100110101 is prefix normal, while 11100110110 is not, since it has a factor of length 5 with four 1s, while the prefix of length 5 has only three 1s. Finite prefix normal words were introduced in [18] and further studied in [10, 11, 31, 14, 3, 19, 9].
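On a finite word, the definition can be tested directly: for every length, compare the weight of the prefix with the weight of every other factor of that length. The Python sketch below does this in quadratic time with prefix sums. The function name `is_prefix_normal` is ours; the code illustrates the definition and is not meant to be efficient.

```python
def is_prefix_normal(w: str) -> bool:
    """Naive O(n^2) test: no factor of w has more 1s than the prefix of equal length."""
    n = len(w)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in w[:i]
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    for length in range(1, n + 1):
        prefix_weight = ones[length]
        # compare against every factor w[start:start+length] with start >= 1
        for start in range(1, n - length + 1):
            if ones[start + length] - ones[start] > prefix_weight:
                return False
    return True


print(is_prefix_normal("11100110101"))  # True
print(is_prefix_normal("11100110110"))  # False: factor 11011 beats the prefix 11100
```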

One motivation for studying prefix normal words comes from the problem of Indexed Binary Jumbled Pattern Matching [7, 8, 25, 21, 2, 20, 13, 16, 1]: Given a finite word s of length n, construct an index in such a way that the following type of queries can be answered efficiently: for two integers x, y ≥ 0, does s have a factor with x 1s and y 0s? As shown in [18, 11], prefix normal words can be used for constructing such an index, via so-called prefix normal forms.
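The reason such an index can be small is that the weights of the length-L factors of s form an interval: moving a window one position changes its number of 1s by at most one. It is therefore enough to store, for each length L, the minimum and the maximum number of 1s. These are the values f¹_s(L) and F¹_s(L) of Definition 1, and they are what the prefix normal forms encode. The sketch below uses names of our own choosing and builds the table naively in quadratic time. The cited works address how to build such an index more efficiently.

```python
def build_index(s: str):
    """Naive O(n^2) construction: for every length L, the minimum and maximum
    number of 1s over all factors of s of length L."""
    n = len(s)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in s[:i]
    for i, ch in enumerate(s):
        ones[i + 1] = ones[i] + (ch == "1")
    lo, hi = [0] * (n + 1), [0] * (n + 1)
    for length in range(1, n + 1):
        weights = [ones[i + length] - ones[i] for i in range(n - length + 1)]
        lo[length], hi[length] = min(weights), max(weights)
    return lo, hi


def has_factor(index, x: int, y: int) -> bool:
    """Does s have a factor with x 1s and y 0s (x, y >= 0)? O(1) per query."""
    lo, hi = index
    length = x + y
    if length == 0:
        return True  # the empty word is a factor of every word
    if length >= len(lo):
        return False  # longer than s itself
    # the weights of the length-L factors form the interval [lo[L], hi[L]]
    return lo[length] <= x <= hi[length]


idx = build_index("11100110110")
print(has_factor(idx, 4, 1), has_factor(idx, 1, 4))  # True False
```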

Prefix normal words have also been shown to form bubble languages [29, 30, 10], a family of binary languages with efficiently generable combinatorial Gray codes; the language of prefix normal words has connections to the Binary Reflected Gray Code [31]; and, recently, prefix normal words also appeared in a graph theoretic context [6]. Indeed, three sequences related to prefix normal words are present in the On-Line Encyclopedia of Integer Sequences (OEIS [33]): A194850 (the number of prefix normal words of length n), A238109 (a list of prefix normal words over the alphabet {1, 2}), and A238110 (maximal equivalence class sizes of words with the same prefix normal form).

In [14], we introduced infinite prefix normal words and analyzed a particular procedure that, given a finite prefix normal word, extends it while preserving the prefix normality property. We showed that the resulting infinite word is ultimately periodic. In this paper, we present a more comprehensive study of infinite prefix normal words, covering several classes of known and well studied infinite words. We will now give a quick tour of the paper (for precise definitions, see Section 2).

1.1. Our results

One way of obtaining infinite prefix normal words is by extending finite prefix normal words. We specify two such operations which, in the limit, produce prefix normal words that are extremal with respect to density (Theorem 1).

There exist periodic, ultimately periodic, and aperiodic infinite prefix normal words. For example, the periodic words 0^ω, 1^ω, and (10)^ω are prefix normal; the ultimately periodic word 1(10)^ω is prefix normal; and so is the aperiodic word $10100100010000 \cdots = \lim_{n \to \infty} 10\,10^{2} \cdots 10^{n}$. The best-studied class of aperiodic words is that of Sturmian words. We show that a Sturmian word w is prefix normal if and only if w = 1c_α for some α, where c_α is the characteristic word of slope α (Theorem 2).

We show further that every Sturmian word w can be turned into a prefix normal word by prepending a fixed number of 1s, which only depends on the slope of w. This follows from a more general result regarding c-balanced words (Lemma 5). For example, the Fibonacci word

f = 0100101001001010010100100101001001 \dots

is not prefix normal, but the word 1f is. Two other well-studied aperiodic words are the Thue-Morse word and the Champernowne word. The Thue-Morse word

t = 01101001100101101001011001101001 \dots

is not prefix normal but it can be turned into a prefix normal word by prepending two 1s: 11t is prefix normal. On the other hand, the binary Champernowne word

c = 0110111001011101111000100110101011 \dots

which is constructed by concatenating the binary expansions of the non-negative integers 0, 1, 2, … in ascending order, is not prefix normal and cannot be turned into a prefix normal word by prepending a finite number of 1s.
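The claims in this tour can be probed on finite prefixes. By Lemma 1, every prefix of an infinite prefix normal word is prefix normal, so a finite check can refute prefix normality but cannot prove it. The output below is therefore only consistent with the statements above. The generators are our own sketches:
- f is obtained by iterating the morphism 0 ↦ 01, 1 ↦ 0, whose fixed point is the Fibonacci word;
- t is obtained from the parity of the number of 1s in the binary expansion of i;
- c is obtained by concatenating the binary expansions of 0, 1, 2, ….

```python
def is_prefix_normal(w: str) -> bool:
    """Naive O(n^2) check of the prefix normal condition on a finite word."""
    ones = [0]
    for ch in w:
        ones.append(ones[-1] + (ch == "1"))
    n = len(w)
    return all(
        ones[s + L] - ones[s] <= ones[L]
        for L in range(1, n + 1)
        for s in range(1, n - L + 1)
    )


def fibonacci(n: int) -> str:
    w = "0"
    while len(w) < n:  # iterate the morphism 0 -> 01, 1 -> 0
        w = "".join("01" if ch == "0" else "0" for ch in w)
    return w[:n]


def thue_morse(n: int) -> str:
    # t_i = parity of the number of 1s in the binary expansion of i (i >= 0)
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def champernowne(n: int) -> str:
    parts, total, i = [], 0, 0
    while total < n:  # binary expansions of 0, 1, 2, ...
        parts.append(format(i, "b"))
        total += len(parts[-1])
        i += 1
    return "".join(parts)[:n]


N = 1000
f, t, c = fibonacci(N), thue_morse(N), champernowne(N)
print(is_prefix_normal(f), is_prefix_normal("1" + f))         # False True
print(is_prefix_normal("1" + t), is_prefix_normal("11" + t))  # False True
print(any(is_prefix_normal("1" * k + c) for k in range(6)))   # False
aperiodic = "".join("1" + "0" * k for k in range(1, 40))      # 10 100 1000 ...
print(is_prefix_normal("10" * 50), is_prefix_normal("1" + "10" * 50),
      is_prefix_normal(aperiodic))                            # True True True
```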

We also show that the notion of prefix normal forms from [18, 11] can be extended to infinite words. These can be used, similarly to the finite case, to encode the abelian complexity of the original word. The study of abelian complexity of infinite words was initiated in [27], and continued e.g. in [24, 4, 34, 12, 22]. We establish a close relationship between the abelian complexity and the prefix normal forms of w (Theorem 3). We demonstrate how this close connection can be used to derive results about the prefix normal forms of a word w. In some cases, such as for Sturmian words and words which are morphic images under the Thue-Morse morphism, we are able to explicitly give the prefix normal forms of the word (Corollary 3 and Theorem 5). Conversely, knowing its prefix normal forms allows us to derive results about the abelian complexity of a word. We also show how to compute the prefix normal forms of fixed points of binary uniform morphisms, based on an algorithm from [5] for computing their abelian complexity.

Another class of well-known binary words are Lyndon words. Notice that the prefix normal condition is different from the Lyndon condition²: for finite words, there are words which are both Lyndon and prefix normal (e.g. 110010), words which are Lyndon but not prefix normal (11100110110), words which are prefix normal but not Lyndon (110101), and words which are neither (101100). We study infinite prefix normal words and their prefix normal forms in the context of lexicographic orderings, and compare them to infinite Lyndon words [32] and the max- and min-words of [26] (Corollary 5).
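The four examples can be checked mechanically. The footnote that fixes the order used for Lyndon words is not reproduced in this section, so the sketch below assumes the order 1 < 0. Under the usual order 0 < 1, no word that starts with 1 and contains a 0 (such as 110010) could be Lyndon. With the assumed order, all four classifications come out as stated. In this sketch, a word counts as Lyndon if it is strictly smaller than each of its proper rotations. The function names are ours.

```python
def is_lyndon_one_first(w: str) -> bool:
    """Lyndon test under the ASSUMED order 1 < 0: w must be strictly smaller
    than each of its proper rotations."""
    key = w.translate(str.maketrans("01", "10"))  # swap letters so that '1' sorts first
    return len(key) > 0 and all(key < key[i:] + key[:i] for i in range(1, len(key)))


def is_prefix_normal(w: str) -> bool:
    """Naive check: every factor has at most as many 1s as the prefix of equal length."""
    n = len(w)
    return all(
        w[s:s + L].count("1") <= w[:L].count("1")
        for L in range(1, n + 1)
        for s in range(1, n - L + 1)
    )


for word in ["110010", "11100110110", "110101", "101100"]:
    print(word, is_lyndon_one_first(word), is_prefix_normal(word))
# 110010 True True
# 11100110110 True False
# 110101 False True
# 101100 False False
```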

Finally, we give conditions for periodicity and ultimate periodicity of prefix normal words in terms of their minimum density, a parameter introduced in [14] (Theorem 8).

1.2. Overview of paper

The paper is organized as follows. In Section 2, we introduce our terminology and give some simple facts about prefix normal words. In Section 3, we compare different operations that generate infinite prefix normal words by extending finite prefix normal words. In Section 4, we study the relationship between Sturmian words and prefix normal words. Section 5 deals with the connection between prefix normality and abelian complexity, and Section 6 focuses on the relationship with lexicographic order. Finally, in Section 7, we analyze the relationship between periodicity and minimum density of prefix normal words.

2. Basics

In our definitions and notation, we mostly follow [23]. A finite (resp. infinite) binary word w is a finite (resp. infinite) sequence of elements from {0, 1}. Thus an infinite word is a mapping $w : \mathbb{N} \to \{0, 1\}$, where $\mathbb{N}$ denotes the set of positive integers. We denote the ith character of w by w_i. Note that we index words starting from 1. If w is finite, then its length is denoted by ∣w∣. The empty word, denoted ε, is the unique word of length 0. The set of binary words of length n is denoted by {0, 1}ⁿ, the set of all finite words by $\{0, 1\}^{*} = \bigcup_{n \geq 0} \{0, 1\}^{n}$, and the set of infinite binary words by {0, 1}^ω. For a finite word u = u₁ ⋯ u_n, we write u^rev = u_n ⋯ u₁ for the reverse of u, and for a finite or infinite word u, we write $\bar{u} = \bar{u}_{1} \bar{u}_{2} \cdots$ for the complement of u, where $\bar{a} = 1 - a$ for a ∈ {0, 1}.

For two words u, v, where u is finite and v is finite or infinite, we write uv for their concatenation. If w = uxv, then u is called a prefix, x a factor (or substring), and v a suffix of w. We denote the set of factors of w by Fct(w) and its prefix of length i by pref_w(i), where pref_w(0) = ε. For a finite word u, we write ∣u∣₁ for the number of 1s, and ∣u∣₀ for the number of 0s in u, and refer to ∣u∣₁ as the weight of u. The Parikh vector of u is pv(u) = (∣u∣₀, ∣u∣₁). A word w is called balanced if for all u, v ∈ Fct(w), ∣u∣ = ∣v∣ implies ∣∣u∣₁ − ∣v∣₁∣ ≤ 1, and c-balanced if ∣u∣ = ∣v∣ implies ∣∣u∣₁ − ∣v∣₁∣ ≤ c.
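For a finite word, the smallest c for which it is c-balanced is the largest gap, over all lengths, between the heaviest and the lightest factor of that length. The word is balanced exactly when this value is at most 1. Here is a minimal quadratic sketch; the function name is ours.

```python
def balance_constant(w: str) -> int:
    """Smallest c such that the finite word w is c-balanced: the largest difference
    in the number of 1s between two factors of the same length."""
    n = len(w)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in w[:i]
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    c = 0
    for length in range(1, n + 1):
        weights = [ones[i + length] - ones[i] for i in range(n - length + 1)]
        c = max(c, max(weights) - min(weights))
    return c


print(balance_constant("0100101001001"))  # 1: a prefix of the Fibonacci word
print(balance_constant("01101001"))       # 2: a prefix of the Thue-Morse word
```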

For an integer k ≥ 1 and u ∈ {0, 1}ⁿ, u^k denotes the kn-length word uuu ⋯ u (k-fold concatenation of u) and u^ω the infinite word uuu ⋯. An infinite word w is called periodic if w = u^ω for some non-empty word u, and ultimately periodic if it can be written as w = vu^ω for some v and non-empty u. A word that is neither periodic nor ultimately periodic is called aperiodic. We set 0 < 1 and denote by ≤_lex the lexicographic order between words, i.e. u ≤_lex v if u is a prefix of v or there is an index i ≥ 1 s.t. pref_u(i − 1) = pref_v(i − 1) and u_i < v_i.

For an operation op : {0, 1}* → {0, 1}*, we denote by op⁽ⁱ⁾ the ith iteration of op. Further, let op*(w) = {op⁽ⁱ⁾(w) ∣ i ≥ 1} and op^ω (w) = lim_i→∞ op⁽ⁱ⁾(w), if it exists.

A binary morphism μ is a function μ : {0, 1}* → {0, 1}* such that for all u, v ∈ {0, 1}*, μ(uv) = μ(u)μ(v). A binary morphism μ is called uniform if ∣μ(0)∣ = ∣μ(1)∣. A fixed point of a morphism μ is an infinite word v such that v = μ^ω(a) for some a ∈ {0, 1}.
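Fixed points of morphisms can be approximated by iterating the morphism on a single letter. The sketch below assumes, as holds for the two example morphisms, that μ(a) begins with a, that μ(a) has length at least 2, and that no letter is mapped to the empty word. Under these assumptions the iterates μ⁽ⁱ⁾(a) are prefixes of one another and keep growing. Encoding a morphism as a dictionary is our own choice. The Thue-Morse morphism 0 ↦ 01, 1 ↦ 10 is uniform; the Fibonacci morphism 0 ↦ 01, 1 ↦ 0 is not.

```python
def is_uniform(mu: dict) -> bool:
    """A binary morphism is uniform if |mu(0)| = |mu(1)|."""
    return len(mu["0"]) == len(mu["1"])


def fixed_point_prefix(mu: dict, a: str, n: int) -> str:
    """Length-n prefix of mu^omega(a) under the assumptions stated above."""
    if not (mu[a].startswith(a) and len(mu[a]) >= 2 and all(mu.values())):
        raise ValueError("mu^omega(a) is not well defined by this simple iteration")
    w = a
    while len(w) < n:
        w = "".join(mu[ch] for ch in w)  # w <- mu(w), which extends the previous iterate
    return w[:n]


thue_morse = {"0": "01", "1": "10"}
fibonacci = {"0": "01", "1": "0"}
print(is_uniform(thue_morse), is_uniform(fibonacci))  # True False
print(fixed_point_prefix(thue_morse, "0", 16))        # 0110100110010110
print(fixed_point_prefix(fibonacci, "0", 13))         # 0100101001001
```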

Definition 1 Let w be a (finite or infinite) binary word. We define the following functions:

P_w(i) = ∣pref_w(i)∣₁, the weight of the prefix of length i,
D_w(i) = P_w(i)/i, the density of the prefix of length i,
$F_{w}^{1}(i) = \max\{|u|_{1} : u \in \mathrm{Fct}(w), |u| = i\}$, the maximum number of 1s in a factor of length i,
$f_{w}^{1}(i) = \min\{|u|_{1} : u \in \mathrm{Fct}(w), |u| = i\}$, the minimum number of 1s in a factor of length i,
$F_{w}^{0}(i) = \max\{|u|_{0} : u \in \mathrm{Fct}(w), |u| = i\}$, the maximum number of 0s in a factor of length i,
$f_{w}^{0}(i) = \min\{|u|_{0} : u \in \mathrm{Fct}(w), |u| = i\}$, the minimum number of 0s in a factor of length i.

Note that in the context of succinct indexing, the function P_w(i) is often called rank₁(w, i). We are now ready to define prefix normal words.

Definition 2 (Prefix normal words) An (infinite or finite) binary word w is called 1-prefix normal, or simply prefix normal, if $P_{w} (i) = F_{w}^{1} (i)$ for all i ≥ 1 (for all 1 ≤ i ≤ ∣w∣ if w is finite). It is called 0-prefix normal if $i - P_{w} (i) = F_{w}^{0} (i)$ for all i ≥ 1 (for all 1 ≤ i ≤ ∣w∣ if w is finite). We denote the set of all finite 1-prefix normal words by $L_{fin}$, the set of all infinite 1-prefix normal words by $L_{inf}$, and $L = L_{fin} \cup L_{inf}$.

In other words, a word is prefix normal if no factor has more 1s than the prefix of the same length. Given a binary word w, we say that a factor u of w satisfies the prefix normal condition if ∣u∣₁ ≤ P_w(∣u∣).

Example 1 The word 110100110110 is not prefix normal since the factor 11011 has four 1s, which is more than in the prefix 11010 of length 5. The word 110100110010, on the other hand, is prefix normal. The infinite word (11001)^ω is not prefix normal, because it has 111 as a factor, which has more 1s than the prefix of length 3, but the word (11010)^ω is.
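For periodic words u^ω, prefix normality can be decided exactly with a finite check. The following short argument is ours and is not stated in this form in the text. A factor of u^ω of length q∣u∣ + r, with 0 ≤ r < ∣u∣, contains q∣u∣₁ 1s plus the 1s of a cyclic window of u of length r. The prefix of the same length contains q∣u∣₁ + P_u(r) 1s. Hence u^ω is prefix normal if and only if, for every r < ∣u∣, no cyclic window of u of length r has more than P_u(r) 1s. The sketch below (function name ours) implements this test.

```python
def periodic_is_prefix_normal(u: str) -> bool:
    """Decide whether the infinite periodic word u^omega is prefix normal,
    checking only the cyclic windows of u of length r < |u|."""
    n = len(u)
    doubled = u + u  # every cyclic window of u is a factor of uu
    ones = [0]
    for ch in doubled:
        ones.append(ones[-1] + (ch == "1"))
    for r in range(1, n):
        prefix_weight = ones[r]  # P_u(r)
        if any(ones[s + r] - ones[s] > prefix_weight for s in range(n)):
            return False
    return True


print(periodic_is_prefix_normal("11010"), periodic_is_prefix_normal("11001"))  # True False
```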

The following facts about infinite prefix normal words are immediate.

Lemma 1
1. For all $u \in L_{fin}$, the word $w = u 0^{ω} \in L_{inf}$.
2. Let w ∈ {0, 1}^ω. Then $w \in L$ if and only if ${pref}_{w} (i) \in L$ for all i ≥ 1.

Definition 3 (Minimum density, minimum-density prefix, slope) Let w ∈ {0, 1}*∪{0, 1}^ω. Define the minimum density of w as δ(w) = inf{D_w(i) ∣ i ≥ 1}, where i ranges only up to ∣w∣ if w is finite. If this infimum is attained, then we also define ı(w) = min{j ≥ 1 ∣ ∀i : D_w(j) ≤ D_w(i)} and κ(w) = P_w(ı(w)). We refer to pref_w(ı(w)) as the minimum-density prefix; it is the shortest prefix with density δ(w). For an infinite word w, we define the slope of w as lim_{i→∞} D_w(i), if this limit exists.
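For a finite word, δ(w), ı(w) and κ(w) can be read off a single left-to-right scan. The sketch below (function name ours) uses exact rational arithmetic, so comparing densities involves no rounding. The example word is the one from Example 3 below. Its minimum density 8/21 is attained by its prefix of length 21.

```python
from fractions import Fraction


def min_density_data(w: str):
    """(delta(w), iota(w), kappa(w)) for a non-empty finite word (Definition 3)."""
    if not w:
        raise ValueError("w must be non-empty")
    best = iota = kappa = None
    ones = 0
    for i, ch in enumerate(w, start=1):
        ones += (ch == "1")
        d = Fraction(ones, i)  # D_w(i), exact
        if best is None or d < best:  # strict '<' keeps the shortest minimizing prefix
            best, iota, kappa = d, i, ones
    return best, iota, kappa


print(min_density_data("1101101100100010000001"))  # (Fraction(8, 21), 21, 8)
```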

Remark 1 Note that ı(w) is always defined for finite words, while for infinite words, a prefix which attains the infimum may or may not exist. We note further that the minimum density and the slope of an infinite binary word do not necessarily coincide. In particular, while δ(w) exists for every w, the limit lim_{i→∞} D_w(i) may not exist, i.e., w may or may not have a slope. As an example, consider the word w = v₀v₁v₂ ⋯, where $v_i = 1^{2^i} 0^{2^i}$ for each i ≥ 0. Then δ(w) = 1/2, and lim_{i→∞} D_w(i) does not exist, since D_w(i) has an infinite subsequence which is constantly 1/2 (at the ends of the blocks v_i) and another which tends to 2/3 (at the ends of the runs of 1s).

Moreover, even for words w for which the slope is defined, this can be different from the minimum density. If w has slope α, then α = δ(w) if and only if for all i, D_w(i) ≥ α. For instance, the infinite word 01^ω has slope 1 but its minimum density is 0. On the other hand, the infinite word 1(10)^ω has both slope and minimum density 1/2.
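The example word of Remark 1 can be explored on a finite prefix. The sketch below (helper names ours) builds v₀ ⋯ v₁₁ and computes all prefix densities exactly. It then inspects them at two kinds of positions:
- at the ends of the blocks v_i, where the density is exactly 1/2;
- at the ends of the runs of 1s, where the density is (2^{i+1} − 1)/(3·2^i − 2), which decreases towards 2/3 from above.

```python
from fractions import Fraction


def remark_word(blocks: int) -> str:
    """v_0 v_1 ... v_{blocks-1} with v_i = 1^(2^i) 0^(2^i), as in Remark 1."""
    return "".join("1" * 2**i + "0" * 2**i for i in range(blocks))


def prefix_densities(w: str) -> list:
    """[D_w(1), ..., D_w(|w|)] as exact fractions."""
    out, ones = [], 0
    for i, ch in enumerate(w, start=1):
        ones += (ch == "1")
        out.append(Fraction(ones, i))
    return out


k = 12
D = prefix_densities(remark_word(k))
block_ends = [2 * (2**(i + 1) - 1) for i in range(k)]  # |v_0 ... v_i|
run_ends = [3 * 2**i - 2 for i in range(k)]            # end of the run of 1s in v_i
print(min(D))                                     # 1/2
print({D[e - 1] for e in block_ends})             # {Fraction(1, 2)}
print([float(D[e - 1]) for e in run_ends[-3:]])   # values just above 2/3
```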

3. Operations generating infinite prefix normal words

In [14], we introduced an operation which takes a finite prefix normal word w ending in 1 and extends it by a run of 0s followed by a new 1, in such a way that this new 1 is placed in the first possible position without violating prefix normality. This operation, called flipext, leaves the minimum density invariant. Moreover, by repeatedly applying the flipext operation, an infinite prefix normal word is produced which is the densest among all prefix normal words with given prefix w.

Here we extend the definition of flipext to all prefix normal words containing at least one 1 and show that the same properties hold, even if the original word w does not end in 1.

Definition 4 (Operation flipext) Let $w \in L_{fin} ∖ {0}^{*}$ . Define flipext(w) as the finite word w0^k1, where $k = min {j ∣ w 0^{j} 1 \in L}$ . We further define the infinite word v = flipext^ω(w).
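Definition 4 translates directly into code: try k = 0, 1, 2, … until w0^k1 is prefix normal. The sketch below (function names ours) reuses a naive prefix normality check, so it is only suitable for short words. It rejects inputs outside $L_{fin} ∖ 0^{*}$. The loop terminates because a prefix normal w containing a 1 starts with 1, and then k = ∣w∣ always yields a prefix normal word.

```python
def is_prefix_normal(w: str) -> bool:
    """Naive check of the prefix normal condition on a finite word."""
    n = len(w)
    return all(
        w[s:s + L].count("1") <= w[:L].count("1")
        for L in range(1, n + 1)
        for s in range(1, n - L + 1)
    )


def flipext(w: str) -> str:
    """flipext(w) = w 0^k 1 with the smallest k making the result prefix normal."""
    if "1" not in w or not is_prefix_normal(w):
        raise ValueError("w must be a prefix normal word containing at least one 1")
    k = 0
    while not is_prefix_normal(w + "0" * k + "1"):
        k += 1
    return w + "0" * k + "1"


v = "100"
for _ in range(3):
    v = flipext(v)
print(v)  # 1001001001, a prefix of (100)^omega
```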

The next proposition is a slightly more general form of Lemma 13 from [14]:

Proposition 1 Let $w \in L_{fin} ∖ {0}^{*}$ and v ∈ flipext*(w)∪{flipext^ω(w)}. Then δ(v) = δ(w), and, as a consequence, ı(v) = ı(w) and κ(v) = κ(w). Moreover, D_v(j · ı(w)) = δ(w) for all j ≥ 1.

Proof. Let $w \in L$ . If the last character of w is a 1, then the claim holds by Lemma 13 of [14].

Else w ends in a run of 0s. Let ℓ be the length of this run, and w′ be such that w = w′0^ℓ. Let w″ = flipext(w′) = w′0^k1, i.e. by definition of flipext, k is minimal s.t. $w^{'} 0^{k} 1 \in L$ . If ℓ ≤ k, then flipext(w) = flipext(w′) = w″. Since w′ is a prefix of w, and w is a prefix of w″, we have δ(w′) ≥ δ(w) ≥ δ(w″). Since w′ ends in a 1, δ(w″) = δ(w′), and thus δ(w″) = δ(w).

Otherwise ℓ > k, therefore $flipext (w^{'}) = w^{'} 0^{ℓ^{'}} 1 \in L_{fin}$ for some ℓ′ < ℓ, hence $w^{'} 0^{ℓ} 1 \in L_{fin}$ . The latter implies flipext(w) = w1 and δ(flipext(w)) = δ(w).

Further iterations flipext⁽ⁱ⁾(w) fulfil the claim due to the fact that flipext(w) ends in a 1.

We now show the second statement, D_v(j · ı(w)) = δ(w) for all j ≥ 1, by induction on j. The claim clearly holds for j = 1. For j > 1, assume D_v((j − 1) · ı(w)) = δ(w), let w′ = pref_v((j − 1) · ı(w)), and let w″ be the factor of length ı(w) such that w′w″ = pref_v(j · ı(w)). Then

δ (w) = δ (v) \leq D_{v} (j \cdot ı (w)) = \frac{∣ w^{'} ∣_{1} + ∣ w^{″} ∣_{1}}{j \cdot ı (w)} \leq \frac{P_{w} (ı (w)) (j - 1) + P_{w} (ı (w))}{j \cdot ı (w)} = \frac{P_{w} (ı (w))}{ı (w)} = δ (w),

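where in the second inequality we use ∣w′∣₁ = P_w(ı(w))(j − 1), which follows from the induction hypothesis D_v((j − 1) · ı(w)) = δ(w) = P_w(ı(w))/ı(w), and ∣w″∣₁ ≤ P_v(ı(w)) = P_w(ı(w)), which holds since v is prefix normal and has w as a prefix.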

The next proposition states that the infinite word which is generated by repeatedly applying the flipext operation is the densest among all prefix normal words with prefix w.

Proposition 2 Let $w \in L_{fin} ∖ 0^{*}$ , v = flipext^ω(w), and let $z \in L_{inf}$ such that pref_z(∣w∣) = w. Then for every i = 1, 2,… we have P_v(i) ≥ P_z(i).

Proof. We argue by contradiction. Let i be the smallest integer such that P_v(i) < P_z(i). Clearly i > ∣w∣ and, by the minimality assumption, we must have P_v(i − 1) = P_z(i − 1), v_i = 0, and z_i = 1. By definition of flipext, there must exist j < i such that $|v_{j+1} \cdots v_{i-1} 1|_{1} > P_{v}(i - j) \geq P_{z}(i - j)$, for otherwise we would have v_i = 1. Since v is prefix normal, it follows that $|v_{j+1} \cdots v_{i-1} v_{i}|_{1} = P_{v}(i - j) \geq P_{z}(i - j)$.

From this, since by the minimality of i it holds that P_z(j) ≤ P_v(j), we have $|z_{j+1} \cdots z_{i-1} z_{i}|_{1} = P_{z}(i) - P_{z}(j) > P_{v}(i) - P_{v}(j) = P_{v}(i - j) \geq P_{z}(i - j)$, violating the prefix normality of z. □

We now define a different operation, called lazy-flipext, which, given a prefix normal word w, extends it by adding 0s as long as the minimum density of the resulting word is not smaller than δ(w), and only then adding a 1. We show that this operation preserves the prefix normality of the resulting word.

Definition 5 (Operation lazy-flipext) Let α ∈ (0, 1] and let $w \in L_{fin}$ with δ(w) ≥ α. We define lazy-flipext(w, α) as the finite word w0^k 1 where k = max{j ∣ δ(w0^j) ≥ α}. We further define the infinite word v = lazy-flipext^ω(w, α).

Example 2 Let w = 111 and let $α = \sqrt{2} - 1$ . Then lazy-flipext(w, α) = 11100001, since δ(1110000) = 3/7 ≥ α and δ(11100000) = 3/8 < α. Further, lazy-flipext⁽²⁾(w, α) = 1110000101, since δ(111000010) = 4/9 ≥ α and δ(1110000100) = 2/5 < α.
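A sketch of lazy-flipext (function names ours) follows. Appending 0s to w only creates new prefixes w0^j, whose densities ∣w∣₁/(∣w∣ + j) decrease in j. Hence, given δ(w) ≥ α, the condition δ(w0^j) ≥ α reduces to ∣w∣₁/(∣w∣ + j) ≥ α. Densities are exact fractions. In the first usage example, α = √2 − 1 is approximated by a float, which is harmless here because no density involved is close to α. The second usage example applies eight iterations to the word of Example 3 with α = δ(w) = 8/21.

```python
import math
from fractions import Fraction


def min_density(w: str) -> Fraction:
    """delta(w) for a non-empty finite word: the smallest prefix density, computed exactly."""
    ones, best = 0, None
    for i, ch in enumerate(w, start=1):
        ones += (ch == "1")
        d = Fraction(ones, i)
        if best is None or d < best:
            best = d
    return best


def lazy_flipext(w: str, alpha) -> str:
    """lazy-flipext(w, alpha) = w 0^k 1 with k = max{j : delta(w 0^j) >= alpha}."""
    if not (0 < alpha <= 1) or min_density(w) < alpha:
        raise ValueError("need 0 < alpha <= 1 and delta(w) >= alpha")
    ones, n = w.count("1"), len(w)
    k = 0
    while Fraction(ones, n + k + 1) >= alpha:  # one more 0 keeps the density >= alpha
        k += 1
    return w + "0" * k + "1"


alpha = math.sqrt(2) - 1  # Example 2
v1 = lazy_flipext("111", alpha)
v2 = lazy_flipext(v1, alpha)
print(v1, v2)  # 11100001 1110000101

w = "1101101100100010000001"  # the word of Example 3
a = min_density(w)            # Fraction(8, 21)
v = w
for _ in range(8):
    v = lazy_flipext(v, a)
print(v[len(w):])  # 010010100100101001001
```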

Lemma 2 Let α ∈ (0, 1]. For every $w \in L_{fin}$ with δ(w) ≥ α, the word v = lazy-flipext(w, α) is also prefix normal, with δ(v) ≥ α.

Proof. First note that δ(v) ≥ α by definition. Now write v = w0^k1, and let u = flipext(w) = w0^ℓ1. Recall that $ℓ = min {j ∣ w 0^{j} 1 \in L}$. If k < ℓ, then w0^{k+1} is a prefix of u and, by the maximality of k, δ(w0^{k+1}) < α. This implies δ(u) < α, in contradiction to Proposition 1, since δ(u) = δ(w) ≥ α. Thus k ≥ ℓ, from which $v \in L$ follows. □

Corollary 1 Let α ∈ (0, 1] and $w \in L_{fin}$ with δ(w) ≥ α. Then v = lazy-flipext^ω(w, α) is an infinite prefix normal word and δ(v) = α.

Proof. That v is prefix normal follows from Lemma 1 and Lemma 2, which also implies that δ(v) ≥ α. However, if δ(v) > α were true, then for a suitably long prefix of v we would get a contradiction to the definition of the lazy-flipext operation. □

Fix $w \in L_{fin}$ . The next proposition states that the lazy-flipext operation with α = δ(w), applied to w, generates a prefix normal word that has the minimum number of 1s among all prefix normal words with prefix w and minimum density δ(w).

Proposition 3 Let $w \in L_{fin}$ , α = δ(w), v = lazy-flipext^ω(w, α), and $z \in L_{inf}$ such that pref_z(∣w∣) = w and δ(z) ≥ δ(w). Then for all i = 1, 2, …, we have P_v(i) ≤ P_z(i).

Proof. We argue by contradiction. Let i be the smallest integer such that P_v(i) > P_z(i). Clearly i > ∣w∣ and, by the minimality assumption, we have P_v(i − 1) = P_z(i − 1), v_i = 1, and z_i = 0. Let u = pref_v(i − 1). Since i > ∣w∣ and v_i = 1, we have u1 = lazy-flipext(u′, α) for some u′, and thus, by definition of lazy-flipext, P_{u0}(i)/i < α. But P_z(i) = P_z(i − 1) = P_v(i − 1) = P_{u0}(i), so we have

δ (z) \leq D_{z} (i) = \frac{P_{z} (i)}{i} = \frac{P_{u 0} (i)}{i} < δ (w),

in contradiction to the density of z. □

Theorem 1 Let $w \in L_{fin}$ with α = δ(w) ∈ (0, 1], and let $z \in L_{inf}$ such that pref_z(∣w∣) = w and δ(z) ≥ α. Let u = flipext^ω(w) and v = lazy-flipext^ω(w, α). Then v ≤_lex z ≤_lex u.

Proof. Follows from Prop. 2 and Prop. 3. □

Note that if pref_z(∣w∣) = w, then δ(z) ≥ δ(w) implies that, in fact, δ(z) = δ(w) holds, since z is an extension of w. Theorem 1 states then that all prefix normal extensions of w with the same minimum density as w lie lexicographically between the lazy-flipext- and the flipext-extensions of w. However, not all extensions of w between these two words are prefix normal, as we can see in the next example.

Example 3 Let w = 1101101100100010000001, with α = δ(w) = 8/21, then

v = \text{lazy-flipext}^{(8)}(w, α) = w\,010010100100101001001, \qquad u = \text{flipext}^{(8)}(w) = w\,101101100100010000001.

Let p = w100111010100000100001 and q = w101101010100001000001, we have that for all 1 ≤ i ≤ 42, P_v(i) ≤ P_p(i), P_q(i) ≤ P_u(i) and v ≤_lex p, q ≤_lex u. Note that p is not prefix normal, while q is prefix normal.

4. Sturmian words and prefix normal words

In the previous section, we presented operations that construct infinite prefix normal words by extending finite prefix normal words. In particular, the lazy-flipext operation extends a finite binary word with as few 1s as possible while preserving its minimum density. This is reminiscent of the characterization of Sturmian words in terms of mechanical words and the slope. Led by this analogy, in this section we provide a complete characterization of Sturmian words which are prefix normal. We refer the interested reader to [23, Chapter 2] for a comprehensive treatment of Sturmian words. Here we briefly recall some facts which we will need later.

Definition 6 (Sturmian words) Let w ∈ {0, 1}^ω. Then w is called Sturmian if it is balanced and aperiodic.

An equivalent definition of Sturmian words is that they are irrational mechanical, a definition we recall next.

Definition 7 (Mechanical words) Given two real numbers 0 ≤ α ≤ 1 and 0 ≤ τ < 1, the lower mechanical word $s_{α, τ} = s_{α, τ}(1)\, s_{α, τ}(2) \cdots$ and the upper mechanical word $s_{α, τ}^{'} = s_{α, τ}^{'}(1)\, s_{α, τ}^{'}(2) \cdots$ are given by

\begin{aligned} s_{α, τ}(n) &= \lfloor α n + τ \rfloor - \lfloor α (n - 1) + τ \rfloor, \\ s_{α, τ}^{'}(n) &= \lceil α n + τ \rceil - \lceil α (n - 1) + τ \rceil, \end{aligned} \qquad (n \geq 1).

Then α is called the slope and τ the intercept of s_α,τ, $s_{α, τ}^{'}$ . A word w is called mechanical if w = s_α,τ or $w = s_{α, τ}^{'}$ for some α, τ. It is called rational mechanical (resp. irrational mechanical) if α is rational (resp. irrational).
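Definition 7 can be evaluated directly. The sketch below (function names ours) uses floating-point arithmetic. This is adequate for moderately long prefixes as long as no αn + τ lies extremely close to an integer. Exact rational slopes can be passed as `fractions.Fraction` values instead. For α = (3 − √5)/2, the code reproduces 0f and 1f, where f is the Fibonacci word of the introduction. This agrees with the well-known fact that f is the characteristic word of this slope.

```python
import math


def lower_mechanical(alpha, tau, n: int) -> str:
    """First n letters of s_{alpha,tau}: floor(alpha k + tau) - floor(alpha (k-1) + tau)."""
    return "".join(
        str(math.floor(alpha * k + tau) - math.floor(alpha * (k - 1) + tau))
        for k in range(1, n + 1)
    )


def upper_mechanical(alpha, tau, n: int) -> str:
    """First n letters of s'_{alpha,tau}: ceil(alpha k + tau) - ceil(alpha (k-1) + tau)."""
    return "".join(
        str(math.ceil(alpha * k + tau) - math.ceil(alpha * (k - 1) + tau))
        for k in range(1, n + 1)
    )


alpha = (3 - math.sqrt(5)) / 2
print(lower_mechanical(alpha, 0, 13))  # 0010010100100
print(upper_mechanical(alpha, 0, 13))  # 1010010100100
```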

Fact 1 (Some facts about Sturmian words [23])
1. An infinite binary word is Sturmian if and only if it is irrational mechanical.
2. For τ = 0 and irrational α, there exists a word c_α, called the characteristic word with slope α, s.t. s_{α,0} = 0c_α and $s_{α, 0}^{'} = 1 c_{α}$. This word c_α is a Sturmian word itself, with both slope and intercept α.
3. For two Sturmian words w and v with the same slope, Fct(w) = Fct(v).

We now show that the word lazy-flipext^ω(1, α) coincides with the upper mechanical word $s_{α, 0}^{'}$ . This also implies that $s_{α, 0}^{'}$ is prefix normal, as noted in the subsequent corollary.

Lemma 3 Fix α ∈ (0, 1] and let v = lazy-flipext^ω(1, α). Let $s = s_{α, 0}^{'}$ be the upper mechanical word of slope α and intercept 0. Then v = s.

Proof. Let s_i and v_i denote the ith character of s and v respectively. We argue by induction on i that v_i = s_i. The claim is true for i = 1 since, directly from the definitions we have v₁ = 1 = s₁. Let n > 1 and assume that for each i < n we have v_i = s_i. For the induction step we argue according to the character s_n.

(i) If s_n = 1, then by definition ⌈nα⌉ − ⌈(n − 1)α⌉ = 1. Thus, ⌈(n − 1)α⌉ < nα. Using this inequality and the induction hypothesis together with the definition of $s_{α, 0}^{'}$, we have ∣v₁ ⋯ v_{n−1}∣₁ = ∣s₁ ⋯ s_{n−1}∣₁ = ⌈(n − 1)α⌉ < αn. Therefore ∣v₁ ⋯ v_{n−1}0∣₁ = ∣v₁ ⋯ v_{n−1}∣₁ < αn, which means that δ(v₁ ⋯ v_{n−1}0) < α; hence, by definition, lazy-flipext(v₁ ⋯ v_{n−1}, α) = v₁ ⋯ v_{n−1}1, i.e., v_n = s_n.

(ii) If s_n = 0, then by definition ⌈nα⌉ − ⌈(n − 1)α⌉ = 0. Thus, ⌈(n − 1)α⌉ ≥ nα. Using this inequality and the induction hypothesis together with the definition of $s_{α, 0}^{'}$, we have ∣v₁ ⋯ v_{n−1}∣₁ = ∣s₁ ⋯ s_{n−1}∣₁ = ⌈(n − 1)α⌉ ≥ αn. Therefore ∣v₁ ⋯ v_{n−1}0∣₁ = ∣v₁ ⋯ v_{n−1}∣₁ ≥ αn, which means that δ(v₁ ⋯ v_{n−1}0) ≥ α; hence, by definition, lazy-flipext(v₁ ⋯ v_{n−1}, α) = v₁ ⋯ v_{n−1}0 ⋯ 01, i.e., v_n = 0 = s_n. □

Corollary 2 Let α ∈ (0, 1]. Then $s_{α, 0}^{'}$ is an infinite prefix normal word and $δ (s_{α, 0}^{'}) = α$ .
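Lemma 3 and Corollary 2 can be cross-checked for rational slopes, where all arithmetic is exact with `fractions.Fraction`. The sketch below (names ours) generates a prefix of lazy-flipext^ω(1, α) and compares it with the upper mechanical word of slope α and intercept 0. It also checks that the prefix is prefix normal. A finite check of this kind is an illustration, not a proof.

```python
import math
from fractions import Fraction


def lazy_flipext_prefix(alpha: Fraction, n: int) -> str:
    """First n letters of lazy-flipext^omega("1", alpha), for 0 < alpha <= 1."""
    w, ones = "1", 1
    while len(w) < n:
        k = 0
        while Fraction(ones, len(w) + k + 1) >= alpha:  # another 0 keeps density >= alpha
            k += 1
        w += "0" * k + "1"
        ones += 1
    return w[:n]


def upper_mechanical(alpha: Fraction, n: int) -> str:
    """First n letters of s'_{alpha,0}: ceil(alpha k) - ceil(alpha (k-1))."""
    return "".join(str(math.ceil(alpha * k) - math.ceil(alpha * (k - 1)))
                   for k in range(1, n + 1))


def is_prefix_normal(w: str) -> bool:
    n = len(w)
    return all(w[s:s + L].count("1") <= w[:L].count("1")
               for L in range(1, n + 1) for s in range(1, n - L + 1))


slopes = [Fraction(2, 5), Fraction(1, 3), Fraction(3, 7), Fraction(5, 8), Fraction(1)]
for a in slopes:
    p = lazy_flipext_prefix(a, 60)
    print(a, p == upper_mechanical(a, 60), is_prefix_normal(p))  # ... True True
print(lazy_flipext_prefix(Fraction(2, 5), 15))  # 101001010010100
```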

The following theorem fully characterizes those Sturmian words which are prefix normal.

Theorem 2 A Sturmian word s of slope α is prefix normal if and only if s = 1c_α, where c_α is the characteristic Sturmian word with slope α.

Proof. By definition, α is irrational. Let $s = s_{α, 0}^{'}$. By Fact 1, s = 1c_α and s is Sturmian, and by Corollary 2, s is prefix normal. Let t be a Sturmian word with the same slope α which is also prefix normal. By Fact 1, s and t have the same factors.

Assume, by contradiction, that s ≠ t, and let i ≥ 1 be the first position where s and t differ; then ∣s₁ ⋯ s_i∣₁ ≠ ∣t₁ ⋯ t_i∣₁. Assume, without loss of generality (since we can, if necessary, swap s and t in the following argument), that ∣s₁ ⋯ s_i∣₁ > ∣t₁ ⋯ t_i∣₁. Then, since s₁ ⋯ s_i is also a factor of t, there is a j ≥ 1 such that t_{j+1} ⋯ t_{j+i} = s₁ ⋯ s_i, hence ∣t_{j+1} ⋯ t_{j+i}∣₁ > ∣t₁ ⋯ t_i∣₁, contradicting the assumption that t is prefix normal. Therefore t = s = 1c_α. □

5. Prefix normal words, prefix normal forms, and abelian complexity

Given an infinite word w, the abelian complexity function of w, denoted ψ_w, is given by ψ_w (n) = ∣{pv(u) ∣ u ∈ Fct(w), ∣u∣ = n}∣, the number of Parikh vectors of n-length factors of w. A word w is said to have bounded abelian complexity if there exists a c s.t. for all n, ψ_w(n) ≤ c. Note that a binary word is c-balanced if and only if its abelian complexity is bounded by c + 1. We denote the set of Parikh vectors of factors of a word w by Π(w) = {pv(u) ∣ u ∈ Fct(w)}. Thus, ψ_w(n) = ∣Π(w) ∩ {(x, y) ∣ x + y = n}∣. In this section, we study the connection between prefix normal words and abelian complexity.
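On finite prefixes, ψ can be computed by collecting the Parikh vectors of all length-n factors. The sketch below (function name ours) does this for prefixes of the Fibonacci and Thue-Morse words. The printed values agree with abelian complexity 2 for the Sturmian Fibonacci word, and with the alternation between 2 (odd n) and 3 (even n) for Thue-Morse. Since the weights of the length-n factors form an interval, a finite word is c-balanced exactly when all of these values are at most c + 1.

```python
def abelian_complexity(w: str, n: int) -> int:
    """psi_w(n) for a finite word w: number of distinct Parikh vectors
    (|u|_0, |u|_1) among the factors u of w of length n."""
    vectors = {(w[i:i + n].count("0"), w[i:i + n].count("1"))
               for i in range(len(w) - n + 1)}
    return len(vectors)


fib = "0100101001001010010100100101001001"  # prefix of the Fibonacci word
tm = "01101001100101101001011001101001"     # prefix of the Thue-Morse word
print([abelian_complexity(fib, n) for n in range(1, 9)])  # [2, 2, 2, 2, 2, 2, 2, 2]
print([abelian_complexity(tm, n) for n in range(1, 9)])   # [2, 3, 2, 3, 2, 3, 2, 3]
```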

5.1. Balanced and c-balanced words.

Based on the examples in the introduction, one could conclude that any word with bounded abelian complexity can be turned into a prefix normal word by prepending a fixed number of 1s. However, consider the word w = 01^ω, which is balanced, i.e., its abelian complexity function is bounded by 2. It is easy to see that $1^{k} w \notin L$ for every $k \in N$: the prefix $1^{k}01$ of $1^{k}w$ has length k + 2 but only k + 1 1s, while $1^{k+2}$ is a factor of w.

Sturmian words are precisely the words which are aperiodic and whose abelian complexity is constant 2 [27]. For Sturmian words, it is always possible to prepend a finite number of 1s to get a prefix normal word, as we will see next. Recall that for a Sturmian word w, at least one of 0w and 1w is Sturmian, with both being Sturmian if and only if w is characteristic [23].

Lemma 4 Let w be a Sturmian word with slope α. Then

1. $1 w \in L$ if and only if 0w is Sturmian;
2. if 0w is not Sturmian, then $1^{n} w \in L$ for n = ⌈1/(1 − α)⌉.
Proof. 1. Let 0w be Sturmian and let u be some factor of 1w. If u is a prefix of 1w, there is nothing to show; therefore let u ∈ Fct(w), with ∣u∣ = n and ∣u∣₁ = k. Since 0w is Sturmian, and hence balanced, the prefix of 0w of length n has at least k − 1 1s; thus P_{1w}(n) ≥ k = ∣u∣₁, as desired. Conversely, if 0w is not Sturmian, then it is not balanced; therefore there exists a factor u of w s.t. ∣∣u∣₁ − ∣0w₁ ⋯ w_{n−1}∣₁∣ ≥ 2, where ∣u∣ = n. Since w is Sturmian, we have ∣∣w₁ ⋯ w_{n−1}∣₁ − ∣u₁ ⋯ u_{n−1}∣₁∣ ≤ 1 and ∣∣w₁ ⋯ w_{n−1}∣₁ − ∣u₂ ⋯ u_n∣₁∣ ≤ 1. Let ∣w₁ ⋯ w_{n−1}∣₁ = k; then a case-by-case consideration shows that ∣u₁ ⋯ u_{n−1}∣₁ = ∣u₂ ⋯ u_n∣₁ = k + 1, and thus ∣1w₁ ⋯ w_{n−1}∣₁ = k + 1 < k + 2 = ∣u∣₁, showing that 1w is not prefix normal.
2. First note that a Sturmian word of slope α cannot have a run of 1s of length ⌈1/(1 − α)⌉. To see this, it is enough to consider the upper mechanical word of slope α and intercept 0 (since all Sturmian words with the same slope have the same set of factors). Let us write $s = s_{α, 0}^{'} = s_{1} s_{2} \cdots$.

Now s has a run of n 1s if and only if there exists an i ≥ 0 such that s_{i+1} = s_{i+2} = ⋯ = s_{i+n} = 1. By the definition of mechanical words, this condition is equivalent to

⌈ α (i + n) ⌉ - ⌈ α i ⌉ = n .

On the other hand, if $n \geq \frac{1}{1 - α}$, i.e., $α \leq \frac{n - 1}{n}$, then the sum of the characters $\sum_{j = 1}^{n} s_{i + j}$ satisfies

\sum_{j = 1}^{n} s_{i + j} = ⌈ α (i + n) ⌉ - ⌈ α i ⌉ \leq ⌈ α i ⌉ + ⌈ α n ⌉ - ⌈ α i ⌉ = ⌈ α n ⌉ < α n + 1 \leq \frac{n - 1}{n} \times n + 1 = n .

that is, the sum is strictly smaller than n, so s_{i+1} ⋯ s_{i+n} ≠ 1ⁿ, a contradiction.

Now fix n = ⌈1/(1 − α)⌉ and let w′ = 1ⁿw. Let u ∈ Fct(w). If ∣u∣ ≤ n, there is nothing to show, since the prefix of w′ of length ∣u∣ is $1^{|u|}$. So let ∣u∣ = n + m with m ≥ 1. Since, as shown above, 1ⁿ is not a factor of w, we have ∣u₁ ⋯ u_n∣₁ ≤ n − 1; and since w is balanced, we have ∣w₁ ⋯ w_m∣₁ ≥ ∣u_{n+1} ⋯ u_{n+m}∣₁ − 1, yielding P_{w′}(n + m) ≥ n + ∣u_{n+1} ⋯ u_{n+m}∣₁ − 1 ≥ ∣u∣₁. □
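Both parts of the argument can be checked numerically on prefixes of mechanical words with irrational slopes, which are Sturmian by Fact 1. For each slope, the sketch below computes n = ⌈1/(1 − α)⌉. It then confirms that no run of n 1s occurs and that 1ⁿw passes the prefix normality check. If 0w happens to be Sturmian, part 1 already gives 1w ∈ L, and prepending further 1s to a prefix normal word keeps it prefix normal. The slopes, the intercepts and the floating-point evaluation are our own choices for illustration.

```python
import math


def lower_mechanical(alpha: float, tau: float, n: int) -> str:
    """First n letters of s_{alpha,tau} (Sturmian when alpha is irrational)."""
    return "".join(
        str(math.floor(alpha * k + tau) - math.floor(alpha * (k - 1) + tau))
        for k in range(1, n + 1)
    )


def longest_run_of_ones(w: str) -> int:
    return max(len(run) for run in w.split("0"))


def is_prefix_normal(w: str) -> bool:
    ones = [0]
    for ch in w:
        ones.append(ones[-1] + (ch == "1"))
    n = len(w)
    return all(ones[s + L] - ones[s] <= ones[L]
               for L in range(1, n + 1) for s in range(1, n - L + 1))


cases = [(math.sqrt(2) - 1, 0.3), (math.sqrt(2) / 2, 0.5), ((3 - math.sqrt(5)) / 2, 0.0)]
for alpha, tau in cases:
    n = math.ceil(1 / (1 - alpha))
    w = lower_mechanical(alpha, tau, 400)
    print(n, longest_run_of_ones(w) < n, is_prefix_normal("1" * n + w))
# 2 True True
# 4 True True
# 2 True True
```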

Lemma 5 Let w be a c-balanced word. If there exists a positive integer n s.t. 1ⁿ ∉ Fct(w), then the word $z = 1^{nc} w$ is prefix normal.

Proof. We are going to show that every factor u of z satisfies the prefix normal condition ∣u∣₁ ≤ P_z(∣u∣). It is not hard to see that we can limit ourselves to only considering factors u such that u does not overlap with the prefix of z of the same length.

If ∣u∣ ≤ nc, then ∣u∣₁ ≤ ∣u∣ = P_z(∣u∣). Assume now that u = u′u″ with ∣u′∣ = nc and ∣u″∣ > 0. Since u′ is a factor of w of length nc, the condition that w does not contain the factor 1ⁿ implies that u′ contains at least c 0s, i.e., ∣u′∣₁ ≤ ∣u′∣ − c. Moreover, since w is c-balanced, we have ∣u″∣₁ ≤ P_w(∣u″∣) + c. Therefore, observing that pref_z(∣u∣) = pref_z(∣u′∣ + ∣u″∣) = $1^{nc}$ pref_w(∣u″∣), we have P_z(∣u∣) = nc + P_w(∣u″∣) ≥ ∣u′∣₁ + ∣u″∣₁ = ∣u∣₁. □

In particular, Lemma 5 implies that any c-balanced word with infinitely many 0s can be turned into a prefix normal word by prepending a finite number of 1s, since such a word cannot have arbitrarily long runs of 1s. Note, however, that the number of 1s to prepend given by Lemma 5 is not tight. The Thue-Morse word t shows this: the longest run of 1s in t is 2 and t is 2-balanced, so Lemma 5 (with n = 3 and c = 2) guarantees that 1⁶t is prefix normal, whereas already 11t is prefix normal, as will be shown in the next section (Lemma 8).
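The gap between the bound of Lemma 5 and the actual number of 1s needed for the Thue-Morse word can be seen on a prefix. In the sketch below (names ours), the longest run of 1s and the balance constant are measured on the first 512 letters, and the resulting candidate words are then checked. As elsewhere, a finite check is only consistent with, not a proof of, the statements above.

```python
def thue_morse(n: int) -> str:
    """t_i = parity of the number of 1s in the binary expansion of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def longest_run_of_ones(w: str) -> int:
    return max(len(run) for run in w.split("0"))


def prefix_sums(w: str) -> list:
    ones = [0]
    for ch in w:
        ones.append(ones[-1] + (ch == "1"))
    return ones


def balance_constant(w: str) -> int:
    """Smallest c such that the finite word w is c-balanced."""
    ones, n, c = prefix_sums(w), len(w), 0
    for L in range(1, n + 1):
        weights = [ones[i + L] - ones[i] for i in range(n - L + 1)]
        c = max(c, max(weights) - min(weights))
    return c


def is_prefix_normal(w: str) -> bool:
    ones, n = prefix_sums(w), len(w)
    return all(ones[s + L] - ones[s] <= ones[L]
               for L in range(1, n + 1) for s in range(1, n - L + 1))


t = thue_morse(512)
n = longest_run_of_ones(t) + 1  # smallest n with 1^n not a factor of this prefix
c = balance_constant(t)
print(n, c)  # 3 2
print(is_prefix_normal("1" * (n * c) + t), is_prefix_normal("11" + t),
      is_prefix_normal("1" + t))  # True True False
```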

5.2. Prefix normal forms and abelian complexity.

Recall that for a word w, $F_{w}^{a} (i)$ is the maximum number of a’s in a factor of w of length i, for a ∈ {0, 1}.

Definition 8 (Prefix normal forms) Let w ∈ {0, 1}^ω. Define the words w′ and w″ by setting, for n ≥ 1, $w_{n}^{'} = F_{w}^{1} (n) - F_{w}^{1} (n - 1)$ and $w_{n}^{″} = \overline{F_{w}^{0} (n) - F_{w}^{0} (n - 1)}$, with the convention $F_{w}^{1}(0) = F_{w}^{0}(0) = 0$. We refer to w′ as the prefix normal form of w w.r.t. 1 and to w″ as the prefix normal form of w w.r.t. 0, denoted PNF₁(w) and PNF₀(w), respectively.

In other words, PNF₁(w) is the sequence of first differences of the maximum-1s function $F_{w}^{1}$ of w. Similarly, PNF₀(w) can be obtained by complementing the sequence of first differences of the maximum-0s function $F_{w}^{0}$ of w. Note that for all n and a ∈ {0, 1}, either $F_{w}^{a} (n + 1) = F_{w}^{a} (n)$ or $F_{w}^{a} (n + 1) = F_{w}^{a} (n) + 1$ , and therefore w′ and w″ are words over the alphabet {0, 1}. In particular, by construction, the two prefix normal words allow us to recover the maximum-1s and minimum-1s functions of w:

Observation 1 Let w be an infinite binary word and w′ = PNF₁(w), w″ = PNF₀(w). Then $P_{w^{'}} (n) = F_{w}^{1} (n)$ and $P_{w^{″}} (n) = n - F_{w}^{0} (n) = f_{w}^{1} (n)$ .

Lemma 6 Let w ∈ {0, 1}^ω. Then PNF₁(w) is the unique 1-prefix normal word w′ s.t. for all $i \in N, F_{w^{'}}^{1} (i) = F_{w}^{1} (i)$ . Similarly, PNF₀(w) is the unique 0-prefix normal word w″ s.t. for all $i \in N, F_{w^{″}}^{0} (i) = F_{w}^{0} (i)$ .

Proof. Let w′ = PNF₁(w) and w″ = PNF₀(w). First note that, by construction, for all $i \in N$ , $F_{w^{'}}^{1} (i) = F_{w}^{1} (i)$ and $F_{w^{″}}^{0} (i) = F_{w}^{0} (i)$ . It is easy to see that w′ is 1-prefix normal and w″ is 0-prefix normal. For uniqueness, note that for a ∈ {0, 1} and an a-prefix normal word v, we have PNF_a(v) = v. □

Example 4 The two prefix normal forms and the maximum-1s and maximum-0s functions of the Fibonacci word f = 01001010010010100101 ⋯ are given in Table 1.

Table 1:

The maximum number of 0s and 1s ( $F_{f}^{0} (n)$ and $F_{f}^{1} (n)$ resp.) for all n = 1,…, 20 of the Fibonacci word f, and the prefix normal forms of f.

n	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
$F_{f}^{0} (n)$	1	2	2	3	4	4	5	5	6	7	7	8	9	9	10	10	11	12	12	13
$F_{f}^{1} (n)$	1	1	2	2	2	3	3	4	4	4	5	5	5	6	6	7	7	7	8	8
PNF₀(f)	0	0	1	0	0	1	0	1	0	0	1	0	0	1	0	1	0	0	1	0
PNF₁(f)	1	0	1	0	0	1	0	1	0	0	1	0	0	1	0	1	0	0	1	0
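The first differences in Definition 8 can be evaluated by brute force on a finite prefix. The following Python sketch is a naive sliding-window computation, not an algorithm from this paper, and its function names are hypothetical. It assumes that the prefix is long enough to contain every factor of the lengths considered; otherwise it only yields lower bounds for $F_{w}^{a} (n)$. For the Fibonacci word, which is uniformly recurrent, a prefix of length 1000 should comfortably suffice to reproduce the rows of Table 1.

```python
from __future__ import annotations

# Illustrative sketch; helper names are hypothetical.


def max_count(word: str, n: int, symbol: str) -> int:
    """Largest number of `symbol` in a length-n factor of the finite string `word`."""
    window = word[:n].count(symbol)
    best = window
    for i in range(n, len(word)):
        # Slide the window one position to the right.
        window += (word[i] == symbol) - (word[i - n] == symbol)
        best = max(best, window)
    return best


def prefix_normal_forms(word: str, length: int) -> tuple[str, str]:
    """Length-`length` prefixes of PNF_1 and PNF_0, using the convention F^a(0) = 0."""
    pnf1, pnf0 = [], []
    prev1 = prev0 = 0
    for n in range(1, length + 1):
        f1, f0 = max_count(word, n, "1"), max_count(word, n, "0")
        pnf1.append(str(f1 - prev1))        # first difference of F^1
        pnf0.append(str(1 - (f0 - prev0)))  # complemented first difference of F^0
        prev1, prev0 = f1, f0
    return "".join(pnf1), "".join(pnf0)


def fibonacci_prefix(length: int) -> str:
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


f = fibonacci_prefix(1000)
pnf1, pnf0 = prefix_normal_forms(f, 20)
print(pnf1)  # 10100101001001010010
print(pnf0)  # 00100101001001010010
```

The quadratic cost is irrelevant at this scale; the point is only that Observation 1 holds by construction, since the prefix sums of the two strings are $F_{f}^{1} (n)$ and $n - F_{f}^{0} (n)$.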


Now we can connect the prefix normal forms of w to the abelian complexity of w in the following way. Given w′ = PNF₁(w) and w″ = PNF₀(w), the number of Parikh vectors of k-length factors is precisely 1 more than the difference between the number of 1s in the prefix of length k of w′ and in that of w″. For example, Fig. 2 shows the prefix normal forms of the Fibonacci word. The vertical line at 5 cuts through the points (5, −1) and (5, −3): the first component stands for the length of the prefix, the second for the number of 1s minus the number of 0s in it, thus indicating the Parikh vectors (2, 3) and (1, 4).

Figure 2: — The Fibonacci word (dashed) and its prefix normal forms (solid).

The Fibonacci word, being a Sturmian word, has constant abelian complexity 2. An example of a word with unbounded abelian complexity is the Champernowne word, whose prefix normal forms are 1^ω and 0^ω, respectively (Fig. 3).

Figure 3: — The Champernowne word (dashed) and its prefix normal forms (solid).

Theorem 3 Let w, v ∈ {0, 1}^ω. Then

1. ψ_w(n) = P_{w′}(n) − P_{w″}(n) + 1, where w′ = PNF₁(w) and w″ = PNF₀(w);
2. Π(w) = Π(v) if and only if PNF₀(w) = PNF₀(v) and PNF₁(w) = PNF₁(v).

Proof. 1. Fix an integer n ≥ 1. By definition, every factor u of w of length n satisfies $n - F_{w}^{0} (n) \leq ∣ u ∣_{1} \leq F_{w}^{1} (n)$ . Since the Parikh vector of a factor of length n is determined by its number of 1s, it follows that $ψ_{w} (n) \leq F_{w}^{1} (n) - (n - F_{w}^{0} (n)) + 1$ .

Conversely, w contains a factor u′ of length n with $F_{w}^{1} (n)$ many 1s and a factor u″ of length n with $n - F_{w}^{0} (n)$ many 1s. If we slide a window of length n from an occurrence of u′ to an occurrence of u″ (or vice versa), the number of 1s in the window changes by at most one at each step, so for each x ∈ {∣u″∣₁,…, ∣u′∣₁} there must be a factor u‴ of length n such that ∣u‴∣₁ = x. Therefore $ψ_{w} (n) \geq F_{w}^{1} (n) - (n - F_{w}^{0} (n)) + 1$ . We can conclude that $ψ_{w} (n) = F_{w}^{1} (n) - (n - F_{w}^{0} (n)) + 1$ . The desired result then follows by observing that $n - F_{w}^{0} (n) = n - ∣ {pref}_{{PNF}_{0} (w)} (n) ∣_{0} = P_{{PNF}_{0} (w)} (n)$ and $F_{w}^{1} (n) = P_{{PNF}_{1} (w)} (n)$ .

2. Follows directly from Observation 1. □
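Part 1 of Theorem 3 can be checked numerically on a finite word, to which the same sliding-window argument applies. The Python sketch below is a brute-force illustration with hypothetical helper names: it computes ψ once from the prefix normal forms and once by counting the distinct numbers of 1s among the length-n factors, which for fixed n is the same as counting Parikh vectors. As input it reuses the finite word w of Figure 1.

```python
from __future__ import annotations

# Illustrative sketch; helper names are hypothetical.


def max_count(word: str, n: int, symbol: str) -> int:
    """Largest number of `symbol` in a length-n factor of `word` (sliding window)."""
    window = word[:n].count(symbol)
    best = window
    for i in range(n, len(word)):
        window += (word[i] == symbol) - (word[i - n] == symbol)
        best = max(best, window)
    return best


def pnf_prefixes(word: str, length: int) -> tuple[str, str]:
    """Prefixes of PNF_1 and PNF_0 (Definition 8), computed from `word` by brute force."""
    f1 = [0] + [max_count(word, n, "1") for n in range(1, length + 1)]
    f0 = [0] + [max_count(word, n, "0") for n in range(1, length + 1)]
    pnf1 = "".join(str(f1[n] - f1[n - 1]) for n in range(1, length + 1))
    pnf0 = "".join(str(1 - (f0[n] - f0[n - 1])) for n in range(1, length + 1))
    return pnf1, pnf0


def psi_via_pnf(pnf1: str, pnf0: str, n: int) -> int:
    """Theorem 3, part 1: psi_w(n) = P_{w'}(n) - P_{w''}(n) + 1."""
    return pnf1[:n].count("1") - pnf0[:n].count("1") + 1


def psi_direct(word: str, n: int) -> int:
    """Distinct Parikh vectors of length-n factors; for fixed n, |u|_1 determines them."""
    return len({word[i:i + n].count("1") for i in range(len(word) - n + 1)})


w = "1101101100100010000001"  # the finite word used in Figure 1
p1, p0 = pnf_prefixes(w, len(w))
print(all(psi_via_pnf(p1, p0, n) == psi_direct(w, n) for n in range(1, len(w) + 1)))  # True
```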

Theorem 3 implies that if we know the prefix normal forms of a word, then we can compute its abelian complexity. Conversely, the abelian complexity is the width of the area enclosed by the two words PNF₁(w) and PNF₀(w). In general, this fact alone does not give us the PNFs; but if we know more about the word itself, then we may be able to compute the prefix normal forms, as we will see in the case of the paperfolding word.

We will now give two examples of the close connection between abelian complexity and prefix normal forms, using some recent results about the abelian complexity of infinite words.

5.2.1. The paperfolding word

The first few characters of the ordinary paperfolding word are given by

p = 0010011000110110001001110011011 \dots

The paperfolding word was originally introduced in [17]. One definition is given by: $p_{n} = 0$ if n′ ≡ 1 mod 4 and $p_{n} = 1$ if n′ ≡ 3 mod 4, where n′ is the unique odd integer such that n = n′2^k for some k [24]. The abelian complexity function of the paperfolding word was fully determined in [24], giving the following initial values for $ψ_{p} (n)$ , for n ≥ 1: 2, 3, 4, 3, 4, 5, 4, 3, 4, 5, 6, 5, 4, 5, 4, 3, 4, 5, 6, 5, and a recursive formula for the computation of all values. The authors note that for the paperfolding word, it holds that if $u \in F c t (p)$ , then also $\bar{u^{rev}} \in F c t (p)$ . This implies

F_{p}^{1} (n) = F_{p}^{0} (n) \text{ for all } n, \text{ and thus } \mathrm{PNF}_{0} (p) = \overline{\mathrm{PNF}_{1} (p)} .

Moreover, from Thm. 3 we get that $F_{p}^{1} (n) = P_{{PNF}_{1} (p)} (n) = (ψ_{p} (n) + n - 1) ∕ 2$ , and thus we can determine the prefix normal forms of $p$ , see Fig. 4.
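This computation can be sanity-checked by brute force. The Python sketch below generates a prefix of p directly from the definition via odd parts, computes $F_{p}^{1}$ and $F_{p}^{0}$ on it, and recovers ψ_p(n) = 2F_p^1(n) − n + 1, which is Thm. 3 combined with $F_{p}^{1} = F_{p}^{0}$. It rests on the assumption that a prefix of length 2^14 contains every factor of length at most 20 (p is uniformly recurrent, so a long enough prefix suffices); the helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.


def paperfolding_prefix(length: int) -> str:
    """p_n = 0 if the odd part of n is 1 mod 4, and p_n = 1 if it is 3 mod 4 (n >= 1)."""
    letters = []
    for n in range(1, length + 1):
        odd = n
        while odd % 2 == 0:
            odd //= 2
        letters.append("0" if odd % 4 == 1 else "1")
    return "".join(letters)


def max_count(word: str, n: int, symbol: str) -> int:
    """Largest number of `symbol` in a length-n factor of `word` (sliding window)."""
    window = word[:n].count(symbol)
    best = window
    for i in range(n, len(word)):
        window += (word[i] == symbol) - (word[i - n] == symbol)
        best = max(best, window)
    return best


p = paperfolding_prefix(1 << 14)
N = 20
F1 = [max_count(p, n, "1") for n in range(1, N + 1)]
F0 = [max_count(p, n, "0") for n in range(1, N + 1)]
psi = [2 * F1[n - 1] - n + 1 for n in range(1, N + 1)]
pnf1 = "".join(str(b - a) for a, b in zip([0] + F1, F1))        # PNF_1(p)
pnf0 = "".join(str(1 - (b - a)) for a, b in zip([0] + F0, F0))  # PNF_0(p)
print(F1 == F0)  # True: the symmetry used in Lemma 7
print(psi)       # compare with the initial values of psi_p quoted above from [24]
```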

Figure 4: — The paperfolding word (dashed) and its prefix normal forms (solid).

The same argument applies to any word with a symmetry property like that of the paperfolding word. We have therefore proved the following lemma.

Lemma 7 Let w ∈ {0, 1}^ω. If for all $u \in F c t (w)$ , it holds that $\bar{u} \in F c t (w)$ or $\bar{u^{rev}} \in F c t (w)$ , then $F_{w}^{1} (n) = F_{w}^{0} (n)$ for all n, ${PNF}_{0} (w) = \bar{{PNF}_{1} (w)}$ , and $F_{w}^{1} (n) = (ψ_{w} (n) + n - 1) ∕ 2$ .

5.2.2. Morphic images under the Thue-Morse morphism

The Thue-Morse word beginning with 0, which we denote by t, is one of the two fix points of the Thue-Morse morphism μ_TM, where μ_TM(0) = 01 and μ_TM(1) = 10:

t = μ_{TM}^{ω} (0) = 01101001100101101001011001101001 \dots

The word t has abelian complexity function ψ_t(n) = 2 for n odd and ψ_t(n) = 3 for n > 1 even [27]. Since t fulfils the condition that u ∈ Fct(t) implies $\bar{u} \in F c t (t)$ , we can apply Lemma 7, and compute the prefix normal forms of t as PNF₁(t) = 1(10)^ω and PNF₀(t) = 0(01)^ω, see Fig. 5.
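These prefix normal forms can also be confirmed by brute force on a prefix. The Python sketch below generates t through the standard characterization t_n = (number of 1s in the binary expansion of n) mod 2 for n ≥ 0, which gives the same word as iterating μ_TM from 0. It assumes that a prefix of length 2^12 contains all factors of length at most 20; the helper names are hypothetical. By Observation 1, the prefix sums of PNF₁(t) and PNF₀(t) are the maximum and the minimum number of 1s in a length-n factor, so both forms can be read off from the same sliding window.

```python
from __future__ import annotations

# Illustrative sketch; helper names are hypothetical.


def thue_morse_prefix(length: int) -> str:
    """t_n = parity of the number of 1s in the binary expansion of n, for n >= 0."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def ones_range(word: str, n: int) -> tuple[int, int]:
    """Minimum and maximum number of 1s over the length-n factors of `word`."""
    window = word[:n].count("1")
    lo = hi = window
    for i in range(n, len(word)):
        window += (word[i] == "1") - (word[i - n] == "1")
        lo, hi = min(lo, window), max(hi, window)
    return lo, hi


t = thue_morse_prefix(1 << 12)  # assumed to contain every factor of length <= N
N = 20
ranges = [(0, 0)] + [ones_range(t, n) for n in range(1, N + 1)]
# P_{PNF1(t)}(n) = F^1_t(n) (max 1s);  P_{PNF0(t)}(n) = n - F^0_t(n) (min 1s)
pnf1 = "".join(str(ranges[n][1] - ranges[n - 1][1]) for n in range(1, N + 1))
pnf0 = "".join(str(ranges[n][0] - ranges[n - 1][0]) for n in range(1, N + 1))
psi = [ranges[n][1] - ranges[n][0] + 1 for n in range(1, N + 1)]
print(pnf1)     # 11010101010101010101
print(pnf0)     # 00101010101010101010
print(psi[:6])  # [2, 3, 2, 3, 2, 3]
```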

Figure 5: — The Thue-Morse word (dashed) and its prefix normal forms (solid).

For the proof of the abelian complexity of t in [27], the Parikh vectors were computed for each length, so we do not really need Lemma 7 but could have obtained the prefix normal forms directly. Moreover, a much more general result was given in [27]:

Theorem 4 ([27]) Let w be an aperiodic infinite binary word. Then ψ_w = ψ_t if and only if w = μ_TM(w′), w = 0μ_TM(w′), or w = 1μ_TM(w′), for some word w′.

The abelian complexity function does not in general determine the prefix normal forms, as can be seen on the example of Sturmian words, which all have the same abelian complexity function but different prefix normal forms. However, ψ_t does determine them, due to its values ψ_t(n) = 2 for n odd and ψ_t(n) = 3 for n even, and to the fact that both $F_{t}^{1}$ and $F_{t}^{0}$ have difference functions with values in {0, 1}: notice that the only pair of such functions whose width is 2 for odd n and 3 for even n is the pair of PNFs of t. Therefore, we can deduce the following from Theorem 4:

Corollary 3 For an aperiodic infinite binary word w, PNF₁(w) = 1(10)^ω and PNF₀(w) = 0(01)^ω if and only if w = μ_TM(w′), w = 0μ_TM(w′), or w = 1μ_TM(w′), for some word w′.

To conclude this section, we return to the question of how many 1s need to be prepended to make the Thue-Morse word prefix normal.

Lemma 8 We have $11 t \in L$ . Moreover, this is minimal since 1t is not prefix normal.

Proof. We will show that for every n, the number of 1s in the prefix of length n of 11t is greater than or equal to the number of 1s in the prefix of PNF₁(t) of the same length. Let v = PNF₁(t) and u = 11t. It is easy to see that $P_{v} (n) = ⌊ \frac{n}{2} ⌋ + 1$ and

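P_{u} (n) = \begin{cases} 1 & \text{if } n = 1, \\ \frac{n}{2} + 1 & \text{if } n \text{ is even}, \\ \lfloor \frac{n}{2} \rfloor + 2 & \text{if } n \geq 3 \text{ is odd and } u_{n} = 1, \\ \lfloor \frac{n}{2} \rfloor + 1 & \text{if } n \geq 3 \text{ is odd and } u_{n} = 0. \end{cases}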

Thus for all n ≥ 1 it holds that P_u(n) ≥ P_v(n), implying that $11 t \in L$ .

For minimality, note that 1t is not prefix normal, since 11 is a factor of t.
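Lemma 8 can also be probed on finite prefixes: every prefix of a prefix normal word is prefix normal, while a single violating factor in a prefix of 1t already witnesses that 1t is not prefix normal. The Python sketch below uses a quadratic-time test for finite words. It is only a sanity check on a prefix of length 512 (an arbitrary choice), not a proof, and the helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.


def thue_morse_prefix(length: int) -> str:
    """t_n = parity of the number of 1s in the binary expansion of n, for n >= 0."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def is_prefix_normal(word: str) -> bool:
    """True iff no factor of the finite word has more 1s than the prefix of equal length."""
    for n in range(1, len(word) + 1):
        prefix_ones = word[:n].count("1")
        window = prefix_ones
        for i in range(n, len(word)):
            window += (word[i] == "1") - (word[i - n] == "1")
            if window > prefix_ones:
                return False
    return True


t = thue_morse_prefix(512)
u = "11" + t
print(is_prefix_normal(u[:512]))        # True, consistent with 11t being prefix normal
print(is_prefix_normal(("1" + t)[:4]))  # False: 1011 has factor 11, but prefix 10
print(all(u[:n].count("1") >= n // 2 + 1 for n in range(1, 513)))  # True: P_u(n) >= P_v(n)
```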

5.3. Prefix normal forms of Sturmian words.

Let w be a Sturmian word. As we saw in Sec. 4, the only 1-prefix normal word in the class of Sturmian words with the same slope α is the upper mechanical word $s_{α, 0}^{'} = 1 c_{α}$ .

Theorem 5 Let w be an irrational mechanical word with slope α, i.e. a Sturmian word. Then PNF₁(w) = 1c_α and PNF₀(w) = 0c_α, where c_α is the characteristic word of slope α.

Proof. Since the characteristic word c_α has the same slope as w, we have Fct(w) = Fct(c_α) by Fact 1. The abelian complexity of w is constant 2 [27], thus a factor of length k can have either $F_{w}^{1} (k)$ or $F_{w}^{1} (k) - 1$ 1s. Let us call a factor u of w heavy if $∣ u ∣_{1} = F_{w}^{1} (k)$ , and light otherwise. We have to show that every prefix of 1c_α is heavy; this will imply that 1c_α is the prefix normal form of w. It is known [23] that the prefixes of the characteristic word are precisely the reverses of its right special factors, where a factor u is called right special if both u0 and u1 are factors. Thus, every prefix v of 1c_α has the form v = 1u^rev, where both u1 and u0 are factors of w. Since u1 and u0 have the same length ∣u∣ + 1 and differ by exactly one in their number of 1s, and only the two values $F_{w}^{1} (∣ u ∣ + 1)$ and $F_{w}^{1} (∣ u ∣ + 1) - 1$ are possible, u1 must be heavy. This implies that $∣ v ∣_{1} = ∣ 1 u^{rev} ∣_{1} = ∣ u 1 ∣_{1} = F_{w}^{1} (∣ u ∣ + 1)$ , therefore v = 1u^rev is heavy. The fact that PNF₀(w) = 0c_α follows analogously. □
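Theorem 5 can be illustrated numerically for a concrete slope. The Python sketch below takes α = √2 − 1 and, as an arbitrary example, intercept ρ = 3/10. It builds the lower mechanical word s_{α,ρ}(n) = ⌊(n + 1)α + ρ⌋ − ⌊nα + ρ⌋ (n ≥ 0) with exact integer arithmetic via `math.isqrt`, so that no floating-point rounding can corrupt the floors, and compares its brute-force prefix normal forms with 1c_α and 0c_α, where c_α(n) = ⌊(n + 1)α⌋ − ⌊nα⌋ for n ≥ 1. The slope, the intercept, the prefix length 3000 (assumed to contain all factors of length at most 30) and the helper names are all choices of this example.

```python
from fractions import Fraction
from math import isqrt

# Illustrative sketch; slope, intercept and helper names are hypothetical choices.


def floor_sqrt2_affine(n: int, rho: Fraction) -> int:
    """Exact floor(n * sqrt(2) + rho) for an integer n >= 0 and a rational rho."""
    p, q = rho.numerator, rho.denominator
    # floor(n*sqrt(2) + p/q) = floor((floor(q*n*sqrt(2)) + p) / q)
    return (isqrt(2 * (q * n) ** 2) + p) // q


def mechanical_sqrt2(rho: Fraction, start: int, length: int) -> str:
    """Letters s(n), start <= n < start + length, of the lower mechanical word
    with slope alpha = sqrt(2) - 1 and intercept rho."""
    def floor_alpha(n: int) -> int:  # floor(n * alpha + rho), with n*alpha = n*sqrt(2) - n
        return floor_sqrt2_affine(n, rho) - n
    return "".join(str(floor_alpha(n + 1) - floor_alpha(n)) for n in range(start, start + length))


def max_count(word: str, n: int, symbol: str) -> int:
    """Largest number of `symbol` in a length-n factor of `word` (sliding window)."""
    window = word[:n].count(symbol)
    best = window
    for i in range(n, len(word)):
        window += (word[i] == symbol) - (word[i - n] == symbol)
        best = max(best, window)
    return best


c_alpha = mechanical_sqrt2(Fraction(0), start=1, length=29)   # characteristic word c_alpha
w = mechanical_sqrt2(Fraction(3, 10), start=0, length=3000)  # another Sturmian word of slope alpha
F1 = [0] + [max_count(w, n, "1") for n in range(1, 31)]
F0 = [0] + [max_count(w, n, "0") for n in range(1, 31)]
pnf1 = "".join(str(F1[n] - F1[n - 1]) for n in range(1, 31))
pnf0 = "".join(str(1 - (F0[n] - F0[n - 1])) for n in range(1, 31))
print(pnf1 == "1" + c_alpha, pnf0 == "0" + c_alpha)  # True True
```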

5.4. Prefix normal forms of binary uniform morphisms

In [5] the authors provide an algorithm which computes the abelian complexity of a morphic word that is the fix point of a binary uniform morphism, i.e., a morphism μ satisfying ∣μ(0)∣ = ∣μ(1)∣. We refer the reader to [5] for the details on this algorithm. In particular, the following theorem is proved in [5]:

Theorem 6 ([5]) Let w be the fix point of a binary uniform morphism μ. Then, for each n the values ψ_w(1), ψ_w(2), … , ψ_w(n), can be computed in O(n) time.

As an intermediate step in the computation of each ψ_w(i), the algorithm in [5] provides the minimum number of 0s (equivalently, the maximum number of 1s) in every i-length factor of w. Obviously the same procedure can be used to obtain the minimum number of 1s (equivalently, the maximum number of 0s) in every i-length factor of w. Therefore, we have the following corollary to the result of [5]:

Corollary 4 Let w be the fix point of a binary uniform morphism μ. For each n, the prefix of length n of PNF₁(w) and of PNF₀(w) can be computed in O(n) time.
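The linear-time procedure of [5] behind Corollary 4 is not reproduced here. For small experiments, a naive stand-in may be enough: iterate the morphism to obtain a prefix of its fix point, then compute the prefix normal forms by brute force in quadratic time. In the Python sketch below the helper names are hypothetical and the period-doubling morphism 0 ↦ 01, 1 ↦ 00 serves only as an example. The final check reflects Lemma 6 on the computed prefixes: PNF₁ is 1-prefix normal, and the complement of PNF₀ is 1-prefix normal, i.e., PNF₀ is 0-prefix normal.

```python
from __future__ import annotations

# Illustrative sketch; helper names are hypothetical, and this is NOT the method of [5].


def uniform_fixed_point(morphism: dict[str, str], length: int, seed: str = "0") -> str:
    """Prefix of the fix point of a binary uniform morphism that starts with `seed`."""
    if len({len(image) for image in morphism.values()}) != 1:
        raise ValueError("morphism is not uniform")
    if not (morphism[seed].startswith(seed) and len(morphism[seed]) > 1):
        raise ValueError("no fix point starting with seed")
    w = seed
    while len(w) < length:
        w = "".join(morphism[c] for c in w)
    return w[:length]


def max_count(word: str, n: int, symbol: str) -> int:
    """Largest number of `symbol` in a length-n factor of `word` (sliding window)."""
    window = word[:n].count(symbol)
    best = window
    for i in range(n, len(word)):
        window += (word[i] == symbol) - (word[i - n] == symbol)
        best = max(best, window)
    return best


def pnf_prefixes(word: str, length: int) -> tuple[str, str]:
    """Brute-force prefixes of PNF_1 and PNF_0 (Definition 8)."""
    f1 = [0] + [max_count(word, n, "1") for n in range(1, length + 1)]
    f0 = [0] + [max_count(word, n, "0") for n in range(1, length + 1)]
    pnf1 = "".join(str(f1[n] - f1[n - 1]) for n in range(1, length + 1))
    pnf0 = "".join(str(1 - (f0[n] - f0[n - 1])) for n in range(1, length + 1))
    return pnf1, pnf0


def is_prefix_normal(word: str) -> bool:
    """Finite check: no factor has more 1s than the prefix of the same length."""
    for n in range(1, len(word) + 1):
        prefix_ones = word[:n].count("1")
        window = prefix_ones
        for i in range(n, len(word)):
            window += (word[i] == "1") - (word[i - n] == "1")
            if window > prefix_ones:
                return False
    return True


period_doubling = {"0": "01", "1": "00"}
w = uniform_fixed_point(period_doubling, 2048)
pnf1, pnf0 = pnf_prefixes(w, 64)
complement = pnf0.translate(str.maketrans("01", "10"))
print(w[:16])                                                 # 0100010101000100
print(is_prefix_normal(pnf1), is_prefix_normal(complement))  # True True
```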

6. Prefix normal words and lexicographic order

In this section, we study the relationship between lexicographic order and prefix normality. Note that for coherence with the rest of the paper, in the definition of Lyndon words, necklaces, and prenecklaces, we use lexicographically greater rather than smaller. Clearly, this is equivalent to the usual definitions up to renaming of characters.

Thus a finite Lyndon word is one which is lexicographically strictly greater than all of its conjugates: w is Lyndon if and only if for all non-empty u, v s.t. w = uv, we have w >_lex vu. A necklace is a word which is greater than or equal to all its conjugates, and a prenecklace is one which can be extended to become a necklace, i.e. which is the prefix of some necklace [23, 28]. As we saw in the introduction, in the finite case, prefix normality and Lyndon property are orthogonal concepts. However, the set of finite prefix normal words is included in the set of prenecklaces [11].

An infinite word is Lyndon if infinitely many of its prefixes are Lyndon [32]. In the infinite case, the situation is similar to the finite case. There are words which are both Lyndon and prefix normal: 10^ω, 110(10)^ω; Lyndon but not prefix normal: 11100(110)^ω; prefix normal but not Lyndon: (10)^ω; and neither of the two: (01)^ω.

Next we show that a prefix normal word cannot be lexicographically smaller than any of its suffixes. For i ≥ 1, let shift_i(w) = w_i w_{i+1} w_{i+2} ⋯ denote the infinite word v such that w = w₁ ⋯ w_{i−1} v, i.e., v is the suffix of w starting at position i.

Lemma 9 Let $w \in L_{inf}$ . Then w ≥_lex shift_i(w) for all i ≥ 1.

Proof. Assume that there exists a suffix v = shift_i(w) of w such that v >_lex w. Then there is an index j with v₁ ⋯ v_{j−1} = w₁ ⋯ w_{j−1} and v_j > w_j, implying v_j = 1 and w_j = 0. But then ∣w_i ⋯ w_{i+j−1}∣₁ = ∣v₁ ⋯ v_j∣₁ > ∣w₁ ⋯ w_j∣₁, in contradiction to $w \in L_{inf}$ . □

In the finite case, it is easy to see that a word w is a prenecklace if and only if w ≥_lex v for every suffix v of w. This motivates our definition of infinite prenecklaces. The situation is the same as in the finite case: prefix normal words form a proper subset of prenecklaces.

Definition 9 Let w ∈ {0, 1}^ω. Then w is an infinite prenecklace if for all i ≥ 1, w ≥_lex shift_i(w). We denote by $P_{inf}$ the set of infinite prenecklaces.

Proposition 4 $L_{inf} ⊊ P_{inf}$ .

Proof. The inclusion follows from Lemma 9. An example of a word which is an infinite prenecklace but not prefix normal is 11100(110)^ω. □
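For ultimately periodic words w = uv^ω, Definition 9 can be decided exactly. From position ∣u∣ + 1 on, both w and each of its suffixes are periodic with period ∣v∣, so two such words that agree on their first ∣u∣ + ∣v∣ letters are equal, and only the suffixes starting at positions 2, …, ∣u∣ + ∣v∣ + 1 need to be examined. The Python sketch below (hypothetical helper names) uses this to confirm that 11100(110)^ω is an infinite prenecklace. It also exhibits a finite prefix that is not prefix normal, which is enough to show that the infinite word is not prefix normal.

```python
# Illustrative sketch; helper names are hypothetical.


def periodic_prefix(u: str, v: str, length: int) -> str:
    """First `length` letters of the infinite word u v^omega (v non-empty)."""
    return (u + v * (length // len(v) + 1))[:length]


def is_infinite_prenecklace(u: str, v: str) -> bool:
    """Decide whether w = u v^omega satisfies w >=_lex shift_i(w) for all i >= 1."""
    span = len(u) + len(v)  # agreeing on `span` letters forces equality of the words
    w = periodic_prefix(u, v, 2 * span)
    # Dropping i letters gives shift_{i+1}(w); larger i only repeat earlier suffixes.
    return all(w[i:i + span] <= w[:span] for i in range(1, span + 1))


def is_prefix_normal(word: str) -> bool:
    """Finite check: no factor has more 1s than the prefix of the same length."""
    for n in range(1, len(word) + 1):
        prefix_ones = word[:n].count("1")
        window = prefix_ones
        for i in range(n, len(word)):
            window += (word[i] == "1") - (word[i - n] == "1")
            if window > prefix_ones:
                return False
    return True


print(is_infinite_prenecklace("11100", "110"))                                 # True
print(is_infinite_prenecklace("", "10"), is_infinite_prenecklace("", "01"))  # True False
print(is_prefix_normal(periodic_prefix("11100", "110", 10)))  # False: 11011 beats 11100
```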

There is another interesting relationship between lexicographic order and the prefix normal forms of an infinite word. In [26], two words were associated to an infinite binary word w, called max(w) (resp. min(w)), defined as the word whose prefix of length n is the lexicographically greatest (resp. smallest) n-length factor of w. It is easy to see that these words always exist. The following was shown in [26]:³

Theorem 7 ([26]) Let w be an infinite binary word. Then

w is (rational or irrational) mechanical with its intercept equal to its slope if and only if 0w ≤_lex min(w) ≤_lex max(w) ≤_lex 1w, and
w is characteristic Sturmian if and only if min(w) = 0w and max(w) = 1w.

Lemma 10 Let w ∈ {0, 1}^ω. Then PNF₁(w) ≥_lex max(w) and PNF₀(w) ≤_lex min(w).

Proof. Assume otherwise, and let w′ = PNF₁(w), v = max(w). If w′ <_lex v, then there is an index j s.t. $w_{1}^{'} \dots w_{j - 1}^{'} = v_{1} \dots v_{j - 1}$ and $w_{j}^{'} = 0$ and v_j = 1. This implies that v₁ ⋯ v_j has one more 1 than $w_{1}^{'} \dots w_{j}^{'}$ . But $∣ w_{1}^{'} \dots w_{j}^{'} ∣_{1} = F_{w}^{1} (j)$ , a contradiction, since v₁ ⋯ v_j is a factor of w of length j and thus has at most $F_{w}^{1} (j)$ 1s. The second claim follows analogously. □
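Lemma 10 and Corollary 5 can be illustrated on finite prefixes. The Python sketch below computes the prefix of length n of max(w) (resp. min(w)) as the lexicographically greatest (resp. smallest) length-n factor of a long prefix, under the assumption that this prefix contains all factors of length n; the helper names and prefix lengths are illustrative choices. For the Fibonacci word, which is characteristic Sturmian, the expected results are 1f and 0f. For the Thue-Morse word t, the sketch only checks the inequalities of Lemma 10 against PNF₁(t) = 1(10)^ω and PNF₀(t) = 0(01)^ω.

```python
# Illustrative sketch; helper names are hypothetical.


def extreme_factor(word: str, n: int, greatest: bool = True) -> str:
    """Lexicographically greatest (or smallest) length-n factor of a finite word."""
    factors = {word[i:i + n] for i in range(len(word) - n + 1)}
    return max(factors) if greatest else min(factors)


def fibonacci_prefix(length: int) -> str:
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def thue_morse_prefix(length: int) -> str:
    """t_k = parity of the number of 1s in the binary expansion of k."""
    return "".join(str(bin(k).count("1") % 2) for k in range(length))


n = 20
f = fibonacci_prefix(1000)
print(extreme_factor(f, n) == "1" + f[:n - 1])         # True: max(f) = 1f
print(extreme_factor(f, n, False) == "0" + f[:n - 1])  # True: min(f) = 0f
t = thue_morse_prefix(4096)
pnf1_t = "1" + ("10" * n)[:n - 1]  # prefix of PNF_1(t) = 1(10)^omega
pnf0_t = "0" + ("01" * n)[:n - 1]  # prefix of PNF_0(t) = 0(01)^omega
print(pnf1_t >= extreme_factor(t, n), pnf0_t <= extreme_factor(t, n, False))  # True True
```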

Finally, from Theorems 5 and 7, we get the following corollary:

Corollary 5 Let w be an infinite binary word. Then w is characteristic Sturmian if and only if 0w = PNF₀(w) = min(w) and 1w = PNF₁(w) = max(w).

7. On the periodicity and aperiodicity of prefix normal words with respect to minimum density

In this section, we derive conditions for the periodicity and aperiodicity of prefix normal words with respect to their minimum density. The following result shows that every ultimately periodic infinite prefix normal word has rational minimum density.

Lemma 11 Let v be an infinite ultimately periodic binary word with minimum density δ(v) = α. Then $α \in Q$ .

Proof. Let us write v = ux^ω with x not a suffix of u.

For i = 0, 1,…, ∣x∣ − 1, let y_i be the prefix of length ∣u∣ + i of v, i.e., y_i = ux₁x₂ ⋯ x_i. Trivially, if for some i we have that δ(y_i) ≤ δ(v) the claim directly follows from y_i being a finite prefix of v.

Let us now assume that for each i = 0, 1, …, ∣x∣ − 1 it holds that δ(v) < δ(y_i), and let i* = min{i ∣ δ(y_i) ≤ δ(y_j) for each j ≠ i}, i.e., the smallest index minimizing δ(y_i); hence δ(v) < δ(y_i*).

For every n ≥ ∣u∣ let i_n = (n − ∣u∣) mod ∣x∣ and k_n = ⌊(n − ∣u∣)/∣x∣⌋, i.e., 0 ≤ i_n ≤ ∣x∣ − 1 and n = ∣u∣ + i_n + k_n∣x∣ = ∣y_{i_n}∣ + k_n∣x∣.

Then, we have that

D_{v} (n) = \frac{∣ y_{i_{n}} ∣_{1} + k_{n} ∣ x ∣_{1}}{∣ y_{i_{n}} ∣ + k_{n} ∣ x ∣} \geq \min \{δ (y_{i_{n}}), δ (x)\} \geq \min \{δ (y_{i^{*}}), δ (x)\} . \qquad (1)

Moreover, we also have that

\lim_{k \to \infty} D_{v} (∣ u ∣ + i^{*} + k ∣ x ∣) = \lim_{k \to \infty} \frac{∣ y_{i^{*}} ∣_{1} + k ∣ x ∣_{1}}{∣ y_{i^{*}} ∣ + k ∣ x ∣} = δ (x) . \qquad (2)

We cannot have δ(x) ≥ δ(y_i*), since by (1) δ(y_i*) is a rational lower bound on D_v(n) (for each n ≥ 1) which is achieved by D_v(∣u∣ + i*), contradicting the standing hypothesis δ(v) < δ(y_i*).

Therefore, we must have δ(x) < δ(y_i*), and from (1) we have D_v(n) ≥ δ(x) and from (2) we also have that for each ε > 0 there exists k > 0 such that D_v(∣u∣ + i* + k∣x∣) < δ(x) + ε. Therefore, δ(v) = inf{D_v(n) ∣ n ≥ 1} = δ(x), which is a rational number, since x is a finite string. □
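The argument above also suggests a finite procedure for computing δ(v) exactly when v = ux^ω. Every D_v(n) with n ≥ ∣u∣ + ∣x∣ is a weighted mediant of the density of a prefix of length at most ∣u∣ + ∣x∣ − 1 and of the density ∣x∣₁/∣x∣ of the period, and the latter is approached in the limit. The Python sketch below (hypothetical function name, x assumed non-empty) therefore takes the minimum over finitely many candidates, using exact rational arithmetic; the result is visibly rational, as Lemma 11 states.

```python
from fractions import Fraction

# Illustrative sketch; the function name is hypothetical.


def minimum_density(u: str, x: str) -> Fraction:
    """delta(v) = inf_n D_v(n) for the ultimately periodic word v = u x^omega."""
    if not x:
        raise ValueError("the period x must be non-empty")
    v = u + x
    # D_v(n) for n < |u| + |x|; longer prefixes are mediants of one of these and copies of x.
    candidates = [Fraction(v[:n].count("1"), n) for n in range(1, len(v))]
    candidates.append(Fraction(x.count("1"), len(x)))  # limit along n = |u| + i + k|x|
    return min(candidates)


print(minimum_density("110", "10"))  # 1/2
print(minimum_density("100", "1"))   # 1/3
```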

We now show that, while ultimate periodicity implies rational minimum density, the converse is not true. It turns out that for every α ∈ (0, 1), both rational and irrational, there exists an aperiodic prefix normal word with minimum density α. For irrational α, this is an easy corollary of Theorem 2: the Sturmian word 1c_α is prefix normal, and its prefix densities satisfy D(i) ≥ α for each i and converge to α, therefore δ(1c_α) = α. The next lemma shows how to construct an aperiodic prefix normal word with minimum density α for both rational and irrational α.

Lemma 12 Fix α ∈ (0, 1), and let $(a_{n})_{n \in N}$ be a strictly decreasing infinite sequence of rational numbers from (0, 1) converging to α. For each i = 1, 2, …, let the binary word v⁽ⁱ⁾ be defined by

v^{(i)} = \begin{cases} 1^{\lceil 10 a_{1} \rceil} 0^{10 - \lceil 10 a_{1} \rceil} & i = 1, \\ \mathrm{pref}_{\mathrm{flipext}^{\omega} (v^{(i - 1)})} (k_{i} ∣ v^{(i - 1)} ∣) \, 0^{ℓ_{i}} & i > 1, \end{cases}

where ℓ_i is defined by

ℓ_{i} = \begin{cases} 10 - \lceil 10 a_{1} \rceil & i = 1, \\ \left\lfloor k_{i} \left( \frac{∣ v^{(i - 1)} ∣_{1} - a_{i} ∣ v^{(i - 1)} ∣}{a_{i}} \right) \right\rfloor & i > 1, \end{cases}

and k_i is the smallest integer greater than one such that ℓ_i > ℓ_{i−1}.

Then $v = \lim_{i \to \infty} v^{(i)}$ is an aperiodic infinite prefix normal word such that δ(v) = α.

Before proving Lemma 12, we give an example of the words v⁽ⁱ⁾.

Example 5 We show the first three steps of the construction of an infinite aperiodic word with minimum density α = 1/3 (Lemma 12), using the strictly decreasing sequence of rational numbers a_i = i/(3i − 1), which tends to 1/3 for i → ∞. For i = 1, we have a₁ = 1/2, ℓ₁ = 5, and v⁽¹⁾ = 1⁵0⁵, with minimum density δ(v⁽¹⁾) = 1/2. At the next step, a₂ = 2/5, and with the values from the previous iteration we can compute k₂ = 3 and ℓ₂ = 7, hence v⁽²⁾ = 1⁵0⁵1⁵0⁵1⁵0⁵0⁷, with δ(v⁽²⁾) = 15/37. At the third iteration, a₃ = 3/8, k₃ = 3, and ℓ₃ = 9, therefore v⁽³⁾ = 1⁵0⁵1⁵0⁵1⁵0¹²1⁵0⁵1⁵0⁵1⁵0¹²1⁵0⁵1⁵0⁵1⁵0¹²0⁹, and the minimum density is δ(v⁽³⁾) = 45/120 = 3/8.
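The parameters in Example 5 can be recomputed mechanically. In the proof below, |v⁽ⁱ⁾|₁ = k_i|v⁽ⁱ⁻¹⁾|₁ and |v⁽ⁱ⁾| = k_i|v⁽ⁱ⁻¹⁾| + ℓ_i (see (3)), so k_i and ℓ_i depend only on these two counts and on a_i. The Python sketch below therefore tracks counts only and never builds v⁽ⁱ⁾ itself, which would require flipext, defined earlier in the paper and not reimplemented here. By property 2 and (3), the minimum density of v⁽ⁱ⁾ is |v⁽ⁱ⁾|₁/|v⁽ⁱ⁾|. The function name and the row layout are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

# Illustrative sketch; the function name and row layout are hypothetical.


def lemma12_parameters(a, steps: int):
    """Rows (i, k_i, ell_i, |v^(i)|, |v^(i)|_1) of the construction in Lemma 12.

    `a` maps i >= 1 to the rational a_i. Only counts are tracked, using
    |v^(i)| = k_i |v^(i-1)| + ell_i and |v^(i)|_1 = k_i |v^(i-1)|_1.
    """
    ones = ceil(10 * a(1))
    length, ell = 10, 10 - ones
    rows = [(1, None, ell, length, ones)]
    for i in range(2, steps + 1):
        slack = (ones - a(i) * length) / a(i)
        if slack <= 0:
            raise ValueError("the sequence a_i must be strictly decreasing")
        k = 2
        while floor(k * slack) <= ell:  # smallest k > 1 with ell_i > ell_{i-1}
            k += 1
        ell = floor(k * slack)
        length, ones = k * length + ell, k * ones
        rows.append((i, k, ell, length, ones))
    return rows


def a(i: int) -> Fraction:
    """The sequence a_i = i / (3i - 1) of Example 5."""
    return Fraction(i, 3 * i - 1)


for i, k, ell, length, ones in lemma12_parameters(a, 3):
    print(i, k, ell, length, ones, Fraction(ones, length))
# 1 None 5 10 5 1/2
# 2 3 7 37 15 15/37
# 3 3 9 120 45 3/8
```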

Proof. (of Lemma 12)

We will first prove the following claim, giving a number of properties of the sequence of words v⁽ⁱ⁾, and then use these to prove that v is aperiodic and δ(v) = α.

Claim. The following properties hold:

1. δ(v⁽ⁱ⁾) ≥ a_i for each i ≥ 1;
2. ı(v⁽ⁱ⁾) = ∣v⁽ⁱ⁾∣ for each i ≥ 1;
3. δ(v⁽ⁱ⁾) < δ(v⁽ⁱ⁻¹⁾) for each i ≥ 2;
4. ∣v⁽ⁱ⁾∣₁ > ∣v⁽ⁱ⁻¹⁾∣₁ for each i ≥ 2;
5. $δ (v^{(i)}) \leq a_{i} (\frac{k_{i} ∣ v^{(i - 1)} ∣_{1}}{k_{i} ∣ v^{(i - 1)} ∣_{1} - a_{i}})$ for each i ≥ 2.

Proof of the Claim. By direct inspection we have that properties 1 and 2 hold for v⁽¹⁾. We now argue by induction. Fix i > 1 and let us assume that properties 1 and 2 hold for v⁽ⁱ⁻¹⁾. Then, since a_i < a_{i−1}, we have

\frac{∣ v^{(i - 1)} ∣_{1}}{a_{i}} > \frac{∣ v^{(i - 1)} ∣_{1}}{a_{i - 1}} \geq ∣ v^{(i - 1)} ∣,

where the last inequality follows from property 1 and 2. Therefore, $(\frac{∣ v^{(i - 1)} ∣_{1} - a_{i} ∣ v^{(i - 1)} ∣}{a_{i}}) > 0$ , hence there exists k_i > 1 such that $⌊ k_{i} (\frac{∣ v^{(i - 1)} ∣_{1} - a_{i} ∣ v^{(i - 1)} ∣}{a_{i}}) ⌋ > ℓ_{i - 1}$ . In particular, ℓ_i is well defined.

By property 2, we have ı(v⁽ⁱ⁻¹⁾) = ∣v⁽ⁱ⁻¹⁾∣ hence by Proposition 1, we have D_{flipext^ω(v⁽ⁱ⁻¹⁾)}(k∣v⁽ⁱ⁻¹⁾∣) = δ(v⁽ⁱ⁻¹⁾) and also δ(pref_{flipext^ω(v⁽ⁱ⁻¹⁾)}(k_i∣v⁽ⁱ⁻¹⁾∣)) = δ(v⁽ⁱ⁻¹⁾).

Moreover, since ℓ_i > 0 it is not hard to see from the definition of v⁽ⁱ⁾ that

δ (v^{(i)}) = D_{v^{(i)}} (∣ v^{(i)} ∣) = \frac{k_{i} ∣ v^{(i - 1)} ∣_{1}}{k_{i} ∣ v^{(i - 1)} ∣ + ℓ_{i}} < δ (v^{(i - 1)}), \qquad (3)

which shows that property 3 and property 2 hold for v⁽ⁱ⁾. In addition, since k_i > 1 and, by Proposition 1, ∣v⁽ⁱ⁾∣₁ = ∣pref_{flipext^ω(v⁽ⁱ⁻¹⁾)}(k_i∣v⁽ⁱ⁻¹⁾∣)∣₁ = k_i∣v⁽ⁱ⁻¹⁾∣₁, it follows that property 4 also holds for v⁽ⁱ⁾.

The definition of ℓ_i, together with the well-known property x − 1 < ⌊x⌋ ≤ x, implies that

\frac{k_{i}}{a_{i}} (∣ v^{(i - 1)} ∣_{1} - a_{i} ∣ v^{(i - 1)} ∣) - 1 < ℓ_{i} \leq k_{i} (\frac{∣ v^{(i - 1)} ∣_{1}}{a_{i}} - ∣ v^{(i - 1)} ∣) . \qquad (4)

Using the right inequality of (4) in (3), we have δ(v⁽ⁱ⁾) ≥ a_i, showing that property 1 holds for v⁽ⁱ⁾.

In addition, using the left inequality of (4) in (3), we have

δ (v^{(i)}) \leq a_{i} (\frac{k_{i} ∣ v^{(i - 1)} ∣_{1}}{k_{i} ∣ v^{(i - 1)} ∣_{1} - a_{i}})

showing that property 5 holds for v⁽ⁱ⁾. The proof of the claim is complete.

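In order to see that v is aperiodic, it is enough to observe that v contains infinitely many 1s (by property 4) and, for each i ≥ 1, a run of at least ℓ_i consecutive 0s, with (ℓ_i) strictly increasing; an ultimately periodic word with infinitely many 1s cannot contain arbitrarily long runs of 0s.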

To show that δ(v) = α, we will prove that $\lim_{i \to \infty} δ(v^{(i)}) = α$. Since $\lim_{i \to \infty} a_i = α$, and since k_i > 1 and, by property 4, $k_i ∣v^{(i-1)}∣_1$ grows without bound, we have

\lim_{i \to \infty} a_{i} \frac{k_{i} ∣ v^{(i - 1)} ∣_{1}}{k_{i} ∣ v^{(i - 1)} ∣_{1} - a_{i}} = \lim_{i \to \infty} a_{i} = α .

Hence, since δ(v⁽ⁱ⁾) lies between the bounds given by properties 1 and 5 of the Claim above, we obtain the desired result, $\lim_{i \to \infty} δ(v^{(i)}) = \lim_{i \to \infty} a_i = α$.

This completes the proof of the lemma. □

Summarizing, we have shown the following result.

Theorem 8 For every α ∈ (0, 1) (rational or irrational) there is an infinite aperiodic prefix normal word of minimum density α. On the other hand, for every ultimately periodic infinite prefix normal word w, the minimum density δ(w) is a rational number.

8. Conclusion

In this paper, we studied infinite prefix normal words. We gave several results on infinite extensions of finite prefix normal words, and we established connections between infinite prefix normal words and other classes of infinite binary words, namely Sturmian words, Lyndon words, and max and min words. We provided a complete characterization of prefix normal Sturmian words. Furthermore, we showed that, similar to the finite case, the classes of infinite prefix normal words and Lyndon words are distinct, and that infinite prefix normal words are infinite prenecklaces.

We explored some connections between prefix normal words, prefix normal forms, and abelian complexity. In particular, we showed how to turn balanced and c-balanced words without arbitrarily long runs of 1s into prefix normal words, by prepending a finite number of 1s. We provided a method to compute the abelian complexity from the prefix normal forms of a word, and, for specific cases, we showed how to compute the prefix normal forms of a word, given its abelian complexity function. We further applied an existing algorithm to compute the prefix normal forms of fix points of binary uniform morphisms.

Finally, we gave conditions for the periodicity and the aperiodicity of infinite prefix normal words, according to their minimum density.

Figure 1: Given w = 1101101100100010000001, the plot represents the last characters of flipext⁽⁸⁾(w) (solid) and lazy-flipext⁽⁸⁾(w, α) (dashed). See Example 3. A 1 corresponds to a diagonal segment in direction NE, and a 0 to one in direction SE. The x-axis gives the length of the prefix, and the y-axis the number of 1s minus the number of 0s in the prefix. The shaded area contains all prefix normal words with w as prefix and minimum density equal to δ(w). Note, however, that not all words in that area are prefix normal.

Acknowledgements

We wish to extend our thanks to the participants of the Workshop on Words and Complexity, which took place in Lyon in February 2018, for exciting discussions and helpful pointers, and to Péter Burcsi, who first got us interested in Sturmian words. We also thank the two anonymous reviewers, whose suggestions helped improve the presentation of our results. MR is funded by the National Science Foundation (NSF) IIS (Grant No. 1618814), IIBR (Grant No. 2029552) and National Institutes of Health (NIH) R01 (Grant No. HG011392).

Footnotes

This is an extended version of our paper presented at SOFSEM 2019 [15].

For ease of presentation, we are using Lyndon to mean lexicographically greatest among its conjugates; this is equivalent to the usual definition up to renaming characters.

The terminology in [26] differs from ours (we are following [23]). In order to help the reader, here we highlight the differences: (i) a periodic Sturmian word in [26] is a rational mechanical word, (ii) a proper Sturmian word in [26] is an irrational mechanical word (i.e., a Sturmian word), and (iii) a standard Sturmian word in [26] is a mechanical word with intercept τ = α (the slope), thus a proper standard Sturmian word is a characteristic Sturmian word c_α. Note that all mechanical words in [26] are defined for n ≥ 1, since there the lower mechanical word is defined as s_{α,τ}(n) = ⌊α(n + 1) + τ⌋ − ⌊αn + τ⌋ for n ≥ 1, and analogously for the upper mechanical word. Therefore, an intercept τ = 0 in [26] is equivalent to an intercept of τ = α (the slope) in [23].

References

[1] Afshani Peyman, van Duijn Ingo, Killmann Rasmus, and Nielsen Jesper Sindahl. A lower bound for jumbled indexing. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA 2020), pages 592–606, 2020.
[2] Amir Amihood, Chan Timothy M., Lewenstein Moshe, and Lewenstein Noa. On hardness of jumbled indexing. In 41st International Colloquium on Automata, Languages, and Programming (ICALP 2014), volume 8572 of LNCS, pages 114–125, 2014.
[3] Balister Paul and Gerke Stefanie. The asymptotic number of prefix normal words. Theoret. Comput. Sci, 784:75–80, 2019.
[4] Blanchet-Sadri Francine, Fox Nathan, and Rampersad Narad. On the asymptotic abelian complexity of morphic words. Advances in Applied Mathematics, 61:46–84, 2014.
[5] Blanchet-Sadri Francine, Seita Daniel, and Wise David. Computing abelian complexity of binary uniform morphic words. Theor. Comput. Sci, 640:41–51, 2016. doi: 10.1016/j.tcs.2016.05.046.
[6] Massé Alexandre Blondin, de Carufel Julien, Goupil Alain, Lapointe Mélodie, Nadeau Émile, and Vandomme Élise. Leaf realization problem, caterpillar graphs and prefix normal words. Theoret. Comput. Sci, 732:1–13, 2018.
[7] Burcsi Péter, Cicalese Ferdinando, Fici Gabriele, and Lipták Zsuzsanna. Algorithms for Jumbled Pattern Matching in Strings. International Journal of Foundations of Computer Science, 23:357–374, 2012.
[8] Burcsi Peter, Cicalese Ferdinando, Fici Gabriele, and Lipták Zsuzsanna. On approximate jumbled pattern matching in strings. Theory Comput. Syst, 50(1):35–51, 2012.
[9] Burcsi Péter, Fici Gabriele, Lipták Zsuzsanna, Raman Rajeev, and Sawada Joe. Generating a Gray code for prefix normal words in amortized polylogarithmic time per word. Theor. Comput. Sci, 842:86–99, 2020.
[10] Burcsi Péter, Fici Gabriele, Lipták Zsuzsanna, Ruskey Frank, and Sawada Joe. On combinatorial generation of prefix normal words. In Proc. of the 25th Ann. Symp. on Comb. Pattern Matching (CPM 2014), volume 8486 of LNCS, pages 60–69, 2014.
[11] Burcsi Péter, Fici Gabriele, Lipták Zsuzsanna, Ruskey Frank, and Sawada Joe. On prefix normal words and prefix normal forms. Theoret. Comput. Sci, 659:1–13, 2017.
[12] Cassaigne Julien and Kaboré Idrissa. Abelian complexity and frequencies of letters in infinite words. Int. Journal of Foundations of Computer Science, 27(05):631–649, 2016.
[13] Chan Timothy M. and Lewenstein Moshe. Clustered integer 3SUM via additive combinatorics. In Proc. of the 47th Ann. ACM on Symp. on Theory of Computing (STOC 2015), pages 31–40, 2015.
[14] Cicalese Ferdinando, Lipták Zsuzsanna, and Rossi Massimiliano. Bubble-flip - A new generation algorithm for prefix normal words. Theoret. Comput. Sci, 743:38–52, 2018.
[15] Cicalese Ferdinando, Lipták Zsuzsanna, and Rossi Massimiliano. On infinite prefix normal words. In Proc. of the 45th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2019), pages 122–135, 2019.
[16] Cunha Luís Felipe I., Dantas Simone, Gagie Travis, Wittler Roland, Antonio Luis Kowada Brasil, and Stoye Jens. Faster jumbled indexing for binary RLE strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), pages 19:1–19:9, 2017.
[17] Davis C and Knuth DE. Number representations and dragon curves, I, II. J. Recr. Math, 3:133–149 and 161–181, 1970.
[18] Fici Gabriele and Lipták Zsuzsanna. On prefix normal words. In Proc. of the 15th Intern. Conf. on Developments in Language Theory (DLT 2011), volume 6795 of LNCS, pages 228–238. Springer, 2011.
[19] Fleischmann Pamela, Nowotka Dirk, Kulczynski Mitja, and Poulsen Danny Bøgsted. On collapsing prefix normal words. In Proc. of the 14th International Conference Language and Automata Theory and Applications (LATA 2020), volume 12038 of LNCS, pages 412–424. Springer, 2020.
[20] Gagie Travis, Hermelin Danny, Landau Gad M., and Weimann Oren. Binary jumbled pattern matching on trees and tree-like structures. Algorithmica, 73(3):571–588, 2015.
[21] Giaquinta Emanuele and Grabowski Szymon. New algorithms for binary jumbled pattern matching. Inf. Process. Lett, 113(14–16):538–542, 2013.
[22] Kaboré Idrissa and Kientéga Boucaré. Abelian complexity of Thue-Morse word over a ternary alphabet. In Proc. of the 11th Int. Conf. on Combinatorics on Words WORDS 2017, volume 10432 of LNCS, pages 132–143. Springer, 2017.
[23] Lothaire M. Algebraic Combinatorics on Words. Cambridge Univ. Press, 2002.
[24] Madill Blake and Rampersad Narad. The abelian complexity of the paperfolding word. Discrete Mathematics, 313(7):831–838, 2013. doi: 10.1016/j.disc.2013.01.005.
[25] Moosa Tanaeem M. and Sohel Rahman M. Sub-quadratic time and linear space data structures for permutation matching in binary strings. J. Discr. Alg, 10:5–9, 2012.
[26] Pirillo Giuseppe. Inequalities characterizing standard sturmian and episturmian words. Theor. Comput. Sci, 341(1-3):276–292, 2005. doi: 10.1016/j.tcs.2005.04.008.
[27] Richomme Gwénaël, Saari Kalle, and Zamboni Luca Q. Abelian complexity of minimal subshifts. J. London Math. Society, 83(1):79–95, 2011. doi: 10.1112/jlms/jdq063.
[28] Ruskey Frank, Savage Carla, and Wang TMY. Generating necklaces. J. Algorithms, 13(3):414–430, 1992.
[29] Ruskey Frank, Sawada Joe, and Williams Aaron. Binary bubble languages and cool-lex order. J. Comb. Theory, Ser. A, 119(1):155–169, 2012.
[30] Sawada Joe and Williams Aaron. Efficient oracles for generating binary bubble languages. Electr. J. Comb, 19(1):P42, 2012.
[31] Sawada Joe, Williams Aaron, and Wong Dennis. Inside the Binary Reflected Gray Code: Flip-Swap languages in 2-Gray code order. Unpublished manuscript, 2017.
[32] Siromoney Rani, Mathew Lisa, Dare VR, and Subramanian KG. Infinite Lyndon words. Inf. Proc. Letters, 50:101–104, 1994.
[33] Sloane NJA. The On-Line Encyclopedia of Integer Sequences. Available electronically at http://oeis.org.
[34] Turek Ondrej. Abelian complexity of the Tribonacci word. J. of Integer Sequences, 18, 2015.
