Author manuscript; available in PMC: 2021 Jun 22.
Published in final edited form as: Theor Comput Sci. 2021 Jan 11;859:134–148. doi: 10.1016/j.tcs.2021.01.015

On Infinite Prefix Normal Words

Ferdinando Cicalese 1, Zsuzsanna Lipták 1,*, Massimiliano Rossi 2
PMCID: PMC8219218  NIHMSID: NIHMS1709524  PMID: 34163096

Abstract

Prefix normal words are binary words with the property that no factor has more 1s than the prefix of the same length. Finite prefix normal words were introduced in [Fici and Lipták, DLT 2011]. In this paper, we study infinite prefix normal words and explore their relationship to some known classes of infinite binary words. In particular, we establish a connection between prefix normal words and Sturmian words, between prefix normal words and abelian complexity, and between prefix normality and lexicographic order.1

Keywords: combinatorics on words, prefix normal words, infinite words, Sturmian words, abelian complexity, paperfolding word, Thue-Morse sequence, lexicographic order

1. Introduction

Prefix normal words are binary words where no factor has more 1s than the prefix of the same length. As an example, the word 11100110101 is prefix normal, while 11100110110 is not, since it has a factor of length 5 with four 1s, while the prefix of length 5 has only three 1s. Finite prefix normal words were introduced in [18] and further studied in [10, 11, 31, 14, 3, 19, 9].

One motivation for studying prefix normal words comes from the problem of Indexed Binary Jumbled Pattern Matching [7, 8, 25, 21, 2, 20, 13, 16, 1]: Given a finite word s of length n, construct an index in such a way that the following type of queries can be answered efficiently: for two integers x, y ≥ 0, does s have a factor with x 1s and y 0s? As shown in [18, 11], prefix normal words can be used for constructing such an index, via so-called prefix normal forms.

Prefix normal words have also been shown to form bubble languages [29, 30, 10], a family of binary languages with efficiently generable combinatorial Gray codes; the language of prefix normal words has connections to the Binary Reflected Gray Code [31]; and, recently, prefix normal words also appeared in a graph theoretic context [6]. Indeed, three sequences related to prefix normal words are present in the On-Line Encyclopedia of Integer Sequences (OEIS [33]): A194850 (the number of prefix normal words of length n), A238109 (a list of prefix normal words over the alphabet {1, 2}), and A238110 (maximal equivalence class sizes of words with the same prefix normal form).

In [14], we introduced infinite prefix normal words and analyzed a particular procedure that, given a finite prefix normal word, extends it while preserving the prefix normality property. We showed that the resulting infinite word is ultimately periodic. In this paper, we present a more comprehensive study of infinite prefix normal words, covering several classes of known and well studied infinite words. We will now give a quick tour of the paper (for precise definitions, see Section 2).

1.1. Our results

One way of obtaining infinite prefix normal words is by extending finite prefix normal words. We specify two such operations which, in the limit, produce prefix normal words that are extremal with respect to density (Theorem 1).

There exist periodic, ultimately periodic, and aperiodic infinite prefix normal words: for example, the periodic words $0^\omega$, $1^\omega$, and $(10)^\omega$ are prefix normal; the ultimately periodic word $1(10)^\omega$ is prefix normal; and so is the aperiodic word $10100100010000\cdots = \lim_{n\to\infty} 1\,0\,1\,0^2 \cdots 1\,0^n$. The best studied class of aperiodic words is that of Sturmian words. We show that a Sturmian word $w$ is prefix normal if and only if $w = 1c_\alpha$ for some $\alpha$, where $c_\alpha$ is the characteristic word of slope $\alpha$ (Theorem 2).

We show further that every Sturmian word w can be turned into a prefix normal word by prepending a fixed number of 1s, which only depends on the slope of w. This follows from a more general result regarding c-balanced words (Lemma 5). For example, the Fibonacci word

f = 0100101001001010010100100101001001 ⋯

is not prefix normal, but the word 1f is. Two other well-studied aperiodic words are the Thue-Morse word and the Champernowne word. The Thue-Morse word

t = 01101001100101101001011001101001 ⋯

is not prefix normal but it can be turned into a prefix normal word by prepending two 1s: 11t is prefix normal. On the other hand, the binary Champernowne word

c = 0110111001011101111000100110101011 ⋯

which is constructed by concatenating the binary expansions of the integers in ascending order, is not prefix normal and cannot be turned into a prefix normal word by prepending a finite number of 1s.

We also show that the notion of prefix normal forms from [18, 11] can be extended to infinite words. These can be used, similarly to the finite case, to encode the abelian complexity of the original word. The study of abelian complexity of infinite words was initiated in [27], and continued e.g. in [24, 4, 34, 12, 22]. We establish a close relationship between the abelian complexity and the prefix normal forms of w (Theorem 3). We demonstrate how this close connection can be used to derive results about the prefix normal forms of a word w. In some cases, such as for Sturmian words and words which are morphic images under the Thue-Morse morphism, we are able to explicitly give the prefix normal forms of the word (Corollary 3 and Theorem 5). Conversely, knowing its prefix normal forms allows us to derive results about the abelian complexity of a word. We also show how to compute the prefix normal forms of words that are binary uniform morphisms, based on an algorithm from [5] for computing their abelian complexity.

Another class of well-known binary words are Lyndon words. Notice that the prefix normal condition is different from the Lyndon condition2: for finite words, there are words which are both Lyndon and prefix normal (e.g. 110010), words which are Lyndon but not prefix normal (11100110110), words which are prefix normal but not Lyndon (110101), and words which are neither (101100). We study infinite prefix normal words and their prefix normal forms in the context of lexicographic orderings, and compare them to infinite Lyndon words [32] and the max- and min-words of [26] (Corollary 5).

Finally, we give conditions for periodicity and ultimate periodicity of prefix normal words in terms of their minimum density, a parameter introduced in [14] (Theorem 8).

1.2. Overview of paper

The paper is organized as follows. In Section 2, we introduce our terminology and give some simple facts about prefix normal words. In Section 3, we compare different operations that generate infinite prefix normal words by extending finite prefix normal words. In Section 4, we study the relationship between Sturmian words and prefix normal words. Section 5 deals with the connection between prefix normality and abelian complexity, and Section 6 focuses on the relationship with lexicographic order. Finally, in Section 7, we analyze the relationship between periodicity and minimum density of prefix normal words.

2. Basics

In our definitions and notations, we mostly follow [23]. A finite (resp. infinite) binary word $w$ is a finite (resp. infinite) sequence of elements from $\{0, 1\}$. Thus an infinite word is a mapping $w : \mathbb{N} \to \{0, 1\}$, where $\mathbb{N}$ denotes the set of positive integers. We denote the $i$th character of $w$ by $w_i$. Note that we index words starting from 1. If $w$ is finite, then its length is denoted by $|w|$. The empty word, denoted $\varepsilon$, is the unique word of length 0. The set of binary words of length $n$ is denoted by $\{0, 1\}^n$, the set of all finite words by $\{0, 1\}^* = \bigcup_{n \ge 0} \{0, 1\}^n$, and the set of infinite binary words by $\{0, 1\}^\omega$. For a finite word $u = u_1 \cdots u_n$, we write $u^{\mathrm{rev}} = u_n \cdots u_1$ for the reverse of $u$, and for a finite or infinite word $u$, $\bar{u} = \bar{u}_1 \bar{u}_2 \cdots$ for the complement of $u$, where $\bar{a} = 1 - a$ for $a \in \{0, 1\}$.

For two words $u, v$, where $u$ is finite and $v$ is finite or infinite, we write $uv$ for their concatenation. If $w = uxv$, then $u$ is called a prefix, $x$ a factor (or substring), and $v$ a suffix of $w$. We denote the set of factors of $w$ by $\mathrm{Fct}(w)$ and its prefix of length $i$ by $\mathrm{pref}_w(i)$, where $\mathrm{pref}_w(0) = \varepsilon$. For a finite word $u$, we write $|u|_1$ for the number of 1s, and $|u|_0$ for the number of 0s in $u$, and refer to $|u|_1$ as the weight of $u$. The Parikh vector of $u$ is $\mathrm{pv}(u) = (|u|_0, |u|_1)$. A word $w$ is called balanced if for all $u, v \in \mathrm{Fct}(w)$, $|u| = |v|$ implies $\bigl||u|_1 - |v|_1\bigr| \le 1$, and $c$-balanced if $|u| = |v|$ implies $\bigl||u|_1 - |v|_1\bigr| \le c$.

For an integer $k \ge 1$ and $u \in \{0, 1\}^n$, $u^k$ denotes the word $u \cdots u$ of length $kn$ ($k$-fold concatenation of $u$) and $u^\omega$ the infinite word $uuu \cdots$. An infinite word $w$ is called periodic if $w = u^\omega$ for some non-empty word $u$, and ultimately periodic if it can be written as $w = vu^\omega$ for some $v$ and non-empty $u$. A word that is neither periodic nor ultimately periodic is called aperiodic. We set $0 < 1$ and denote by $\le_{\mathrm{lex}}$ the lexicographic order between words, i.e. $u \le_{\mathrm{lex}} v$ if $u$ is a prefix of $v$ or there is an index $i \ge 1$ s.t. $\mathrm{pref}_u(i-1) = \mathrm{pref}_v(i-1)$ and $u_i < v_i$.

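For an operation $\mathrm{op} : \{0, 1\}^* \to \{0, 1\}^*$, we denote by $\mathrm{op}^{(i)}$ the $i$th iteration of op. Further, let $\mathrm{op}^*(w) = \{\mathrm{op}^{(i)}(w) \mid i \ge 1\}$ and $\mathrm{op}^\omega(w) = \lim_{i\to\infty} \mathrm{op}^{(i)}(w)$, if it exists.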

A binary morphism μ is a function μ : {0, 1}* → {0, 1}* such that for all u, v ∈ {0, 1}*, μ(uv) = μ(u)μ(v). A binary morphism μ is called uniform if ∣μ(0)∣ = ∣μ(1)∣. A fix point of a morphism μ is an infinite word v such that v = μω(a) for some a ∈ {0, 1}.

Definition 1 Let w be a (finite or infinite) binary word. We define the following functions:

  • $P_w(i) = |\mathrm{pref}_w(i)|_1$, the weight of the prefix of length $i$,

  • $D_w(i) = P_w(i)/i$, the density of the prefix of length $i$,

  • $F_w^1(i) = \max\{|u|_1 : u \in \mathrm{Fct}(w), |u| = i\}$, the maximum number of 1s in a factor of length $i$,

  • $f_w^1(i) = \min\{|u|_1 : u \in \mathrm{Fct}(w), |u| = i\}$, the minimum number of 1s in a factor of length $i$,

  • $F_w^0(i) = \max\{|u|_0 : u \in \mathrm{Fct}(w), |u| = i\}$, the maximum number of 0s in a factor of length $i$,

  • $f_w^0(i) = \min\{|u|_0 : u \in \mathrm{Fct}(w), |u| = i\}$, the minimum number of 0s in a factor of length $i$.

Note that in the context of succinct indexing, the function Pw(i) is often called rank1(w, i). We are now ready to define prefix normal words.

Definition 2 (Prefix normal words) A (finite or infinite) binary word $w$ is called 1-prefix normal, or simply prefix normal, if $P_w(i) = F_w^1(i)$ for all $i \ge 1$ (for all $1 \le i \le |w|$ if $w$ is finite). It is called 0-prefix normal if $i - P_w(i) = F_w^0(i)$ for all $i \ge 1$ (for all $1 \le i \le |w|$ if $w$ is finite). We denote the set of all finite 1-prefix normal words by $\mathcal{L}_{\mathrm{fin}}$, the set of all infinite 1-prefix normal words by $\mathcal{L}_{\mathrm{inf}}$, and $\mathcal{L} = \mathcal{L}_{\mathrm{fin}} \cup \mathcal{L}_{\mathrm{inf}}$.

In other words, a word is prefix normal if no factor has more 1s than the prefix of the same length. Given a binary word $w$, we say that a factor $u$ of $w$ satisfies the prefix normal condition if $|u|_1 \le P_w(|u|)$.

Example 1 The word 110100110110 is not prefix normal since the factor 11011 has four 1s, which is more than in the prefix 11010 of length 5. The word 110100110010, on the other hand, is prefix normal. The infinite word (11001)ω is not prefix normal, because it has 111 as a factor, which has more 1s than the prefix of length 3, but the word (11010)ω is.
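Definition 2 can be checked directly on finite words. The following is a minimal sketch (a quadratic-time brute force, not an algorithm from the paper); the names `max_ones` and `is_prefix_normal` are hypothetical. For the two periodic words of Example 1 it only inspects a finite prefix, which suffices to refute prefix normality but, for $(11010)^\omega$, only gives evidence.

```python
def max_ones(w: str) -> list[int]:
    """F[i] = maximum number of 1s in a factor of length i of w (F[0] = 0)."""
    n = len(w)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in w[:i]
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    return [
        max(ones[j + i] - ones[j] for j in range(n - i + 1))
        for i in range(n + 1)
    ]


def is_prefix_normal(w: str) -> bool:
    """True iff P_w(i) == F_w^1(i) for every 1 <= i <= |w|."""
    best = max_ones(w)
    prefix_weight = 0
    for i, ch in enumerate(w, start=1):
        prefix_weight += ch == "1"
        if prefix_weight != best[i]:
            return False
    return True


if __name__ == "__main__":
    for word in ("110100110110", "110100110010", "11001" * 4, "11010" * 4):
        print(word, is_prefix_normal(word))
```

The prefix-sum array makes each factor weight an O(1) lookup, so the check costs O(n²) time overall; this is fine for the short words used in the examples.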

The following facts about infinite prefix normal words are immediate.

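Lemma 1

  1. For all $u \in \mathcal{L}_{\mathrm{fin}}$, the word $w = u0^\omega \in \mathcal{L}_{\mathrm{inf}}$.

  2. Let $w \in \{0, 1\}^\omega$. Then $w \in \mathcal{L}$ if and only if for all $i \ge 1$, $\mathrm{pref}_w(i) \in \mathcal{L}$.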

Definition 3 (Minimum density, minimum-density prefix, slope) Let $w \in \{0, 1\}^* \cup \{0, 1\}^\omega$. Define the minimum density of $w$ as $\delta(w) = \inf\{D_w(i) \mid 1 \le i\}$. If this infimum is attained somewhere, then we also define $\iota(w) = \min\{j \ge 1 \mid \forall i : D_w(j) \le D_w(i)\}$ and $\kappa(w) = P_w(\iota(w))$. We refer to $\mathrm{pref}_w(\iota(w))$ as the minimum-density prefix, the shortest prefix with density $\delta(w)$. For an infinite word $w$, we define the slope of $w$ as $\lim_{i\to\infty} D_w(i)$, if this limit exists.

Remark 1 Note that $\iota(w)$ is always defined for finite words, while for infinite words, a prefix which attains the infimum may or may not exist. We note further that minimum density and slope of infinite binary words do not necessarily coincide. In particular, while $\delta(w)$ exists for every $w$, the limit $\lim_{i\to\infty} D_w(i)$ may not exist, i.e., $w$ may or may not have a slope. As an example, consider the word $w = v_0 v_1 v_2 \cdots$, where for each $i$, $v_i = 1^{2^i} 0^{2^i}$. Then $\delta(w) = 1/2$ and $\lim_{i\to\infty} D_w(i)$ does not exist, since $D_w(i)$ has an infinite subsequence which is constant $1/2$, and another which tends to $2/3$.

Moreover, even for words w for which the slope is defined, this can be different from the minimum density. If w has slope α, then α = δ(w) if and only if for all i, Dw(i) ≥ α. For instance, the infinite word 01ω has slope 1 but its minimum density is 0. On the other hand, the infinite word 1(10)ω has both slope and minimum density 1/2.
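For finite words, the three quantities of Definition 3 can be computed in one pass. This is a small sketch with a hypothetical helper name `min_density`; it uses exact rational arithmetic so that ties between prefix densities are resolved correctly.

```python
from fractions import Fraction


def min_density(w: str) -> tuple[Fraction, int, int]:
    """Return (delta(w), iota(w), kappa(w)) for a non-empty finite binary word."""
    if not w:
        raise ValueError("the minimum density is only defined for non-empty words")
    best = None
    best_len = best_weight = 0
    ones = 0
    for i, ch in enumerate(w, start=1):
        ones += ch == "1"
        density = Fraction(ones, i)
        # Strict comparison keeps the *shortest* prefix attaining the minimum.
        if best is None or density < best:
            best, best_len, best_weight = density, i, ones
    return best, best_len, best_weight


if __name__ == "__main__":
    print(min_density("0" + "1" * 9))    # prefix of 01^omega: minimum density 0
    print(min_density("1" + "10" * 5))   # prefix of 1(10)^omega
```

On prefixes of $1(10)^\omega$ the reported minimum keeps moving to the last odd length and decreases towards 1/2 without reaching it, which illustrates why $\iota(w)$ need not exist for infinite words.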

3. Operations generating infinite prefix normal words

In [14], we introduced an operation which takes a finite prefix normal word w ending in 1 and extends it by a run of 0s followed by a new 1, in such a way that this new 1 is placed in the first possible position without violating prefix normality. This operation, called flipext, leaves the minimum density invariant. Moreover, by repeatedly applying the flipext operation, an infinite prefix normal word is produced which is the densest among all prefix normal words with given prefix w.

Here we extend the definition of flipext to all prefix normal words containing at least one 1 and show that the same properties hold, even if the original word w does not end in 1.

Definition 4 (Operation flipext) Let $w \in \mathcal{L}_{\mathrm{fin}} \setminus \{0\}^*$. Define flipext($w$) as the finite word $w0^k1$, where $k = \min\{j \mid w0^j1 \in \mathcal{L}\}$. We further define the infinite word $v = \mathrm{flipext}^\omega(w)$.
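Definition 4 translates into a simple search for the smallest admissible run of 0s. The sketch below is a brute-force illustration only (it re-tests prefix normality from scratch for every candidate); the function names are hypothetical.

```python
def is_prefix_normal(w: str) -> bool:
    """True iff no factor of w has more 1s than the prefix of the same length."""
    n = len(w)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in w[:i]
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    return all(
        ones[j + i] - ones[j] <= ones[i]
        for i in range(1, n + 1)
        for j in range(n - i + 1)
    )


def flipext(w: str) -> str:
    """Return w 0^k 1 for the smallest k >= 0 such that the result is prefix normal."""
    if "1" not in w or not is_prefix_normal(w):
        raise ValueError("w must be prefix normal and contain at least one 1")
    k = 0
    while not is_prefix_normal(w + "0" * k + "1"):
        k += 1
    return w + "0" * k + "1"


if __name__ == "__main__":
    w = "110"
    for _ in range(3):
        w = flipext(w)
        print(w)
```

Starting from 110, the iterates are 1101, 11011 and 1101101, so every prefix whose length is a multiple of 3 has density 2/3, the minimum density of 110.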

The next proposition is a slightly more general form of Lemma 13 from [14]:

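Proposition 1 Let $w \in \mathcal{L}_{\mathrm{fin}} \setminus \{0\}^*$ and $v \in \mathrm{flipext}^*(w) \cup \{\mathrm{flipext}^\omega(w)\}$. Then $\delta(v) = \delta(w)$, and, as a consequence, $\iota(v) = \iota(w)$ and $\kappa(v) = \kappa(w)$. Moreover, $D_v(j \cdot \iota(w)) = \delta(w)$ for all $j \ge 1$.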

Proof. Let $w \in \mathcal{L}$. If the last character of $w$ is a 1, then the claim holds by Lemma 13 of [14].

Else $w$ ends in a run of 0s. Let $\ell$ be the length of this run, and $w'$ be such that $w = w'0^\ell$. Let $w'' = \mathrm{flipext}(w') = w'0^k1$, i.e. by definition of flipext, $k$ is minimal s.t. $w'0^k1 \in \mathcal{L}$. If $\ell \le k$, then $\mathrm{flipext}(w) = \mathrm{flipext}(w') = w''$. Since $w'$ is a prefix of $w$, and $w$ is a prefix of $w''$, we have $\delta(w') \ge \delta(w) \ge \delta(w'')$. Since $w'$ ends in a 1, $\delta(w'') = \delta(w')$, and thus $\delta(w'') = \delta(w)$.

Otherwise $\ell > k$, therefore $\mathrm{flipext}(w') = w'0^{\ell'}1 \in \mathcal{L}_{\mathrm{fin}}$ for some $\ell' < \ell$, hence $w'0^{\ell}1 \in \mathcal{L}_{\mathrm{fin}}$. The latter implies $\mathrm{flipext}(w) = w1$ and $\delta(\mathrm{flipext}(w)) = \delta(w)$.

Further iterations $\mathrm{flipext}^{(i)}(w)$ fulfil the claim due to the fact that $\mathrm{flipext}(w)$ ends in a 1.

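We now show the second statement: $D_v(j \cdot \iota(w)) = \delta(w)$ for all $j \ge 1$. We show it by induction on $j$. It is clearly true for $j = 1$. For $j > 1$, assume $D_v((j-1) \cdot \iota(w)) = \delta(w)$, let $w' = \mathrm{pref}_v((j-1) \cdot \iota(w))$, and let $w''$ be the factor of length $\iota(w)$ such that $w'w'' = \mathrm{pref}_v(j \cdot \iota(w))$. Then

$$\delta(w) = \delta(v) \le D_v(j \cdot \iota(w)) = \frac{|w'|_1 + |w''|_1}{j \cdot \iota(w)} \le \frac{P_w(\iota(w))(j-1) + P_w(\iota(w))}{j \cdot \iota(w)} = \frac{P_w(\iota(w))}{\iota(w)} = \delta(w),$$

where in the second inequality we are using $|w'|_1 = \delta(w)\,(j-1)\,\iota(w) = P_w(\iota(w))(j-1)$ (induction hypothesis) and $|w''|_1 \le P_w(\iota(w))$ (since $v$ is prefix normal). Hence equality holds throughout. □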

The next proposition states that the infinite word which is generated by repeatedly applying the flipext operation is the densest among all prefix normal words with prefix w.

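Proposition 2 Let $w \in \mathcal{L}_{\mathrm{fin}} \setminus \{0\}^*$, $v = \mathrm{flipext}^\omega(w)$, and let $z \in \mathcal{L}_{\mathrm{inf}}$ such that $\mathrm{pref}_z(|w|) = w$. Then for every $i = 1, 2, \ldots$ we have $P_v(i) \ge P_z(i)$.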

Proof. We argue by contradiction. Let $i$ be the smallest integer such that $P_v(i) < P_z(i)$. Clearly $i > |w|$ and, by the minimality assumption, we must have $P_v(i-1) = P_z(i-1)$ and $v_i = 0$, $z_i = 1$. By definition of flipext there must exist $j < i$ such that $|v_{j+1} \cdots v_{i-1} 1|_1 > P_v(i-j) \ge P_z(i-j)$, for otherwise we would have $v_i = 1$. Since $v$ is prefix normal, it also follows that $|v_{j+1} \cdots v_{i-1} v_i|_1 = P_v(i-j) \ge P_z(i-j)$.

From this, since by the minimality of $i$ it holds that $P_z(j) \le P_v(j)$, we have that $|z_{j+1} \cdots z_{i-1} z_i|_1 = P_z(i) - P_z(j) > P_v(i) - P_v(j) = P_v(i-j) \ge P_z(i-j)$, violating the prefix normality of $z$. □

We now define a different operation, called lazy-flipext, which, given a prefix normal word w, extends it by adding 0s as long as the minimum density of the resulting word is not smaller than δ(w), and only then adding a 1. We show that this operation preserves the prefix normality of the resulting word.

Definition 5 (Operation lazy-flipext) Let $\alpha \in (0, 1]$ and let $w \in \mathcal{L}_{\mathrm{fin}}$ with $\delta(w) \ge \alpha$. We define lazy-flipext($w, \alpha$) as the finite word $w0^k1$ where $k = \max\{j \mid \delta(w0^j) \ge \alpha\}$. We further define the infinite word $v = \text{lazy-flipext}^\omega(w, \alpha)$.

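Example 2 Let $w = 111$ and let $\alpha = \sqrt{2} - 1$. Then lazy-flipext($w, \alpha$) $= 11100001$, since $\delta(1110000) = 3/7 \ge \alpha$ and $\delta(11100000) = 3/8 < \alpha$. Further, $\text{lazy-flipext}^{(2)}(w, \alpha) = 1110000101$, since $\delta(111000010) = 3/7 \ge \alpha$ (the newly added prefix of length 9 has density $4/9$) and $\delta(1110000100) = 2/5 < \alpha$.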
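The operation of Definition 5 can be sketched as follows. The helper name `lazy_flipext` is hypothetical, and $\alpha$ may be passed as a `Fraction` (exact) or as a float approximation of an irrational value such as $\sqrt{2} - 1$; in the latter case the comparisons are only as reliable as the approximation. Because $\delta(w) \ge \alpha$ is assumed, only the densities of the newly created prefixes need to be tested.

```python
import math
from fractions import Fraction


def lazy_flipext(w: str, alpha) -> str:
    """Return w 0^k 1 with k maximal such that every prefix of w 0^k has density >= alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ones = 0
    for i, ch in enumerate(w, start=1):
        ones += ch == "1"
        if Fraction(ones, i) < alpha:
            raise ValueError("the minimum density of w must be at least alpha")
    k = 0
    # Appending 0s never changes the weight, so the density of the new prefix
    # of length |w| + k + 1 is ones / (|w| + k + 1).
    while Fraction(ones, len(w) + k + 1) >= alpha:
        k += 1
    return w + "0" * k + "1"


if __name__ == "__main__":
    alpha = math.sqrt(2) - 1  # float approximation, as in Example 2
    first = lazy_flipext("111", alpha)
    second = lazy_flipext(first, alpha)
    print(first, second)
```

Run on Example 2, this produces 11100001 and then 1110000101.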

Lemma 2 Let $\alpha \in (0, 1]$. For every $w \in \mathcal{L}_{\mathrm{fin}}$ with $\delta(w) \ge \alpha$, the word $v = \text{lazy-flipext}(w, \alpha)$ is also prefix normal, with $\delta(v) \ge \alpha$.

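Proof. First note that $\delta(v) \ge \alpha$ by definition. Now write $v = w0^k1$, and let $u = \mathrm{flipext}(w) = w0^\ell 1$. Recall that $\ell = \min\{j \mid w0^j1 \in \mathcal{L}\}$. If $k < \ell$, this implies $\delta(u) < \alpha$, in contradiction to Proposition 1, since $\delta(u) = \delta(w) \ge \alpha$. Thus $k \ge \ell$, from which it follows that $v \in \mathcal{L}$. □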

Corollary 1 Let $\alpha \in (0, 1]$ and $w \in \mathcal{L}_{\mathrm{fin}}$ with $\delta(w) \ge \alpha$. Then $v = \text{lazy-flipext}^\omega(w, \alpha)$ is an infinite prefix normal word and $\delta(v) = \alpha$.

Proof. That $v$ is prefix normal follows from Lemma 1 and from Lemma 2, which also implies that $\delta(v) \ge \alpha$. However, if $\delta(v) > \alpha$ were true, then for a suitably long prefix of $v$ we would get a contradiction to the definition of the lazy-flipext operation. □

Fix $w \in \mathcal{L}_{\mathrm{fin}}$. The next proposition states that the lazy-flipext operation with $\alpha = \delta(w)$, applied to $w$, generates a prefix normal word that has the minimum number of 1s among all prefix normal words with prefix $w$ and minimum density $\delta(w)$.

Proposition 3 Let $w \in \mathcal{L}_{\mathrm{fin}}$, $\alpha = \delta(w)$, $v = \text{lazy-flipext}^\omega(w, \alpha)$, and $z \in \mathcal{L}_{\mathrm{inf}}$ such that $\mathrm{pref}_z(|w|) = w$ and $\delta(z) \ge \delta(w)$. Then for all $i = 1, 2, \ldots$, we have $P_v(i) \le P_z(i)$.

Proof. We argue by contradiction. Let $i$ be the smallest integer such that $P_v(i) > P_z(i)$. Clearly $i > |w|$ and, by the minimality assumption, we have $P_v(i-1) = P_z(i-1)$ and $v_i = 1$, $z_i = 0$. Let $u = \mathrm{pref}_v(i-1)$. Since $i > |w|$ and $v_i = 1$, we have $u1 = \text{lazy-flipext}(u', \alpha)$ for some $u'$, and thus, by definition of lazy-flipext, $P_{u0}(i)/i < \alpha$. But $P_{u0}(i) = P_v(i-1) = P_z(i-1) = P_z(i)$, so we have

$$\delta(z) \le D_z(i) = \frac{P_z(i)}{i} = \frac{P_{u0}(i)}{i} < \delta(w),$$

in contradiction to the density of $z$. □

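Theorem 1 Let $w \in \mathcal{L}_{\mathrm{fin}}$ with $\alpha = \delta(w) \in (0, 1]$, and let $z \in \mathcal{L}_{\mathrm{inf}}$ such that $\mathrm{pref}_z(|w|) = w$ and $\delta(z) \ge \alpha$. Let $u = \mathrm{flipext}^\omega(w)$ and $v = \text{lazy-flipext}^\omega(w, \alpha)$. Then $v \le_{\mathrm{lex}} z \le_{\mathrm{lex}} u$.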

Proof. Follows from Prop. 2 and Prop. 3. □

Note that if prefz(∣w∣) = w, then δ(z) ≥ δ(w) implies that, in fact, δ(z) = δ(w) holds, since z is an extension of w. Theorem 1 states then that all prefix normal extensions of w with the same minimum density as w lie lexicographically between the lazy-flipext- and the flipext-extensions of w. However, not all extensions of w between these two words are prefix normal, as we can see in the next example.

Example 3 Let $w = 1101101100100010000001$, with $\alpha = \delta(w) = 8/21$. Then

$$v = \text{lazy-flipext}^{(8)}(w, \alpha) = w\,010010100100101001001, \qquad u = \mathrm{flipext}^{(8)}(w) = w\,101101100100010000001.$$

Let $p = w\,100111010100000100001$ and $q = w\,101101010100001000001$. We have that for all $1 \le i \le 42$, $P_v(i) \le P_p(i), P_q(i) \le P_u(i)$ and $v \le_{\mathrm{lex}} p, q \le_{\mathrm{lex}} u$. Note that $p$ is not prefix normal, while $q$ is prefix normal.

4. Sturmian words and prefix normal words

In the previous section, we presented operations that construct infinite prefix normal words by extending finite prefix normal words. In particular, the lazy-flipext operation extends a finite binary word with as few 1s as possible while preserving its minimum density. This is reminiscent of the characterization of Sturmian words in terms of mechanical words and the slope. Led by this analogy, in this section we provide a complete characterization of Sturmian words which are prefix normal. We refer the interested reader to [23, Chapter 2], for a comprehensive treatment of Sturmian words. Here we briefly recall some facts which we will need later.

Definition 6 (Sturmian words) Let w ∈ {0, 1}ω. Then w is called Sturmian if it is balanced and aperiodic.

An equivalent definition of Sturmian words is that they are irrational mechanical, a definition we recall next.

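Definition 7 (Mechanical words) Given two real numbers $0 \le \alpha \le 1$ and $0 \le \tau < 1$, the lower mechanical word $s_{\alpha,\tau} = s_{\alpha,\tau}(1)\, s_{\alpha,\tau}(2) \cdots$ and the upper mechanical word $s'_{\alpha,\tau} = s'_{\alpha,\tau}(1)\, s'_{\alpha,\tau}(2) \cdots$ are given by

$$s_{\alpha,\tau}(n) = \lfloor \alpha n + \tau \rfloor - \lfloor \alpha(n-1) + \tau \rfloor, \qquad s'_{\alpha,\tau}(n) = \lceil \alpha n + \tau \rceil - \lceil \alpha(n-1) + \tau \rceil \qquad (n \ge 1).$$

Then $\alpha$ is called the slope and $\tau$ the intercept of $s_{\alpha,\tau}$, $s'_{\alpha,\tau}$. A word $w$ is called mechanical if $w = s_{\alpha,\tau}$ or $w = s'_{\alpha,\tau}$ for some $\alpha, \tau$. It is called rational mechanical (resp. irrational mechanical) if $\alpha$ is rational (resp. irrational).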
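The two formulas of Definition 7 can be evaluated character by character. The sketch below uses hypothetical function names and floating-point arithmetic for irrational slopes, so it is only trustworthy for prefixes short enough that $\alpha n + \tau$ stays well away from an integer; a `Fraction` can be passed for rational slopes.

```python
import math


def lower_mechanical(alpha, tau, length: int) -> str:
    """First `length` characters of the lower mechanical word (floor formula, n >= 1)."""
    return "".join(
        str(math.floor(alpha * n + tau) - math.floor(alpha * (n - 1) + tau))
        for n in range(1, length + 1)
    )


def upper_mechanical(alpha, tau, length: int) -> str:
    """First `length` characters of the upper mechanical word (ceiling formula, n >= 1)."""
    return "".join(
        str(math.ceil(alpha * n + tau) - math.ceil(alpha * (n - 1) + tau))
        for n in range(1, length + 1)
    )


if __name__ == "__main__":
    alpha = (3 - math.sqrt(5)) / 2  # slope of the Fibonacci word, about 0.382
    print(lower_mechanical(alpha, 0, 10))  # starts with 0
    print(upper_mechanical(alpha, 0, 10))  # starts with 1
```

For this slope and intercept 0 the two prefixes differ only in their first character; the common remainder 010010100 is a prefix of the Fibonacci word.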

Fact 1 (Some facts about Sturmian words [23])

  1. An infinite binary word is Sturmian if and only if it is irrational mechanical.

  2. For $\tau = 0$ and irrational $\alpha$, there exists a word $c_\alpha$, called the characteristic word with slope $\alpha$, s.t. $s_{\alpha,0} = 0c_\alpha$ and $s'_{\alpha,0} = 1c_\alpha$. This word $c_\alpha$ is a Sturmian word itself, with both slope and intercept $\alpha$.

  3. For two Sturmian words $w$ and $v$ with the same slope, $\mathrm{Fct}(w) = \mathrm{Fct}(v)$.

We now show that the word $\text{lazy-flipext}^\omega(1, \alpha)$ coincides with the upper mechanical word $s'_{\alpha,0}$. This also implies that $s'_{\alpha,0}$ is prefix normal, as noted in the subsequent corollary.

Lemma 3 Fix $\alpha \in (0, 1]$ and let $v = \text{lazy-flipext}^\omega(1, \alpha)$. Let $s = s'_{\alpha,0}$ be the upper mechanical word of slope $\alpha$ and intercept 0. Then $v = s$.

Proof. Let $s_i$ and $v_i$ denote the $i$th character of $s$ and $v$ respectively. We argue by induction on $i$ that $v_i = s_i$. The claim is true for $i = 1$ since, directly from the definitions, we have $v_1 = 1 = s_1$. Let $n > 1$ and assume that for each $i < n$ we have $v_i = s_i$. For the induction step we argue according to the character $s_n$.

(i) If $s_n = 1$, by definition $\lceil n\alpha \rceil - \lceil (n-1)\alpha \rceil = 1$. Thus, $\lceil (n-1)\alpha \rceil < n\alpha$. Using this inequality and the induction hypothesis together with the definition of $s'_{\alpha,0}$ we have that $|v_1 \cdots v_{n-1}|_1 = |s_1 \cdots s_{n-1}|_1 = \lceil (n-1)\alpha \rceil < \alpha n$. Therefore $|v_1 \cdots v_{n-1} 0|_1 = |v_1 \cdots v_{n-1}|_1 < \alpha n$, which means that $\delta(v_1 \cdots v_{n-1} 0) < \alpha$, hence by definition lazy-flipext($v_1 \cdots v_{n-1}, \alpha$) $= v_1 \cdots v_{n-1} 1$, i.e., $v_n = 1 = s_n$.

(ii) If $s_n = 0$, by definition $\lceil n\alpha \rceil - \lceil (n-1)\alpha \rceil = 0$. Thus, $\lceil (n-1)\alpha \rceil \ge n\alpha$. Using this inequality and the induction hypothesis together with the definition of $s'_{\alpha,0}$ we have that $|v_1 \cdots v_{n-1}|_1 = |s_1 \cdots s_{n-1}|_1 = \lceil (n-1)\alpha \rceil \ge \alpha n$. Therefore $|v_1 \cdots v_{n-1} 0|_1 = |v_1 \cdots v_{n-1}|_1 \ge \alpha n$, which means that $\delta(v_1 \cdots v_{n-1} 0) \ge \alpha$, hence by definition lazy-flipext($v_1 \cdots v_{n-1}, \alpha$) $= v_1 \cdots v_{n-1} 0 \cdots 0 1$, i.e., $v_n = 0 = s_n$. □

Corollary 2 Let $\alpha \in (0, 1]$. Then $s'_{\alpha,0}$ is an infinite prefix normal word and $\delta(s'_{\alpha,0}) = \alpha$.
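Lemma 3 can be checked numerically on finite prefixes. The sketch below (hypothetical names) builds a prefix of $\text{lazy-flipext}^\omega(1, \alpha)$ greedily, writing a 0 whenever the prefix density stays at least $\alpha$ and a 1 otherwise, which is the character-by-character view used in the proof, and compares it with the ceiling formula for the upper mechanical word of intercept 0. With a float slope the comparison is an illustration, not a proof.

```python
import math
from fractions import Fraction


def upper_mechanical_prefix(alpha, length: int) -> str:
    """Prefix of the upper mechanical word of slope alpha and intercept 0."""
    return "".join(
        str(math.ceil(alpha * n) - math.ceil(alpha * (n - 1)))
        for n in range(1, length + 1)
    )


def lazy_flipext_limit_prefix(alpha, length: int) -> str:
    """Prefix of lazy-flipext^omega(1, alpha), built one character at a time."""
    word, ones = ["1"], 1
    while len(word) < length:
        if Fraction(ones, len(word) + 1) >= alpha:
            word.append("0")  # density stays >= alpha: keep extending the run of 0s
        else:
            word.append("1")  # a further 0 would push the density below alpha
            ones += 1
    return "".join(word)


if __name__ == "__main__":
    for alpha in ((3 - math.sqrt(5)) / 2, Fraction(2, 5), math.sqrt(2) - 1):
        a = lazy_flipext_limit_prefix(alpha, 50)
        b = upper_mechanical_prefix(alpha, 50)
        print(a == b, a[:20])
```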

The following theorem fully characterizes those Sturmian words which are prefix normal.

Theorem 2 A Sturmian word $s$ of slope $\alpha$ is prefix normal if and only if $s = 1c_\alpha$, where $c_\alpha$ is the characteristic Sturmian word with slope $\alpha$.

Proof. By definition, $\alpha$ is irrational. Let $s = s'_{\alpha,0}$. Then $s$ is Sturmian and prefix normal by Corollary 2. Let $t$ be a Sturmian word with the same slope $\alpha$ which is also prefix normal. By Fact 1, $s$ and $t$ have the same factors.

Assume, by contradiction, that $s \ne t$, hence there exists $i \ge 1$ such that $|s_1 \cdots s_i|_1 \ne |t_1 \cdots t_i|_1$. Assume, without loss of generality (since we can, if necessary, swap $s$ and $t$ in the following argument), that $|s_1 \cdots s_i|_1 > |t_1 \cdots t_i|_1$. Then, since $s_1 \cdots s_i$ is also a factor of $t$, there is a $j \ge 1$ such that $t_{j+1} \cdots t_{j+i} = s_1 \cdots s_i$, hence $|t_{j+1} \cdots t_{j+i}|_1 > |t_1 \cdots t_i|_1$, contradicting the assumption that $t$ is prefix normal. □

5. Prefix normal words, prefix normal forms, and abelian complexity

Given an infinite word $w$, the abelian complexity function of $w$, denoted $\psi_w$, is given by $\psi_w(n) = |\{\mathrm{pv}(u) \mid u \in \mathrm{Fct}(w), |u| = n\}|$, the number of Parikh vectors of $n$-length factors of $w$. A word $w$ is said to have bounded abelian complexity if there exists a $c$ s.t. for all $n$, $\psi_w(n) \le c$. Note that a binary word is $c$-balanced if and only if its abelian complexity is bounded by $c + 1$. We denote the set of Parikh vectors of factors of a word $w$ by $\Pi(w) = \{\mathrm{pv}(u) \mid u \in \mathrm{Fct}(w)\}$. Thus, $\psi_w(n) = |\Pi(w) \cap \{(x, y) \mid x + y = n\}|$. In this section, we study the connection between prefix normal words and abelian complexity.
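For a binary word, the Parikh vector of a factor of fixed length is determined by its number of 1s, so the abelian complexity can be computed with a sliding window. The sketch below works on a finite prefix and therefore only gives a lower bound on $\psi_w(n)$ for the infinite word; the helper names and the prefix length 1000 are illustrative assumptions.

```python
def abelian_complexity(w: str, n: int) -> int:
    """Number of distinct Parikh vectors among the length-n factors of the finite word w."""
    if not 1 <= n <= len(w):
        raise ValueError("n must satisfy 1 <= n <= |w|")
    ones = w[:n].count("1")
    seen = {ones}
    for j in range(n, len(w)):
        # Slide the window one step: w[j] enters, w[j - n] leaves.
        ones += (w[j] == "1") - (w[j - n] == "1")
        seen.add(ones)
    return len(seen)


def fibonacci_word(length: int) -> str:
    """Prefix of the Fibonacci word, generated by iterating 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if ch == "0" else "0" for ch in w)
    return w[:length]


if __name__ == "__main__":
    f = fibonacci_word(1000)
    print([abelian_complexity(f, n) for n in range(1, 21)])
```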

5.1. Balanced and c-balanced words.

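Based on the examples in the introduction, one might be tempted to conclude that any word with bounded abelian complexity can be turned into a prefix normal word by prepending a fixed number of 1s. However, consider the word $w = 01^\omega$, which is balanced, i.e. its abelian complexity function is bounded by 2. It is easy to see that $1^k w \notin \mathcal{L}$ for every $k \in \mathbb{N}$: the word $1^k 0 1^\omega$ contains the factor $1^{k+1}$, while its prefix of length $k + 1$ has only $k$ 1s.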

Sturmian words are precisely the words which are aperiodic and whose abelian complexity is constant 2 [27]. For Sturmian words, it is always possible to prepend a finite number of 1s to get a prefix normal word, as we will see next. Recall that for a Sturmian word w, at least one of 0w and 1w is Sturmian, with both being Sturmian if and only if w is characteristic [23].

Lemma 4 Let $w$ be a Sturmian word with slope $\alpha$. Then

  1. $1w \in \mathcal{L}$ if and only if $0w$ is Sturmian,

  2. if $0w$ is not Sturmian, then $1^n w \in \mathcal{L}$ for $n = \lceil 1/(1-\alpha) \rceil$.

Proof. 1. Let $0w$ be Sturmian and let $u$ be some factor of $1w$. If $u$ is a prefix of $1w$, there is nothing to show, therefore let $u \in \mathrm{Fct}(w)$, with $|u| = n$ and $|u|_1 = k$. Since $0w$ is Sturmian, we have that the prefix of $0w$ of length $n$ has at least $k - 1$ 1s, thus $P_{1w}(n) \ge k = |u|_1$, as desired. Conversely, if $0w$ is not Sturmian, this means that it is not balanced, therefore there exists a factor $u$ of $w$ s.t. $\bigl||u|_1 - |0w_1 \cdots w_{n-1}|_1\bigr| \ge 2$, where $|u| = n$. Since $w$ is Sturmian, we have that $\bigl||w_1 \cdots w_{n-1}|_1 - |u_1 \cdots u_{n-1}|_1\bigr| \le 1$ and $\bigl||w_1 \cdots w_{n-1}|_1 - |u_2 \cdots u_n|_1\bigr| \le 1$. Let $|w_1 \cdots w_{n-1}|_1 = k$, then this implies, by a case-by-case consideration, that $|u_1 \cdots u_{n-1}|_1 = |u_2 \cdots u_n|_1 = k + 1$, and thus $|1w_1 \cdots w_{n-1}|_1 = k + 1 < k + 2 = |u|_1$, showing that $1w$ is not prefix normal.

2. First note that a Sturmian word of slope $\alpha$ cannot have a run of 1s of length $\lceil 1/(1-\alpha) \rceil$. To see this, it is enough to consider the upper mechanical word of slope $\alpha$ and intercept 0 (since all the other words with the same slope have the same set of factors). Let us write $s = s'_{\alpha,0} = s_1 s_2 \cdots$.

Now $s$ has a run of $n$ 1s if and only if there exists an $i \ge 0$ such that $s_{i+1} = s_{i+2} = \cdots = s_{i+n} = 1$. By the definition of mechanical words, the last condition is equivalent to

$$\lceil \alpha(i+n) \rceil - \lceil \alpha i \rceil = n.$$

On the other hand, if $n \ge \frac{1}{1-\alpha}$, i.e., $\alpha \le \frac{n-1}{n}$, we have that the sum of the characters $\sum_{j=1}^{n} s_{i+j}$ satisfies

$$\sum_{j=1}^{n} s_{i+j} = \lceil \alpha(i+n) \rceil - \lceil \alpha i \rceil \le \lceil \alpha i \rceil + \lceil \alpha n \rceil - \lceil \alpha i \rceil = \lceil \alpha n \rceil < \alpha n + 1 \le \frac{n-1}{n} \times n + 1 = n,$$

i.e., the sum is strictly smaller than $n$, a contradiction; hence $s_{i+1} \cdots s_{i+n} \ne 1^n$.

Now fix $n = \lceil 1/(1-\alpha) \rceil$ and let $w' = 1^n w$. Let $u \in \mathrm{Fct}(w)$. Since, as shown above, $1^n$ is not a factor, if $|u| \le n$, there is nothing to show. So let $|u| = n + m$. Then $|u_1 \cdots u_n|_1 \le n - 1$, and since $w$ is balanced, we have that $|w_1 \cdots w_m|_1 \ge |u_{n+1} \cdots u_{n+m}|_1 - 1$, yielding that $P_{w'}(n + m) \ge n + |u_{n+1} \cdots u_{n+m}|_1 - 1 \ge |u|_1$. □

Lemma 5 Let $w$ be a $c$-balanced word. If there exists a positive integer $n$ s.t. $1^n \notin \mathrm{Fct}(w)$, then the word $z = 1^{nc} w$ is prefix normal.

Proof. We are going to show that every factor $u$ of $z$ satisfies the prefix normal condition $|u|_1 \le P_z(|u|)$. It is not hard to see that we can limit ourselves to only considering factors $u$ such that $u$ does not overlap with the prefix of $z$ of the same length.

If $|u| \le nc$ then $|u|_1 \le |u| = P_z(|u|)$. Assume now that $u = u'u''$ with $|u'| = nc$ and $|u''| > 0$. Since $u'$ is a factor of $w$ of size $nc$, the condition that $w$ does not contain a factor $1^n$ implies that $u'$ contains at least $c$ 0s, i.e., $|u'|_1 \le |u'| - c$. Moreover, since $w$ is $c$-balanced, we have that $|u''|_1 \le P_w(|u''|) + c$. Therefore, observing that $\mathrm{pref}_z(|u|) = \mathrm{pref}_z(|u'| + |u''|) = 1^{nc}\, \mathrm{pref}_w(|u''|)$, we have that $P_z(|u|) = nc + P_w(|u''|) \ge |u'|_1 + |u''|_1 = |u|_1$. □

In particular, Lemma 5 implies that any c-balanced word with infinitely many 0s can be turned into a prefix normal word by prepending a finite number of 1s, since such a word cannot have arbitrarily long runs of 1s. Note, however, that the number of 1s to prepend from Lemma 5 is not tight, as can be seen e.g. from the Thue-Morse word t: the longest run of 1s in t is 2 and t is 2-balanced, but 11t is prefix normal, as will be shown in the next section (Lemma 8).
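The remark on the Thue-Morse word can be explored on finite prefixes. In the sketch below (hypothetical names, arbitrary prefix length 300), a negative answer for a prefix refutes prefix normality of the infinite word, whereas a positive answer is only consistent with it; the actual statement for $11t$ is the one proved in the paper.

```python
def thue_morse(length: int) -> str:
    """Prefix of the Thue-Morse word starting with 0 (parity of the binary digit sum)."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def is_prefix_normal(w: str) -> bool:
    """True iff no factor of w has more 1s than the prefix of the same length."""
    n = len(w)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in w[:i]
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    return all(
        ones[j + i] - ones[j] <= ones[i]
        for i in range(1, n + 1)
        for j in range(n - i + 1)
    )


if __name__ == "__main__":
    t = thue_morse(300)
    for k in range(4):
        print(k, is_prefix_normal("1" * k + t))
```

Lemma 5 applied with $n = 3$ and $c = 2$ would prepend six 1s, whereas this experiment is consistent with two 1s already being enough.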

5.2. Prefix normal forms and abelian complexity.

Recall that for a word $w$, $F_w^a(i)$ is the maximum number of $a$'s in a factor of $w$ of length $i$, for $a \in \{0, 1\}$.

Definition 8 (Prefix normal forms) Let $w \in \{0, 1\}^\omega$. Define the words $w'$ and $w''$ by setting, for $n \ge 1$, $w'_n = F_w^1(n) - F_w^1(n-1)$ and $w''_n = \overline{F_w^0(n) - F_w^0(n-1)}$. We refer to $w'$ as the prefix normal form of $w$ w.r.t. 1 and to $w''$ as the prefix normal form of $w$ w.r.t. 0, denoted $\mathrm{PNF}_1(w)$ resp. $\mathrm{PNF}_0(w)$.

In other words, $\mathrm{PNF}_1(w)$ is the sequence of first differences of the maximum-1s function $F_w^1$ of $w$. Similarly, $\mathrm{PNF}_0(w)$ can be obtained by complementing the sequence of first differences of the maximum-0s function $F_w^0$ of $w$. Note that for all $n$ and $a \in \{0, 1\}$, either $F_w^a(n+1) = F_w^a(n)$ or $F_w^a(n+1) = F_w^a(n) + 1$, and therefore $w'$ and $w''$ are words over the alphabet $\{0, 1\}$. In particular, by construction, the two prefix normal forms allow us to recover the maximum-1s and minimum-1s functions of $w$:

Observation 1 Let $w$ be an infinite binary word and $w' = \mathrm{PNF}_1(w)$, $w'' = \mathrm{PNF}_0(w)$. Then $P_{w'}(n) = F_w^1(n)$ and $P_{w''}(n) = n - F_w^0(n) = f_w^1(n)$.

Lemma 6 Let $w \in \{0, 1\}^\omega$. Then $\mathrm{PNF}_1(w)$ is the unique 1-prefix normal word $w'$ s.t. for all $i \in \mathbb{N}$, $F_{w'}^1(i) = F_w^1(i)$. Similarly, $\mathrm{PNF}_0(w)$ is the unique 0-prefix normal word $w''$ s.t. for all $i \in \mathbb{N}$, $F_{w''}^0(i) = F_w^0(i)$.

Proof. Let $w' = \mathrm{PNF}_1(w)$ and $w'' = \mathrm{PNF}_0(w)$. First note that, by construction, for all $i \in \mathbb{N}$, $F_{w'}^1(i) = F_w^1(i)$ and $F_{w''}^0(i) = F_w^0(i)$. It is easy to see that $w'$ is 1-prefix normal and $w''$ is 0-prefix normal. For uniqueness, note that for $a \in \{0, 1\}$ and an $a$-prefix normal word $v$, we have $\mathrm{PNF}_a(v) = v$. □

Example 4 The two prefix normal forms and the maximum-1s and maximum-0s functions of the Fibonacci word f = 01001010010010100101 ⋯ are given in Table 1.

Table 1: The maximum number of 0s and 1s ($F_f^0(n)$ and $F_f^1(n)$, resp.) for all $n = 1, \ldots, 20$ of the Fibonacci word $f$, and the prefix normal forms of $f$.

n           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
F_f^0(n)    1  2  2  3  4  4  5  5  6  7  7  8  9  9 10 10 11 12 12 13
F_f^1(n)    1  1  2  2  2  3  3  4  4  4  5  5  5  6  6  7  7  7  8  8
PNF_0(f)    0  0  1  0  0  1  0  1  0  0  1  0  0  1  0  1  0  0  1  0
PNF_1(f)    1  0  1  0  0  1  0  1  0  0  1  0  0  1  0  1  0  0  1  0
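The rows of Table 1 can be recomputed from a finite prefix of the Fibonacci word by applying Definition 8 directly. This is a sketch under the assumption that the chosen prefix (length 1000 here, an arbitrary choice) already contains factors attaining the extremal values for all lengths up to 20; the function names are hypothetical.

```python
def fibonacci_word(length: int) -> str:
    """Prefix of the Fibonacci word, generated by iterating 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if ch == "0" else "0" for ch in w)
    return w[:length]


def max_count(w: str, n: int, symbol: str) -> int:
    """Maximum number of occurrences of `symbol` in a length-n factor of w."""
    count = w[:n].count(symbol)
    best = count
    for j in range(n, len(w)):
        count += (w[j] == symbol) - (w[j - n] == symbol)
        best = max(best, count)
    return best


def prefix_normal_forms(w: str, n_max: int) -> tuple[str, str]:
    """Prefixes of length n_max of PNF_1 and PNF_0, computed from the finite word w."""
    f1 = [0] + [max_count(w, n, "1") for n in range(1, n_max + 1)]
    f0 = [0] + [max_count(w, n, "0") for n in range(1, n_max + 1)]
    pnf1 = "".join(str(f1[n] - f1[n - 1]) for n in range(1, n_max + 1))
    # PNF_0 complements the first differences of the maximum-0s function.
    pnf0 = "".join(str(1 - (f0[n] - f0[n - 1])) for n in range(1, n_max + 1))
    return pnf1, pnf0


if __name__ == "__main__":
    pnf1, pnf0 = prefix_normal_forms(fibonacci_word(1000), 20)
    print("PNF_1(f):", pnf1)
    print("PNF_0(f):", pnf0)
```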

Now we can connect the prefix normal forms of $w$ to the abelian complexity of $w$ in the following way. Given $w' = \mathrm{PNF}_1(w)$ and $w'' = \mathrm{PNF}_0(w)$, the number of Parikh vectors of $k$-length factors is precisely 1 more than the difference in 1s in the prefix of length $k$ of $w'$ and of $w''$. For example, Fig. 2 shows the prefix normal forms of the Fibonacci word. The vertical line at 5 cuts through points $(5, -1)$ and $(5, -3)$: the first component stands for the length of the string, the second for the number of 1s minus the number of 0s, therefore indicating Parikh vectors $(3, 2)$ and $(4, 1)$.

Figure 2: The Fibonacci word (dashed) and its prefix normal forms (solid).

The Fibonacci word, being a Sturmian word, has constant abelian complexity 2. An example of a word with unbounded abelian complexity is the Champernowne word, whose prefix normal forms are $1^\omega$ resp. $0^\omega$ (Fig. 3).

Figure 3: The Champernowne word (dashed) and its prefix normal forms (solid).

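Theorem 3 Let $w, v \in \{0, 1\}^\omega$.

  1. $\psi_w(n) = P_{w'}(n) - P_{w''}(n) + 1$, where $w' = \mathrm{PNF}_1(w)$ and $w'' = \mathrm{PNF}_0(w)$.

  2. $\Pi(w) = \Pi(v)$ if and only if $\mathrm{PNF}_0(w) = \mathrm{PNF}_0(v)$ and $\mathrm{PNF}_1(w) = \mathrm{PNF}_1(v)$.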

Proof. 1. Fix an integer $n \ge 1$. By definition, for every factor $u$ of $w$ of length $n$ we have $n - F_w^0(n) \le |u|_1 \le F_w^1(n)$. Therefore $\psi_w(n) \le F_w^1(n) - (n - F_w^0(n)) + 1$.

Conversely, $w$ contains a factor $u'$ of length $n$ with $F_w^1(n)$ many 1s and a factor $u''$ of length $n$ with $n - F_w^0(n)$ many 1s. If we scan $w$ between an occurrence of $u'$ and an occurrence of $u''$, shifting the window by one position changes the number of 1s by at most one, so for each $x \in \{|u''|_1, \ldots, |u'|_1\}$ there must be a factor $u'''$ of size $n$ such that $|u'''|_1 = x$. Therefore $\psi_w(n) \ge F_w^1(n) - (n - F_w^0(n)) + 1$. We can conclude that $\psi_w(n) = F_w^1(n) - (n - F_w^0(n)) + 1$. The desired result then follows by observing that $n - F_w^0(n) = n - |\mathrm{pref}_{\mathrm{PNF}_0(w)}(n)|_0 = P_{\mathrm{PNF}_0(w)}(n)$ and $F_w^1(n) = P_{\mathrm{PNF}_1(w)}(n)$.

2. Follows directly from Observation 1. □

Theorem 3 implies that if we know the prefix normal forms of a word, then we can compute its abelian complexity. Conversely, the abelian complexity is the width of the area enclosed by the two words PNF1(w) and PNF0(w). In general, this fact alone does not give us the PNFs; but if we know more about the word itself, then we may be able to compute the prefix normal forms, as we will see in the case of the paperfolding word.

We will now give two examples of the close connection between abelian complexity and prefix normal forms, using some recent results about the abelian complexity of infinite words.

5.2.1. The paperfolding word

The first few characters of the ordinary paperfolding word are given by

p=0010011000110110001001110011011

The paperfolding word was originally introduced in [17]. One definition is given by: $p_n = 0$ if n′ ≡ 1 mod 4 and $p_n = 1$ if n′ ≡ 3 mod 4, where n′ is the unique odd integer such that $n = n'2^k$ for some k [24]. The abelian complexity function of the paperfolding word was fully determined in [24], giving the following initial values for $\psi_p(n)$, for n ≥ 1: 2, 3, 4, 3, 4, 5, 4, 3, 4, 5, 6, 5, 4, 5, 4, 3, 4, 5, 6, 5, and a recursive formula for the computation of all values. The authors note that for the paperfolding word, it holds that if $u \in \mathrm{Fct}(p)$, then also $\overline{u^{\mathrm{rev}}} \in \mathrm{Fct}(p)$. This implies

$$F_p^1(n) = F_p^0(n) \text{ for all } n, \text{ and thus } \mathrm{PNF}_0(p) = \overline{\mathrm{PNF}_1(p)}.$$

Moreover, from Thm. 3 we get that $F_p^1(n) = P_{\mathrm{PNF}_1(p)}(n) = (\psi_p(n) + n - 1)/2$, and thus we can determine the prefix normal forms of p, see Fig. 4.

Figure 4: The paperfolding word (dashed) and its prefix normal forms (solid).

The same argument holds for any word whose set of factors is closed under complement or under reversed complement, as is the case for the paperfolding word. Therefore, we have proved the following lemma.

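Lemma 7 Let w ∈ {0, 1}ω. If for all $u \in \mathrm{Fct}(w)$, it holds that $\overline{u} \in \mathrm{Fct}(w)$ or $\overline{u^{\mathrm{rev}}} \in \mathrm{Fct}(w)$, then $F_w^1(n) = F_w^0(n)$ for all n, $\mathrm{PNF}_0(w) = \overline{\mathrm{PNF}_1(w)}$, and $F_w^1(n) = (\psi_w(n) + n - 1)/2$.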
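The statement of Lemma 7 can be checked numerically on the paperfolding word. The sketch below generates a prefix of p from the odd-part rule quoted above, estimates $F_p^1$ and $F_p^0$ with a sliding window, and derives $\psi_p$ through Theorem 3. The helper names and the prefix length 4096 are choices made for this illustration, and the results are only as reliable as the finite prefix is representative of the infinite word.

```python
def paperfolding_prefix(length):
    """p_1 ... p_length, where p_n depends on the odd part n' of n = n' * 2^k."""
    letters = []
    for n in range(1, length + 1):
        odd = n
        while odd % 2 == 0:
            odd //= 2
        letters.append("0" if odd % 4 == 1 else "1")
    return "".join(letters)


def extremes(word, n):
    """(max number of 1s, max number of 0s) over all length-n factors."""
    bits = [int(c) for c in word]
    count = sum(bits[:n])
    lo = hi = count
    for i in range(n, len(bits)):
        count += bits[i] - bits[i - n]
        lo, hi = min(lo, count), max(hi, count)
    return hi, n - lo


p = paperfolding_prefix(4096)
rows = [(n, *extremes(p, n)) for n in range(1, 21)]
# abelian complexity via Theorem 3: psi(n) = F1(n) - (n - F0(n)) + 1
psi = [f1 - (n - f0) + 1 for n, f1, f0 in rows]

for (n, f1, f0), value in zip(rows, psi):
    assert f1 == f0                  # first claim of Lemma 7
    assert 2 * f1 == value + n - 1   # F1(n) = (psi(n) + n - 1) / 2
print(psi)
```

Since the first n letters of PNF1(p) contain exactly $F_p^1(n)$ many 1s, the list of values `f1` determines the prefix of PNF1(p) letter by letter.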

5.2.2. Morphic images under the Thue-Morse morphism

The Thue-Morse word beginning with 0, which we denote by t, is one of the two fix points of the Thue-Morse morphism μTM, where μTM(0) = 01 and μTM(1) = 10:

t=μTMω(0)=01101001100101101001011001101001

The word t has abelian complexity function ψt(n) = 2 for n odd and ψt(n) = 3 for n > 1 even [27]. Since t fulfils the condition that $u \in \mathrm{Fct}(t)$ implies $\overline{u} \in \mathrm{Fct}(t)$, we can apply Lemma 7, and compute the prefix normal forms of t as PNF1(t) = 1(10)ω and PNF0(t) = 0(01)ω, see Fig. 5.

Figure 5: The Thue-Morse word (dashed) and its prefix normal forms (solid).
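The prefix normal forms of t stated above can be reproduced experimentally. The following sketch builds the first letters of PNF1 and PNF0 from a finite prefix of t, using that the n-th letter of PNF1 is 1 exactly when the maximum number of 1s in a factor grows from length n − 1 to length n (and symmetrically for PNF0). The function names and the prefix length are assumptions of this example.

```python
def thue_morse_prefix(length):
    """t_0 t_1 ...: t_n is the parity of the number of 1 bits of n."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def pnf_prefix(word, length, letter):
    """First `length` letters of the prefix normal form w.r.t. `letter`,
    estimated from the factors of a finite word."""
    other = "0" if letter == "1" else "1"
    out, previous = [], 0
    for n in range(1, length + 1):
        count = word[:n].count(letter)
        best = count
        for i in range(n, len(word)):
            # slide the window; booleans act as 0/1 in the arithmetic
            count += (word[i] == letter) - (word[i - n] == letter)
            best = max(best, count)
        out.append(letter if best > previous else other)
        previous = best
    return "".join(out)


t = thue_morse_prefix(4096)
pnf1 = pnf_prefix(t, 21, "1")
pnf0 = pnf_prefix(t, 21, "0")
print(pnf1)
print(pnf0)
```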

For the proof of the abelian complexity of t in [27], the Parikh vectors were computed for each length, so we do not really need Lemma 7 but could have obtained the prefix normal forms directly. Moreover, a much more general result was given in [27]:

Theorem 4 ([27]) Let w be an aperiodic infinite binary word. Then ψw = ψt if and only if w = μTM(w′), w = 0μTM(w′), or w = 1μTM(w′), for some word w′.

The abelian complexity function does not in general determine the prefix normal forms, as can be seen on the example of Sturmian words, which all have the same abelian complexity function but different prefix normal forms. However, ψt does, due to its values ψt(n) = 2 for n odd and ψt(n) = 3 for n even, and to the fact that both Ft1 and Ft0 have difference function with values from {0, 1}: notice that the only pair of such functions with width 2 resp. 3 are the PNFs of t. Therefore, we can deduce the following from Theorem 4:

Corollary 3 For an aperiodic infinite binary word w, PNF1(w) = 1(10)ω and PNF0(w) = 0(01)ω if and only if w = μTM(w′), w = 0μTM(w′), or w = 1μTM(w′), for some word w′.

To conclude this section, we return to the question of how many 1s need to be prepended to make the Thue-Morse word prefix normal.

Lemma 8 We have $11t \in \mathcal{L}$. Moreover, this is minimal since 1t is not prefix normal.

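Proof. We will show that for every prefix, the number of 1s in the prefix of 11t is greater than or equal to the number of 1s in the prefix of PNF1(t) of the same length. Let v = PNF1(t) and u = 11t. It is easy to see that $P_v(n) = \lfloor n/2 \rfloor + 1$ and, for n ≥ 2,

$$P_u(n) = \begin{cases} \lfloor n/2 \rfloor + 1 & \text{if } n \text{ is even,} \\ \lfloor n/2 \rfloor + 2 & \text{if } n \text{ is odd and } u_n = 1, \\ \lfloor n/2 \rfloor + 1 & \text{if } n \text{ is odd and } u_n = 0. \end{cases}$$

Together with $P_u(1) = 1 = P_v(1)$, this shows that for all n ≥ 1 it holds that $P_u(n) \ge P_v(n)$, implying that $11t \in \mathcal{L}$.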

For minimality, note that 1t is not prefix normal, since 11 is a factor of t.
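Lemma 8 can be illustrated on finite prefixes. The sketch below implements a direct quadratic-time test of the prefix normal condition for a finite word and applies it to prefixes of t, 1t and 11t. A positive answer on a finite prefix is of course only consistent with, not a proof of, prefix normality of the infinite word, while a violation found in a prefix is conclusive. The function names and the prefix length 300 are assumptions of this example.

```python
def thue_morse_prefix(length):
    """t_0 t_1 ...: t_n is the parity of the number of 1 bits of n."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def is_prefix_normal(word):
    """True if no factor of the finite word has more 1s than the prefix
    of the same length."""
    bits = [int(c) for c in word]
    for n in range(1, len(bits) + 1):
        prefix_ones = sum(bits[:n])
        count = prefix_ones
        for i in range(n, len(bits)):
            count += bits[i] - bits[i - n]
            if count > prefix_ones:
                return False
    return True


t = thue_morse_prefix(300)
print(is_prefix_normal(t))         # t itself starts with 0
print(is_prefix_normal("1" + t))   # one prepended 1
print(is_prefix_normal("11" + t))  # two prepended 1s
```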

5.3. Prefix normal forms of Sturmian words.

Let w be a Sturmian word. As we saw in Sec. 4, the only 1-prefix normal word in the class of Sturmian words with the same slope α is the upper mechanical word $s'_{\alpha,0} = 1c_\alpha$.

Theorem 5 Let w be an irrational mechanical word with slope α, i.e. a Sturmian word. Then PNF1(w) = 1cα and PNF0(w) = 0cα, where cα is the characteristic word of slope α.

Proof. Since the characteristic word cα has the same slope as w, we have Fct(w) = Fct(cα) by Fact 1. The abelian complexity of w is constant 2 [27], thus a factor of length k can have either $F_w^1(k)$ or $F_w^1(k) - 1$ many 1s. Let us call a factor u of w of length k heavy if $|u|_1 = F_w^1(k)$, and light otherwise. We have to show that every prefix of 1cα is heavy; this will imply that 1cα is the prefix normal form of w. It is known [23] that the prefixes of the characteristic word are precisely the reverses of its right special factors, where a factor u is called right special if both u0 and u1 are factors. Thus, every prefix v of 1cα has the form $v = 1u^{\mathrm{rev}}$, where both u1 and u0 are factors of w. Since u1 has one more 1 than the factor u0 of the same length, u1 is heavy, implying that $|v|_1 = |1u^{\mathrm{rev}}|_1 = |u1|_1 = F_w^1(|u| + 1)$; therefore $v = 1u^{\mathrm{rev}}$ is heavy. The fact that PNF0(w) = 0cα follows analogously. □
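Theorem 5 can be illustrated with the Fibonacci word f, which is the characteristic word of its slope. The sketch below takes a proper suffix w of f (a Sturmian word with the same set of factors) and checks that the prefix normal forms estimated from w start like 1f and 0f. The offset 7, the prefix length 3000 and the function names are arbitrary choices made for this example.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fix point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def pnf_prefix(word, length, letter):
    """First `length` letters of the prefix normal form w.r.t. `letter`,
    estimated from the factors of a finite word."""
    other = "0" if letter == "1" else "1"
    out, previous = [], 0
    for n in range(1, length + 1):
        count = word[:n].count(letter)
        best = count
        for i in range(n, len(word)):
            count += (word[i] == letter) - (word[i - n] == letter)
            best = max(best, count)
        out.append(letter if best > previous else other)
        previous = best
    return "".join(out)


f = fibonacci_prefix(3000)
w = f[7:]  # a suffix of f: same slope, same factors, different word
print(pnf_prefix(w, 20, "1"))
print(pnf_prefix(w, 20, "0"))
print(f[:19])
```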

5.4. Prefix normal forms of binary uniform morphisms

In [5] the authors provide an algorithm which computes the abelian complexity of a morphic word that is the fix point of a binary uniform morphism, i.e., a morphism μ satisfying ∣μ(0)∣ = ∣μ(1)∣. We refer the reader to [5] for the details on this algorithm. In particular, the following theorem is proved in [5]:

Theorem 6 ([5]) Let w be the fix point of a binary uniform morphism μ. Then, for each n the values ψw(1), ψw(2), … , ψw(n), can be computed in O(n) time.

As an intermediate step in the computation of each ψw(i), the algorithm in [5] provides the minimum number of 0s (equivalently, the maximum number of 1s) in every i-length factor of w. Obviously the same procedure can be used to obtain the minimum number of 1s (equivalently, the maximum number of 0s) in every i-length factor of w. Therefore, we have the following corollary to the result of [5]:

Corollary 4 Let w be the fix point of a binary uniform morphism μ. For each n, the prefix of length n of PNF1(w) and of PNF0(w) can be computed in O(n) time.

6. Prefix normal words and lexicographic order

In this section, we study the relationship between lexicographic order and prefix normality. Note that for coherence with the rest of the paper, in the definition of Lyndon words, necklaces, and prenecklaces, we use lexicographically greater rather than smaller. Clearly, this is equivalent to the usual definitions up to renaming of characters.

Thus a finite Lyndon word is one which is lexicographically strictly greater than all of its conjugates: w is Lyndon if and only if for all non-empty u, v s.t. w = uv, we have w >lex vu. A necklace is a word which is greater than or equal to all its conjugates, and a prenecklace is one which can be extended to become a necklace, i.e. which is the prefix of some necklace [23, 28]. As we saw in the introduction, in the finite case, prefix normality and Lyndon property are orthogonal concepts. However, the set of finite prefix normal words is included in the set of prenecklaces [11].

An infinite word is Lyndon if infinitely many of its prefixes are Lyndon [32]. In the infinite case, the situation is similar to the finite case. There are words which are both Lyndon and prefix normal: 10ω, 110(10)ω; Lyndon but not prefix normal: 11100(110)ω; prefix normal but not Lyndon: (10)ω; and neither of the two: (01)ω.

Next we show that a prefix normal word cannot be lexicographically smaller than any of its suffixes. Let $\mathrm{shift}_i(w) = w_i w_{i+1} w_{i+2} \cdots$ denote the infinite word v s.t. $w = w_1 \cdots w_{i-1} v$, i.e. v is the suffix of w starting at position i.

Lemma 9 Let $w \in \mathcal{L}_{\inf}$. Then $w \ge_{\mathrm{lex}} \mathrm{shift}_i(w)$ for all i ≥ 1.

Proof. Assume that there exists a suffix $v = \mathrm{shift}_i(w)$ of w s.t. $v >_{\mathrm{lex}} w$. Then there is an index j with $v_1 \cdots v_{j-1} = w_1 \cdots w_{j-1}$ and $v_j > w_j$, implying $v_j = 1$ and $w_j = 0$. But then $|w_i \cdots w_{i+j-1}|_1 = |v_1 \cdots v_j|_1 > |w_1 \cdots w_j|_1$, in contradiction to $w \in \mathcal{L}_{\inf}$. □

In the finite case, it is easy to see that a word w is a prenecklace if and only if $w \ge_{\mathrm{lex}} v$ for every suffix v of w. This motivates our definition of infinite prenecklaces. The situation is the same as in the finite case: prefix normal words form a proper subset of prenecklaces.

Definition 9 Let w ∈ {0, 1}ω. Then w is an infinite prenecklace if for all i ≥ 1, $w \ge_{\mathrm{lex}} \mathrm{shift}_i(w)$. We denote by $\mathcal{P}_{\inf}$ the set of infinite prenecklaces.

Proposition 4 $\mathcal{L}_{\inf} \subsetneq \mathcal{P}_{\inf}$.

Proof. The inclusion follows from Lemma 9. An example of a word which is an infinite prenecklace but not prefix normal is 11100(110)ω. □
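The examples used in this section can be explored with a small script. The sketch below works on finite truncations of ultimately periodic words: it tests the prefix normal condition on a prefix and compares the first `depth` letters of the word with the first `depth` letters of each of its first `depth` shifts. All names and the truncation lengths are assumptions of this example, and the shift comparison is conclusive only when it is decided within the truncation.

```python
def ultimately_periodic_prefix(u, x, length):
    """Prefix of given length of the infinite word u x^omega."""
    return (u + x * (length // len(x) + 1))[:length]


def is_prefix_normal(word):
    """True if no factor of the finite word has more 1s than the prefix
    of the same length."""
    bits = [int(c) for c in word]
    for n in range(1, len(bits) + 1):
        prefix_ones = sum(bits[:n])
        count = prefix_ones
        for i in range(n, len(bits)):
            count += bits[i] - bits[i - n]
            if count > prefix_ones:
                return False
    return True


def dominates_shifts(word, depth):
    """Compare word[:depth] with the length-depth prefix of each shift;
    requires len(word) >= 2 * depth."""
    head = word[:depth]
    return all(head >= word[i:i + depth] for i in range(1, depth + 1))


examples = {
    "11100(110)^w": ultimately_periodic_prefix("11100", "110", 400),
    "110(10)^w": ultimately_periodic_prefix("110", "10", 400),
    "(10)^w": ultimately_periodic_prefix("", "10", 400),
    "(01)^w": ultimately_periodic_prefix("", "01", 400),
}
for name, word in examples.items():
    print(name, is_prefix_normal(word), dominates_shifts(word, 200))
```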

There is another interesting relationship between lexicographic order and the prefix normal forms of an infinite word. In [26], two words were associated to an infinite binary word w, called max(w) (resp. min(w)), defined as the word whose prefix of length n is the lexicographically greatest (resp. smallest) n-length factor of w. It is easy to see that these words always exist. The following was shown in [26]:3

Theorem 7 ([26]) Let w be an infinite binary word. Then

  1. w is (rational or irrational) mechanical with its intercept equal to its slope if and only if 0w ≤lex min(w) ≤lex max(w) ≤lex 1w, and

  2. w is characteristic Sturmian if and only if min(w) = 0w and max(w) = 1w.

Lemma 10 Let w ∈ {0, 1}ω. Then PNF1(w) ≥lex max(w) and PNF0(w) ≤lex min(w).

Proof. Assume otherwise, and let w′ = PNF1(w), v = max(w). If $w' <_{\mathrm{lex}} v$, then there is an index j s.t. $w'_1 \cdots w'_{j-1} = v_1 \cdots v_{j-1}$, $w'_j = 0$ and $v_j = 1$. This implies that $v_1 \cdots v_j$ has one more 1 than $w'_1 \cdots w'_j$. But $|w'_1 \cdots w'_j|_1 = F_w^1(j)$, a contradiction, since $v_1 \cdots v_j$ is a factor of w. The second claim follows analogously. □

Finally, from Theorems 5 and 7, we get the following corollary:

Corollary 5 Let w be an infinite binary word. Then w is characteristic Sturmian if and only if 0w = PNF0(w) = min(w) and 1w = PNF1(w) = max(w).
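For the Fibonacci word f, which is characteristic Sturmian, Corollary 5 predicts max(f) = 1f and min(f) = 0f. The following sketch computes the lexicographically greatest and smallest factors of a given length from a finite prefix of f. The helper names and the prefix length 2000 are assumptions of this example.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fix point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def extreme_factor(word, n, pick):
    """Lexicographically extreme length-n factor of a finite word;
    `pick` is the built-in max or min."""
    return pick(word[i:i + n] for i in range(len(word) - n + 1))


f = fibonacci_prefix(2000)
max_w = extreme_factor(f, 20, max)  # prefix of length 20 of max(f)
min_w = extreme_factor(f, 20, min)  # prefix of length 20 of min(f)
print(max_w)
print(min_w)
print(f[:19])
```

Python compares strings of equal length lexicographically, character by character, which coincides with the order on binary words used here.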

7. On the periodicity and aperiodicity of prefix normal words with respect to minimum density

In this section, we derive conditions for the periodicity and aperiodicity of prefix normal words with respect to their minimum density. The following result shows that every ultimately periodic infinite prefix normal word has rational minimum density.

Lemma 11 Let v be an infinite ultimately periodic binary word with minimum density δ(v) = α. Then $\alpha \in \mathbb{Q}$.

Proof. Let us write $v = ux^\omega$ with x not a suffix of u.

For i = 0, 1, …, |x| − 1, let $y_i$ be the prefix of length |u| + i of v, i.e., $y_i = u x_1 x_2 \cdots x_i$. Trivially, if for some i we have that $\delta(y_i) \le \delta(v)$, the claim directly follows from $y_i$ being a finite prefix of v.

Let us now assume that for each i = 0, 1, …, |x| − 1 it holds that $\delta(v) < \delta(y_i)$ and let $i^* = \min\{i \mid \delta(y_i) \le \delta(y_j) \text{ for each } j \ne i\}$, hence $\delta(v) < \delta(y_{i^*})$.

For every n ≥ |u| + |x| let $i_n = |u| + ((n - |u|) \bmod |x|)$ and $k_n = \lfloor (n - |u|)/|x| \rfloor$, i.e., $|u| \le i_n \le |u| + |x| - 1$ and $n = i_n + k_n|x|$.

Then, we have that

$$D_v(n) = \frac{|y_{i_n}|_1 + k_n|x|_1}{|y_{i_n}| + k_n|x|} \ge \min\{\delta(y_{i_n}), \delta(x)\} \ge \min\{\delta(y_{i^*}), \delta(x)\}. \tag{1}$$

Moreover, we also have that

$$\lim_{k \to \infty} D_v(|u| + i^* + k|x|) = \lim_{k \to \infty} \frac{|y_{i^*}|_1 + k|x|_1}{|y_{i^*}| + k|x|} = \delta(x). \tag{2}$$

We cannot have $\delta(x) \ge \delta(y_{i^*})$, since by (1) $\delta(y_{i^*})$ is a rational lower bound on $D_v(n)$ (for each n ≥ 1) which is achieved by $D_v(|u| + i^*)$, contradicting the standing hypothesis $\delta(v) < \delta(y_{i^*})$.

Therefore, we must have $\delta(x) < \delta(y_{i^*})$, and from (1) we have $D_v(n) \ge \delta(x)$ and from (2) we also have that for each ε > 0 there exists k > 0 such that $D_v(|u| + i^* + k|x|) < \delta(x) + \varepsilon$. Therefore, $\delta(v) = \inf\{D_v(n) \mid n \ge 1\} = \delta(x)$, which is a rational number, since x is a finite string. □
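The argument behind Lemma 11 suggests a simple way to compute the minimum density of $ux^\omega$ exactly. The sketch below is an illustration under one reading of the proof: along each residue class, the density $D_v(n_0 + k|x|)$ moves monotonically from $D_v(n_0)$ towards $|x|_1/|x|$, so the infimum is the smaller of that limit and the smallest density among the first |u| + |x| prefixes. The function name `min_density` and the use of `fractions.Fraction` are choices made here, not part of the paper.

```python
from fractions import Fraction


def min_density(u, x):
    """Infimum over n >= 1 of |pref_v(n)|_1 / n for v = u x^omega (x non-empty)."""
    # density of the period: the limit of D_v(n) along every residue class
    best = Fraction(x.count("1"), len(x))
    ones = 0
    # every longer prefix is one of these prefixes extended by whole periods
    for n, letter in enumerate(u + x, start=1):
        ones += letter == "1"
        best = min(best, Fraction(ones, n))
    return best


print(min_density("1", "10"))     # v = 1(10)^omega
print(min_density("110", "1"))    # v = 1101^omega
print(min_density("11100", "110"))
```

The result is a quotient of two integers by construction, in line with the statement of the lemma.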

We now show that, while ultimate periodicity implies rational minimum density, the converse is not true: for every α ∈ (0, 1), rational or irrational, there exists an aperiodic prefix normal word with minimum density α. For irrational α, this is an easy corollary of Theorem 2: the Sturmian word 1cα is prefix normal and satisfies D(i) ≥ α for each i, and therefore δ(1cα) = α. The next lemma shows how to construct an aperiodic prefix normal word with minimum density α for both rational and irrational α.

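Lemma 12 Fix α ∈ (0, 1), and let $(a_n)_{n \in \mathbb{N}}$ be a strictly decreasing infinite sequence of rational numbers from (0, 1) converging to α. For each i = 1, 2, …, let the binary word $v^{(i)}$ be defined by

$$v^{(i)} = \begin{cases} 1^{\lceil 10a_1 \rceil} 0^{10 - \lceil 10a_1 \rceil} & i = 1, \\ \mathrm{pref}_{\mathrm{flipext}^\omega(v^{(i-1)})}(k_i|v^{(i-1)}|)\, 0^{\ell_i} & i > 1, \end{cases}$$

where $\ell_i$ is defined by

$$\ell_i = \begin{cases} 10 - \lceil 10a_1 \rceil & i = 1, \\ \left\lfloor k_i \left( \frac{|v^{(i-1)}|_1 - a_i|v^{(i-1)}|}{a_i} \right) \right\rfloor & i > 1, \end{cases}$$

and $k_i$ is the smallest integer greater than one such that $\ell_i > \ell_{i-1}$.

Then $v = \lim_{i \to \infty} v^{(i)}$ is an aperiodic infinite prefix normal word such that δ(v) = α.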

Before proving Lemma 12, we give an example of the words $v^{(i)}$.

Example 5 We show the first three steps for the construction of an infinite aperiodic word with minimum density α = 1/3 (Lemma 12), using the infinite sequence of rational numbers $a_i = i/(3i - 1)$, which tends to 1/3 for i → ∞. Hence, for i = 1, we have $a_1 = 1/2$, $\ell_1 = 5$, and $v^{(1)} = 1^5 0^5$ with minimum density $\delta(v^{(1)}) = 1/2$. At the next step, $a_2 = 2/5$, and with the values from the previous iteration we can compute $k_2 = 3$ and $\ell_2 = 7$, hence $v^{(2)} = 1^5 0^5 1^5 0^5 1^5 0^5 0^7$, with $\delta(v^{(2)}) = 15/37$. At the third iteration, $a_3 = 3/8$, $k_3 = 3$, and $\ell_3 = 9$, therefore $v^{(3)} = 1^5 0^5 1^5 0^5 1^5 0^{12} 1^5 0^5 1^5 0^5 1^5 0^{12} 1^5 0^5 1^5 0^5 1^5 0^{12} 0^9$, and the minimum density is $\delta(v^{(3)}) = 45/120$.
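The numbers in Example 5 can be reproduced with exact rational arithmetic. The sketch below follows the recurrences of Lemma 12 under one simplifying assumption, labelled in the code: the prefix of length $k_i|v^{(i-1)}|$ of $\mathrm{flipext}^\omega(v^{(i-1)})$ is taken to be $k_i$ copies of $v^{(i-1)}$, which is what the words displayed in Example 5 show. The operation flipext itself is defined earlier in the paper and is not reimplemented here; all function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def min_density(word):
    """Minimum over all non-empty prefixes of (number of 1s) / (length)."""
    ones, best = 0, None
    for n, letter in enumerate(word, start=1):
        ones += letter == "1"
        density = Fraction(ones, n)
        best = density if best is None else min(best, density)
    return best


def construct(a, steps):
    """Words v^(1), ..., v^(steps) for a rational sequence a(1) > a(2) > ..."""
    ones = ceil(10 * a(1))
    ell = 10 - ones
    v = "1" * ones + "0" * ell
    words = [v]
    for i in range(2, steps + 1):
        base = (v.count("1") - a(i) * len(v)) / a(i)
        k = 2
        while floor(k * base) <= ell:  # smallest k > 1 with ell_i > ell_(i-1)
            k += 1
        ell = floor(k * base)
        # ASSUMPTION: the length k*|v| prefix of flipext^omega(v) equals v repeated k times
        v = v * k + "0" * ell
        words.append(v)
    return words


words = construct(lambda i: Fraction(i, 3 * i - 1), 3)
for word in words:
    print(len(word), word.count("1"), min_density(word))
```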

Proof. (of Lemma 12)

We will first prove the following claim, giving a number of properties of the sequence of words v(i), and then use these to prove that v is aperiodic and δ(v) = α.

Claim. The following properties hold:

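  1. $\delta(v^{(i)}) \ge a_i$ for each i ≥ 1;

  2. $\iota(v^{(i)}) = |v^{(i)}|$ for each i ≥ 1;

  3. $\delta(v^{(i)}) < \delta(v^{(i-1)})$ for each i ≥ 2;

  4. $|v^{(i)}|_1 > |v^{(i-1)}|_1$ for each i > 2;

  5. $\delta(v^{(i)}) \le a_i \left( \frac{k_i|v^{(i-1)}|_1}{k_i|v^{(i-1)}|_1 - a_i} \right)$ for each i ≥ 2.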

Proof of the Claim. By direct inspection we have that properties 1 and 2 hold for $v^{(1)}$. We now argue by induction. Fix i > 1 and let us assume that properties 1 and 2 hold for $v^{(i-1)}$. Then, since $a_i < a_{i-1}$ we have

$$\frac{|v^{(i-1)}|_1}{a_i} > \frac{|v^{(i-1)}|_1}{a_{i-1}} \ge |v^{(i-1)}|,$$

where the last inequality follows from properties 1 and 2. Therefore, $\frac{|v^{(i-1)}|_1 - a_i|v^{(i-1)}|}{a_i} > 0$, hence there exists $k_i > 1$ such that $\left\lfloor k_i \left( \frac{|v^{(i-1)}|_1 - a_i|v^{(i-1)}|}{a_i} \right) \right\rfloor > \ell_{i-1}$. In particular, $\ell_i$ is well defined.

By property 2, we have $\iota(v^{(i-1)}) = |v^{(i-1)}|$, hence by Proposition 1, we have $D_{\mathrm{flipext}^\omega(v^{(i-1)})}(k|v^{(i-1)}|) = \delta(v^{(i-1)})$ and also $\delta(\mathrm{pref}_{\mathrm{flipext}^\omega(v^{(i-1)})}(k_i|v^{(i-1)}|)) = \delta(v^{(i-1)})$.

Moreover, since $\ell_i > 0$ it is not hard to see from the definition of $v^{(i)}$ that

$$\delta(v^{(i)}) = D_{v^{(i)}}(|v^{(i)}|) = \frac{k_i|v^{(i-1)}|_1}{k_i|v^{(i-1)}| + \ell_i} < \delta(v^{(i-1)}), \tag{3}$$

which shows that property 3 and property 2 hold for $v^{(i)}$. In addition, because of $k_i > 1$ and (by Proposition 1) $|v^{(i)}|_1 = |\mathrm{pref}_{\mathrm{flipext}^\omega(v^{(i-1)})}(k_i|v^{(i-1)}|)|_1 = k_i|v^{(i-1)}|_1$, it follows that property 4 also holds for $v^{(i)}$.

The definition of $\ell_i$, together with the well known property x − 1 < ⌊x⌋ ≤ x, implies that

$$\frac{k_i}{a_i}\left(|v^{(i-1)}|_1 - a_i|v^{(i-1)}|\right) - 1 < \ell_i \le \frac{k_i}{a_i}\left(|v^{(i-1)}|_1 - a_i|v^{(i-1)}|\right). \tag{4}$$

Using the right inequality of (4) in (3), we have $\delta(v^{(i)}) \ge a_i$, showing that property 1 holds for $v^{(i)}$.

In addition, using the left inequality of (4) in (3), we have

$$\delta(v^{(i)}) \le a_i \left( \frac{k_i|v^{(i-1)}|_1}{k_i|v^{(i-1)}|_1 - a_i} \right),$$

showing that property 5 holds for $v^{(i)}$. The proof of the claim is complete.

In order to see that v is aperiodic, it is enough to observe that v ≠ 0ω and for each i ≥ 1 it contains a distinct run of $\ell_i$ 0s, with $\ell_i$ being a strictly increasing sequence.

To show that δ(v) = α, we will prove that $\lim_{i \to \infty} \delta(v^{(i)}) = \alpha$. Since $\lim_{i \to \infty} a_i = \alpha$ and for each i ≥ 1, $k_i > 1$ and $|v^{(i)}|_1 > |v^{(i-1)}|_1$, we have

$$\lim_{i \to \infty} a_i \frac{k_i|v^{(i-1)}|_1}{k_i|v^{(i-1)}|_1 - a_i} = \lim_{i \to \infty} a_i = \alpha.$$

Hence, from properties 4 and 5 of the Claim above, we have the desired result, $\lim_{i \to \infty} \delta(v^{(i)}) = \lim_{i \to \infty} a_i = \alpha$.

This completes the proof of the lemma. □

Summarizing, we have shown the following result.

Theorem 8 For every α ∈ (0, 1) (rational or irrational) there is an infinite aperiodic prefix normal word of minimum density α. On the other hand, for every ultimately periodic infinite prefix normal word w, the minimum density δ(w) is a rational number.

8. Conclusion

In this paper, we studied infinite prefix normal words. We gave several results of infinite extensions of finite prefix normal words, and we established connections between infinite prefix normal words and other classes of infinite binary words, namely Sturmian words, Lyndon words and max and min words. We provided a complete characterization of prefix normal Sturmian words. Furthermore, we showed that, similar to the finite case, the classes of infinite prefix normal words and Lyndon words are distinct, and that infinite prefix normal words are infinite prenecklaces.

We explored some connections between prefix normal words, prefix normal forms, and abelian complexity. In particular, we showed how to turn balanced and c-balanced words without arbitrarily long runs of 1s into prefix normal words, by prepending a finite number of 1s. We provided a method to compute the abelian complexity from the prefix normal forms of a word, and, for specific cases, we showed how to compute the prefix normal forms of a word, given its abelian complexity function. We further applied an existing algorithm to compute the prefix normal forms of fix points of binary uniform morphisms.

Finally, we gave conditions for the periodicity and the aperiodicity of infinite prefix normal words, according to their minimum density.

Figure 1: Given w = 1101101100100010000001 the plot represents the last characters of flipext(8)(w) (solid) and the lazy-flipext(8)(w, α) (dashed). See Example 3. A 1 corresponds to a diagonal segment in direction NE, while a 0 to one in direction SE. On the x-axis we have the length of the prefix, and on the y-axis, the number of 1s minus the number of 0s in the prefix. The shaded area contains all prefix normal words with w as prefix and minimum density equal to δ(w). Note, however, that not all words in that area are prefix normal.

Acknowledgements

We wish to extend our thanks to the participants of the Workshop on Words and Complexity, which took place in Lyon in February 2018, for exciting discussions and helpful pointers, and to Péter Burcsi, who first got us interested in Sturmian words. We also thank the two anonymous reviewers, whose suggestions helped improve the presentation of our results. MR is funded by the National Science Foundation (NSF) IIS (Grant No. 1618814), IIBR (Grant No. 2029552) and National Institutes of Health (NIH) R01 (Grant No. HG011392).

Footnotes

1

This is an extended version of our paper presented at SOFSEM 2019 [15].

2

For ease of presentation, we are using Lyndon to mean lexicographically greatest among its conjugates; this is equivalent to the usual definition up to renaming characters.

3

The terminology in [26] differs from ours (we are following [23]). In order to help the reader, here we highlight the differences: (i) a periodic Sturmian in [26] is a rational mechanical word, (ii) a proper Sturmian word in [26] is an irrational mechanical word (i.e., a Sturmian word), and (iii) a standard Sturmian word in [26] is a mechanical word with intercept τ = α (the slope), thus a proper standard Sturmian word is a characteristic Sturmian word cα. Note that all mechanical words in [26] are defined for n ≥ 1, since the definition of mechanical word is: the lower mechanical word is defined as sα,τ(n) = ⌊α(n + 1) + τ⌋ − ⌊αn + τ⌋ for n ≥ 1, and analogously for the upper mechanical word. Therefore, an intercept τ = 0 in [26] is equivalent to an intercept of τ = α (the slope) in [23].

References

  • [1] Afshani Peyman, van Duijn Ingo, Killmann Rasmus, and Nielsen Jesper Sindahl. A lower bound for jumbled indexing. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, (SODA 2020), pages 592–606, 2020.
  • [2] Amir Amihood, Chan Timothy M., Lewenstein Moshe, and Lewenstein Noa. On hardness of jumbled indexing. In 41st International Colloquium on Automata, Languages, and Programming (ICALP 2014), volume 8572 of LNCS, pages 114–125, 2014.
  • [3] Balister Paul and Gerke Stefanie. The asymptotic number of prefix normal words. Theoret. Comput. Sci, 784:75–80, 2019.
  • [4] Blanchet-Sadri Francine, Fox Nathan, and Rampersad Narad. On the asymptotic abelian complexity of morphic words. Advances in Applied Mathematics, 61:46–84, 2014.
  • [5] Blanchet-Sadri Francine, Seita Daniel, and Wise David. Computing abelian complexity of binary uniform morphic words. Theor. Comput. Sci, 640:41–51, 2016. doi: 10.1016/j.tcs.2016.05.046.
  • [6] Blondin Massé Alexandre, de Carufel Julien, Goupil Alain, Lapointe Mélodie, Nadeau Émile, and Vandomme Élise. Leaf realization problem, caterpillar graphs and prefix normal words. Theoret. Comput. Sci, 732:1–13, 2018.
  • [7] Burcsi Péter, Cicalese Ferdinando, Fici Gabriele, and Lipták Zsuzsanna. Algorithms for Jumbled Pattern Matching in Strings. International Journal of Foundations of Computer Science, 23:357–374, 2012.
  • [8] Burcsi Péter, Cicalese Ferdinando, Fici Gabriele, and Lipták Zsuzsanna. On approximate jumbled pattern matching in strings. Theory Comput. Syst, 50(1):35–51, 2012.
  • [9] Burcsi Péter, Fici Gabriele, Lipták Zsuzsanna, Raman Rajeev, and Sawada Joe. Generating a Gray code for prefix normal words in amortized polylogarithmic time per word. Theor. Comput. Sci, 842:86–99, 2020.
  • [10] Burcsi Péter, Fici Gabriele, Lipták Zsuzsanna, Ruskey Frank, and Sawada Joe. On combinatorial generation of prefix normal words. In Proc. of the 25th Ann. Symp. on Comb. Pattern Matching (CPM 2014), volume 8486 of LNCS, pages 60–69, 2014.
  • [11] Burcsi Péter, Fici Gabriele, Lipták Zsuzsanna, Ruskey Frank, and Sawada Joe. On prefix normal words and prefix normal forms. Theoret. Comput. Sci, 659:1–13, 2017.
  • [12] Cassaigne Julien and Kaboré Idrissa. Abelian complexity and frequencies of letters in infinite words. Int. Journal of Foundations of Computer Science, 27(05):631–649, 2016.
  • [13] Chan Timothy M. and Lewenstein Moshe. Clustered integer 3SUM via additive combinatorics. In Proc. of the 47th Ann. ACM on Symp. on Theory of Computing (STOC 2015), pages 31–40, 2015.
  • [14] Cicalese Ferdinando, Lipták Zsuzsanna, and Rossi Massimiliano. Bubble-flip - A new generation algorithm for prefix normal words. Theoret. Comput. Sci, 743:38–52, 2018.
  • [15] Cicalese Ferdinando, Lipták Zsuzsanna, and Rossi Massimiliano. On infinite prefix normal words. In Proc. of the 45th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2019), pages 122–135, 2019.
  • [16] Cunha Luís Felipe I., Dantas Simone, Gagie Travis, Wittler Roland, Antonio Luis Kowada Brasil, and Stoye Jens. Faster jumbled indexing for binary RLE strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), pages 19:1–19:9, 2017.
  • [17] Davis C and Knuth DE. Number representations and dragon curves, I, II. J. Recr. Math, 3:133–149 and 161–181, 1970.
  • [18] Fici Gabriele and Lipták Zsuzsanna. On prefix normal words. In Proc. of the 15th Intern. Conf. on Developments in Language Theory (DLT 2011), volume 6795 of LNCS, pages 228–238. Springer, 2011.
  • [19] Fleischmann Pamela, Nowotka Dirk, Kulczynski Mitja, and Poulsen Danny Bøgsted. On collapsing prefix normal words. In Proc. of the 14th International Conference Language and Automata Theory and Applications (LATA 2020), volume 12038 of LNCS, pages 412–424. Springer, 2020.
  • [20] Gagie Travis, Hermelin Danny, Landau Gad M., and Weimann Oren. Binary jumbled pattern matching on trees and tree-like structures. Algorithmica, 73(3):571–588, 2015.
  • [21] Giaquinta Emanuele and Grabowski Szymon. New algorithms for binary jumbled pattern matching. Inf. Process. Lett, 113(14–16):538–542, 2013.
  • [22] Kaboré Idrissa and Kientéga Boucaré. Abelian complexity of Thue-Morse word over a ternary alphabet. In Proc. of the 11th Int. Conf. on Combinatorics on Words WORDS 2017, volume 10432 of LNCS, pages 132–143. Springer, 2017.
  • [23] Lothaire M. Algebraic Combinatorics on Words. Cambridge Univ. Press, 2002.
  • [24] Madill Blake and Rampersad Narad. The abelian complexity of the paperfolding word. Discrete Mathematics, 313(7):831–838, 2013. doi: 10.1016/j.disc.2013.01.005.
  • [25] Moosa Tanaeem M. and Sohel Rahman M. Sub-quadratic time and linear space data structures for permutation matching in binary strings. J. Discr. Alg, 10:5–9, 2012.
  • [26] Pirillo Giuseppe. Inequalities characterizing standard sturmian and episturmian words. Theor. Comput. Sci, 341(1-3):276–292, 2005. doi: 10.1016/j.tcs.2005.04.008.
  • [27] Richomme Gwénaël, Saari Kalle, and Zamboni Luca Q. Abelian complexity of minimal subshifts. J. London Math. Society, 83(1):79–95, 2011. doi: 10.1112/jlms/jdq063.
  • [28] Ruskey Frank, Savage Carla, and Wang TMY. Generating necklaces. J. Algorithms, 13(3):414–430, 1992.
  • [29] Ruskey Frank, Sawada Joe, and Williams Aaron. Binary bubble languages and cool-lex order. J. Comb. Theory, Ser. A, 119(1):155–169, 2012.
  • [30] Sawada Joe and Williams Aaron. Efficient oracles for generating binary bubble languages. Electr. J. Comb, 19(1):P42, 2012.
  • [31] Sawada Joe, Williams Aaron, and Wong Dennis. Inside the Binary Reflected Gray Code: Flip-Swap languages in 2-Gray code order. Unpublished manuscript, 2017.
  • [32] Siromoney Rani, Mathew Lisa, Dare VR, and Subramanian KG. Infinite Lyndon words. Inf. Proc. Letters, 50:101–104, 1994.
  • [33] Sloane NJA. The On-Line Encyclopedia of Integer Sequences. Available electronically at http://oeis.org.
  • [34] Turek Ondrej. Abelian complexity of the Tribonacci word. J. of Integer Sequences, 18, 2015.
