Fig. 4.
Empirical rank distribution of word frequencies in The Origin of Species (black), showing two power-law regimes. For the most frequent words, the distribution is approximately power-law with an exponent . The corresponding distribution for the process with (red) suggests a slight deviation from perfect nesting. This means that, in sentence formation, the sample space is strictly reducing for about 90% of consecutive word pairs. Simulation: (words), and restarts (sentences).
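The kind of simulation the caption refers to can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the caption: states 1..N stand for words, each restart (sentence) begins at state N and ends on reaching state 1, and at every step the process either jumps uniformly to a strictly lower state (with probability `lam`, the nested case) or jumps uniformly to any state (otherwise). The function names, the parameter name `lam`, and the values used in the usage lines are hypothetical placeholders, not the settings behind the figure.

```python
import random
from collections import Counter


def simulate_noisy_ssr(n_states, lam, n_restarts, seed=0):
    """Count state visits of a sample-space-reducing process with noise.

    Assumed dynamics (illustrative, not taken from the caption):
    with probability `lam` the next state is drawn uniformly from the
    states strictly below the current one; otherwise it is drawn
    uniformly from all `n_states` states. A run ends at state 1.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_restarts):
        state = n_states  # each restart ("sentence") begins at the top state
        while state > 1:
            if rng.random() < lam:
                # nested step: the sample space shrinks strictly
                state = rng.randint(1, state - 1)
            else:
                # non-nested step: any state may follow
                state = rng.randint(1, n_states)
            visits[state] += 1
    return visits


def rank_frequency(counts):
    """Return relative frequencies sorted from most to least frequent."""
    total = sum(counts.values())
    return [c / total for c in sorted(counts.values(), reverse=True)]


# Placeholder values; lam=0.9 only mirrors the "about 90%" statement.
visits = simulate_noisy_ssr(n_states=100, lam=0.9, n_restarts=10000)
freqs = rank_frequency(visits)
```

Plotting `freqs` against rank on log-log axes gives a curve that can be compared with an empirical rank distribution; the same `rank_frequency` helper applies to a `Counter` of word counts from a text.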