Figure - PMC


Sci Rep. 2015 Aug 11;5:12209. doi: 10.1038/srep12209


Copyright © 2015, Macmillan Publishers Limited

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/


A. Density plot showing the Zipf exponent θ for ‘one-off’ randomly partitioned phrase and word Zipf distributions (q = 1 and q = ) for around 4000 works of literature. The red circle marks “Tale of Two Cities”, and the black circles mark measurements for 14 other works of literature analyzed further in the supplementary material. Marginal distributions are plotted as histograms along the edges of panel A and highlight how phrases typically exhibit θ ≤ 1, whereas words produce θ > 1, which is unphysical according to Simon’s model. B. Test of the Simon model’s analytical connection θ = 1 − α, where θ is the Zipf exponent and α is the rate at which new terms (e.g., graphemes, words, phrases) are introduced throughout a text. We estimate α as the number of different words normalized by the total word volume. For both words and phrases, we compute linear fits using Reduced Major Axis (RMA) regression²⁴ to obtain the slope m, along with the Pearson correlation coefficient r_p. Words (green) do not exhibit a simple linear relationship, whereas phrases (blue) do, albeit clearly below the α = 1 − θ line shown in black.
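The two quantities named in panel B can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `innovation_rate` and `rma_fit` are hypothetical, α is taken as distinct terms divided by total terms as the caption describes, and the RMA slope uses the standard definition (ratio of the standard deviations of y and x, signed by the Pearson correlation). The caption does not say how θ itself was fitted, so the sketch takes θ values as given inputs.

```python
import math


def innovation_rate(tokens):
    """Estimate alpha as (number of distinct terms) / (total number of terms)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def rma_fit(xs, ys):
    """Reduced Major Axis fit of y on x.

    Returns (slope, intercept, pearson_r). The slope is
    sign(r) * std(y) / std(x); the line passes through the means.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    r = sxy / math.sqrt(sxx * syy)
    slope = math.copysign(math.sqrt(syy / sxx), r)
    intercept = mean_y - slope * mean_x
    return slope, intercept, r


# Illustrative input only (not data from the paper).
tokens = "it was the best of times it was the worst of times".split()
alpha = innovation_rate(tokens)  # 7 distinct terms out of 12

# Synthetic (alpha, theta) pairs lying exactly on theta = 1 - alpha.
alphas = [0.1, 0.2, 0.3]
thetas = [0.9, 0.8, 0.7]
slope, intercept, r = rma_fit(alphas, thetas)
```

Unlike ordinary least squares, RMA treats both variables symmetrically, which suits a comparison in which α and θ are both estimated quantities with their own error.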
