2012 Dec 10;2:943. doi: 10.1038/srep00943

Figure 4. Pruning reveals the variable marginal return of words.

The Heaps scaling exponent b depends on how many of the rarest words are included. For a given corpus and cutoff value U_c, we make a scatter plot of N_w(t|U_c) versus N_u(t|U_c), counting only words with u_i(t) ≥ U_c and using the same color-to-U_c correspondence as in Fig. 3. (Panel inset) We use OLS regression to estimate the scaling exponent b(U_c) for the model N_w(t|U_c) ~ [N_u(t|U_c)]^b, which shows that b(U_c) increases from approximately 0.5 towards unity as we prune the corpora of extremely rare words. Our longitudinal language analysis provides insight into the structural importance of the most frequent words, which are used more times per appearance and which play a crucial role in the usage of new and rare words.
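The pruning and fit described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes that N_w(t|U_c) is the number of distinct words with u_i(t) ≥ U_c in year t, that N_u(t|U_c) is the summed usage of those words, and that the OLS fit is done on log-transformed values. The function name heaps_exponent and the input layout (one array of per-word counts per year) are hypothetical.

```python
import numpy as np


def heaps_exponent(usage_by_year, u_c):
    """Estimate b in N_w ~ N_u**b after pruning words with usage below u_c.

    usage_by_year: iterable of 1-D arrays; each array holds the per-word
    usage counts u_i(t) for one year t (assumed input layout).
    """
    n_w, n_u = [], []
    for counts in usage_by_year:
        counts = np.asarray(counts)
        kept = counts[counts >= u_c]   # prune words rarer than the cutoff
        if kept.size == 0:
            continue                   # no words survive in this year
        n_w.append(kept.size)          # distinct words kept
        n_u.append(kept.sum())         # total usage of the kept words
    if len(n_w) < 2:
        raise ValueError("need at least two years with surviving words")
    # OLS on log-log axes: log N_w = log a + b * log N_u
    b, _log_a = np.polyfit(np.log(n_u), np.log(n_w), 1)
    return b
```

Calling heaps_exponent for a range of u_c values on the same corpus would trace out a curve b(U_c) comparable in spirit to the panel insets, though the exact estimates depend on the data and on the assumptions stated above.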