2020 Jan 20;22(1):126. doi: 10.3390/e22010126

Figure 1.

Sketch of the pre-processing pipeline for the Project Gutenberg (PG) data. The folder structure (left) stores each PG book at four levels of granularity: raw, text, tokens, and counts, illustrated with example books (middle). The right panel shows the basic Python commands used in the pre-processing.
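The raw → text → tokens → counts progression in the caption can be illustrated with a minimal Python sketch. This is not the authors' pre-processing code. The header/footer marker patterns, the simple lowercase regex tokenizer (used here in place of whatever tokenizer the actual pipeline uses), and the hypothetical file names (`{book_id}_raw.txt`, `{book_id}_text.txt`, etc.) are all assumptions for illustration. Real PG files use more varied boilerplate than these patterns cover.

```python
import re
from collections import Counter
from pathlib import Path

# ASSUMPTION: simplified PG boilerplate markers; real files vary in wording.
START_RE = re.compile(r"\*\*\* ?START OF TH(IS|E) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.I)
END_RE = re.compile(r"\*\*\* ?END OF TH(IS|E) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.I)


def raw_to_text(raw: str) -> str:
    """raw -> text: keep only the body between the PG header and footer markers."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()


def text_to_tokens(text: str) -> list[str]:
    """text -> tokens: lowercase and keep runs of ASCII letters (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tokens_to_counts(tokens: list[str]) -> Counter:
    """tokens -> counts: word-frequency table."""
    return Counter(tokens)


def process_book(book_id: str, root: str) -> Counter:
    """Read root/raw/{book_id}_raw.txt and write the text, tokens, and counts levels.

    The folder layout mirrors the four levels in the figure; the file names are hypothetical.
    """
    base = Path(root)
    raw = (base / "raw" / f"{book_id}_raw.txt").read_text(encoding="utf-8")
    text = raw_to_text(raw)
    tokens = text_to_tokens(text)
    counts = tokens_to_counts(tokens)

    for level in ("text", "tokens", "counts"):
        (base / level).mkdir(parents=True, exist_ok=True)
    (base / "text" / f"{book_id}_text.txt").write_text(text, encoding="utf-8")
    (base / "tokens" / f"{book_id}_tokens.txt").write_text("\n".join(tokens), encoding="utf-8")
    (base / "counts" / f"{book_id}_counts.txt").write_text(
        "\n".join(f"{word}\t{n}" for word, n in counts.most_common()), encoding="utf-8"
    )
    return counts
```

Keeping each level as a separate file means later analyses can start from whichever granularity they need, for example reading only the counts folder for frequency studies, without re-running the earlier steps.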