2014 Oct 6;111(43):15322–15327. doi: 10.1073/pnas.1309389111

Fig. 1.

File sizes define distinct content types. (A) For each user, we collect the sizes of the files they downloaded. We plot the distribution of all those file sizes, with sizes binned logarithmically. This distribution has pronounced peaks at 14 MB, 195 MB, 400 MB, 830 MB, 1.65 GB, and 5.6 GB. Based on these peaks, we define seven file size ranges (alternating white and gray bands). (B) File size ranges can be associated with distinct content types. We randomly sample half a million torrents from “The Pirate Bay” and analyze their content categories as a function of their sizes. For each file size range, 1–3 categories account for most of the observed files. For example, for file sizes in the range 196–400 MB, the category that we denote as Videos of TV Shows accounts for 40% of all files. For each size range, we color and name all categories that account for more than 10% of the files in the range and that are significantly overrepresented (P < 0.05) with respect to a null model in which categories are uniformly distributed among file size ranges (SI Appendix).
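The binning in panel A and the assignment of files to size ranges can be illustrated with a minimal sketch. It is not the procedure used for the figure. It assumes that the six reported peaks themselves serve as the boundaries of the seven ranges (suggested by the 196–400 MB example, but not stated explicitly), that GB values convert to MB with a factor of 1,000, and that ten bins per decade is an adequate resolution. The function names `size_range_index` and `log_binned_histogram` are hypothetical.

```python
import bisect
import math
from collections import Counter

# Peak locations from the caption, in MB.
# Assumption: decimal units, so 1.65 GB = 1650 MB and 5.6 GB = 5600 MB.
PEAKS_MB = [14, 195, 400, 830, 1650, 5600]


def size_range_index(size_mb):
    """Return an index from 0 to 6 for the size range a file falls into.

    Assumption: the peaks are the range boundaries, with each boundary
    belonging to the lower range (so 400 MB falls in range 2, 401 MB in 3).
    """
    return bisect.bisect_left(PEAKS_MB, size_mb)


def log_binned_histogram(sizes_mb, bins_per_decade=10):
    """Count files in logarithmically spaced bins.

    Returns a dict mapping each occupied bin's lower edge (in MB) to its
    count, ordered by increasing size. Sizes must be positive.
    """
    counts = Counter()
    for size in sizes_mb:
        bin_index = math.floor(math.log10(size) * bins_per_decade)
        counts[bin_index] += 1
    return {
        10 ** (index / bins_per_decade): count
        for index, count in sorted(counts.items())
    }


if __name__ == "__main__":
    example_sizes = [5, 180, 300, 350, 700, 1400, 4700, 8000]  # made-up values
    print([size_range_index(s) for s in example_sizes])
    print(log_binned_histogram(example_sizes))
```

Under these assumptions, a 300 MB file falls in the third range (index 2), which corresponds to the 196–400 MB band discussed in panel B.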