Table 2.
Type | Characteristics | Mean (±SD) | Quantile |
---|---|---|---|
Data file | no. of rows (csv) | 4,115 | [39.0, 92.0, 108.0] |
Data file | no. of columns (csv) | 20.5 | [3.0, 5.0, 12.0] |
Data file | no. of rows (xls(x)) | 607 | [28.0, 65.0, 108.0] |
Data file | no. of columns (xls(x)) | 30.5 | [8.0, 15.0, 19.0] |
Data file | no. of missing values (csv, ratio) | 8.9 | [0.0, 0.0, 11.5] |
Data file | average size of data files (csv) | 331,343 | [1,625.0, 8,375.0, 47,752.5] |
Data file | average size of data files (xlsx) | 428,586 | [18,804.0, 34,723.0, 121,633.0] |
Repository | size of repository (kilobytes) | 51,372 | [983.0, 7,740.0, 32,715.0] |
Repository | no. of open issues | 5.2 | [0.0, 0.0, 0.0] |
Repository | no. of closed issues | 40.6 | [0.0, 0.0, 2.0] |
Repository | description length | 7.2 | [1.0, 5.0, 10.0] |
Repository | ratio of data files per repo (%) | 7.2 | [0.3, 1.9, 8.0] |
Repository | age of repository (days) | 1,521.9 | [1,108.0, 1,478.0, 1,844.0] |
Repository | ratio of problematic files with respect to a standard config (Pandas) (%) | 0.3 | [0.0, 0.0, 0.0] |
README | no. of words in README (non-code related) | 378.2 | [10.0, 112.0, 431.0] |
README | no. of tables | 0.1 | [0.0, 0.0, 0.0] |
README | no. of code blocks | 1.4 | [0.0, 0.0, 1.0] |
README | no. of headers | 3.6 | [1.0, 1.0, 5.0] |
README | no. of urls | 9.1 | [1.0, 3.0, 12.0] |
README | no. of images | 0.7 | [0.0, 0.0, 0.0] |
Average values are reported in the “mean (±SD)” format. Quantile values are reported in the [Q1, Q2, Q3] format, where Q1, Q2, and Q3 represent the 25th, 50th, and 75th percentiles of a particular group's characteristic.
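
As a minimal sketch of how a single row of Table 2 could be computed, the Python snippet below derives the mean, the sample standard deviation, and the 25th, 50th, and 75th percentiles of one characteristic. The function name `summarize`, the sample values, and the "inclusive" interpolation method are assumptions made for illustration; the source does not state which quantile estimator was used.

```python
import statistics


def summarize(values):
    """Return (mean, sd, [q25, q50, q75]) for a sequence of numbers."""
    mean = statistics.mean(values)
    # Sample standard deviation; undefined for a single observation.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    # n=4 yields the three quartile cut points.
    # The "inclusive" method is an assumption, not taken from the source.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return mean, sd, [q25, q50, q75]


# Hypothetical row counts for five CSV files (illustrative, not study data).
row_counts = [10, 40, 90, 110, 20000]
mean, sd, quartiles = summarize(row_counts)
print(f"{mean:,.1f} (±{sd:,.1f}) | {quartiles}")
```

In this hypothetical sample, a single very large file pulls the mean far above the 75th percentile. The same pattern appears in several rows of Table 2 (for example, the number of rows in csv files), which is why the quantiles are reported alongside the mean.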