Table 6.
Type | Characteristics | Mean G1 | Mean G2 | Mean G3 | Mean G4 | Quantile G1 | Quantile G2 | Quantile G3 | Quantile G4 |
---|---|---|---|---|---|---|---|---|---|
README | no. of words in README (non-code related)ᵃ | 286.2 (963.8) | 345.1 (835.6) | 541.9 (1,509.7) | 801.9 (1,808.7) | [6.0, 48.0, 287.0] | [15.0, 125.0, 389.8] | [63.0, 250.0, 626.0] | [151.5, 416.0, 869.0] |
README | no. of tablesᵃ | 0.0 (0.5) | 0.1 (0.6) | 0.1 (1.6) | 0.3 (2.2) | [0.0, 0.0, 0.0] | [0.0, 0.0, 0.0] | [0.0, 0.0, 0.0] | [0.0, 0.0, 0.0] |
README | no. of code blocksᵃ | 0.9 (3.5) | 1.3 (4.2) | 2.3 (6.1) | 3.5 (8.1) | [0.0, 0.0, 1.0] | [0.0, 0.0, 1.0] | [0.0, 0.0, 2.0] | [0.0, 1.0, 4.0] |
README | no. of headersᵃ | 2.3 (4.1) | 3.6 (5.6) | 5.3 (7.9) | 8.8 (54.6) | [0.0, 1.0, 3.0] | [1.0, 1.0, 5.0] | [1.0, 3.0, 7.0] | [2.0, 6.0, 10.0] |
README | no. of URLsᵃ | 6.0 (10.4) | 8.1 (18.4) | 12.8 (21.1) | 25.2 (113.7) | [1.0, 2.0, 8.0] | [1.0, 4.0, 11.0] | [2.0, 8.0, 17.0] | [6.0, 15.0, 28.0] |
README | no. of imagesᵃ | 0.3 (1.7) | 0.7 (5.5) | 1.1 (4.8) | 2.5 (6.1) | [0.0, 0.0, 0.0] | [0.0, 0.0, 0.0] | [0.0, 0.0, 1.0] | [0.0, 1.0, 3.0] |
Repository | repository sizeᵃ | 33,689.8 (152,529) | 50,916.3 (194,154) | 70,511.1 (225,835) | 133,307.1 (423,076) | [580.0, 5,386.5, 22,780.2] | [1,230.0, 7,667.0, 33,723.8] | [2,174.5, 14,557.0, 52,912.2] | [4,896.5, 27,393.0, 113,130.0] |
Repository | no. of open issuesᵃ | 1.1 (10.8) | 2.0 (13.2) | 6.4 (21.8) | 38.1 (163.7) | [0.0, 0.0, 0.0] | [0.0, 0.0, 1.0] | [0.0, 1.0, 4.0] | [0.0, 5.0, 25.0] |
Repository | no. of closed issuesᵃ | 1.9 (13.5) | 7.6 (31.7) | 38.4 (130.8) | 374.7 (1,823.4) | [0.0, 0.0, 0.0] | [0.0, 0.0, 3.0] | [0.0, 2.0, 19.0] | [2.0, 25.0, 175.5] |
Repository | description lengthᵃ | 6.2 (8.3) | 7.7 (9.2) | 8.9 (11.2) | 9.6 (10.2) | [0.0, 4.0, 9.0] | [2.0, 6.0, 11.0] | [4.0, 7.0, 11.0] | [4.0, 7.0, 12.0] |
Repository | ratio of data files per repositoryᵃ | 8.2 (14.0) | 7.1 (12.7) | 5.4 (10.9) | 3.6 (8.7) | [0.2, 2.3, 10.0] | [0.4, 2.2, 7.7] | [0.3, 1.4, 5.3] | [0.1, 0.7, 2.8] |
Repository | age of repository (days)ᵃ | 1,467.9 (490.0) | 1,513.4 (545.2) | 1,627.7 (592.3) | 1,725.3 (653.0) | [1,067.0, 1,448.0, 1,791.0] | [1,093.2, 1,453.0, 1,816.0] | [1,214.0, 1,562.0, 1,964.0] | [1,256.5, 1,628.0, 2,082.5] |
Repository | ratio of problematic files for a standard config (Pandas)ᵇ | 0.3 (2.7) | 0.4 (2.8) | 0.3 (2.6) | 0.2 (1.5) | [0.0, 0.0, 0.0] | [0.0, 0.0, 0.0] | [0.0, 0.0, 0.0] | [0.0, 0.0, 0.0] |
Data File | average size of data files (csv)ᵇ | 309,999.4 (4,314,537) | 337,453.3 (2,901,912) | 532,226.8 (3,595,252) | 248,120.4 (2,268,705) | [1,732.0, 7,017.0, 33,942.0] | [1,419.0, 6,046.5, 53,402.0] | [1,692.0, 10,398.0, 79,279.0] | [4,763.8, 28,315.0, 73,671.0] |
Data File | average size of data files (xls(x))ᵇ | 426,555.6 (2,755,034.2) | 528,439.2 (2,953,938) | 360,737.8 (2,050,485.3) | 330,846.9 (1,518,167.8) | [20,430.2, 30,511.0, 83,968.0] | [20,287.0, 45,568.0, 147,138.5] | [16,856.8, 45,056.0, 203,837.5] | [16,896.0, 34,462.0, 95,356.0] |
Data File | no. of rows (csv)ᵃ | 3,845.2 (50,528) | 4,324.6 (52,089) | 6,221.6 (55,637) | 3,087.6 (35,192.0) | [41.0, 85.0, 569.0] | [33.0, 79.0, 719.0] | [42.0, 147.0, 930.0] | [41.0, 118.0, 293.0] |
Data File | no. of columns (csv)ᵇ | 23.3 (340.0) | 16.3 (376.5) | 23.7 (524.6) | 14.7 (363.2) | [3.0, 7.0, 18.0] | [2.0, 4.0, 7.0] | [3.0, 6.0, 13.0] | [4.0, 11.0, 11.0] |
Data File | no. of rows (xls(x)) | 1,337.2 (22,013.9) | 409.4 (10,184.4) | 324.2 (8,992.9) | 1,105.0 (16,615.8) | [26.0, 64.0, 141.0] | [64.0, 86.0, 122.0] | [19.0, 31.0, 52.0] | [20.0, 46.0, 176.0] |
Data File | no. of columns (xls(x)) | 29.8 (397.2) | 36.2 (531.0) | 23.8 (155.0) | 25.6 (423.3) | [5.0, 9.0, 16.0] | [19.0, 19.0, 19.0] | [9.0, 12.0, 16.0] | [6.0, 10.0, 15.0] |
Data File | missing values ratio (csv)ᵃ | 8.7 (16.6) | 7.2 (19.1) | 10.5 (20.5) | 13.0 (13.6) | [0.0, 0.0, 11.3] | [0.0, 0.0, 0.0] | [0.0, 0.0, 11.7] | [0.0, 19.0, 19.8] |
Quantile values are reported in the [Q1, Q2, Q3] format, where Q1, Q2, and Q3 represent the 25th, 50th, and 75th percentiles of a particular group's characteristic.
ᵃ Indicates statistically significant differences () in pairwise comparisons across all four groups.
ᵇ Denotes cases in which statistically significant differences are observed between the values of groups 1 and 4, but not necessarily in the remaining pairwise comparisons.
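
To make the reporting format of Table 6 concrete, the following minimal sketch shows one way a single cell pair (a mean with a parenthesised spread, plus a quartile triple) could be computed for one group. It is not the authors' code. Two things are assumptions: that the parenthesised value is a sample standard deviation, and the "inclusive" quartile interpolation method, since the table does not state either. The function names `summarize` and `format_cells` are hypothetical, and the input values are made-up illustration data.

```python
import statistics


def summarize(values):
    """Return (mean, spread, [q25, q50, q75]) for one group's values."""
    mean = statistics.mean(values)
    # Assumption: the value in parentheses is the sample standard deviation.
    spread = statistics.stdev(values)
    # n=4 yields the three quartile cut points (25th, 50th, 75th percentiles).
    # Assumption: "inclusive" interpolation; other methods give slightly
    # different quartiles on small samples.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return mean, spread, [q25, q50, q75]


def format_cells(values):
    """Format one group's values as the two cell strings used in the table."""
    mean, spread, quartiles = summarize(values)
    quartile_text = ", ".join(f"{q:,.1f}" for q in quartiles)
    return f"{mean:,.1f} ({spread:,.1f})", f"[{quartile_text}]"


if __name__ == "__main__":
    # Hypothetical header counts for a handful of READMEs (not from the study).
    header_counts = [0, 1, 1, 3, 5, 8, 40]
    mean_cell, quantile_cell = format_cells(header_counts)
    print(mean_cell, quantile_cell)
```

A single large value, such as the 40 in the example list, pulls the mean and the spread upward while leaving the quartiles largely unchanged. This is consistent with the many rows above in which the mean far exceeds the median.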