Table 1.
| Response | snpnet-2.0 | snpnet | bigstatsr | BOLT-LMM |
|---|---|---|---|---|
| High cholesterol (B) | 21.9 | 109.8 | 44.98 + 26.81 | 334.27 |
| Asthma (B) | 21.7 | 130.0 | 40.71 + 29.18 | 278.22 |
| Standing height (Q) | 99.9 | a | 41.04 + 217.91 | 1148.32 |
| BMI (Q) | 51.5 | a | 40.38 + 78.84 | 517.12 |
| Other hypothyroidism (S) | 13.5 | 71.5 | 44.23 + 25.80 | 265.61 |
| Thyrotoxicosis (S) | 3.6 | 10.0 | 41.33 + 24.54 | 243.52 |
Note: Times are in minutes. (B) indicates a binary response, (Q) a quantitative response and (S) a survival-time response. For bigstatsr, the first number is the time spent attaching the genotype matrix to a file and performing mean imputation; this step can be shared across multiple responses if they use the same training set split. The second number is the time taken by the model-fitting function (big_spLinReg or big_spLogReg). For snpnet-2.0 and snpnet, data loading and mean imputation are always done on the fly and are included in the reported times. For BOLT-LMM, the total runtime also includes the time spent on univariate regression and association testing of the variants.
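One way to read Table 1 is to express each competitor's runtime as a multiple of the snpnet-2.0 runtime. The sketch below transcribes the table into a Python dictionary (the names `RUNTIMES` and `speedups` are illustrative, not part of any package) and adds the two bigstatsr components together, which assumes the preprocessing cost is charged to each response separately rather than shared. The snpnet entries marked a are treated as missing.

```python
# Runtimes in minutes, transcribed from Table 1.
# Columns: snpnet-2.0, snpnet, bigstatsr (preprocessing, fitting), BOLT-LMM.
# None marks the snpnet entries flagged "a" in the table.
RUNTIMES = {
    "High cholesterol (B)": (21.9, 109.8, (44.98, 26.81), 334.27),
    "Asthma (B)": (21.7, 130.0, (40.71, 29.18), 278.22),
    "Standing height (Q)": (99.9, None, (41.04, 217.91), 1148.32),
    "BMI (Q)": (51.5, None, (40.38, 78.84), 517.12),
    "Other hypothyroidism (S)": (13.5, 71.5, (44.23, 25.80), 265.61),
    "Thyrotoxicosis (S)": (3.6, 10.0, (41.33, 24.54), 243.52),
}


def speedups(row):
    """Return each method's runtime divided by the snpnet-2.0 runtime."""
    new, old, (prep, fit), bolt = row
    # Assumption: bigstatsr preprocessing is not shared across responses.
    bigstatsr_total = prep + fit
    return {
        "snpnet": None if old is None else old / new,
        "bigstatsr": bigstatsr_total / new,
        "BOLT-LMM": bolt / new,
    }


def fmt(ratio):
    """Format a ratio such as 5.0137 as '5.0x', or 'n/a' when missing."""
    return "n/a" if ratio is None else f"{ratio:.1f}x"


for name, row in RUNTIMES.items():
    cells = ", ".join(f"{method} {fmt(r)}" for method, r in speedups(row).items())
    print(f"{name}: {cells}")
```

If several responses did share one training set split, the bigstatsr preprocessing time would be paid only once, and the bigstatsr ratios above would shrink accordingly.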
a The machine we used for most of the experiments here has a dual-socket architecture, with roughly 400 GB of local memory per socket. For both standing height and BMI, the memory required by the original version of snpnet exceeded the local memory of a single socket on this machine. We therefore ran these two experiments on an Intel Xeon Gold 6130 machine (also 16 cores) with more memory.