Table 2: Runtime metrics of the QC module of BIGwas compared to the QC module of the H3Agwas pipeline version 3 [12] for GWAS data sets of varying size
| No. of samples | No. of variants | BIGwas runtime | H3Agwas runtime |
|---|---|---|---|
| 200 | 500 | 6 min | 4 min |
| 500 | 1,000 | 6 min | 7 min |
| 1,000 | 5,000 | 6 min | 9 min |
| 5,000 | 50,000 | 8 min | 18 min |
| 5,000 | 250,000 | 15 min | 35 min |
| 10,000 | 250,000 | 15 min | 57 min |
| 20,554 | 700,078 | 2 h 15 min | 8 h 57 min |
| 488,292 | 231,151 | 3 d 1 h | * |
| 488,292 | 803,113 | 4 d 22 h | * |
| 976,584 | 803,113 | 6 d 12 h | * |
A GWAS input data set with 976,584 samples and 803,113 genetic variants (i.e., twice the size of the current UK Biobank GWAS data set) can be quality-controlled in <7 days with a single BIGwas command, using only 150 parallel jobs (configurable; equivalent to ∼7 compute nodes of our HPC cluster system). *The H3Agwas runtime exceeded the maximum time allocation of 10 days on the HPC system, even though up to 1,336 jobs could be used in parallel.
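The cap of 150 parallel jobs referred to above is a configuration setting rather than a hard limit. As a minimal sketch, assuming BIGwas is run as a standard Nextflow pipeline on a SLURM cluster, such a cap could be expressed with the generic Nextflow option `executor.queueSize`; the scheduler choice and per-task resources below are illustrative assumptions, not documented BIGwas defaults.

```groovy
// nextflow.config -- illustrative sketch only, not the shipped BIGwas configuration.
// Limits how many jobs the pipeline submits to the cluster scheduler at any one time.

process {
    executor = 'slurm'   // assumed HPC scheduler; adjust to the local cluster
}

executor {
    queueSize = 150      // at most 150 jobs in parallel (the value used for Table 2)
}
```

Lowering or raising `queueSize` trades cluster footprint against wall-clock runtime, which is why the BIGwas runtimes above were achieved with a comparatively small share of the cluster.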