Table 1. Comparison of methods on the UK Biobank dataset.
Sample size | Method | Clustering | New MCMC | Switch Error (%) | Run time (hrs) | Run time scaling | Sample size scaling |
---|---|---|---|---|---|---|---|
1,072 | SHAPEIT3 | No | Yes | 2.6 | 0.25 | 1 | 1 |
10,072 | SHAPEIT2 | No | No | 1.1 | 4.2 | 16.8 | 9.4 |
10,072 | SHAPEIT3 | No | Yes | 1.1 | 3.3 | 13.2 | 9.4 |
10,072 | SHAPEIT3 | Yes | Yes | 1.3 | 2.5 | 10.0 | 9.4 |
152,112 | SHAPEIT3 | Yes | Yes | 0.4 | 38.5 | 154 | 142 |
Each row shows performance on a subset of the full dataset. The Clustering column indicates whether the new method for choosing copying states was used. The New MCMC column indicates whether the new MCMC routine, which uses completely parallel updates and local algorithm termination, was used. Performance is measured as the median switch error on the trio children. Run time is given in hours. The Run time scaling and Sample size scaling columns give the run time and the sample size, respectively, relative to the SHAPEIT3 run on a sample size of 1,072. All runs used 10 threads.
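
The two scaling columns can be recomputed from the sample sizes and run times in Table 1. The following is a minimal sketch of that arithmetic, assuming both ratios are taken against the first row (SHAPEIT3 on 1,072 samples); the names `ROWS` and `scaling_columns` are illustrative and do not come from the SHAPEIT software.

```python
# Rows transcribed from Table 1:
# (sample size, method, clustering used, run time in hours)
ROWS = [
    (1072, "SHAPEIT3", False, 0.25),
    (10072, "SHAPEIT2", False, 4.2),
    (10072, "SHAPEIT3", False, 3.3),
    (10072, "SHAPEIT3", True, 2.5),
    (152112, "SHAPEIT3", True, 38.5),
]


def scaling_columns(rows):
    """Return (run time scaling, sample size scaling) for each row.

    Assumption: the baseline is the first row, i.e. the SHAPEIT3 run
    on 1,072 samples, as stated in the table legend.
    """
    base_n, _, _, base_hours = rows[0]
    return [
        (round(hours / base_hours, 1), round(n / base_n, 1))
        for n, _, _, hours in rows
    ]


if __name__ == "__main__":
    for row, (time_ratio, size_ratio) in zip(ROWS, scaling_columns(ROWS)):
        print(row[0], row[1], time_ratio, size_ratio)
```

Comparing the two ratios row by row shows how run time grows relative to sample size: for the 152,112-sample run, the run time ratio is 154 against a sample size ratio of about 142.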