Nat Genet. Author manuscript; available in PMC: 2020 Mar 2.
Published in final edited form as: Nat Genet. 2019 Sep 2;51(9):1330–1338. doi: 10.1038/s41588-019-0483-y

Table 1.

Summary of input data, output tree sequences and computing resources required for TGP, SGDP and UKB chromosome 20. Input sizes are those of tsinfer’s input .samples files, which use the Zarr library (https://zarr.readthedocs.io/) to achieve compression levels similar to BCF. File sizes are reported using binary multipliers (i.e., 1M = 2^20 bytes); all other values use decimal multipliers (i.e., 1M = 10^6). The times reported are the total wall-clock time required to produce the output tree sequence from the .samples file on a server with two Xeon Gold 6148 CPUs (40 cores in total; no hyperthreading) and 187 GiB of RAM. For SGDP, TGP and UKB we used the standard tsinfer inference pipeline. In UKB+TGP, we matched the UKB samples to the inferred TGP tree sequence (the time reported is for the sample-matching phase only). In UKB+UKB, we incrementally added samples from UKB to the ancestors inferred from UKB (see text).
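Because the table mixes two multiplier conventions, the same suffix means different things in different columns. The minimal sketch below expands a table entry into a plain number; the `expand` helper and the `BINARY`/`DECIMAL` dictionaries are hypothetical illustrations, not part of tsinfer. It assumes that K, M and G are the only suffixes used in the size and count columns.

```python
# Hypothetical helper illustrating the two multiplier conventions in Table 1.
# File sizes use binary multipliers (K = 2**10, M = 2**20, G = 2**30 bytes);
# all other values use decimal multipliers (K = 10**3, M = 10**6, G = 10**9).
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30}
DECIMAL = {"K": 10**3, "M": 10**6, "G": 10**9}


def expand(value: str, binary: bool) -> float:
    """Expand a table entry such as '1.6G' or '348K' into a plain number.

    Assumes the entry is a number optionally followed by K, M or G.
    """
    multipliers = BINARY if binary else DECIMAL
    suffix = value[-1]
    if suffix in multipliers:
        return float(value[:-1]) * multipliers[suffix]
    return float(value)  # no suffix, e.g. n = 277


# 'size' columns are bytes with binary multipliers ...
ukb_input_bytes = expand("1.6G", binary=True)
# ... whereas 'edges' is a count with decimal multipliers.
ukb_edges = expand("484M", binary=False)
print(f"{ukb_input_bytes:,.0f} bytes")
print(f"{ukb_edges:,.0f} edges")
```

The time column ("5m", "2h") uses minute and hour suffixes, so it is deliberately left out of this helper.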

            Input                   Output                          Resources
Dataset     n      sites   size     nodes   edges   trees   size    time   RAM
SGDP        277    348K    15M      236K    1.7M    196K    83M     5m     3.6G
TGP         2504   860K    135M     735K    7.3M    550K    296M    2h     11G
UKB         487K   15.8K   1.6G     1.9M    484M    15.8K   14.5G   3h     160G
UKB+TGP     487K   15.6K   1.6G     5.5M    185M    15.6K   5.8G    15h    66G
UKB+UKB     487K   15.8K   1.6G     2.0M    62M     15.8K   2.1G    50h    40G
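As a rough illustration of the trade-offs in the UKB rows, the sketch below computes how many times smaller the edge count, output file size and peak RAM are for UKB+TGP and UKB+UKB relative to the standard UKB run. Within each of these columns every UKB entry carries the same suffix, so the multipliers cancel and the bare numbers can be divided directly. Wall-clock time is left out because the UKB+TGP time covers only the sample-matching phase. The `rows` dictionary and `shrink_factor` helper are hypothetical names introduced for this example.

```python
# Values copied from Table 1 (UKB chromosome 20 rows).
# Each tuple: (edges in millions, output size in G, RAM in G).
rows = {
    "UKB": (484, 14.5, 160),
    "UKB+TGP": (185, 5.8, 66),
    "UKB+UKB": (62, 2.1, 40),
}
COLUMNS = ("edges", "size", "RAM")


def shrink_factor(baseline: str, other: str) -> dict:
    """Return baseline / other for each column (values > 1 mean 'other' is smaller)."""
    base, alt = rows[baseline], rows[other]
    return {col: round(b / a, 2) for col, b, a in zip(COLUMNS, base, alt)}


for name in ("UKB+TGP", "UKB+UKB"):
    print(name, shrink_factor("UKB", name))
```

Dividing rather than subtracting keeps the comparison independent of the binary-versus-decimal convention, since both operands in each ratio share the same suffix.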