Author manuscript; available in PMC: 2020 Mar 2.

Published in final edited form as: Nat Genet. 2019 Sep 2;51(9):1330–1338. doi: 10.1038/s41588-019-0483-y

Table 1.

Summary of input data, output tree sequences and computing resources required for TGP, SGDP and UKB chromosome 20. Input sizes reported are those of tsinfer's input .samples files, which use the Zarr library (https://zarr.readthedocs.io/) to achieve compression levels similar to those of BCF. File sizes are reported using binary multipliers (i.e., 1M = 2²⁰ bytes); all other values use decimal multipliers (i.e., 1M = 10⁶). The times reported are the total wall-clock time required to produce the output tree sequence from the .samples file on a server with two Xeon Gold 6148 CPUs (40 cores in total; no hyperthreading) and 187 GiB of RAM. For SGDP, TGP and UKB, we used the standard tsinfer inference pipeline. In UKB+TGP, we matched the UKB samples to the inferred TGP tree sequence (the time reported is for the sample-matching phase only). In UKB+UKB, we incrementally added samples from UKB to the ancestors inferred from UKB (see text).

	Input			Output				Resources
Dataset	n	sites	size	nodes	edges	trees	size	time	RAM
SGDP	277	348K	15M	236K	1.7M	196K	83M	5m	3.6G
TGP	2504	860K	135M	735K	7.3M	550K	296M	2h	11G
UKB	487K	15.8K	1.6G	1.9M	484M	15.8K	14.5G	3h	160G
UKB+TGP	487K	15.6K	1.6G	5.5M	185M	15.6K	5.8G	15h	66G
UKB+UKB	487K	15.8K	1.6G	2.0M	62M	15.8K	2.1G	50h	40G
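The caption's unit conventions matter when comparing rows: the same suffix means 2²⁰ in a size column but 10⁶ in a count column. The following minimal Python sketch converts the suffixed entries accordingly and derives two simple comparisons: each run's output-to-input size ratio, and how many times fewer edges and bytes the incrementally built UKB+UKB tree sequence has than the standard UKB inference. The helper names (`parse`, `file_bytes`, `count`, `output_to_input_ratio`, `reduction`) are hypothetical, and the row values are copied directly from Table 1.

```python
"""Sketch: reading Table 1's suffixed values under the caption's conventions.

File sizes use binary multipliers (1M = 2**20 bytes); all other columns use
decimal multipliers (1M = 10**6). Helper names are illustrative only.
"""

BINARY = {"K": 2**10, "M": 2**20, "G": 2**30}
DECIMAL = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse(value: str, multipliers: dict) -> float:
    """Convert a string such as '14.5G' or '860K' into a plain number."""
    suffix = value[-1]
    if suffix in multipliers:
        return float(value[:-1]) * multipliers[suffix]
    return float(value)  # no suffix: already a plain number


def file_bytes(value: str) -> float:
    """Input and output 'size' columns: binary multipliers (bytes)."""
    return parse(value, BINARY)


def count(value: str) -> float:
    """n, sites, nodes, edges and trees columns: decimal multipliers."""
    return parse(value, DECIMAL)


# (dataset, input size, edges, output size), copied from Table 1.
ROWS = [
    ("SGDP", "15M", "1.7M", "83M"),
    ("TGP", "135M", "7.3M", "296M"),
    ("UKB", "1.6G", "484M", "14.5G"),
    ("UKB+TGP", "1.6G", "185M", "5.8G"),
    ("UKB+UKB", "1.6G", "62M", "2.1G"),
]


def output_to_input_ratio(rows):
    """Ratio of output tree sequence size to input .samples file size."""
    return {name: file_bytes(out) / file_bytes(inp) for name, inp, _, out in rows}


def reduction(rows, baseline, other):
    """How many times fewer edges and bytes `other` has than `baseline`."""
    table = {name: (count(edges), file_bytes(out)) for name, _, edges, out in rows}
    (e0, s0), (e1, s1) = table[baseline], table[other]
    return e0 / e1, s0 / s1


if __name__ == "__main__":
    for name, ratio in output_to_input_ratio(ROWS).items():
        print(f"{name:8s} output/input size = {ratio:.2f}")
    edges_x, size_x = reduction(ROWS, "UKB", "UKB+UKB")
    print(f"UKB -> UKB+UKB: {edges_x:.1f}x fewer edges, {size_x:.1f}x smaller")
```

These are plain arithmetic on the reported figures, so they inherit the rounding of the table's entries; for example, UKB+UKB has roughly 7.8 times fewer edges than UKB.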