Summary of input data, output tree sequences and computing resources required for TGP, SGDP and UKB chromosome 20. Input sizes reported are those of tsinfer’s input .samples files, which use the Zarr library (https://zarr.readthedocs.io/) to achieve compression levels similar to BCF. File sizes are reported using binary multipliers (i.e., 1M = 2^20 bytes); all other values use decimal multipliers (i.e., 1M = 10^6). The times reported are the total wall-clock time required to produce the output tree sequence from the .samples file on a server with two Xeon Gold 6148 CPUs (40 cores in total; no hyperthreading) and 187 GiB of RAM. For SGDP, TGP and UKB we used the standard tsinfer inference pipeline. In UKB+TGP, we matched the UKB samples to the inferred TGP tree sequence (the time reported is for the sample-matching phase only). In UKB+UKB, we incrementally added samples from UKB to the ancestors inferred from UKB (see text).