2023 May 1;55(5):768–776. doi: 10.1038/s41588-023-01379-x

Extended Data Fig. 1. Additional comparison of ARG inference methods with array data and topology-only metrics.

We compare methods on runtime and topology-only metrics, as in Fig. 2 but under additional simulation conditions. All columns use 5 Mb of CEU demography array data; the individual columns show standard parameters (see Methods), a recombination rate smaller by a factor of 2 (ρ = 6 × 10⁻⁹), a recombination rate larger by a factor of 2 (ρ = 2.4 × 10⁻⁸), and a constant population size demography of 15,000 individuals. a. Robinson-Foulds distance as a function of the number of samples N, with values scaled to lie between 0 and 1 (polytomies are randomly resolved). b. KC topology-only distance for N = 4,000 samples, showing performance as branches in the inferred marginal trees are collapsed to form polytomies, using a heuristic that preferentially collapses the least certain branches (see Methods). For tsinfer and tsinfer-sparse, we instead rely on the default number of polytomies in the output and additionally show the result when polytomies are randomly resolved (dashed lines indicate that a linear trend may not hold). c. The same as b, except that branches are collapsed at random to form polytomies. d. KC topology-only distance as a function of N, with polytomies randomly resolved. e. Inference time as a function of N. All panels use 5 random seeds. Data are presented as means ± 2 s.e.m.
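As an illustration of how a scaled Robinson-Foulds distance (panel a) and a mean ± 2 s.e.m. summary over seeds could be computed, the following is a minimal sketch and not the authors' pipeline. It makes several assumptions that the legend does not state: trees are rooted and given as nested tuples, the distance is scaled by dividing by 2(N − 2), the maximum for two fully resolved rooted trees on N leaves, and the s.e.m. uses the sample standard deviation. The function names `clades`, `scaled_rf` and `mean_and_2sem` are hypothetical.

```python
import math
import statistics


def clades(tree):
    """Return (non-trivial clades, leaf set) of a rooted tree given as nested tuples.

    Any non-tuple value is treated as a leaf label. Singleton clades and the
    root clade are excluded because every tree on the same leaves shares them.
    """
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    root = visit(tree)
    found.discard(root)
    return found, root


def scaled_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by 2(N - 2) (assumed scaling)."""
    clades_a, leaves_a = clades(tree_a)
    clades_b, leaves_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    # Two binary rooted trees on N leaves each have N - 2 non-trivial clades.
    max_distance = 2 * (len(leaves_a) - 2)
    if max_distance <= 0:
        return 0.0
    # Count clades present in exactly one of the two trees.
    return len(clades_a ^ clades_b) / max_distance


def mean_and_2sem(values):
    """Return (mean, 2 * s.e.m.) using the sample standard deviation."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 2 * sem


if __name__ == "__main__":
    truth = ((1, 2), (3, 4))
    inferred = (((1, 2), 3), 4)
    print(scaled_rf(truth, inferred))
    # Hypothetical per-seed distances, for illustration only.
    print(mean_and_2sem([0.1, 0.2, 0.3, 0.4, 0.5]))
```

Under this scaling, two binary trees that share no non-trivial clade have distance 1 and identical trees have distance 0; trees containing polytomies have fewer clades and can never exceed 1.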