The transcluster algorithm computes the likelihood that two samples are linked within a given number of intermediate hosts, based on the date and genetic differences between the samples (assuming a given serial interval and mutation rate; further details in Materials and methods). Changing the probability threshold used to define clusters changes the number of clusters, with a higher threshold yielding more clusters (and a higher likelihood of transmission within each cluster). The dataset analysed contained 700 genomes from residents in 292 care homes, and we treated each care home separately, as a microcosm of a potential infection network. Therefore, the highest theoretical number of clusters is 700, if every genome were its own cluster, and the lowest possible number of clusters is 292, if every person within each care home were part of the same cluster. The cut-off used (
>15% probability of transmission with <2 intermediate hosts) is indicated by the red vertical line. This cut-off is arbitrary; it was selected (1) because the distribution of pairwise SNP and date differences within the resulting clusters appeared reasonable (Figure 7—figure supplements 4 and 5) and (2) because of a ‘jump’ in the number of clusters occurring at that point.
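
The relationship between the probability threshold and the number of clusters can be illustrated with a minimal sketch. It is not the transcluster implementation: it assumes that pairwise transmission probabilities have already been computed, that two genomes are joined whenever their probability exceeds the threshold (single-linkage grouping), and that links are only considered within a care home. The function name `count_clusters`, the sample identifiers, and the probabilities are hypothetical toy values, not data from this study.

```python
def count_clusters(samples, pair_prob, threshold):
    """Count clusters at a given probability threshold.

    samples:   dict mapping genome id -> care home id (hypothetical toy input)
    pair_prob: dict mapping (genome_a, genome_b) -> assumed pairwise probability
    threshold: two genomes are linked only if probability > threshold
    """
    # Union-find: every genome starts as its own cluster.
    parent = {s: s for s in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), p in pair_prob.items():
        # Each care home is treated separately, so cross-home pairs are ignored.
        if samples[a] != samples[b]:
            continue
        if p > threshold:
            parent[find(a)] = find(b)

    return len({find(s) for s in samples})


# Toy data: 5 genomes from 2 care homes (illustrative values only).
samples = {"g1": "home_A", "g2": "home_A", "g3": "home_A",
           "g4": "home_B", "g5": "home_B"}
pair_prob = {("g1", "g2"): 0.60, ("g2", "g3"): 0.20, ("g1", "g3"): 0.05,
             ("g4", "g5"): 0.10, ("g3", "g4"): 0.90}

for threshold in (0.0, 0.15, 0.5, 0.95):
    print(threshold, count_clusters(samples, pair_prob, threshold))
```

In this toy example the count rises from 2 (the number of care homes) at a threshold of 0 to 5 (the number of genomes) at a threshold of 0.95, mirroring the bounds of 292 and 700 described above.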