. 2019 Oct 9;5(2):vez039. doi: 10.1093/ve/vez039

Table 2.

Clustering performance of Phydelity on seasonal A/H3N2 influenza viruses collected by McCrone et al. (2018).

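| Basis | nsample | %trans | Purity | IG | ARI | NMI |
|---|---|---|---|---|---|---|
| High-quality transmission clusters | All | | 0.98 | 0.02 | 0.96 | 0.99 |
| High-quality transmission clusters | 52 | 25% | 0.87 | 0.06 | 0.72 | 0.93 |
| High-quality transmission clusters | 52 | 45% | 0.87 | 0.04 | 0.74 | 0.95 |
| High-quality transmission clusters | 52 | 70% | 0.85 | 0.07 | 0.76 | 0.94 |
| High-quality transmission clusters | 93 | 25% | 0.94 | 0.03 | 0.88 | 0.98 |
| High-quality transmission clusters | 93 | 45% | 0.94 | 0.03 | 0.90 | 0.98 |
| Household | All | | 0.89 | 0.08 | 0.79 | 0.96 |
| Household | 52 | 25% | 0.56 | 0.29 | 0.35 | 0.82 |
| Household | 52 | 45% | 0.73 | 0.16 | 0.56 | 0.90 |
| Household | 52 | 70% | 0.82 | 0.11 | 0.74 | 0.93 |
| Household | 93 | 25% | 0.75 | 0.16 | 0.64 | 0.92 |
| Household | 93 | 45% | 0.87 | 0.11 | 0.80 | 0.95 |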

The ground truth used for clustering assessment was based either on the identities of genetically validated, high-quality transmission clusters as defined by McCrone et al. or on the patients’ households. Besides analysing all of the viruses collected (rows marked All), Phydelity was also applied to downsampled datasets with different sample sizes (nsample) and proportions of sequences derived from the aforementioned high-quality transmission pairs (%trans). Adjusted Rand index (ARI) measures how accurately the output clusters correspond to the ground truth labels. Purity gives the average extent to which clusters contain only a single class. Modified Gini index (IG) is the probability that a randomly selected sequence would be incorrectly clustered. Normalised mutual information (NMI) accounts for the trade-off between clustering quality and number of clusters (see Section 2).
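
To make two of the metrics above concrete, the following is a minimal Python sketch of purity and ARI computed from ground-truth labels and output cluster labels. It uses the common textbook definitions (majority-class purity and the Hubert-Arabie pair-counting ARI), which are assumptions here and may differ in detail from the formulations given in Section 2. The function names and the toy labels are hypothetical and are not taken from Phydelity or from the McCrone et al. dataset. IG and NMI are omitted because their exact forms depend on definitions not reproduced in this table.

```python
from collections import Counter
from math import comb


def contingency(truth, pred):
    """Count items in each (output cluster, ground-truth class) pair."""
    return Counter(zip(pred, truth))


def purity(truth, pred):
    """Fraction of items belonging to the majority class of their cluster
    (textbook definition; assumed, not taken from the paper)."""
    best = {}
    for (cluster, _cls), n in contingency(truth, pred).items():
        best[cluster] = max(best.get(cluster, 0), n)
    return sum(best.values()) / len(truth)


def adjusted_rand_index(truth, pred):
    """Hubert-Arabie adjusted Rand index computed from pair counts."""
    n = len(truth)
    if n < 2:  # no pairs to compare
        return 1.0
    pairs_both = sum(comb(c, 2) for c in contingency(truth, pred).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


# Toy example (hypothetical labels): two true classes, one sequence misplaced.
truth = ["A", "A", "A", "B", "B", "B"]
pred = [0, 0, 1, 1, 1, 1]
print(round(purity(truth, pred), 3))
print(round(adjusted_rand_index(truth, pred), 3))
```

In this toy case one misplaced sequence out of six lowers ARI much more than purity, because ARI counts pairs of sequences and corrects for chance agreement. The Household rows of the table show the same tendency, with ARI falling further than purity at low %trans.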