We constructed t-SNE plots from a pairwise SNP distance matrix for our global sample of 20,352 clinical isolates and 128,898 SNP sites (Materials and methods). In the first plot, isolates are colored by global lineage; in the remaining plots, isolates are colored by whether they carry specific mutations that were detected in-host (Supplementary file 8). These mutations occur in a global collection of isolates (Supplementary file 18) and are scattered across the t-SNE plots, indicating that they belong to genetically different clusters of isolates (Supplementary file 19) and have arisen independently in different genetic backgrounds. Each plot is labeled with the name of the gene in which the mutation occurs, the amino acid encoded by the reference allele, the H37Rv codon position, and the amino acid encoded by the mutant allele (for intergenic mutations: the reference allele, H37Rv genome coordinate, and mutant allele). N = number of isolates with the mutant allele.
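
For readers unfamiliar with the input to such plots, the following is a minimal sketch of how a pairwise SNP distance matrix could be computed. It assumes the distance is a simple count of differing SNP sites between two isolates (a Hamming distance) and that there are no missing calls; the definition actually used is given in Materials and methods and may differ. The function name `pairwise_snp_distance` and the toy genotypes are hypothetical and are not taken from the study.

```python
from itertools import combinations


def pairwise_snp_distance(genotypes):
    """Return a symmetric matrix of pairwise SNP distances.

    genotypes: list of equal-length strings, one string per isolate and
    one character per SNP site (assumption: no missing calls).
    The distance is the number of sites at which two isolates differ.
    """
    n = len(genotypes)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Count the SNP sites where isolate i and isolate j carry different alleles.
        d = sum(a != b for a, b in zip(genotypes[i], genotypes[j]))
        dist[i][j] = dist[j][i] = d
    return dist


# Toy example: three hypothetical isolates genotyped at four SNP sites.
isolates = ["ACGT", "ACGA", "TCGA"]
dist = pairwise_snp_distance(isolates)
print(dist)  # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```

A matrix of this form can be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`. At the scale of 20,352 isolates, a vectorized implementation would be needed in place of this pure-Python loop.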