eLife. 2020 Oct 21;9:e54895. doi: 10.7554/eLife.54895

Figure 4. Unsupervised learning segregates residues into clusters with distinct responses to mutation.

(A) Amino acids were segregated into classes based on their physicochemical properties, and mean activity scores were reported by class for each residue. Uniform Manifold Approximation and Projection (UMAP) was used to learn a 2D representation of every residue’s response to each mutation class across agonist conditions. Each residue was assigned to one of six clusters using HDBSCAN (see Figure 4—figure supplement 1). (B) Class averages for each of these clusters reveal distinct responses to mutation. The upper dashed line represents the mean activity of Cluster 6, and the lower solid line represents the mean activity of frameshift mutations. (C) A 2D snake plot representation of β2AR secondary structure with each residue colored by cluster identity.
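The feature construction in panel (A) and the per-cluster class averages in panel (B) can be sketched as follows. This is a minimal illustration, not the authors' code: the amino acid grouping in `AA_CLASS`, the input layout (a dict keyed by `(position, condition, mutant_aa)`), and the function names `class_profiles` and `cluster_class_means` are all hypothetical. Each residue's vector holds its mean score for every (agonist condition, amino acid class) pair, which is the kind of input UMAP would then embed; cluster averages skip HDBSCAN noise (label -1) and missing classes.

```python
import math
from collections import defaultdict
from statistics import mean

# ASSUMPTION: a hypothetical physicochemical grouping of the 20 amino acids;
# the exact classes used for Figure 4 are not listed in this caption.
AA_CLASS = {
    **dict.fromkeys("AVILM", "aliphatic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("CGP", "special"),
}
CLASSES = sorted(set(AA_CLASS.values()))
NOISE = -1  # HDBSCAN's label for points not assigned to any cluster


def _mean_or_nan(values):
    """Mean of a list, or NaN when the list is empty (no data for that class)."""
    return mean(values) if values else float("nan")


def class_profiles(scores):
    """Collapse per-mutation scores into one feature vector per residue.

    scores: {(position, condition, mutant_aa): activity score}
    Returns {position: [mean score for each (condition, class) pair]},
    ordered by sorted condition name, then by CLASSES.
    """
    grouped = defaultdict(list)
    for (pos, cond, aa), value in scores.items():
        grouped[(pos, cond, AA_CLASS[aa])].append(value)
    conditions = sorted({cond for _, cond, _ in grouped})
    positions = sorted({pos for pos, _, _ in grouped})
    return {
        pos: [_mean_or_nan(grouped.get((pos, cond, cls), []))
              for cond in conditions for cls in CLASSES]
        for pos in positions
    }


def cluster_class_means(profiles, labels):
    """Average residue profiles within each cluster, skipping noise and NaNs.

    labels: {position: cluster id}, e.g. from HDBSCAN run on a UMAP embedding.
    """
    members = defaultdict(list)
    for pos, label in labels.items():
        if label != NOISE:
            members[label].append(profiles[pos])
    return {
        label: [_mean_or_nan([v for v in column if not math.isnan(v)])
                for column in zip(*vectors)]
        for label, vectors in members.items()
    }


# Toy data: two residues, one hypothetical agonist condition "iso".
scores = {
    (1, "iso", "A"): 0.25, (1, "iso", "V"): 0.75, (1, "iso", "D"): -1.0,
    (2, "iso", "A"): 1.0, (2, "iso", "D"): 0.5,
}
profiles = class_profiles(scores)
print(profiles[1])  # [0.5, nan, -1.0, nan, nan, nan]
summary = cluster_class_means(profiles, {1: 0, 2: 0})
print(summary[0][0], summary[0][2])  # 0.75 -0.25
```

Columns follow the alphabetical class order (aliphatic, aromatic, negative, polar, positive, special), so every residue's vector has the same layout regardless of which mutations were observed; NaN marks a class with no measured mutations at that position.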


Figure 4—figure supplement 1. Cluster assignment is robust across different UMAP embeddings.


Given the high dimensionality of the mutational responses, Uniform Manifold Approximation and Projection (UMAP) (McInnes and Healy, 2018) was used to learn lower-dimensional representations of all the mutational data across agonist conditions, summarized by amino acid class, before the output was clustered with HDBSCAN (minimum cluster size = 10) (Campello et al., 2013). To ensure that the clustering results are not biased by a particular UMAP embedding, a hyperparameter search was run over UMAP's embedding dimension and number of nearest neighbors. For ease of visualization, the HDBSCAN cluster assignments were plotted on a 2D UMAP embedding; points that HDBSCAN does not assign to a cluster are colored powder blue. Groups of residues reliably cluster together regardless of the UMAP embedding, and residues were assigned to one of six distinct clusters.
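The supplement judges robustness by inspecting HDBSCAN assignments across a sweep of UMAP embeddings. One hypothetical way to summarize that numerically, not the procedure described here, is a co-association (consensus) count: for each pair of residues, the fraction of runs in which both land in the same non-noise cluster. The sketch below assumes each run's labels are already available as a list in a fixed residue order; the function names `co_assignment` and `stable_pairs` and the 0.9 threshold are illustrative choices. The umap-learn and hdbscan calls that would generate the runs appear only in a comment so the example runs without those packages.

```python
from itertools import combinations

NOISE = -1  # HDBSCAN's label for residues it leaves unclustered

# Each run would come from one point of the hyperparameter sweep, e.g. (not run here):
#   emb = umap.UMAP(n_neighbors=k, n_components=d).fit_transform(X)
#   labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(emb)


def co_assignment(runs):
    """Fraction of runs in which each residue pair shares a non-noise cluster.

    runs: list of label sequences, one per UMAP embedding, all in the same residue order.
    Cluster ids need not match between runs: only "same cluster or not" is compared.
    """
    n = len(runs[0])
    if any(len(labels) != n for labels in runs):
        raise ValueError("every run must label the same residues")
    return {
        (i, j): sum(labels[i] != NOISE and labels[i] == labels[j] for labels in runs) / len(runs)
        for i, j in combinations(range(n), 2)
    }


def stable_pairs(freq, threshold=0.9):
    """Residue pairs that co-cluster in at least `threshold` of the runs."""
    return sorted(pair for pair, f in freq.items() if f >= threshold)


# Toy labels for five residues from three hypothetical embeddings.
runs = [
    [0, 0, 1, 1, NOISE],
    [2, 2, 0, 0, 0],
    [1, 1, 1, 0, NOISE],
]
freq = co_assignment(runs)
print(freq[(0, 1)], freq[(2, 3)])  # 1.0 0.6666666666666666
print(stable_pairs(freq))          # [(0, 1)]
```

Comparing pairs rather than raw labels sidesteps the fact that HDBSCAN numbers clusters arbitrarily in each run, and a residue marked as noise never counts as co-clustered with anything.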