Abstract
Establishing the subcellular distribution of all proteins encoded by the human genome remains a key objective of life science research. This is particularly important for proteins that genetic sequencing of patients has identified as carrying missense mutations. A recent publication in Cell by Lacoste and colleagues1 highlights the prominence of protein mislocalization as a hallmark of dysfunctional proteins. Their use of high-content subcellular phenotypic screens and allied technology has enormous potential to change how we approach both diagnostic and therapeutic decisions.
Main text
The genome serves as the primary repository of information in a cell, ultimately dictating its functionality and fate. The sequencing of various genomes at the start of this century was without doubt a landmark achievement in the life sciences; however, arguably the greater challenge, still ongoing, is to assign subcellular localization and ultimately function to the proteins they encode. Understanding the spatial and temporal distribution of the proteome is important not only for understanding the function of any one protein in isolation but also for revealing protein-protein interaction (PPI) networks and how the cell functions as a system. Over the years, high-throughput approaches have used either the overexpression of open reading frames (ORFs) fused to fluorescent proteins or the large-scale production of antibodies to systematically ascertain the subcellular localization of proteins in the cell.2 For the most part, such studies have focused on the wild-type protein encoded by each gene, effectively creating a reference subcellular distribution map of the human proteome.3
Continued sequencing efforts are increasingly providing information on gene variants of clinical relevance. The most noteworthy repository for this is the ClinVar database,4 which at the time of writing this preview contains more than four million records and lists over 17,000 genes carrying specific variants. This raises the tantalizing question of the subcellular status of the encoded proteins and, more specifically, whether aberrant localization contributes to human disease. In a recent article published in Cell, Lacoste and colleagues make a significant advance toward addressing this question by carrying out a high-throughput assessment of the subcellular localization of large numbers of these variants.1 They assembled a collection of ORFs, predominantly from the human mutation ORFeome v1.1,5 comprising 3,448 variants of 1,269 unique genes. These ORFs were epitope-tagged, transfected into HeLa Kyoto cells, and studied using confocal high-content screening (HCS) microscopy (Figure 1).
Figure 1.
A high-content microscopy screen reveals features and characteristics of mislocalized missense protein variants
Lacoste and colleagues assemble a collection of 3,448 missense variants of 1,269 genes from various ORFeome resources. Subcellular phenotypic characterization of the mislocalized variants reveals strong links between mislocalization, pleiotropy, and disease severity. Mislocalized variants, which make up 16% of all pathogenic variants represented in the study, arise primarily from protein instability and misfolding. Figure created with BioRender.com.
One important challenge in HCS microscopy is accurate annotation of the localizations observed. This is particularly important given the natural variation in organelle morphology from one cell to the next: how do we define the specific structure and architecture of subcellular compartments? To address this, localizations were assessed manually by two independent researchers and annotated automatically using a CellProfiler analysis pipeline. Interestingly, the automated system identified several proteins with subtle changes in localization that were not detected by manual observation, an important demonstration of how software automation can complement the scientist in the assessment of microscopy images. Analysis revealed that over half of the reference proteins localized to multiple locations, and this was particularly prominent for proteins associated with membrane compartments. This same cohort of proteins showed a higher frequency of mislocalization, with 59% of these mislocalizations occurring in compartments of the secretory pathway. Overall, the study identified 250 aberrantly localized variants from 152 distinct genes, providing valuable information on the extent to which these gene mutations can affect events at the cellular scale.
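As a rough illustration of how automated annotation can catch subtle shifts that are easy to miss by eye, the Python sketch below compares a variant with its wild-type reference using per-cell intensity fractions across a few compartments. It is not the authors' CellProfiler pipeline: the compartment list, the feature definition (fraction of tagged-protein signal per compartment, assumed to come from an upstream segmentation step), and the function names are hypothetical, and the statistic is simply a permutation test on the distance between mean localization profiles.

```python
import numpy as np

# Hypothetical compartment set; each cell is summarized by the fraction of
# tagged-protein intensity falling in each compartment (rows sum to 1).
COMPARTMENTS = ["nucleus", "cytoplasm", "ER", "Golgi", "plasma_membrane"]


def profile_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between the mean localization profiles of two cell sets."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def localization_shift(wt: np.ndarray, var: np.ndarray,
                       n_perm: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Permutation test for a shift in mean localization profile.

    wt, var: arrays of shape (n_cells, n_compartments).
    Returns (observed distance, one-sided permutation p-value).
    """
    rng = np.random.default_rng(seed)
    observed = profile_distance(wt, var)
    pooled = np.vstack([wt, var])
    n_wt = len(wt)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the wild-type/variant labels and recompute the distance.
        idx = rng.permutation(len(pooled))
        if profile_distance(pooled[idx[:n_wt]], pooled[idx[n_wt:]]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated cells: wild type mostly cytoplasmic, variant shifted to the nucleus.
    wt_cells = rng.dirichlet([1, 8, 2, 1, 1], size=200)
    var_cells = rng.dirichlet([8, 1, 2, 1, 1], size=200)
    dist, p = localization_shift(wt_cells, var_cells)
    print(f"distance={dist:.3f}, p={p:.4f}")
```

Comparing mean profiles keeps the sketch short; a real pipeline would use many more morphological and texture features and would need to account for cell-to-cell variation in organelle shape.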
The nature and cause of the mislocalizations are also explored in this work. Perhaps unsurprisingly, a number of the variants were found in intracellular aggregates and foci. As well as showing self-aggregation, many of these variants carried mutations contributing to the dysregulation of biomolecular condensate formation. In addition, mislocalized variants were substantially and significantly enriched in mutations that perturb protein folding. Supporting this observation, Lacoste and colleagues show that mislocalized proteins had elevated levels of interaction with the Hsp70 family of chaperones and co-chaperones, which are known to act in the initial stages of a protein's life cycle.
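The statement that mislocalized variants are significantly enriched in folding-perturbing mutations is, at heart, a 2×2 contingency comparison. The Python sketch below shows one standard way to test such an enrichment, a one-sided Fisher's exact test computed directly from the hypergeometric distribution. This preview does not state which test the authors used, and the counts in the usage line are toy numbers, not data from the study; `scipy.stats.fisher_exact(table, alternative="greater")` performs the same test if SciPy is available.

```python
from math import comb


def fisher_enrichment(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    Rows: mislocalized / not mislocalized variants.
    Columns: folding-perturbing / not folding-perturbing mutations.
      a = mislocalized and perturbing    b = mislocalized, not perturbing
      c = not mislocalized, perturbing   d = neither
    Returns (sample odds ratio, p-value for "a is larger than expected").
    """
    n = a + b            # mislocalized variants (number of draws)
    k = a + c            # folding-perturbing variants (successes in population)
    total = a + b + c + d
    # Hypergeometric upper tail: probability of seeing a or more perturbing
    # mutations among the mislocalized variants if the two labels were independent.
    tail = sum(comb(k, x) * comb(total - k, n - x) for x in range(a, min(n, k) + 1))
    p_value = tail / comb(total, n)
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, p_value


# Toy counts for illustration only (not from the study).
print(fisher_enrichment(3, 1, 1, 3))
```

The exact test is a natural default for small cell counts; with thousands of variants a chi-squared test would give a similar answer, but the exact form avoids large-sample approximations.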
The observation that mislocalized proteins show disturbed folding makes a strong case for the use of pharmacological chaperones. Pharmacological chaperones, also known as pharmacoperones, are small molecules that bind to misfolded proteins and improve their stability.6 Perhaps the best-known example of pharmacoperones as an effective treatment strategy is VX-809 (lumacaftor), approved by the FDA in combination with ivacaftor for the treatment of cystic fibrosis patients carrying the ΔF508 CFTR mutation. The ΔF508 mutation causes the cystic fibrosis transmembrane conductance regulator (CFTR) to be retained in the endoplasmic reticulum (ER) rather than trafficking to the plasma membrane, where it functions. Lumacaftor binds to a region within the first transmembrane domain (TMD) of mutant CFTR, improving its stability and allowing it to traffic to the plasma membrane.7 Compounds such as lumacaftor have been identified predominantly through high-throughput screens assessing CFTR activity, but the methods presented by Lacoste and colleagues highlight the arguably under-explored potential of HCS microscopy to identify compounds that may correct the mislocalization of clinically relevant protein variants.
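In an imaging-based corrector screen of the kind suggested here, each compound could be scored by the fraction of cells in which the variant reaches its correct compartment (for example, the plasma membrane), with that score normalized between untreated mutant cells (0%) and wild-type cells (100%). The Python sketch below shows this percent-rescue normalization; the readout, the choice of controls, the 50% hit cutoff, and the compound names are all hypothetical assumptions rather than anything described in the study.

```python
from statistics import mean


def percent_rescue(scores: dict[str, float], negative_controls: list[float],
                   positive_controls: list[float]) -> dict[str, float]:
    """Scale each compound's readout so 0% = mutant + vehicle and 100% = wild type.

    scores: fraction of cells with correctly localized variant, per compound.
    """
    neg, pos = mean(negative_controls), mean(positive_controls)
    if pos == neg:
        raise ValueError("positive and negative controls do not separate")
    return {name: 100.0 * (value - neg) / (pos - neg) for name, value in scores.items()}


def call_hits(rescue: dict[str, float], min_rescue: float = 50.0) -> list[str]:
    """Compounds whose rescue reaches the (hypothetical) hit threshold."""
    return sorted(name for name, r in rescue.items() if r >= min_rescue)


# Hypothetical plate: mutant + vehicle wells (~10% correct localization)
# and wild-type wells (~90%).
rescue = percent_rescue({"cmpd_1": 0.12, "cmpd_2": 0.55, "cmpd_3": 0.80},
                        negative_controls=[0.09, 0.11],
                        positive_controls=[0.88, 0.92])
print(call_hits(rescue))
```

Normalizing to on-plate controls makes scores comparable across plates; a production screen would also track plate quality and replicate agreement.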
This work also explored more widely the molecular basis for the mislocalizations observed. Aberrant post-translational modifications were not found to be a significant driver, nor was disruption of PPIs a major contributing factor; the latter point was further supported by comparisons with publicly available yeast two-hybrid datasets. However, a strong link was found between mislocalization and mutations that interfered with the insertion of TMDs into membranes. Indeed, 20% of the mislocalized variants contained mutations in their TMDs. This fits well with the observation that secretory-pathway organelles are the most common sites of mislocalization. Mutations in TMDs not only affect initial insertion into the membrane but can also have a significant impact on how well a protein is retained in its preferred compartment. Membranes along the secretory pathway are known to differ in lipid composition, so the specific amino acid sequence of a protein's TMDs is crucial for its ultimate correct localization.8 The secretory pathway feeds a wide range of downstream organelles, with cargo trafficking through it to reach destinations such as endosomes, lysosomes, the plasma membrane, and the extracellular matrix. Dysfunction in this pathway underlies a large range of developmental diseases, and the insights provided by this work highlight the sensitivity of this system as an axis on which so many downstream processes depend.
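One classical, admittedly crude, way to see how a single missense change can weaken a TMD is Kyte-Doolittle hydropathy analysis: residue hydropathy is averaged over a sliding window (commonly 19 residues), and a window mean above about 1.6 is taken to suggest a membrane-spanning segment. The Python sketch below applies this heuristic to a hypothetical toy sequence. It is not the method used by Lacoste and colleagues, and modern TMD predictors, like the organelle-specific TMD properties discussed above, capture far more than a single hydropathy threshold.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full-length window along seq."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def predicted_tmd_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i for i, h in enumerate(window_hydropathy(seq, window)) if h > threshold]


def substitute(seq: str, position: int, new_aa: str) -> str:
    """Apply a single missense change at a 1-based position."""
    i = position - 1
    return seq[:i] + new_aa + seq[i + 1:]


# Hypothetical toy protein: a marginally hydrophobic 19-residue core
# flanked by charged residues. Not a real sequence.
wild_type = "DEKR" + "LAAGLAAGALAGLAAGLAL" + "RKED"
mutant = substitute(wild_type, 14, "R")  # leucine -> arginine inside the core

for name, seq in [("wild type", wild_type), ("L14R", mutant)]:
    peak = max(window_hydropathy(seq))
    print(f"{name}: peak hydropathy {peak:.2f}, TMD windows {predicted_tmd_windows(seq)}")
```

In this toy case, one leucine-to-arginine change lowers the peak window mean from about 1.97 to about 1.53, pushing a marginal segment below the threshold; real TMDs that are more strongly hydrophobic may tolerate such a change by this metric while still being mistargeted in cells.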
The power of the approach described by Lacoste and colleagues is that understanding the cellular context of mislocalization and its link to disease can pave the way for the design of appropriate therapeutic interventions. To do this, it is also necessary to determine the specific consequences of a protein being in the wrong location. Of the examples presented in this work, arguably the most striking is that of the cytoskeletal protein β-actin. Two variants, R183W and E364K, have both been recorded in patients and give rise to different pathogenic effects in affected individuals. Previous in vitro analysis of the mutant proteins was unable to reveal any biochemical differences between them. However, high-content imaging of these variants revealed remarkable differences in their subcellular distribution. The wild-type protein localized as expected to short actin fibers throughout the cytoplasm, and the E364K variant was poorly expressed and showed a highly diffuse pattern, whereas the R183W variant was found in short filaments crossing the nucleus as well as in large cytoplasmic puncta. The authors then used the BioID proximity-dependent biotinylation approach9 to identify the interactors of each variant and compare them with those of the wild-type protein. The E364K variant showed increased interaction with chaperones, suggesting that it is defective in folding, whereas the interaction network of the R183W variant favored actin-bundling and cross-linking proteins, suggesting that this variant disrupts normal actin-filament turnover. This example highlights the importance of ascertaining the subcellular distribution of all recorded variants and shows that clearly distinct therapeutic strategies will be required to address the different pathogenic phenotypes.
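To make the BioID comparison concrete, the Python sketch below contrasts the proximity interactomes of a variant and its wild-type counterpart using a simple normalized log2 fold change per prey, then tallies enriched preys by functional category (for instance, chaperones versus actin-binding proteins). This is a simplified stand-in, not the scoring used in the study; real BioID analyses typically rely on control baits, replicates, and dedicated statistical scoring. The prey names, counts, and category labels are hypothetical.

```python
import math
from collections import Counter


def log2_fold_changes(variant: dict[str, int], wild_type: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Log2 ratio of each prey's share of total counts, variant over wild type.

    A pseudocount is added to every prey so that preys detected with only one
    bait still get a finite ratio.
    """
    preys = sorted(set(variant) | set(wild_type))
    v = {p: variant.get(p, 0) + pseudocount for p in preys}
    w = {p: wild_type.get(p, 0) + pseudocount for p in preys}
    v_total, w_total = sum(v.values()), sum(w.values())
    return {p: math.log2((v[p] / v_total) / (w[p] / w_total)) for p in preys}


def enriched_categories(lfc: dict[str, float], annotation: dict[str, str],
                        min_lfc: float = 1.0) -> dict[str, int]:
    """Count preys gaining at least min_lfc (log2 units) in the variant, by category."""
    return dict(Counter(annotation.get(p, "unannotated")
                        for p, value in lfc.items() if value >= min_lfc))


# Hypothetical spectral counts and annotations, for illustration only.
wt_counts = {"preyA": 5, "preyB": 20, "preyC": 20}
variant_counts = {"preyA": 40, "preyB": 20, "preyC": 5}
categories = {"preyA": "chaperone", "preyB": "actin-binding"}

lfc = log2_fold_changes(variant_counts, wt_counts)
print(enriched_categories(lfc, categories))
```

Normalizing each prey to the total counts for its bait compensates for differences in expression level, which matters here because the E364K variant was poorly expressed.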
In summary, the work presented by Lacoste and colleagues is a tour de force demonstrating the power of automated HCS microscopy for understanding the molecular basis of human disease, and it meaningfully exploits the immense efforts made twenty years ago to generate the first human ORFeome.5 Not only does the scale of the work, in terms of the number of variants characterized, provide an excellent resource for others to build on, it also illuminates a wider strategy for the systematic characterization of proteins in their cellular context. It is exciting to consider how this strategy could be expanded, for example, to the detection of mislocalized proteins in primary tissue biopsies from patients. This will depend on the development of sensitive antibodies capable of recognizing clinically relevant variant proteins. Fundamental studies such as this one underpin our understanding of clinical pathologies at their most basic molecular level and act as important and irreplaceable scaffolds on which we can design effective, targeted treatment strategies with minimal off-target side effects.
Declaration of interests
The authors declare no competing interests.
References
1. Lacoste J., Haghighi M., Haider S., Reno C., Lin Z.Y., Segal D., Qian W.W., Xiong X., Teelucksingh T., Miglietta E., et al. Pervasive mislocalization of pathogenic coding variants underlying human disorders. Cell. 2024. doi: 10.1016/j.cell.2024.09.003.
2. Stadler C., Rexhepaj E., Singan V.R., Murphy R.F., Pepperkok R., Uhlén M., Simpson J.C., Lundberg E. Immunofluorescence and fluorescent-protein tagging show high correlation for protein localization in mammalian cells. Nat. Methods. 2013;10:315–323. doi: 10.1038/nmeth.2377.
3. Thul P.J., Åkesson L., Wiking M., Mahdessian D., Geladaki A., Ait Blal H., Alm T., Asplund A., Björk L., Breckels L.M., et al. A subcellular map of the human proteome. Science. 2017;356:eaal3321. doi: 10.1126/science.aal3321.
4. Landrum M.J., Lee J.M., Benson M., Brown G., Chao C., Chitipiralla S., Gu B., Hart J., Hoffman D., Hoover J., et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44:D862–D868. doi: 10.1093/nar/gkv1222.
5. Rual J.F., Hirozane-Kishikawa T., Hao T., Bertin N., Li S., Dricot A., Li N., Rosenberg J., Lamesch P., Vidalain P.O., et al. Human ORFeome version 1.1: a platform for reverse proteomics. Genome Res. 2004;14:2128–2135. doi: 10.1101/gr.2973604.
6. Tao Y.X., Conn P.M. Pharmacoperones as novel therapeutics for diverse protein conformational diseases. Physiol. Rev. 2018;98:697–725. doi: 10.1152/physrev.00029.2016.
7. Fiedorczuk K., Chen J. Mechanism of CFTR correction by type I folding correctors. Cell. 2022;185:158–168.e11. doi: 10.1016/j.cell.2021.12.009.
8. Sharpe H.J., Stevens T.J., Munro S. A comprehensive comparison of transmembrane domains reveals organelle-specific properties. Cell. 2010;142:158–169. doi: 10.1016/j.cell.2010.05.037.
9. Roux K.J., Kim D.I., Raida M., Burke B. A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells. J. Cell Biol. 2012;196:801–810. doi: 10.1083/jcb.201112098.