An ambitious initiative is proposed to generate a definitive reference set of human proteoforms.
Abstract
Proteins are the primary effectors of function in biology, and thus, complete knowledge of their structure and properties is fundamental to deciphering function in basic and translational research. The chemical diversity of proteins is expressed in their many proteoforms, which result from combinations of genetic polymorphisms, RNA splice variants, and posttranslational modifications. This knowledge is foundational for the biological complexes and networks that control biology yet remains largely unknown. We propose here an ambitious initiative to define the human proteome, that is, to generate a definitive reference set of the proteoforms produced from the genome. Several examples of the power and importance of proteoform-level knowledge in disease-based research are presented along with a call for improved technologies in a two-pronged strategy for the Human Proteoform Project.
The Human Genome Project (HGP) was a remarkable and unqualified success, profoundly transforming and accelerating biological and medical research while converting a ~$4B public investment into over $700B of economic activity and new industries (1). The challenge of revealing the “Blueprints of Life,” however, is surpassed by the challenge we face today: deriving from these blueprints an understanding of the structures they dictate and how these function within biological systems.
Proteins are primary effectors of function in biology, and thus, complete knowledge of their structure and behavior is fundamental to deciphering function in basic and translational research (2). The richness of protein structure and function goes far beyond the linear amino acid sequence dictated by the genetic code. Genetic variation, alternative splicing, and posttranslational modification (PTM) work together to create a rich variety of different proteoforms arising from our genes (Fig. 1) (3). The chemical diversity of proteins is foundational for the biological complexes and networks that control biology yet remains largely unknown. Genome sequence alone does not provide the needed information—only direct analysis of the proteoforms themselves can reveal their composition, enabling studies of their spatial distributions and temporal dynamics in biological systems. We propose here an ambitious initiative to define the human proteome, that is, to generate a definitive set of reference proteoforms produced from the genome (see Box 1).
Box 1. What is a proteome?
A standard answer to this question is that a proteome is the set of proteins expressed by an organism. This idea clearly depends on what is meant by a “protein.” Proteins from even a single gene can vary widely in their amino acid sequence and PTMs giving rise to a variety of proteoforms. Thus, the proteome is necessarily the set of all proteoforms expressed by an organism. The initiative proposed here is founded upon this simple idea.
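To make the combinatorial origin of proteoforms concrete, the following minimal Python sketch enumerates the theoretical proteoform space of a single gene. The sequence variants, splice isoforms, and PTM sites are invented for illustration, and the sketch assumes that every site can be modified independently on every isoform. The resulting count is therefore only an upper bound; which combinations are actually expressed can be established only by direct measurement.

```python
from itertools import product

# Hypothetical inputs for one illustrative gene (not real data).
sequence_variants = ["reference", "variant_A"]             # genetic polymorphisms
splice_isoforms = ["isoform_1", "isoform_2"]               # RNA splice variants
ptm_sites = ["S10_phospho", "K27_acetyl", "T45_phospho"]   # candidate PTM sites


def enumerate_proteoforms(variants, isoforms, sites):
    """Yield every combination of variant, isoform, and PTM site occupancy.

    Assumes each site is independently modified or unmodified, which gives
    a theoretical upper bound rather than an observed proteoform family.
    """
    for variant, isoform in product(variants, isoforms):
        for occupancy in product([False, True], repeat=len(sites)):
            modified = tuple(s for s, on in zip(sites, occupancy) if on)
            yield (variant, isoform, modified)


proteoforms = list(enumerate_proteoforms(sequence_variants, splice_isoforms, ptm_sites))

# Closed-form count: variants x isoforms x 2^(number of binary PTM sites).
upper_bound = len(sequence_variants) * len(splice_isoforms) * 2 ** len(ptm_sites)
print(len(proteoforms), upper_bound)
```

Even this toy case, with two variants, two isoforms, and three binary sites, yields 32 candidate proteoforms from one gene.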
PROTEOFORM-LEVEL KNOWLEDGE IS ESSENTIAL TO UNDERSTAND BIOLOGICAL FUNCTION
Proteins are the central intermediaries between genotype and phenotype (2–4). It is not possible to understand the functioning of a biological system if one does not know what protein molecules are present, as well as the nature and abundances of their proteoforms. Knowledge of where the proteoforms are located within cells or tissues, what other proteoforms they interact with to form the multifunctional complexes that carry out critical functions in cell biology, and how they change in response to stimuli is essential. Innovative new tools are needed to comprehensively define the proteome, allowing proteoform abundances, interactors, and locations to be assessed with far greater depth at lower cost. The foundational premise of the HGP, that knowledge of the genome sequence will provide a fundamental understanding of biological systems, will not be realized in the absence of detailed proteoform-level information. This was clearly articulated by Collins et al. (2), “A critical step toward gaining a complete understanding... will be to take an accurate census of the proteins present in particular cell types. It will be a major challenge to catalog proteins present in low abundance or in membranes. Determining the absolute abundance of each protein, including all modified forms, will be an important next step.”
The Human Proteoform Project we present here is the critical next step in the quest to understand human health and disease. Several examples from five important disease areas illustrate the critical role of proteoforms in disease and health (Fig. 2). These examples show how disease-driven research has been advanced by discovery of proteoforms and their PTMs.
CENTRAL GOALS AND STRATEGY OF THE PROJECT
The primary objective of this project is to elucidate a complete set of expressed proteoforms derived from the ~20,000 genes encoded in the human genome. We put forward a two-pronged strategy: On the one hand, we pursue deep proteoform-level analysis in medically relevant systems (Fig. 2); this will continue to open up fundamental insights into targets and use cases of high biomedical importance. In parallel, we invest heavily in the accelerated development of proteoform discovery and characterization technologies and deploy them for large-scale proteoform analysis of specimens from nominally healthy donors.
The project is modeled roughly after the successful roadmap provided by the HGP, which generated the human reference genome sequence while advancing technology in the process (2, 3, 5). An international effort on the scale of the HGP in both funding and time will reveal the full chemical complexity of our proteins, drive the frontiers of research and medicine well beyond what is currently possible, and be critical in the assignment of function to proteins and their PTMs in the decades ahead.
THE HUMAN PROTEOFORM PROJECT
We propose the Human Proteoform Project, a program to aggressively develop new technologies for comprehensive proteoform analysis and to assemble an extensive, high-quality atlas of human proteoforms. We envision next-generation proteomics in humans to be based on ~20,000 proteoform families (6), one for each gene in the genome. Deep catalogs of proteoforms compiled for widely characterized mammalian cell lines and primarily human samples will markedly accelerate our understanding and exploitation of proteins. This more profound knowledge of the central molecules of biology will provide an essential cornerstone for 21st century biology. New technologies will be central to this effort, as today’s ability to comprehensively identify proteoforms in complex systems is limited.
ASSEMBLING THE HUMAN PROTEOFORM ATLAS
Proteoform expression varies across cells and tissues, and studies of proteoform expression can be either global or targeted. The expression of rare proteoforms is stochastic in nature. The Human Proteoform Project will thus necessarily focus on capturing the identities of the dominant functional proteoform population rather than rare occurrences. We propose the bifurcated approach shown in Fig. 3. In global studies, all proteoforms present at detectable levels are characterized; in targeted studies, specific proteoform families arising from each human gene will be enriched and subjected to systematic proteoform discovery to reveal the molecular diversity present. The two paths are described below.
Cell-based approach to proteoform discovery
An important thrust of the project is the delineation of proteoform expression patterns in human cell types (Fig. 3, bottom) (7). Defining the number and nature of human cell types is an ambitious undertaking in its own right and is currently being pursued by several consortia (see below). Anchoring proteoform analysis with cell types provides a generalized strategy to access human biology across the natural context present within our tissues. The depth of proteoform analysis obtained depends on the detection sensitivity of the technology used: While today’s mass spectrometric platforms are pushing toward detection limits of ~25 copies per cell (7), aggressive technology investment is needed to further develop these platforms and to develop new approaches and paradigms (see the section below). A cell-based approach can begin using many thousands of cells of a given type and adopt single-cell proteoform technologies as they become available.
Gene-based approach for targeted proteoform discovery
The development of affinity reagents to capture the proteins encoded by each human gene will be invaluable to enrich and then characterize their proteoform families in a selection of human specimens. The fundamental role of proteoform-level knowledge in understanding human disease and health (Fig. 2) is evident from consideration of the most highly cited human genes in the biomedical literature (Fig. 4). Tumor necrosis factor, at the top of the list, has >200,000 citations; this high citation number can be considered a reasonable proxy for the research funding that has gone into its study over decades. Notably, even the most-studied genes have unknown proteoforms that are essential to understanding their biological and disease-related functions. The economies of scale afforded by a concerted project to obtain comprehensive proteoform-level knowledge will make possible the acquisition of such information for the 20,000 proteoform families derived from the human genome.
NEW TECHNOLOGIES
At present, the dominant “bottom-up” paradigm of mass spectrometry (MS)–based proteomics sacrifices information about proteoforms by cleaving proteins into peptides; this is done for a pragmatic reason: it works, as the resultant peptides are generally much easier to identify than their parent intact proteoforms (8, 9). Top-down proteomics, in contrast, analyzes the entire intact proteoform and is the most powerful proteoform-level analysis technology in existence, providing knowledge regarding RNA isoform translation and combinatorial PTMs, but is limited in depth and throughput (4, 6). The flagship efforts of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) have brought targeted proteomics and proteogenomics into regular use and produced major studies on ovarian (10), breast (11), and colorectal cancer (12). Using the bottom-up approach to proteomics, CPTAC noted recently that “the aggregated NCI-60 proteomics dataset covers only 12% of the whole encoded proteome, and only ~5% of the genes had sequence coverage of >50% of their protein coding regions” (13). Regarding alternative splicing, “there is yet a major gap between the number of alternative transcripts asserted by RNA sequencing and that detectable by proteomics (e.g., <0.1% of putative novel splice junctions in cancer xenografts)” (13). This state of affairs underscores the critical need to advance the state of the art in proteomic analysis (14–17) via new technologies and extensive proteoform-level characterization of biological systems.
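The notion of sequence coverage used in the figures quoted above can be illustrated with a minimal Python sketch. It computes the fraction of residues in a protein sequence that are spanned by at least one identified peptide. The 20-residue sequence and the two peptides are invented for illustration; real pipelines must also handle modified residues, ambiguous matches, and peptides shared between proteins, none of which is modeled here.

```python
def sequence_coverage(protein, peptides):
    """Return the fraction of residues in `protein` covered by any peptide.

    Each peptide is matched as an exact substring; every occurrence counts,
    and overlapping peptides are not double counted.
    """
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Toy protein and "identified" peptides (hypothetical, not real data).
toy_protein = "MKTAYIAKQRQISFVKSHFS"
identified_peptides = ["TAYIAK", "QISFVK"]

coverage = sequence_coverage(toy_protein, identified_peptides)
print(f"sequence coverage: {coverage:.0%}")
```

Note that even full sequence coverage would not reveal which modifications co-occur on the same molecule, because the peptides are detached from their parent proteoform before analysis.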
To achieve the objectives outlined above, it is critical to expand our technological abilities through a concerted long-term and multifaceted research and development effort. This effort should pursue both the continued development of MS-based technologies for proteoform analysis, as well as the exploration of potential paradigm-shifting new ideas and approaches that offer the possibility of transformative change. The development of increasingly powerful and effective nucleic acid sequencing has demonstrated the importance of investing heavily in ambitious new efforts to drive technology development. Similarly, single-molecule MS (18–20), nanopore sequencing (21, 22), cryoelectron microscopy and visual proteomics (23, 24), single-cell proteomics (25–29), single-molecule protein arrays (30, 31), and other ideas yet to be conceived need to be encouraged, supported, and developed to advance proteoform biology.
The outstanding success of the technology development program in the HGP and the associated private sector engagement provide an inspiring model for how this can be done well. Just as the $1 per base estimate for the HGP provided an important target to spur technology competition and development, so will a $1 per proteoform goal for the Human Proteoform Project as proposed previously (7). Although the details of its implementation plan will be developed with key stakeholders, at this time, the main parameters and their estimates help frame the project. For the cell-based prong (Fig. 3), we can anticipate that the output of the Human Cell Atlas, Human Biomolecular Atlas Program (HuBMAP), and other consortia will be a defined ontology and number of human cell types, allowing the proteome of each to be targeted. Assuming 5000 cell types and prescribing a depth of 1 million proteoforms in each, constructing the Human Proteoform Atlas would involve ~5 billion measurements of redundant proteoforms (32). Combined with the gene-based approach, perhaps ~50 million unique (nonredundant) proteoforms will be asserted with defined quality metrics over the course of the project.
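The estimates in the preceding paragraph follow from simple arithmetic, sketched below in Python. The cell-type count, depth, and unique-proteoform figures are those quoted in the text. The text does not state whether the $1 goal would apply per unique proteoform or per redundant measurement, so both readings are computed here purely as labeled assumptions, not as a project budget.

```python
# Planning figures quoted in the text.
CELL_TYPES = 5_000                     # assumed number of human cell types
PROTEOFORMS_PER_CELL_TYPE = 1_000_000  # prescribed depth per cell type
UNIQUE_PROTEOFORMS = 50_000_000        # estimated nonredundant proteoforms
COST_GOAL_USD = 1.0                    # the "$1 per proteoform" target


def atlas_estimates(cell_types, depth, unique, cost_goal_usd):
    """Return headline numbers implied by the planning figures."""
    total = cell_types * depth
    return {
        "total_measurements": total,
        "mean_observations_per_unique": total / unique,
        # Assumption A: the cost goal applies to each unique proteoform.
        "cost_if_goal_is_per_unique_usd": unique * cost_goal_usd,
        # Assumption B: the cost goal applies to each redundant measurement.
        "cost_if_goal_is_per_measurement_usd": total * cost_goal_usd,
    }


estimates = atlas_estimates(
    CELL_TYPES, PROTEOFORMS_PER_CELL_TYPE, UNIQUE_PROTEOFORMS, COST_GOAL_USD
)
for name, value in estimates.items():
    print(f"{name}: {value:,.0f}")
```

With these inputs, the total is 5 billion measurements, which implies that each unique proteoform would be observed about 100 times on average across cell types.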
THE PIVOT FROM PROTEOFORM DISCOVERY TO PROTEOFORM SCORING
A central principle in comprehensive proteoform analysis is the distinction between discovery and scoring. The discovery phase requires comprehensive analysis of protein primary structure and therefore the generation of highly complex data. However, once we have in hand a comprehensive index of the proteoforms for the system under study, efforts can shift to a scoring mode informed by that prior knowledge. This transition from discovery to scoring is central to many fields: In genomics, for example, the initial discovery of single-nucleotide polymorphisms (SNPs) led to the generation of SNP databases and technologies for their scoring at scale. The scoring technology enabled cost-effective functional studies and disease-based research across human populations. Similarly, in MS, initial work to develop small molecule identification from gas-phase fragmentation patterns led to the establishment of rich databases of molecular fragmentation spectra, allowing the rapid identification of already known compounds. This venerable principle will be invaluable in driving increased throughput and decreased cost. We anticipate that disruptive technologies such as single-molecule proteoform sequencing and analysis will likewise benefit from a reference set of the human proteoforms actually present.
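The scoring mode can be sketched as a lookup against a previously established reference index. The minimal Python example below matches an observed intact mass against a sorted index within a parts-per-million tolerance. The index entries, gene labels, and the 10 ppm default are hypothetical, and matching on intact mass alone is a deliberate simplification: it cannot distinguish proteoforms of equal mass, such as positional isomers, which require fragmentation or other evidence.

```python
from bisect import bisect_left, bisect_right

# Hypothetical reference index: (mass in Da, proteoform label).
# Masses and labels are invented for illustration only.
REFERENCE_INDEX = sorted([
    (11_300.00, "GENE_A | unmodified"),
    (11_342.01, "GENE_A | +acetyl"),
    (11_379.97, "GENE_A | +phospho"),
    (15_200.50, "GENE_B | unmodified"),
])
REFERENCE_MASSES = [mass for mass, _ in REFERENCE_INDEX]


def score_mass(observed_mass, tolerance_ppm=10.0):
    """Return labels of reference proteoforms within tolerance of a mass.

    Uses binary search on the sorted index, so each lookup is cheap
    compared with de novo discovery.
    """
    window = observed_mass * tolerance_ppm / 1e6
    lo = bisect_left(REFERENCE_MASSES, observed_mass - window)
    hi = bisect_right(REFERENCE_MASSES, observed_mass + window)
    return [label for _, label in REFERENCE_INDEX[lo:hi]]


print(score_mass(11_342.05))   # falls within 10 ppm of one index entry
print(score_mass(12_000.00))   # no entry within tolerance
```

An observation with no match in the index would be routed back to discovery-mode analysis rather than scored.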
ENABLING NEW LEVELS OF BIOMEDICAL RESEARCH
With a new generation of precision measurement tools, studies of mutations, disease, infection, and drug treatment will all operate with the more detailed knowledge afforded by creation of a comprehensive proteoform index. This will further accelerate the goals of 21st century biomedicine such as regenerative biology, enhanced drug development, and better detection of human disease, all of which involve proteins. Beyond improving the use of proteins as biomarkers, the reference atlas of proteoforms will enable the study of their spatial and temporal distributions within cells and tissues, information presently impossible to obtain. This will often involve protein affinity capture reagents enabling readouts using a wide array of technologies (Fig. 5). Scoring technologies for single-molecule and single-cell biology will be propelled by having proteoform answers in the “back of the book” as we develop and optimize them in the decade ahead.
SYNERGY OF THE HUMAN PROTEOFORM PROJECT WITH OTHER INITIATIVES
The Human Proteoform Project, by capturing all sources of protein variation for creation of a reference atlas of whole proteoforms, is fundamentally different from other proteomics initiatives. Prior initiatives such as those describing first drafts of the human proteome in 2014 (33, 34) and ongoing work under the aegis of the Human Protein Atlas and the Human Proteome Project (35) have accomplished a great deal over the past several years, and the Human Proteome Organization has called for the community to “systematically map all human proteoforms” (36). There has also been an industry-led call from several pharmaceutical companies underscoring the need for major improvements in proteoform measurement (37).
Clear synergies with initiatives focused on human cell typing and protein capture reagents are visible. The Human Protein Atlas with its existing set of >15,000 antibodies provides a valuable resource for targeted studies while also driving efforts to develop “open source” renewable affinity reagents of known sequence (38). These affinity reagents enable targeted enrichment of proteoform families deriving from each human gene (Fig. 3, top). Once the members of proteoform families are known, creation of a next generation of proteoform-directed affinity reagents will be possible (Fig. 5) (39). An important thrust of the Human Proteoform Project is the delineation of proteoform expression patterns across human tissues and cell types to be archived in the Human Proteoform Atlas. This effort will benefit greatly from the output of the now accelerating efforts in the HuBMAP (40), the Human Cell Atlas (41), and several affiliated consortia. These groups are actively defining all human cell types in an organized and interoperable ontology. This includes generating markers of cell types that will facilitate their sampling for cell-based proteomics to determine the proteoforms present.
ROLES OF GOVERNMENT, FOUNDATIONS, AND THE PRIVATE SECTOR
For the necessary transformation of technology and knowledge to take place over the coming decade, numerous stakeholders will need to engage and align with the project to bring it to fruition (42). Within the emergent proteomics ecosystem that we envisage, three categories of organizations can be identified: those focused on creating new knowledge (universities and research institutes), those creating new value for customers (instrument, biopharma, and diagnostics companies), and those providing financial and other resource support for the creation of knowledge or customer value (government agencies, philanthropies, nonprofit foundations, and well-established companies) (43). The role of the knowledge creators is paramount for a research-intensive area such as this, and the major universities and research institutes will generate the structural, large-scale data to drive this effort. This will require substantial funding; for comparison, genomics research worldwide was publicly funded at about $3B per year from 2003 to 2006, with the United States contributing about 35% of this (44).
The companies and institutions that commercialize the tools, technologies, and services to advance the field also play a pivotal role in this endeavor, often collaborating with academic researchers to bring new technologies to the marketplace. This cycle of innovation and commercialization was a fundamental enabler of the HGP. The biopharmaceutical and diagnostic companies invest heavily in research and development [for example, having spent $97 billion in R&D in the United States in 2017 (45)] and so are well poised to participate in these efforts. As noted above, generating the definitive proteoform set for the expressed human proteome presents a major economic opportunity for the private sector.
Bringing alignment and finding common goals for the various members of the emerging “proteoform ecosystem” is already underway with organizations starting to forge bridges across the boundaries. Increasing cooperation between public agencies, organizations, and international institutions will hasten the discovery and understanding of human proteoforms and provide marked growth in therapeutics, diagnostics, and the life sciences.
CONCLUSION AND OUTLOOK
The Human Proteoform Project will revolutionize our understanding of human health and disease. This ambitious project to develop and apply powerful new technologies to reveal the molecular complexity that underlies human biology will be transformative. While a full exploration into the nature of its many impacts is beyond the scope of this article, we provide in Fig. 6 an overview of some of the many areas in which it will open new vistas and enable revolutionary new technologies. We offer the roadmap outlined here to inspire its realization.
Acknowledgments
We acknowledge M. W. Mullowney for assistance in generating figures.
Funding: J.N.A. was supported by the ALS Association under award 508452. Y.G. was supported by the NIH under awards R01 HL096971, GM117058, and GM125085. N.L.K. was supported by the National Institute of General Medical Sciences (NIGMS) at NIH under award P41GM1085698. J.A.L. was supported by the NIH under award R01GM103479 and the Department of Energy under award DE-FC02-02ER63421. L.M.S. was supported by the NIGMS under award R35GM126914. J.C.-R. and Y.O.T. were supported by the European Horizon 2020 program under award 829157, and J.C.-R. was also supported by the Institut Pasteur, the CNRS, and EPIC-XS under award 823839.
Competing interests: The authors declare the following competing financial interest(s): Y.O.T. is an employee of Spectroswiss, which develops and commercializes FTMS data processing and data analysis software. N.L.K. is involved with commercialization of hardware and software for proteoform analysis. P.O.D. is the Founder of Eastwoods Consulting, providing business advisory services to life science companies.
Background on the Consortium for Top Down Proteomics (CTDP): The consortium was formed in 2012 and is a nonprofit 501(c)(3) organization. More information about the organization can be found at www.topdownproteomics.org/. The CTDP mission is to promote innovative research, collaboration, and education to accelerate the comprehensive analysis of intact proteoforms and their complexes in diverse biological systems.
REFERENCES AND NOTES
- 1. Drake N., What is the human genome worth? Nature (2011).
- 2. Collins F. S., Green E. D., Guttmacher A. E., Guyer M. S.; US National Human Genome Research Institute, A vision for the future of genomics research. Nature 422, 835–847 (2003).
- 3. Smith L. M., Kelleher N. L.; Consortium for Top Down Proteomics, Proteoform: A single term describing protein complexity. Nat. Methods 10, 186–187 (2013).
- 4. Aebersold R., Agar J. N., Amster I. J., Baker M. S., Bertozzi C. R., Boja E. S., Costello C. E., Cravatt B. F., Fenselau C., Garcia B. A., Ge Y., Gunawardena J., Hendrickson R. C., Hergenrother P. J., Huber C. G., Ivanov A. R., Jensen O. N., Jewett M. C., Kelleher N. L., Kiessling L. L., Krogan N. J., Larsen M. R., Loo J. A., Ogorzalek Loo R. R., Lundberg E., MacCoss M. J., Mallick P., Mootha V. K., Mrksich M., Muir T. W., Patrie S. M., Pesavento J. J., Pitteri S. J., Rodriguez H., Saghatelian A., Sandoval W., Schlüter H., Sechi S., Slavoff S. A., Smith L. M., Snyder M. P., Thomas P. M., Uhlén M., van Eyk J. E., Vidal M., Walt D. R., White F. M., Williams E. R., Wohlschlager T., Wysocki V. H., Yates N. A., Young N. L., Zhang B., How many human proteoforms are there? Nat. Chem. Biol. 14, 206–214 (2018).
- 5. Venter J. C., Adams M. D., Myers E. W., Li P. W., Mural R. J., Sutton G. G., Smith H. O., Yandell M., Evans C. A., Holt R. A., Gocayne J. D., Amanatides P., Ballew R. M., Huson D. H., Wortman J. R., Zhang Q., Kodira C. D., Zheng X. H., Chen L., Skupski M., Subramanian G., Thomas P. D., Zhang J., Gabor Miklos G. L., Nelson C., Broder S., Clark A. G., Nadeau J., McKusick V. A., Zinder N., Levine A. J., Roberts R. J., Simon M., Slayman C., Hunkapiller M., Bolanos R., Delcher A., Dew I., Fasulo D., Flanigan M., Florea L., Halpern A., Hannenhalli S., Kravitz S., Levy S., Mobarry C., Reinert K., Remington K., Abu-Threideh J., Beasley E., Biddick K., Bonazzi V., Brandon R., Cargill M., Chandramouliswaran I., Charlab R., Chaturvedi K., Deng Z., Francesco V. D., Dunn P., Eilbeck K., Evangelista C., Gabrielian A. E., Gan W., Ge W., Gong F., Gu Z., Guan P., Heiman T. J., Higgins M. E., Ji R. R., Ke Z., Ketchum K. A., Lai Z., Lei Y., Li Z., Li J., Liang Y., Lin X., Lu F., Merkulov G. V., Milshina N., Moore H. M., Naik A. K., Narayan V. A., Neelam B., Nusskern D., Rusch D. B., Salzberg S., Shao W., Shue B., Sun J., Wang Z. Y., Wang A., Wang X., Wang J., Wei M. H., Wides R., Xiao C., Yan C., Yao A., Ye J., Zhan M., Zhang W., Zhang H., Zhao Q., Zheng L., Zhong F., Zhong W., Zhu S. C., Zhao S., Gilbert D., Baumhueter S., Spier G., Carter C., Cravchik A., Woodage T., Ali F., An H., Awe A., Baldwin D., Baden H., Barnstead M., Barrow I., Beeson K., Busam D., Carver A., Center A., Cheng M. L., Curry L., Danaher S., Davenport L., Desilets R., Dietz S., Dodson K., Doup L., Ferriera S., Garg N., Gluecksmann A., Hart B., Haynes J., Haynes C., Heiner C., Hladun S., Hostin D., Houck J., Howland T., Ibegwam C., Johnson J., Kalush F., Kline L., Koduru S., Love A., Mann F., May D., McCawley S., McIntosh T., McMullen I., Moy M., Moy L., Murphy B., Nelson K., Pfannkoch C., Pratts E., Puri V., Qureshi H., Reardon M., Rodriguez R., Rogers Y. H., Romblad D., Ruhfel B., Scott R., Sitter C., Smallwood M., Stewart E., Strong R., Suh E., Thomas R., Tint N. N., Tse S., Vech C., Wang G., Wetter J., Williams S., Williams M., Windsor S., Winn-Deen E., Wolfe K., Zaveri J., Zaveri K., Abril J. F., Guigó R., Campbell M. J., Sjolander K. V., Karlak B., Kejariwal A., Mi H., Lazareva B., Hatton T., Narechania A., Diemer K., Muruganujan A., Guo N., Sato S., Bafna V., Istrail S., Lippert R., Schwartz R., Walenz B., Yooseph S., Allen D., Basu A., Baxendale J., Blick L., Caminha M., Carnes-Stine J., Caulk P., Chiang Y. H., Coyne M., Dahlke C., Mays A. D., Dombroski M., Donnelly M., Ely D., Esparham S., Fosler C., Gire H., Glanowski S., Glasser K., Glodek A., Gorokhov M., Graham K., Gropman B., Harris M., Heil J., Henderson S., Hoover J., Jennings D., Jordan C., Jordan J., Kasha J., Kagan L., Kraft C., Levitsky A., Lewis M., Liu X., Lopez J., Ma D., Majoros W., McDaniel J., Murphy S., Newman M., Nguyen T., Nguyen N., Nodell M., Pan S., Peck J., Peterson M., Rowe W., Sanders R., Scott J., Simpson M., Smith T., Sprague A., Stockwell T., Turner R., Venter E., Wang M., Wen M., Wu D., Wu M., Xia A., Zandieh A., Zhu X., The sequence of the human genome. Science 291, 1304–1351 (2001).
- 6. Smith L. M., Kelleher N. L., Proteoforms as the next proteomics currency. Science 359, 1106–1107 (2018).
- 7. Kelleher N. L., A cell-based approach to the human proteome project. J. Am. Soc. Mass Spectrom. 23, 1617–1624 (2012).
- 8. Aebersold R., Mann M., Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
- 9. Zhang Y., Fonslow B. R., Shan B., Baek M. C., Yates J. R. III, Protein analysis by shotgun/bottom-up proteomics. Chem. Rev. 113, 2343–2394 (2013).
- 10. Zhang H., Liu T., Zhang Z., Payne S. H., Zhang B., McDermott J. E., Zhou J. Y., Petyuk V. A., Chen L., Ray D., Sun S., Yang F., Chen L., Wang J., Shah P., Cha S. W., Aiyetan P., Woo S., Tian Y., Gritsenko M. A., Clauss T. R., Choi C., Monroe M. E., Thomas S., Nie S., Wu C., Moore R. J., Yu K. H., Tabb D. L., Fenyö D., Bafna V., Wang Y., Rodriguez H., Boja E. S., Hiltke T., Rivers R. C., Sokoll L., Zhu H., Shih I. M., Cope L., Pandey A., Zhang B., Snyder M. P., Levine D. A., Smith R. D., Chan D. W., Rodland K. D., Carr S. A., Gillette M. A., Clauser K. R., Kuhn E., Mani D. R., Mertins P., Ketchum K. A., Thangudu R., Cai S., Oberti M., Paulovich A. G., Whiteaker J. R., Edwards N. J., McGarvey P. B., Madhavan S., Wang P., Chan D. W., Pandey A., Shih I. M., Zhang H., Zhang Z., Zhu H., Cope L., Whiteley G. A., Skates S. J., White F. M., Levine D. A., Boja E. S., Kinsinger C. R., Hiltke T., Mesri M., Rivers R. C., Rodriguez H., Shaw K. M., Stein S. E., Fenyö D., Liu T., McDermott J. E., Payne S. H., Rodland K. D., Smith R. D., Rudnick P., Snyder M., Zhao Y., Chen X., Ransohoff D. F., Hoofnagle A. N., Liebler D. C., Sanders M. E., Shi Z., Slebos R. J. C., Tabb D. L., Zhang B., Zimmerman L. J., Wang Y., Davies S. R., Ding L., Ellis M. J. C., Townsend R. R., Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell 166, 755–765 (2016).
- 11. Mertins P., Mani D. R., Ruggles K. V., Gillette M. A., Clauser K. R., Wang P., Wang X., Qiao J. W., Cao S., Petralia F., Kawaler E., Mundt F., Krug K., Tu Z., Lei J. T., Gatza M. L., Wilkerson M., Perou C. M., Yellapantula V., Huang K.-l., Lin C., McLellan M. D., Yan P., Davies S. R., Townsend R. R., Skates S. J., Wang J., Zhang B., Kinsinger C. R., Mesri M., Rodriguez H., Ding L., Paulovich A. G., Fenyö D., Ellis M. J., Carr S. A.; NCI CPTAC, Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55–62 (2016).
- 12. Vasaikar S., Huang C., Wang X., Petyuk V. A., Savage S. R., Wen B., Dou Y., Zhang Y., Shi Z., Arshad O. A., Gritsenko M. A., Zimmerman L. J., McDermott J. E., Clauss T. R., Moore R. J., Zhao R., Monroe M. E., Wang Y.-T., Chambers M. C., Slebos R. J. C., Lau K. S., Mo Q., Ding L., Ellis M., Thiagarajan M., Kinsinger C. R., Rodriguez H., Smith R. D., Rodland K. D., Liebler D. C., Liu T., Zhang B.; Clinical Proteomic Tumor Analysis Consortium, Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 177, 1035–1049.e19 (2019).
- 13. Ruggles K. V., Tang Z., Wang X., Grover H., Askenazi M., Teubl J., Cao S., McLellan M. D., Clauser K. R., Tabb D. L., Mertins P., Slebos R., Erdmann-Gilmore P., Li S., Gunawardena H. P., Xie L., Liu T., Zhou J. Y., Sun S., Hoadley K. A., Perou C. M., Chen X., Davies S. R., Maher C. A., Kinsinger C. R., Rodland K. D., Zhang H., Zhang Z., Ding L., Townsend R. R., Rodriguez H., Chan D., Smith R. D., Liebler D. C., Carr S. A., Payne S., Ellis M. J., Fenyő D., An analysis of the sensitivity of proteogenomic mapping of somatic mutations and novel splicing events in cancer. Mol. Cell. Proteomics 15, 1060–1071 (2016).
- 14. Tran J. C., Zamdborg L., Ahlf D. R., Lee J. E., Catherman A. D., Durbin K. R., Tipton J. D., Vellaichamy A., Kellie J. F., Li M., Wu C., Sweet S. M. M., Early B. P., Siuti N., LeDuc R. D., Compton P. D., Thomas P. M., Kelleher N. L., Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 480, 254–258 (2011).
- 15. Chen B., Brown K. A., Lin Z., Ge Y., Top-down proteomics: Ready for prime time? Anal. Chem. 90, 110–127 (2018).
- 16. LeDuc R. D., Schwämmle V., Shortreed M. R., Cesnik A. J., Solntsev S. K., Shaw J. B., Martin M. J., Vizcaino J. A., Alpi E., Danis P., Kelleher N. L., Smith L. M., Ge Y., Agar J. N., Chamot-Rooke J., Loo J. A., Pasa-Tolic L., Tsybin Y. O., ProForma: A standard proteoform notation. J. Proteome Res. 17, 1321–1325 (2018).
- 17. Smith L. M., Thomas P. M., Shortreed M. R., Schaffer L. V., Fellers R. T., LeDuc R. D., Tucholski T., Ge Y., Agar J. N., Anderson L. C., Chamot-Rooke J., Gault J., Loo J. A., Paša-Tolić L., Robinson C. V., Schlüter H., Tsybin Y. O., Vilaseca M., Vizcaíno J. A., Danis P. O., Kelleher N. L., A five-level classification system for proteoform identifications. Nat. Methods 16, 939–940 (2019).
- 18. Dominguez-Medina S., Fostner S., Defoort M., Sansa M., Stark A. K., Halim M. A., Vernhes E., Gely M., Jourdan G., Alava T., Boulanger P., Masselon C., Hentz S., Neutral mass spectrometry of virus capsids above 100 megadaltons with nanomechanical resonators. Science 362, 918–922 (2018).
- 19. Kafader J. O., Melani R. D., Durbin K. R., Ikwuagwu B., Early B. P., Fellers R. T., Beu S. C., Zabrouskov V., Makarov A. A., Maze J. T., Shinholt D. L., Yip P. F., Tullman-Ercek D., Senko M. W., Compton P. D., Kelleher N. L., Multiplexed mass spectrometry of individual ions improves measurement of proteoforms and their complexes. Nat. Methods 17, 391–394 (2020).
- 20. Kafader J. O., Melani R. D., Senko M. W., Makarov A. A., Kelleher N. L., Compton P. D., Measurement of individual ions sharply increases the resolution of orbitrap mass spectra of proteins. Anal. Chem. 91, 2776–2783 (2019).
- 21. Ouldali H., Sarthak K., Ensslen T., Piguet F., Manivet P., Pelta J., Behrends J. C., Aksimentiev A., Oukhaled A., Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore. Nat. Biotechnol. 38, 176–181 (2020).
- 22. Restrepo-Perez L., Joo C., Dekker C., Paving the way to single-molecule protein sequencing. Nat. Nanotechnol. 13, 786–796 (2018).
- 23. Beck M., Malmström J. A., Lange V., Schmidt A., Deutsch E. W., Aebersold R., Visual proteomics of the human pathogen Leptospira interrogans. Nat. Methods 6, 817–823 (2009).
- 24. Xu M., Singla J., Tocheva E. I., Chang Y.-W., Stevens R. C., Jensen G. J., Alber F., De novo structural pattern mining in cellular electron cryotomograms. Structure 27, 679–691.e14 (2019).
- 25. Specht H., Slavov N., Transformative opportunities for single-cell proteomics. J. Proteome Res. 17, 2565–2571 (2018).
- 26. Budnik B., Levy E., Harmange G., Slavov N., SCoPE-MS: Mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 19, 161 (2018).
- 27.Slavov N., Single-cell protein analysis by mass spectrometry. Curr. Opin. Chem. Biol. 60, 1–9 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Zhou M., Uwugiaren N., Williams S. M., Moore R. J., Zhao R., Goodlett D., Dapic I., Paša-Tolić L., Zhu Y., Sensitive top-down proteomics analysis of a low number of mammalian cells using a nanodroplet sample processing platform. Anal. Chem. 92, 7087–7095 (2020). [DOI] [PubMed] [Google Scholar]
- 29.Zhu Y., Clair G., Chrisler W. B., Shen Y., Zhao R., Shukla A. K., Moore R. J., Misra R. S., Pryhuber G. S., Smith R. D., Ansong C., Kelly R. T., Proteomic analysis of single mammalian cells enabled by microfluidic nanodroplet sample preparation and ultrasensitive NanoLC-MS. Angew. Chem. Int. Ed. Eng. 57, 12370–12374 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Swaminathan J., Boulgakov A. A., Hernandez E. T., Bardo A. M., Bachman J. L., Marotta J., Johnson A. M., Anslyn E. V., Marcotte E. M., Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures. Nat. Biotechnol. 36, 1076–1082 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Wu C., Garden P. M., Walt D. R., Ultrasensitive detection of attomolar protein concentrations by dropcast single molecule assays. J. Am. Chem. Soc. 142, 12314–12323 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Smith L., Chamot-Rooke J., Danis P., Ge Y., Loo J., Pasa-Tolic L., Tsybin Y., Kelleher N., The Human Proteoform Project: A plan to define the human proteome; www.preprints.org/manuscript/202010.0368/v1 (2020).
- 33. Kim M. S., Pinto S. M., Getnet D., Nirujogi R. S., Manda S. S., Chaerkady R., Madugundu A. K., Kelkar D. S., Isserlin R., Jain S., Thomas J. K., Muthusamy B., Leal-Rojas P., Kumar P., Sahasrabuddhe N. A., Balakrishnan L., Advani J., George B., Renuse S., Selvan L. D. N., Patil A. H., Nanjappa V., Radhakrishnan A., Prasad S., Subbannayya T., Raju R., Kumar M., Sreenivasamurthy S. K., Marimuthu A., Sathe G. J., Chavan S., Datta K. K., Subbannayya Y., Sahu A., Yelamanchi S. D., Jayaram S., Rajagopalan P., Sharma J., Murthy K. R., Syed N., Goel R., Khan A. A., Ahmad S., Dey G., Mudgal K., Chatterjee A., Huang T. C., Zhong J., Wu X., Shaw P. G., Freed D., Zahari M. S., Mukherjee K. K., Shankar S., Mahadevan A., Lam H., Mitchell C. J., Shankar S. K., Satishchandra P., Schroeder J. T., Sirdeshmukh R., Maitra A., Leach S. D., Drake C. G., Halushka M. K., Prasad T. S. K., Hruban R. H., Kerr C. L., Bader G. D., Iacobuzio-Donahue C. A., Gowda H., Pandey A., A draft map of the human proteome. Nature 509, 575–581 (2014).
- 34. Wilhelm M., Schlegl J., Hahne H., Gholami A. M., Lieberenz M., Savitski M. M., Ziegler E., Butzmann L., Gessulat S., Marx H., Mathieson T., Lemeer S., Schnatbaum K., Reimer U., Wenschuh H., Mollenhauer M., Slotta-Huspenina J., Boese J. H., Bantscheff M., Gerstmair A., Faerber F., Kuster B., Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
- 35. Thul P. J., Åkesson L., Wiking M., Mahdessian D., Geladaki A., Blal H. A., Alm T., Asplund A., Björk L., Breckels L. M., Bäckström A., Danielsson F., Fagerberg L., Fall J., Gatto L., Gnann C., Hober S., Hjelmare M., Johansson F., Lee S., Lindskog C., Mulder J., Mulvey C. M., Nilsson P., Oksvold P., Rockberg J., Schutten R., Schwenk J. M., Sivertsson Å., Sjöstedt E., Skogs M., Stadler C., Sullivan D. P., Tegel H., Winsnes C., Zhang C., Zwahlen M., Mardinoglu A., Pontén F., von Feilitzen K., Lilley K. S., Uhlén M., Lundberg E., A subcellular map of the human proteome. Science 356, eaal3321 (2017).
- 36. Adhikari S., Nice E. C., Deutsch E. W., Lane L., Omenn G. S., Pennington S. R., Paik Y. K., Overall C. M., Corrales F. J., Cristea I. M., van Eyk J. E., Uhlén M., Lindskog C., Chan D. W., Bairoch A., Waddington J. C., Justice J. L., LaBaer J., Rodriguez H., He F., Kostrzewa M., Ping P., Gundry R. L., Stewart P., Srivastava S., Srivastava S., Nogueira F. C. S., Domont G. B., Vandenbrouck Y., Lam M. P. Y., Wennersten S., Vizcaino J. A., Wilkins M., Schwenk J. M., Lundberg E., Bandeira N., Marko-Varga G., Weintraub S. T., Pineau C., Kusebauch U., Moritz R. L., Ahn S. B., Palmblad M., Snyder M. P., Aebersold R., Baker M. S., A high-stringency blueprint of the human proteome. Nat. Commun. 11, 5301 (2020).
- 37. Kellie J. F., Tran J. C., Jian W., Jones B., Mehl J. T., Ge Y., Henion J., Bateman K. P., Intact protein mass spectrometry for therapeutic protein quantitation, pharmacokinetics, and biotransformation in preclinical and clinical studies: An industry perspective. J. Am. Soc. Mass Spectrom. 32, 1886–1900 (2021).
- 38. Baker M., Reproducibility crisis: Blame it on the antibodies. Nature 521, 274–276 (2015).
- 39. Zhou X. X., Bracken C. J., Zhang K., Zhou J., Mou Y., Wang L., Cheng Y., Leung K. K., Wells J. A., Targeting phosphotyrosine in native proteins with conditional, bispecific antibody traps. J. Am. Chem. Soc. 142, 17703–17713 (2020).
- 40. HuBMAP Consortium, The human body at cellular resolution: The NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).
- 41. Regev A., Teichmann S. A., Lander E. S., Amit I., Benoist C., Birney E., Bodenmiller B., Campbell P., Carninci P., Clatworthy M., Clevers H., Deplancke B., Dunham I., Eberwine J., Eils R., Enard W., Farmer A., Fugger L., Göttgens B., Hacohen N., Haniffa M., Hemberg M., Kim S., Klenerman P., Kriegstein A., Lein E., Linnarsson S., Lundberg E., Lundeberg J., Majumder P., Marioni J. C., Merad M., Mhlanga M., Nawijn M., Netea M., Nolan G., Pe'er D., Phillipakis A., Ponting C. P., Quake S., Reik W., Rozenblatt-Rosen O., Sanes J., Satija R., Schumacher T. N., Shalek A., Shapiro E., Sharma P., Shin J. W., Stegle O., Stratton M., Stubbington M. J. T., Theis F. J., Uhlen M., van Oudenaarden A., Wagner A., Watt F., Weissman J., Wold B., Xavier R., Yosef N.; Human Cell Atlas Meeting Participants, The human cell atlas. eLife 6, e27041 (2017).
- 42. Adner R., Ecosystem as structure: An actionable construct for strategy. J. Manag. 43, 39–58 (2017).
- 43. Valkokari K., Business, innovation, and knowledge ecosystems: How they differ and how to survive and thrive within them. Technol. Innov. Manag. Rev. 5, 17–24 (2015).
- 44. Pohlhaus J. R., Cook-Deegan R. M., Genomics research: World survey of public funding. BMC Genomics 9, 472 (2008).
- 45. Biopharmaceutical Industry Profile (PhRMA - Pharmaceutical Research and Manufacturers of America, 2019).
- 46. Dolgin E., The most popular genes in the human genome. Nature 551, 427–431 (2017).