Abstract
Alternative splicing is prevalent in the heart and implicated in many cardiovascular diseases, but not every alternative transcript is translated, and detecting non-canonical isoforms at the protein level remains challenging. Here we show the use of a computation-assisted targeted proteomics workflow to detect alternative protein isoforms in the human heart. We build on a recent strategy that integrates deep RNA-seq and large-scale mass spectrometry data to identify candidate translated isoform peptides. A machine learning approach is then applied to predict their fragmentation patterns and to design isoform-specific parallel reaction monitoring (PRM) detection assays. As proof of principle, we built PRM assays for 29 non-canonical isoform peptides and detected 22 of them in a human heart lysate. Predicted spectra closely mirrored the experimental spectra of synthetic peptide standards for non-canonical sequences. This approach may be useful for validating non-canonical protein identifications and discovering functionally relevant isoforms in the heart.
Keywords: Targeted proteomics, mass spectrometry, parallel reaction monitoring, alternative splicing, protein isoforms, proteoforms, machine learning, heart
1. Introduction
Many cardiac proteins have alternatively spliced variants, e.g., the tropomyosin 1 smooth vs. striated muscle and titin fetal N2BA vs. adult N2B isoforms, and several have been implicated in disease development [1]. Although some isoforms can be distinguished by their electrophoretic migration patterns, gel bands do not directly inform on sequence identity and isoforms frequently share similar molecular weights. Relatively few antibody probes currently exist that bind to specific isoforms without cross-reactivity with their canonical proteins, and their availability is hampered by the lengthy vetting and validation steps needed to establish probe specificity and reproducibility [2]. There is therefore a need for additional strategies that can detect and verify the existence of alternative protein isoforms in the heart.
Targeted mass spectrometry methods such as selected reaction monitoring (SRM) and parallel reaction monitoring (PRM) [3] offer high specificity and reproducibility for distinguishing isoform sequences, even when they differ from the canonical protein by only one or a few amino acid residues. These assays can also be readily disseminated across laboratories without proprietary reagents [4,5]. A challenge, however, is that building and verifying SRM/PRM methods for specific proteins can be laborious, because it requires knowledge of the MS2 fragmentation pattern and chromatographic retention time of the target peptide under a particular instrument setup. Target specificity in PRM is commonly achieved by co-injecting samples with isotope-labeled synthetic peptide standards. The synthetic peptides are identical in sequence to their endogenous counterparts but are labeled with ¹³C or ¹⁵N; hence they can be distinguished from the endogenous peptides by mass, yet share the same MS2 fragmentation patterns and retention times. The fragmentation patterns and retention times of the standards and endogenous peptides can then be compared to verify the identity of non-canonical isoforms. However, synthesizing labeled peptides can be expensive and time-consuming, especially when a large number of non-canonical peptides need to be detected.
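The mass-based distinction between an endogenous peptide and its labeled standard can be made concrete with a small calculation. The sketch below assumes a common labeling scheme that the text does not specify (a C-terminal Lys carrying ¹³C₆¹⁵N₂, +8.0142 Da, or Arg carrying ¹³C₆¹⁵N₄, +10.0083 Da) plus fixed cysteine carbamidomethylation; `PEPTIDEK` and the function name `precursor_mz` are illustrative placeholders, not peptides or tools from this study.

```python
# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide (termini)
PROTON = 1.007276           # mass of a proton
CARBAMIDOMETHYL = 57.02146  # assumed fixed modification on every Cys

# Assumed heavy label on the C-terminal residue (typical for tryptic standards).
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}


def precursor_mz(sequence: str, charge: int, heavy: bool = False) -> float:
    """Monoisotopic [M+zH]z+ m/z, assuming Cys carbamidomethylation only."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in sequence)
    mass += CARBAMIDOMETHYL * sequence.count("C")
    if heavy:
        # Only the C-terminal Lys/Arg carries the isotope label in this sketch.
        mass += HEAVY_SHIFT.get(sequence[-1], 0.0)
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    light = precursor_mz("PEPTIDEK", 2)
    heavy = precursor_mz("PEPTIDEK", 2, heavy=True)
    print(f"light {light:.4f}  heavy {heavy:.4f}  shift {heavy - light:.4f}")
```

For a 2+ precursor the heavy standard appears 8.0142/2 ≈ 4.007 m/z above the light peptide. Singly charged y ions, which retain the labeled C-terminus, shift by the full label mass, whereas b ions are unshifted; what the two forms share is the relative intensity pattern of corresponding fragments.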
Advances in machine learning have recently enabled more accurate computational prediction of peptide chromatographic retention times and fragmentation profiles [6–8], creating opportunities to build virtual spectral libraries from predicted spectra for use in targeted proteomics assays [9,10]. Here we describe an application of MS2 spectrum prediction to assist the generation of PRM assays for detecting alternative protein isoforms in the heart. We first nominated a list of likely translated non-canonical protein isoforms using a proteogenomics approach we recently described [11,12]. For each alternative peptide, we then used an available deep learning tool [6] to predict its fragmentation pattern, and used the resulting virtual spectral library to build assays for detecting these peptides in the endogenous human heart proteome. Our results suggest that predictions-aided PRM mass spectrometry may be a viable strategy to detect and verify alternative protein isoforms in the heart.
2. Method Summary
We constructed targeted proteomics assays using peptide fragmentation spectra predicted by the neural network implemented in Prosit [6]. Collision energy and retention time predictions were calibrated using data-dependent acquisition (DDA) mass spectrometry data from a HeLa cell digest. Human adult heart lysate (female donor, aged 76 years, no known heart disease) was purchased from Novus Biologicals. Three technical replicates of shotgun DDA and PRM data were acquired on a Q-Exactive HF mass spectrometer (Thermo) and analyzed using Skyline [13]. PRM data were similarly acquired from three biological replicates of AC16 cells. Protein structures 4A7L and 4G1N were retrieved from the Protein Data Bank. Details are provided in the Supplementary Methods.
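Prosit predicts retention time on the indexed retention time (iRT) scale (the iRT values listed in Supplementary Table S1), so predictions have to be mapped onto the local gradient before PRM scheduling. The calibration actually used is described in the Supplementary Methods; the minimal sketch below simply assumes a linear least-squares fit of observed retention time against predicted iRT for confidently identified anchor peptides, and then centers a fixed-width window on the calibrated value. The anchor values, the function names, and the 4-minute window width are all hypothetical.

```python
def fit_irt_to_rt(irt, rt):
    """Ordinary least-squares fit rt = slope * irt + intercept."""
    n = len(irt)
    mean_x = sum(irt) / n
    mean_y = sum(rt) / n
    sxx = sum((x - mean_x) ** 2 for x in irt)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(irt, rt))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def schedule_window(pred_irt, slope, intercept, width_min=4.0):
    """Center a scheduling window (minutes) on the calibrated retention time."""
    rt = slope * pred_irt + intercept
    return rt - width_min / 2, rt + width_min / 2


if __name__ == "__main__":
    # Hypothetical anchor peptides: predicted iRT vs. observed RT (minutes).
    anchor_irt = [-10.0, 0.0, 25.0, 50.0, 75.0, 100.0]
    anchor_rt = [15.0, 20.0, 32.5, 45.0, 57.5, 70.0]
    slope, intercept = fit_irt_to_rt(anchor_irt, anchor_rt)
    print(slope, intercept, schedule_window(40.0, slope, intercept))
```

A linear map is the simplest choice; if the gradient is strongly nonlinear, a piecewise or higher-order fit over more anchors would be needed.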
3. Results and Discussion
3.1. Selecting candidate cardiac alternative isoform peptides.
We recently described a proteogenomics strategy that combines RNA-seq and mass spectrometry to identify translated alternative splicing isoforms [11,12]. The protein isoforms are identified via non-canonical peptides that correspond to isoform-specific exons or exon junctions, such as sequences that span two exons that are not consecutive in the canonical protein. For example, in a hypothetical protein with 3 exons where exon 2 is skipped in a shorter alternative isoform, a peptide spanning the junction of exon 1 and exon 3 would distinguish the alternative isoform. Using RNA-seq transcript splice junctions to construct a human heart-specific protein database, we reanalyzed three large public human heart shotgun proteomics datasets (PXD006675, PXD000561, PXD010154) to obtain a list of non-canonical peptide candidates. We then filtered this list to retain only non-canonical isoform peptides that passed a 1% peptide false discovery rate (Percolator q-value ≤ 0.01), were ≥9 amino acids long, and carried no chemical or enzymatic modifications other than cysteine carbamidomethylation (Supplementary Figure S1). The 954 shortlisted non-canonical peptides belonged to 339 genes and represent likely translated protein isoforms in the human heart (Supplementary Table S1).
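The junction logic described above, together with the shortlist filters, can be illustrated on a toy version of the hypothetical three-exon protein. This is a simplified sketch rather than the published pipeline [11,12]: exons are written directly as amino-acid strings (ignoring codon phase at exon boundaries), digestion uses the simple trypsin rule of cleaving after K or R except before P, and the exon sequences, the PSM record fields, and the modification label string are invented for illustration.

```python
import re

# Hypothetical three-exon protein; exon 2 (index 1) is skipped in the isoform.
EXONS = ["MSDKLLEAG", "QWTPEHYR", "VNSDFLGEK"]


def tryptic_peptides(protein, missed=0):
    """Yield (start, end, sequence); cleave after K/R unless followed by P."""
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed, len(bounds))):
            yield bounds[i], bounds[j], protein[bounds[i]:bounds[j]]


def junction_peptides(exons, skipped, missed=0):
    """Peptides spanning the new exon junction created by skipping one exon."""
    canonical = "".join(exons)
    isoform = "".join(e for i, e in enumerate(exons) if i != skipped)
    # Index of the first isoform residue after the new junction.
    junction = sum(len(e) for e in exons[:skipped])
    return [seq for start, end, seq in tryptic_peptides(isoform, missed)
            if start < junction < end and seq not in canonical]


def passes_filters(psm):
    """Shortlist criteria from the text: q <= 0.01, length >= 9, Cys carbamidomethyl only."""
    return (psm["q_value"] <= 0.01
            and len(psm["sequence"]) >= 9
            and all(mod == "Carbamidomethyl (C)" for mod in psm["mods"]))


if __name__ == "__main__":
    for pep in junction_peptides(EXONS, skipped=1):
        psm = {"sequence": pep, "q_value": 0.004, "mods": []}  # invented PSM record
        print(pep, passes_filters(psm))
```

A junction peptide is kept only if it contains residues from both flanking exons and does not occur in the canonical sequence. If exon 1 happened to end in K or R, the junction would coincide with a cleavage site and would only be covered with `missed=1`, which is one way missed cleavages become common among junction peptides.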
3.2. Prediction of MS2 spectra using a neural network.
Despite stringent filtering, identifying non-canonical peptides by shotgun proteomics remains prone to pitfalls, including false-positive identifications and inconsistent detection across samples. PRM provides a means to verify peptide sequences discovered by shotgun proteomics and to enable their routine detection across multiple samples; however, as mentioned above, building PRM assays commonly relies on chemically synthesized peptide standards, which can be prohibitively expensive to procure at scale. We therefore explored an alternative to synthetic standards for PRM (Figure 1A). We used the neural network implemented in Prosit to predict the MS2 fragmentation patterns and chromatographic retention times of the isoform peptides, after calibrating peptide fragmentation energy and retention time using a complex protein mixture on a local instrument (see Supplementary Methods). The predicted fragmentation spectra of all 954 candidate isoform peptides are provided in Supplementary Table S1 and Supplementary Data S1. On average, the predicted spectrum of each sequence contains 29 fragments at ≥1% relative intensity (interquartile range: 21–39 fragments).
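The summary statistic above (fragments at ≥1% relative intensity per predicted spectrum) can be computed along the following lines. The sketch assumes each predicted spectrum is held as a mapping from fragment label to intensity, and that relative intensity is taken against the most intense fragment of the same spectrum; the function names are hypothetical, and the quartile convention (`inclusive`) is a choice, so a different convention could shift the interquartile range slightly.

```python
import statistics


def count_fragments(spectrum, min_rel=0.01):
    """Count fragments whose intensity is >= min_rel of the base peak."""
    base_peak = max(spectrum.values())
    return sum(1 for v in spectrum.values() if v / base_peak >= min_rel)


def summarize_counts(spectra, min_rel=0.01):
    """Return mean, median, and (Q1, Q3) of per-peptide fragment counts."""
    counts = [count_fragments(s, min_rel) for s in spectra]
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return statistics.mean(counts), median, (q1, q3)


if __name__ == "__main__":
    # Invented predicted spectrum: fragment label -> normalized intensity.
    spectrum = {"y1+": 0.2, "y2+": 1.0, "b2+": 0.005, "b3+": 0.5}
    print(count_fragments(spectrum))
```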
Figure 1 |. Computation-Assisted PRM Detection of Protein Alternative Isoforms.
A. Schematic of workflow to build targeted proteomics assays using predicted MS2 spectra. Step 1: candidate translated protein isoform sequences are identified by co-analyzing RNA-seq and shotgun proteomics data. Step 2: a neural network is used to predict fragmentation patterns and retention time of the candidate peptides. Step 3: the predictions are used to build targeted mass spectrometry methods to detect cardiac protein isoforms. B. List of 22 alternative isoform peptides detected in the human heart using parallel reaction monitoring (n=3; technical replicates). Rankings of protein popularity are calculated from citation-weighted publication counts among PubMed papers in each topic.
Because many isoform peptides are not commonly reported in proteomics experiments and may contain sequence features (e.g., non-tryptic ends, or missed cleavages due to the enrichment of lysine at exon junctions) that differ from those used to train the neural network, we assessed whether the predicted spectra corresponded well to experimental ground truth. To do so, we compared in silico spectra with previously acquired experimental MS2 spectra of 12 synthetic peptide standards of non-canonical sequences [11] (Figure 2A; Supplementary Data S2). Predicted and synthetic spectra showed a high degree of concordance, as evidenced both by visual comparison of fragment ion intensity patterns and by spectral dot product scores calculated in Skyline (Figure 2A). We further examined 8 additional synthetic peptides corresponding to isoform sequences and again observed a high degree of correspondence between experimental and predicted spectra. Taken together, the predicted fragmentation patterns closely matched empirical measurements from synthetic standards and endogenous peptides, suggesting that neural network predictions can provide a reasonable surrogate for synthetic standards when building and verifying targeted detection assays for the examined non-canonical peptides (Supplementary Data S2 and S3).
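A spectral dot product of the kind used to compare predicted and synthetic spectra can be sketched as a normalized dot product (cosine similarity) over matched fragment ions. This is an approximation for illustration, not Skyline's implementation: the optional square-root intensity transform is a common convention assumed here, fragments missing from one spectrum are scored as zero, and the example intensities and fragment labels are invented. Scores computed this way need not reproduce Skyline dotp values exactly.

```python
import math


def spectral_dot_product(predicted, observed, sqrt_transform=True):
    """Normalized dot product (cosine similarity) of two annotated spectra.

    predicted/observed map fragment labels (e.g. "y5+") to intensities.
    Fragments missing from one spectrum are treated as intensity 0.
    """
    ions = sorted(set(predicted) | set(observed))
    transform = math.sqrt if sqrt_transform else (lambda v: v)
    a = [transform(predicted.get(ion, 0.0)) for ion in ions]
    b = [transform(observed.get(ion, 0.0)) for ion in ions]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


if __name__ == "__main__":
    pred = {"y3+": 0.4, "y4+": 1.0, "y5+": 0.6, "b2+": 0.1}   # normalized prediction
    obs = {"y3+": 3.1e5, "y4+": 8.0e5, "y5+": 5.2e5, "b2+": 0.4e5}  # raw intensities
    print(round(spectral_dot_product(pred, obs), 3))
```

Because the score is scale-invariant, normalized predicted intensities can be compared directly with raw experimental intensities; only the relative pattern matters.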
Figure 2 |. Identified Protein Alternative Isoforms in the Human Heart.
A. Comparison of predicted (top) and experimental (bottom) MS2 spectra of two selected sequences, showing concordance between predicted and experimental spectra. B. Example splice graphs of targeted alternative peptides (red) and the canonical exon sequences they replace (blue) in three cardiac genes (see Supplementary Data S4 for additional splice graphs). UniProt-documented phosphorylation sites are highlighted on the canonical sequence. C. Protein structures showing the location of the canonical sequences replaced in the alternative isoforms in (top) PKM (4G1N) (blue: alternative region) and (bottom) TPM1 (4A7L) (red: alternative region; pink: myosins; beige: actins; white and orange: tropomyosin chains).
3.3. Targeted proteomics detection of human heart protein isoforms.
We applied the targeted assays to detect endogenous protein isoforms using retention time-scheduled PRM. Because the scan speed of the mass spectrometer limits the number of peptides that can be targeted simultaneously in one experiment, we performed a proof-of-principle experiment in which a small subset of non-canonical peptides was monitored in a scheduled PRM analysis. To select these peptides, we first performed a DDA shotgun proteomics experiment on commercially available human heart lysate and matched the results against the predicted spectra and retention times in Skyline. In total, we shortlisted 27 non-canonical peptides based on either putative identification in a conventional database search of the DDA data (Percolator q-value ≤ 0.01) or the quality of their match to predicted spectra (Skyline spectral library match dotp ≥ 0.7), and additionally included 2 known PKM isoform peptides. Of the 29 targeted peptides, we detected 22 endogenous splice isoform peptides in the human heart lysate (Figure 1B; Supplementary Table S2; Supplementary Data S3). These peptides were identified by spectral matching in Skyline, conventional database search, and manual data inspection by mass spectrometrists. The non-canonical sequences also explained the observed spectra better than the second-best hit sequences or the best matches in canonical-only searches (Supplementary Figure S2; Supplementary Tables S3–S4). To further assess the reproducibility of the workflow and its applicability to other sample types, we also acquired PRM data on 3 biological replicates of the human AC16 cardiomyocyte cell line, in which 14 isoform peptides were consistently identified in all 3 replicates, with dotp scores ranging from 0.79 to 0.95 (Supplementary Figure S3).
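The two practical constraints in this step, choosing which peptides to monitor and keeping the number of concurrent targets within the instrument's scan speed, can be sketched as below. The selection rule mirrors the stated criteria (q-value ≤ 0.01 or dotp ≥ 0.7); the concurrency check is a generic sweep over scheduling windows, and the per-target scan time passed to `cycle_time_s` is a hypothetical parameter, since the real value depends on the resolution and maximum injection time of the PRM method. The candidate records are invented.

```python
def select_targets(candidates, q_max=0.01, dotp_min=0.7):
    """Keep peptides identified by database search OR well matched to predictions."""
    selected = []
    for c in candidates:
        by_search = c.get("q_value") is not None and c["q_value"] <= q_max
        by_library = c.get("dotp") is not None and c["dotp"] >= dotp_min
        if by_search or by_library:
            selected.append(c["sequence"])
    return selected


def max_concurrent(windows):
    """Largest number of scheduling windows (start, end) open at the same time."""
    events = [(start, 1) for start, _ in windows] + [(end, -1) for _, end in windows]
    events.sort()  # at equal times, -1 (window closes) sorts before +1 (opens)
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def cycle_time_s(windows, scan_time_s):
    """Worst-case PRM cycle time if each target needs one MS2 scan per cycle."""
    return max_concurrent(windows) * scan_time_s


if __name__ == "__main__":
    candidates = [
        {"sequence": "PEPA", "q_value": 0.005, "dotp": None},
        {"sequence": "PEPB", "q_value": 0.2, "dotp": 0.75},
        {"sequence": "PEPC", "q_value": None, "dotp": 0.5},
    ]
    windows = [(10.0, 14.0), (12.0, 16.0), (14.0, 18.0)]  # minutes
    print(select_targets(candidates), max_concurrent(windows), cycle_time_s(windows, 0.1))
```

Dividing the expected chromatographic peak width by the cycle time gives the number of points sampled across each peak, which is the usual yardstick for how many targets a scheduled method can afford.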
The 22 isoform peptides identified in human heart tissue arise from three different splice types (mutually exclusive exons, skipped exons, and alternative 3′ splice sites) (Figure 2; Supplementary Data S4), correspond to both popular and less-studied proteins in the heart (Figure 1B), and belong to proteins with diverse functions, including sarcomere stability, epigenetic regulation, and oxidative stress responses. In some instances, the alternative sequences intersect with known post-translational modification sites and protein-protein interaction surfaces, providing clues to their potential functional significance (Figure 2B–C). In the relatively well-characterized KPYM splice isoforms PKM1/PKM2, the alternative exon excluded in PKM1 overlaps with a cluster of post-translational modification sites in the designated canonical form (PKM2), including two 4-hydroxyproline sites at PKM2 P403 and P408 that are known to participate in HIF1α binding and transactivation [14] (Figure 2C). In tropomyosin-1 (TPM1), one of four tropomyosin paralog genes that function to stabilize actin filaments and which itself has over 10 potential splice isoforms, we detected a peptide arising from a mutually exclusive exon splice event that corresponds to the uncharacterized UniProt TPM1 isoform 4 (P09493-4) (Figure 2B). This sequence replaces residues 189–212 of the canonical TPM1 sequence, a region that sits in a groove at the actin-binding interface (PDB 4A7L) and spans likely pathogenic variants for hypertrophic cardiomyopathy (e.g., ClinVar p.E192K and p.N203K) (Figure 2C). To our knowledge, antibodies that distinguish between TPM1 splice isoforms are not commercially available; hence the targeted mass spectrometry method presented here offers a promising tool to characterize the expression and functions of these isoforms across sample types.
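Checking whether annotated variants fall inside the replaced segment amounts to an interval test on residue positions. The sketch below uses only the coordinates given in the text (TPM1 residues 189–212, ClinVar p.E192K and p.N203K) plus p.D175N as an illustrative position outside the segment; it handles only one-letter missense notation and does not verify that the reference residue matches the canonical sequence. The function names are hypothetical.

```python
import re

HGVS_MISSENSE = re.compile(r"p\.([A-Z])(\d+)([A-Z])")

# Canonical TPM1 residues replaced by the isoform 4 exon (from the text).
TPM1_REPLACED = (189, 212)


def residue_position(variant):
    """Parse a one-letter protein missense variant such as 'p.E192K'."""
    match = HGVS_MISSENSE.fullmatch(variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant}")
    return int(match.group(2))


def variants_in_region(variants, region):
    """Return variants whose residue lies in the inclusive interval (start, end)."""
    start, end = region
    return [v for v in variants if start <= residue_position(v) <= end]


if __name__ == "__main__":
    print(variants_in_region(["p.E192K", "p.N203K", "p.D175N"], TPM1_REPLACED))
```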
4. Conclusion
Reliable detection methods are essential for understanding protein biological function [15]. There is currently a dearth of targeted quantification assays for protein isoforms arising from alternative splicing. Here, we used machine learning-predicted mass spectra as an alternative to labeled synthetic peptides to support the detection of non-canonical protein isoform sequences by targeted mass spectrometry. We found that predicted peptide fragmentation patterns could be matched to spectra of endogenous peptides to detect non-canonical isoform peptides in the human heart. This strategy may enable routine targeted detection of less-studied protein isoforms and advance understanding of alternative splicing in the heart.
Supplementary Material
Supplementary Figure S1 | Sequence Features of Candidate Protein Isoform Peptides. A. Density plot showing the distribution of lengths of the 954 candidate non-canonical peptides (red dashed line: median). B. Histogram showing −log10 Percolator peptide posterior error probability at the spectrum level (red dashed line: median). C. Histogram showing −log10 Percolator peptide B-H FDR (q-value) at the spectrum level (red dashed line: median). D. Sequence logos showing the frequency of preceding and succeeding amino acids for the candidate non-canonical peptides. E. Distribution of the number of internal tryptic cleavage sites. F. Number of candidate peptides not found in each of three databases (SwissProt v.2020_05 canonical sequences; SwissProt v.2020_05 canonical + isoform sequences; TrEMBL v.2020_05 canonical + isoform sequences) when allowing 0–3 substitutions (fill color of bars). Database sizes are shown at the bottom.
Supplementary Figure S2 | Confidence of Peptide-Spectrum Matches in Non-canonical vs. Canonical Searches. Scatterplot showing peptide confidence after all cardiac tissue mass spectrometry data from PXD006675 were searched in bulk using either a custom isoform database or SwissProt Homo sapiens canonical sequences (v.2020_05, 20370 entries). The x axis shows the spectrum-level −log10 posterior error probability (PEP; higher = more confident identification) for 723 re-identified candidate isoform peptides. Approximately 83% of these peptides were re-identified at PEP ≤ 0.01. In the parallel SwissProt search (y axis), few of these spectra could be confidently assigned to canonical sequences.
Supplementary Figure S3 | Identified Non-canonical Peptides in the AC16 Human Cardiomyocyte Cell Line. 14 non-canonical peptides were detected in AC16 cells (n = 3; biological replicates) by PRM LC-MS analysis.
Supplementary Table S1: Predicted MS2 fragment ions, intensity values, and peptide iRT values from 954 alternative isoform candidates in the heart using Prosit.
Supplementary Table S2: Peptide sequences selected from shotgun DDA MS experiments but not detected in PRM MS runs in the human heart.
Supplementary Table S3: Score differential in best vs. second-best hits of identified non-canonical peptides.
Supplementary Table S4: Best SwissProt canonical peptide matches for detected non-canonical peptides.
Supplementary Data S1 | Prosit Predicted Spectra of Non-Canonical Peptides. Predicted MS2 spectra for 954 non-canonical peptide candidates. Color: b vs. y ions.
Supplementary Data S2 | Spectral Prediction Quality. Each panel compares the predicted spectrum (top) from Prosit for a non-canonical sequence with the experimental spectrum (bottom) acquired from selected heavy-labeled synthetic peptides. Dotp: Skyline dot product score.
Supplementary Data S3 | Predicted and Experimental Spectra of Peptides Used in Scheduled PRM. For each of the 22 detected peptides, the panels show the Skyline-generated view of the predicted spectrum (top) vs. the experimentally detected spectrum (bottom) of an alternative isoform peptide in the human heart.
Supplementary Data S4 | Splice Graphs for Additional PRM Detected Peptides.
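Panel F of Supplementary Figure S1 counts candidates that are absent from each reference database when 0–3 substitutions are allowed. The exact matching procedure is not described here, so the sketch below is an assumed brute-force version: it slides each peptide along every database sequence, counts mismatched positions (substitutions only, no insertions or deletions), and optionally treats isoleucine and leucine as identical because they cannot be distinguished by mass. The toy database sequence and the function names are hypothetical.

```python
def min_substitutions(peptide, proteins, equate_il=True):
    """Fewest substitutions needed to match peptide anywhere in proteins (no indels)."""
    if equate_il:  # I and L have identical mass, so treat them as the same residue
        peptide = peptide.replace("I", "L")
        proteins = [p.replace("I", "L") for p in proteins]
    n = len(peptide)
    best = n
    for protein in proteins:
        for i in range(len(protein) - n + 1):
            mismatches = sum(a != b for a, b in zip(peptide, protein[i:i + n]))
            if mismatches < best:
                best = mismatches
                if best == 0:
                    return 0
    return best


def count_absent(peptides, proteins, max_subs=3):
    """For k = 0..max_subs, count peptides with no database match within k substitutions."""
    distances = [min_substitutions(p, proteins) for p in peptides]
    return {k: sum(d > k for d in distances) for k in range(max_subs + 1)}


if __name__ == "__main__":
    toy_db = ["MSDKLLEAGQWTPEHYRVNSDFLGEK"]  # invented database entry
    print(count_absent(["LLEAGQWTP", "LLEAGAWTP", "WWWWWWWWW"], toy_db))
```

A brute-force scan is adequate for a few hundred peptides against a small database; proteome-scale searches would normally use an index (for example, seeded k-mer lookup) instead.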
Highlights.
PRM-MS identification of translated protein isoforms from alternative splicing
Machine learning prediction aids the development of specific assays for isoform peptides
The workflow is applied to human heart lysate and a human cardiomyocyte cell line
Acknowledgments
This study was supported in part by the NIH NHLBI awards R00-HL127302, R01-HL141278, R21-HL150456 to M.L. and R00-HL144829 to E.L.; the NIH NRSA Postdoctoral Fellowship F32-HL149191 to Y.H.; the University of Colorado Postdoctoral Fellowship in Cardiovascular Research T32-HL007822 to V.D. and Y.H.; the University of Colorado Consortium for Fibrosis Research and Translation Pilot Grant to M.L.; and the Colorado Undergraduate Research Opportunity Program award to J.M.W.
Footnotes
Disclosures
None.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Data Availability
Raw mass spectrometry data files are available on ProteomeXchange (human heart tissue: PXD020074; AC16: PXD023563).
References
- [1] Weeland CJ, van den Hoogenhof MM, Beqqali A, Creemers EE. Insights into alternative splicing of sarcomeric genes in the heart. J Mol Cell Cardiol. 2015;81:107–113.
- [2] Bradbury A, Plückthun A. Reproducibility: Standardize antibodies used in research. Nature. 2015;518(7537):27–29.
- [3] Peterson AC, Russell JD, Bailey DJ, Westphall MS, Coon JJ. Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol Cell Proteomics. 2012;11(11):1475–1488.
- [4] Kennedy JJ, Abbatiello SE, Kim K, Yan P, Whiteaker JR, Lin C, et al. Demonstrating the feasibility of large-scale development of standardized assays to quantify human proteins. Nat Methods. 2014;11(2):149–155.
- [5] van den Broek I, Mastali M, Mouapi K, Bystrom C, Bairey Merz CN, Van Eyk JE. Quality Control and Outlier Detection of Targeted Mass Spectrometry Data from Multiplex Protein Panels. J Proteome Res. 2020;19(6):2278–2293.
- [6] Gessulat S, Schmidt T, Zolg DP, Samaras P, Schnatbaum K, Zerweck J, et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods. 2019;16(6):509–518.
- [7] Tiwary S, Levy R, Gutenbrunner P, Salinas Soto F, Palaniappan KK, Deming L, et al. High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nat Methods. 2019;16(6):519–525.
- [8] Xu R, Sheng J, Bai M, Shu K, Zhu Y, Chang C. A Comprehensive Evaluation of MS/MS Spectrum Prediction Tools for Shotgun Proteomics [published online ahead of print, 2020 Jun 23]. Proteomics. 2020;e1900345.
- [9] Yang Y, Liu X, Shen C, Lin Y, Yang P, Qiao L. In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun. 2020;11(1):146. Published 2020 Jan 9.
- [10] Lou R, Tang P, Ding K, Li S, Tian C, Li Y, et al. Hybrid Spectral Library Combining DIA-MS Data and a Targeted Virtual Library Substantially Deepens the Proteome Coverage. iScience. 2020;23(3):100903.
- [11] Lau E, Han Y, Williams DR, Thomas CT, Shrestha R, Wu JC, et al. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Rep. 2019;29(11):3751–3765.e5.
- [12] Han Y, Wright JM, Lau E, Lam MPY. Determining Alternative Protein Isoform Expression using RNA Sequencing and Mass Spectrometry. STAR Protocols. 2020;1:100138.
- [13] MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics. 2010;26(7):966–968.
- [14] Luo W, Hu H, Chang R, Zhong J, Knabel M, O’Meally R, et al. Pyruvate kinase M2 is a PHD3-stimulated coactivator for hypoxia-inducible factor 1. Cell. 2011;145(5):732–744.
- [15] Edwards AM, Isserlin R, Bader GD, Frye SV, Willson TM, Yu FH. Too many roads not taken. Nature. 2011;470(7333):163–165.