Significance
An important goal in precision oncology is the identification of biomarkers and therapeutic targets. We identified and annotated a compendium of N-glycoproteins, an attractive class of proteins with potential to serve as cancer biomarkers and therapeutic targets, from diverse human lymphoid neoplasms. In anaplastic lymphoma kinase-positive (ALK+) anaplastic large cell lymphoma (ALCL), integration of N-glycoproteomics and transcriptome sequencing revealed an underappreciated and targetable ALK-regulated cytokine/receptor signaling network, highlighting the utility of functional proteogenomics for discovery of cancer biomarkers and therapeutic targets.
Keywords: proteomics, RNA-seq, CRISPR screen, biomarkers, lymphoma
Abstract
Identification of biomarkers and therapeutic targets is a critical goal of precision medicine. N-glycoproteins are a particularly attractive class of proteins that constitute potential cancer biomarkers and therapeutic targets for small molecules, antibodies, and cellular therapies. Using mass spectrometry (MS), we generated a compendium of 1,091 N-glycoproteins from 40 human primary lymphomas and cell lines. Hierarchical clustering revealed distinct subtype signatures that included several subtype-specific biomarkers. Orthogonal immunological studies in 671 primary lymphoma tissue biopsies and 32 lymphoma-derived cell lines corroborated MS data. In anaplastic lymphoma kinase-positive (ALK+) anaplastic large cell lymphoma (ALCL), integration of N-glycoproteomics and transcriptome sequencing revealed an ALK-regulated cytokine/receptor signaling network, including vulnerabilities corroborated by a genome-wide clustered regularly interspaced short palindromic repeats (CRISPR) screen. Functional targeting of IL-31 receptor β, an ALCL-enriched and ALK-regulated N-glycoprotein in this network, abrogated ALK+ALCL growth in vitro and in vivo. Our results highlight the utility of functional proteogenomic approaches for discovery of cancer biomarkers and therapeutic targets.
The discovery of biologically relevant biomarkers and therapeutic targets is a critical goal of precision medicine. Integrative strategies combining multiple large-scale analyses, such as genomics, transcriptomics, and proteomics, offer complementary opportunities for the elucidation of novel pathogenic insights and discovery of functional modules and networks that are dysregulated in disease. Although the genomic aberrations associated with major types of primary cancer are being extensively characterized, the proteomic signatures of many cancers, including lymphomas, are unknown.
Protein glycosylation is one of the most common posttranslational modifications (PTMs) and plays important roles in many biological processes. N-glycosylation occurs on the amide nitrogen of asparagine residues (1, 2) and facilitates protein trafficking to membranes and secretion into the extracellular environment (3). N-glycoproteins are already used as diagnostic/prognostic biomarkers in clinical practice. The N-glycoproteins include many cluster of differentiation (CD) proteins that define hematopoietic subpopulations, as well as hematopoietic neoplasms (4). Many of these proteins have been found to represent useful targets for therapeutic antibodies, immunotoxins, and chimeric antigen receptor T cells (5–8). Thus, large-scale characterization of N-glycoproteins is appealing for discovering biomarkers and therapeutic targets.
To identify novel biomarkers and therapeutic targets in lymphomas, we performed agnostic mass spectrometry (MS)-based profiling of N-glycoproteins of 13 distinct subtypes of lymphomas. Orthogonal immunophenotypic studies in primary lymphoma samples corroborated MS-based results and revealed several biomarker candidates. We focused on anaplastic lymphoma kinase-positive (ALK+) anaplastic large cell lymphoma (ALCL) to obtain a more complete view of the “regulome” controlled by oncogenically activated ALK. To this end, we integrated RNA sequencing (RNA-seq) with N-glycoproteomic data to reveal ALK-regulated functional signaling networks. We functionally validated the expression and oncogenic role of selected components as biologically relevant therapeutic targets in ALK+ALCL. Recently, large-scale analysis of cancer susceptibility genes has been greatly facilitated by using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system in combination with genome-wide, single-guide RNA (sgRNA) libraries to identify genes essential for cancer cell survival (9). Accordingly, we integrated the results of a genome-wide, CRISPR-Cas9–mediated vulnerability screen, which confirmed several susceptible therapeutic targets, including proteins in the signaling axes identified by proteogenomic integration. Our findings demonstrate the advantages of integration of multiomics data for the discovery of novel cancer biomarkers.
Results
Unbiased Analysis of N-Glycoproteomes of Human Lymphomas.
We performed solid-phase extraction of glycoproteins (SPEG) (10), followed by identification of PNGase-F deglycosylated peptides using liquid chromatography (LC)-tandem MS (MS/MS) (Fig. S1A). Label-free spectral counting (11) was used to quantify the relative abundance of N-glycoproteins in 32 human cell lines representing 13 subtypes of lymphoma (Figs. S1B and S2A). The data have been deposited to the ProteomeXchange Consortium (proteomecentral.proteomexchange.org/cgi/GetDataset) via the PRIDE (PRoteomics IDEntifications database) partner repository (12) with the dataset identifier PXD003469. In total, 1,091 unique N-glycoproteins were identified. The list of N-glycoproteins and their observed spectral counts is provided in Dataset S1. Overall, 69.5% of the identified N-glycoproteins were assigned to membrane compartments, of which 52.3% could be localized to the plasma/integral to membrane fraction. The endoplasmic reticulum, Golgi, and lysosome membranes were also highly represented (20.3%), whereas 10.1% were annotated as secreted proteins (Fig. S2B). Using the SOSUI transmembrane domain (TMD) prediction algorithm (13), we noted that 64.6% of the membrane proteins contained one to two TMDs. Using the MotifX algorithm (14), we confirmed that the most significantly enriched motif in all N-glycoproteins was the classical N-glycosylation sequence: N-!P-[S/T]. Another significantly enriched motif was a subset of the N-!P-S motif, where Y at the −1 position was overrepresented. Finally, gene set enrichment analysis identified proteins involved in “cell adhesion,” “signal transduction,” “immune response,” and “receptor activity” as significantly represented in our dataset (Fig. S2 C and D).
N-Glycoproteomic Profiles Classify Lymphomas According to Lineage, Cell of Origin, and Subtype.
Unsupervised clustering using 751 unique N-glycoproteins that met stipulated criteria (Methods) discriminated T/NK (natural killer)-cell neoplasia and B-cell lymphomas (Fig. 1A). Within the T/NK-cell group, all seven ALCL cell lines, all three NK-cell lymphoma cell lines, and the two cutaneous T-cell lymphoma cell lines formed distinct clades. Similarly, in the B-cell lymphoma group, classical Hodgkin lymphoma (cHL) and non-Hodgkin lymphoma (NHL) cell lines formed two distinct clades. Among the B-cell NHLs, cell lines representative of pregerminal center (pre-GC) lymphoid proliferation [mantle-cell lymphoma (MCL)] grouped together, whereas GC-derived cell lines [Burkitt lymphoma (BL), transformed follicular lymphoma (t-FL), and diffuse large B-cell lymphoma (DLBCL)] formed a distinctive subgroup. The data show that pre-GC and GC-derived B-lymphomas have distinct N-glycoproteomic profiles.
To assess the diagnostic validity of this approach, we performed N-glycoproteomic profiling on eight primary lymphoma samples in a blinded fashion and evaluated the correlation of their N-glycoprotein profile with the cell lines. The glycoproteomic profiles of clinical samples of patients (P)1–P4 highly correlated with MCL cell lines, whereas the profiles of clinical samples P5–P8 matched with GC-derived NHL cell lines (Fig. 1B). These observations were in complete agreement with the clinical diagnosis rendered for these samples; clinical samples P1–P4 represented well-characterized cases of MCL, and clinical samples P5–P8 represented well-characterized cases of FL (Table S1). These data indicate that N-glycoproteomic profiles can accurately cluster primary lymphoma specimens into appropriate diagnostic categories.
Table S1.
Identifier code | Diagnosis | Age, y (sex) | Hb, g/dL | PCV, % | Fe, mg/dL | LDH, U/L | Ig gene rearrangement | Cytogenetics (classical karyotype or FISH) | Immunophenotype (immunohistochemistry and/or flow cytometry) | Sites of involvement | Stage at diagnosis |
P1 | MCL | 67 (M) | 13.9 | 39.5 | ND | ND | Monoclonal IgH | CCND1-IgH+ (dual-fusion FISH) | CD20+, CD79a+, CD3−, CD5+, CD23−, Cyclin D1+, BCL2+ | Peripheral blood; bone marrow; left cervical lymph node | IV |
P2 | MCL | 68 (M) | 12 | 37 | ND | 198 | Monoclonal IgH | Complex translocation involving chr. 11 and chr. 14; CCND1-IgH+ (dual-fusion FISH) | CD20+, CD79a+, CD3−, CD5+, CD23−, Cyclin D1+, CD43+, CD10− | Peripheral blood; bone marrow; right supraclavicular lymph node | IV |
P3 | MCL | 75 (F) | 11.9 | 37 | ND | 301 | Monoclonal IgH | ND | CD20+, CD19+, CD3−, CD5+, CD23−, Cyclin D1+, CD10−, FMC7+, Igκ-restricted | Peripheral blood; bone marrow; right cervical lymph node | IV |
P4 | MCL | 55 (F) | 15.8 | 46.9 | ND | 301 | Monoclonal IgH | CCND1-IgH+ (dual-fusion FISH) | CD20+, CD3−, CD2−, CD5+, CD23−, Cyclin D1+, CD10−, CD43+, CD45RO− | Peripheral blood; bone marrow; left axillary lymph node; abdomen | IV |
P5 | FL | 62 (F) | 14.9 | 44.5 | NA | NA | Monoclonal IgH | BCL2-IgH+ (dual-fusion FISH) | CD20+, CD3−, CD10+, BCL2+ | Peripheral blood; left cervical lymph node; abdomen | III |
P6 | FL | 79 (M) | 10.3 | 30.7 | 30 | 186 | Monoclonal IgH | ND | CD20+, CD19+, CD3−, CD4−, CD5−, CD7−, CD10+, BCL2++, Igκ-restricted | Peripheral blood; bone marrow; cervical lymph node; axillary lymph node | IV |
P7 | FL | 63 (F) | 13.1 | 40.3 | 83 | ND | ND | BCL2-IgH+ (dual-fusion FISH) | CD20+, CD79a+, CD3−, CD5−, CD45+, CD10+, BCL2+, BCL6+ | Right cervical lymph node | IV |
P8 | FL | 51 (M) | NA | NA | NA | NA | NA | NA | CD20+, CD3−, CD5−, CD10+, BCL2+ | Left cervical lymph node | NA |
chr., chromosome; F, female; Hb, hemoglobin; LDH, lactate dehydrogenase; M, male; NA, not available; ND, not determined; PCV, packed cell volume.
N-Glycoproteomic Analysis Accurately Identifies All Clinically Relevant CD Proteins.
Of the 417 CD proteins listed by the Human Cell Differentiation Molecules consortium (www.hcdm.org/), we identified 194 (46.5%) CD proteins in our N-glycoproteomic dataset (Dataset S1). We identified between 52 and 102 CD proteins per WHO subtype (average of 79) (Fig. S3A). Importantly, we detected virtually all CD proteins currently used for diagnostic evaluation of lymphomas. Pan–B-cell markers (CD19, CD22, CD79a, and CD79b) were appropriately identified in cell lines representative of MCL, t-FL, DLBCL, primary mediastinal B-cell lymphoma (PMBL), and nodular lymphocyte predominant Hodgkin lymphoma (NLPHL), but not in cell lines of T/NK-cell origin or cell lines representative of cHL, which typically lack expression of these markers. In addition, CD10 was identified in BL-, DLBCL-, PMBL-, and t-FL–derived cell lines but not in any other B-cell neoplasia or in any cell lines of T/NK-cell origin (Dataset S1). Furthermore, CD30 was detected, as expected, in cell lines representative of cHL, PMBL, ALCL (ALK+ and ALK−), Sézary syndrome, mycosis fungoides, and aggressive NK-cell leukemia (15) (Fig. S3B). Moreover, the spectral counts for CD30 were consistent with the relative intensities observed by Western blot analysis of cHL, PMBL, and ALK+ALCL cell lines (15). As is commonly observed in T/NK-cell neoplasia, the pan–T-cell markers (CD2, CD3, CD4, CD5, and CD7) were frequently undetected, singly or in combination, in T/NK-cell–derived lymphoma cell lines (16). CD56 was detected in two of the three aggressive NK-cell lymphoma cell lines (Dataset S1). Correspondingly, the primary lymphoma samples uniformly showed expression of the pan–B-cell markers CD79a and CD79b. Notably, CD10 was detected exclusively in the FL samples and not in the MCL samples, in keeping with the expected immunophenotypic profiles of the two B-cell lymphoma subcategories. Taken together, our analysis of N-glycoproteins represents a compendium of CD proteins expressed in lymphomas.
The expression patterns of the vast majority of these proteins have not previously been described in the diverse lymphoma subtypes. Their identification in this study offers opportunities for their exploration as diagnostic biomarkers and therapeutic targets.
Immunophenotypic Validation of Selected N-Glycoproteins in Primary Clinical Samples.
For further orthogonal validation, we selected three proteins whose expression is underexplored in mature B-cell lymphomas and were observed by MS to be differentially expressed between t-FL–, BL-, and MCL-derived cell lines: CD44, Paraoxonase 2 (PON2), and CD276 (Fig. S4A). CD44 was identified in all MCL cell lines but absent or largely negative in BL and t-FL cell lines. These observations were confirmed by Western blotting and flow cytometry analyses (combined Fisher’s P value < 1e-10; Fig. S4 A and B). PON2 was also identified consistently in all MCL cell lines, although its expression was more limited in other lymphomas. Western blot analysis and spectral counts of PON2 demonstrated expression in MCL among the B-cell lymphomas and in ALCL cell lines (Fig. S4A). PON2 protein expression in lymphomas has not received notable attention in the literature, although differential expression of PON2 mRNA has been suggested in a single report (17). We further analyzed the expression of CD44 and PON2 by immunohistochemistry on lymphoma tissue microarrays containing 34 clinical biopsy specimens and revealed that 91.2% of MCL expressed CD44 and 58.8% of MCL expressed PON2 (Fig. S4C). As shown in Fig. S4A, CD276 was expressed in a subset of MCL and ALK+ALCL cell lines but not in GC-derived lymphomas, such as t-FL and BL. We observed significant concordance (combined Fisher’s P value < 1e-9) between spectral counts, Western blot analysis, and flow cytometric analysis for CD276 (Fig. S4D). This observation was corroborated by the immunohistochemistry of primary lymphoma tissue biopsies (n = 644), which showed that 35.3% of MCL and 18.8% of ALK+ALCL expressed CD276, whereas only rare cases of other lymphomas demonstrated CD276 expression (P < 1e-4; Fig. S4E).
Taken together, our MS-based profiling of N-glycoproteins identified several potential biomarkers of which a subset was further orthogonally validated in a large cohort of primary lymphoma tissue biopsies. Some of these potential biomarkers demonstrated highly selective expression in some subtypes of lymphoma. Using ALK+ALCL as a model, we investigated whether restricted expression may be linked to the pathogenesis of these subtypes, and performed functional studies to assess the candidacy of a selected protein as a vulnerability target.
Integrative Transcriptomics and Proteomics Reveal ALK-Activated Cytokine Receptor Network in ALCL.
To identify novel biologically relevant N-glycoproteins that play critical roles in the pathogenesis of ALK+ALCL, we used a strategy that integrates N-glycoproteomics and transcriptomics (RNA-seq). N-glycoproteomic analysis of ALK+ALCL identified several proteins with “cytokine receptor” activity, including inflammatory cytokine receptors [IL-1 receptor 1 (IL-1R1), IL-1R2, and IL-1RAP], Th1 cytokine receptors (IL-2Rα and IL-18R), Th2 cytokine receptor (IL-4Rα), and Th17 cytokine/cytokine receptors (IL-17, IL-17Rα, IL-22, IL-6Rα, IL-6Rβ, IL-31Rα, and IL-31Rβ) (Dataset S1).
To determine whether the expression of the cytokine receptor signaling network is regulated by ALK activity, we performed N-glycoproteomic profiling of two ALK+ALCL cell lines (Karpas 299 and SU-DHL-1) with or without treatment for 9 h with CEP-26939 (300 nM), an ALK inhibitor (ALKi). We observed that 65 and 90 N-glycoproteins exhibited decreased expression in Karpas 299 and SU-DHL-1, respectively, upon treatment; 11 N-glycoproteins decreased in both cell lines, including interleukin receptors such as IL-31Rβ (Fig. S5). These results indicate that expression of these glycoproteins is regulated by ALK activity.
We also performed RNA-seq analysis of these two ALK+ALCL cell lines treated with crizotinib (100 nM), a US Food and Drug Administration-approved ALKi (18). The RNA-seq data were deposited in the Gene Expression Omnibus (19) with the accession number GSE81301. We observed that 1,244 transcripts were altered by ALK inhibition in both cell lines, with 400 transcripts being induced and 844 being repressed (Fig. S6A). In either cell line, the ALK-induced transcripts were significantly enriched for members of cytokine/receptor-STAT signaling pathways (Fig. S6 B and C). Furthermore, gene set enrichment analysis (GSEA) (20) revealed that transcripts involved in IL-2–STAT5, IL-6–STAT3 and TNF-α signaling pathways were significantly down-regulated after ALK inhibition, suggesting up-regulation of these pathways in ALK+ALCL (Fig. S6 D and E).
Integration of N-glycoproteomic and transcriptomic datasets revealed both corroborative and complementary information about the pathobiology of ALK+ALCL. The Spearman’s correlations between the steady-state mRNA and N-glycoprotein abundance were 0.50 and 0.49 in Karpas 299 and SU-DHL-1, respectively, which is comparable to previous reports (21) and suggests that RNA-seq and N-glycoproteomic dataset analyses may yield complementary results. Approximately 13% of the gene products were only measured at the N-glycoprotein level (Dataset S2), whereas more than 34% of the CD genes measured by either method were not detected by N-glycoproteomics, suggesting the importance of multiomic approaches for comprehensive analyses of cellular processes (Dataset S2). Despite these differences, in both Karpas 299 and SU-DHL-1, the N-glycoproteins induced by ALK were enriched in the ALK-regulated transcripts, indicating concordance between transcriptional and N-glycoproteomic levels [false discovery rate (FDR) < 1e-2; Fig. 2 A and B].
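Spearman's correlation between mRNA and N-glycoprotein abundance is the Pearson correlation of the two rank vectors. The following minimal sketch illustrates the calculation; the function names and the paired values are hypothetical, and tied values are assumed to receive the mean of their ranks.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired abundances for five gene products (illustration only).
mrna_fpkm = [1.0, 5.0, 12.0, 40.0, 300.0]
glyco_counts = [0, 3, 2, 10, 25]
print(round(spearman(mrna_fpkm, glyco_counts), 3))
```

Because only ranks enter the calculation, the statistic is insensitive to the different dynamic ranges of FPKM values and spectral counts.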
Integrative unsupervised clustering of N-glycoproteomic and transcriptomic datasets of both cell lines revealed 36 gene products that were concordantly decreased as a result of ALK inhibition in transcriptomic and glycoproteomic data of both cell lines (Fig. 2C). Several of these gene products are members of cytokine/receptor families, such as IL-1R1, IL-1R2, IL-2Rα, IL-4R, IL-18R1, and IL-31Rβ and STAT signaling networks (IL-6–STAT3 and IL-2–STAT5) (Fig. 2D). Reconstruction of the protein interaction network of these cytokine receptors revealed that several key effectors, such as STAT3, STAT1, JAK1/2, and STAT5B, are also induced by ALK in ALCL (Fig. 2E).
Identification of Selective Expression of IL-31Rβ in ALK+ALCL, Regulation by ALK Activity, and Contribution to ALK–Mediated Oncogenesis.
To investigate the validity of novel biomarkers identified by N-glycoproteomic and transcriptomic studies further, we investigated IL-31Rβ, a cytokine receptor we noted to be exclusively expressed in ALK+ALCL cell lines by N-glycoproteomics (Dataset S1) and regulated by ALK through transcription. Western blotting and flow cytometry results confirmed the selective expression of IL-31Rβ in all ALK+ALCL cell lines (Fig. 3A and Fig. S7A). Immunohistochemistry on 56 ALCL biopsies (ALK+ and ALK−) revealed that 100% of ALK+ALCL cases expressed IL-31Rβ, whereas only 40% of ALK−ALCL cases were positive (χ² = 20.6, P < 1e-3; Fig. 3B). Given the selectivity of its expression in ALK+ALCL, we sought to assess its regulation by ALK. Pharmacological inhibition of ALK by two inhibitors (crizotinib and CEP-26939) decreased IL-31Rβ protein expression in a dose- and time-dependent manner (Fig. 3C and Fig. S7 B and C). To substantiate the effect of ALK activity on IL-31Rβ expression further, we ectopically expressed nucleophosmin–anaplastic lymphoma kinase (NPM-ALK) and its kinase-dead mutant NPM-ALK K210R in HeLa cells. As shown in Fig. S7D, NPM-ALK, but not the kinase-dead mutant K210R, induced IL-31Rβ expression, confirming the regulation of IL-31Rβ expression by ALK. Using quantitative real-time PCR, we confirmed that ALK regulates IL-31Rβ expression at the transcriptional level (Fig. S7 E and F). We also demonstrated that the transcriptional expression of oncostatin M (OSM), the ligand of IL-31Rβ, is dependent on ALK activity (Fig. 2E and Fig. S6A), and we corroborated this finding at the protein level by Western blotting (Fig. 3C). Interestingly, OSM partially rescued the phosphorylation of STAT3, indicative of its activation, when ALK was inhibited (Fig. S7G). This observation suggests that the OSM/IL-31Rβ/STAT3 signaling axis (22) might function as a positive regulatory loop to maintain consistent STAT3 activation in ALK+ALCL (23) (Fig. S7H).
To determine if IL-31Rβ is a functionally relevant target in ALK+ALCL, we silenced IL-31Rβ using lentivirus-mediated RNAi in a DEL cell line. IL-31Rβ silencing (Fig. 3D) resulted in decreased clonogenicity of ALK+ALCL cells (Fig. 3E). Additionally, tumor growth in a xenograft model was abrogated in ALK+ALCL depleted of IL-31Rβ by RNAi, suggesting that this protein is a functionally relevant therapeutic target in ALK+ALCL (Fig. 3F).
CRISPR-Cas9 Screen Reveals Vulnerabilities in Cytokine Receptor Signaling Pathways in ALK+ALCL.
Although the IL-6–STAT3 and IL-2–STAT5 signaling pathways were found to be significantly down-regulated by ALK inhibition (Fig. 2D), this finding does not establish that they represent therapeutic vulnerabilities in ALK+ALCL. To address this question, we performed a genome-wide functional screen in two ALK+ALCL cell lines (SUP-M2 and Karpas 299) using a lentiviral sgRNA library (9) that targets 14,250 genes in the human genome with an average coverage of three to four sgRNAs per gene. Monte Carlo analysis assessing major ALK-dependent signaling pathways in aggregate revealed that the IL-6–STAT3 and IL-2–STAT5 pathways are significant sensitivity targets in ALK+ALCL (Fig. S8A). The sgRNA screen showed that members of these pathways, such as OSM (the ligand for IL-31Rβ) and STAT3, are susceptible targets (Fig. S8B).
Discussion
The advent of large-scale technologies has dramatically advanced the understanding of cancer pathogenesis. However, most studies involve single-platform profiling approaches that provide a unidimensional view of the pathobiological mechanisms intrinsic to disease states. In contrast, integrative approaches, such as proteogenomics, offer the opportunity to understand better the complex biological networks that underlie the pathogenesis of higher order biological processes, such as cancer. Beyond large-scale annotation of genomic, transcriptomic, and proteomic profiles, the ability to undertake large-scale genetic screens of lethal phenotypes using the CRISPR system greatly facilitates the identification of critical functional genes that could be exploited for precision therapeutics.
We used an MS-based approach to generate a compendium of N-glycoproteins expressed in human lymphomas. Our study revealed a large number of proteins (>1,000) involved in various cellular functions. Importantly, numerous candidate biomarkers and therapeutic targets were identified, and selected candidates were orthogonally and functionally validated.
Unsupervised clustering of the N-glycoproteomic data readily segregated the 32 cell lines based on lineage of origin and respective subtype of lymphoma. Importantly, primary clinical samples were appropriately classified with cell lines representative of their respective cell of origin and lymphoma subtypes based on N-glycoprotein profiles. In all, the N-glycoproteomic signatures were robust and discriminated a wide range of closely related tumor subtypes and yielded several candidate biomarkers for distinct subtypes of lymphomas.
To gain insights into the relationship between RNA expression and N-glycoproteomic profiles and to identify candidate diagnostic biomarkers, we focused on ALCL harboring chimeric fusions involving ALK. We integrated RNA-seq and N-glycoproteomic data to investigate changes mediated by oncogenic ALK activity. This approach highlighted a number of signaling modules regulated by ALK activity, including the inflammatory, IFN-γ, and IFN-α responses, as well as pathways regulating hypoxia adaptation. In particular, the top-ranking pathways were the cytokine/receptor regulatory network involving several interleukins, their receptors, and the JAK-STAT signaling axis. Members of this network, including STAT3, were identified by CRISPR susceptibility screens as vulnerability targets in ALK+ALCL.
To validate the cytokine/receptor-STAT pathway functionally in ALK+ALCL, we followed up on our MS observation of restricted expression of IL-31Rβ in ALK+ALCL N-glycoproteomes. The expression of IL-31Rβ, an IL-6 receptor family member (24), was confirmed by immunohistochemistry of primary ALK+ALCL tissues. Further, we showed that IL-31Rβ is regulated by ALK activity, that it plays an important role in the pathogenesis of ALK+ALCL, and that it may be used as a biomarker for this disease.
In conclusion, we demonstrated that integration of proteomics and functional genomic analyses yields complementary information that informs our understanding of complex biological processes. Additionally, the N-glycoproteomic data presented herein represent a compendium of candidate biomarkers and a valuable resource for the investigation of the biological roles and functional significance of specific glycoproteins in lymphomas. Taken together, these results suggest a model wherein NPM-ALK–driven signaling promotes an autocrine–paracrine network through induction of cytokines and their receptors via activation of STAT family transcription factors. Our data suggest that several glycoproteins demonstrate characteristic and unique expression signatures in clinicopathologically distinct forms of human lymphomas. Overall, our results indicate that integration of multidimensional data of cancer cells offers opportunities for novel biomarker discovery and identification of therapeutic targets.
Methods
Cell Lines.
All cell lines were grown in RPMI 1640 medium supplemented with 10% FBS except for Hut-78, which was grown in Iscove’s modified Dulbecco’s medium (IMDM) supplemented with 10% FBS.
Clinical Samples.
Tumor cells were enriched from peripheral blood of eight patients using an immunomagnetic bead negative selection mixture (EasySep; STEMCELL Technologies). A total of 671 primary tumor biopsies were selected, and representative areas of different lymphomas were included in tissue microarrays, followed by immunohistochemistry studies. This study was approved by the Institutional Review Board of the University of Michigan (HUM00023256). No informed consent was required for this retrospective study.
N-Linked Glycopeptide Enrichments and LC-MS/MS Analysis.
The SPEG protocol previously described was used to isolate N-glycosylated peptides from whole-cell lysates (10). An LTQ OrbitrapXL (ThermoFisher) in-line with Paradigm MS2 HPLC (Michrom BioResources, Inc.) was used for acquiring high-resolution MS and MS/MS data. Each cell line was analyzed in biological replicate format until at least three biological replicates with a glycocapture efficiency of ≥90%, defined as the percentage of identified peptides having at least one deamidated Asn residue, were obtained.
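The replicate acceptance rule above can be written out as a short calculation. This is a minimal sketch under stated assumptions: each identified peptide is represented only by its number of deamidated Asn residues, and the function names, replicate labels, and counts are hypothetical.

```python
def glycocapture_efficiency(deamidated_asn_counts):
    """Percentage of identified peptides carrying at least one deamidated Asn.

    deamidated_asn_counts holds one integer per identified peptide.
    """
    if not deamidated_asn_counts:
        raise ValueError("no identified peptides in this replicate")
    hits = sum(1 for count in deamidated_asn_counts if count >= 1)
    return 100.0 * hits / len(deamidated_asn_counts)


def passing_replicates(replicates, threshold=90.0):
    """Names of replicates whose glycocapture efficiency is >= threshold."""
    return [
        name
        for name, counts in replicates.items()
        if glycocapture_efficiency(counts) >= threshold
    ]


# Hypothetical replicates: 10 identified peptides each (illustration only).
replicates = {
    "rep1": [1, 1, 2, 1, 1, 1, 1, 1, 1, 0],  # 90%
    "rep2": [1, 0, 0, 1, 1, 1, 1, 1, 1, 1],  # 80%
    "rep3": [1] * 10,                        # 100%
    "rep4": [1, 1, 1, 1, 2, 1, 1, 1, 1, 1],  # 100%
}
accepted = passing_replicates(replicates)
print(accepted, len(accepted) >= 3)
```

In this example a fourth replicate is needed because one of the first three falls below the 90% threshold.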
Proteomic Database Searches.
RAW files were converted to mzXML using the ReAdW conversion tool from the Trans-Proteomic Pipeline (25) and then searched against the human UniProt database (release 15.15). Searches were carried out using X!Tandem with a k-score plug-in and the following parameters: (i) parent and daughter ion mass tolerance windows were set to 50 ppm and 0.8 Da, respectively; (ii) maximum of two missed cleavages; and (iii) variable modifications: oxidized methionine, carbamidomethyl cysteine, and +0.9840 Da on Asn (reflecting conversion of glycosylated Asn to Asp upon PNGaseF-mediated deglycosylation). X!Tandem results were postprocessed using PeptideProphet and ProteinProphet (26, 27).
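Two of the search parameters can be checked with simple arithmetic: the +0.9840 Da modification equals the difference between the monoisotopic residue masses of Asp and Asn, and the 50 ppm parent ion window is a relative mass error. The sketch below is illustrative only and does not reproduce X!Tandem internals; the function names and the example masses are hypothetical.

```python
# Monoisotopic residue masses (Da), computed from elemental composition.
ASN_RESIDUE = 114.042927  # C4H6N2O2
ASP_RESIDUE = 115.026943  # C4H5NO3

# Mass shift when a formerly glycosylated Asn is converted to Asp by PNGase F.
DEAMIDATION_SHIFT = ASP_RESIDUE - ASN_RESIDUE


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_parent_tolerance(observed_mass, theoretical_mass, tolerance_ppm=50.0):
    """True when the parent ion mass error lies inside the ppm window."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


print(round(DEAMIDATION_SHIFT, 4))
# Hypothetical masses: errors of about 40 ppm and 60 ppm at 1,500 Da.
print(within_parent_tolerance(1500.06, 1500.0))
print(within_parent_tolerance(1500.09, 1500.0))
```

At 1,500 Da, the 50 ppm window corresponds to 0.075 Da, which is far narrower than the 0.8 Da daughter ion window.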
Extraction of Spectral Counts.
Adjusted spectral counts were extracted using Abacus with the following parameters: (i) PeptideProphet probability ≥ 0.8, (ii) combined file probability ≥ 0.7, and (iii) only peptides containing a modified Asn residue were considered (11). Normalization was performed using a modified version of the normalized spectral abundance factor (NSAF) (28). Our version of NSAF (which we term gNSAF) normalized protein spectral counts to a protein’s glycopeptide length (glyco-length) rather than to its full length. A protein’s glyco-length is defined as the total length of all its tryptic peptides that contain the glycosylation motif Nx[S/T/C].
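The gNSAF definition can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: trypsin is taken to cleave after K or R except before P with no missed cleavages, "x" is read literally as any residue, the motif must lie entirely within one peptide, and proteins with a glyco-length of zero are omitted. All names and sequences are hypothetical.

```python
import re

# Literal reading of Nx[S/T/C]; use r"N[^P][STC]" to exclude proline at x.
GLYCO_MOTIF = re.compile(r"N.[STC]")


def tryptic_peptides(sequence):
    """In silico digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def glyco_length(sequence):
    """Total length of the tryptic peptides that contain the motif."""
    return sum(len(p) for p in tryptic_peptides(sequence) if GLYCO_MOTIF.search(p))


def gnsaf(spectral_counts, sequences):
    """(count / glyco-length) for each protein, scaled to sum to 1 per sample."""
    ratios = {}
    for protein, count in spectral_counts.items():
        length = glyco_length(sequences[protein])
        if length > 0:
            ratios[protein] = count / length
    total = sum(ratios.values())
    return {protein: ratio / total for protein, ratio in ratios.items()}


# Hypothetical sequences and counts (illustration only).
sequences = {"protA": "MKNGSAAKLLR", "protB": "NVTAGKNLSDEFRAAK"}
counts = {"protA": 12, "protB": 13}
print({p: glyco_length(s) for p, s in sequences.items()})
print(gnsaf(counts, sequences))
```

Dividing by glyco-length rather than full protein length avoids penalizing large proteins that carry only a few observable glycopeptides.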
Hierarchical Clustering.
Throughout the analysis, the R (v3.2.3) (29) function hclust was used for hierarchical clustering and the pheatmap package was used for generating heat maps. Hierarchical clustering of cell lines was conducted on the log2-transformed gNSAF values using Euclidean distance and ward.D2 linkage. Given the ordered cell lines, protein expressions were clustered using the Euclidean distance and average linkage. To minimize the influence of low-level signals in the nomination of biomarker candidates, hierarchical clustering was performed with N-glycoproteins with spectral count sums across all cell lines that were greater than the average spectral counts among all the cell lines. A total of 751 N-glycoproteins met this criterion. Pearson correlation between the log2-transformed gNSAF values of each patient sample and 32 cell lines was calculated and used for clustering based on average linkage of Euclidean distance.
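The comparison of a patient sample with the cell lines can be illustrated by computing its correlation profile, which is the input to the subsequent clustering. The sketch below covers only the correlation step, not the hierarchical clustering itself. The small offset added before the log2 transform is an assumption (the handling of zero values is not stated), and all names and values are hypothetical.

```python
import math


def log2_profile(gnsaf_values, offset=1e-6):
    """log2 transform; the offset guarding against log2(0) is an assumption."""
    return [math.log2(value + offset) for value in gnsaf_values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_profile(sample, cell_lines):
    """Pearson correlation of one sample with every cell line (same protein order)."""
    sample_log = log2_profile(sample)
    return {
        name: pearson(sample_log, log2_profile(values))
        for name, values in cell_lines.items()
    }


# Hypothetical gNSAF values for four proteins (illustration only).
cell_lines = {
    "MCL_line": [0.30, 0.02, 0.10, 0.001],
    "GC_line": [0.01, 0.25, 0.02, 0.20],
}
patient = [0.28, 0.03, 0.12, 0.002]
profile = correlation_profile(patient, cell_lines)
print(profile, max(profile, key=profile.get))
```

Each patient sample is thus summarized by one correlation value per cell line, and samples with similar profiles lie close together in Euclidean distance.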
RNA-Seq.
Paired-end RNA-seq reads were aligned to a human reference sequence (February 2009, GRCh37/hg19) using STAR (v2.4.2a) (30). Uniquely mapped reads were quantified by Subread (v1.5.0) and normalized to calculate fragments per kilobase per million reads (FPKM). The log2-fold change was calculated as the log2 ratio of CEP-26939 and DMSO-treated samples with a pseudocount of 0.01 after removing the genes with less than 1 FPKM expression. The enrichment of the Hallmark gene sets from MSigDB (v5.1) (20) in the genes with a log2-fold change lower than −0.8 after ALK inhibition was computed and reported with an FDR q-value cutoff of 0.05. A GSEA preranked tool (v2.2.1) with 10,000 permutations was used with a ranked gene metric of log2-fold change and the Hallmark gene sets.
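The fold-change calculation and the −0.8 cutoff can be sketched as below. One detail is an assumption: a gene is removed only when its expression is below 1 FPKM in both the treated and the control sample, because the text does not specify which sample the filter refers to. Gene labels and FPKM values are hypothetical.

```python
import math


def log2_fold_changes(treated_fpkm, control_fpkm, min_fpkm=1.0, pseudocount=0.01):
    """log2((treated + pseudocount) / (control + pseudocount)) per gene.

    Genes below min_fpkm in both samples are dropped (assumed filter rule).
    """
    changes = {}
    for gene in treated_fpkm.keys() & control_fpkm.keys():
        treated, control = treated_fpkm[gene], control_fpkm[gene]
        if max(treated, control) < min_fpkm:
            continue
        changes[gene] = math.log2((treated + pseudocount) / (control + pseudocount))
    return changes


def repressed_genes(changes, cutoff=-0.8):
    """Genes whose log2-fold change after inhibitor treatment is below the cutoff."""
    return sorted(gene for gene, value in changes.items() if value < cutoff)


# Hypothetical FPKM values (illustration only).
control = {"GENE_A": 40.0, "GENE_B": 120.0, "GENE_C": 900.0, "GENE_D": 0.3}
treated = {"GENE_A": 10.0, "GENE_B": 30.0, "GENE_C": 880.0, "GENE_D": 0.1}
changes = log2_fold_changes(treated, control)
print(repressed_genes(changes))
```

The low-expression filter is applied before the ratio so that a pseudocount of 0.01 cannot produce large fold changes from near-zero values.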
Normalization of N-Glycoproteomics Data for ALK-Inhibitor Treatment Experiments.
For each cell line, the normalized spectral count of N-glycoproteomics was calculated based on the total spectral count after addition of a pseudocount of 1. The normalized spectral count of CEP-26939 versus DMSO treatment was used to calculate the log2-fold change.
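One reading of this normalization is sketched below: a pseudocount of 1 is added to every protein, each value is divided by the sample total of the adjusted counts, and the log2 ratio of treated to control is taken. The exact order of these operations is an assumption, and the protein labels and counts are hypothetical.

```python
import math


def normalize_counts(spectral_counts, pseudocount=1.0):
    """Add a pseudocount to each protein, then divide by the sample total."""
    adjusted = {p: count + pseudocount for p, count in spectral_counts.items()}
    total = sum(adjusted.values())
    return {p: value / total for p, value in adjusted.items()}


def glyco_log2_fold_change(treated_counts, control_counts):
    """log2 ratio of normalized counts (treated versus control) per protein.

    Proteins missing from one sample are given a raw count of zero.
    """
    proteins = sorted(set(treated_counts) | set(control_counts))
    treated = normalize_counts({p: treated_counts.get(p, 0) for p in proteins})
    control = normalize_counts({p: control_counts.get(p, 0) for p in proteins})
    return {p: math.log2(treated[p] / control[p]) for p in proteins}


# Hypothetical spectral counts (illustration only).
dmso = {"P1": 7, "P2": 3, "P3": 7}
inhibitor = {"P1": 1, "P2": 3, "P3": 13}
print(glyco_log2_fold_change(inhibitor, dmso))
```

The pseudocount keeps proteins that disappear entirely after treatment in the analysis with a finite, strongly negative fold change.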
Integration of RNA-Seq and N-Glycoproteomics.
A GSEA preranked tool (v2.2.1) with 10,000 permutations was used with a ranked gene metric of log2-fold change from RNA-seq and the set of genes with log2-fold change lower than −0.8 in N-glycoproteomics. The log2-fold changes from RNA-seq and N-glycoproteomics for Karpas 299 and SU-DHL-1 were standardized by the column-wise mean and SD. Hierarchical clustering of genes with Euclidean distance and ward.D2 linkage identified the subgroup of genes simultaneously down-regulated by ALK inhibition in RNA-seq and N-glycoproteomics. The enrichment of the Hallmark gene sets from MSigDB and this subgroup were computed and reported with an FDR cutoff of 0.05. To reconstruct the protein interaction network of cytokine receptors in this set (IL-4R, IL-18R1, IL-1R1, IL-1R2, IL-2RA, and IL-31RB), the STRING database (v10) (31) was used in a Cytoscape (v3.3.0) (32) environment with an interaction confidence cutoff of 0.95.
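The column-wise standardization applied before clustering can be illustrated as follows. The sketch assumes the sample SD (n − 1 denominator), which is the default of the R function sd, and a matrix with one row per gene and four columns (RNA-seq and N-glycoproteomics for each of the two cell lines). The values are hypothetical, and the clustering step itself is not reproduced.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Subtract the column mean and divide by the column sample SD."""
    columns = list(zip(*rows))
    column_stats = [(mean(column), stdev(column)) for column in columns]
    return [
        [(value - mu) / sd for value, (mu, sd) in zip(row, column_stats)]
        for row in rows
    ]


# Hypothetical log2-fold changes (illustration only). Columns:
# RNA-seq Karpas 299, RNA-seq SU-DHL-1, glyco Karpas 299, glyco SU-DHL-1.
log2_fold_changes = [
    [-2.1, -1.8, -1.2, -1.5],
    [0.1, -0.2, 0.3, 0.0],
    [1.4, 0.9, 0.2, 0.6],
    [-0.3, 0.2, -0.1, 0.1],
]
z_scores = standardize_columns(log2_fold_changes)
for row in z_scores:
    print([round(value, 2) for value in row])
```

Standardization places the RNA-seq and N-glycoproteomic fold changes on a common scale, so that neither platform dominates the Euclidean distances used for clustering.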
sgRNA.
The sgRNA screening results were processed with ATARiS (33) to calculate a gene-level vulnerability score in each sample. For each of the 14,250 genes included in the screen, the minimum of the sgRNA scores in the SUP-M2 and Karpas 299 cell lines was assigned as the score of that gene. For the gene members of the IL-2–STAT5 and IL-6–STAT3 signaling pathways, as defined by the MSigDB Hallmark gene sets (34), the sgRNA scores were ranked and plotted alongside the corresponding log2 FPKM expression level in the DMSO-treated sample. The cumulative vulnerability score of a pathway was defined as the sum of the sgRNA scores of its members. To assess the negative effect of a pathway on cell survival, a Monte Carlo simulation was performed on the population of sgRNA scores of human genes as defined by the HUGO Gene Nomenclature Committee (35). In each permutation, a random pathway with the same cardinality as the pathway of interest was constructed and its cumulative vulnerability score was calculated. The distribution of cumulative vulnerability scores from 10^6 permutations was presented and compared with the observed value for the pathway of interest.
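The Monte Carlo comparison can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes that more negative scores indicate stronger vulnerability, draws random pathways without replacement, and summarizes the comparison as an empirical one-sided p-value with add-one smoothing, whereas the text only describes presenting the null distribution next to the observed value. The function names and the example scores are hypothetical, and the default number of permutations is reduced from the 10^6 used in the text to keep the example fast.

```python
import random


def pathway_score(scores, members):
    """Cumulative vulnerability score: sum of gene-level scores of members."""
    return sum(scores[gene] for gene in members)


def monte_carlo_pathway_test(scores, members, n_permutations=10_000, seed=0):
    """Compare an observed pathway score with scores of random gene sets.

    Returns (observed, null_scores, empirical_p). The p-value is the smoothed
    fraction of random pathways scoring at or below the observed value
    (assumption: lower score means greater vulnerability).
    """
    rng = random.Random(seed)
    population = list(scores.values())
    size = len(members)  # random pathways match the cardinality of the pathway
    observed = pathway_score(scores, members)
    null_scores = [sum(rng.sample(population, size)) for _ in range(n_permutations)]
    at_or_below = sum(score <= observed for score in null_scores)
    empirical_p = (1 + at_or_below) / (1 + n_permutations)
    return observed, null_scores, empirical_p


# Hypothetical gene-level scores: 195 neutral genes and 5 vulnerable genes.
example_scores = {f"gene{i}": 0.0 for i in range(195)}
example_pathway = [f"hit{i}" for i in range(5)]
example_scores.update({gene: -1.0 for gene in example_pathway})
observed, null_scores, p_value = monte_carlo_pathway_test(
    example_scores, example_pathway, n_permutations=2000
)
```

Matching the size of each random pathway to the pathway of interest matters because a sum of scores grows in magnitude with the number of members, so pathways of different sizes are not directly comparable.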
Western Blot Analysis.
The following primary antibodies were obtained from Cell Signaling Technology: CD44 (clone 156-3C11), ALK (clone C26G7), p-ALK (Y1604; polyclonal), STAT3 (clone 79D7), and p-STAT3 (Y705; clone M9C6). Primary antibodies against CD276 (clone 6A1) and PON2 (clone AF3E6) were purchased from Abcam. The primary antibody against gp130 (clone M-20) was obtained from Santa Cruz Biotechnology. The antibody against IL-31Rβ was obtained from R&D Systems. Protein loading was assessed using antibodies against GAPDH (clone 6C5; Millipore) or β-actin (clone AC-74; Sigma–Aldrich).
Flow Cytometry Analysis.
Flow cytometry analyses were performed using a FACSCanto II (BD Biosciences) flow cytometer, and results were analyzed with FlowJo software. Labeled antibodies were FITC–anti-CD19 (clone SJ25C1; BD Pharmingen), phycoerythrin (PE)–Cy7–anti-CD19 (clone HIB19; BD Pharmingen), PE–Cy7–anti-CD44 (clone G44-26; BD Pharmingen), FITC–anti-CD276 (clone FM276; Miltenyi Biotec GmbH), and APC–anti-IL-31Rβ (clone AN-V2; eBioscience).
IL-31Rβ Knockdown.
IL-31Rβ was knocked down in the DEL cell line using the pLKO.1 lentiviral shRNA (TRCN0000289933) from Sigma–Aldrich. A scramble pLKO.1 lentiviral shRNA was used as a control. IL-31Rβ knockdown was assessed by Western blotting.
Colony Formation Assay.
DEL cells transduced with scramble shRNA or IL-31Rβ shRNA were incubated for 14 d in methylcellulose-based media (MethoCult; STEMCELL Technologies). Colonies were stained with iodonitrotetrazolium chloride overnight and then counted under a microscope. Each assay was performed in triplicate.
Mouse Xenograft.
SCID-Beige mice (Charles River Laboratories) were injected s.c. in the flank with 10 × 10^6 cells (100-μL injection volume containing 50% Matrigel; Becton Dickinson). All animal studies were performed in compliance with the University of Michigan Committee on the Use and Care of Animals (protocol PRO00003289).
Acknowledgments
We thank Farah Keyoumarsi for providing technical assistance with retrieval of clinical lymphoma tissues and Hyungwon Choi for discussions.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The mass spectrometry data have been deposited into the ProteomeXchange Consortium (proteomecentral.proteomexchange.org/cgi/GetDataset) via the PRIDE (Proteomics Identifications Database) partner repository with the dataset identifier PXD003469, and the RNA-sequencing data have been deposited in Gene Expression Omnibus (GEO), https://www.ncbi.nlm.nih.gov/geo (accession no. GSE81301).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1701263114/-/DCSupplemental.
References
- 1. Pan S, Chen R, Aebersold R, Brentnall TA. Mass spectrometry based glycoproteomics–from a proteomics perspective. Mol Cell Proteomics. 2011;10:R110.003251. doi: 10.1074/mcp.R110.003251.
- 2. Zhang H, Li XJ, Martin DB, Aebersold R. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol. 2003;21:660–666. doi: 10.1038/nbt827.
- 3. Roth J. Protein N-glycosylation along the secretory pathway: Relationship to organelle topography and function, protein quality control, and cell interactions. Chem Rev. 2002;102:285–303. doi: 10.1021/cr000423j.
- 4. Ramos-Medina R, et al. Immunohistochemical analysis of HLDA9 Workshop antibodies against cell-surface molecules in reactive and neoplastic lymphoid tissues. Immunol Lett. 2011;134:150–156. doi: 10.1016/j.imlet.2010.10.007.
- 5. Oflazoglu E, Kissler KM, Sievers EL, Grewal IS, Gerber HP. Combination of the anti-CD30-auristatin-E antibody-drug conjugate (SGN-35) with chemotherapy improves antitumour activity in Hodgkin lymphoma. Br J Haematol. 2008;142:69–73. doi: 10.1111/j.1365-2141.2008.07146.x.
- 6. Reff ME, et al. Depletion of B cells in vivo by a chimeric mouse human monoclonal antibody to CD20. Blood. 1994;83:435–445.
- 7. Smith MR. Rituximab (monoclonal anti-CD20 antibody): Mechanisms of action and resistance. Oncogene. 2003;22:7359–7368. doi: 10.1038/sj.onc.1206939.
- 8. Porter DL, Kalos M, Zheng Z, Levine B, June C. Chimeric antigen receptor therapy for B-cell malignancies. J Cancer. 2011;2:331–332. doi: 10.7150/jca.2.331.
- 9. Wang T, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350:1096–1101. doi: 10.1126/science.aac7041.
- 10. Zhou Y, Aebersold R, Zhang H. Isolation of N-linked glycopeptides from plasma. Anal Chem. 2007;79:5826–5837. doi: 10.1021/ac0623181.
- 11. Fermin D, Basrur V, Yocum AK, Nesvizhskii AI. Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics. 2011;11:1340–1345. doi: 10.1002/pmic.201000650.
- 12. Vizcaíno JA, et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: Status in 2013. Nucleic Acids Res. 2013;41:D1063–D1069. doi: 10.1093/nar/gks1262.
- 13. Hirokawa T, Boon-Chieng S, Mitaku S. SOSUI: Classification and secondary structure prediction system for membrane proteins. Bioinformatics. 1998;14:378–379. doi: 10.1093/bioinformatics/14.4.378.
- 14. Schwartz D, Gygi SP. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat Biotechnol. 2005;23:1391–1398. doi: 10.1038/nbt1146.
- 15. Swerdlow SHCE, et al. In: WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. 4th Ed. Vardiman JW, editor. Vol 2 IARC Press; Lyon, France: 2008.
- 16. Picker LJ, Weiss LM, Medeiros LJ, Wood GS, Warnke RA. Immunophenotypic criteria for the diagnosis of non-Hodgkin’s lymphoma. Am J Pathol. 1987;128:181–201.
- 17. Fernàndez V, et al. Genomic and gene expression profiling defines indolent forms of mantle cell lymphoma. Cancer Res. 2010;70:1408–1418. doi: 10.1158/0008-5472.CAN-09-3419.
- 18. Shaw AT, et al. Crizotinib versus chemotherapy in advanced ALK-positive lung cancer. N Engl J Med. 2013;368:2385–2394. doi: 10.1056/NEJMoa1214886.
- 19. Barrett T, et al. NCBI GEO: Archive for functional genomics data sets–update. Nucleic Acids Res. 2013;41:D991–D995. doi: 10.1093/nar/gks1193.
- 20. Subramanian A, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- 21. Zhang B, et al. NCI CPTAC Proteogenomic characterization of human colon and rectal cancer. Nature. 2014;513:382–387. doi: 10.1038/nature13438.
- 22. Auguste P, et al. Signaling of type II oncostatin M receptor. J Biol Chem. 1997;272:15760–15764. doi: 10.1074/jbc.272.25.15760.
- 23. Chiarle R, et al. Stat3 is required for ALK-mediated lymphomagenesis and provides a possible therapeutic target. Nat Med. 2005;11:623–629. doi: 10.1038/nm1249.
- 24. Tanaka M, Miyajima A. Oncostatin M, a multifunctional cytokine. Rev Physiol Biochem Pharmacol. 2003;149:39–52. doi: 10.1007/s10254-003-0013-1.
- 25. Pedrioli PG. Trans-proteomic pipeline: A pipeline for proteomic analysis. Methods Mol Biol. 2010;604:213–238. doi: 10.1007/978-1-60761-444-9_15.
- 26. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002;74:5383–5392. doi: 10.1021/ac025747h.
- 27. Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 2003;75:4646–4658. doi: 10.1021/ac0341261.
- 28. Paoletti AC, et al. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proc Natl Acad Sci USA. 2006;103:18928–18933. doi: 10.1073/pnas.0606379103.
- 29. R Core Team (2014) R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna). Available at www.R-project.org/. Accessed February 8, 2016.
- 30. Dobin A, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 31. Szklarczyk D, et al. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43:D447–D452. doi: 10.1093/nar/gku1003.
- 32. Shannon P, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
- 33. Shao DD, et al. ATARiS: Computational quantification of gene suppression phenotypes from multisample RNAi screens. Genome Res. 2013;23:665–678. doi: 10.1101/gr.143586.112.
- 34. Liberzon A, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–425. doi: 10.1016/j.cels.2015.12.004.
- 35. Eyre TA, et al. The HUGO Gene Nomenclature Database, 2006 updates. Nucleic Acids Res. 2006;34:D319–D321. doi: 10.1093/nar/gkj147.