Abstract
A vast number of human cell lines are available for cell culture model-based studies, and as such the potential exists for discrepancies in findings due to cell line selection. To investigate this concept, we determined the relative protein abundance profiles of a panel of eight diverse, but commonly studied, human cell lines. This panel includes: HAP1, HEK293T, HeLa, HepG2, Jurkat, Panc1, SH-SY5Y, and SVGp12. We use a mass spectrometry-based proteomics workflow designed to enhance quantitative accuracy while maintaining analytical depth. To this end, our strategy leverages TMTpro16-based sample multiplexing, high-Field Asymmetric Ion Mobility Spectrometry (FAIMS), and real-time database searching (RTS). The data show that differences in the relative protein abundance profiles reflect cell line diversity. We also determine several hundred proteins to be highly enriched for a given cell line and perform gene ontology and pathway analysis on these cell line-enriched proteins. We provide an R Shiny application to query protein abundance profiles and retrieve proteins with similar patterns. The workflows used herein can be applied to additional cell lines to aid cell line selection for addressing a given scientific inquiry or for improving an experimental design.
Keywords: SPS-MS3, Multi-Notch, TMTpro, Eclipse, RTS, FAIMS
1. INTRODUCTION
Selection of an appropriate cell line for a given experiment is a fundamental, yet often overlooked facet of experimental design. Often discrepancies in studies focusing on protein interactions or signaling mechanisms can be traced back to the initial cell line selection [1]. We showcase this concept by exploring the relative protein abundance profiles of a panel of eight cell lines. This panel consisted of a diverse set of regularly used human cell lines, namely: HAP1, HEK293T, HeLa, HepG2, Jurkat, Panc1, SH-SY5Y, and SVGp12. Differences and similarities among these cell lines can provide guidance for cell line selection to address a given scientific question or enhance an experimental design.
The eight cell lines selected are diverse in origin and genotypic profile. The HAP1 cell line is near-haploid, having a single copy of nearly every chromosome [2]. This immortalized human cell line was derived from the KBM-7 cell line, which originated from a patient with chronic myeloid leukemia [3], and is advantageous for genetic manipulation [4]. The HEK293T cell line was originally derived from human embryonic kidney cells grown in tissue culture and transfected with the SV40 large T antigen [5]. This cell line has been used in cell biology research for decades because of reliable growth and its propensity to be easily transfected, particularly for protein expression and production of recombinant retroviruses [6]. HeLa is the oldest and most commonly used human cell line, having been established in the early 1950s [7]. Genetically, this immortal cervical cancer cell line has a hypertriploid chromosome number (3n+), that is, 76 to 80 total chromosomes rather than the typical diploid number of 46 [8]. HepG2 is an immortal human liver cancer cell line, which is often used as an in vitro model system for the study of polarized human hepatocytes [9]. This cell line is important in the study of human hepatic diseases and is a model system for studies of liver metabolism and toxicity of xenobiotics [10]. Jurkat cells are an immortalized suspension cell line of human T lymphocyte cells that was established in the late 1970s from peripheral blood [11]. These cells are frequently used to study acute T cell leukemia, T cell signaling, and the expression of various chemokine receptors [12]. Panc1 is a cell line established from the pancreatic duct of a patient who suffered from epithelioid carcinoma [13]. This cell line is commonly used as an in vitro model of non-endocrine pancreatic cancer for tumorigenicity studies [14]. SH-SY5Y is a subclone of SK-N-SH, which was isolated from bone marrow [15].
SH-SY5Y has been used for decades as an in vitro model of neuronal function, differentiation, and dysregulation, with studies focusing on neurogenesis, as well as neurodegeneration [16]. Finally, SVGp12 was established by transfecting cultured human astroglia cells of brain origin with SV40-derived DNA [17]. SVGp12 has been used in studies dealing with tumor formation, proliferation, and apoptosis related to glioma [18]. As noted, the diversity of these cell lines reflects their very different cellular origins and their use in a wide range of research applications.
We used an isobaric tagging strategy to profile the basal protein abundance from these eight cell lines. Sample multiplexing strategies in mass spectrometry-based quantitative analyses, such as tandem mass tags (TMT) and isobaric tags for relative and absolute quantitation (iTRAQ) have many advantages for whole proteome profiling [19]. Such strategies allow for samples to be analyzed simultaneously, thereby reducing instrument time and costs, while producing fewer missing protein abundance values between samples and permitting multiple comparisons in a single experiment. Here we use TMTpro16 that enables 16 samples (i.e., duplicates of each of the eight cell lines) to be multiplexed. In addition, we ensure the quantitative integrity of our relative abundance measurements using: 1) synchronous precursor selection (SPS)-MS3 [20], 2) High Field Asymmetric Ion Mobility Spectrometry (FAIMS)-based gas phase fractionation [21], and 3) Real-Time database Searching (RTS) [22, 23]. In total, we profile the relative abundance of over 8,800 proteins across all eight cell lines. We have also designed an R Shiny application to allow the querying of protein abundance profiles in our dataset. The concepts and workflows used herein can be applied to additional cell lines in efforts to aid in selecting the appropriate cell line for a given experiment.
2. MATERIALS AND METHODS
Materials.
Tandem mass tag (TMTpro) isobaric reagents were from Thermo Fisher Scientific (Waltham, MA). Dulbecco's modified Eagle's medium (DMEM) and RPMI media, both supplemented with 10% fetal bovine serum (FBS) were from LifeTechnologies (Waltham, MA). Trypsin was purchased from Pierce Biotechnology (Rockford, IL) and LysC from Wako Chemicals (Richmond, VA). Unless otherwise noted, all other chemicals were from Pierce Biotechnology (Rockford, IL).
Cell growth and harvesting.
Methods of cell growth and propagation followed techniques utilized previously [24]. In brief, adherent cells were propagated in DMEM supplemented with 10% FBS. For adherent cells, the growth medium was aspirated upon achieving ~90% confluency, and the cells were washed thrice with ice-cold phosphate-buffered saline (PBS). Adherent cells were dislodged with a non-enzymatic reagent and harvested by trituration. Jurkat cells were propagated in suspension in RPMI 1640 with 2 mM L-glutamine and 10% FBS, with cell density maintained between 1 × 10^5 and 3 × 10^6 cells/mL. Cells were harvested at 3 × 10^6 cells/mL and washed thrice with 10 mL PBS. After washing, all cells were pelleted by centrifugation at 3,000 x g for 5 min at 4°C, and the supernatant was removed. Five hundred microliters of 200 mM EPPS, 8 M urea, pH 8.5, supplemented with 1X Pierce Protease Inhibitors, Mini, was added directly to each 15 cm cell culture dish for harvest and lysis.
Cell lysis and protein digestion.
Cells were homogenized by 12 passes through a 21-gauge (1.25 inches long) needle and incubated at 4°C with gentle agitation for 30 min. The homogenate was sedimented by centrifugation at 21,000 x g for 5 min and the supernatant was transferred to a new tube. Protein concentrations were determined using the bicinchoninic acid (BCA) assay (ThermoFisher Scientific). Proteins were subjected to disulfide bond reduction with 5 mM tris(2-carboxyethyl)phosphine (room temperature, 15 min) and alkylation with 10 mM iodoacetamide (room temperature, 20 min in the dark). Excess iodoacetamide was quenched with 10 mM dithiothreitol (room temperature, 15 min in the dark). Methanol-chloroform precipitation was performed prior to protease digestion. In brief, 4 parts of neat methanol were added to each sample and vortexed, 1 part chloroform was added to the sample and vortexed, and 3 parts water were added to the sample and vortexed. The sample was centrifuged at 14,000 RPM for 2 min at room temperature and subsequently washed twice with 100% methanol. Samples were resuspended in 200 mM EPPS, pH 8.5, and digested at room temperature for 14 h with LysC protease at a 100:1 protein-to-protease ratio. Trypsin was then added at a 100:1 protein-to-protease ratio and the reaction was incubated for 6 h at 37°C.
Tandem mass tag labeling.
TMTpro reagents (0.8 mg) were dissolved in anhydrous acetonitrile (40 μL) of which 7 μL was added to the peptides (50 μg) with 13 μL of acetonitrile to achieve a final concentration of approximately 30% (v/v). Following incubation at room temperature for 1 h, the reaction was quenched with hydroxylamine to a final concentration of 0.3% (v/v). TMTpro-labeled samples were pooled at a 1:1 ratio across all 16 samples. For each experiment, ~800 μg of the pooled sample was vacuum centrifuged to near dryness and subjected to a C18 solid-phase extraction (SPE) column with a capacity of 100 mg (Sep-Pak, Waters).
Off-line basic pH reversed-phase (BPRP) fractionation.
We fractionated the pooled, labeled peptide sample using BPRP HPLC [25] and an Agilent 1200 pump equipped with a degasser and a UV detector (set at 220 and 280 nm wavelength). Peptides were subjected to a 50-min linear gradient from 5% to 35% acetonitrile in 10 mM ammonium bicarbonate pH 8 at a flow rate of 0.6 mL/min over an Agilent ZORBAX 300Extend C18 column (3.5 μm particles, 4.6 mm ID and 250 mm in length). The peptide mixture was fractionated into a total of 96 fractions, which were consolidated into 24 “super-fractions” [26]. Each “super-fraction” consisted of 4 fractions from the 96 well plate, corresponding to every 24th fraction. Samples were subsequently acidified with 1% formic acid and vacuum centrifuged to near dryness. Each super-fraction was desalted via StageTip, dried again via vacuum centrifugation, and reconstituted in 5% acetonitrile, 5% formic acid for LC-MS/MS processing.
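The concatenation scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `concatenate_fractions` and the 1-based fraction numbering are assumptions, and the text states only that each super-fraction pools every 24th fraction, not which well is paired with which.

```python
def concatenate_fractions(n_fractions=96, n_super=24):
    """Map each super-fraction to the fractions pooled into it.

    Super-fraction k (1-based) receives fractions k, k + n_super,
    k + 2 * n_super, ... (hypothetical numbering; assumed, not from the text).
    """
    if n_fractions % n_super != 0:
        raise ValueError("n_fractions must be a multiple of n_super")
    return {
        k: list(range(k, n_fractions + 1, n_super))
        for k in range(1, n_super + 1)
    }


scheme = concatenate_fractions()
print(scheme[1])   # fractions pooled into the first super-fraction
print(scheme[24])  # fractions pooled into the last super-fraction
```

Pooling fractions that elute far apart in the high-pH gradient keeps each super-fraction chemically diverse, which is the usual motivation for this kind of interleaved scheme.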
Liquid chromatography and tandem mass spectrometry.
Mass spectrometric data were collected on an Orbitrap Eclipse mass spectrometer coupled to a Proxeon NanoLC-1200 UHPLC. The 100 μm capillary column was packed with 35 cm of Accucore 150 resin (2.6 μm, 150Å; ThermoFisher Scientific). The scan sequence began with an MS1 spectrum (Orbitrap analysis, resolution 120,000, 350–1400 Th, automatic gain control (AGC) target 5 × 10^5, maximum injection time 100 ms). Data were acquired for ~90 minutes per fraction, consistent with the 36 hr of total acquisition time reported for the 24 fractions. MS2 analysis consisted of collision-induced dissociation (CID) with quadrupole ion trap analysis (AGC target 2 × 10^4, normalized collision energy (NCE) 35, q-value 0.25, maximum injection time 120 ms), an isolation window of 0.5 Th, and TopSpeed set at 3 sec. For FAIMS, the dispersion voltage (DV) was set at 5000V, the compensation voltages (CVs) used were −40V, −60V, and −80V, and the TopSpeed parameter was set at 1 sec.
Data analysis.
Spectra were converted to mzXML via MSconvert [27]. Database searching included all entries from the human UniProt Database (downloaded: August 2019). The database was concatenated with one composed of all protein sequences for that database in the reversed order. Searches were performed using a 50-ppm precursor ion tolerance for total protein level profiling. The product ion tolerance was set to 0.9 Da. These wide mass tolerance windows were chosen to maximize sensitivity in conjunction with Comet searches and linear discriminant analysis [28, 29]. TMTpro labels on lysine residues and peptide N-termini (+304.207 Da), as well as carbamidomethylation of cysteine residues (+57.021 Da) were set as static modifications, while oxidation of methionine residues (+15.995 Da) was set as a variable modification. Peptide-spectrum matches (PSMs) were adjusted to a 1% false discovery rate (FDR) [30, 31]. PSM filtering was performed using a linear discriminant analysis, as described previously [29] and then assembled further to a final protein-level FDR of 1% [31]. Proteins were quantified by summing reporter ion counts across all matching PSMs, also as described previously [32]. Reporter ion intensities were adjusted to correct for the isotopic impurities of the different TMTpro reagents according to manufacturer specifications. The signal-to-noise (S/N) measurements of peptides assigned to each protein were summed and these values were normalized so that the sum of the signal for all proteins in each channel was equivalent to account for equal protein loading. Finally, each protein abundance measurement was scaled, such that the summed signal-to-noise for that protein across all channels equals 100, thereby generating a relative abundance (RA) measurement. Data analysis and visualization were performed in Microsoft Excel or R.
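The two scaling steps at the end of this procedure (channel normalization followed by relative abundance scaling) can be illustrated with a minimal Python sketch. The function names, the choice of the mean column sum as the common target, and the toy values are all assumptions made for illustration; they are not taken from the analysis pipeline used here.

```python
def normalize_channels(sn):
    """Scale each channel so that all channels have the same total signal.

    sn maps a protein ID to its summed S/N values, one per TMT channel.
    The common target (mean of the column sums) is an assumed choice.
    """
    n_channels = len(next(iter(sn.values())))
    col_sums = [sum(row[c] for row in sn.values()) for c in range(n_channels)]
    target = sum(col_sums) / n_channels
    factors = [target / s for s in col_sums]
    return {p: [v * f for v, f in zip(row, factors)] for p, row in sn.items()}


def relative_abundance(normalized):
    """Scale each protein so that its values across all channels sum to 100."""
    return {
        p: [100.0 * v / sum(row) for v in row]
        for p, row in normalized.items()
    }


# Toy data: two hypothetical proteins in four channels (not from the dataset).
toy_sn = {
    "P1": [100.0, 200.0, 300.0, 200.0],
    "P2": [300.0, 600.0, 100.0, 600.0],
}
toy_norm = normalize_channels(toy_sn)
toy_ra = relative_abundance(toy_norm)
for protein, values in toy_ra.items():
    print(protein, [round(v, 2) for v in values])
```

Because every protein's RA values sum to 100, an increase in one channel necessarily lowers the values in the others, which matters when interpreting depleted proteins later in the analysis.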
3. RESULTS and DISCUSSION
The workflow was designed to enhance quantitative accuracy while maintaining analytical depth.
We aimed to compare the relative protein abundance of the basal proteome of eight cell lines using a TMTpro16 isobaric tag-based strategy. The eight cell lines were selected for their diversity and their broad use. The samples were processed for mass spectrometry analysis using the Streamlined (SL)-TMT protocol [33] with TMTpro16 reagents (Figure 1). In brief, cells were harvested and syringe-lysed in 8M urea. Cysteines were reduced and alkylated, and then proteins were chloroform-methanol precipitated and digested with LysC followed by trypsin. Peptides for each sample were labeled with TMTpro16 [34], pooled, fractionated by basic pH reversed-phase (BPRP) chromatography, and 24 super-fractions were concatenated from 96 fractions.
Figure 1: Workflow overview.
Eight cell lines were harvested in duplicate, processed using the SL-TMT protocol, and arranged as a TMTpro16plex experiment. The pooled sample was separated into 96 fractions, which were concatenated into 24 super-fractions. Each super-fraction was analyzed using a FAIMS-equipped Orbitrap Eclipse mass spectrometer with real time search (RTS)-MS3.
Samples were analyzed using an MS3-based method on an Orbitrap Eclipse mass spectrometer [34]. However, we also incorporated more recently developed techniques into our data acquisition strategy, namely FAIMS [35] and Real-Time database Searching (RTS) [23], in efforts to reduce ion interference and increase quantitative accuracy. FAIMS allows for gas phase fractionation by separating ionized (gas phase) molecules based on their mobility in a high or low electric field generated from an asymmetric waveform as they transit between an inner and outer electrode [36]. We selected three optimal CVs (−40V, −60V and −80V) for our FAIMS-based analyses [21]. Two major advantages of RTS include: 1) reduced interference and 2) the ability to limit the number of time-consuming MS3 quantification scans per protein. First, RTS reduced interference because SPS ions were pre-selected. Second, RTS allowed for greater analytical depth, as only peptides in the selected database triggered time-consuming MS3 scans. Moreover, we could also constrain the number of peptides identified per protein per fraction using the “close-out” parameter. This parameter was set to “2,” which sets a practical limit of 48 peptides per protein over the 24 fractions (2 peptides × 24 fractions), preventing the acquisition of potentially thousands of additional MS3 scans, which is particularly effective for larger proteins. The outlined sample preparation and data acquisition strategies enabled us to measure relative protein abundance with both quantitative accuracy and analytical depth. In total, we quantified 154,227 peptides, of which 84,078 were unique in sequence. Using these data, we quantified 8,815 proteins across all eight cell lines in 36 hr of data acquisition.
The diversity of the cell lines was reflected in differences in relative protein abundance.
We investigated further the abundance profiles of the 8,815 quantified proteins. First, we performed hierarchical clustering analysis using Euclidean distance and Ward linkage with the TMT relative abundance values (Figure 2A). We noted very tight clustering among biological replicates. The Jurkat cell line was highly divergent from the other cell lines. We postulated that the vastly different growth conditions (i.e., in suspension) would result in pronounced metabolic and phenotypic differences compared to adherent cell cultures. We next performed principal components analysis (PCA) to determine the degree of variation that can be attributed to a given variable. In accordance with the tight hierarchical clustering, PCA showed tight association between duplicates (Figure 2B). We determined that 30.4% of the variance in the experiment can be explained by the first principal component (PC1), which clearly separated Jurkat cells (a suspension cell line) from the adherent cell lines. Similarly, 18.9% of the variance in the experiment can be explained by the second principal component (PC2). PC2 potentially separated the cell lines by developmental germ layer origin, with HEK293T, SH-SY5Y, HAP1, and Jurkat cells being of mesodermal origin, SVGp12 of ectodermal origin, and HeLa, Panc1, and HepG2 of endodermal origin. These data reinforced further the diversity of these cell lines from each other.
Figure 2: Clustering and multivariate analysis of the eight cell lines.
A) Hierarchical clustering heat map and associated dendrogram of the eight cell lines using Euclidean distance and Ward linkage. B) Principal components analysis (PCA) of the eight cell lines, showing the percentage of the variance that was explained by each component.
We then used the TMT relative abundance values to determine the Pearson correlation coefficients among pairs of each of the sixteen samples analyzed. We illustrated the data using a scatter plot matrix (Figure S1). These data showed very high correlation among replicates (r = 0.93 to 0.99). However, the Pearson correlation values were very low between different cell lines. Specifically, the Pearson correlation coefficient was highest between HAP1 and HEK293T at r = 0.28, while all other values approached zero or were anti-correlated (down to r = −0.26). These data supported our findings that the baseline proteomic profiles of these eight cell lines were very different. For further quantitative assessment, we also calculated the log2 ratio of the TMT relative abundance values for each cell line relative to the average of the remaining cell lines. We illustrated the distribution of these fold changes as box-and-whisker plots (Figure 3A). Jurkat cells had the broadest distribution of fold changes, thus supporting the data presented in the PCA and hierarchical clustering analyses.
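As an illustration of the pairwise comparison described above, the following minimal Python sketch computes Pearson correlation coefficients between samples. The sample names and RA values are hypothetical and chosen only to mimic the qualitative pattern reported (high correlation between replicates, low or negative correlation between cell lines); they are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariance / norm


# Hypothetical RA values for five proteins in three samples.
samples = {
    "HepG2_rep1": [12.0, 3.0, 6.5, 1.0, 9.0],
    "HepG2_rep2": [11.5, 3.2, 6.8, 1.1, 9.3],
    "Jurkat_rep1": [2.0, 14.0, 6.0, 10.0, 3.0],
}

names = list(samples)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson(samples[a], samples[b]):.2f}")
```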
Figure 3: Quantitative analysis of the differences among cell lines.
A) Box-and-whisker plots illustrating the distribution of log2 ratios between a given cell line and the average of all other cell lines. The center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles and the white dot in the box indicates the mean value. B) The range of coefficient of variation of the TMT relative abundance values for all quantified proteins across the 16 samples.
Despite the considerable protein abundance differences among cell lines, we also sought to discover core proteins that showed little difference in abundance across cell lines. As such, we calculated the coefficient of variation (CV) for each protein across the eight cell lines to determine a core set of proteins with little change in relative abundance. We plotted the distribution of the CV (Figure 3B), and used a CV<0.2 as our threshold, which corresponded to less than 5% of our dataset. In total, we considered 396 proteins as a common core proteome. We subjected this short list of proteins to Gene Ontology (GO) and KEGG pathway enrichment analysis using the DAVID interface [37], while using the entire list of quantified proteins as background. We required 10 proteins per category, a fold enrichment of 1.5, and an FDR<0.05. We examined the GO categories of biological process and cellular component, as well as KEGG pathways (Table S1). A wide range of biological processes - including those involving transcription, translation, protein degradation and various signaling pathways - were enriched. Moreover, we determined enrichment in a broad range of cellular components, including nucleoplasm, membrane, ribosome, and proteasome complex. However, examining KEGG pathways revealed that only the ribosome and proteasome pathways were substantially enriched. Together, these data show that the proteins that show little deviation across cell lines have a wide range of functions, with 5-10% of these proteins being significantly enriched for roles in transcription, translation, or protein degradation.
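The CV-based filter for the core proteome can be sketched as follows. This is a minimal illustration under stated assumptions: the sample standard deviation (n − 1 denominator) is assumed because the text does not specify which estimator was used, and the protein names and values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample SD is an assumption)."""
    return statistics.stdev(values) / statistics.fmean(values)


def core_proteome(ra_by_protein, threshold=0.2):
    """Return proteins whose CV across cell lines is below the threshold."""
    return sorted(
        protein
        for protein, values in ra_by_protein.items()
        if coefficient_of_variation(values) < threshold
    )


# Hypothetical per-cell-line RA values for two proteins (eight cell lines).
toy_profiles = {
    "STABLE": [12.0, 13.0, 12.5, 12.0, 13.0, 12.5, 12.0, 13.0],
    "VARIABLE": [60.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 10.0],
}

for protein, values in toy_profiles.items():
    print(protein, round(coefficient_of_variation(values), 3))
print(core_proteome(toy_profiles))
```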
Quantitative assessment revealed several hundred proteins that were highly enriched in a single cell line.
We aimed to determine the number of proteins that were more highly expressed in a specific cell line compared to the others investigated. We defined an enriched protein as one with a 4-fold or greater abundance difference in one cell line compared to the average of all remaining cell lines. Using these criteria, we observed several hundred proteins that were highly enriched in a particular cell line. We ranked the cell lines by number of enriched proteins, as follows: Jurkat (n=405), HepG2 (n=303), SH-SY5Y (n=209), SVGp12 (n=140), Panc1 (n=139), HeLa (n=129), HAP1 (n=65), and HEK293T (n=38) (Figure 4A, inset). We acknowledge that, because a protein was considered enriched in a given cell line if it showed a 4-fold or greater abundance difference compared to its average abundance in the remaining cell lines, the same protein could be “enriched” in more than one cell line. To explore this notion further, we compared the enriched-protein lists for each of the eight cell lines using an upset plot (Figure 4A). This plot showed that the vast majority of “enriched” proteins were specific to a single cell line. The largest overlap of enriched proteins was between the HepG2 and Jurkat cell lines; however, this overlap amounted to only 5 proteins. To extract biologically relevant information from these data, we subjected lists of enriched proteins to Gene Ontology (GO) enrichment and KEGG pathway analysis using the DAVID interface [37]. We used the entire list of quantified proteins as background and required 5 proteins per category, a fold enrichment of 1.5, and an FDR<0.05. With these criteria, we did not determine significant categories for every cell line, owing to insufficient numbers of unique proteins (Table 1). However, HepG2, Jurkat, SH-SY5Y, and SVGp12 cell lines showed significantly enriched categories, many of which were clearly evident due to the cell type. For example, HepG2, the hepatic cell line, was enriched in metabolic pathway proteins (KEGG pathway).
Likewise, SH-SY5Y cells were enriched in proteins associated with neuron projections (GO cellular components) and Dopaminergic Synapse (KEGG pathways), reflective of the cell line’s neuron-like properties [38]. In addition, proteins enriched in Jurkat cells, which are frequently used for phosphorylation-based signaling studies and are of immune system origin [39], displayed protein kinase activity (GO molecular function) and were associated with natural killer cell-mediated cytotoxicity (KEGG pathways). These data once again revealed the wide range of unique proteins and their associated functions that we observed among the eight cell lines and thus emphasized the need for careful selection of a cell line for a given investigation.
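The enrichment criterion used in this section (a 4-fold or greater abundance in one cell line relative to the average of the remaining cell lines) can be expressed as a short Python sketch. The function names and the example profiles are hypothetical, and the sketch assumes one strictly positive RA value per cell line (for instance, after averaging duplicates, which is an assumption rather than a documented step).

```python
import math

CELL_LINES = ["HAP1", "HEK293T", "HeLa", "HepG2",
              "Jurkat", "Panc1", "SH-SY5Y", "SVGp12"]


def log2_vs_rest(values):
    """log2 ratio of each value to the mean of all remaining values.

    Assumes strictly positive values, so every ratio is defined.
    """
    total = sum(values)
    n = len(values)
    return [math.log2(v / ((total - v) / (n - 1))) for v in values]


def enriched_in(values, fold=4.0):
    """Names of cell lines in which the protein meets the fold cutoff."""
    cutoff = math.log2(fold)
    return [
        CELL_LINES[i]
        for i, ratio in enumerate(log2_vs_rest(values))
        if ratio >= cutoff
    ]


# Hypothetical profiles, ordered as in CELL_LINES (not from the dataset).
single_line = [5.0, 5.0, 5.0, 60.0, 5.0, 5.0, 5.0, 10.0]
two_lines = [40.0, 2.0, 2.0, 2.0, 40.0, 2.0, 2.0, 10.0]

print(enriched_in(single_line))  # protein high in one cell line only
print(enriched_in(two_lines))    # protein high in two cell lines
```

The second profile illustrates the caveat raised in the text: because each cell line is compared with the average of the others, a protein can satisfy the cutoff in more than one cell line.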
Figure 4: Overlap among cell lines for cell line-enriched or -depleted proteins.
A) Upset plot illustrating the overlap of cell line-enriched proteins. Cell line-enriched proteins are defined as those that are at least 4-fold higher in a given cell line compared to the average of the remaining cell lines. B) Upset plot illustrating the overlap of cell line-depleted proteins. Cell line-depleted proteins are defined as those that are at least 4-fold lower in a given cell line compared to the average of the remaining cell lines.
Table 1: GO and KEGG pathway analysis for cell line-enriched proteins.
| Cell line | Category | Term | No. | % | Enrichment | FDR |
|---|---|---|---|---|---|---|
| HepG2 | BP | oxidation-reduction process | 37 | 12.59 | 3.78 | 2.24E-08 |
| Jurkat | BP | protein phosphorylation | 28 | 7.24 | 2.87 | 3.10E-03 |
| SH-SY5Y | BP | sympathetic nervous system development | 5 | 2.46 | 30.42 | 2.75E-02 |
| SVGp12 | BP | extracellular matrix organization | 19 | 14.07 | 13.02 | 8.04E-12 |
| HepG2 | CC | extracellular exosome | 132 | 44.90 | 2.97 | 2.08E-31 |
| Jurkat | CC | cytosol | 115 | 29.72 | 1.68 | 7.49E-06 |
| Panc1 | CC | extracellular exosome | 46 | 34.85 | 2.29 | 4.64E-05 |
| SH-SY5Y | CC | neuron projection | 16 | 7.88 | 6.48 | 3.40E-05 |
| SVGp12 | CC | extracellular exosome | 48 | 35.56 | 2.34 | 1.12E-05 |
| HepG2 | MF | oxidoreductase activity | 18 | 6.12 | 5.43 | 5.66E-05 |
| Jurkat | MF | protein kinase activity | 22 | 5.68 | 2.88 | 4.22E-02 |
| SH-SY5Y | MF | signal transducer activity | 12 | 5.91 | 5.49 | 1.65E-02 |
| SVGp12 | MF | actin binding | 12 | 8.89 | 5.61 | 1.23E-02 |
| HepG2 | KEGG | Metabolic pathways | 75 | 25.51 | 2.40 | 1.65E-11 |
| Jurkat | KEGG | Natural killer cell cytotoxicity | 20 | 5.17 | 5.75 | 1.34E-06 |
| SH-SY5Y | KEGG | Dopaminergic synapse | 9 | 4.43 | 8.49 | 9.21E-03 |
| SVGp12 | KEGG | Focal adhesion | 13 | 9.63 | 7.24 | 1.43E-04 |
No., number of proteins; %, % of enriched proteins; BP, biological process; CC, cellular component; MF, molecular function; KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.
Along with enriched proteins, we also examined those that were at least 4-fold lower in abundance in one cell line compared to the others. We referred to these proteins as cell line-specific “depleted proteins.” We ranked the cell lines by number of depleted proteins, as follows: Jurkat (n=1554), HepG2 (n=856), HEK293T (n=734), SH-SY5Y (n=681), Panc1 (n=680), HeLa (n=678), HAP1 (n=587), and SVGp12 (n=559) (Figure 4B, inset). As may be expected, the number of depleted proteins was higher than the number of “enriched proteins.” Specifically, if a protein was present in one cell line and at negligible levels in the other cell lines, this protein will be depleted in seven cell lines. This phenomenon is due to the compositional nature of TMT data [40]. The upset plot of the depleted proteins in each cell line illustrated this concept well (Figure 4B). In contrast to the cell line-enriched proteins, many proteins depleted by 4-fold or more were shared across cell lines. For instance, only 50% of proteins depleted in Jurkat cells were uniquely depleted in this cell line, while the remaining proteins were shared with one or more other cell lines. Likewise, 90 depleted proteins were shared among the 7 adherent cell lines, representing proteins that are highly enriched in Jurkat cells (suspension cells). We again noted the distinctive proteome differences among these cell lines.
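The compositional effect described above can be made concrete with a small worked example in Python. The profile below is hypothetical: a protein whose signal sits almost entirely in Jurkat cells. Because the mean of the "remaining" cell lines includes the Jurkat value for every other cell line, each adherent line falls more than 4-fold below that mean. The function name `depleted_in` is an illustrative assumption.

```python
def depleted_in(profile, fold=4.0):
    """Cell lines whose value is at least `fold` times lower than the mean
    of the remaining cell lines. `profile` maps cell line name to RA value."""
    total = sum(profile.values())
    n = len(profile)
    hits = []
    for line, value in profile.items():
        rest_mean = (total - value) / (n - 1)
        if value * fold <= rest_mean:
            hits.append(line)
    return hits


# Hypothetical RA profile: nearly all signal in the suspension cell line.
jurkat_specific = {
    "HAP1": 1.0, "HEK293T": 1.0, "HeLa": 1.0, "HepG2": 1.0,
    "Jurkat": 93.0, "Panc1": 1.0, "SH-SY5Y": 1.0, "SVGp12": 1.0,
}

hits = depleted_in(jurkat_specific)
print(len(hits), hits)
```

For each adherent line in this example, the mean of the remaining lines is 99/7 ≈ 14.1, so a value of 1.0 is roughly 14-fold lower and passes the 4-fold cutoff, even though the seven adherent lines are identical to one another.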
Several example TMT abundance profiles were highlighted to illustrate proteins that were highly enriched in a given cell line.
To showcase our dataset further, we highlighted, for each cell line, one protein that was highly abundant in that cell line compared to the rest of the cell lines interrogated. In HAP1, transcription factor SOX-2 (P48431) was among the most enriched proteins (Figure 5A). SOX-2 controls the expression of several genes involved in development and thereby may highly influence the expression of other proteins, including those involved in leukemia inhibitory factor signaling [41]. In HEK293T cells, carbonic anhydrase 2 (CA2, P00918) was highly expressed (Figure 5B). This plasma membrane protein converts cyanamide to urea and contributes to intracellular pH regulation, thereby having an important role in the developing kidney [42]. In HeLa cells, the folate receptor alpha (FOLR1, P15328) was identified as being highly abundant (Figure 5C). This protein binds folate and reduced folic acid derivatives, mediates delivery of folate analogs into the interior of cells, and is a potential biomarker for ovarian cancer, which is related to the origin of this cell line [43]. In HepG2 cells, bile salt sulfotransferase (SULT2A1, Q06520) was highly expressed (Figure 5D). This protein catalyzes the sulfonation of steroids and bile acids in the liver, which is an important function for this organ [44]. In Jurkat suspension cells, T-cell surface antigen CD2 (P06729) was identified as being highly abundant (Figure 5E). This plasma membrane protein interacts with CD48 and CD58 to mediate adhesion between T-cells and other cell types, and is well known to have a role in Jurkat cell signaling [45]. In Panc1, the normal mucosa of esophagus-specific gene 1 protein (NMES1, Q9C002) appeared to be almost exclusive to this cell line (Figure 5F). Although first identified in human esophageal squamous cell carcinoma tissues, this protein has since been identified in the nucleus of cells from the digestive system, but its specific role in the pancreas and pancreatic diseases is not fully understood [46].
In SH-SY5Y neuroblastoma cells, neural cell adhesion molecule 1 (NCAM1, A0A0D9SF30) showed a very high abundance level (Figure 5G). This protein is linked to cell-to-cell interactions during development and differentiation and has been more specifically determined to be involved in nervous system development [47]. In SVGp12, transgelin (TAGLN, Q01995) was nearly exclusive to this cell line (Figure 5H). TAGLN is involved in calcium interactions and contractile properties of the cell [48]. The near exclusivity of these proteins in their respective cell lines emphasized that particular cell lines are better equipped for investigating certain proteins, specifically for studies focusing on protein-protein interactions and posttranslational modifications. These data also enable us to explore the distribution of cell line-specific protein isoforms across the eight cell lines. Further data mining will extract more proteins with similar profiles, as well as proteins enriched in several select cell lines.
Figure 5: Example proteins that are among the highest enriched in a given cell line.
These proteins include: A) transcription factor SOX-2, B) carbonic anhydrase 2 (CA2), C) Folate receptor alpha (FOLR1), D) bile salt sulfotransferase (SULT2A1), E) T-cell surface antigen CD2, F) normal mucosa of esophagus-specific gene 1 protein (NMES1), G) neural cell adhesion molecule 1 (NCAM1), and H) transgelin (TAGLN). HP, HAP1; HK, HEK293T; HL, HeLa; HG, HepG2; JK, Jurkat; PC, Panc1; SH, SH-SY5Y; SV, SVGp12.
Concluding remarks.
We explored the relative protein abundance profiles of eight human cell lines using a TMTpro16-based strategy that leveraged the quantitative accuracy of FAIMS and RTS. We noted stark differences in the basal proteome of each cell line and classified the proteins that were substantially enriched in a given cell line. As expected, many of these differences reflected the tissue origin and related function of a given cell line. We provided an interactive R Shiny application, “8 Cell Line Protein Abundance Profile Viewer,” which is available at http://wren.hms.harvard.edu/8cells/ for browsing the relative abundance of the proteins in our dataset (Figure S2). We acknowledge that our panel consisted of only eight cell lines and that the choice of different cell lines may result in different subsets of cell line-enriched proteins. Nonetheless, the differences in protein abundance across cell lines may significantly influence protein interactions and signaling analyses. As such, this dataset can help guide the design, implementation, and optimization of future cell line-based studies.
4. ASSOCIATED DATA
RAW files are available upon request. In addition, the data have been deposited in the ProteomeXchange Consortium via the PRIDE [49] partner repository with the dataset identifier PXD020806. In the supplementary information, we have included tables listing protein names, gene symbols, and TMT quantitation values for the dataset (Supplemental Table 2). We have also included lists of peptides, associated protein names, gene symbols, TMT quantification values, retention time, and isolation purity for the datasets (Supplemental Table 3).
Supplementary Material
Supplemental Table 1: GO and KEGG pathway analysis for common core proteome proteins. Columns include: GO or KEGG category, term, number of proteins, % of enriched proteins, fold enrichment, and false discovery rate. BP, biological process; CC, cellular component; MF, molecular function; KEGG, Kyoto Encyclopedia of Genes and Genomes.
Supplemental Table 2: Proteins quantified in the dataset. Columns include: UniProt protein identification number (proteinID), gene symbol (Gene Symbol), protein description/name (Description), number of peptides quantified per protein (peptides), and the normalized summed signal-to-noise for each of the 16 channels.
Supplemental Table 3: Peptides quantified in the dataset. Columns include: UniProt protein identification number (proteinID), gene symbol (Gene Symbol), description/name (Description), redundancy, peptide sequence (peptide sequence), number of quantified peptides (num_quant), the summed signal-to-noise for each of the 16 channels, isolation specificity (isolation purity), elution time, and charge state.
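For readers working directly with the per-channel values in Supplemental Table 2, the following minimal Python sketch shows one way to turn 16 channel signal-to-noise values into a relative abundance per cell line. It rests on assumptions that are not stated in the table description: that the 16 channels are ordered as adjacent duplicate pairs, one pair per cell line, in the order listed in `CELL_LINES`, and that a simple percentage cutoff (`min_percent`, a hypothetical parameter) is an acceptable stand-in for an enrichment criterion. The toy row is invented for illustration and is not taken from the dataset.

```python
# Assumed cell line order; the real channel layout may differ.
CELL_LINES = ["HAP1", "HEK293T", "HeLa", "HepG2",
              "Jurkat", "Panc1", "SH-SY5Y", "SVGp12"]


def collapse_channels(channel_sn, replicates=2):
    """Average adjacent replicate channels into one value per cell line."""
    if len(channel_sn) != len(CELL_LINES) * replicates:
        raise ValueError("unexpected number of channels")
    return [
        sum(channel_sn[i * replicates:(i + 1) * replicates]) / replicates
        for i in range(len(CELL_LINES))
    ]


def relative_abundance(values):
    """Scale values so that they sum to 100 across cell lines."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [100.0 * v / total for v in values]


def enriched_cell_line(channel_sn, min_percent=50.0):
    """Return the dominant cell line if it holds at least min_percent
    of the total signal, otherwise None. The cutoff is hypothetical."""
    rel = relative_abundance(collapse_channels(channel_sn))
    best = max(range(len(rel)), key=rel.__getitem__)
    return CELL_LINES[best] if rel[best] >= min_percent else None


# Hypothetical toy row of 16 channel values, NOT real data.
toy_row = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 130, 130, 5, 5]
print(enriched_cell_line(toy_row))
```

Before applying anything like this to the actual tables, the channel-to-cell-line assignment should be taken from the column headers or the methods, not from the ordering assumed here.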
STATEMENT OF SIGNIFICANCE OF THE STUDY.
Selecting the appropriate cell line for a given experiment is often overlooked but may be a cause of discrepancies in studies focusing on specific protein interactions or signaling mechanisms. Here, we showcase this concept by exploring the relative protein abundance profiles of a panel of eight human cell lines, namely: HAP1, HEK293T, HeLa, HepG2, Jurkat, Panc1, SH-SY5Y, and SVGp12. We use a mass spectrometry-based TMTpro16 sample multiplexing strategy that leverages the quantitative accuracy of high-Field Asymmetric Ion Mobility Spectrometry (FAIMS) and real-time database searching (RTS). We note differences in the basal proteome of each cell line and classify the proteins that are substantially enriched in each cell line. The variation among the cell line proteomes may significantly influence protein interaction and signaling analyses. Our data can guide cell line selection to address a given scientific question or enhance an experimental design, and our workflow provides a platform for expanded cell line comparisons. To aid in exploring the dataset, we have developed an interactive R Shiny application, “8 Cell Line Protein Abundance Profile Viewer,” which is available at http://wren.hms.harvard.edu/8cells/.
ACKNOWLEDGEMENTS
We would like to thank the members of the Gygi Lab at Harvard Medical School. This work was funded in part by NIH/NIGMS grants R01 GM132129 (J.A.P.) and GM67945 (S.P.G.). We declare no conflicts of interest.
5. REFERENCES
- [1] Rondon-Lagos M, Verdun Di Cantogno L, Marchio C, Rangel N, Payan-Gomez C, Gugliotta P, Botta C, Bussolati G, Ramirez-Clavijo SR, Pasini B, Sapino A, Mol Cytogenet 2014, 7, 8; Pu Y, Xue J, Wang W, Xu B, Gu Y, Tang R, Ackerstaff E, Koutcher JA, Achilefu S, Alfano RR, J Biomed Opt 2013, 18, 87002; Rashidi H, Strohbuecker S, Jackson L, Kalra S, Blake AJ, France L, Tufarelli C, Sottile V, Cells Tissues Organs 2012, 195, 484; Allegrucci C, Young LE, Hum Reprod Update 2007, 13, 103.
- [2] Carette JE, Guimaraes CP, Varadarajan M, Park AS, Wuethrich I, Godarova A, Kotecki M, Cochran BH, Spooner E, Ploegh HL, Brummelkamp TR, Science 2009, 326, 1231.
- [3] Kotecki M, Reddy PS, Cochran BH, Exp Cell Res 1999, 252, 273.
- [4] Essletzbichler P, Konopka T, Santoro F, Chen D, Gapp BV, Kralovics R, Brummelkamp TR, Nijman SM, Burckstummer T, Genome Res 2014, 24, 2059.
- [5] Graham FL, Smiley J, Russell WC, Nairn R, J Gen Virol 1977, 36, 59.
- [6] DuBridge RB, Tang P, Hsia HC, Leong PM, Miller JH, Calos MP, Mol Cell Biol 1987, 7, 379.
- [7] Rahbari R, Sheahan T, Modes V, Collier P, Macfarlane C, Badge RM, Biotechniques 2009, 46, 277; Scherer WF, Syverton JT, Gey GO, J Exp Med 1953, 97, 695.
- [8] Chen TR, Cytogenet Cell Genet 1988, 48, 19.
- [9] Morris KM, Aden DP, Knowles BB, Colten HR, J Clin Invest 1982, 70, 906.
- [10] Mersch-Sundermann V, Knasmuller S, Wu XJ, Darroudi F, Kassie F, Toxicology 2004, 198, 329.
- [11] Schneider U, Schwenk HU, Bornkamm G, Int J Cancer 1977, 19, 621.
- [12] Abraham RT, Weiss A, Nat Rev Immunol 2004, 4, 301.
- [13] Lieber M, Mazzetta J, Nelson-Rees W, Kaplan M, Todaro G, Int J Cancer 1975, 15, 741.
- [14] Wu MC, Arimura GK, Yunis AA, Int J Cancer 1978, 22, 728.
- [15] Biedler JL, Helson L, Spengler BA, Cancer Res 1973, 33, 2643; Biedler JL, Roffler-Tarlov S, Schachner M, Freedman LS, Cancer Res 1978, 38, 3751.
- [16] Jeong HJ, Kim DW, Woo SJ, Kim HR, Kim SM, Jo HS, Park M, Kim DS, Kwon OS, Hwang IK, Han KH, Park J, Eum WS, Choi SY, Mol Cells 2012, 33, 471; Bartolome F, de la Cueva M, Pascual C, Antequera D, Fernandez T, Gil C, Martinez A, Carro E, Alzheimers Res Ther 2018, 10, 24; Shang Y, Liu M, Wang T, Wang L, He H, Zhong Y, Qian G, An J, Zhu T, Qiu X, Shang J, Chen Y, Environ Pollut 2019, 246, 763; Zainal Abidin S, Fam SZ, Chong CE, Abdullah S, Cheah PS, Nordin N, Ling KH, Gene 2019, 697, 201.
- [17] Henriksen S, Tylden GD, Dumoulin A, Sharma BN, Hirsch HH, Rinaldo CH, J Virol 2014, 88, 7556.
- [18] Conde M, Michen S, Wiedemuth R, Klink B, Schrock E, Schackert G, Temme A, BMC Cancer 2017, 17, 889; Cao L, Lei H, Chang MZ, Liu ZQ, Bie XH, Biochem Biophys Res Commun 2015, 462, 389; Gong F, Wang G, Ye J, Li T, Bai H, Wang W, Oncol Rep 2013, 30, 2976.
- [19] Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, Purkayastha S, Juhasz P, Martin S, Bartlet-Jones M, He F, Jacobson A, Pappin DJ, Mol Cell Proteomics 2004, 3, 1154; Thompson A, Schafer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, Neumann T, Johnstone R, Mohammed AK, Hamon C, Anal Chem 2003, 75, 1895.
- [20] Paulo JA, O'Connell JD, Gygi SP, J Am Soc Mass Spectrom 2016, 27, 1620.
- [21] Schweppe DK, Rusin SF, Gygi SP, Paulo JA, J Proteome Res 2020, 19, 554; Schweppe DK, Prasad S, Belford MW, Navarrete-Perea J, Bailey DJ, Huguet R, Jedrychowski MP, Rad R, McAlister G, Abbatiello SE, Wouters ER, Zabrouskov V, Dunyach JJ, Paulo JA, Gygi SP, Anal Chem 2019, 91, 4010.
- [22] Schweppe DK, Eng JK, Yu Q, Bailey D, Rad R, Navarrete-Perea J, Huttlin EL, Erickson BK, Paulo JA, Gygi SP, J Proteome Res 2020, 19, 2026.
- [23] Erickson BK, Mintseris J, Schweppe DK, Navarrete-Perea J, Erickson AR, Nusinow DP, Paulo JA, Gygi SP, J Proteome Res 2019, 18, 1299.
- [24] Paulo JA, Urrutia R, Banks PA, Conwell DL, Steen H, J Proteomics 2011, 75, 708; Paulo JA, Urrutia R, Banks PA, Conwell DL, Steen H, J Proteome Res 2011, 10, 4835.
- [25] Wang Y, Yang F, Gritsenko MA, Wang Y, Clauss T, Liu T, Shen Y, Monroe ME, Lopez-Ferrer D, Reno T, Moore RJ, Klemke RL, Camp DG 2nd, Smith RD, Proteomics 2011, 11, 2019.
- [26] Paulo JA, O'Connell JD, Everley RA, O'Brien J, Gygi MA, Gygi SP, J Proteomics 2016, 148, 85.
- [27] Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, Gatto L, Fischer B, Pratt B, Egertson J, Hoff K, Kessner D, Tasman N, Shulman N, Frewen B, Baker TA, Brusniak MY, Paulse C, Creasy D, Flashner L, Kani K, Moulding C, Seymour SL, Nuwaysir LM, Lefebvre B, Kuhlmann F, Roark J, Rainer P, Detlev S, Hemenway T, Huhmer A, Langridge J, Connolly B, Chadick T, Holly K, Eckels J, Deutsch EW, Moritz RL, Katz JE, Agus DB, MacCoss M, Tabb DL, Mallick P, Nat Biotechnol 2012, 30, 918.
- [28] Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP, Nat Biotechnol 2006, 24, 1285.
- [29] Huttlin EL, Jedrychowski MP, Elias JE, Goswami T, Rad R, Beausoleil SA, Villen J, Haas W, Sowa ME, Gygi SP, Cell 2010, 143, 1174.
- [30] Elias JE, Gygi SP, Methods Mol Biol 2010, 604, 55.
- [31] Elias JE, Gygi SP, Nat Methods 2007, 4, 207.
- [32] McAlister GC, Huttlin EL, Haas W, Ting L, Jedrychowski MP, Rogers JC, Kuhn K, Pike I, Grothe RA, Blethrow JD, Gygi SP, Anal Chem 2012, 84, 7469.
- [33] Navarrete-Perea J, Yu Q, Gygi SP, Paulo JA, J Proteome Res 2018, 17, 2226.
- [34] Li J, Van Vranken JG, Pontano Vaites L, Schweppe DK, Huttlin EL, Etienne C, Nandhikonda P, Viner R, Robitaille AM, Thompson AH, Kuhn K, Pike I, Bomgarden RD, Rogers JC, Gygi SP, Paulo JA, Nat Methods 2020, 17, 399.
- [35] Pfammatter S, Bonneil E, McManus FP, Prasad S, Bailey DJ, Belford M, Dunyach JJ, Thibault P, Mol Cell Proteomics 2018, 17, 2051; Pfammatter S, Bonneil E, Thibault P, J Proteome Res 2016, 15, 4653.
- [36] Hebert AS, Prasad S, Belford MW, Bailey DJ, McAlister GC, Abbatiello SE, Huguet R, Wouters ER, Dunyach JJ, Brademan DR, Westphall MS, Coon JJ, Anal Chem 2018, 90, 9529; Prasad S, Belford MW, Dunyach JJ, Purves RW, J Am Soc Mass Spectrom 2014, 25, 2143; Purves RW, Prasad S, Belford M, Vandenberg A, Dunyach JJ, J Am Soc Mass Spectrom 2017, 28, 525.
- [37] Huang da W, Sherman BT, Lempicki RA, Nat Protoc 2009, 4, 44.
- [38] Xie HR, Hu LS, Li GY, Chin Med J (Engl) 2010, 123, 1086.
- [39] Carrera M, Canas B, Lopez-Ferrer D, Anal Chem 2017, 89, 8853.
- [40] O'Brien JJ, O'Connell JD, Paulo JA, Thakurta S, Rose CM, Weekes MP, Huttlin EL, Gygi SP, J Proteome Res 2018, 17, 590.
- [41] Niwa H, Ogawa K, Shimosato D, Adachi K, Nature 2009, 460, 118.
- [42] Sterling D, Reithmeier RA, Casey JR, J Biol Chem 2001, 276, 47886.
- [43] Leung F, Dimitromanolakis A, Kobayashi H, Diamandis EP, Kulasingam V, Clin Biochem 2013, 46, 1462.
- [44] Otterness DM, Wieben ED, Wood TC, Watson WG, Madden BJ, McCormick DJ, Weinshilboum RM, Mol Pharmacol 1992, 41, 865.
- [45] Carmo AM, Castro MA, Arosa FA, J Immunol 1999, 163, 4238; Marie-Cardine A, Maridonneau-Parini I, Ferrer M, Danielian S, Rothhut B, Fagard R, Dautry-Varsat A, Fischer S, J Immunol 1992, 148, 3879.
- [46] Zhou J, Wang H, Lu A, Hu G, Luo A, Ding F, Zhang J, Wang X, Wu M, Liu Z, Int J Cancer 2002, 101, 311.
- [47] Guan G, Niu X, Qiao X, Wang X, Liu J, Zhong M, Med Sci Monit 2020, 26, e923491.
- [48] Park GH, Lee SJ, Yim H, Han JH, Kim HJ, Sohn YB, Ko JM, Jeong SY, Oncol Rep 2014, 32, 1347.
- [49] Perez-Riverol Y, Csordas A, Bai J, Bernal-Llinares M, Hewapathirana S, Kundu DJ, Inuganti A, Griss J, Mayer G, Eisenacher M, Perez E, Uszkoreit J, Pfeuffer J, Sachsenberg T, Yilmaz S, Tiwary S, Cox J, Audain E, Walzer M, Jarnuczak AF, Ternent T, Brazma A, Vizcaino JA, Nucleic Acids Res 2019, 47, D442.





