Abstract
Cell lines are extensively used tools, so a comprehensive proteomic overview of hepatocellular carcinoma (HCC) cell lines and an extensive spectral library for data-independent acquisition (DIA) quantification are needed. Here, we present the proteome of nine commonly used HCC cell lines, covering 9,208 protein groups, and an HCC spectral library containing 253,921 precursors, 168,811 peptides and 10,098 protein groups. The proteomic overview reveals heterogeneity among the cell lines, as well as their similarity to tumour tissues in proliferation and metastasis characteristics and in drug-target expression. Generating the HCC spectral library took 108 hours of runtime for data-dependent acquisition (DDA) of 48 runs, 24 hours for database searching with MaxQuant version 2.0.3.0, and 1 hour for processing with Spectronaut™ version 15.2. The HCC spectral library supports the quantification of 7,637 protein groups in triplicate 2-hour DIA analyses of HepG2 and the discovery of biological alterations. This study provides valuable resources for HCC cell lines and for efficient DIA quantification on the LC-Orbitrap platform, which should further help to explore molecular mechanisms and candidate therapeutic targets.
Subject terms: Proteomics, Liver cancer
Measurement(s) | Proteome of hepatocellular carcinoma cell lines |
Technology Type(s) | Liquid chromatography-tandem mass spectrometry |
Sample Characteristic - Organism | Homo sapiens |
Background & Summary
Liver cancer ranks as the sixth most common cause of cancer-related death worldwide1. Hepatocellular carcinoma (HCC) accounts for approximately 90% of all primary liver cancers2. Studies of the proteomic landscape of HCC have advanced our understanding of its molecular basis. Based on label-free proteomic data from HCC patients at BCLC stage 0-A, we defined three subtypes and identified SOAT1 as a potential therapeutic target3. Gao et al. characterized tumours from HCC patients by isobaric tandem mass tag (TMT)-based proteomics and identified two prognostic biomarkers, PYCR2 and ADH1A4. Cancer cell lines are the most extensively used model systems in tumour biology and therapeutic development5; thus, a clear understanding of their proteomes may help us make better use of HCC cell lines to analyse molecular mechanisms and screen anti-tumour drugs. In 2014, Megger et al. analysed the proteome of a mixture of HepG2, Hep3B and SK-Hep-1 by label-free analysis and identified 2,757 protein groups and 13,744 peptides6. In 2020, the proteomes of 375 cell lines of the Cancer Cell Line Encyclopedia were analysed using TMT-based proteomics, but that study did not cover commonly used HCC cell lines including HepG2.2.15, PLC/PRF/5, MHCC97L, MHCC97H, HCCLM3 and HCCLM67. Recently, Gonçalves et al.8 identified 8,497 protein groups from 949 cell lines by data-independent acquisition (DIA), including 5,302 protein groups from Huh7 and 5,589 protein groups from Hep3B. However, whether HCC cell lines are identical or heterogeneous at the proteome level has not yet been revealed. It also remains unknown whether HCC cell lines are representative of primary HCC tumours at the proteome level. Thus, a systematic exploration of the proteomic characteristics of HCC cell lines, and their comparison with primary HCC tumours, is still necessary.
DIA mass spectrometry is an emerging method for quantifying protein groups consistently and accurately across multiple samples9. DIA quantification is performed at the MS2 level by extracting fragment ion chromatograms, which are less prone to interference than MS1 peak areas10. DIA data can be analysed by a spectral library-based approach or a library-free approach, and both can provide highly convergent identification and reliable quantification performance11,12. The spectral library is usually generated through data-dependent acquisition (DDA) measurement of the peptides to be analysed by DIA13 and provides the connection between precursor peptides and their fragments14. Recently, it has been reported that the reproducibility, specificity and accuracy of spectral library-based DIA quantification are superior to those of DDA12,15. Thus, an HCC spectral library covering protein groups from HCC cell lines and primary tumour tissues could provide a valuable resource for DIA quantification and further support the discovery of novel molecular mechanisms and candidate therapeutic targets in HCC.
Here, we present a proteomic overview of nine commonly used HCC cell lines covering 9,208 protein groups, and an in-depth HCC spectral library containing 253,921 precursors, 168,811 peptides (of which 150,327 were proteotypic) and 10,098 protein groups. We found poor consistency between the proteome and the transcriptome of these cell lines. We also describe the characteristic pathways of each cell line, and the differences and similarities between the cell lines and HCC tissues. The HCC spectral library was used to analyse the protein groups differentially induced by TGFB1 stimulation of HCCLM3, using Spectronaut™ version 15.2 (Biognosys AG, Switzerland) and a free software suite, DIA-NN version 1.8. In summary, our results provide a proteome and pathway overview of nine commonly used HCC cell lines and a valuable guide for the use of these cell lines. The HCC spectral library is available for in-depth DIA quantification of HCC cell lines and could help to explore the molecular mechanisms and candidate therapeutic targets of HCC. Our research provides a pipeline comprising sample selection, peptide pre-fractionation, spectral library generation and DIA quantification, which is generally applicable and can be used for DIA quantification studies of other tumours.
Methods
Study design
Nine HCC cell lines were cultured; their total protein lysates were extracted and digested with trypsin into peptides. The peptides of each cell line were pre-fractionated by high-pH reversed-phase pre-fractionation (Hp-RP) into six fractions and analysed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in DDA mode (Table 1). The resulting raw files were searched against the database with MaxQuant version 2.0.3.0. Gene Ontology (GO) analysis and single-sample gene set enrichment analysis16 (ssGSEA) were then performed. To generate the HCC spectral library, peptides of the HCC cell lines and of tumour tissues were separately mixed and fractionated by Hp-RP, and then analysed by LC-MS/MS (Table 1). The resulting files were searched with MaxQuant version 2.0.3.0, and the search results were imported into Spectronaut™ (version 15.2, Biognosys AG, Switzerland) (Fig. 1) to generate a spectral library that can be used in both Spectronaut™ version 15.2 and DIA-NN version 1.8 for DIA quantification.
Table 1.
Sample name | Fractions | Number of repetitions | Number of raw files |
---|---|---|---|
HepG2 | 6 | 3 | 18 |
HepG2.2.15 | 6 | 3 | 18 |
Hep3B | 6 | 3 | 18 |
Huh7 | 6 | 3 | 18 |
PLC/PRF/5 | 6 | 3 | 18 |
MHCC97L | 6 | 3 | 18 |
MHCC97H | 6 | 3 | 18 |
HCCLM3 | 6 | 3 | 18 |
HCCLM6 | 6 | 3 | 18 |
Cell line mixture | 8 | 3 | 24 |
Tissue mixture | 8 | 3 | 24 |
Sample names, numbers of fractions after high-pH reversed-phase pre-fractionation, and numbers of experimental repetitions.
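As a quick consistency check, the raw-file counts in Table 1 can be recomputed from the fraction and repetition columns. The following is a minimal sketch; the list `TABLE_1` and the helper `raw_file_count` are illustrative names introduced here, not part of the study's pipeline.

```python
# Rows of Table 1 as (sample name, fractions, repetitions).
TABLE_1 = [
    ("HepG2", 6, 3),
    ("HepG2.2.15", 6, 3),
    ("Hep3B", 6, 3),
    ("Huh7", 6, 3),
    ("PLC/PRF/5", 6, 3),
    ("MHCC97L", 6, 3),
    ("MHCC97H", 6, 3),
    ("HCCLM3", 6, 3),
    ("HCCLM6", 6, 3),
    ("Cell line mixture", 8, 3),
    ("Tissue mixture", 8, 3),
]


def raw_file_count(fractions, repetitions):
    """One raw file is acquired per fraction per repetition."""
    return fractions * repetitions


# Mixtures feed the spectral library; the other rows are individual cell lines.
cell_line_files = sum(
    raw_file_count(f, r) for name, f, r in TABLE_1 if "mixture" not in name
)
library_files = sum(
    raw_file_count(f, r) for name, f, r in TABLE_1 if "mixture" in name
)

print(cell_line_files)  # 162
print(library_files)    # 48
```

The two totals agree with the 162 cell line DDA files and the 48 library-generation DDA files listed under Data Records.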
Cell culture
Cell lines were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Corning, USA) containing 10% FBS (Gibco, USA) and 100 U/mL penicillin and streptomycin mixture (Gibco, USA) in an incubator at 37 °C with 5% CO2. All cell lines used were confirmed by PCR to be free of bacteria, fungi and mycoplasma. To investigate the effect of TGFB1 on HCCLM3, HCCLM3 cells were stimulated with 10 ng/mL TGFB1 (R&D Systems, UK) for 48 hours.
Protein extraction and digestion
Cell line samples were minced and lysed in Tissue Protein Extraction Reagent (T-PER, Thermo Scientific, USA), followed by 3 minutes of sonication (1 second on and 2 seconds off, power 200 W) (SCIENTZT, JY92-II, China). The lysate was then centrifuged at 14,000 g for 15 minutes at 25 °C, and the supernatant was collected. The protein concentration of the lysate was measured by the Bradford assay. Protein digestion was performed by filter-aided sample preparation (FASP)17. Each aliquot of 400 μg protein lysate was added to a 30-kDa ultra-filter (Merck Millipore, Germany), followed by centrifugation at 14,000 g for 20 minutes at 25 °C. Then, 200 μL of UA solution (8 M urea in 50 mM Tris-HCl, pH 8) with 10 mM DTT was added to each ultra-filter. All ultra-filters were kept for 2 hours at 37 °C for denaturation and reduction. The solution in the ultra-filters was removed by centrifugation at 14,000 g for 15 minutes at 25 °C, and then 100 μL of UA solution with 50 mM iodoacetamide (IAA, Sigma-Aldrich, USA) was added to each ultra-filter for alkylation. The ultra-filters were kept in the dark for 30 minutes at 25 °C. After IAA incubation, the solution in the ultra-filters was removed by centrifugation at 14,000 g for 10 minutes at 25 °C. Then, 200 μL of UA solution with 10 mM DTT was added, and the ultra-filters were kept at room temperature for another 15 minutes. The ultra-filters were centrifuged at 14,000 g for 15 minutes, and then washed once with 200 μL of UA solution and three times with 200 μL of ABC (25 mM ammonium bicarbonate, Sigma-Aldrich, USA) by centrifugation at 14,000 g for 10 minutes at 25 °C. Then, 100 μL of ABC containing 8 μg trypsin (Promega, USA) was added to each ultra-filter. All ultra-filters were incubated at 37 °C for 12 hours, and the peptide mixtures were then collected into new collecting tubes by centrifugation at 14,000 g for 15 minutes at 25 °C.
All ultra-filters were washed twice with 100 μL of ABC by centrifugation at 14,000 g for 15 minutes at 25 °C, and the flow-through was collected into the same collecting tube. The peptide concentration was measured using a Nanodrop 2000C (Thermo Scientific, USA) at 280-nm absorbance. The peptide mixtures were acidified with 10 μL of 4% trifluoroacetic acid (TFA, Sigma-Aldrich, USA), heat-dried and then stored at −80 °C.
High-pH reversed-phase pre-fractionation
The peptide mixture was manually fractionated by Hp-RP with a stepwise gradient. A C18 tip packed with 5 mg of C18 reversed-phase media (3 μm, Durashell, Agela Technologies, China) was washed with 90 μL methanol (Sigma-Aldrich, USA) and then with 90 μL ammonia water (pH 10). Then, 50 μg of peptides re-dissolved in 160 μL ammonia water (pH 10) was loaded. The tip was centrifuged at 1,000 g for 8 minutes at 25 °C to remove the liquid and then washed with 90 μL of ammonia water (pH 10). Peptides bound to the C18 reversed-phase packing were then sequentially eluted with increasing concentrations of acetonitrile (6%, 9%, 12%, 15%, 18%, 21%, 25%, 30% and 50%) in ammonia water (pH 10). These fractions were collected, and the 25%, 30% and 50% fractions were combined with the 6%, 9% and 12% fractions, respectively. The final six fractions were heat-dried and stored at −80 °C.
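The pooling scheme described above, in which nine elution steps are reduced to six final fractions, can be written out explicitly. This is only an illustrative sketch; `pool_fractions` and the constant names are hypothetical and were introduced here for clarity.

```python
# Acetonitrile percentages of the nine stepwise elutions.
ELUTION_STEPS = [6, 9, 12, 15, 18, 21, 25, 30, 50]

# Late-eluting fractions and the early fraction each one is combined with.
POOLING = {25: 6, 30: 9, 50: 12}


def pool_fractions(steps, pooling):
    """Map each final fraction (keyed by its early step) to the steps it contains."""
    pooled = {step: [step] for step in steps if step not in pooling}
    for late, early in pooling.items():
        pooled[early].append(late)
    return pooled


final_fractions = pool_fractions(ELUTION_STEPS, POOLING)
print(len(final_fractions))  # 6
print(final_fractions)
# {6: [6, 25], 9: [9, 30], 12: [12, 50], 15: [15], 18: [18], 21: [21]}
```

Pairing the earliest and latest eluting steps is a common way to obtain pooled fractions whose peptides are spread across the subsequent low-pH gradient, although the text does not state the rationale.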
LC-MS/MS analysis
For analysis of the peptide mixture of each HCC cell line, the LC-MS/MS system consisted of a nanoflow high-performance liquid chromatography (HPLC) instrument (EASY-nLC 1000 nanoflow LC, Thermo Scientific, USA) coupled to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Scientific, USA) with a nano-electrospray ion source (Thermo Scientific, USA). For data acquisition, each fraction of the peptide mixture was re-dissolved in mobile phase A (0.1% formic acid (FA, Sigma-Aldrich, USA) in 99.9% pure water), and 1/10 of it was loaded onto the trapping column (100 μm × 20 mm, ReproSil-Pur C18-AQ, 3 μm; Dr. Maisch GmbH, Germany) using mobile phase A and then separated on the analytical column (150 μm × 150 mm, ReproSil-Pur C18-AQ, 1.9 μm; Dr. Maisch GmbH, Germany) at a flow rate of 320 nL/min with the following gradient: 0–8 min, 5–8% mobile phase B (0.1% FA in 99.9% acetonitrile); 8–58 min, 8–23% mobile phase B; 58–70 min, 23–32% mobile phase B; 70–71 min, 32–95% mobile phase B; and 71–80 min, 95% mobile phase B. The Orbitrap Fusion Lumos was set to OT-IT mode. For the MS1 scan, the AGC target was 5 × 10^5 and the scan range was 300 to 1,400 m/z at a resolution of 120,000, with a maximum injection time of 50 ms. For the MS2 scan, a cycle time of 3 s was set in top-speed mode. Only precursors with a charge state of 2–6 were selected for fragmentation by higher-energy collisional dissociation with a normalized collision energy of 35%. The MS2 spectra were acquired in the ion trap in rapid mode with an AGC target of 5,000 and a maximum injection time of 35 ms.
For DIA analysis of peptides from HCCLM3 and TGFB1-stimulated HCCLM3, the LC-MS/MS system consisted of the EASY-nLC 1000 nanoflow LC (Thermo Scientific, USA) coupled to the Q-Exactive HF mass spectrometer (Thermo Scientific, USA). For data acquisition, 2 μg of the peptide mixture re-dissolved in mobile phase A was loaded and separated on the analytical column at a flow rate of 500 nL/min with the following gradient: 0–13 min, 6–10% mobile phase B (0.1% FA in 99.9% acetonitrile); 13–99 min, 10–23% mobile phase B; 99–120 min, 23–33% mobile phase B; 120–123 min, 33–90% mobile phase B; and 123–135 min, 90% mobile phase B. For the MS1 scan, the AGC target value was set to 3E6, and the scan range was 400 to 1,200 m/z at a resolution of 120,000, with a maximum injection time of 80 ms. For the MS2 scan, the isolation window width was set to 26 m/z at a resolution of 30,000, and the AGC target was set to 5 × 10^5. The maximum injection time for MS2 was set to auto. The normalized collision energy was set to 27, and the spectrum type was set to profile.
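To give a feel for what a 26 m/z isolation width implies over the 400 to 1,200 m/z range, the sketch below enumerates the windows under the assumption that they are contiguous, equally wide and non-overlapping, starting at 400 m/z. The text does not state the window count, overlap or placement, so the resulting layout is an assumption rather than the acquisition method actually used; `dia_windows` is a hypothetical helper.

```python
import math


def dia_windows(mz_start, mz_end, width):
    """Return contiguous (low, high) isolation windows covering [mz_start, mz_end].

    Assumes fixed-width, non-overlapping windows; the last one is clipped
    to the upper end of the scan range.
    """
    n_windows = math.ceil((mz_end - mz_start) / width)
    windows = []
    for i in range(n_windows):
        low = mz_start + i * width
        high = min(low + width, mz_end)
        windows.append((low, high))
    return windows


windows = dia_windows(400, 1200, 26)
print(len(windows))   # 31
print(windows[0])     # (400, 426)
print(windows[-1])    # (1180, 1200)
```

Under these assumptions, one DIA cycle would consist of one MS1 scan followed by 31 MS2 scans.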
Database searching for DDA and DIA raw files
The DDA raw files were searched against the human UniProt database (updated on 2022-09-07, containing 20,398 protein groups and the iRT peptide sequence) with MaxQuant version 2.0.3.0. The digestion mode was set to specific, and trypsin/P was chosen. Oxidation of methionine and acetylation of the peptide N-terminus were set as variable modifications, and carbamidomethylation of cysteine was set as a fixed modification. The false discovery rate (FDR) was set to 0.01 at both the PSM and protein group levels. The maximum peptide mass was set to 4,600 Da, the peptide length range was set to 7 to 25, and the maximum number of missed cleavages was set to 2. The MS/MS match tolerance was set to 20 ppm, and the MS/MS de novo tolerance was set to 10 ppm. The proteinGroups.txt file generated by MaxQuant version 2.0.3.0 was then imported into Perseus v1.5.2.6 to extract the iBAQ value of each protein group in each cell line.
The DIA raw files were analysed with Spectronaut™ version 15.2 against the HCC spectral library. Trypsin/P was chosen for digestion. Maximum intensity was used for intensity extraction at MS1 and MS2. The correction factors for both MS1 and MS2 mass tolerance were set to 1. The XIC RT extraction window was set to dynamic, with a correction factor of 1. The calibration mode was set to automatic, the iRT-RT regression was set to local (non-linear) regression, and "used Biognosys iRT Kit" was selected. The decoy method was set to mutated, and the decoy limit strategy was set to dynamic. A kernel density estimator was chosen to estimate p-values. The precursor and protein group q-value cut-offs were both set to 0.01. Proteotypic sequences and MS2-level peak areas were used for protein quantification, and the same human UniProt database used for MaxQuant version 2.0.3.0 was set as the reference database. The top 3 precursors were used for peptide quantification, and the top 3 peptides were used for protein quantification. For DIA-NN version 1.8, trypsin/P was chosen for digestion, the maximum number of missed cleavages was set to 2, and the modifications oxidation of methionine, N-terminal acetylation and carbamidomethylation of cysteine were enabled. The peptide length range was set to 7 to 25, the precursor charge range to 1 to 4, the precursor m/z range to 400 to 1,200, and the fragment ion m/z range to 200 to 2,000. The generated HCC spectral library was used as the spectral library. The single-pass mode neural network classifier was chosen, and high accuracy was selected for quantification. RT-dependent cross-run normalization was selected. The precursor FDR was set to 1%.
Generation of the HCC spectral library
The peptide mixture of cell lines and that of HCC tumour tissues were each manually fractionated by Hp-RP with a stepwise gradient. A C18 tip packed with 5 mg of C18 reversed-phase media (3 μm, Durashell, Agela Technologies, China) was washed with 90 μL methanol (Sigma-Aldrich, USA) and then with 90 μL ammonia water (pH 10). Then, 50 μg of peptides dissolved in 160 μL ammonia water (pH 10) was loaded. The tip was centrifuged at 1,000 g for 8 min at 25 °C to remove the liquid and then washed with 90 μL of ammonia water (pH 10). Peptides were then sequentially eluted with 8 increasing concentrations of acetonitrile (9%, 12%, 15%, 18%, 21%, 25%, 30% and 50%) in ammonia water (pH 10). These fractions were collected, heat-dried and stored at −80 °C.
The LC-MS/MS system consisted of the EASY-nLC 1000 nanoflow LC (Thermo Scientific, USA) coupled to the Q-Exactive HF mass spectrometer (Thermo Scientific, USA). For data acquisition, 1/8 of each Hp-RP fraction re-dissolved in mobile phase A was loaded and separated on the analytical column at a flow rate of 500 nL/min with the following gradient: 0–13 min, 7–13% mobile phase B (0.1% FA in 99.9% acetonitrile); 13–99 min, 13–28% mobile phase B; 99–120 min, 28–42% mobile phase B; 120–123 min, 42–95% mobile phase B; and 123–135 min, 95% mobile phase B. For the MS1 scan, the AGC target value was set to 3E6 and the scan range was 300 to 1,400 m/z at a resolution of 120,000, with a maximum injection time of 80 ms. The 20 most intense precursor ions were selected for fragmentation. Only precursors with charge states of 2–6 were fragmented, with a normalized collision energy of 27%. For the MS2 scan, the AGC target value was 5E4 and the resolution was 15,000, with a maximum injection time of 45 ms. The iRT peptide standards (Biognosys AG, Schlieren-Zürich, Switzerland) were spiked into all runs for spectral library generation.
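The acquisition time needed for the library can be estimated from the gradient above. The sketch below assumes that each run lasts exactly as long as its 135-minute gradient (sample loading and column equilibration are ignored) and that the library comprises the 48 raw files listed in Table 1; `gradient_length` is a hypothetical helper.

```python
# Gradient segments for library generation as (start_min, end_min).
GRADIENT_SEGMENTS_MIN = [(0, 13), (13, 99), (99, 120), (120, 123), (123, 135)]

# 2 mixtures x 8 fractions x 3 repetitions (Table 1).
N_LIBRARY_RUNS = 48


def gradient_length(segments):
    """Total gradient length in minutes, checking that segments are contiguous."""
    for (_, end), (next_start, _) in zip(segments, segments[1:]):
        if end != next_start:
            raise ValueError("gradient segments are not contiguous")
    return segments[-1][1] - segments[0][0]


run_minutes = gradient_length(GRADIENT_SEGMENTS_MIN)
total_hours = run_minutes * N_LIBRARY_RUNS / 60

print(run_minutes)  # 135
print(total_hours)  # 108.0
```

The result matches the 108 hours of DDA runtime quoted in the Abstract.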
The resulting DDA raw files were searched against the human UniProt database (updated on 2022-09-07, with 20,398 protein groups and the iRT peptide sequence) with MaxQuant version 2.0.3.0. The digestion mode was set to specific, and trypsin/P was chosen. Oxidation of methionine and acetylation of the peptide N-terminus were set as variable modifications, and carbamidomethylation of cysteine was set as a fixed modification. The FDR was set to 0.01 at both the PSM and protein group levels. The maximum peptide mass was set to 4,600 Da, the peptide length range was set to 7 to 25, and the maximum number of missed cleavages was set to 2. The MS/MS match tolerance was set to 20 ppm, and the MS/MS de novo tolerance was set to 10 ppm.
The MaxQuant version 2.0.3.0 search results were imported into Spectronaut™ version 15.2 to generate the HCC spectral library. The maximum number of missed cleavages per peptide was set to 2. The m/z range was set to 400 to 1,200, and the best N fragments per peptide was set to 3 to 6. Fragment ions of types b and y were chosen, and the modifications oxidation of methionine, acetylation of the peptide N-terminus and carbamidomethylation of cysteine were kept during library generation. The empirical iRT database was set as the iRT reference, and the minimum R-square cut-off was set to 0.8. The FDR was set to 0.01 at both the precursor and protein levels. For calibration and main search, the tolerance was set to dynamic.
Data processing
The R package NormalyzerDE (v1.8.0)18 was used for data normalization with the quantile method. Differentially expressed protein groups between cell lines or tissues were identified with the R package limma (v3.46.0)19. GO analysis was performed with the enricher function of the R package clusterProfiler (v3.18.1)20. The ssGSEA analysis was performed with the gsva function of the R package GSVA (v3.18.2)21 using the ssGSEA method. Gene sets from the Molecular Signatures Database (MSigDB) v7.5.122 were used as reference gene sets for all analyses. Protein groups with a fold change greater than 1.5 and an adjusted p-value less than 0.01 were considered differentially expressed. All analyses were run in R (v4.0.3).
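The selection rule for differentially expressed protein groups can be illustrated with a short sketch. The analysis itself was run in R with limma; the Python code below is not that implementation. It assumes Benjamini-Hochberg adjustment (the limma default, not stated explicitly in the text) and a symmetric fold-change cut-off applied on the log2 scale. The function names and the toy numbers are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential(log2_fold_changes, p_values, fc_cutoff=1.5, alpha=0.01):
    """Indices with |log2 FC| > log2(fc_cutoff) and adjusted p-value < alpha."""
    adjusted = benjamini_hochberg(p_values)
    threshold = math.log2(fc_cutoff)
    return [
        i
        for i, (lfc, q) in enumerate(zip(log2_fold_changes, adjusted))
        if abs(lfc) > threshold and q < alpha
    ]


# Toy example: four protein groups with made-up statistics.
toy_log2_fc = [1.0, 0.3, 2.0, -1.2]
toy_p = [0.001, 0.002, 0.5, 0.04]
selected = select_differential(toy_log2_fc, toy_p)
print(selected)  # [0]
```

Only the first toy protein group passes both filters: the second fails the fold-change cut-off, and the last two fail the adjusted p-value cut-off.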
Data Records
The 162 DDA raw mass spectrometry data files (.raw) have been deposited to the ProteomeXchange Consortium via PRIDE23 with the dataset identifier PXD03664324. The 48 DDA raw mass spectrometry data files (.raw) for library generation have been deposited to the ProteomeXchange Consortium via PRIDE with the dataset identifier PXD03502825.
The HCC spectral library (available at Figshare26), the QC reports generated by DIALib-QC, the MaxQuant version 2.0.3.0 search results for the DDA raw files of the HCC cell lines and of spectral library generation, the DIA raw mass spectrometry data files (.raw), and all DIA-NN version 1.8 and Spectronaut™ version 15.2 search results for HepG2, HCCLM3 and TGFB1-stimulated HCCLM3 have been deposited to the ProteomeXchange Consortium via PRIDE with the dataset identifier PXD03715927.
Technical Validation
Proteomic profiling of HCC cell lines
Cumulatively, 9,208 protein groups were identified from 162 peptide fractions of the nine HCC cell lines, the average number identified was 7,699 (Fig. 2a), and 61.5% (5,664 of 9,208) were detected in all nine HCC cell lines (Fig. 2b). These protein groups correspond to 160,042 peptides, and 97% (8,928 of 9,208) of protein groups had at least two unique peptides (Fig. 2c). The three repetitions of each cell line showed high quantitative repeatability (Pearson correlation coefficient ≥ 0.95 between the three repetitions of each cell line, Fig. 2d). All HCC cell lines were similar to one another (Pearson correlation coefficient ≥ 0.8); MHCC97L, MHCC97H, HCCLM3 and HCCLM6 showed high consistency with each other (Pearson correlation coefficient ≥ 0.93), and HepG2 showed high consistency with HepG2.2.15 (Pearson correlation coefficient ≥ 0.9). Compared with the data reported by Gonçalves et al. and Nusinow et al., our proteomic data still uniquely identified 1,800, 896 and 911 protein groups from HepG2, Hep3B and Huh7, respectively (Fig. 2e).
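The repeatability criterion used above (Pearson correlation coefficient between repetitions) can be sketched as follows. The intensities are made-up log-scale values for five protein groups, not data from this study, and both helper functions are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def min_pairwise_correlation(replicates):
    """Smallest Pearson coefficient over all pairs of replicates."""
    return min(pearson(a, b) for a, b in combinations(replicates, 2))


# Toy log10 intensities of five protein groups in three repetitions.
toy_replicates = [
    [5.1, 6.3, 7.8, 4.9, 8.2],
    [5.0, 6.5, 7.7, 5.1, 8.0],
    [5.3, 6.2, 7.9, 4.8, 8.3],
]

worst = min_pairwise_correlation(toy_replicates)
print(worst >= 0.95)  # True
```

Reporting the minimum over all replicate pairs corresponds to a statement of the form "coefficient ≥ 0.95 between the three repetitions".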
A poor correlation between the proteome and the transcriptome has repeatedly been reported in cell lines7 and human tissues28. By comparing the transcriptome with the proteome of five HCC cell lines (Hep3B, HepG2, Huh7, MHCC97H and PLC/PRF/5) in GSE9709829, we also found poor consistency between the proteome and the transcriptome: the mean Pearson correlation coefficient was 0.34; 37.2% of protein groups showed high consistency (Pearson correlation coefficient > 0.6) with their transcripts, including SOAT1 and NPC1, two core molecules of cholesterol metabolism30, AFP, an important biomarker of HCC31, and SRC, an important tyrosine protein kinase in cancer proliferation and metastasis32. However, we also found that 9.4% of protein groups were negatively correlated (Pearson correlation coefficient < −0.4) with their transcripts, including DOCK6, a molecule that can promote chemo- and radio-resistance in cancer33, and YTHDF1, a key regulator of m6A methylation34 (Fig. 2f). This poor correlation may be due to differences in normalization strategies between the proteome and the transcriptome, and may also be caused by multiple biological and technical factors including mRNA degradation rate, ribosome binding rate, ribosome density, codon usage bias, protein turnover, PTM variants, peptide sharing among isoforms, low-abundance proteins and experimental noise35. The inconsistency between the proteome and the transcriptome highlights the necessity of a proteomic overview of these cell lines.
Proteomic characteristics of HCC cell lines
High consistency was observed among MHCC97L, MHCC97H, HCCLM3 and HCCLM6, and between HepG2 and HepG2.2.15, according to their ssGSEA pathway scores (Fig. 3a). These results are in good agreement with the origins of these cell lines: MHCC97L, MHCC97H, HCCLM3 and HCCLM6 were derived from the same progenitor cell line, MHCC9736, and HepG2.2.15 was derived from HepG2 and is characterized by stable HBV DNA37. Principal component analysis based on the ssGSEA pathway scores was plotted on the two-dimensional plane defined by principal component 1 (52.4%) and principal component 2 (16.4%): MHCC97L, MHCC97H, HCCLM3 and HCCLM6 almost overlapped and were far from the other cell lines along principal component 1; Hep3B and PLC/PRF/5 were the farthest apart along principal component 2 (Fig. 3b). Pathways including actin cytoskeleton, VEGF signalling pathway and focal adhesion were highly variable on principal component 1, and cell cycle, RNA degradation and spliceosome on principal component 2 (Fig. 3c). Furthermore, we found that each cell line has its own uniquely enriched pathways. Cancer-related pathways such as the Wnt signalling pathway38, cell cycle39 and the TGF beta signalling pathway40 were heterogeneously enriched in different HCC cell lines (Fig. 3d). These heterogeneities should be considered before building cell models for target validation.
Cancer cell lines retain tumour characteristics of HCC tissues
We found that the 1,508 protein groups expressed only in HCC cell lines were enriched in cell cycle, signalling by Rho GTPases, kinetochore, chromosome and DNA repair, while the 1,552 protein groups expressed only in tissue were enriched in extracellular matrix, complement and blood, indicating that one main difference between cultured HCC cell lines and tissue is the lack of an extracellular microenvironment (Fig. 4a,b). HCC cell lines and tumour tissues exhibited a high correlation in expression change relative to normal adjacent tissues (NAT) (Pearson’s correlation coefficient = 0.7, Fig. 4c). Detailed pathway enrichment analysis revealed that all HCC cell lines retained the proliferation and metastasis characteristics of HCC, whereas MHCC97L, MHCC97H, HCCLM3 and HCCLM6 completely lost liver metabolism-related functions, indicating that HCC cell lines could be considered more oncological subtypes of HCC (Fig. 4d). Jiang et al.3 found 21 candidate drug targets, 15 of which were detected in these cell lines, but with different expression characteristics: proliferation-related drug targets including HDAC2, CDK1, CDK2 and CSNK1D were highly expressed in all nine HCC cell lines, while GPC3 was detected only in HepG2.2.15, Huh7 and PLC/PRF/5. Metabolism-related drug targets including ALDHA8A1, PKM, SLC16A3, NPC1 and SOAT1 were highly expressed in all nine HCC cell lines. Metastasis-related drug targets including SRC, PLOD2 and P4HA2 were also detected in all nine HCC cell lines, while MMP14 showed low expression only in HepG2 and HepG2.2.15, and TGFB1 showed its highest expression in HCCLM3 (Fig. 4e).
Properties of HCC spectral library
We generated an HCC spectral library covering protein groups from HCC cell lines and tumour tissue to support DIA quantification. We calculated the number of protein groups covered by every combination of 2 to 8 of the nine HCC cell lines. Then, for each combination size, we selected the combination covering the most protein groups. We found that a combination of three HCC cell lines, HCCLM3, HepG2 and PLC/PRF/5, covered 97% (8,964 of 9,208) of the protein groups covered by all nine HCC cell lines (Fig. 5a). Thus, peptides of HCCLM3, HepG2 and PLC/PRF/5 were mixed and used for spectral library generation. The peptide mixture of HCC tumour tissue3 was also used for spectral library generation. The generated HCC spectral library at Figshare26 covers 253,921 precursors, 168,811 modified peptides (156,519 peptides, of which 150,327 were proteotypic) and 10,098 protein groups. Evaluation by DIALib-QC41 showed that the HCC spectral library is of high quality. About 14.5% (1,462 of 10,098) of protein groups were provided exclusively by DDA files of HCC cell lines, while 17.7% (1,775 of 10,098) were provided only by tumour tissue DDA files (Fig. 5b). About 94% (238,930 of 253,921) of the precursors have 6 fragment ions, because the best N fragments per peptide was set to 3 to 6 (Fig. 5c). Precursor charge states range from +1 to +7, and 97% (246,004 of 253,921) have charge states between +2 and +4 (Fig. 5d). Protein groups with more than 2 unique peptides constitute about 95% (9,591 of 10,098) of the protein groups in the spectral library (Fig. 5e). Modification statistics showed that 36,741 (21.76%) peptides carry carbamidomethylation, 2,256 (1.34%) peptides carry acetylation at the protein N-terminus, and 12,618 (7.46%) peptides carry oxidation of a methionine residue (Fig. 5f).
Compared with the published Pan human library42 and the DPHL library43, the HCC spectral library uniquely covers 515 protein groups and 42,834 peptides (Fig. 5g).
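The cell line selection described in this section (for each combination size, keep the combination that covers the most protein groups) amounts to an exhaustive search over combinations. The sketch below illustrates the idea on toy data; the cell line names "A" to "D", the integer protein identifiers and the helper `best_combination` are hypothetical and do not come from the study.

```python
from itertools import combinations


def best_combination(detected, size):
    """Return the `size` cell lines whose union of protein groups is largest.

    `detected` maps a cell line name to the set of protein groups found in it.
    Ties are resolved in favour of the first combination in sorted order.
    """
    best_names, best_cover = None, set()
    for names in combinations(sorted(detected), size):
        cover = set().union(*(detected[name] for name in names))
        if len(cover) > len(best_cover):
            best_names, best_cover = names, cover
    return best_names, best_cover


# Toy data: four cell lines and eight protein groups in total.
toy_detected = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {5, 6, 7, 8},
    "D": {1, 8},
}
all_groups = set().union(*toy_detected.values())

names, cover = best_combination(toy_detected, 2)
print(names)                         # ('A', 'C')
print(len(cover) / len(all_groups))  # 0.875
```

With nine cell lines the search space is small (at most 126 combinations per size), so exhaustive enumeration is feasible and no greedy approximation is needed.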
Applicability of the HCC spectral library for DIA analysis
We identified 7,637 protein groups and 82,243 peptides in triplicate 2-hour DIA analyses of HepG2 peptides using the HCC spectral library and DIA-NN version 1.8 with parameters optimized for LC-MS/MS44, and 94.2% (7,194 of 7,637) of protein groups and 73.4% (60,405 of 82,243) of peptides were quantified in all three runs. Meanwhile, 6,845 protein groups and 73,599 peptides were identified from the same raw files by Spectronaut™ version 15.2, and 95.7% (6,548 of 6,845) of protein groups and 69.6% (51,210 of 73,599) of peptides were quantified in all three runs (Fig. 6a). High quantitative reproducibility (Pearson correlation coefficient > 0.9, Fig. 6b) was observed between replicate experiments analysed with DIA-NN version 1.8 or Spectronaut™ version 15.2. We then analysed an experimental model derived from HCCLM3 (TGFB1-stimulated HCCLM3 vs control). With both DIA-NN version 1.8 and Spectronaut™ version 15.2, we observed down-regulation of CDH1, the main initiation signal of EMT45, and up-regulation of THBS146 and CDH647, two proteins whose up-regulation can indicate EMT activation (Fig. 6c). Of the 121 protein groups identified as up-regulated by TGFB1, 26 were annotated as members of the EMT hallmark in the Molecular Signatures Database v7.5.1, and these were further defined as the TGFB1-induced-EMT gene set (Fig. 6c). Based on this gene set, we calculated the TGFB1-EMT score of each patient in the cohort of Jiang et al.3 using the ssGSEA algorithm. The TGFB1-EMT score of patients with S-III tumours was significantly (P < 0.0001) higher than that of patients with S-I or S-II tumours (Fig. 6d). The 101 patients could be stratified into TGFB1-EMT-high (n = 14) and TGFB1-EMT-low (n = 87) groups according to their TGFB1-EMT score.
The five-year overall survival rate of the TGFB1-EMT-high group was significantly lower than that of the TGFB1-EMT-low group (overall survival rate: 64.3% (95% CI: 43.5%–95.0%) vs 84.0% (95% CI: 74.4%–94.8%), log-rank P value = 0.0048; the hazard ratio (HR) of the TGFB1-EMT-high group vs the TGFB1-EMT-low group was 4.28 (95% CI: 1.43–12.8), P value = 0.0093) (Fig. 6e). These results indicate that TGFB1-induced EMT is closely related to poor prognosis in early HCC patients, and that the newly defined TGFB1-induced-EMT gene set may be useful for predicting the prognosis of early HCC patients.
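The data-completeness figures reported for the triplicate HepG2 analyses (the share of protein groups or peptides quantified in all three runs) can be computed as sketched below. The toy quantity table and the helper `fraction_quantified_in_all` are hypothetical; missing values are represented as `None`, which is an assumption about the data layout rather than the export format of DIA-NN or Spectronaut.

```python
def fraction_quantified_in_all(quantities, n_runs=3):
    """Fraction of features with a non-missing quantity in every run."""
    complete = sum(
        1
        for values in quantities.values()
        if len(values) == n_runs and all(v is not None for v in values)
    )
    return complete / len(quantities)


# Toy table: four protein groups, three runs, None marks a missing value.
toy_quantities = {
    "P1": [1.0, 2.0, 1.5],
    "P2": [None, 2.0, 2.1],
    "P3": [3.0, 3.1, 2.9],
    "P4": [0.5, None, None],
}
print(fraction_quantified_in_all(toy_quantities))  # 0.5

# The percentages in the text follow directly from the reported counts.
print(round(100 * 7194 / 7637, 1))  # 94.2
print(round(100 * 6548 / 6845, 1))  # 95.7
```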
Acknowledgements
This work was supported by the National Key Program for Basic Research of China (grant numbers 2021YFA1301600 and 2020YFC2002700) and the Research Program of the State Key Laboratory of Proteomics (grant number SKLP-K201901).
Author contributions
X.Q., P.X., W.Y. and M.W. designed the study. M.W. performed the experiments, analysed the data, and wrote the main manuscript. S.W., C.L. and Y.J. checked the proteomic data and tables. The final manuscript was reviewed and approved by all authors without disagreement.
Code availability
No custom code was generated in this work.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Ping Xu, Email: xuping_bprc@126.com.
Wantao Ying, Email: yingwantao@ncpsb.org.cn.
References
- 1. Villanueva A. Hepatocellular carcinoma. N Engl J Med. 2019;380:1450–1462. doi: 10.1056/NEJMra1713263.
- 2. Sartorius K, Sartorius B, Aldous C, Govender PS, Madiba TE. Global and country underestimation of hepatocellular carcinoma (HCC) in 2012 and its implications. Cancer Epidemiol. 2015;39(3):284–290. doi: 10.1016/j.canep.2015.04.006.
- 3. Jiang Y, et al. Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma. Nature. 2019;567(7747):257–261. doi: 10.1038/s41586-019-0987-8.
- 4. Gao Q, et al. Integrated proteogenomic characterization of HBV-related hepatocellular carcinoma. Cell. 2019;179:561–577. doi: 10.1016/j.cell.2019.08.052.
- 5. Wilding JL, Bodmer WF. Cancer cell lines for drug discovery and development. Cancer Res. 2014;74(9):2377–2384. doi: 10.1158/0008-5472.CAN-13-2971.
- 6. Megger DA, et al. Comparison of label-free and label-based strategies for proteome analysis of hepatoma cell lines. Biochim Biophys Acta. 2014;1844(5):967–976. doi: 10.1016/j.bbapap.2013.07.017.
- 7. Nusinow DP, et al. Quantitative Proteomics of the Cancer Cell Line Encyclopedia. Cell. 2020;180(2):387–402. doi: 10.1016/j.cell.2019.12.023.
- 8. Gonçalves E, et al. Pan-cancer proteomic map of 949 human cell lines. Cancer Cell. 2022;40(8):835–849. doi: 10.1016/j.ccell.2022.06.010.
- 9. Ludwig C, et al. Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Mol Syst Biol. 2018;14(8):e8126. doi: 10.15252/msb.20178126.
- 10. Gillet LC, et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics. 2012;11(6):O111.016717. doi: 10.1074/mcp.O111.016717.
- 11. Navarro P, et al. A multicenter study benchmarks software tools for label-free proteome quantification. Nat Biotechnol. 2016;34(11):1130–1136. doi: 10.1038/nbt.3685.
- 12. Barkovits K, et al. Reproducibility, Specificity and Accuracy of Relative Quantification Using Spectral Library-based Data-independent Acquisition. Mol Cell Proteomics. 2020;19(1):181–197. doi: 10.1074/mcp.RA119.001714.
- 13. Schubert OT, et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nature Protocols. 2015;10:426–441. doi: 10.1038/nprot.2015.015.
- 14. Shao W, Lam H. Tandem mass spectral libraries of peptides and their roles in proteomics research. Mass Spectrom Rev. 2017;36(5):634–648. doi: 10.1002/mas.21512.
- 15. Fernández-Costa C, et al. Impact of the Identification Strategy on the Reproducibility of the DDA and DIA Results. J Proteome Res. 2020;19(8):3153–3161. doi: 10.1021/acs.jproteome.0c00153.
- 16. Barbie DA, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462:108–112. doi: 10.1038/nature08460.
- 17. Wiśniewski JR, Zougman A, Nagaraj N, Mann M. Universal sample preparation method for proteome analysis. Nat Methods. 2009;6(5):359–62. doi: 10.1038/nmeth.1322.
- 18. Willforss J, Chawade A, Levander F. NormalyzerDE: Online Tool for Improved Normalization of Omics Expression Data and High-Sensitivity Differential Expression Analysis. Journal of Proteome Research. 2019;18(2):732–740. doi: 10.1021/acs.jproteome.8b00523.
- 19. Ritchie ME, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43(7):e47.
- 20. Wu T, et al. ClusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021;2(3):100141.
- 21. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics. 2013;14:7. doi: 10.1186/1471-2105-14-7.
- 22. Subramanian A, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
- 23. Perez-Riverol Y, et al. The PRIDE database resources in 2022: A Hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 2022;50(D1):D543–D552. doi: 10.1093/nar/gkab1038.
- 24. Wang MC, 2022. Proteome of Human hepatocellular carcinoma cell lines. PRIDE Archive. PXD036643
- 25. Wang MC, 2022. Generation of the HCC spectral library covering more than 10,000 protein groups. PRIDE Archive. PXD035028
- 26. Wang M, 2022. Proteomic overview of hepatocellular carcinoma cell lines and generation of the spectral library. Figshare.
- 27. Wang MC, 2022. Application of the HCC spectral library in DIA quantitation. PRIDE Archive. PXD037159
- 28. Jiang LH, et al. A Quantitative Proteome Map of the Human Body. Cell. 2020;183(1):269–283. doi: 10.1016/j.cell.2020.08.036.
- 29. Qiu Z, et al. A Pharmacogenomic Landscape in Human Liver Cancers. Cancer Cell. 2019;36(2):179–193. doi: 10.1016/j.ccell.2019.07.001.
- 30. Xu H, Zhou S, Tang Q, Xia H, Bi F. Cholesterol metabolism: new functions and therapeutic approaches in cancer. Biochim Biophys Acta Rev Cancer. 2020;1874(1):188394. doi: 10.1016/j.bbcan.2020.188394.
- 31. Johnson PJ. Role of alpha-fetoprotein in the diagnosis and management of hepatocellular carcinoma. J Gastroenterol Hepatol. 1999;14:S32–36. doi: 10.1046/j.1440-1746.1999.01873.x.
- 32. Kim LC, Song L, Haura EB. Src kinases as therapeutic targets for cancer. Nat Rev Clin Oncol. 2009;6(10):587–95. doi: 10.1038/nrclinonc.2009.129.
- 33. Chi HC, et al. DOCK6 promotes chemo- and radioresistance of gastric cancer by modulating WNT/β-catenin signaling and cancer stem cell traits. Oncogene. 2020;39(37):5933–5949. doi: 10.1038/s41388-020-01390-0.
- 34. Chen XY, Zhang J, Zhu JS. The role of m6A RNA methylation in human cancer. Mol Cancer. 2019;18(1):103. doi: 10.1186/s12943-019-1033-z.
- 35. Kumar D, et al. Integrating transcriptome and proteome profiling: Strategies and applications. Proteomics. 2016;16(19):2533–2544. doi: 10.1002/pmic.201600140.
- 36. Li Y, et al. Establishment of cell clones with different metastatic potential from the metastatic hepatocellular carcinoma cell line MHCC97. World J Gastroenterol. 2001;7(5):630–636. doi: 10.3748/wjg.v7.i5.630.
- 37. Sells MA, Chen ML, Acs G. Production of hepatitis B virus particles in HepG2 cells transfected with cloned hepatitis B virus DNA. Proc Natl Acad Sci USA. 1987;84:1005–1009. doi: 10.1073/pnas.84.4.1005.
- 38. Duchartre Y, Kim YM, Kahn M. The Wnt signaling pathway in cancer. Crit Rev Oncol Hematol. 2016;99:141–149. doi: 10.1016/j.critrevonc.2015.12.005.
- 39. Evan GI, Vousden KH. Proliferation, cell cycle and apoptosis in cancer. Nature. 2001;411(6835):342–348. doi: 10.1038/35077213.
- 40. Zhang KG, Zhang MP, Luo ZJ, Wen ZL, Yan XH. The dichotomous role of TGF-β in controlling liver cancer cell survival and proliferation. J Genet Genomics. 2020;47(9):497–512. doi: 10.1016/j.jgg.2020.09.005.
- 41. Midha MK, et al. DIALib-QC an assessment tool for spectral libraries in data-independent acquisition proteomics. Nature Communications. 2020;11(1):5251.
- 42. Rosenberger G, et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Scientific Data. 2014;1:140031. doi: 10.1038/sdata.2014.31.
- 43. Zhu TS, et al. DPHL: A DIA Pan-human Protein Mass Spectrometry Library for Robust Biomarker Discovery. Genomics, Proteomics & Bioinformatics. 2020;18(2):104–119. doi: 10.1016/j.gpb.2019.11.008.
- 44. Weng S, Wang MC, Zhao YY, Ying WT, Qian XH. Optimised data-independent acquisition strategy recaptures the classification of early-stage hepatocellular carcinoma based on data-dependent acquisition. Journal of Proteomics. 2021;238(15-16):104152. doi: 10.1016/j.jprot.2021.104152.
- 45. Serrano-Gomez SJ, Maziveyi M, Alahari SK. Regulation of epithelial-mesenchymal transition through epigenetic and post-translational modifications. Mol Cancer. 2016;15:18. doi: 10.1186/s12943-016-0502-x.
- 46. Liu X, et al. THBS1 facilitates colorectal liver metastasis through enhancing epithelial-mesenchymal transition. Clin Transl Oncol. 2020;22(10):1730–1740. doi: 10.1007/s12094-020-02308-8.
- 47. Gugnoni M, et al. Cadherin-6 promotes EMT and cancer metastasis by restraining autophagy. Oncogene. 2017;36(5):667–677. doi: 10.1038/onc.2016.237.