Abstract

Host cell proteins (HCPs) coexpressed during the production of biotherapeutics can affect the safety, efficacy, and stability of the final product. As such, monitoring HCP populations and amounts throughout the production and purification process is an essential part of the overall quality control framework. Mass spectrometry (MS) is used as an orthogonal method to enzyme-linked immunosorbent assays (ELISA) for the simultaneous identification and quantification of HCPs, particularly for the analysis of downstream processes. In this study, we present an MS-based analytical protocol with improvements in both speed and identification performance that can be implemented for routine analysis to support upstream process development. The protocol adopts a streamlined sample preparation strategy, combined with a high-throughput MS analysis pipeline. The developed method identifies and quantifies over 1000 HCPs, including 20 proteins listed as high risk in the literature, in a clarified cell culture sample with repeatability and precision shown for digest replicates. In addition, we explore the effects of varying standard spike-ins and changes to the data processing pipeline on absolute quantification estimates of the HCPs, which highlight the importance of standardization for wider use in the industry. Data are available via ProteomeXchange with the identifier PXD053035.
Keywords: host cell proteins, Chinese hamster ovary, LC-MS, data-independent acquisition, absolute quantification, clarified cell culture fluid, bioprocessing, process analytical technologies, hi3 quantification
Introduction
Biopharmaceuticals, a rapidly expanding segment of drug products, are experiencing exponential growth, particularly in approved antibody therapeutics,1 which are increasing at double the rate compared with conventional drug products.2 Currently, antibody therapeutics hold a market share of 27% of the total pharmaceutical market, with a predicted revenue of $300 billion per annum.2 These complex molecules are produced using biological systems, which are instrumental in their assembly and folding and the addition of post-translational modifications (PTMs).3 Such processes, however, can result in varying degrees of inherent heterogeneity.4
Chinese hamster ovary (CHO) cells are the predominant expression system for biotherapeutics, due to (1) their proven safety, (2) relatively high specific productivity, (3) the ability to produce glycoforms beneficial for human therapeutic use, and (4) their aptness to grow in serum-free medium as suspension cultures, thus facilitating scalability in production.5 The recombinantly expressed product is subsequently harvested for further processing. However, the medium also contains host cell proteins (HCPs) from the CHO cells, including both secreted and nonsecreted proteins, particularly toward the end of a production run due to a decrease in viability.6 These HCPs, which can influence the safety, efficacy, and stability of the product, are removed to acceptable levels during downstream processing.3,7 With advancements in product yields, the levels of HCPs have increased, thereby heightening the burden on downstream processes to ensure adequate clearance.8,9 There is an emerging interest for process analytical technology (PAT) methods to monitor HCPs during production.10 Identifying and quantifying HCPs are essential for quality by design (QbD)-based process development, particularly in the early stages, when the effects of processing parameters on HCP profiles are crucial. This has been underscored by studies demonstrating the impact of scale-up,11 process changes,12 and cell line aging13 on HCP expression and quantities.
The current state-of-the-art technology for determining HCP content is the enzyme-linked immunosorbent assay (ELISA). However, these assays can be costly to develop, especially for cell line-specific variants, and are known to be biased toward HCPs that elicit an immunogenic response. Broad-selectivity CHO HCP ELISAs, which report total HCP content, also cannot identify and characterize individual HCP species for risk assessment.14 In contrast, mass spectrometry (MS) is a promising orthogonal analytical approach free from many of the inherent limitations of ELISA-based methods. Nonetheless, the adoption of MS for HCP identification and quantification remains slow due to the relatively greater initial facility costs, the technical complexity of the instrumentation, and the specialist experience required. Currently, regulatory guidance sets no specific limits for HCPs in the final purified product, which should prompt discussion in the community about commonly acceptable levels and their comparability between complementary analytical technologies, such as MS and ELISA. While MS-based techniques address a number of key limitations of ELISA methods, they also introduce new limitations that must be understood and accounted for during analysis. These include biases in protein/peptide ion generation; lengthy sample preparation protocols for enzymatic digestion, often including enrichment or depletion steps to deal with the inherently high dynamic range between the drug product and HCPs;15 and dependence on additional complex, low-throughput separation techniques to improve detection, such as high-performance liquid chromatography (HPLC), which commonly employs nanoflow gradients of 80 min or longer.16,17
Recently, advances have been made in implementing MS-based technologies as PAT tools. Under well-optimized conditions and with well-characterized target analytes, product-related critical quality attributes (CQAs) can be measured directly from clarified cell culture fluid.18 Further consistency and throughput enhancements offered by automation have also been demonstrated, potentially reducing variability, analysis times, and costly analyst hands-on time.19,20 The growing number of reports of direct online coupling of MS to established monitoring methods underscores the potential of MS to become a viable PAT tool for routine monitoring, process development, and control when combined with fast acquisition and processing pipelines.21 HCP analysis has been demonstrated in principle as an extension of these MS-based PAT techniques, yet workflows that combine robust, automation-friendly sample preparation with a well-developed quantitative spiking strategy, robust acquisition, and data processing pipelines remain to be fully realized.
This study presents an effort to establish a robust streamlined workflow for HCP detection directly from a clarified cell culture fluid (CCCF). The report will cover optimizations made to sample preparation, the standard spiking strategy, postsample preparation cleanup followed by a data acquisition strategy using a microflow HPLC, and high-resolution time-of-flight mass spectrometry (ZenoTOF 7600). By employing heat-resistant trypsin for sample preparation, we sought to reduce sample digestion times while ensuring a sufficient amount of digestion of the sample. Additionally, through the introduction of robust protein and peptide level spiking strategies, this study investigates the relative applicability of currently widespread hi3 approaches first developed by Silva et al.22 for absolute quantification estimations of the identified HCPs, as well as the effects of changes in the data processing pipeline, all in an effort to potentially enhance the viability of MS-based methods for PAT applications.
Experimental Methods
An overview of the final workflow and data analysis pipeline after method development and optimization can be found in Scheme 1, and a more detailed description of the workflows and procedures used for optimization and validation can be found subsequently.
Scheme 1. Overview of the Sample Processing Pipeline Including Sample Preparation and Acquisition and Data Processing of the Workflow Evaluated for Method Reproducibility, Performance, and Influences on Relative Absolute Quantification Using the SMART Digest Protocol. Created with BioRender.com.

Culture Condition Method Development Sample
For method development, we used a representative CCCF harvest sample of an IgG monoclonal antibody process from CSL Ltd. In brief, the antibody-expressing CHO cells were grown as a suspension culture using in-house proprietary media and feed for 12 days. The harvest sample was centrifuged to remove solids and cells before the product titer was determined through an in-house protein A HPLC assay at CSL.
Sample Preparation Workflow
Total protein concentrations were estimated using a Pierce (Bradford) assay kit (Cat.23200). An equivalent of 100 μg of mAb product was used for the sample preparation protocol.
Acetone Precipitation
To reduce matrix effects due to contaminants in the CCCF and to process low-product concentration samples, an acetone precipitation step was carried out. For the investigation of precipitation effects on the protein standard-related quantification, chicken lysozyme (Sigma-Aldrich, ≥ 90%, CAS: 12650-88-3) was added at a defined level in ng/mg antibody product. Four volumes of ice-cold acetone (−20 °C) were added to the sample before the vial was inverted three times and stored at −20 °C overnight. The sample was then centrifuged in a precooled centrifuge at −9 °C and 20,000 g for 10 min. The supernatant was discarded, and the protein pellet was washed once with 200 μL of ice-cold acetone. After removal of the acetone, the pellet was air-dried on ice for 30 min.
SMART Digest
The pellet or sample was mixed with ultrapure water to a volume of 50 μL. For all samples to which the protein standard had not been added before precipitation, chicken lysozyme was added as the standard at this point at a defined level in ng/mg of antibody product. Digestion was carried out as specified in the ThermoFisher SMART digest trypsin kit, soluble (Cat. 60113-101), which included a heat-resistant trypsin and a digest buffer. In short, 150 μL of SMART digest buffer and 5 μL of soluble trypsin were added to the 50 μL sample. The samples were digested for 40 min at 70 °C on an Eppendorf thermomixer without agitation. Subsequently, 0.1 M dithiothreitol (Roche) was added to a final concentration of about 5 mM, and the sample was incubated for 30 min at 70 °C while shaking at 300 rpm. After the sample was cooled to room temperature, 0.1 M 2-chloroacetamide (CAA) (Sigma-Aldrich) was added to reach a final concentration of about 10 mM. The sample was incubated at room temperature for 10 min while being shaken at 1000 rpm. Finally, the peptides were acidified using formic acid (FA) to approximately pH 2 to quench the digest.
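The reagent additions above are given as target final concentrations rather than volumes. As a rough worked example, the sketch below solves the dilution equation for the volume of 0.1 M stock needed. The function name `stock_volume_ul` is hypothetical, and the calculation assumes the 205 μL digest mixture (50 + 150 + 5 μL) with no evaporation losses; the volumes actually pipetted in the study are not stated.

```python
def stock_volume_ul(mixture_ul: float, stock_molar: float, target_molar: float) -> float:
    """Volume of stock (uL) to add so the final mixture reaches target_molar.

    Solves stock_molar * v / (mixture_ul + v) = target_molar for v.
    """
    if not 0 < target_molar < stock_molar:
        raise ValueError("target must be positive and below the stock concentration")
    return mixture_ul * target_molar / (stock_molar - target_molar)


# Assumed digest mixture: 50 uL sample + 150 uL buffer + 5 uL trypsin
digest_ul = 50 + 150 + 5

# 0.1 M DTT to about 5 mM, then 0.1 M CAA to about 10 mM in the enlarged volume
dtt_ul = stock_volume_ul(digest_ul, stock_molar=0.1, target_molar=0.005)
caa_ul = stock_volume_ul(digest_ul + dtt_ul, stock_molar=0.1, target_molar=0.010)

print(f"DTT: {dtt_ul:.1f} uL, CAA: {caa_ul:.1f} uL")  # DTT: 10.8 uL, CAA: 24.0 uL
```

Note that the later CAA addition slightly dilutes the DTT, which is consistent with the "about" wording used for both concentrations.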
SPEED Digest
Samples were digested using a sample preparation by easy extraction and digestion (SPEED) protocol developed by Doellinger et al.23 In brief, samples were denatured using trifluoroacetic acid (TFA; Sigma-Aldrich, ≥ 99%) by either dissolving the precipitation pellet in 10 μL of TFA or adding double the sample volume of TFA. After 2 min, the samples were neutralized with 10 times the volume of 2 M Tris base. Reduction and alkylation were carried out simultaneously by adding a buffer containing 0.1 M tris(2-carboxyethyl)phosphine (TCEP) and 0.4 M CAA to final concentrations of 10 and 40 mM, respectively. The sample was incubated for 5 min at 95 °C in the dark. Afterward, Trypsin Gold (Promega) was added at a 1:20 ratio to protein, and the sample was incubated at 37 °C overnight in an Eppendorf thermomixer while being shaken at 500 rpm. Finally, the peptides were acidified in the same manner as that for the SMART digest.
Solid-Phase Extraction Cleanup
The C18 cleanup was carried out according to the manufacturer’s protocol using SepPak C18 cartridges with 50 mg of resin from Waters (Part No. WAT054955). In brief, methanol was used for initial cleaning, with 80% acetonitrile (ACN) with 0.1% FA used for conditioning and 0.1% FA in water used for equilibration before sample loading and washing. The sample flow-through was passed through the cartridge a second time to maximize binding. Finally, the sample was eluted with two 0.5 mL portions of 50% ACN with 0.1% FA. The eluted samples were dried down using a vacuum centrifuge and resuspended in 20 μL of 0.1% FA. The peptide concentration was estimated using a tryptophan fluorescence assay on a Cary Eclipse fluorescence spectrometer with excitation at 295 nm and measurement at 350 nm.24 Loading amounts for LC-MS analysis were equalized by diluting all samples to a peptide concentration of 100 ng/μL with 0.1% FA in water and simultaneously spiking a Waters Hi3 PhosB (SKU: 186006011) peptide standard at a final concentration of 25 fmol/μL.
LC Tandem MS (LC-MS/MS) Analysis
All samples were analyzed in triplicate using a Waters ACQUITY UPLC M-Class microflow LC coupled to a Sciex ZenoTOF 7600 system (Singapore) with an OptiFlow Turbo V source. The LC was set to a flow rate of 5 μL/min, and approximately 200 ng was injected per sample. Peptides were separated on a reversed-phase C18 Waters nanoEase M/Z HSS T3 column with an inner diameter of 300 μm, a length of 150 mm, a particle size of 1.8 μm, and a pore size of 100 Å, using a 20 min linear gradient from 3% ACN/0.1% FA to 35% ACN/0.1% FA. Afterward, a high ACN wash and re-equilibration at 3% ACN/0.1% FA were performed. A more detailed description can be found in Table S1.
The ZenoTOF was operated in positive polarity mode, with a spray voltage of 5000 V, a source temperature of 150 °C, and a column temperature of 40 °C. A ZenoSWATH DIA scheme was used with an MS1 TOF mass range from 400 to 1500 Da and an accumulation time of 0.1 s. For MSMS scans, 65 variable windows were employed covering precursors from 399.5 to 750.5 Da (Table S2). The MS2 mass range was between 140 and 1750 Da with an accumulation time of 0.013 s and Zeno pulsing enabled, resulting in a total scan time of 1.283 s. A more detailed description can be found in Table S3. The mass spectrometry proteomics data except mAb sequence-related proprietary information have been deposited to the ProteomeXchange Consortium via the PRIDE25 partner repository with the dataset identifier PXD053035.
Protein Identification and Quantification Using Spectronaut
All data files were processed using Spectronaut (Biognosys) v17.4 using the directDIA+ workflow. All DIA SWATH acquisition files were searched against an in silico tryptic digest of the Chinese hamster proteome (Cricetulus griseus, UP000001075) containing unreviewed (TrEMBL) entries (total: 23886 entries), with a curated version of the common contaminant protein database from Frankenfield et al.26 (222 entries), the product antibody sequence (one entry each for the light and heavy chains), the protein standard sequence (chicken lysozyme), and the peptide standard as a protein containing only the four proteotypic peptides. Default settings were used, except for targeted changes to the quantification described subsequently. In summary, the Pulsar search used Trypsin/P as a cleavage rule with two allowed miscleavages, carbamidomethylation as a fixed modification of cysteine, and N-terminal protein acetylation and methionine oxidation as variable modifications. The maximum number of variable modifications was set to five. False discovery rates (FDRs) on PSM, peptide, and protein group levels were set to 0.01. Additional settings can be found in Table S4.
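To illustrate what the Trypsin/P rule with two allowed miscleavages means for the search space, the following is a minimal sketch of an in silico digest. It is not Spectronaut's or Pulsar's implementation; the function name `tryptic_peptides` and the `min_len` cutoff are assumptions made for illustration.

```python
def tryptic_peptides(sequence: str, max_missed: int = 2, min_len: int = 7) -> set[str]:
    """Enumerate peptides under the Trypsin/P rule.

    Trypsin/P cleaves C-terminal to every K or R, including before proline.
    min_len is a hypothetical length cutoff, not a setting taken from the study.
    """
    # Cleavage positions (index after each K/R), excluding the sequence end itself
    sites = [i + 1 for i, aa in enumerate(sequence) if aa in "KR"]
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]

    peptides = set()
    for start in range(len(bounds) - 1):
        # A peptide may span up to max_missed uncleaved sites
        last = min(start + 1 + max_missed, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptide = sequence[bounds[start]:bounds[end]]
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return peptides


# Toy sequence; R followed by P is still cleaved under Trypsin/P
fully_cleaved = tryptic_peptides("AAAKGGGRPPPKDDD", max_missed=0, min_len=1)
with_missed = tryptic_peptides("AAAKGGGRPPPKDDD", max_missed=2, min_len=1)
```

Allowing two miscleavages increases the number of candidate peptides for the toy sequence from four to nine, which shows why the setting affects search space size.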
Standard Processing Quantification (std)
Default settings were used to determine a baseline (in the following referred to as Std) for the evaluation of label-free quantification (LFQ) parameter influences and resulting changes on hi3 quantification. Precursor filtering is based on Q-value identification with no imputation. No proteotypic filter was set at this stage in the workflow. Quantity calculation is based on MS2 level signals (fragment ions) and the area under the peak of the respective XIC traces of targeted ions. A cross-run normalization based on a local normalization metric was evaluated to determine whether the processing of triplicates was beneficial. The Protein LFQ method was set to automatic, which is equivalent to using MaxLFQ when fewer than 500 runs are processed. The major group (protein) and minor group (peptide) quantities were set to the mean of the top three peptide and precursor quantities, respectively. This is in line with common hi3 approaches, which state that the three top-ranked peptides/precursors can be used to estimate the protein/peptide quantity and that absolute quantity estimates can be obtained by comparing these values to standards of known abundance.
Top 3 Sum Quantification (top3)
The top3 sum quantification approach makes subtle changes to the LFQ quantification settings. Namely, the precursor filtering is set stricter to “identified in all runs”, and selection for quantification was based on the intensity of the signals instead of the certainty criterion Q-value. Further, the major group and minor group quantity settings were set to “sum” of the top three peptides and precursors, respectively.
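The difference between the mean-based (Std) and sum-based (top3) aggregation can be sketched as follows. This is an illustration of the general idea only, not Spectronaut's internal calculation; the function name `top3_quantity` and the toy intensities are hypothetical.

```python
def top3_quantity(intensities: list[float], mode: str = "mean") -> float:
    """Aggregate the three most intense peptide (or precursor) signals.

    mode="mean" mirrors the Std setting, mode="sum" mirrors the top3 setting.
    """
    if mode not in ("mean", "sum"):
        raise ValueError("mode must be 'mean' or 'sum'")
    top = sorted(intensities, reverse=True)[:3]
    if not top:
        raise ValueError("at least one intensity is required")
    total = sum(top)
    return total if mode == "sum" else total / len(top)


# Toy values: a protein with four quantified peptides, a standard with only two
protein = [10.0, 50.0, 30.0, 20.0]
standard = [40.0, 20.0]

ratio_mean = top3_quantity(protein, "mean") / top3_quantity(standard, "mean")
ratio_sum = top3_quantity(protein, "sum") / top3_quantity(standard, "sum")
```

When both the protein and the standard have at least three peptides, mean and sum give the same ratio. With fewer than three peptides, as for the toy standard here, the two modes diverge, which is one way such a setting can shift absolute estimates.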
Postprocessing in Python
Report exports (including peptide and protein level information) from Spectronaut are used to filter the resulting identifications for higher certainty criteria and calculate relative absolute quantification estimations based on the LFQ protein signals from the different standards used. For further processing, columns used for grouping, filtering, and calculations include R.Condition, R.FileName, PG.ProteinGroups, PG.MolecularWeight, PG.IsSingleHit, PG.Qvalue, PG.Quantity, PEP.StrippedSequence, and PEP.IsProteotypic.
Protein Group Filtering
Quantified protein groups are filtered to include only proteotypic peptides and simultaneously exclude proteins that have only been identified by a single peptide sequence. Identified contaminant protein groups were eliminated before entries for the protein standard were extracted.
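A minimal sketch of this filtering step is shown below, using the Spectronaut column names listed above. It assumes the report has already been parsed into dictionaries with boolean flags, and that contaminant and standard entries are recognized by accession; the function name `filter_protein_groups` and the example accessions are hypothetical, and the study's actual script may differ.

```python
def filter_protein_groups(rows, contaminant_ids, standard_ids):
    """Split report rows into HCP and standard quantities per run.

    Keeps proteotypic peptides only, drops single-hit protein groups,
    removes contaminants, then separates the spiked standards.
    """
    hcps, standards = {}, {}
    for row in rows:
        if not row["PEP.IsProteotypic"] or row["PG.IsSingleHit"]:
            continue
        group = row["PG.ProteinGroups"]
        members = set(group.split(";"))
        if members & contaminant_ids:
            continue
        key = (row["R.FileName"], group)
        target = standards if members & standard_ids else hcps
        target[key] = row["PG.Quantity"]
    return hcps, standards


# Hypothetical rows; accessions are placeholders apart from the Table 2 style IDs
rows = [
    {"R.FileName": "run1", "PG.ProteinGroups": "G3I8R9", "PG.IsSingleHit": False,
     "PG.Quantity": 1200.0, "PEP.IsProteotypic": True},
    {"R.FileName": "run1", "PG.ProteinGroups": "G3H0L9", "PG.IsSingleHit": True,
     "PG.Quantity": 80.0, "PEP.IsProteotypic": True},
    {"R.FileName": "run1", "PG.ProteinGroups": "CONT_1", "PG.IsSingleHit": False,
     "PG.Quantity": 500.0, "PEP.IsProteotypic": True},
    {"R.FileName": "run1", "PG.ProteinGroups": "STD_LYSC", "PG.IsSingleHit": False,
     "PG.Quantity": 900.0, "PEP.IsProteotypic": True},
]
hcps, standards = filter_protein_groups(rows, {"CONT_1"}, {"STD_LYSC"})
```

In this sketch the product antibody chains would still appear among the HCP entries and would need the same accession-based handling as the standards.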
Absolute Quantification
A hi3 absolute quantification approach was used to calculate absolute quantity estimates in relation to the spiked-in standard signals following the method proposed by Silva et al.22 Mean molecular weights were calculated for protein groups consisting of multiple associated protein entries. Absolute quantities for single CHO proteome HCP species in ng/mg antibody product in relation to a protein spike X in ng/mg product were calculated for each run according to eq 1
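$$\mathrm{HCP}_i = X \cdot \frac{\mathrm{LFQ}_{\mathrm{HCP}_i}}{\mathrm{LFQ}_{\mathrm{ProtStd}}} \cdot \frac{M_{\mathrm{HCP}_i}}{M_{\mathrm{ProtStd}}} \tag{1}$$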
where HCPi gives the relative estimates of each identified HCP species in ng/mg product, LFQHCPi and LFQProtStd refer to the exported PG.Quantity values of the Spectronaut LFQ results, and M refers to the respective molecular weights of HCP species and protein standard used (g/mol).
Similarly, the relative absolute quantification in relation to the peptide spike Y in ppm (mol/mol product) was calculated according to eq 2 relative to the injected peptide amount, based on the estimation that the antibody product contributes to the majority of the total measured peptide signal due to the relatively high tryptophan content in its sequence and the expected high relative abundance in the sample.
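$$\mathrm{HCP}_i = Y \cdot \frac{\mathrm{LFQ}_{\mathrm{HCP}_i}}{\mathrm{LFQ}_{\mathrm{PepStd}}} \cdot \frac{M_{\mathrm{HCP}_i}}{M_{\mathrm{productmAb}}} \tag{2}$$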
Hereby, LFQPepStd refers to the Spectronaut PG.Quantity value of the peptide standard and MproductmAb refers to the molecular weight of the product antibody (g/mol). If not specified, absolute quantification estimates for each HCP species refer to the mean of injection replicates.
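The two calculations can be sketched in Python as below. The formulas are one reading of the symbol definitions given for eqs 1 and 2 (signal ratio times molecular weight ratio times spike level) and should be treated as an assumption rather than the authors' code; the function names and the numbers in the example are hypothetical.

```python
def hcp_from_protein_spike(lfq_hcp, lfq_prot_std, mw_hcp, mw_prot_std, spike_ng_per_mg):
    """HCP estimate in ng/mg product relative to a protein spike X (ng/mg product).

    The LFQ ratio approximates the molar ratio (hi3 assumption); the molecular
    weight ratio converts that molar ratio into a mass ratio.
    """
    return spike_ng_per_mg * (lfq_hcp / lfq_prot_std) * (mw_hcp / mw_prot_std)


def hcp_from_peptide_spike(lfq_hcp, lfq_pep_std, mw_hcp, mw_mab, spike_ppm_mol):
    """HCP estimate in ng/mg product relative to a peptide spike Y (ppm, mol/mol product).

    A molar ppm multiplied by M_HCP / M_mAb gives a mass ppm, and 1 ppm (mass) = 1 ng/mg.
    """
    return spike_ppm_mol * (lfq_hcp / lfq_pep_std) * (mw_hcp / mw_mab)


# Hypothetical inputs: an HCP with half the standard's signal and twice its mass
protein_based = hcp_from_protein_spike(2.0e5, 4.0e5, 30_000, 15_000, spike_ng_per_mg=500)
peptide_based = hcp_from_peptide_spike(3.0e5, 6.0e5, 75_000, 150_000, spike_ppm_mol=10)
```

For protein groups with several associated entries, `mw_hcp` would be the mean molecular weight described above.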
Results and Discussion
Method Development and Optimization of the Workflow
Our preliminary studies focused on assessing the impact of various factors on HCP identification and quantification. The factors considered most likely to affect HCP identification and quantification in CCCF samples were (1) the digest protocol utilized, (2) the incorporation of a precipitation step, (3) sufficient digestion time for the completeness of protein digestion, and (4) the time point the protein standard was added. A comprehensive overview of the investigated parameters can be found in Figure 1a.
Figure 1.
Development of an optimized sample preparation workflow to identify and quantify HCP in the CCCF. (a) Overview of parameters investigated, namely, digest protocol used, the effect of the precipitation (Prec) step, digest time, and addition of chicken lysozyme at 5000 ng/mg product before or after the Prec, * effect of digestion times has been investigated using the SMART digest including a Prec step, ° influences of addition of chicken lysozyme before or after the Prec step has been investigated using the SMART digest and a digest time of 40 min. (b) Comparison of protein identifications using different digest protocols with and without Prec, where no standards were spiked into the sample. Raw data from injection triplicates were processed by Spectronaut separately for each condition using default settings. Identifications were filtered to include only proteotypic identifications with two or more peptides per protein. (c) Effect of digest times on identified HCP quantity distributions as kernel density estimates based on protein standard spikes and the default Spectronaut setting. All conditions were processed together and normalized. (d) Effect of the protein standard spiking on HCP quantity distribution linearity as evaluated by Pearson correlation. HCP quantity estimations are based on protein standard spikes at 5000 ng/mg product and the default Spectronaut setting. Both conditions were processed together and normalized.
Trypsin-based digestion protocols can widely vary, and selecting a suitable starting protocol is critical given that it depends on the intended application and time sensitivity and that generally, there are no single universal approaches that are compatible with all samples. We initially compared a standard overnight digest protocol (SPEED) and a vendor-specific rapid digestion protocol using thermally stable trypsin (SMART digest) to assess whether the decreased processing time still results in a similar number of identifications, across precipitated and nonprecipitated CCCF materials. The precipitated SPEED digest resulted in 1185 proteins identified, which is an approximately 13% increase in comparison to both the precipitated and nonprecipitated SMART digest, indicating improved performance (Figure 1b). However, we observed that neat nonprecipitated samples digested using the SPEED protocol showed a 30% decrease in protein identifications. A possible explanation for these observations is that the denaturing effect of the TFA was insufficient due to the dilution of TFA with the sample and matrix interference. Increasing the TFA-to-sample ratio might solve this problem but would also lead to high liquid volumes for the subsequent trypsin digestion such that any benefit gained by better sample denaturation would be potentially offset by sample losses and effects on digestion kinetics.
Notably, processing all data files together results in identifications that are very similar between all conditions (Figure S1), suggesting that Spectronaut operates algorithms similar to match-between-run features available in other proteomics software tools. We speculate that the differences in identifications first observed for separate sample processing are more likely from the quality of spectra acquired, leading to higher certainty in the identification by the statistical tools used, as opposed to actual differences in the peptides generated through the digest.
The SMART digest protocol significantly reduced processing and digestion time. It also performed consistently on both neat and precipitated samples, whereas the SPEED protocol lost identifications on the neat CCCF material at hand. As a result, the SMART digest with precipitation was chosen for all subsequent assessments. Further investigations were performed to determine the influence of increasing the digestion time on the robustness of HCP quantification. No major differences were identified regarding the miscleavage rates of identified precursors (data not shown) and the overall HCP quantity distribution calculations after different digestion times (Figure 1c). We note that reducing digestion times may assist in increasing identifications15 but was not pursued in the interest of ensuring method robustness and consistency in quantification.
We employed a 2-stage spiking strategy to study its impact on quantitation. Protein level spiking (chicken lysozyme) was performed both before and after precipitation to investigate the effect of precipitation steps on absolute quantification estimates of HCPs based on the protein standard. Our results show good linearity in the correlation of the calculated HCP quantities (Figure 1d). This indicates no major effects on HCP quantification depending on the time of spiking of the protein standard in relation to the precipitation step.
Different spike-in levels for the protein (50–50000 ng/mg product) and peptide standards (10–100 fmol per injection) were tested. The 500 ng/mg protein level was optimal, offering stable intensities without a significant reduction in identifications (Table S5). Peptide spike studies led to the exclusion of two nonproteotypic peptides and showed sufficient linearity for the remaining peptides and for the standard protein quantity calculated using the hi3 approach, with R2 values higher than 0.93 and equal to 0.975, respectively (Figures S2 and S3). A level of 50 fmol per injection was therefore chosen for peptide standard spiking.
Method Repeatability and Performance
The method repeatability was assessed by performing three digest replicates of the full protocol using the same CCCF sample and analyzing triplicate injections from each digest. The results show almost identical density distributions of quantified HCPs (Figure 2a), and Pearson correlation analysis indicated high reproducibility (Pearson r ≥ 0.98 for all comparisons) (Figure 2b). Furthermore, a local normalization approach (Figure 2c) reduces variability (Pearson r ≥ 0.99 for all comparisons) from instrument-based precision limitations (inherent injection volume variability, temperature fluctuations, electronic noise). Our digest protocol, under 2 h, is amenable to automation on systems compatible with the SMART digest.27 When coupled with microflow HPLC systems, the method offers high robustness and throughput, confidently identifying over a thousand HCPs postfiltering within 30 min method duration, as demonstrated in Table 1. This shows an increase in the reported species after filtering11 or a comparable total of unfiltered protein identifications obtained in a smaller time frame without product depletion13 compared to the literature.
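The replicate comparison can be illustrated with a short Pearson correlation sketch restricted to proteins quantified in both digest replicates. Whether the study correlated raw or log-transformed quantities is not stated, so the example uses the values as given; the function names and toy quantities are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep_a, rep_b):
    """Correlate HCP quantities for proteins present in both replicates."""
    shared = sorted(set(rep_a) & set(rep_b))
    return pearson_r([rep_a[p] for p in shared], [rep_b[p] for p in shared])


# Toy quantities (ng/mg) keyed by placeholder protein IDs
digest_1 = {"P1": 100.0, "P2": 200.0, "P3": 400.0, "P4": 50.0}
digest_2 = {"P1": 110.0, "P2": 190.0, "P3": 420.0, "P5": 70.0}
r = replicate_correlation(digest_1, digest_2)
```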
Figure 2.
Assessment of repeatability by comparing absolute quantity estimates of the 1017 proteins found and quantified in all digest replicates when processed separately. Quantification in relation to the protein standard spike and processing was performed using standard settings. (a) Kernel density estimation of HCP quantities from three digest replicates. Linear correlation assessment (Pearson) between digest replicates one and two (b) without normalization and (c) with normalization.
Table 1. Overview of Identification Method Performance Evaluating Three Digest Replicates in Triplicate Injections Pre- and Postapplication of Proteotypic and Single Peptide Hit Exclusion Filters.
| identification performance | mean identifications | standard deviation |
|---|---|---|
| proteins pre filtering | 1601 | ±44 |
| HCP post filtering | 1154 | ±28 |
| contaminant proteins post filtering | 5 | ±0.5 |
| product-related protein entries | 2 | ±0 |
Identification and Quantification of High-Risk HCP Species
Our method can reliably identify and quantify 20 out of 23 high-risk CHO-related HCPs listed on BioPhorum, part of a multicompany collaborative initiative driven by key industry peers and contributors (Table 2).28 Early identification of HCPs facilitates risk assessment based on their physicochemical properties, such as molecular weight and pI, in relation to the product.9 In combination with estimated quantity values, data-driven decisions for process development of suitable downstream unit operations can be carried out.29 Moreover, risk assessment methodologies like failure mode effect analysis (FMEA) can identify species needing targeted depletion monitoring based on their risk and impact on the process.
Table 2. Overview of High-Risk HCPs (Biophorum)28 Identified in CCCF and Estimated Mean Quantities with Standard Deviation Based on Different Reference Standards.
| protein description | UniProt accession no. | HCP quantity based on the peptide/protein standard [ng/mg product] | MW [kDa] | pI |
|---|---|---|---|---|
| 78 kDa glucose-regulated protein (GRP78, BiP) | G3I8R9 | 11801 ± 422/28524 ± 2478 | 72.4 | 5.1 |
| C-X-C motif chemokine 3 (CXCL3) | A4URF0 | 734 ± 109/1776 ± 214 | 11.0 | 9.1 |
| carboxylesterase 1-like protein, liver (CES1) | G3I7X9 (A0A061IFE2^a) | 191 ± 28/464 ± 80 | 97.5 | 5.9 |
| carboxypeptidase D (Cpd) | G3HR95 | 176 ± 30/425 ± 67 | 123.4 | 5.8 |
| cathepsin B (CatB) | G3H0L9 | 330 ± 45/793 ± 81 | 37.5 | 5.7 |
| cathepsin L (CatL) | G3INC5 | 498 ± 97/1208 ± 267 | 37.3 | 6.8 |
| cathepsin Z (CatZ) | Q9EPP7 | 1053 ± 56/2538 ± 144 | 34.0 | 7.5 |
| clusterin (CLU) | G3HNJ3 | 9840 ± 523/23735 ± 1589 | 51.8 | 5.5 |
| glutathione S-transferase P 1 (GSTP1) | G3I3Y6 | 1292 ± 157/3138 ± 545 | 25.0 | 8.2 |
| lipoprotein lipase (LPL) | G3H6V7 | 1284 ± 81/3092 ± 155 | 50.5 | 8.0 |
| lysosomal acid lipase (LAL) | G3HQY6 | 159 ± 10/385 ± 32 | 45.6 | 7.3 |
| matrix metalloproteinase-19 (MMP-19) | G3HRK9 | 741 ± 60/1787 ± 174 | 58.9 | 7.7 |
| monocyte chemoattractant protein-1 (MCP-1) | G3GTT2 | 4765 ± 286/11498 ± 885 | 15.9 | 9.3 |
| peroxiredoxin-1 (PRDX1) | G3GYP9 (Q9JKY1^a) | 6433 ± 324/15512 ± 919 | 22.3 | 8.2 |
| phospholipase B-like 2 (PLBL2) | G3I6T1 | 1886 ± 87/4563 ± 453 | 65.5 | 5.9 |
| procollagen-lysine 2-oxoglutarate 5-dioxygenase 1 (PLOD1) | G3IIE7 | 1275 ± 72/3075 ± 174 | 76.0 | 5.8 |
| protein S100-A6 (S10A6) | G3HC31 | 81 ± 7/197 ± 25 | 10.0 | 5.3 |
| pyruvate kinase (PK) | G3H3Q1 | 9861 ± 486/23758 ± 955 | 51.6 | 7.6 |
| serine protease (HTRA1) | G3IBF4 | 1533 ± 283/3689 ± 625 | 28.7 | 6.5 |
| sialate O-acetylesterase (SIAE) | G3IIB1 (A0A3L7HS03^a) | 459 ± 17/1108 ± 79 | 59.3 | 8.6 |

^a The accession number given in parentheses is obsolete or not in the Cricetulus griseus reference proteome (UP000001075); these entries have been exchanged with equivalent protein entries.
The method’s sensitivity is evident from the quantification of the lowest detectable protein, in the lower double-digit ng/mg range (Table 3) for all possible data analysis workflows and reference spikes employed. We further obtained low mean CVs for protein quantities of replicate injections as well as low standard deviations for mean CVs between the digest replicates, demonstrating good precision (Table 3). It was possible to achieve mean CV values below 10% after implementing stricter postprocessing filters, which indicates that the filters are appropriate for excluding outliers and increasing the certainty in identifications.
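The precision metric can be sketched as a per-protein coefficient of variation across replicate injections, averaged over all proteins. The sketch assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function names and toy values are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def mean_cv(quantities):
    """Mean CV over proteins; each protein needs at least two replicate values."""
    cvs = [cv_percent(v) for v in quantities.values() if len(v) >= 2]
    return mean(cvs)


# Toy quantities (ng/mg) from triplicate injections, keyed by placeholder IDs
injections = {
    "P1": [90.0, 100.0, 110.0],   # CV = 10 %
    "P2": [100.0, 100.0, 100.0],  # CV = 0 %
}
overall = mean_cv(injections)
```

Applying the same calculation before and after the proteotypic and single-hit filters would show how much of the variability is carried by low-confidence identifications.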
Table 3. Overview of Quantification Method Performance Evaluating Three Digest Replicates in Triplicate Injections with Pre- and Postapplication of Proteotypic and Single Peptide Hit Exclusion Filters.
| overall quantification performance | mean CV [%] | sd CV [%] |
|---|---|---|
| proteins prefiltering (repl injections) | 13.8 | ±0.5 |
| proteins postfiltering (repl injections) | 9.9 | ±0.4 |

| HCP quantity estimation (filtered) | minimum [ng/mg] | max [ng/mg] | total HCP [m%] |
|---|---|---|---|
| protein spike std quantification | 21.6 ± 2.3 | 35089 ± 4523 | 54.6 |
| peptide spike std quantification | 8.95 ± 1.1 | 14618 ± 2282 | 34.0 |
| protein spike top3 quantification | 12.1 ± 1.2 | 46361 ± 3609 | 53.9 |
| peptide spike top3 quantification | 3.69 ± 0.5 | 14141 ± 904 | 26.3 |
HCP quantity estimates span about four orders of magnitude, and the total mass percentage values of HCPs align with values for CCCF previously reported in the literature, ranging from 29 m% (ref 11) to 50 m% (ref 30). Orthogonal relative HCP mass estimates by Bradford (≈34 m%) and gel intensity assays (≈44 m%) fall in a similar range. A commercial CHO ELISA indicated a total HCP percentage of around 17 m% (data not shown); this is likely an underestimate of the abundance due to the limitations of ELISA.
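The dynamic range can be checked directly from the minimum and maximum values listed in Table 3. The short sketch below computes the base-10 logarithm of the maximum-to-minimum ratio for each workflow; the dictionary and function names are illustrative only.

```python
import math

# Minimum and maximum HCP quantities [ng/mg], copied from Table 3 (filtered data).
TABLE_3_RANGES = {
    "protein spike std": (21.6, 35089.0),
    "peptide spike std": (8.95, 14618.0),
    "protein spike top3": (12.1, 46361.0),
    "peptide spike top3": (3.69, 14141.0),
}

def orders_of_magnitude(minimum, maximum):
    """Dynamic range expressed as log10 of the max/min ratio."""
    return math.log10(maximum / minimum)

dynamic_range = {
    name: orders_of_magnitude(low, high)
    for name, (low, high) in TABLE_3_RANGES.items()
}

for name, value in dynamic_range.items():
    print(f"{name}: {value:.1f} orders of magnitude")
```

For every workflow, the result lies between three and four orders of magnitude.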
It is worth mentioning that direct comparison between these analytical methods is not recommended since all these assays are based on contrasting quantification principles and there are different inherent limitations in their achievable accuracy and coverage. As demonstrated, absolute quantity estimates can vary between the different technologies used.
Influences on Relative Absolute Quantification
Adopting hi3 approaches to estimate relative absolute quantities for HCPs introduces challenges in comparing studies due to variations in the spiked standard, software, and mathematical approaches in data processing pipelines.31−33 We therefore examined the potential effects of these factors more closely.
Peptide vs Protein Level Spiking of Standards
The absolute values based on protein and peptide spikes showed significant differences in the total mass percentage estimates (Table 3). Both approaches resulted in similar distribution shapes, but the peptide standard estimates were shifted to lower absolute quantities (Figure 3a). This demonstrates that both approaches result in the same relative quantification distributions. The shift itself could be due to multiple reasons: (1) The spiking of the protein standard and the subsequent calculations are based directly on the reference product titer from analytical protein A chromatography. Peptide standard-based spiking and calculations, in contrast, depend on the reference measurement of total peptide amounts; the shift could therefore result from limitations in peptide quantification accuracy and from the resulting adjustment for differences in sample preparation recovery. (2) Although hi3 approaches assume that the highest ionizing peptides of each protein have similar ionization efficiencies,22 there is still some variability associated with these strategies. Chicken lysozyme is a commonly used standard because it is readily available and well characterized. As a relatively small protein, however, it has a smaller chance of containing ionizable peptides in its sequence. This is not an issue for the hi3 standards, which have been designed to contain the six highest ionizing peptides of large proteins.
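The hi3 principle referred to above can be sketched as follows, assuming that the mean signal of the three most intense peptides per mole is the same for every protein, so that a spiked standard of known amount calibrates all other proteins. The function names, intensities, molecular weight, and product amount are hypothetical; this is a sketch of the general idea and not the processing pipeline used in this study.

```python
from statistics import mean

def top3_mean(peptide_intensities):
    """Mean intensity of the (up to) three most intense peptides."""
    top3 = sorted(peptide_intensities, reverse=True)[:3]
    return mean(top3)

def hi3_amount_fmol(protein_peptides, standard_peptides, standard_fmol):
    """Estimate the molar amount of a protein from the response of a spiked standard."""
    response_per_fmol = top3_mean(standard_peptides) / standard_fmol
    return top3_mean(protein_peptides) / response_per_fmol

def to_ng_per_mg(amount_fmol, mol_weight_kda, product_mg):
    """Convert fmol to ng (1 fmol x 1 kDa = 1 pg) and normalize to product mass."""
    amount_ng = amount_fmol * mol_weight_kda / 1000.0
    return amount_ng / product_mg

# Hypothetical intensities: a standard spiked at 100 fmol and one HCP.
standard = [9.0e5, 8.0e5, 7.0e5, 1.0e5]
hcp = [4.4e5, 4.0e5, 3.6e5]

hcp_fmol = hi3_amount_fmol(hcp, standard, standard_fmol=100.0)
hcp_ng_per_mg = to_ng_per_mg(hcp_fmol, mol_weight_kda=60.0, product_mg=0.01)
print(f"{hcp_fmol:.1f} fmol, {hcp_ng_per_mg:.0f} ng/mg")
```

Because every HCP estimate scales with the assumed amount of standard, an error in the reference measurement (product titer or total peptide amount) shifts all estimates by the same factor while leaving their relative distribution unchanged.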
Figure 3.
Changes in the kernel density distribution of identified HCP quantities depend on calculations based on peptide or protein standards as well as changes in the mathematical estimation approaches (mean vs sum of fragments). (a) Comparison of peptide/protein-based estimates for the standard Spectronaut settings. (b) Comparison of peptide/protein-based estimates based on the top3 sum quantitation settings. (c) Comparison of std and top3 Spectronaut for peptide-based estimation. (d) Comparison of std and top3 Spectronaut for protein-based estimation.
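For readers unfamiliar with kernel density plots, the following minimal sketch shows how a constant scaling factor between two sets of estimates appears as a horizontal shift of an otherwise identical density curve on a log10 axis. The Gaussian kernel, the bandwidth, the quantities, and the scaling factor of 2.4 are assumptions chosen for illustration; they do not reproduce the data or plotting code behind Figure 3.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate as a callable density(x)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical HCP quantities [ng/mg]; peptide-based values are scaled down.
protein_based = [25.0, 80.0, 300.0, 900.0, 2500.0, 9000.0, 30000.0]
peptide_based = [q / 2.4 for q in protein_based]

kde_protein = gaussian_kde([math.log10(q) for q in protein_based], bandwidth=0.5)
kde_peptide = gaussian_kde([math.log10(q) for q in peptide_based], bandwidth=0.5)

# Evaluate both densities on a log10 grid from -2 to 8 in steps of 0.01.
grid = [i / 100.0 for i in range(-200, 801)]
peak_protein = max(grid, key=kde_protein)
peak_peptide = max(grid, key=kde_peptide)
area_protein = sum(kde_protein(x) for x in grid) * 0.01

print(f"peak shift on log10 axis: {peak_protein - peak_peptide:.2f}")
```

The two curves have the same shape, and the peptide-based curve peaks at a lower log10 quantity.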
Standard vs Top3 Sum Quantification
It has recently been shown that HCP quantification by a DIA top3 method can achieve good accuracy compared to the gold standard method based on isotope dilution selected reaction monitoring for small subsets of HCPs,34 although the effect of changes to the data processing pipeline was not investigated. We compared protein quantity calculations using the mean versus the sum (std vs top3) of the three highest intensity fragments and peptides to calculate peptide and protein quantities. The absolute differences between peptide and protein spikes remain similar for both approaches (Figure 3a,b). This is in contrast to the direct comparison of the std vs top3 calculations for either the peptide (Figure 3c) or the protein (Figure 3d) level standard spikes, with a noticeable shift toward lower quantities for the peptide level standard. This shift could be due to different spiking levels, with higher peptide spikes potentially leading to greater nonlinearity in response. Furthermore, some of the low-abundance identifications are based on only two quantifiable peptides. Using the sum (instead of the mean) adjusts the abundance for the number of peptides found and therefore correctly leads to smaller values in absolute estimate calculations. In summary, the sum approach likely provides a more accurate representation of actual quantities, especially for less abundant HCPs, as it adjusts for the number of peptides found.
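The effect of the aggregation choice on proteins with fewer than three quantifiable peptides can be made concrete with a small worked example. The sketch below compares mean- and sum-based aggregation for a hypothetical HCP with two peptides against a standard with three; the intensities and function names are illustrative assumptions and do not represent the Spectronaut implementation.

```python
def top_n(values, n=3):
    """Return the n highest values (fewer if fewer are available)."""
    return sorted(values, reverse=True)[:n]

def aggregate(values, how):
    """Aggregate the top three intensities by 'mean' or by 'sum'."""
    top = top_n(values)
    if how == "mean":
        return sum(top) / len(top)
    if how == "sum":
        return sum(top)
    raise ValueError(f"unknown aggregation: {how}")

def relative_quantity(protein, standard, how):
    """Protein signal relative to the standard under the chosen aggregation."""
    return aggregate(protein, how) / aggregate(standard, how)

# Hypothetical intensities: the standard has three peptides, the HCP only two.
standard = [6.0e5, 6.0e5, 6.0e5]
low_abundance_hcp = [3.0e5, 3.0e5]

ratio_mean = relative_quantity(low_abundance_hcp, standard, "mean")
ratio_sum = relative_quantity(low_abundance_hcp, standard, "sum")
print(f"mean-based: {ratio_mean:.3f}, sum-based: {ratio_sum:.3f}")
```

With only two peptides, the mean-based ratio is 1.5 times the sum-based ratio, because the mean ignores the missing third peptide whereas the sum effectively counts it as zero signal. For proteins with three or more peptides, both aggregations give the same ratio.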
Conclusions
Our method combining SMART digestion with optimized acquisition demonstrates improvements in throughput, balancing sensitivity with time constraints for standard upstream process characterization. While MS is still not fully amenable to automation and online implementation, the method marks a step toward implementing MS-based HCP monitoring as a routine PAT tool in biopharmaceutical processing. The method’s potential applications include monitoring batch-to-batch variability, process parameter changes, scale-up effects on HCP populations, and risk assessments for new developments. The data generated could inform the creation of targeted, highly sensitive MS methods, providing high-risk protein targets along with well-ionizing peptides and transition states for multiple reaction monitoring approaches. In scenarios without time constraints, the limit of quantification (LOQ) could be further lowered by employing enrichment or depletion strategies, such as native short contact digest or protein A capture, although these steps require re-evaluation for potential quantification biases.
A key finding of this study is the importance of postprocessing workflows and how spiking and processing changes impact the absolute quantity estimates. Currently, the lack of standardized approaches hampers comparisons among different studies. The adoption of more sophisticated hi3-based approaches such as the HCP–PROFILER standard peptides from Anaquant (Villeurbanne, France) described by Beaumal et al.33 would increase accuracy by calculating quantities against an averaged peptide calibration curve, thereby reducing nonlinearity influences. However, this alone cannot resolve the inherent differences in ionization efficiency for detected peptides of low-abundance protein species in hi3 approaches, which will most likely result in some inaccuracies over the detected dynamic range.
The simultaneous use of protein and peptide level spikes, as shown in our study, allows protocol variations to be assessed and could potentially be leveraged for normalization, which would be especially beneficial for larger cohort studies. The choice of suitable peptides and proteins should be investigated further and ideally standardized for use in a biopharmaceutical environment. For example, the hi3 E. coli-based peptide standard could be considered in the future to avoid any potential sequence overlap with peptides in the sample.
In summary, our data suggest that both the scientific community and industrial applications would benefit greatly from standardized workflows, particularly in data processing and software, to facilitate better comparability of results across different laboratories.
Acknowledgments
Special thanks to Alok Shah, Sri Ramarathinam, and Lee Xing Chong for the invaluable discussions and help in revising the manuscript and also the CSL upstream department, especially Yih Yean Lee, for providing and discussing the samples used in this study.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00637.
Liquid chromatography profile and settings; ZenoTOF 7600 windows for data-independent acquisition; Spectronaut processing settings; upsetplot protein identifications for different digests and influence of the preprecipitation step; processed together; ZenoTOF MS acquisition settings; protein standard spiking signal CVs at different spiking levels and spiking influence on protein identifications; and linearity assessment of hi3 peptide standard-derived estimated protein (PDF)
The authors acknowledge with thanks the contribution of the University of South Australia, CSL Innovation Pty Ltd., and the South Australian State Government in providing scholarship funding and support for this research as part of the Industrial Doctoral Training Centre (IDTC) PhD program for biomanufacturing. We also thank Bioplatforms Australia and the State and Federal Governments, which cofund the NCRIS-enabled Mass Spectrometry and Proteomics Facility at the University of South Australia.
The authors declare the following competing financial interest(s): This research is partly sponsored by CSL Innovation Pty Ltd. Craig Kingdon is an employee of CSL Innovation Pty Ltd. Mark R. Condina is an employee of Mass Dynamics. Janik D. Seidel, Clifford Young, Leigh Donellan, Manuela Klingler-Hoffmann and Peter Hoffmann declare no competing financial interests or personal relationship that could have appeared to influence the work reported in this paper.
References
1. Kaplon H.; Crescioli S.; Chenoweth A.; Visweswaraiah J.; Reichert J. M. Antibodies to watch in 2023. mAbs 2023, 15, 2153410. 10.1080/19420862.2022.2153410.
2. Dasani S.; Palanki R.; Menon P.; Bose S. K. Translational Surgery; Academic Press, 2023; pp 535–538.
3. Hogwood C. E. M.; Bracewell D. G.; Smales C. M. Measurement and control of host cell proteins (HCPs) in CHO cell bioprocesses. Curr. Opin. Biotechnol. 2014, 30, 153–160. 10.1016/j.copbio.2014.06.017.
4. Liu H.; Gaza-Bulseco G.; Faldu D.; Chumsae C.; Sun J. Heterogeneity of Monoclonal Antibodies. J. Pharm. Sci. 2008, 97, 2426–2447. 10.1002/jps.21180.
5. Kim J. Y.; Kim Y.-G.; Lee G. M. CHO cells in biotechnology for production of recombinant proteins: current state and further potential. Appl. Microbiol. Biotechnol. 2012, 93, 917–930. 10.1007/s00253-011-3758-5.
6. Jin M.; Szapiel N.; Zhang J.; Hickey J.; Ghose S. Profiling of host cell proteins by two-dimensional difference gel electrophoresis (2D-DIGE): Implications for downstream process development. Biotechnol. Bioeng. 2010, 105, 306–316. 10.1002/bit.22532.
7. Alt N.; Zhang T. Y.; Motchnik P.; Taticek R.; Quarmby V.; Schlothauer T.; Beck H.; Emrich T.; Harris R. J. Determination of critical quality attributes for monoclonal antibodies using quality by design principles. Biologicals 2016, 44, 291–305. 10.1016/j.biologicals.2016.06.005.
8. Gilgunn S.; Bones J. Challenges to industrial mAb bioprocessing–removal of host cell proteins in CHO cell bioprocesses. Curr. Opin. Chem. Eng. 2018, 22, 98–106. 10.1016/j.coche.2018.08.001.
9. Kornecki M.; Mestmäcker F.; Zobel-Roos S.; de Figueiredo L. H.; Schlüter H.; Strube J. Host Cell Proteins in Biologics Manufacturing: The Good, the Bad, and the Ugly. Antibodies 2017, 6, 13. 10.3390/antib6030013.
10. Goey C. H.; Alhuthali S.; Kontoravdi C. Host cell protein removal from biopharmaceutical preparations: Towards the implementation of quality by design. Biotechnol. Adv. 2018, 36, 1223–1237. 10.1016/j.biotechadv.2018.03.021.
11. Goey C. H.; Bell D.; Kontoravdi C. CHO cell cultures in shake flasks and bioreactors present different host cell protein profiles in the supernatant. Biochem. Eng. J. 2019, 144, 185–192. 10.1016/j.bej.2019.02.006.
12. Park J. H.; Jin J. H.; Lim M. S.; An H. J.; Kim J. W.; Lee G. M. Proteomic Analysis of Host Cell Protein Dynamics in the Culture Supernatants of Antibody-Producing CHO Cells. Sci. Rep. 2017, 7, 44246. 10.1038/srep44246.
13. Hamaker N. K.; Min L.; Lee K. H. Comprehensive assessment of host cell protein expression after extended culture and bioreactor production of CHO cell lines. Biotechnol. Bioeng. 2022, 119, 2221–2238. 10.1002/bit.28128.
14. Baik J. Y.; Guo J.; Lee K. H. Cell Culture Engineering; Wiley-VCH, 2019; pp 295–311.
15. Huang L.; Wang N.; Mitchell C. E.; Brownlee T.; Maple S. R.; De Felippis M. R. A Novel Sample Preparation for Shotgun Proteomics Characterization of HCPs in Antibodies. Anal. Chem. 2017, 89, 5436–5444. 10.1021/acs.analchem.7b00304.
16. Vowinckel J.; Zelezniak A.; Bruderer R.; Mülleder M.; Reiter L.; Ralser M. Cost-effective generation of precise label-free quantitative proteomes in high-throughput by microLC and data-independent acquisition. Sci. Rep. 2018, 8, 4346. 10.1038/s41598-018-22610-4.
17. Bian Y.; Zheng R.; Bayer F. P.; et al. Robust, reproducible and quantitative analysis of thousands of proteomes by micro-flow LC-MS/MS. Nat. Commun. 2020, 11, 157. 10.1038/s41467-019-13973-x.
18. Camperi J.; Dai L.; Guillarme D.; Stella C. Fast and Automated Characterization of Monoclonal Antibody Minor Variants from Cell Cultures by Combined Protein-A and Multidimensional LC/MS Methodologies. Anal. Chem. 2020, 92, 8506–8513. 10.1021/acs.analchem.0c01250.
19. Jakes C.; Füssl F.; Zaborowska I.; Bones J. Rapid Analysis of Biotherapeutics Using Protein A Chromatography Coupled to Orbitrap Mass Spectrometry. Anal. Chem. 2021, 93, 13505–13512. 10.1021/acs.analchem.1c02365.
20. Camperi J.; Grunert I.; Heinrich K.; Winter M.; Özipek S.; Hoelterhoff S.; Weindl T.; Mayr K.; Bulau P.; Meier M.; Mølhøj M.; Leiss M.; Guillarme D.; Bathke A.; Stella C. Inter-laboratory study to evaluate the performance of automated online characterization of antibody charge variants by multi-dimensional LC-MS/MS. Talanta 2021, 234, 122628. 10.1016/j.talanta.2021.122628.
21. Liu Y.; Zhang C.; Chen J.; Fernandez J.; Vellala P.; Kulkarni T. A.; Aguilar I.; Ritz D.; Lan K.; Patel P.; Liu A. A Fully Integrated Online Platform For Real Time Monitoring Of Multiple Product Quality Attributes In Biopharmaceutical Processes For Monoclonal Antibody Therapeutics. J. Pharm. Sci. 2022, 111, 358–367. 10.1016/j.xphs.2021.09.011.
22. Silva J. C.; Gorenstein M. V.; Li G.-Z.; Vissers J. P. C.; Geromanos S. J. Absolute Quantification of Proteins by LCMSE: A Virtue of Parallel ms Acquisition. Mol. Cell. Proteomics 2006, 5, 144–156. 10.1074/mcp.M500230-MCP200.
23. Doellinger J.; Schneider A.; Hoeller M.; Lasch P. Sample Preparation by Easy Extraction and Digestion (SPEED) - A Universal, Rapid, and Detergent-free Protocol for Proteomics Based on Acid Extraction. Mol. Cell. Proteomics 2020, 19, 209–222. 10.1074/mcp.TIR119.001616.
24. Wiśniewski J. R.; Gaugaz F. Z. Fast and Sensitive Total Protein and Peptide Assays for Proteomic Analysis. Anal. Chem. 2015, 87, 4110–4116. 10.1021/ac504689z.
25. Perez-Riverol Y.; Bai J.; Bandla C.; García-Seisdedos D.; Hewapathirana S.; Kamatchinathan S.; Kundu D.; Prakash A.; Frericks-Zipper A.; Eisenacher M.; Walzer M.; Wang S.; Brazma A.; Vizcaíno J. A. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 2022, 50, D543–D552. 10.1093/nar/gkab1038.
26. Frankenfield A. M.; Ni J.; Ahmed M.; Hao L. Protein Contaminants Matter: Building Universal Protein Contaminant Libraries for DDA and DIA Proteomics. J. Proteome Res. 2022, 21, 2104–2113. 10.1021/acs.jproteome.2c00145.
27. Strasser L.; Oliviero G.; Jakes C.; Zaborowska I.; Floris P.; Ribeiro da Silva M.; Füssl F.; Carillo S.; Bones J. Detection and quantitation of host cell proteins in monoclonal antibody drug products using automated sample preparation and data-independent acquisition LC-MS/MS. J. Pharm. Anal. 2021, 11, 726–731. 10.1016/j.jpha.2021.05.002.
28. Jones M.; Palackal N.; Wang F.; et al. “High-risk” host cell proteins (HCPs): A multi-company collaborative view. Biotechnol. Bioeng. 2021, 118, 2870–2885. 10.1002/bit.27808.
29. Oh Y. H.; et al. Identification and characterization of CHO host-cell proteins in monoclonal antibody bioprocessing. Biotechnol. Bioeng. 2023, 121 (1), 291–305.
30. Walker D. E.; Yang F.; Carver J.; Joe K.; Michels D. A.; Yu X. C. A modular and adaptive mass spectrometry-based platform for support of bioprocess development toward optimal host cell protein clearance. mAbs 2017, 9, 654–663. 10.1080/19420862.2017.1303023.
31. Smith J.; Strasser L.; Guapo F.; Milian S. G.; Snyder R. O.; Bones J. SP3-based host cell protein monitoring in AAV-based gene therapy products using LC-MS/MS. Eur. J. Pharm. Biopharm. 2023, 189, 276–280. 10.1016/j.ejpb.2023.06.019.
32. Hessmann S.; Chery C.; Sikora A.-S.; Gervais A.; Carapito C. Host cell protein quantification workflow using optimized standards combined with data-independent acquisition mass spectrometry. J. Pharm. Anal. 2023, 13, 494–502. 10.1016/j.jpha.2023.03.009.
33. Beaumal C.; Beck A.; Hernandez-Alba O.; Carapito C. Advanced mass spectrometry workflows for accurate quantification of trace-level host cell proteins in drug products: Benefits of FAIMS separation and gas-phase fractionation DIA. Proteomics 2023, 23, 2300172. 10.1002/pmic.202300172.
34. Husson G.; Delangle A.; O’Hara J.; Cianferani S.; Gervais A.; Van Dorsselaer A.; Bracewell D.; Carapito C. Dual Data-Independent Acquisition Approach Combining Global HCP Profiling and Absolute Quantification of Key Impurities during Bioprocess Development. Anal. Chem. 2018, 90, 1241–1247. 10.1021/acs.analchem.7b03965.