Abstract
Abstract
Biopharmaceuticals, such as antibody therapeutics, are produced by culturing mammalian cells with chemically defined media that consist of more than 50 synthesized components. The screening of medium components related to culture performance and the subsequent optimization of the composition are required in the development of new modalities, host cells, and culture methods. Screening the components to be optimized is typically labor-intensive. The easiest approach is media blending, which creates variations in the concentrations of the components with only liquid mixing. However, a workflow for systematically determining experimental conditions (i.e., how to blend media) has not been established. Therefore, we reassessed media blending from a mathematical perspective and proposed a workflow for the first time. In the workflow, we evaluated the use of a commercially available chemically defined media to maximize simplicity and applicability. From a mathematical perspective, we clarified that multicollinearity is an inevitable challenge in both experimental design and its analysis. Under the constraint, we showed that one of the most appropriate experimental conditions could be systematically calculated and selected by applying D-optimal design focusing on the principal components. We performed a case study of cell culture to screen medium components under 120 experimental conditions using 11 chemically defined media designed for Chinese Hamster ovary cells. The case study provided a reasonable set of components that explained the variance in viable cell concentrations, which range from 5.8 to 19.4 (× 106) cells/mL. Finally, our mathematical redefinition also enabled the design of a dedicated media set for media blending.
Key points
• The constraints in media blending were clearly explained.
• A systematic workflow from blending design to analysis was proposed.
• The workflow also enabled the design of a dedicated media set for media blending.
Supplementary information
The online version contains supplementary material available at 10.1007/s00253-025-13594-z.
Keywords: Media blending, Medium component screening, Chemically defined media, Mammalian cell culture, New methodology
Introduction
Mammalian cell culturing is an indispensable process for producing active biopharmaceutical ingredients, such as antibody therapeutics and gene therapeutics (O’Flaherty et al. 2020). It is also essential for using the cells as final products, including cellular food and regenerative medicine (Ozawa et al. 2023; Martins et al. 2024). Historically, medium composition has been studied and discussed for the improvement of biomass accumulation and the amount of titer because the cells consume the medium components for their own growth and production of the target product (Landauer 2014; Ritacco et al. 2018). Chemically defined media (CDM) that consist only of synthesized components are preferred when the final product is intended for human use because of the prevention of the risk of viral infection and the ease of the management of raw materials. For example, many basal and feed CDM have been developed for the culture of Chinese hamster ovary (CHO) cells (Reinhart et al. 2015). Although the composition of CDM developed by suppliers is sufficiently sophisticated to achieve specific applications, the individual optimization of the composition is still required. In various stages of culture process development, such as cell line modification and culture methods (batch, fed-batch, and perfusion culture), the identification of the optimal medium composition has been explored (Wurm 2004; Galbraith et al. 2018). Furthermore, when new host cells or modalities are used, the development of a completely new composition may need to be considered.
CDM commonly consists of more than 50 components (Xu et al. 2018; Galbraith et al. 2018; Cordova et al. 2023). The simultaneous optimization of all components is difficult because the increase in the number of targeted components to be optimized expands the search space exponentially. Additionally, the number of implementations of cell culture at one time is limited, and the process requires significant cost and time. Therefore, the optimization of the composition is commonly conducted by focusing on specific components. A review of medium optimization indicated that the average number of conditions per experimental batch was 27 and the average number of components to be optimized was 8.1 (Zhou et al. 2023). The components to be optimized are screened through experimental evaluation to determine those closely related to culture performance. The most common approaches are one factor at a time (OFAT) and factorial design-based methods (Combe and Sokolenko 2021). OFAT is performed by changing the concentration of only one variable, whereas factorial design is performed by changing several variables simultaneously. The selection of the approach depends on the number of components to be evaluated and the need to take into account interactions (Yasui et al. 2021; Hashizume and Ying 2024). For example, the Plackett–Burman design (Plackett and Burman 1946) is suitable for screening multiple variables generally, although interactions are disturbed. When OFAT and factorial design are applied to screening medium components, these approaches typically require the laborious preparation of concentrated solutions of the individual components. The preparation of concentrated solutions seems to be a simple operation, but it requires using solvents with different pH values that depend on the components because of solubility considerations, which necessitates pH adjustment of the medium after the supplementation of the components. This is quite an arduous operation, particularly when the number of components to be screened increases, and it is unsuitable for the automation of experiments.
Media blending is an easy approach that creates variations in the concentrations of the components to be evaluated with only the pipette operation of liquid mixing. Jordan and Rouiller et al. evaluated medium components related to cell growth and titer in antibody-producing CHO cells by designing the original media used for blending, referred to as the “mother media” in this study (Jordan et al. 2013; Rouiller et al. 2013). However, media blending has not been widely adopted because a workflow for systematically determining the experimental conditions (i.e., how to blend the mother media) has not been established. The establishment of a systematic workflow requires handling the process mathematically. Additionally, considering the issue mathematically will enable a structured evaluation of the limitations and applicability of media blending. Therefore, in this study, we reassessed media blending from a mathematical perspective and proposed a workflow for the initial screening of the components to be optimized. In the development of the workflow, it is important that it is user-friendly and immediately usable for any case by all researchers. Thus, we notably used commercially available CDM as mother media in the evaluation. There are two main advantages to using CDM as mother media. The first is that the design and preparation of the mother media is not required. This point is simple but important because the preparation of the mother media is the only laborious step in the media blending approach. For simplicity, material candidates include commercially available stock solutions for each component category (e.g., amino acids, vitamins, and trace salts) used by Biederman et al. (2022) in modular blending; however, we consider the following second advantage: The selection of CDM that meets the intended purpose allows researchers to reference empirically adjusted concentrations. This enables the straightforward incorporation of these concentration ranges into experimental conditions, even for the high concentrations in production and feed CDM for biopharmaceuticals. Although the range of concentrations to be tested depends on the chosen CDM, components with small variance among products can be considered to reflect an empirical consensus.
To summarize the key points, the objective of this study is to develop a workflow that can systematically determine experimental conditions by reassessing media blending from a mathematical perspective. In the development of the workflow, we evaluate the use of commercially available CDM to maximize simplicity and applicability, thereby ensuring it can be widely used by all researchers.
Materials and methods
Initial screening of the medium components involves constructing a model for culture performance (e.g., viable cell concentration (VCC) or product concentrations) using , in which the medium components are variables, and then interpreting the model to estimate the medium components that are most closely related to . Thus, a methodology that carries out this process effectively and logically is important. Our proposed workflow for the initial screening of medium components through media blending with commercially available CDM consists of the following three steps: (ⅰ) experimental design, (ⅱ) cell culture experiments, and (ⅲ) regression modeling and its interpretation (Fig. 1a).
Fig. 1.
Overview of the entire process for the initial screening of medium components through media blending
Step (ⅰ)−1: Experimental design (theory)
Definition of the matrices in media blending
First, we define the composition of the mother media using matrix , defined as follows:
| 1 |
Next, we define the matrix , which represents the dispensing volumes of the mother media for all possible blending combinations, as follows:
| 2 |
By multiplying matrices and , we obtain matrix , which gives the composition of each medium component in all blending combinations (Fig. 2a):
| 3 |
Fig. 2.
Systematic experimental design for the initial screening of medium components through media blending. a Initial definition of the matrices in media blending. To illustrate the general concept of experimental design in media blending, a simplified example is shown with four mother media and three final experimental conditions. b The desired situation in the experimental design for regression modeling, particularly for the aim of model interpretation. A simplified example is shown with two medium components and three final experimental conditions. This example illustrates that when there is high correlation between medium components (multicollinearity), it becomes challenging to determine which component is related to y. c Geometric illustration of selecting experimental conditions that minimize intervariable correlation (multicollinearity). In this simplified example, three experimental conditions are chosen from five options. Choosing conditions 1, 3, and 4 results in column vectors , , and , which are oriented in more diverse directions than those when conditions 2, 3, and 5 are selected, thereby creating a larger vector space. Maximizing the volume of the space (parallelepiped) is equivalent to maximizing the diversity of the column vector directions. d Illustration of dimensionality reduction using principal component analysis (PCA). The original three-dimensional space defined by the concentrations of media components , , and for each mother medium is projected onto a two-dimensional space. The red line corresponds to the direction with the maximum variance, which is the 1 st principal component. The blue line, which is the vector orthogonal to the 1 st principal component with the maximum variance is the 2nd principal component. e Flowchart of our proposed systematic experimental design for the initial screening of medium components through media blending. Blue text represents execution steps, whereas black text indicates matrices. P, parallelepiped; PC, principal component
Given the limited number of conditions per experimental batch for a cell culture, it is impractical to conduct experiments for all possible combinations. Therefore, a specified number of experimental conditions, denoted by , is extracted from the matrix ; that is,
| 4 |
Definition of the direction of experimental design
As mentioned above, the purpose of the initial screening is achieved by modeling and interpreting the model. In the context of media blending, the matrix in Eq. (4) corresponds to in the model. If the columns of (each representing a different medium component) essentially provide the same information, it become impossible to distinguish which medium components contribute to when constructing the model (Fig. 2b). This situation, in which the columns can be expressed as a linear combination of the other columns, is called “multicollinearity.” Therefore, in the experimental design step for the screening study, it is desirable to maximize the diversity of the column vectors of when extracting rows from . Geometrically, this means maximizing the volume of a -dimensional parallelepiped formed by the standardized column vectors of . In the three-dimensional case, a parallelepiped can be schematically represented as in Fig. 2c, where the volume enclosed by the column vectors corresponds to the volume of the parallelepiped. Clearly, the volume increases as the column vectors diverge in direction. Denoting the parallelepiped as , its volume is defined as follows.
| 5 |
Thus, maximizing the volume of parallelepiped is equivalent to solving the following expression.
| 6 |
In the context of design of experiment (DoE), solving is the D-optimal design problem. A wide range of methods can be used for DoE, including factorial design-based approaches (e.g., full factorial, fractional factorial, and Plackett–Burman design) as well as methods such as Latin hypercube sampling. However, the methods that assume that the values for the explanatory variables are independently specified, such as factorial design and Latin hypercube sampling, cannot be used in the media blending context. This is because the explanatory variables (i.e., the medium components in this study) are embedded within the manipulated variables (i.e., the mother media). Moreover, because the mother media themselves are predetermined (i.e., observational data), D-optimal design cannot be replaced by these DoE methods. Hence, we believe that solving for (D-optimal design) is the most reasonable experimental design approach for the purpose of screening medium components.
Mathematical constraint in media blending
To state the conclusion first, the mathematical constraint in the experimental design for media blending is that always becomes zero. In other words, it is impossible to implement the D-optimal design principle in the standard way, as described above. This issue arises from the rank deficiency of matrix . The rank of a matrix is defined as the maximum number of linearly independent rows or columns; therefore, the rank is always less than or equal to the smaller of the two. Rank deficiency occurs when the rank of a matrix is smaller than the number of rows or columns (whichever is smaller). To clarify why matrix is rank deficient, we explain each step of the process that causes the rank deficiency. First, we consider the rank of matrix . As stated in the Introduction, the main advantage of media blending is the simplicity of its experimental procedures. Therefore, it is generally assumed that the number of mother media is less than the number of medium components (which is greater than 50), implying that . Thus, the rank of the matrix satisfies.
| 7 |
Next, for matrix , since the number of all blending combinations is assumed to be larger than the number of mother media , the rank of matrix satisfies
| 8 |
Therefore, the rank of the matrix (where ) is given by
| 9 |
Since matrix is obtained by extracting rows (the specified number of experimental conditions) from matrix , it follows that
| 10 |
Consequently, the rank of is also
| 11 |
As indicated in Eq. (4), we have . However, since , matrix is rank deficient with respect to its dimension . In a rank-deficient case, the determinant of the matrix is zero; hence,
| 12 |
For further details on this derivation, please refer to Supplemental Note S1. Since is always zero, the approach of directly solving the determinant for matrix , which is often a reasonable experimental design strategy, cannot be used. This is the mathematical constraint in the experimental design of media blending.
Reasonable solutions
On the basis of the derivation so far, the fundamental cause for lies in the fact that the number of mother media is smaller than the number of medium components in matrix . However, considering the advantage of the simplicity of media blending, it is necessary to permit as a premise. Because rank deficiency inevitably leads to multicollinearity, allowing implies tolerating partial multicollinearity. Furthermore, since matrix consist of observational data from commercially available CDM, there exists the possibility of multicollinearity even if . Thus, it is necessary to select experimental conditions that minimize redundancy among the column vectors of under partial multicollinearity.
To overcome the mathematical constraint, it is necessary to reduce the dimensionality of matrix into a vector space of at most dimensions that is free of multicollinearity. It is important to consider the method used to achieve this. Possible methods for achieving such dimensionality reduction include variable filtering, which removes variables with low information based on metrics such as the variance inflation factor (VIF) or correlation coefficients, and principal component analysis (PCA), which reduces the number of dimensions while retaining as much information as possible in a space of linearly independent column vectors.
Among the possible methods, we took into account a characteristic feature of our study—the use of commercially available CDM as mother media—when selecting the most appropriate method. Commercially available CDM are pre-formulated products, meaning that their composition cannot be freely manipulated. The variability among CDM differs by component; some components are present at consistent levels across all products, while others exhibit considerable variability in different products. Under the basic assumption that differences in culture performances arise from variability in the composition of the mother media, we reasoned that it is important to reduce dimensionality in a manner that retains as much variance as possible. Accordingly, we pursued dimensionality reduction that achieves two objectives: reducing matrix to a vector space free of multicollinearity and minimizing the loss of the variance.
The dimensionality reduction method that achieves these two objectives is formulated and explained in Supplemental Note S2. In brief, these objectives are achieved by performing repeated orthogonal projection onto the axis of maximum variance on the standardized matrix . Here, this procedure corresponds exactly to the operation performed by PCA (Fig. 2d). In PCA, the original data space is reduced to a subspace formed by axes called principal components. Because the principal components are orthogonal, PCA can eliminate the redundant information (multicollinearity) that can obscure factor relationships in high-dimensional data. Furthermore, by selecting the directions of maximum variance first, PCA retains a substantial portion of the original data’s total variance. By mapping the original matrix onto these principal components , the data is reduced to a -dimensional space () represented as follows:
| 13 |
Therefore, given the two objectives—that is, to retain as much variance in the medium composition as possible and to reduce the dimensionality without multicollinearity—PCA is clearly the most appropriate approach.
We denote the matrix after dimensionality reduction by . Accordingly, matrix , originally defined in Eq. (3), can be revised as
| 14 |
That is, represents the locations in -dimensional space (obtained after PCA) for each blending combination. From , rows are selected to construct , as follows:
| 15 |
Since matrix is no longer rank deficient, the maximization of becomes meaningful. When we employ , which assumes that conditions have been selected from matrix , matrix can be decomposed based on Eqs. (13) and (14) as follows:
| 16 |
Thus,
| 17 |
Since is an orthonormal basis for the -dimensional space, is the matrix that expresses within that subspace. Consequently, solving is equivalent to mitigating multicollinearity in the non-redundant portion of . Since corresponds to a matrix obtained by extracting rows from (Eq. (3)), ensuring that the non-redundant portion of maintains as much diversity as possible is consistent with the original problem to be solved.
Therefore, in a scenario where matrix (i.e., a composition of mother media) is rank deficient, we propose the approach of employing PCA to form a -dimensional space that retains as much of the variance as possible, and then selecting experimental conditions that maximize (i.e., D-optimal design). This approach helps reduce the undesirable intervariable correlations (multicollinearity) in the experimental conditions, thus aiding the interpretation of the regression model of the relationship between (the medium components) and culture performance . The proposed workflow represents the first systematic experimental design for the initial screening of medium components throughout media blending. This framework is presented in Fig. 2e.
Step (ⅰ)−2: Experimental design (case study)
Chemically defined media
The CDM used in this study were 1X CD CHO (Thermo Fisher Scientific, Waltham, MA, USA, Cat# 10743-011); CH400AZ (GMEP, Fukuoka, Japan, Cat# CH400AZ-0005); HyClone™ ActiPro™ (Cytiva, Marlborough, MA, USA, Cat# SH31039.02); TC-42 w/o GF (Sartorius, Göttingen, Germany, Cat# 511-0001); CHOlean (Sartorius, Göttingen, Germany, Cat# 1140-0001); 4Cell® SmartCHO Stock & Adaptation Medium (Sartorius, Göttingen, Germany, Cat# CFP3FB1101); 4Cell® SmartCHO Media System Production Medium (Sartorius, Göttingen, Germany, Cat# CFP3FB2102); Ham’s F-12 Nutrient Mix (Thermo Fisher Scientific, Waltham, MA, USA, Cat# CFP3FB2102); RPMI 1640 (Thermo Fisher Scientific, Waltham, MA, USA, Cat# 11875-093); CD FortiCHO™ (Thermo Fisher Scientific, Waltham, MA, USA, Cat# A11483-01); CD OptiCHO™ (Thermo Fisher Scientific, Waltham, MA, USA, Cat# 12681-011); and EX-CELL® CD CHO Fusion (Sigma-Aldrich, St. Louis, MO, USA, Cat# 14365C-1000ML). EX-CELL® CD CHO Fusion is the medium to which the cells were originally acclimated and was used as the common medium (Fig. 5a). The other 11 protein-free media were used as mother media.
Fig. 5.
Batch culture under the selected media blending conditions. a Overview of the experiment in the case study. Time course of b viability and c viable cell concentrations (VCCs) under the 120 selected conditions. Batch cultures were performed with biological replicates (n = 3) and the results are shown within a range exceeding 80% viability. The plots indicate the mean value for 69 h (at harvest), whereas the measurement value of mixtures of replicates for values other than 69 h. The conditions shown in various colors were the lowest, lower 10%, median (50%), lower 90%, and highest VCC at harvest, respectively. Box plots of d VCC and e immunoglobulin (Ig)G1 concentration at harvest. The horizontal line within the box represents the median value, whereas the upper and lower edges of the box represent the first and third quartiles, respectively. Dots represent individual measurements, which highlight the distribution and density. f Scatter plot of VCC and IgG1 concentration at harvest. The shaded area represents the 99% confidence interval of the regression line
Media analysis using mass spectrometry
Since the composition of the CDM used in this study was mostly undisclosed, the components of the CDM were analyzed using triple quadrupole liquid chromatograph mass spectrometry (LC-MS/MS) and inductively coupled plasma mass spectrometry (ICP-MS).
For the LC-MS/MS analysis, the mother media were mixed with acetonitrile (FUJIFILM Wako Pure Chemical, Osaka, Japan, Cat# 018-19853), followed by centrifugation at 15,000 rpm for 15 min at room temperature. The supernatants from the centrifugation were collected and diluted tenfold with ultrapure water. The samples were then analyzed using the LCMS-8060NX (Shimadzu Corp., Kyoto, Japan). LC/MS/MS Method Package for Cell Culture Profiling Ver. 3 (Shimadzu Corp., Kyoto, Japan), which is tailored for cell culture samples and targets a total of 144 components (amino acids: 72, nucleic acids: 32, vitamins: 18, sugars: 4, and other components: 18), was used for the analytical condition. For the ICP-MS analysis, the mother media were diluted with 1% nitric acid (FUJIFILM Wako Pure Chemical, Osaka, Japan, Cat# 140-10045) to prepare the samples. The dilution ratios were 10-fold for trace metal ions (Al, Ba, Br, Ca, Cd, Co, Cu, Fe, Li, Mg, Mn, Mo, Rb, Se, Sn, Sr, Ti, V, and Zn) and 1000-fold for high-concentration metal ions (K, Na, and P). These metal ions were analyzed using the ICPMS-2030 (Shimadzu Corp., Kyoto, Japan) under the conditions listed in Supplemental Table S1.
Breakdown of each matrix
Here, in the case study, the breakdown of the matrices defined in the “Experimental design (theory)” section is presented.
First, matrix , representing the composition of mother media as described in Eq. (1), was arranged such that its rows correspond to the various mother media types and its columns to the medium components, with each element representing the analytical value of a medium component determined by mass spectrometry.
Matrix was then projected onto a low-dimensional orthogonal space that retains as much variance as possible via PCA, as shown in Eq. (13). So that each medium component had the same scale, matrix was first standardized for each medium component to form , and then PCA was applied to obtain matrix . The dimensionality of was chosen such that 99% of the variance in was retained.
Matrix , which represents the dispensing volumes of the mother media for all possible blending combinations (Eq. (2)), was set up as follows in this case study. To avoid excessive experimental complexity, up to 6 of the 11 mother media were selected and blended in increments of 200 µL to yield a total volume of 1200 µL. Thus, the number of blending combinations (i.e., number of rows) was .
According to Eq. (14), matrix was derived as the product of matrices and . Matrix indicates the locations in -dimensional space for the 8008 blending combinations, and is the matrix obtained when the number of experimental conditions have been extracted from . In this case study, the best selected from was determined by the following procedure:
- The following two steps were iterated 10,000 times:
-
i.A set of experimental conditions (i.e., rows) was randomly selected from matrix to form .
-
ii.was calculated.
-
i.
From the 10,000 iterations, the set of rows that yielded the maximum value of was identified.
Finally, was used to denote that these rows were extracted from matrix and determines how to blend the mother media in the experiment.
A visual summary of the above workflow is provided in Fig. 2e as a flowchart. In addition, the Python code for executing this workflow is available on GitHub (https://github.com/HK-bio/medium_study_repo/tree/main/paper_202504).
Step (ⅱ): Cell culture experiment
Cell line and preculture
A recombinant immunoglobulin (Ig)G1 (trastuzumab)-producing Chinese hamster lung (CHL)-YN suspension cell pool was used. The CHL-YN cell line was established in our laboratory, and its parental cell line is registered as RCB5004 in the RIKEN Cell Bank, from which it is available. This IgG1-producing cell pool was established through polyethyleneimine-based transfection and puromycin-based selection, as previously detailed (Yamano-Adachi et al. 2020). All cultivations were conducted using the same working cell bank. After the stock solution was thawed, three passages of preculture were performed. During the preculture, the cell pool was cultured in EX-CELL® CD CHO Fusion medium supplemented with 6 mM l-glutamine (FUJIFILM Wako Pure Chemical, Fukuoka, Japan, Cat# 073-05391) and 5 µg/mL puromycin (Invivogen, San Diego, CA, USA, Cat# ant-pr-1) in a humidified incubator (Climo-Shaker, Kuhner, Basel, Switzerland) at 37 °C, 5% CO2, Φ25 mm, and 140 rpm.
Batch culture
Batch cultures in 24 deep well plates were conducted using the Duetz sandwich-cover system (EnzyScreen BV, Heemstede, Netherlands) and 24DW plates (EnzyScreen BV, Heemstede, Netherlands). The working volume was 2.5 mL, which consisted of 1.3 mL of EX-CELL® CD CHO Fusion medium (as the common medium) supplemented with 11.5 mM l-glutamine and 9.6 µg/mL puromycin, and 1.2 mL of mother media in total (Fig. 5a). This was to minimize the need for cell adaptation to a significantly different composition. Cells were inoculated at 5 × 105 cells/mL. The cultures were performed at 37 °C, 5% CO2, Φ25 mm, and 300 rpm.
Measurement of cultured samples
Viable cell concentration (VCC) and viability were measured using a Vi-CELL BLU Cell Viability Analyzer (Beckman Coulter, Inc., Brea, CA, USA).
IgG1 concentrations in the cell supernatants were determined using a sandwich enzyme-linked immunosorbent assay, as described in a previous report (Kaneyoshi et al. 2019).
Step (ⅲ): Regression modeling and its interpretation
As described above, the purpose of this study was to screen for the medium components that were likely to be related to culture performance . In both modeling and its interpretation, it is important to establish a regression model that explains the variance of using , as well as to ensure that the model is interpretable.
Modeling algorithms
An interpretable regression model is defined by its capacity to clearly indicate which variables are largely responsible for predictions via its internal weights and estimated parameters. For example, in a linear regression model, the sign and magnitude of each regression coefficient quantitatively reveal the effects of medium components on culture performance. This clarity allows researchers to more readily identify promising directions for medium optimization. However, it is known that interactions occur among medium components (Schnellbaecher et al. 2019; Pandey et al. 2024), and drastic changes may arise from component depletions or metabolic shifts. Therefore, it is necessary to consider whether some patterns might be difficult to explain using simple models. Accordingly, machine learning approaches, which can model such complex nonlinearities, have been increasingly used in recent years (Tai- et al. 2021; Hashizume et al. 2023; Hashizume and Ying 2024; Gangwar et al. 2024). However, many of these approaches operate as black boxes, resulting in cases where high predictive accuracy is accompanied by low interpretability. Thus, in models developed by machine learning, interpretability is often achieved by numerically validating the model after modeling. For example, SHapley Additive exPlanations (SHAP) values (Lundberg and Lee 2017) and permutation feature importance (PFI) (Breiman 2001) are representative of these methods, and these methods have even been applied to medium components screening (Gangwar et al. 2024). In theory, these methods can be applied to any model.
Therefore, in this study, regression modeling was carried out by considering both simple linear regression models and more complex machine learning models to account for the possibilities of interactions among medium components and the inherent complexity of cultivation. Specifically, several regression algorithms were employed corresponding to the following assumptions. First, under the assumption of drastic changes (e.g., component depletions or metabolic shifts), decision tree-based ensemble techniques—including random forest (RF) regression, gradient boosting decision tree (GBDT) regression, and extreme gradient boosting (XGB) regression—were used. Second, under the assumption of smoother multimodal behavior, algorithms that capture smooth nonlinearities—such as nonlinear Gaussian process regression (nonlinear GPR) and nonlinear support vector regression (nonlinear SVR) with a radial basis function kernel—were employed. Third, under the assumption that the observed patterns can be explained by linear relationships, linear regression techniques including partial least squares regression (PLS-R), ridge regression, least absolute shrinkage and selection operator (Lasso), elastic net regression, linear support vector regression (Linear SVR), and linear Gaussian process regression (linear GPR) were included. Multiple algorithms were chosen for each assumption to enhance interpretability and predictive robustness by identifying common trends across models.
Modeling procedure
In supervised learning, a smaller sample size increases the likelihood of overfitting, particularly in complex machine learning models (Ying 2019). Overfitted models tend to exhibit poor predictive accuracy on unknown data and thus reduced generalizability. Therefore, it is important to evaluate the performance of models based not only on the training set but also on unknown data (i.e., test set). In this study, model construction and evaluation were performed following a guideline for supervised-learning approaches in biological studies (Walsh et al. 2021).
The entire dataset (360 samples [120 conditions, 3 biological replicates]) was split using a 6:4 ratio into a training set and test set such that data with the same condition were consistently grouped and exclusively split into either the training set or the test set, ensuring no data leakage. Next, for hyperparameter tuning, the training set was divided using fivefold cross-validation, and the hyperparameters corresponding to the highest value (mean − standard deviation) of the predictive error (coefficient of determination; R2) on the validation set were selected. In this cross-validation, data with the same condition were also consistently grouped and exclusively assigned to either the training set or validation set. The optimized hyperparameters were used to retrain the entire training set, and then, the test set was predicted using the trained model. As performance metrics, R2 (to evaluate the extent to which the model’s predictions explain the variance of ) and the mean squared error (MSE; to identify large predictive errors) were used. Their equations are respectively as follows:
| 18 |
,
| 19 |
The test set was randomly split into five sets, and the MSE was computed for each set. The mean and standard error were then calculated to evaluate the significance relative to the baseline model (which predicts the mean value). The overall framework of the model construction and evaluation is shown in Fig. 3a, with details provided in Supplemental Table S2.
Fig. 3.
Regression modeling and the interpretation. a Flowchart outlining the entire process from regression modeling to interpretation. Blue text represents execution steps, whereas boxes indicate information such as data, variables, and models. b Graphical explanation of permutation feature importance (PFI) calculation. Here, an example of calculating the PFI for is presented. and are highly correlated, containing overlapping information for y = f(X). Thus, c1 compensates for the loss when is shuffled, resulting in underestimated errors. c Graphical explanation of the adjusted PFI calculation procedure, which considers intervariable correlation. Here, an example of calculating the adjusted PFI for is presented. The simultaneous shuffling of , which highly correlates with , ensures that the error for is accurately estimated without underestimation. RF, random forest regression; GBDT, gradient boosting decision tree regression; XGBoost, extreme gradient boosting regression; Nonlinear SVR, nonlinear support vector regression; Nonlinear GPR, Nonlinear Gaussian process regression; PLS-R, partial least squares regression; Ridge, ridge regression; Lasso, least absolute shrinkage and selection operator; ElasticNet, elastic net regression; Linear SVR, linear support vector regression; Linear GPR, linear Gaussian process regression
Finally, the models meeting the following conditions were defined as “superior models,” and consequently were used in the model interpretation step.
Predictive accuracy: the MSE on the test data is significantly lower than that of the baseline (one-sided paired t-test [Holm’s correction, α = 0.05]).
Generalizability: the ratios of the average R2 to MSE for the training and test datasets are within 15%.
Interpretation of the models
In the superior models, the PFI, which is a model-independent metric (Breiman 2001), was applied to estimate the components that are most closely related to . The PFI is calculated as follows:
The R2 score (baseline score;) of the trained model on the test set is calculated.
- The following is iterated multiple times () for all explanatory variables.
- One explanatory variable in the test set is selected and the values are shuffled randomly.
- The predictive score () is calculated with the shuffled test set using the same trained model.
-
(3)
The PFI for is calculated using the following equation:
| 20 |
Nevertheless, in the case of media blending, the results of PFI can be misinterpreted. Although the proposed experimental design can minimize multicollinearity in the non-redundant portion among the explanatory variables as explained, multicollinearity arising from the rank deficiency of matrix remains inevitable. Consequently, as illustrated in Fig. 2b, inherent correlations among the explanatory variables cause components that are essential for predicting to be overlooked. Notably, the PFI of highly correlated variables tends to be underestimated relative to that of other variables (Fig. 3b). Given the objective of the initial screening of medium components, it is important to reduce the risk of overlooking key variables. Therefore, a method for the importance estimation that prioritizes sensitivity even if allowing some redundancy is necessary.
To address this issue, we adopted the method proposed by Kaneko et al. (Kaneko, 2022), which involves simultaneously shuffling groups of highly correlated variables during the calculation of PFI. This method ensures that the importances are not underestimated and are appropriately quantified (Fig. 3c). The adjusted PFI, which accounts for the correlations among variables, was calculated as follows:
The absolute correlation coefficient rp,q between the p-th and q-th explanatory variables was calculated.
- This was then transformed into via the Fisher -transformation as follows:
21 With the sample size defined as , was assumed to follow a normal distribution with mean and variance . The lower and upper limits of the interval at significance level were denoted by and .
- and were converted into and , respectively, which represent the lower and upper bounds of the correlation coefficient , and calculated as follows:
22 If , the correlation was considered incidental and was assigned.
Otherwise, was adopted.
- In the equation for PFI, when calculating predictive scores for the shuffled test set of the -th explanatory variable, the following steps were carried out:
- For each explanatory variable , the procedure below was applied to each sample in the test set:
-
(i)A random number in was generated.
-
(ii)If , the corresponding sample value of explanatory variable was shuffled as well.
-
(i)
Simulation of mother media
Desirable mother media themselves were simulated using the following steps.
- The following was iterated 100 times.
- The following was iterated 1000 times.
-
(i)Five concentration levels were randomly assigned to each component concentration value for each mother medium (e.g., 0, 1, 2, 3, and 4).
-
(ii)A combination of mother media was selected in which the three penalties below were not applicable or were the smallest. A high penalty was set so that the stated condition never occurred.
- High penalty: variances of variables (components) and samples are zero.
- High penalty: the combinations of |r|> 0.9 between variables are included in the selected mother media.
- Penalty: the combinations of |r|> 0.7 between variables are included in the selected mother media.
-
(i)
- The experimental conditions (how to blend the mother media) were determined according to the workflow as explained.
The mother media set that yielded the in 100 iterations was selected as the best one.
Results
Experimental design of media blending for the initial screening of medium components (a case study)
As explained above, commercially available CDM were used as mother media to maximize simplicity and applicability for all researchers. To obtain matrix , as defined in Eq. (1), the medium components were analyzed using LC-MS/MS and ICP-MS to determine the relative concentrations. As a result, a total of 67 components were detected and quantified. The results of hierarchical clustering based on the similarity of the medium components are shown in Fig. 4a. The CDM with the highest similarity were Ham’s F-12 Nutrient Mix and RPMI 1640. Removing one of the highly similar CDM is an option when selecting which CDM to use as the mother media. However, Ham’s F-12 Nutrient Mix (Ham 1965) and RPMI 1640 (Moore et al. 1966) were developed from different origins and have different characteristics at the component level, such as hypoxanthine and thymidine. Therefore, it was decided to use all 11 CDM as the mother media because an increase in absolute data variability could be expected.
Fig. 4.
Media analysis and the selection of experimental conditions in a case study. a Differences in the profiles of the detected medium components among 11 types of chemically defined media used for blending. The color gradient in the heatmap indicates the relative values of each component between media: red represents high level and blue represents low level. The dendrograms above and left of the heatmap indicate the hierarchical clustering of medium components and the media, respectively, using Ward’s method with the Euclidean distance as the metric. The height at which two branches merge reflects the degree of dissimilarity between the clusters, with lower heights indicating higher similarity. F12: Ham’s F-12 Nutrient Mix; RPMI: RPMI 1640; CH400AZ: CH400AZ; PM: 4Cell® SmartCHO Media System Production Medium; SAM: 4Cell® SmartCHO Stock & Adaptation Medium; CDCHO: 1X CD CHO; Forti: CD FortiCHO™; Opti: CD OptiCHO™; TC42: TC-42 w/o GF; and Acti: HyClone™ ActiPro™. b Histogram of 10,000 . The experimental conditions were selected where the determinant value was the largest, located on the right-hand side of the histogram. c Heatmap of Pearson’s product-moment correlation coefficients of the condition with the minimum (left) and maximum (right) determinant. The lighter the red or blue, the smaller the correlation between the principal components. PC: principal component
The analytical results for CDM can be regarded as an 11 × 67 matrix , where the number of mother media (:11) is less than the number of medium components (:67), as explained as the mathematical constraint in the experimental design. Therefore, PCA was applied to project matrix onto matrix , a low-dimensional orthogonal space that retains as much variance as possible. While it is obvious from the equations (Supplemental Note S2), it is shown that the principal components of matrix are orthogonal, indicating no inter-correlation (i.e., multicollinearity) in Supplemental Fig. S1. Then, matrix was calculated using Eq. (14). Multiple iterations (e.g., 10,000 iterations in this case study) were performed to randomly select the number of experimental conditions (e.g., 120 conditions in this case study) from the rows in matrix , resulting in matrix . The histogram of is shown in Fig. 4b. The experimental condition with is on the right-hand side in the histogram, and this was the condition used in the experiments. Pearson’s product-moment correlation coefficients between the principal components and variance inflation factor, which is an indicator for multicollinearity, were compared under the condition with the maximum determinant and the other conditions (the minimum and median determinant) (Table 1). As the determinant increased, both the maximum and mean absolute values of the correlation coefficients and maximum VIF tended to decrease, reaching the smallest values at the maximum determinant. Additionally, the correlation coefficients of all combinations were visualized under the conditions of providing minimum and maximum determinants (Fig. 4c). It was clearly shown that the overall uncorrelatedness improved. Thus, as expected given the mathematical explanation, the results supported that our proposed workflow enables systematic selection of the intended experimental conditions that mitigate multicollinearity among the variables.
Table 1.
Maximum and mean absolute values of the correlation coefficients and maximum variance inflation factor (VIF) among the principal components of the medium components
| Minimum | 50% | Maximum*1 | |
|---|---|---|---|
| Determinant () | 1.40 | 2.59 | 3.76 |
| Maximum |r| | 0.491 | 0.327 | 0.219 |
| Average |r| | 0.118 | 0.092 | 0.075 |
| Maximum VIF | 2.11 | 1.41 | 1.20 |
*1 The condition with the maximum determinant indicates the condition to be tested
Batch culture with the experimental conditions designed using the proposed workflow
In this section, the results of the batch culture under the experimental conditions selected by the proposed workflow are presented. The screening of medium components in CHL-YN cells (Yamano-Adachi et al. 2020) was conducted as a case study. CHL-YN is a lung fibroblast-derived cell line that has similar properties to CHO cells in terms of the availability of serum-free suspension culture and a glycosylation profile. Additionally, the specific growth rate of CHL-YN cells is twice that of CHO-K1 cells. Therefore, CHL-YN is expected to be a novel host for antibody production to achieve shortened culture process development and production culture duration. However, the screening of the medium components for CHL-YN cells has not been adequately conducted. This case study was expected to assess the potential for the initial screening of components in antibody production using a new host. Batch cultures with IgG1-producing CHL-YN cells were conducted using 24 well plates under the selected 120 experimental conditions (Fig. 5a). Culture conditions for CHL-YN cells using 24 well plates were determined based on conditions reported for CHO cells (Chaturvedi et al. 2014). The changes over time in major culture parameters and extracellular metabolic components were comparable with the results in flask batch cultures (Supplemental Fig. S2). This batch culture was performed within a range exceeding 80% viability (Fig. 5b). The harvest time was set to 69 h in this case study. The stationary phase (corresponding to peak VCC) with high viability (> 95%) was expected to be recorded between 48 and 72 h in this culture condition based on preliminary studies using the EX-CELL® CD CHO Fusion medium. As expected, peak VCC were recorded at 69 h in most conditions; however, 8% in the total were recorded at 96 h (Fig. 5c). The value of VCC at 96 h in the 8% conditions was only 10% higher on average than at 69 h, which suggests that the culture was in the stationary phase from 69 to 96 h. Therefore, it was considered reasonable to set 69 h as the time for the harvest across all conditions. The VCCs at harvest were recorded from 5.8 × 106 to 19.4 × 106 cells/mL (Fig. 5d). The IgG1 concentrations in culture supernatants at harvest were recorded from 9.1 to 22.1 mg/L (Fig. 5e). The correlation between the VCC and IgG1 concentrations at harvest was then evaluated and the correlation coefficient was 0.54 (p < 0.01), which suggests that the variation in VCC was one of the reasons for the variation in IgG1 concentrations (Fig. 5f). Therefore, in this case study, an attempt was made to construct a model to explain VCC and to estimate the components that made a high contribution to the model.
Estimation of the components related to culture performance
An attempt was made to estimate the medium components that contributed to the variation in VCC at harvest using the dataset of batch culture with 120 experimental conditions. The variation in VCC among the conditions represented approximately 80% of the total variation (Fig. 6a), and we hence constructed regression models for this variation. The explanatory variables () were the theoretical relative concentrations of each component and the objective variable () was the VCC at harvest. A total of 10 algorithms, linear and nonlinear regression algorithms, were used under the specific assumptions as described. Model construction was performed following a guideline for supervised learning approaches in biological studies (Walsh et al. 2021) (Fig. 3a). The R2 metric directly reflects the explainability of the variation in VCC and was calculated using the entire training set and entire test set separately (Fig. 6b). Consequently, the non-linear methods achieved high R2 values on the training set; in particular, Nonlinear GPR and RF reached accuracies approaching the upper limit of R2 (0.8) as suggested by Fig. 6a. However, for these non-linear approaches, the R2 on the test set was lower than that on the training set, suggesting overfitting. In contrast, linear models exhibited comparable R2 values for both the training and test sets, indicating that they captured the fundamental structure of the entire dataset. To assess the significance of the predictive accuracy of linear models, the test set was randomly split into subsets, and the MSE and its variability were computed (Fig. 6c). Because MSE is sensitive to outliers, it was employed for evaluating large predictive errors. In this analysis, in addition to the 10 models, a baseline model that predicts the mean of VCC was also included. Similar to the results for R2, these results also showed that the nonlinear models tended to overfit, whereas this trend was not observed in the linear models. One-sided paired t-tests (with Holm’s correction, α = 0.05) were conducted between the MSE of the baseline model and those of the other models for the test set, and all models showed significantly lower MSE values (p < 0.01). Thus, it was suggested that the linear models were more suitable for predicting VCC than the baseline model and could capture the overall trend of the dataset without overfitting. Superior models were finally selected based on the criterion that the ratio of the average predictive error (R2 and MSE) between the training and test sets are within 15%. In this case study, the superior models were PLS-R, Ridge, Lasso, and elastic net. In Fig. 6d, scatter plots of the test set predictions using the superior models are shown, and it was observed that all models captured the overall trend of the test set.
Fig. 6.
Regression modeling with VCC as the objective variable. a Breakdown of the sum of squares between (SSB) and within (SSW) for VCC, illustrating the ratios of SSB and SSW relative to the total sum of squares (TSS), normalized to 1 for the entire dataset, training set, and test set. b Coefficient of determination (R2) for the prediction of VCC at harvest by 10 different models. Blue bars indicate the prediction results for the training set and orange bars indicate the prediction results for the test set. c Box plots of the mean squared error for predicting VCC at harvest, comparing 10 different models with the baseline model. A one-sided paired t-test (Holm’s correction, α = 0.05) was used to compare each model with the baseline (Mean) on the test set. All models showed significantly lower MSE (p < 0.01). d Scatter plot presenting the predicted values (vertical axis) and observed (true) values (horizontal axis) in the superior models. Each point represents one sample from the test set, and the red dashed line indicates the ideal case when the predicted values equal the observed values. PLS-R, partial least squares regression; Ridge, ridge regression; Lasso, least absolute shrinkage and selection operator; Elastic Net, elastic net regression; Linear SVR, linear support vector regression; Nonlinear SVR, nonlinear support vector regression; Linear GPR, linear Gaussian process regression; Nonlinear GPR, nonlinear Gaussian process regression; RF, random forest regression; GBDT, gradient boosting decision tree regression; and XGBoost, extreme gradient boosting regression; IQR, interquartile range
The medium components that made a high contribution in the construction of these superior models were then estimated. We employed the adjusted PFI as a metric capable of appropriately calculating contributions even under the unavoidable partial multicollinearity.
The adjusted PFI of the four superior models were calculated. The top 25% (i.e., top 17 components) components in each model are shown in Table 2. The 12 components consistently in the top 25% among all four superior models were identified. The 12 components were asparagine, tyrosine, glucose, choline, histidine, leucine, Zn, valine, lysine, 5-oxoproline, Cu, and methionine. Most of these components were regarded as reasonable choices, based on previous findings with CHO cells and the characteristics of CHL-YN cells (a detailed interpretation is given in the Discussion part). Additionally, PLS-R was among the superior models, which facilitates the use of the variable importance in projection (VIP) score, which is a metric capable of appropriately computing variable contributions even when multicollinearity exists (Wold et al. 2001). When the 12 components were compared with the result based on the VIP score, 10 of the components were the same (Supplemental Table S3). This suggested that the observed results are largely consistent with the historically validated metric. Consequently, the entire process (from experimental design to modeling and its interpretation) of screening medium components using media blending was successfully implemented as a systematic workflow that eliminated the need for human intervention.
Table 2.
Top 25% components of adjusted permutation feature importance (PFI) in superior models
*1 The components identified as common among the superior models are indicated as follows: green: common in the top 5%, yellow: common in the top 10%, blue: common in the top 15%, orange: common in the top 20%, and gray: common in the top 25%
*2 PLS-R: partial least squares regression; Ridge: ridge regression; Lasso: least absolute shrinkage and selection operator; Elastic Net: elastic net regression
Simulation of the mother media
We have shown that the proposed workflow can be used to systematically select components to be optimized throughout media blending. By contrast, the availability of a commercially available mother media set designed primarily for media blending with disclosed compositions would be more desirable to further enhance the interpretability of screening results. Now that the mathematical process has been established, we can conduct simulations based on a quantitative index. At the end of the study, we simulated the appropriate composition of mother media for the purpose of evaluating the benefits of developing such a mother media set and raising awareness of the development of the foundation for media blending.
First, we simulated and defined a mother media set in the same condition of the case study described in the “Breakdown of each matrix” section (number of mother media: 11, number of medium components: 67, number of experimental conditions: 120). As a result, the ratio of combinations for which |r|> 0.7 among the medium components in 11 different mother media was 0.05% in the simulation. This was sufficiently low compared with the 12% of combinations in this case study using CDM as mother media. Furthermore, blending these simulated mother media with our proposed workflow resulted in an average |r| of 0.26 among the medium components under 120 experimental conditions, and the ratio of combinations with |r|> 0.7 was 0.36% (Fig. 7a). For 67 medium components, the number of combinations between two components was 2211. Thus, 0.36% corresponds to approximately only eight combinations. This indicates that the number of highly correlated combinations was less than one per component, on average. Researchers may consider it sufficiently easy to interpret the importance of each variable in this scenario. These results suggest that the development of a dedicated mother media set for media blending improved the effectiveness of interpretation.
Fig. 7.
Simulation of mother media. a Heatmap of Pearson’s product-moment correlation coefficients (r) between medium components in 120 experimental conditions using simulated mother media. The simulation setup conditions are the same as those used in the case study (number of mother media: 11, number of media blending: 6, number of media components: 67). Only combinations with |r|> 0.7, which indicates high correlation, are shown. Red represents r > 0.7 and blue represents r < 0.7. b Ratios of combinations with |r|> 0.7 (z-axis) between medium components in 120 experimental conditions in each parameter. The parameters are the number of components and mother media. The number of components was set to 25, 50, 75, and 100. The number of mother media was set to 4, 6, 8, 10, and 12
Next, we evaluated the number of mother media sufficient for the number of medium components to be screened. We simulated the number of components to be screened varied at 25, 50, 75, and 100, and the number of mother media varied at 4, 6, 8, 10, and 12. In this simulation, we generated 120 experimental conditions by media blending with our proposed workflow. Figure 7 b shows the ratios of combinations with |r|> 0.7 for each pairing of the number of components and number of mother media. Interestingly, at the same number of mother media, the number of components to be screened had no effect on the ratio of combinations |r|> 0.7. By contrast, the number of mother media seemed to dominate the effect on highly correlated combinations. Even when the number of components to be screened was 100, the number of highly correlated combinations was 16 when the number of mother media was 12. Thus, we considered that 12 mother media would be sufficient in most cases. However, because increasing the number of mother media is critical to cost and work efficiency, it is better to determine the number of mother media based on the number of components to be screened.
Although commercially available CDM as mother media and our proposed method will allow all researchers to begin screening medium components soon, the development of a commercially available mother media set designed for media blending would further enhance the usefulness of this approach, as shown in this section.
Discussion
In this study, we proposed a workflow for the initial screening of medium components through media blending. This workflow consists of three steps: (ⅰ) experimental design, (ⅱ) cell culture experiment, and (ⅲ) regression modeling and its interpretation. To establish a systematic experimental design method, we mathematically redefined media blending for the first time and formulated the problem to be solved in the experimental design step. Media blending can be treated as a matrix calculation, and to efficiently execute the screening of medium components, we proposed the fundamental strategy of maximizing the volume of parallelepiped, as illustrated in Fig. 2c. This approach corresponds to D-optimal design in the context of DoE, and it was deemed a reasonable means for selecting appropriate experimental conditions, even with observational data. By contrast, we clarified a constraint (multicollinearity), which is inherent to media blending because the number of mother media is smaller than the number of medium components. This constraint poses challenges to systematic experimental design with the standard use of D-optimal design. Consequently, we explored a dimensionality reduction method aimed at reducing the matrix to a vector space free of multicollinearity while retaining the variance in the mother media. Finally, the dimensionality reduction method is equivalent to PCA (Fig. 2d). Using PCA for dimensionality reduction, we were able to execute the experimental design under the specific mathematical constraint of media blending. The overall workflow for the experimental design is summarized in the flowchart in Fig. 2e. When constructing and interpreting the models, we built models using both linear and nonlinear approaches under the assumption that the medium components exhibit complex behavior. We then comprehensively evaluated their predictive performance and generalizability to define the superior models. Given that the set of superior models might include multiple types, we applied a PFI-based method that can estimate variable contributions regardless of the model type. However, multicollinearity remained a challenge at this stage, as conventional PFI tends to underestimate the contributions of highly correlated variables. To address this issue, we introduced an adjusted PFI that groups highly correlated variables for evaluation, thereby enabling a more accurate estimation of their contributions (Fig. 3c). Finally, we proposed a systematic and feasible workflow that could handle the entire process from experimental design to data analysis based on quantitative indices.
We applied the workflow to medium components screening for IgG1-producing CHL-YN cells. This enabled the identification of the 12 components set that could be related to the variation of the VCC from the 67 components detected in the CDM. The next stage of optimization allowed us to determine the optimal compositions of these components in a basal or feed medium suitable for CHL-YN cells. The 12 components were asparagine, tyrosine, glucose, choline, histidine, leucine, Zn, valine, lysine, 5-oxoproline, Cu, and methionine. Notably, asparagine, tyrosine, and glucose were common in the top 10% among all models. Glucose is clearly an important carbon source that accounts for 65% of the influx carbon reported in a metabolic flux analysis study using CHO-K1 cells (Nicolae et al. 2014). We previously reported that CHL-YN cells also require glucose in glucose-controlled fed-batch cultures (Sukwattananipaat et al. 2024). Therefore, we assumed that glucose was one of the top factors that could be related to VCC in CHL-YN batch culture. Asparagine depletion has been shown to severely limit cell growth, even in the presence of aspartic acid in GS-CHO (Duarte et al. 2014). Asparagine is converted to aspartic acid and used to supply intermediates in the tricarboxylic acid (TCA) cycle and is also considered important in the supply of arginine to maintain the ornithine synthesis pathway in CHL-YN cells (Sukwattananipaat et al. 2024). The ability to produce ornithine is a metabolic feature of CHL-YN cells that is absent in CHO-K1 cells, and this is supported by the mRNA expression of arginase-2 (Sukwattananipaat et al. 2024). Ornithine is used as a precursor to reactions that produce polyamines and glutamic acid. In a previous study, researchers indicated that polyamines enhance cell growth in CHO cells (Capella Roca et al. 2019). In our previous analysis of putrescine, one of the polyamines, we did not detect it in CHO-K1 cells, whereas we did detect it in CHL-YN cells, which suggests that polyamines are produced in CHL-YN cells. Glutamic acid is used to produce not only α-ketoglutaric acid, an intermediate in the TCA cycle, but also glutamine in CHL-YN cells. CHL-YN cells can produce glutamine because the expression level of glutamine synthase is higher than that in CHO cells (Yamano-Adachi et al. 2020). Glutamine is produced by glutamine synthase using glutamic acid and ammonia as substrates. Given that excessive accumulation of ammonia is widely known to induce cytotoxicity in CHO cells (Synoground et al. 2021), we expect glutamine synthesis ability in CHL-YN cells to be advantageous for cell growth. From the above, it is reasonable that asparagine was consistently of high importance in all superior models in this study because it is involved in several important metabolic reactions in CHL-YN cells. Tyrosine has been shown to suppress endoplasmic reticulum (ER) stress and apoptosis in CHO cells, accompanied by the synthesis of coenzyme Q10 (Shibafuji et al. 2023). Moreover, the depletion of tyrosine during culture is known to result in decreased viable cell concentrations (Zhang et al. 2020). Considering that tyrosine is sharply consumed in CHL-YN cells (Sukwattananipaat et al. 2024), the identification of tyrosine is also deemed a reasonable result.
Clearly, it is worth noting that this workflow is also applicable to commercially available CDM, as demonstrated in the case study. The use of commercially available CDM designed for the intended purpose would enable researchers to evaluate the appropriate concentration ranges without requiring extensive prior knowledge. When researchers screen components for new hosts, as conducted in this study, the use of mother media with considerably different profiles, such as Ham’s F-12 Nutrient Mix or RPMI 1640, would be effective to evaluate a wide range of concentrations. In this case, the common medium used in the pre-culture should be included to minimize the need for cell adaptation to a significantly different composition. Therefore, two types of designs should be considered: one where a fixed amount of the common medium is used for evaluation and another where the entire working volume is subject to blending mother media without the common medium. Scripts for both types of experimental design are available on GitHub (https://github.com/HK-bio/medium_study_repo/tree/main/paper_202504).
In this study, we were the first to mathematically explain the constraints of media blending. Although we proposed a workflow that works under these constraints, it is still necessary to acknowledge that the results obtained from this workflow need to be interpreted carefully. In the modeling and its interpretation, we applied adjusted PFI to multiple models, which allowed for the appropriate calculation of importance, even in the presence of multicollinearity (Fig. 3c). The approach successfully narrowed down the number of medium components of interest. We expect the reduction in the number of variables to be handled to make subsequent optimization work feasible. However, we recognize that it is limited only to the narrowing down of the component “set.” In the presence of multicollinearity, it is not possible to assess the effect of individual medium components and the interactions. Thus, after narrowing down the components to a manageable number through initial screening with media blending, we should acknowledge that it is necessary to verify the effects of individual components and optimize their concentrations using optimization methods (e.g., factorial design or Bayesian optimization). It is important to note that if multicollinearity can be eliminated, the effects and interactions of individual components can be assessed in a single step, but this diminishes the advantages of the simplicity of media blending. As explained thus far, mathematically eliminating multicollinearity requires that the number of mother media is at least equal to the number of components. The primary advantage of media blending is its ability to easily screen multiple components; hence, preparing more mother media than the number of components is impractical and diminishes the convenience of media blending. From this perspective, the workflow presented in this study can serve as a general guideline for conducting media blending.
Regarding the key component set identified by a data-driven approach, genome-scale models (GEMs) can be useful for evaluating the effects and interpretability of individual components. GEMs computationally describe a whole set of stoichiometry-based, mass-balanced metabolic reactions using gene-protein reactions, and their application to media design has been advancing (Park et al. 2024a). When a linear relationship between component concentration and specific consumption rate can be assumed, it is possible to simulate the effect of the medium composition on culture performance by arbitrarily changing the constraint values (in this case, the specific consumption rates of components) in flux balance analysis (FBA). For example, Pang et al. (2024) simulated the culture performance by varying the asparagine/aspartic acid ratio in enzyme capacity-constrained FBA (ecFBA). Additionally, several studies combining data-driven approaches and genome-scale models have been reported. For example, Hong et al. (2022) identified key metabolites that distinguish differences between culture conditions, based on VIP scores derived from partial least squares discriminant analysis (PLS-DA), and then detected enriched pathways (specifically, TCA cycle) by metabolite set enrichment analysis. By estimating fluxes using ecFBA with a GEM, specific metabolic bottlenecks in the TCA cycle were identified. Finally, simulation results suggested that increasing the specific consumption rate of coenzyme Q10 would resolve the bottleneck, and this was verified by supplementing coenzyme Q10. This approach has been applied not only to basal media (Yeo et al. 2022; Hong et al. 2022) but also to feed media (Park et al. 2023, 2024b). Such a combination of data-driven approaches and GEMs enables both structural understanding of experimental data and interpretability, thereby offering a promising direction for further improvements in medium design efficiency. Since the construction of iCHO1766 (Hefzi et al. 2016), GEMs of CHO cells have been improved by increasing the number of reactions and developing contextualized models specific to cell lines. CHL-YN cells were also derived from a Chinese hamster but have metabolic characteristics that differ from those of CHO cells (Yamano-Adachi et al. 2020; Sukwattananipaat et al. 2024). We are currently developing a GEM specific to CHL-YN cells, and it is expected to be applied.
We recognize that the present study has been confined to CHL-YN cells, and we consider this a limitation in terms of demonstrating the full range of applications. Nevertheless, the underlying formalized methodology is designed to be broadly applicable, and we expect for it to be effectively adapted to other cell types in future studies. In addition to the scope of application with other cell types, it is important to consider the limitations imposed by the sample size in bioprocessing applications. In this study, we employed a small-scale culture with a total culture volume of 2.5 mL, which allowed us to obtain data from 360 samples. However, in industrial bioprocessing applications using bioreactors, it is often only feasible to test a limited number of conditions per culture. In supervised learning, a smaller sample size increases the likelihood of overfitting, particularly in complex machine-learning models (Ying 2019). Therefore, when deploying such nonlinear models, it is essential to rigorously assess their generalizability. Recent studies have reported that transfer learning and meta-learning can be successfully applied to bioprocessing, facilitating the development of effective models from limited samples (Helleckes et al. 2024). Furthermore, hybrid modeling that combines data-driven modeling and kinetic modeling is also expected to be a method for improving generalizability (Narayanan et al. 2023). Kinetic models are expressed as differential–algebraic equations and formalize knowledge based on physical principles, thereby providing a reasonable explanation of the general behavior of the cell culture system. It is possible to improve both predictive accuracy and generalizability by expressing the general behavior using kinetic models and fine-tuning the complex nonlinearities specific to cell lines or culture conditions using data-driven models. For example, Okamura et al. (2022) constructed a kinetic model using mass balance equations and Monod equations, and then used a data-driven model to update predictions for metabolites such as lactate which show non-linear and drastic changes and fed these corrections back to adjust the model parameters. As a result, it was suggested that hybrid modeling can improve flexibility in modeling. Therefore, it is expected that the use of new machine learning techniques such as transfer learning and the integration with kinetic modeling will enable the construction of generalizable models with the benefit of non-linear explainability of data-driven approaches.
We await the availability of mother media dedicated to media blending from suppliers. This means that, in addition to the simplicity of media blending, interpretability is maximized through compositional disclosure and the minimization of correlations between components. This is precisely what is now possible because the mathematical understanding and proposed workflow enabled the design of mother media through simulation, as shown in this study. We showed that the development of a dedicated mother media set further improves the interpretability of the results based on correlations. In the culture experiment case study, 12% of all combinations had |r|> 0.7, whereas simulations under the same conditions showed only 0.05% of all combinations among medium components. Additionally, it is important to focus on increasing the number of mother media rather than decreasing the number of components to be screened to reduce correlations among components as much as possible. Clearly, increasing the number of mother media increases complexity, which leads to decreased work efficiency and increased costs; hence, it is important to define appropriate criteria. For example, a criterion such as “no more than one highly correlated combination per component” could be considered.
In terms of practicality, the number of culture experimental conditions in one operation varies depending on each experiment. D-optimal design allows researchers to determine the number of experimental conditions to be cultured optionally. In the experiment in this study, we performed pipetting manually for blending media, but it can be automated using a liquid handling system. We are currently working on the automation of pipette manipulation. We plan to provide code for automation that can be incorporated into the workflow presented in this study.
To summarize, we outlined the challenges to progress in media blending. This meant reassessing media blending from a mathematical perspective. Finally, we presented a systematic workflow for screening medium components throughout media blending. The workflow can be applied to any case, including the case using commercially available CDM. Additionally, the mathematical process throughout the entire step made it possible to design appropriate mother media sets backward. The workflow and simulation scripts presented in this study are publicly accessible. We look forward to the results of this study leading to further media blending developments in the future.
Supplementary information
Below is the link to the electronic supplementary material.
(PDF 262 KB)
(PDF 966 KB)
Acknowledgements
The authors extend their gratitude to Akari Noma for her helpful advice. HK also thank Yasuhiro Mizutani, Junko Iida, Hiroaki Nakanishi, Daisuke Hiramaru, Kaori Sugimura, Taichi Tomono, Tomoki Okubo, Akira Noda, Yusuke Nakai, and Yusuke Tamai for their useful advice and discussions. The authors thank Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript. This study represents a portion of a dissertation submitted by Hirotaka Kuroda to Osaka University in partial fulfillment of the requirements for his Ph.D.
Author contributions
HK conceptualized and designed the study. HK and KS conducted the experiments and data analysis. HK wrote the manuscript and code. NYA and TO acquired funding, and supervised the project. All authors read and approved the manuscript.
Funding
Open Access funding provided by The University of Osaka. This work was supported by the Japan Agency for Medical Research and Development (AMED) under Grant Numbers JP18ae0101066 and JP21ae0121021; and the Japan Society for the Promotion of Science (JSPS) under KAKENHI Grant Number JP22H00276.
Data availability
The code for experimental design in media blending described in Fig. 2e is publicly available from the ‘Workflow_blending_media’ folder in the GitHub repository (https://github.com/HK-bio/medium_study_repo/tree/main/paper_202504). The code for regression modeling and evaluation described in Fig. 3a are publicly available from the ‘Modeling’ folder in the same repository. The code for the simulation of mother media set is available from the ‘Simulation_of_mother_media’ folder in the same repository. The raw data of media analysis in Fig. 2a is not publicly available due to the risk of reverse engineering causing a loss of profit for the companies developing cell culture media.
Declarations
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent for publication
All authors have read and approved the manuscript for submission and publication.
Conflict of interest
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Biedermann AM, Gengaro IR, Rodriguez-Aponte SA, Love KR, Love JC (2022) Modular development enables rapid design of media for alternative hosts. Biotechnol Bioeng 119:59–71. 10.1002/bit.27947 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Breiman L (2001) Random forests. Mach Learn 45:5–32. 10.1023/a:1010933404324 [Google Scholar]
- Capella Roca B, Lao N, Barron N, Doolan P, Clynes M (2019) An arginase-based system for selection of transfected CHO cells without the use of toxic chemicals. J Biol Chem 294:18756–18768. 10.1074/jbc.RA119.011162 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chaturvedi K, Sun SY, O’Brien T, Liu YJ, Brooks JW (2014) Comparison of the behavior of CHO cells during cultivation in 24-square deep well microplates and conventional shake flask systems. Biotechnol Rep (Amst) 1–2:22–26. 10.1016/j.btre.2014.04.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Combe M, Sokolenko S (2021) Quantifying the impact of cell culture media on CHO cell growth and protein production. Biotechnol Adv 50:107761. 10.1016/j.biotechadv.2021.107761 [DOI] [PubMed] [Google Scholar]
- Cordova LT, Dahodwala H, Elliott KS, Baik J, Odenewelder DC, Nmagu D, Skelton BA, Uy L, Klaubert SR, Synoground BF, Chitwood DG, Dhara VG, Naik HM, Morris CS, Yoon S, Betenbaugh M, Coffman J, Swartzwelder F, Gillmeister MP, Harcum SW, Lee KH (2023) Generation of reference cell lines, media, and a process platform for CHO cell biomanufacturing. Biotechnol Bioeng 120:715–725. 10.1002/bit.28290 [DOI] [PubMed] [Google Scholar]
- Duarte TM, Carinhas N, Barreiro LC, Carrondo MJT, Alves PM, Teixeira AP (2014) Metabolic responses of CHO cells to limitation of key amino acids. Biotechnol Bioeng 111:2095–2106. 10.1002/bit.25266 [DOI] [PubMed] [Google Scholar]
- Galbraith SC, Bhatia H, Liu H, Yoon S (2018) Media formulation optimization: current and future opportunities. Curr Opin Chem Eng 22:42–47. 10.1016/j.coche.2018.08.004 [Google Scholar]
- Gangwar N, Balraj K, Rathore AS (2024) Explainable AI for CHO cell culture media optimization and prediction of critical quality attribute. Appl Microbiol Biotechnol 108:308. 10.1007/s00253-024-13147-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ham RG (1965) Clonal growth of mammalian cells in a chemically defined, synthetic medium. Proc Natl Acad Sci U S A 53:288–293. 10.1073/pnas.53.2.288 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hashizume T, Ying B-W (2024) Challenges in developing cell culture media using machine learning. Biotechnol Adv 70:108293. 10.1016/j.biotechadv.2023.108293 [DOI] [PubMed] [Google Scholar]
- Hashizume T, Ozawa Y, Ying B-W (2023) Employing active learning in the optimization of culture medium for mammalian cells. NPJ Syst Biol Appl 9:20. 10.1038/s41540-023-00284-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hefzi H, Ang KS, Hanscho M, Bordbar A, Ruckerbauer D, Lakshmanan M, Orellana CA, Baycin-Hizal D, Huang Y, Ley D, Martinez VS, Kyriakopoulos S, Jiménez NE, Zielinski DC, Quek L-E, Wulff T, Arnsdorf J, Li S, Lee JS, Paglia G, Loira N, Spahn PN, Pedersen LE, Gutierrez JM, King ZA, Lund AM, Nagarajan H, Thomas A, Abdel-Haleem AM, Zanghellini J, Kildegaard HF, Voldborg BG, Gerdtzen ZP, Betenbaugh MJ, Palsson BO, Andersen MR, Nielsen LK, Borth N, Lee D-Y, Lewis NE (2016) A Consensus Genome-scale Reconstruction of Chinese Hamster Ovary Cell Metabolism. Cell Syst 3:434-443.e8. 10.1016/j.cels.2016.10.020 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Helleckes LM, Wirnsperger C, Polak J, Guillén-Gosálbez G, Butté A, von Stosch M (2024) Novel calibration design improves knowledge transfer across products for the characterization of pharmaceutical bioprocesses. Biotechnol J 19:e2400080. 10.1002/biot.202400080 [DOI] [PubMed] [Google Scholar]
- Hong JK, Choi D-H, Park S-Y, Silberberg YR, Shozui F, Nakamura E, Kayahara T, Lee D-Y (2022) Data-driven and model-guided systematic framework for media development in CHO cell culture. Metab Eng 73:114–123. 10.1016/j.ymben.2022.07.003 [DOI] [PubMed] [Google Scholar]
- Jordan M, Voisard D, Berthoud A, Tercier L, Kleuser B, Baer G, Broly H (2013) Cell culture medium improvement by rigorous shuffling of components using media blending. Cytotechnology 65:31–40. 10.1007/s10616-012-9462-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaneko H (2022) Cross-validated permutation feature importance considering correlation between features. Anal Sci Adv 3:278–287. 10.1002/ansa.202200018 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaneyoshi K, Yamano-Adachi N, Koga Y, Uchiyama K, Omasa T (2019) Analysis of the immunoglobulin G (IgG) secretion efficiency in recombinant Chinese hamster ovary (CHO) cells by using Citrine-fusion IgG. Cytotechnology 71(1):193–207. 10.1007/s10616-018-0276-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Landauer K (2014) Designing media for animal cell culture: CHO cells, the industrial standard. Methods Mol Biol 1104:89–103. 10.1007/978-1-62703-733-4_7 [DOI] [PubMed] [Google Scholar]
- Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. Neural Inf Process Syst 30:4765–4774 [Google Scholar]
- Martins B, Bister A, Dohmen RGJ, Gouveia MA, Hueber R, Melzener L, Messmer T, Papadopoulos J, Pimenta J, Raina D, Schaeken L, Shirley S, Bouchet BP, Flack JE (2024) Advances and challenges in cell biology for cultured meat. Annu Rev Anim Biosci 12:345–368. 10.1146/annurev-animal-021022-055132 [DOI] [PubMed] [Google Scholar]
- Moore GE, Ito E, Ulrich K, Sandberg AA (1966) Culture of human leukemia cells. Cancer 19:713–723. 10.1002/1097-0142(196605)19:5%3c713::aid-cncr2820190518%3e3.0.co;2-y [DOI] [PubMed] [Google Scholar]
- Narayanan H, von Stosch M, Feidl F, Sokolov M, Morbidelli M, Butté A (2023) Hybrid modeling for biopharmaceutical processes: advantages, opportunities, and implementation. Front Chem Eng. 10.3389/fceng.2023.1157889 [Google Scholar]
- Nicolae A, Wahrheit J, Bahnemann J, Zeng A-P, Heinzle E (2014) Non-stationary 13c metabolic flux analysis of Chinese hamster ovary cells in batch culture using extracellular labeling highlights metabolic reversibility and compartmentation. BMC Syst Biol 8:50. 10.1186/1752-0509-8-50 [DOI] [PMC free article] [PubMed] [Google Scholar]
- O’Flaherty R, Bergin A, Flampouri E, Mota LM, Obaidi I, Quigley A, Xie Y, Butler M (2020) Mammalian cell culture for production of recombinant proteins: a review of the critical steps in their biomanufacturing. Biotechnol Adv 43:107552. 10.1016/j.biotechadv.2020.107552 [DOI] [PubMed] [Google Scholar]
- Okamura K, Badr S, Murakami S, Sugiyama H (2022) Hybrid modeling of CHO cell cultivation in monoclonal antibody production with an impurity generation module. Ind Eng Chem Res 61:14898–14909. 10.1021/acs.iecr.2c00736 [Google Scholar]
- Ozawa H, Matsumoto T, Nakagawa M (2023) Culturing human pluripotent stem cells for regenerative medicine. Expert Opin Biol Ther 23:479–489. 10.1080/14712598.2023.2225701 [DOI] [PubMed] [Google Scholar]
- Pandey U, Madhugiri I, Gadgil C, Gadgil M (2024) Leveraging machine learning to dissect role of combinations of amino acids in modulating the effect of zinc on mammalian cell growth. Biotechnol Prog 40:e3436. 10.1002/btpr.3436 [DOI] [PubMed] [Google Scholar]
- Pang KT, Hong YF, Shozui F, Furomitsu S, Myint M, Ho YS, Silberberg YR, Walsh I, Lakshmanan M (2024) Genome-scale modeling of CHO cells unravel the critical role of asparagine in cell culture feed media. Biotechnol J 19:e202400072. 10.1002/biot.202400072 [DOI] [PubMed] [Google Scholar]
- Park S-Y, Choi D-H, Song J, Park U, Cho H, Hong BH, Silberberg YR, Lee D-Y (2023) Debottlenecking and reformulating feed media for improved CHO cell growth and titer by data-driven and model-guided analyses. Biotechnol J 18:e2300126. 10.1002/biot.202300126 [DOI] [PubMed] [Google Scholar]
- Park S-Y, Choi D-H, Song J, Lakshmanan M, Richelle A, Yoon S, Kontoravdi C, Lewis NE, Lee D-Y (2024a) Driving towards digital biomanufacturing by CHO genome-scale models. Trends Biotechnol 42:1192–1203. 10.1016/j.tibtech.2024.03.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Park S-Y, Song J, Choi D-H, Park U, Cho H, Hong BH, Silberberg YR, Lee D-Y (2024b) Exploring metabolic effects of dipeptide feed media on CHO cell cultures by in silico model-guided flux analysis. Appl Microbiol Biotechnol 108:123. 10.1007/s00253-023-12997-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Plackett RL, Burman JP (1946) The design of optimum multifactorial experiments. Biometrika 33:305–325. 10.1093/biomet/33.4.305 [Google Scholar]
- Reinhart D, Damjanovic L, Kaisermayer C, Kunert R (2015) Benchmarking of commercially available CHO cell culture media for antibody production. Appl Microbiol Biotechnol 99:4645–4657. 10.1007/s00253-015-6514-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ritacco FV, Wu Y, Khetan A (2018) Cell culture media for recombinant protein expression in Chinese hamster ovary (CHO) cells: history, key components, and optimization strategies. Biotechnol Prog 34:1407–1426. 10.1002/btpr.2706 [DOI] [PubMed] [Google Scholar]
- Rouiller Y, Périlleux A, Collet N, Jordan M, Stettler M, Broly H (2013) A high-throughput media design approach for high performance mammalian fed-batch cultures. Mabs 5:501–511. 10.4161/mabs.23942 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schnellbaecher A, Binder D, Bellmaine S, Zimmer A (2019) Vitamins in cell culture media: stability and stabilization strategies. Biotechnol Bioeng 116:1537–1555. 10.1002/bit.26942 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shibafuji Y, Nagao N, Yohda M (2023) Cystine and tyrosine feed reduces oxidative and ER stress in CHO cells. Biotechnol J 18:e2200638. 10.1002/biot.202200638 [DOI] [PubMed] [Google Scholar]
- Sukwattananipaat P, Kuroda H, Yamano-Adachi N, Omasa T (2024) Metabolomic characterization of monoclonal antibody-producing Chinese hamster lung (CHL)-YN cells in glucose-controlled serum-free fed-batch operation. Biotechnol Bioeng. 10.1002/bit.28777 [DOI] [PubMed] [Google Scholar]
- Synoground BF, McGraw CE, Elliott KS, Leuze C, Roth JR, Harcum SW, Sandoval NR (2021) Transient ammonia stress on Chinese hamster ovary (CHO) cells yield alterations to alanine metabolism and IgG glycosylation profiles. Biotechnol J 16:e2100098. 10.1002/biot.202100098 [DOI] [PubMed] [Google Scholar]
- Tai- ST, Chiou Y, Konishi M (2021). Machine Learning Modeling of the Effects of Media Formulated with Various Yeast Extracts on Heterologous Protein Production in Escherichia Coli. 10.1002/mbo3.1214 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Walsh I, Fishman D, Garcia-Gasulla D, Titma T, Pollastri G, ELIXIR Machine Learning Focus Group, Harrow J, Psomopoulos FE, Tosatto SCE (2021) Dome: recommendations for supervised machine learning validation in biology. Nat Methods 18:1122–1127. 10.1038/s41592-021-01205-4 [DOI] [PubMed] [Google Scholar]
- Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst 58:109–130. 10.1016/s0169-7439(01)00155-1 [Google Scholar]
- Wurm FM (2004) Production of recombinant protein therapeutics in cultivated mammalian cells. Nat Biotechnol 22:1393–1398. 10.1038/nbt1026 [DOI] [PubMed] [Google Scholar]
- Xu J, Rehmann MS, Xu X, Huang C, Tian J, Qian N-X, Li ZJ (2018) Improving titer while maintaining quality of final formulated drug substance via optimization of CHO cell culture conditions in low-iron chemically defined media. Mabs 10:488–499. 10.1080/19420862.2018.1433978 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yamano-Adachi N, Arishima R, Puriwat S, Omasa T (2020) Establishment of fast-growing serum-free immortalised cells from Chinese hamster lung tissues for biopharmaceutical production. Sci Rep 10:17612. 10.1038/s41598-020-74735-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yasui R, Sekine K, Taniguchi H (2021) Clever experimental designs: shortcuts for better iPSC differentiation. Cells 10(12):3540. 10.3390/cells10123540 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yeo HC, Park S-Y, Tan T, Ng SK, Lakshmanan M, Lee D-Y (2022) Combined multivariate statistical and flux balance analyses uncover media bottlenecks to the growth and productivity of Chinese hamster ovary cell cultures. Biotechnol Bioeng 119:1740–1754. 10.1002/bit.28104 [DOI] [PubMed] [Google Scholar]
- Ying X (2019) An overview of overfitting and its solutions. J Phys Conf Ser 1168:022022. 10.1088/1742-6596/1168/2/022022 [Google Scholar]
- Zhang W, Liu X, Tang H, Zhang X, Zhou Y, Fan L, Wang H, Tan W-S, Zhao L (2020) Investigation into the impact of tyrosine on the product formation and quality attributes of mAbs in rCHO cell cultures. Appl Microbiol Biotechnol 104:6953–6966. 10.1007/s00253-020-10744-3 [DOI] [PubMed] [Google Scholar]
- Zhou T, Reji R, Kairon RS, Chiam KH (2023) A review of algorithmic approaches for cell culture media optimization. Front Bioeng Biotechnol 11:1195294. 10.3389/fbioe.2023.1195294 [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
(PDF 262 KB)
(PDF 966 KB)
Data Availability Statement
The code for experimental design in media blending described in Fig. 2e is publicly available from the ‘Workflow_blending_media’ folder in the GitHub repository (https://github.com/HK-bio/medium_study_repo/tree/main/paper_202504). The code for regression modeling and evaluation described in Fig. 3a are publicly available from the ‘Modeling’ folder in the same repository. The code for the simulation of mother media set is available from the ‘Simulation_of_mother_media’ folder in the same repository. The raw data of media analysis in Fig. 2a is not publicly available due to the risk of reverse engineering causing a loss of profit for the companies developing cell culture media.












