Abstract
Medium optimization is a crucial step during cell culture for biopharmaceutics and regenerative medicine; however, this step remains challenging, as both media and cells are highly complex systems. Here, we addressed this issue by employing active learning. Specifically, we introduced machine learning to cell culture experiments to optimize culture medium. The cell line HeLa-S3 and the gradient-boosting decision tree algorithm were used to find optimized media as pilot studies. To acquire the training data, cell culture was performed in a large variety of medium combinations. The cellular NAD(P)H abundance, represented as A450, was used to indicate the goodness of culture media. In active learning, regular and time-saving modes were developed using culture data at 168 h and 96 h, respectively. Both modes successfully fine-tuned 29 components to generate a medium for improved cell culture. Intriguingly, the two modes provided different predictions for the concentrations of vitamins and amino acids, and a significant decrease was commonly predicted for fetal bovine serum (FBS) compared to the commercial medium. In addition, active learning-assisted medium optimization significantly increased the cellular concentration of NAD(P)H, an active chemical with a constant abundance in living cells. Our study demonstrated the efficiency and practicality of active learning for medium optimization and provided valuable information for employing machine learning technology in cell biology experiments.
Subject terms: Systems analysis, Cell biology, Computer modelling
Introduction
It is increasingly important to develop approaches that can efficiently optimize culture medium, as developing culture medium for mammalian cells is essential in the medical and biopharmaceutical fields1. In addition to developing cell lines2–4, culture media have been intensively studied to improve the performance of developed cell lines5. The composition of the culture medium, e.g., carbon sources, amino acids, fatty acids, vitamins, trace elements, and salts6, usually must be optimized for cell growth and production7–9. However, it is challenging to optimize medium because the influence of medium components on cellular metabolism is complex10. The conventional one-factor-at-a-time (OFAT) method fine-tunes the medium components individually11, but this approach is time-consuming and inefficient. The statistical method design of experiments (DOE) is efficient when fewer than 10 medium components must be adjusted12. Response surface methodology (RSM)13,14 uses a quadratic polynomial approximation, which may be too simple to represent the comprehensive interaction between the medium and cell15. Machine learning (ML) has been used to overcome these limitations16.
ML has generally been used to develop predictive models based on the relationships among features of a given dataset. Its workflow commonly involves processing input data, training the underlying model, and predicting new data17. In recent years, ML has been increasingly applied in biological studies18, particularly for analyzing large or highly complex datasets16. As a representative example, the cell culture medium is a highly structured dataset that commonly comprises hundreds of components as variable features. Optimizing medium components is essential for performing cell cultures, which are widely used in the food industry, pharmaceutical development, medical therapy, etc.12. ML has been tested for medium development19,20 and has shown higher performance than DOE and RSM, the commonly used approaches15,21.
The efficiency of medium optimization should depend on the prediction accuracy of the ML model. Active learning has been proposed as a method to improve prediction accuracy with a small dataset by allowing ML models to select data for training22,23. This approach has been practical in drug discovery24–26 and has successfully been used to optimize the buffer composition for protein biosynthesis in a cell-free system27. However, applying active learning to optimize media for mammalian cell culture remains under investigation. For instance, in the past 20 years, 37% of studies on the development of medium for CHO cells considered only one additive, 37% used OFAT, 15% used DOE, and none used machine learning or active learning28. Researchers that optimized the medium for other cell lines used ML algorithms without active learning19,29. Whether active learning can be applied to optimize the culture medium of mammalian cells is an intriguing topic.
In the present study, active learning combining explanatory ML with experimental validation was used to optimize the medium composition to improve cell culture28,30. Although all medium components influence cell growth and production, when considering the contributions of medium components to cell culture, researchers have mainly focused on amino acids, which are critical for cell viability, growth, and bioproduction31,32. The other components remain largely under investigation. To address this issue, the present study employed a gradient-boosting decision tree (GBDT), a white-box ML algorithm, instead of the widely used black-box ML algorithms, e.g., neural networks (NNs). Owing to its high interpretability, GBDT has been broadly used in the fields of medicine33–35, pharmacy36, ecology37, and so on. Our previous study used GBDT to explore the contribution of medium components to bacterial growth and successfully observed the chemical dependence of different components as the survival strategy38. By applying GBDT in active learning to optimize the medium for mammalian cells, we can identify the contribution of individual medium components to cell culture.
Results
Experimental design for acquiring the training data
The cell line HeLa-S3 was used in the study, as it could grow in suspension mode and was thus easy to evaluate. The initial cell concentration was set at 10^4 cells/ml, as lower and higher concentrations led to an extended culturing time (Fig. 1A) and reduced growth rate (Fig. 1B). Quantitative evaluation of the cell culture was achieved through testing multiple methods, i.e., counting the cell particles (Multisizer), cell imaging analysis (BioStudio-T), chemical reaction assay (CCK-8), and cell staining and counting (Haemocytometer). When comparing these methods to the most reliable and commonly used method using Haemocytometer, the methods using Multisizer and CCK-8 were preferred according to the correlation coefficients (Fig. 1C). Considering the time needed for operation and the measurement range of cell concentration (Fig. 1D), the chemical reaction assay using CCK-8, which is based on the cellular NAD(P)H abundance, was selected, in which the cell concentration was evaluated through the absorbance at 450 nm (A450). This method was efficient and convenient for acquiring an extensive dataset for ML, as it could be performed in a high-throughput manner. In addition, the medium components subjected to optimization were determined according to the composition of the commonly used Eagle’s minimum essential medium (EMEM), which comprised approximately 31 components. Except for phenol red and penicillin‒streptomycin, 29 components were used to prepare the medium combinations for active learning (Fig. 2A). The concentration gradients of these components were varied on a logarithmic scale to acquire a broad data variation and were not biased by biological measurements or experimental experience. Through the wide range of chemical concentrations, ML could search the medium combinations that were never tested by conventional medium optimization.
Finally, cell culture in 232 medium combinations was performed, and the temporal changes in cell culture were measured at 24- or 48-h intervals (Fig. 2B). Biological replicates (N = 3–4) were conducted for each sampling point, resulting in thousands of A450 records. Note that any potential manual (personal) bias in preparing the medium combinations had a negligible effect on the cell culture, as the commercially purchased and the lab-made media (EMEM) showed approximately equivalent cell concentration and viability (Fig. 2D).
Regular and time-saving modes of active learning for medium optimization
Active learning was performed to search for the medium combination that improved cell culture. As a regular mode of active learning, A450 of the cell culture at 168 h was used for the training because it was roughly the time point at which saturated cell concentration was reached (Fig. 1A). The GBDT model was used to predict the medium combinations leading to a better cell culture, i.e., higher A450, and the learning loop was started with the dataset comprising 232 medium combinations (Fig. 2C). In each round, 18–19 predicted combinations were subjected to experimental validation. The experimental results were added to the training data. The procedures for model building, medium prediction, experimental validation, and training (Fig. 2C) were repeated four times (Fig. 3A). As a result, both the A450 values of the cell culture and the accuracy of the GBDT models were elevated. The cell culture was significantly improved in round 3 and remained comparable after round 4 (Fig. 3A). We assumed that either the methodology or the cell culture reached its limitation. Note that not all of the media tested in each round of ML were better than EMEM, i.e., 74% and 58% showed better cell culture than EMEM in rounds 3 and 4 (Supplementary Fig. 1A). The best media were always better than EMEM after round 3 (Supplementary Fig. 1B). The prediction accuracy improved gradually as the number of rounds increased (Fig. 3B and Supplementary Fig. 2A). Thus, active learning successfully fine-tuned the medium combination, accompanied by increasing ML accuracy.
Subsequently, whether the active learning loop could be shortened was tested. As the data acquisition step (168 h) was time-consuming, it was investigated whether A450 of the cell culture at an earlier time could be used to predict the cell culture at 168 h. Significant correlations in A450 were detected between the earlier time points (48, 96, and 144 h) and the endpoint at 168 h (Fig. 4A). Although A450 at 144 h showed the best correlation to that at 168 h, the time point of 96 h was selected because the time was effectively shortened. Consequently, A450 of the cell culture at 96 h was employed as a time-saving mode for active learning. The 232 medium combinations associated with the A450 values of the cell culture at 96 h were employed as the initial dataset for active learning. Both the A450 of the cell culture at 96 h and the prediction accuracy were improved (Fig. 4B, C and Supplementary Fig. 2B), as observed in the regular mode. Intriguingly, A450 of the cell culture at 168 h was significantly increased (Fig. 4D), even though ML used the culture data acquired at 96 h. The medium combinations for improved cell culture were successfully achieved when the culture data was used earlier than necessary, which shortened the hundreds of hours needed for medium optimization in the present case (Supplementary Fig. 3). Note that no significant improvement was achieved even if the additional round was performed with the time-saving mode (Supplementary Fig. 4). In addition, the correlation of A450 at 96 h to that at 168 h was higher in active learning (Supplementary Fig. 5). The changing ratio of A450 between 96 and 168 h followed the high ratio, despite the initial bimodal distribution of high and low changing ratios observed for A450 between 96 and 168 h (Supplementary Fig. 5). This result indicated that the initial data distribution slightly affected the ML. 
These results demonstrated that the time-saving mode was practical for ML-assisted medium optimization to improve cell culture.
Contribution and composition of optimized medium components
The contribution of medium components to cell culture was estimated using GBDT. All datasets acquired through active learning, i.e., 302 and 403 medium combinations in the regular and time-saving modes, respectively, were used (Supplementary Figs. 6 and 7). The importance of each component in the two modes was estimated. The top ten components that mainly contributed to cell culture partially overlapped (Fig. 5A, B). This result indicated that both the regular and time-saving modes fine-tuned similar components, e.g., metal salts and FBS, for improved cell culture. This finding partially explained why the medium optimization at 96 h improved cell culture at 168 h. Intriguingly, NaCl and CaCl2, but not FBS, were the primary components that determined the cell culture in the regular and time-saving modes, respectively. Although FBS usually contains calcium ions39 and 1–3 mM CaCl2 is generally supplied in cell culture6, adjusting the concentration of calcium ions is essential for cell growth because either excess or deficient calcium ions induce cell apoptosis40–43. In addition, high osmotic pressure, which is caused by the high concentration of NaCl in the medium, arrests cell growth44. The results indicated that the cellular permeability, which is regulated by NaCl and CaCl2, might present the highest priority in cell culture rather than the growth factors provided by FBS.
The compositions of the best ten media predicted with the regular mode, which optimized A450 at 168 h, and the time-saving mode, which optimized A450 at 96 h, were compared. The concentrations of most components in the media predicted by the two modes were comparable to those in EMEM (Supplementary Fig. 8). Six components showed significant differentiation in concentrations of either amino acids or vitamins between the two modes (Fig. 5C). This indicates that the regular and time-saving modes specifically optimized amino acids and vitamins, respectively. Interestingly, the concentration of FBS in all 20 media predicted by the two modes was one order of magnitude lower than that in the commonly used EMEM6, although the compositions of other components were varied in these media (Fig. 5D). FBS contains a variety of factors essential for cell growth, such as trace elements (vitamins and minerals), hormones, free radical scavengers and mitogenic growth factors45, which are absent in the basal media46. Generally, 10–20% FBS in the media is suitable for cell culture, as reported in enterocytes47, stem cells48, and hybridomas49. This reduced amount of FBS should be a preferable trend, considering the risk of contamination, the cost, and the batch-to-batch variability in cell production50,51. Arginine in the regular mode and choline, pantothenic acid, niacinamide, and pyridoxal in the time-saving mode were significantly different between the regular and time-saving modes (Fig. 5D). This indicates that in regular mode, the amino acids necessary for cell survival in the late phase of culture were adjusted, while in time-saving mode, the vitamins that protect cells from oxidative stress in the early phase of culture were adjusted. In conclusion, different chemical components, with vitamins in the early phase and amino acids in the late phase, may affect cell culture.
Comparison of the optimized media to the original medium
The optimized media showed higher A450 than that of the original medium EMEM, regardless of the regular and time-saving modes. The best ten medium combinations (i.e., those that showed the highest A450) predicted with the regular and time-saving modes were prepared to experimentally verify the cell culture at 168 h. The predicted media showed better performance, i.e., higher A450, than that of the original medium EMEM (Fig. 6A). Thus, both modes resulted in successful medium optimization, although a better performance was obtained by the media predicted with the regular mode than those predicted with the time-saving mode.
In addition, a direct comparison of the optimized medium to the original medium was performed. The best of the 20 predicted media was selected for comparison with the commercially purchased EMEM. The cell culture was performed in a 1-mL volume as a scale-up compared to the culture volume used in active learning. The selected medium showed higher A450 (Fig. 6B), as well as increased A450 per cell (Fig. 6C), but failed to improve the viable cell concentration (Supplementary Fig. 9). Active learning of medium optimization likely increased the NAD(P)H abundance per cell rather than the number of cells contributing to the total abundance of NAD(P)H. This finding made us reconsider whether the commonly used chemical assay for cell culture solely reflects the cell concentration. The methodology used for experimental data collection plays a crucial role in the final output of the improved parameters.
Discussion
Cell culture is a time-consuming process in medium development. A much longer time is usually necessary for mammalian cell culture than biochemical reactions using cell extracts and bacterial cultures. Medium composition is often quickly optimized by statistically reducing the experimental test conditions52. An alternative approach to acquire the experimental dataset for ML-assisted medium optimization is shortening the data acquisition time for quick optimization. The present study was a challenging trial to reduce the experimental time. The information on the early phase culture successfully predicted the output of the late phase culture (Fig. 4). This indicates that time-saving active learning is practical in accelerating medium development. Nevertheless, further verification is necessary to ensure that the current approach is practical in optimizing culture media for biomedical applications and biopharmaceutical production, which is highly essential53.
In the present study, active learning of medium optimization successfully increased cellular NAD(P)H abundance but not the concentration of cells, which was unexpected. NAD(P)H abundance is proportional to the number of viable cells54 based on the assumption that NAD(P)H abundance per viable cell is constant. However, our results showed that the cellular NAD(P)H abundance could be increased by optimizing the medium (Fig. 6). We assume that the optimized media was beneficial for metabolism and increased NAD(P)H abundance in cells. The upper limit of cellular NAD(P)H abundance might be why active learning failed to increase A450 after round 4 (Fig. 3). In an alternative view, the cellular capacity of NAD(P)H could be addressed by active learning. This finding provided us with an ideal approach to search for the biological limitation of living cells, e.g., the highest growth rate, through active learning. Additionally, the results demonstrated that the ML-combined medium optimization was highly sensitive to the biological parameter, e.g., cellular activity or amount, that was chosen as the variable in ML.
High priorities of vitamins and amino acids were predicted by the two modes when determining the cell culture. As the regular and time-saving modes used the culture data at 168 h and 96 h, respectively, the diverse components predicted by the two modes might reflect the cellular physiological differentiation in the late and early phases. In the time-saving mode, five of the ten determinant components were vitamins (Fig. 5), which are closely related to the cellular NAD(P)H abundance mediated by the stress response component reactive oxygen species (ROS). The choline pathway synthesizes serine and glycine, which are necessary for antioxidant glutathione synthesis55 and were absent in the media. Pyridoxal inhibits the formation of superoxide radicals56. Pantothenic acid prevents ROS-induced apoptosis57. Niacinamide is the amide of niacin, a precursor of NAD+ and NADP+58. The high priority of these vitamins indicated that they play a crucial role in deciding the early phase of cell culture. In the regular mode, the determinants were tyrosine, arginine, and glutamine. The abundance of amino acids was reported to be related to cell viability59, as an excess amount was reported to accumulate toxic metabolites60. Metabolism of the three amino acids was found to produce ammonia, 4-hydroxyphenylpyruvate, and dimethylarginine61, which are toxic to cells62,63 and cause growth inhibition64 or cell apoptosis65,66. The high priority of these amino acids strongly suggested that an adequate amount was essential to maintain balanced metabolism for cell viability in the late phase of cell culture. Taken together, we assumed that the amino acids promoted cell growth in the late phase and that the vitamins protected the cells from oxidative stress in the early phase. Further transcriptome and metabolome analyses are essential to clarify the differentiated requirement of amino acids and vitamins in various cell culture phases.
The study demonstrated that active learning was effective in medium optimization and that using time-saving mode with data from the early culture phase was practical. Nevertheless, whether active learning can be used for media development for other cell lines or substance production remains to be examined. As a quick reference, once the media optimized for the HeLa cell were tested with another cell line CHO, an increase in cellular NAD(P)H abundance was observed (Supplementary Fig. 10). Although the optimized media were useful in culturing different cells, media optimization using the target cell line is preferred to obtain the best performance of cell culture. Additional training data are needed for constructing the ML models. Living cells fluctuate even under stable conditions67,68, and their working principles remain largely unclear. ML-assisted medium optimization is assumed to be limited within the cell line and the purpose of culture for growth or production. Employing additional parameters related to cellular physiology, e.g., cell size and gene expression, may lead to a better and more efficient ML model for medium optimization.
In addition to biological restrictions, additional issues should be considered to improve the accuracy of ML with fewer experiments. First, uncertainty sampling is commonly employed to improve ML models in active learning23. In the present study, uncertainty sampling was absent in the datasets, as only the medium combinations for better cell culture were selected for experimental validation in active learning. Therefore, additional experimental data of worse cell culture, such as uncertainty sampling, for training was necessary to improve the accuracy of ML models to further develop the culture medium. Second, the overfitting or underfitting issues, which occurred once the ML models were too complex or too simple17, were thought to influence the performance of ML. The success in ML-assisted medium optimization revealed that the ML model built in the present study was neither overfitting nor underfitting. Here, the ML model was built using existing Python libraries with adjusted hyperparameters. Although most ML algorithms were developed outside the biological field and used for nonbiological purposes16, our results demonstrated that it was possible to construct ML models accurately processing biological data using existing libraries without requiring highly professional computational techniques. The simple application of ML to medium optimization was achieved due to a considerable effort to fine-tune the experimental operations and obtain biological data that fit ML technology. The present study provided an example of how to apply ML to biological studies. Although many issues remain, the successful trial provides valuable knowledge to further develop ML-assisted medium optimization for biomedical applications and biopharmaceutical production.
Materials and methods
Cell line and culture
The commonly used mammalian cell line, HeLa-S3, was obtained from the RIKEN Cell Bank (Tsukuba, Japan). Cell culture was performed at 37 °C in a CO2 incubator (E-50, As One) supplied with 5% CO269. Cells were cultured in multiwell plates (Iwaki) with a culture volume of 0.5 or 1 mL in 48- or 24-well microplates, respectively. Multiple wells were used for biological replicates. The cultured cells were detached with PBS (-) (Wako) supplemented with trypsin-EDTA (Sigma). The cells were collected by centrifugation and resuspended in cryopreservation solution (Wako) with trypan blue (Wako). The cells were counted in a haemocytometer (DHC-N01, Nano Entek) with a microscope (ECLIPSE Ts2, Nikon)70. The number of viable cells was evaluated accordingly.
Preparation of cell stocks
Cell stocks for repeated cultures were prepared to reduce experimental errors71. The cells stored in liquid nitrogen were thawed at 37 °C and suspended in 4 ml of Eagle’s minimum essential medium (EMEM, Wako) for medium exchange. Subsequently, the cells were collected and suspended in 10 mL of EMEM supplemented with 1% penicillin‒streptomycin solution (Wako) and 10% FBS (Japan Bio Serum). The cells were cultured in 10 cm dishes (Violamo) for two days at 37 °C in a CO2 incubator and were counted in a haemocytometer, as described above. The cell culture was divided into 1 mL, dispensed into 1.2 ml cryotubes (Biosigma), frozen at −80 °C for 24 h, and finally stored in liquid nitrogen for future use. As a result, dozens of identical cell stocks were prepared.
Preparation of medium combinations
The medium combinations were prepared with 31 commercially available compounds, in which choline chloride and pyridoxal hydrochloride were from Tokyo Chemical Industry, FBS was from Japan Bio Serum, and the remaining compounds were from Wako. The abundance of penicillin‒streptomycin and phenol red were maintained at 1% and 0.03 mM, respectively. The concentrations of FBS were changed in the range of 0.1–10%. The remaining 28 components were changed zero- to 10–100-fold of their concentrations in EMEM. In brief, four to five different concentrations were used for each component, and the changes were roughly on a logarithmic scale, because a broad range of concentration gradients benefited the ML-assisted optimization38. The medium combinations were prepared by mixing the stock solutions of the chemical compounds, which were individually prepared in advance, with highly pure water (Direct-Q UV, Merck). The stock solutions were sterilized by sterile syringe filters (Merck) with hydrophilic PVDF membranes of 0.22 µm pore size, dispensed in a small volume and stored at −20 °C for future use. Note that all medium combinations were prepared immediately before use, and the stock solutions were used once. All medium combinations tested in the present study were summarized in Supplementary Data 1 and Supplementary Data 2 for the regular and time-saving modes, respectively.
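The logarithmic concentration gradients described above can be sketched as follows. This is only an illustration: the helper name `log_scale_levels`, the base concentration of 2.0, and the choice of one-decade steps are assumptions made for the example, and the concentrations actually tested are those listed in Supplementary Data 1 and 2.

```python
import numpy as np


def log_scale_levels(base_conc, max_fold=10.0, n_nonzero=4, include_zero=True):
    """Return candidate concentrations spaced one decade apart around base_conc.

    base_conc is the concentration in the reference medium (hypothetical value
    in this sketch); max_fold is the largest fold change relative to it.
    """
    # Exponents run from log10(max_fold) - (n_nonzero - 1) up to log10(max_fold).
    exponents = np.log10(max_fold) - np.arange(n_nonzero - 1, -1, -1)
    levels = base_conc * 10.0 ** exponents
    if include_zero:
        # Zero is added as its own level because it cannot lie on a log scale.
        levels = np.concatenate(([0.0], levels))
    return levels


# Illustrative component with a reference concentration of 2.0 (arbitrary units).
levels = log_scale_levels(base_conc=2.0, max_fold=10.0, n_nonzero=4)
print(levels)
```

With these settings, the component receives five levels: zero plus four concentrations spanning 0.01-fold to 10-fold of the reference value.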
Cell counting according to single-particle analysis
A particle size analyzer (Multisizer 4, Beckman Coulter) was used to count the number of cells. The cell culture in a 48-well microplate (Iwaki) was suspended in 10 ml of ISOTON II (Beckman Coulter). Every 100–500 µl of the suspended solutions were flowed for particle analysis, according to the manufacturer’s instructions. Particles within the range from 8 to 12 µm in diameter were gated as the cells. The mean value of the biological replicates was calculated as the cell concentration.
Cell counting by imaging analysis
The cells cultured in a 48-well microplate (Iwaki) were imaged with BioStudio-T (Nikon) according to the manufacturer’s instructions. The image of the cell culture was analyzed with the software NIS-Elements (Nikon), and the number of cells was evaluated automatically.
Temporal changes in cell culture evaluated by chemical assay
The cell culture was performed in 200 µl with 96-well microplates (Iwaki), and the culture conditions were described above. As explained elsewhere72, only the wells in the middle of the plate (60 wells) were used for cell culture to prevent evaporation. Multiple microplates of identical cell cultures were prepared for temporal evaluation of the cell culture. These plates were subjected to the chemical assay sequentially at 48, 96, 144, and 168 h. Ten microlitres of CCK-8 (Dojin) was added to the cell culture and incubated at 37 °C for one hour, according to the protocol. Finally, 20 µl of 0.1 M HCl was added to arrest the reaction, and the absorbance at 450 nm was measured with a plate reader (Epoch 2, BioTek). A450 was used as the relative cell concentration for machine learning.
Data processing
Absorbance reads of the chemical assay were exported from the plate reader and processed with Python, in which the mean A450 of biological replicates was calculated using “mean” in the “numpy” library. The actual A450 of the cell culture was calculated by subtracting the mean A450 of the medium. The datasets obtained in the present study are summarized in Supplementary Data 1 and 2.
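The blank subtraction described above can be written in a few lines. The following is a minimal sketch; the function name `corrected_a450` and the absorbance values are made up for illustration and are not taken from the dataset.

```python
import numpy as np


def corrected_a450(culture_reads, blank_reads):
    """Mean A450 of replicate culture wells minus mean A450 of medium-only wells."""
    culture_mean = np.mean(culture_reads)  # biological replicates (N = 3-4)
    blank_mean = np.mean(blank_reads)      # wells containing medium without cells
    return culture_mean - blank_mean


# Hypothetical reads: three culture wells and two medium-only wells.
value = corrected_a450([0.82, 0.78, 0.80], [0.10, 0.12])
print(round(value, 3))
```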
Machine learning
The gradient-boosting decision tree (GBDT) algorithm was used in machine learning (ML), which was performed with Python. The “GradientBoostingRegressor” from the “ensemble” module of the “scikit-learn” library was used to construct the ML model, where the medium components and the A450 of cell culture were employed as the explanatory and the objective variables, respectively. Fivefold cross-validation and grid search were performed to search for hyperparameters, using “GridSearchCV” in the “model_selection” module of the “scikit-learn” library. The hyperparameters were searched for “learning_rate” from 0.001 to 0.5 in increments of 0.005 and “max_depth” from 2 to 5 in increments of 1, with “n_estimators” fixed at 300. The other hyperparameters were left at their default values. In addition, the “feature_importances_” attribute of the GBDT model constructed in the outer fivefold cross-validation was used to calculate the feature importance. Gini feature importance was used, which was calculated by computing the mean squared error (MSE) at each node of the decision tree and then calculating the degree to which the MSE was reduced by partitioning on each feature73,74. Five replicates were performed, and the mean values were used as the final evaluation.
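The model construction described above can be sketched with the same scikit-learn classes. This is an illustrative sketch rather than the authors' script (which is available in the repository listed under Code availability): the data are synthetic, the hyperparameter grid is deliberately coarser than the one in the text so that the example runs quickly, and the scoring metric is an assumption because the text does not state which one was used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the medium table: rows are media, columns are components.
rng = np.random.default_rng(0)
n_media, n_components = 80, 6
X = rng.uniform(0.0, 1.0, size=(n_media, n_components))
# Hypothetical response in which component 0 dominates the A450 value.
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=n_media)

# Coarse grid for speed; the text describes a much finer learning_rate grid.
param_grid = {"learning_rate": [0.01, 0.1], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=300, random_state=0),
    param_grid,
    cv=5,                              # fivefold cross-validation
    scoring="neg_mean_squared_error",  # assumed metric, not stated in the text
)
search.fit(X, y)

# Impurity-based importance of each component, normalized to sum to 1.
importances = search.best_estimator_.feature_importances_
ranking = np.argsort(importances)[::-1]
print(search.best_params_, ranking)
```

In this toy setting the ranking places component 0 first, which mirrors how the importance values were used to identify the components that contributed most to cell culture.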
Active learning for medium optimization
Active learning in either regular or time-saving mode was performed with a supercomputer, the Cygnus system (NEC LX 124Rh-4G). The A450 values obtained at 168 h and 96 h during cell culture were used as objective variables in the regular and time-saving modes, respectively. According to the ML model constructed with the initial training dataset, approximately 10 million candidate medium combinations were obtained by altering the concentrations of the medium components with four to five variations. By inputting the 10 million candidate media into the ML model, the relative cell culture, represented by A450, was predicted. The top 18 or 19 medium combinations of high A450 were selected and subjected to experimental verification. The experimental results were added to the training dataset for the following learning and prediction. Learning, prediction, and validation were performed repeatedly to achieve improved cell culture, that is, as high A450 as possible. In addition, the experimental results were used to evaluate the prediction accuracy of the ML models.
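The loop of learning, prediction, and validation can be sketched as below. Several parts are assumptions made only so that the example is self-contained: candidates are drawn at random from the allowed levels (the text does not specify how the roughly 10 million candidates were generated), `simulated_assay` is a hypothetical stand-in for the wet-lab A450 measurement, and the problem is scaled down to four components and a few thousand candidates.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def propose_media(model, levels_per_component, n_candidates, top_k, rng):
    """Sample candidate media, predict A450, and return the top_k candidates."""
    candidates = np.column_stack(
        [rng.choice(levels, size=n_candidates) for levels in levels_per_component]
    )
    candidates = np.unique(candidates, axis=0)      # drop duplicate recipes
    predicted = model.predict(candidates)
    order = np.argsort(predicted)[::-1][:top_k]     # highest predicted A450 first
    return candidates[order], predicted[order]


def simulated_assay(media, rng):
    """Hypothetical replacement for the experimental measurement of A450."""
    noise = rng.normal(0.0, 0.01, len(media))
    return np.log10(media[:, 0] + 1.0) - 0.05 * media[:, 1] + noise


rng = np.random.default_rng(1)
levels = [np.array([0.0, 0.1, 1.0, 10.0])] * 4      # four levels per component
X_train = np.column_stack([rng.choice(lv, size=40) for lv in levels])
y_train = simulated_assay(X_train, rng)

for round_index in range(2):
    model = GradientBoostingRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    proposed, scores = propose_media(model, levels, 5000, 18, rng)
    measured = simulated_assay(proposed, rng)        # "experimental verification"
    # The validated media join the training set for the next round.
    X_train = np.vstack([X_train, proposed])
    y_train = np.concatenate([y_train, measured])
    print(round_index, float(measured.max()))
```

Because only the highest-scoring candidates are selected, this sketch is a greedy selection strategy; it does not include uncertainty sampling.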
Evaluation of the ML models
Fivefold nested cross-validation was performed to calculate the prediction accuracy of the ML models, in which the hyperparameters in the inner 5-fold cross-validation were adjusted using grid search. The five scores computed in the outer 5-fold cross-validation were used for prediction accuracy. Three metrics were employed to evaluate the prediction accuracy: mean absolute error (MAE), coefficient of determination (R2), and root mean square error (RMSE). MAE and R2 were calculated using “mean_absolute_error” and “r2_score” in the “metrics” module of the “scikit-learn” library. RMSE was calculated as the square root of the MSE, where the MSE was obtained using “mean_squared_error” in the “metrics” module of the “scikit-learn” library and the square root was calculated using “sqrt” in the “numpy” library.
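The nested cross-validation can be sketched as follows. The data are synthetic, the grid is coarse, and the number of estimators is reduced to keep the example fast; these choices, the shuffled `KFold` splitter, and the default scoring of the inner grid search (R2 for a regressor) are assumptions of this sketch rather than details given in the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold


def nested_cv_scores(X, y, param_grid, n_splits=5, seed=0):
    """Outer folds estimate accuracy; inner folds tune hyperparameters."""
    outer = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {"MAE": [], "R2": [], "RMSE": []}
    for train_idx, test_idx in outer.split(X):
        inner = GridSearchCV(
            GradientBoostingRegressor(n_estimators=100, random_state=seed),
            param_grid,
            cv=n_splits,
        )
        inner.fit(X[train_idx], y[train_idx])   # tuning sees training folds only
        pred = inner.predict(X[test_idx])       # uses the refitted best model
        scores["MAE"].append(mean_absolute_error(y[test_idx], pred))
        scores["R2"].append(r2_score(y[test_idx], pred))
        scores["RMSE"].append(np.sqrt(mean_squared_error(y[test_idx], pred)))
    # Average the five outer-fold scores for each metric.
    return {name: float(np.mean(values)) for name, values in scores.items()}


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 5))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=100)
result = nested_cv_scores(X, y, {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]})
print(result)
```

Keeping the hyperparameter search inside each outer training fold prevents the held-out fold from influencing model selection, so the reported scores are not optimistically biased.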
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We thank Dr. Akiyoshi Fukamizu and Dr. Hiroaki Daitoku for their advice and support on the cell culture experiment. This work was supported by the JSPS KAKENHI Grant-in-Aid for Challenging Exploratory Research (21K19815) and partially by Grant-in-Aid for Scientific Research (B) (19H03215).
Author contributions
B.-W.Y. conceived the research; T.H. and Y.O. performed the experiments; T.H. and B.-W.Y. analyzed the data and wrote the paper; and all the authors approved the paper.
Data availability
All data generated or analyzed during this study are included in this published article and its Supplementary Information files (Supplementary Data 1 and 2).
Code availability
Code is available at https://github.com/hashizume711/medium_optimization.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41540-023-00284-7.
References
- 1. Walsh G. Biopharmaceutical benchmarks 2018. Nat. Biotechnol. 2018;36:1136–1145. doi: 10.1038/nbt.4305.
- 2. Tihanyi B, Nyitray L. Recent advances in CHO cell line development for recombinant protein production. Drug Discov. Today Technol. 2020;38:25–34. doi: 10.1016/j.ddtec.2021.02.003.
- 3. Weinguny M, et al. Directed evolution approach to enhance efficiency and speed of outgrowth during single cell subcloning of Chinese Hamster Ovary cells. Comput Struct. Biotechnol. J. 2020;18:1320–1329. doi: 10.1016/j.csbj.2020.05.020.
- 4. Zucchelli S, Patrucco L, Persichetti F, Gustincich S, Cotella D. Engineering translation in mammalian cell factories to increase protein yield: the unexpected use of long non-coding SINEUP RNAs. Comput Struct. Biotechnol. J. 2016;14:404–410. doi: 10.1016/j.csbj.2016.10.004.
- 5. Reinhart D, Damjanovic L, Kaisermayer C, Kunert R. Benchmarking of commercially available CHO cell culture media for antibody production. Appl. Microbiol. Biotechnol. 2015;99:4645–4657. doi: 10.1007/s00253-015-6514-4.
- 6. Ritacco FV, Wu Y, Khetan A. Cell culture media for recombinant protein expression in Chinese hamster ovary (CHO) cells: history, key components, and optimization strategies. Biotechnol. Prog. 2018;34:1407–1426. doi: 10.1002/btpr.2706.
- 7. Lu S, Sun X, Zhang Y. Insight into metabolism of CHO cells at low glucose concentration on the basis of the determination of intracellular metabolites. Process. Biochem. 2005;40:1917–1921. doi: 10.1016/j.procbio.2004.07.004.
- 8. Salim T, Chauhan G, Templeton N, Ling WLW. Using MVDA with stoichiometric balances to optimize amino acid concentrations in chemically defined CHO cell culture medium for improved culture performance. Biotechnol. Bioeng. 2022;119:452–469. doi: 10.1002/bit.27998.
- 9. Takagi M, Hia HC, Jang JH, Yoshida T. Effects of high concentrations of energy sources and metabolites on suspension culture of Chinese hamster ovary cells producing tissue plasminogen activator. J. Biosci. Bioeng. 2001;91:515–521. doi: 10.1016/S1389-1723(01)80283-8.
- 10. Azubuike CC, Edwards MG, Gatehouse AMR, Howard TP. Applying statistical design of experiments to understanding the effect of growth medium components on Cupriavidus necator H16 growth. Appl. Environ. Microbiol. 2020;86:e00705–e00720. doi: 10.1128/AEM.00705-20.
- 11. Gonzalez R, Islas L, Obregon AM, Escalante L, Sanchez S. Gentamicin formation in Micromonospora purpurea: stimulatory effect of ammonium. J. Antibiot. 1995;48:479–483. doi: 10.7164/antibiotics.48.479.
- 12. Singh V, et al. Strategies for fermentation medium optimization: an in-depth review. Front. Microbiol. 2016;7:2087. doi: 10.3389/fmicb.2016.02087.
- 13. Castro PM, Hayter PM, Ison AP, Bull AT. Application of a statistical design to the optimization of culture medium for recombinant interferon-gamma production by Chinese hamster ovary cells. Appl. Microbiol. Biotechnol. 1992;38:84–90. doi: 10.1007/BF00169424.
- 14. Parampalli A, et al. Developement of serum-free media in CHO-DG44 cells using a central composite statistical design. Cytotechnology. 2007;54:57–68. doi: 10.1007/s10616-007-9074-3.
- 15. Singh V, Khan M, Khan S, Tripathi CK. Optimization of actinomycin V production by Streptomyces triostinicus using artificial neural network and genetic algorithm. Appl. Microbiol. Biotechnol. 2009;82:379–385. doi: 10.1007/s00253-008-1828-0.
- 16. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022;23:40–55. doi: 10.1038/s41580-021-00407-0.
- 17. Camacho DM, Collins KM, Powers RK, Costello JC, Collins JJ. Next-generation machine learning for biological networks. Cell. 2018;173:1581–1592. doi: 10.1016/j.cell.2018.05.015.
- 18. Auslander N, Gussow AB, Koonin EV. Incorporating machine learning into established bioinformatics frameworks. Int. J. Mol. Sci. 2021;22:2903. doi: 10.3390/ijms22062903.
- 19. Grzesik P, Warth SC. One-time optimization of advanced T cell culture media using a machine learning pipeline. Front. Bioeng. Biotechnol. 2021;9:614324. doi: 10.3389/fbioe.2021.614324.
- 20. Havel J, Link H, Hofinger M, Franco-Lara E, Weuster-Botz D. Comparison of genetic algorithms for experimental multi-objective optimization on the example of medium design for cyanobacteria. Biotechnol. J. 2006;1:549–555. doi: 10.1002/biot.200500052.
- 21. Cosenza Z, Block DE, Baar K. Optimization of muscle cell culture media using nonlinear design of experiments. Biotechnol. J. 2021;16:e2100228. doi: 10.1002/biot.202100228.
- 22. Cohn DA, Ghahramani Z, Jordan MI. Active learning with statistical models. J. Artif. Intell. Res. 1996;4:129–145.
- 23. Settles, B. Active Learning Literature Survey (University of Wisconsin-Madison Department of Computer Sciences, 2009).
- 24. Reker D, Schneider G. Active-learning strategies in computer-assisted drug discovery. Drug Discov. Today. 2015;20:458–465. doi: 10.1016/j.drudis.2014.12.004.
- 25. Osmanbeyoglu HU, Wehner JA, Carbonell JG, Ganapathiraju MK. Active machine learning for transmembrane helix prediction. BMC Bioinformatics. 2010;11:S58. doi: 10.1186/1471-2105-11-S1-S58.
- 26. Naik AW, Kangas JD, Sullivan DP, Murphy RF. Active machine learning-driven experimentation to determine compound effects on protein patterns. Elife. 2016;5:e10047. doi: 10.7554/eLife.10047.
- 27. Borkowski O, et al. Large scale active-learning-guided exploration for in vitro protein production optimization. Nat. Commun. 2020;11:1872. doi: 10.1038/s41467-020-15798-5.
- 28. Combe M, Sokolenko S. Quantifying the impact of cell culture media on CHO cell growth and protein production. Biotechnol. Adv. 2021;50:107761. doi: 10.1016/j.biotechadv.2021.107761.
- 29. Xu J, et al. Serum-free medium optimization based on trial design and support vector regression. Biomed. Res. Int. 2014;2014:269305. doi: 10.1155/2014/269305.
- 30. Coulet M, Kepp O, Kroemer G, Basmaciogullari S. Metabolic profiling of CHO cells during the production of biotherapeutics. Cells. 2022;11:1929. doi: 10.3390/cells11121929.
- 31. González-Leal IJ, et al. Use of a Plackett-Burman statistical design to determine the effect of selected amino acids on monoclonal antibody production in CHO cells. Biotechnol. Prog. 2011;27:1709–1717. doi: 10.1002/btpr.674.
- 32. Torkashvand F, et al. Designed amino acid feed in improvement of production and quality targets of a therapeutic monoclonal antibody. PLoS ONE. 2015;10:e0140597. doi: 10.1371/journal.pone.0140597.
- 33. Rufo DD, Debelee TG, Ibenthal A, Negera WG. Diagnosis of diabetes mellitus using gradient boosting machine (LightGBM). Diagnostics. 2021;11:1714. doi: 10.3390/diagnostics11091714.
- 34. Dinh A, Miertschin S, Young A, Mohanty SD. A data-driven approach to predicting diabetes and cardiovascular disease with machine learning. BMC Med. Inf. Decis. Mak. 2019;19:211. doi: 10.1186/s12911-019-0918-5.
- 35. Klug M, et al. A gradient boosting machine learning model for predicting early mortality in the emergency department triage: devising a nine-point triage score. J. Gen. Intern. Med. 2020;35:220–227. doi: 10.1007/s11606-019-05512-7.
- 36. Xuan P, et al. Gradient boosting decision tree-based method for predicting interactions between target genes and drugs. Front. Genet. 2019;10:459. doi: 10.3389/fgene.2019.00459.
- 37. De’ath G. Boosted trees for ecological modeling and prediction. Ecology. 2007;88:243–251. doi: 10.1890/0012-9658(2007)88[243:BTFEMA]2.0.CO;2.
- 38. Aida H, Hashizume T, Ashino K, Ying BW. Machine learning-assisted discovery of growth decision elements by relating bacterial population dynamics to environmental diversity. Elife. 2022;11:e76846. doi: 10.7554/eLife.76846.
- 39. Leney-Greene MA, Boddapati AK, Su HC, Cantor JR, Lenardo MJ. Human plasma-like medium improves T lymphocyte activation. iScience. 2020;23:100759. doi: 10.1016/j.isci.2019.100759.
- 40. Rizzuto R, et al. Calcium and apoptosis: facts and hypotheses. Oncogene. 2003;22:8619–8627. doi: 10.1038/sj.onc.1207105.
- 41. Feng H, Guo L, Gao H, Li XA. Deficiency of calcium and magnesium induces apoptosis via scavenger receptor BI. Life Sci. 2011;88:606–612. doi: 10.1016/j.lfs.2011.01.020.
- 42. Turner CP, Connell J, Blackstone K, Ringler SL. Loss of calcium and increased apoptosis within the same neuron. Brain Res. 2007;1128:50–60. doi: 10.1016/j.brainres.2006.10.039.
- 43. Chiesa R, et al. Extracellular calcium deprivation in astrocytes: regulation of mRNA expression and apoptosis. J. Neurochem. 1998;70:1474–1483. doi: 10.1046/j.1471-4159.1998.70041474.x.
- 44. Zhang X, Garcia IF, Baldi L, Hacker DL, Wurm FM. Hyperosmolarity enhances transient recombinant protein yield in Chinese hamster ovary cells. Biotechnol. Lett. 2010;32:1587–1592. doi: 10.1007/s10529-010-0331-8.
- 45. Price PJ. Best practices for media selection for mammalian cells. In Vitro Cell Dev. Biol. Anim. 2017;53:673–681. doi: 10.1007/s11626-017-0186-6.
- 46. Arigony AL, et al. The influence of micronutrients in cell culture: a reflection on viability and genomic stability. Biomed. Res. Int. 2013;2013:597282. doi: 10.1155/2013/597282.
- 47. Bernardini C, et al. Relationship between serum concentration, functional parameters and cell bioenergetics in IPEC-J2 cell line. Histochem. Cell Biol. 2021;156:59–67. doi: 10.1007/s00418-021-01981-2.
- 48. Quan H, Kim SK, Heo SJ, Koak JY, Lee JH. Optimization of growth inducing factors for colony forming and attachment of bone marrow-derived mesenchymal stem cells regarding bioengineering application. J. Adv. Prosthodont. 2014;6:379–386. doi: 10.4047/jap.2014.6.5.379.
- 49. Bashokouh F, Abbasiliasi S, Tan JS. Optimization of cultivation conditions for monoclonal IgM antibody production by M1A2 hybridoma using artificial neural network. Cytotechnology. 2019;71:849–860. doi: 10.1007/s10616-019-00330-5.
- 50. Li W, Fan Z, Lin Y, Wang TY. Serum-free medium for recombinant protein expression in Chinese hamster ovary cells. Front. Bioeng. Biotechnol. 2021;9:646363. doi: 10.3389/fbioe.2021.646363.
- 51. Venkatesan M, et al. Recombinant production of growth factors for application in cell culture. iScience. 2022;25:105054. doi: 10.1016/j.isci.2022.105054.
- 52. Rouiller Y, et al. A high-throughput media design approach for high performance mammalian fed-batch cultures. MAbs. 2013;5:501–511. doi: 10.4161/mabs.23942.
- 53. Love KR, Bagh S, Choi J, Love JC. Microtools for single-cell analysis in biopharmaceutical development and manufacturing. Trends Biotechnol. 2013;31:280–286. doi: 10.1016/j.tibtech.2013.03.001.
- 54. Tominaga H, et al. A water-soluble tetrazolium salt useful for colorimetric cell viability assay. Anal. Commun. 1999;36:47–50. doi: 10.1039/a809656b.
- 55. Depeint F, Bruce WR, Shangari N, Mehta R, O’Brien PJ. Mitochondrial function and toxicity: role of B vitamins on the one-carbon transfer pathways. Chem. Biol. Interact. 2006;163:113–132. doi: 10.1016/j.cbi.2006.05.010.
- 56. Kannan K, Jain SK. Effect of vitamin B6 on oxygen radicals, mitochondrial membrane potential, and lipid peroxidation in H2O2-treated U937 monocytes. Free Radic. Biol. Med. 2004;36:423–428. doi: 10.1016/j.freeradbiomed.2003.09.012.
- 57. Wojtczak L, Slyshenkov VS. Protection by pantothenic acid against apoptosis and cell damage by oxygen free radicals-the role of glutathione. Biofactors. 2003;17:61–73. doi: 10.1002/biof.5520170107.
- 58. Depeint F, Bruce WR, Shangari N, Mehta R, O’Brien PJ. Mitochondrial function and toxicity: role of the B vitamin family on mitochondrial energy metabolism. Chem. Biol. Interact. 2006;163:94–112. doi: 10.1016/j.cbi.2006.04.014.
- 59. Ribeiro da Silva M, Zaborowska I, Carillo S, Bones J. A rapid, simple and sensitive microfluidic chip electrophoresis mass spectrometry method for monitoring amino acids in cell culture media. J. Chromatogr. A. 2021;1651:462336. doi: 10.1016/j.chroma.2021.462336.
- 60. Pereira S, Kildegaard HF, Andersen MR. Impact of CHO metabolism on cell growth and protein production: an overview of toxic and inhibiting metabolites and nutrients. Biotechnol. J. 2018;13:e1700499. doi: 10.1002/biot.201700499.
- 61. Selvarasu S, et al. Combined in silico modeling and metabolomics analysis to characterize fed-batch CHO cell culture. Biotechnol. Bioeng. 2012;109:1415–1429. doi: 10.1002/bit.24445.
- 62. Glacken MW, Fleischaker RJ, Sinskey AJ. Reduction of waste product excretion via nutrient control: possible strategies for maximizing product and cell yields on serum in cultures of mammalian cells. Biotechnol. Bioeng. 1986;28:1376–1389. doi: 10.1002/bit.260280912.
- 63. Chen P, Harcum SW. Effects of elevated ammonium on glycosylation gene expression in CHO cells. Metab. Eng. 2006;8:123–132. doi: 10.1016/j.ymben.2005.10.002.
- 64. Mulukutla BC, Kale J, Kalomeris T, Jacobs M, Hiller GW. Identification and control of novel growth inhibitors in fed-batch cultures of Chinese hamster ovary cells. Biotechnol. Bioeng. 2017;114:1779–1790. doi: 10.1002/bit.26313.
- 65. Jiang DJ, Jia SJ, Dai Z, Li YJ. Asymmetric dimethylarginine induces apoptosis via p38 MAPK/caspase-3-dependent signaling pathway in endothelial cells. J. Mol. Cell Cardiol. 2006;40:529–539. doi: 10.1016/j.yjmcc.2006.01.021.
- 66. Böger RH, et al. An endogenous inhibitor of nitric oxide synthase regulates endothelial adhesiveness for monocytes. J. Am. Coll. Cardiol. 2000;36:2287–2295. doi: 10.1016/S0735-1097(00)01013-5.
- 67. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010;467:167–173. doi: 10.1038/nature09326.
- 68. Hanna J, et al. Direct cell reprogramming is a stochastic process amenable to acceleration. Nature. 2009;462:595–601. doi: 10.1038/nature08592.
- 69. Mathews EH, Stander BA, Joubert AM, Liebenberg L. Tumor cell culture survival following glucose and glutamine deprivation at typical physiological concentrations. Nutrition. 2014;30:218–227. doi: 10.1016/j.nut.2013.07.024.
- 70. Piccinini F, Tesei A, Arienti C, Bevilacqua A. Cell counting and viability assessment of 2D and 3D cell cultures: expected reliability of the trypan blue assay. Biol. Proced. Online. 2017;19:8. doi: 10.1186/s12575-017-0056-3.
- 71. Seth G. Freezing mammalian cells for production of biopharmaceuticals. Methods. 2012;56:424–431. doi: 10.1016/j.ymeth.2011.12.008.
- 72. Kurokawa M, Ying BW. Precise, high-throughput analysis of bacterial growth. J. Vis. Exp. 2017;19:56197. doi: 10.3791/56197.
- 73. Menze BH, et al. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics. 2009;10:213. doi: 10.1186/1471-2105-10-213.
- 74. Nembrini S, Konig IR, Wright MN. The revival of the Gini importance? Bioinformatics. 2018;34:3711–3718. doi: 10.1093/bioinformatics/bty373.