Abstract
Integrating machine learning with high-throughput measurements is essential to drive the next generation of the design-build-test-learn (DBTL) cycle in synthetic biology. Here, we report the use of active learning in combination with metabolomics for optimising production of surfactin, a complex lipopeptide resulting from a non-ribosomal assembly pathway. We designed a media optimisation algorithm that iteratively learns the yield landscape and steers the media composition toward maximal production. The algorithm led to a 160 % yield increase after three DBTL runs as compared to an M9 baseline. Metabolomics data helped to elucidate the biochemistry underpinning the yield improvement and revealed Pareto-like trade-offs in production of other lipopeptides from related pathways. We found positive associations between organic acids and surfactin, suggesting a key role of central carbon metabolism, as well as system-wide anisotropies in how metabolism reacts to shifts in carbon and nitrogen levels. Our framework offers a novel data-driven approach to improve yield of biological products with complex synthesis pathways that are not amenable to traditional yield optimisation strategies.
Keywords: Active learning, Metabolomics, Surfactin, Bayesian optimisation, Metabolic pathways, Visualisation
1. Introduction
Surfactants are compounds that lower the surface tension between two liquids or a liquid and gas. Their steady demand in industrial and domestic applications has motivated searching for alternatives to traditional petroleum-derived compounds, so as to reduce long-term environmental impacts [1]. Recent years have witnessed a growing number of approaches for bio-surfactant production that are environmentally friendly, safe for human health, and biodegradable [2], [3], [4].
Surfactin is a promising bio-surfactant suitable for high temperature conditions required in industrial applications [5], [6], [7], [8], [9], [10]. It is a lipopeptide produced by several strains of the genus Bacillus [11] with a wide range of applications in industry, agriculture, and medicine, such as emulsifiers, dispersants, and biocontrol agents [12], [13], [14], [15]. As an endogenous metabolite, surfactin facilitates cell motility and colonization [16], [17], [18]. When interacting with other bacteria, it functions as a spatially distributed antibiotic that disrupts the membrane of nearby cells, a phenomenon that can also promote the emergence of surfactant-resistant communities [19], [20], [21], [22].
Surfactin biosynthesis is carried out by a non-ribosomal peptide synthetase (NRPS) [8], [23], [24]. Due to its combinatorial assembly mechanism, surfactin production depends on multiple pathways that supply the molecular precursors for assembly in the cytoplasm. Such precursors include branched-chain fatty acids and amino acids [25], [26]. These multiple production routes confer substantial flexibility to the biosynthetic processes and translate into the production of several variants of surfactin [12], especially if a precursor undergoes modifications or if there are errors in the amino acid assembly.
Key challenges for surfactin production are the high variability and low titres observed in liquid medium, which introduce substantial barriers to scaling up production [25], [26]. Current strategies to increase production include genetic engineering and optimisation of fermentation conditions [25], [26]. For example, overexpression of the efflux pump swrC (synonym: yerP) and other pumps has been reported to result in a 0.5–1.5-fold increase in surfactin production [25]. Additionally, overexpression of the Psrf promoter or its substitution with a high-expression promoter resulted in titres between 0.04 and 1.5 g/L [25], while efforts in metabolic engineering have focussed on rewiring Bacillus metabolism to generate surfactin hyperproducers [27]. Several other studies have employed design-of-experiment approaches to maximise surfactin titres by altering the concentrations of the medium components [28], [29]. Components of the media described in these publications include laboratory-grade chemicals and more complex substrates from industrial waste [30], [31], [32], [33], [34]. Although various studies have identified pathways that contribute to surfactin synthesis in B. subtilis [35] and B. velezensis [36], yield improvements in surfactin production have remained elusive thus far.
Here, we developed a machine learning pipeline to increase surfactin yield via iterative improvements to the media composition in a design-build-test-learn (DBTL) loop. Our approach utilizes mass spectrometry to quantify surfactin as well as a suite of background metabolites that provide insights into the metabolic processes that drive the yield increase. We constructed a data-efficient active learning loop whereby Bacillus cultures are grown in different media compositions that are guided by a machine learning predictor of the yield landscape. Active learning is a machine learning paradigm whereby models are trained on iterative batches of measurements [37]; it is particularly useful in applications where comprehensive coverage of the input space is prohibitively expensive, and thus ideally suited for DBTL loops that involve mass spectrometry readouts. Through iterative DBTL rounds of measurement, model training and querying, we were able to improve surfactin yield by 160 % as compared to a baseline M9 medium.
We employ a Bayesian optimisation routine [38], [39], [40], [41], a standard tool for black-box optimisation that is commonly employed for tuning deep learning models [42], [43]. Recently, synthetic biologists have begun to explore its application in various use cases; early examples include synthetic gene design [44], automated optimisation of metabolic engineering tasks [45], [46], [47], [48], media optimisation in cell-free systems [49], and in silico design of genetic control circuits [50]. Various bespoke approaches have been developed for media optimisation in mammalian cell bioproduction [51], [52], [53], and recently several software packages have been developed for the optimisation of gene circuits and metabolic pathways [54]. Our work is a novel application of active learning in combination with metabolomic readouts and thus offers substantial opportunities for data- and cost-efficient optimisation of complex bioprocesses across scales.
2. Results
2.1. Active learning of the surfactin C yield landscape
Our strategy to improve titre focussed on optimisation of culture conditions to enhance production of Surfactin C in the spent media from Bacillus subtilis DSM 3256 using an iterative active learning loop (Fig. 1A). The medium composition variables, specifically the carbon concentration sourced from glucose and the nitrogen concentration derived from ammonium chloride, were adjusted against the background of a baseline M9 medium. While temperature and agitation are pivotal to surfactin production [28], in this study they were kept constant to simplify the experimental design. The active learning cycle tested seven different carbon and nitrogen concentration combinations for Surfactin C production in each iteration. Each iteration can be seen as a different run of the Design, Build, Test, and Learn cycle in synthetic biology (Fig. 1A).
Fig. 1.
Active learning strategy for media optimisation. A) Active learning loop embedded in the Design, Build, Test, Learn (DBTL) cycle. The Build and Test stages encompass the cultivation of bacteria in microplates using machine learning-suggested combinations, followed by metabolite measurements via mass spectrometry. In the Learn stage, a Gaussian process regressor serves as a surrogate model, forecasting the Surfactin C titre landscape together with its prediction uncertainty. The Design phase, meanwhile, focuses on proposing new combinations, aided by the acquisition function. B) Structure of Surfactin C, emphasising the ring amino acids. The sequence of these amino acids is L-Glu1-L-Leu2-D-Leu3-L-Val4-L-Asp5-D-Leu6-L-Leu7. A simplified diagram of the surfactin biosynthetic cluster presents the four genes within the cluster and their corresponding domains: C for condensation; A for adenylation; T for thiolation or peptidyl carrier protein (PCP); E for epimerase; and TE for thioesterase. The initial condensation domain Cs in srfAA incorporates the fatty acid at the start of synthesis. Downstream, the amino acids are added one by one, and the two D-Leucine residues in the ring are introduced by the epimerization domain. C) Layout of the microplate experiment during the initial iteration of active learning. Combination treatments, alongside control and M9 reference treatments, underwent block randomisation, resulting in six blocks that correspond to biological replicates. A boxplot displays the Surfactin C titres achieved in the initial iteration. D) Final Surfactin C landscape following all three iterations. The combination predicted to yield the maximum titre is marked with a cross. A zone exhibiting reduced titre (less than 0.5 relative to the M9 titre, termed the performance cliff) is delineated with a white line. E) A dense grid of carbon/nitrogen combinations was employed to estimate uncertainty levels, expressed as the standard error of the mean, for models updated after iterations 0, 1 and 2. Models are compared pairwise, showing the uncertainty at identical grid points for the model from iteration 2 against those from iterations 0 and 1.
In this study, the Build and Test stages involve cultivating the bacteria in microplates with varying combinations of glucose and ammonium chloride and then measuring the resulting metabolites using mass spectrometry. Among the 22 metabolites measured, the key product Surfactin C (Fig. 1B) and three other lipopeptides (Surfactin B, Surfactin D, and Iturin A) were assessed using a flow injection-mass spectrometry method between iterations (Fig. S1). Although a few colourimetric methods have been developed for surfactin and general lipopeptide quantification [55], [56], [57], [58], [59], we opted for mass spectrometry because it provides a direct readout for the optimisation and can therefore be generalised to compounds for which colourimetric or biosensor methods do not exist. This method was developed specifically for the experiment, given its ability to measure a single sample within 2 min and a full 96-well microplate in approximately 3 h, thus streamlining the process.
Microplate experiments were randomised using a custom Python script that can be generalised to other systems (see Data Availability). Optical density (OD) was measured (Fig. S3, Fig. S6), and Surfactin C titre results for iteration 0 are depicted in Fig. 1C. We opted to limit the number of combinations per plate in favour of a larger number of replicates of each point (n = 6) so as to accurately capture and incorporate biological variation into the machine learning model. The initial media compositions in Fig. 1C were derived from a Latin hypercube design.
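As an illustration of block randomisation, the sketch below shuffles the same set of treatments independently within each of six blocks, so that every block holds one replicate of every treatment. It is a minimal example under stated assumptions, not the script used in the study: the function name `block_randomise` and the treatment labels are hypothetical, and the mapping of blocks onto plate wells is left out.

```python
import random

def block_randomise(treatments, n_blocks, seed=0):
    """Return one independently shuffled copy of the treatment list per block."""
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        layout.append(block)
    return layout

# Hypothetical labels: seven combinations plus a control and an M9 reference.
treatments = [f"combo_{i}" for i in range(1, 8)] + ["control", "M9_reference"]
layout = block_randomise(treatments, n_blocks=6, seed=42)
for number, block in enumerate(layout, start=1):
    print(f"block {number}: {block}")
```

Fixing the seed keeps the layout reproducible, which helps when the same plan has to be pipetted by hand and later matched to the measurements.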
In the Learn phase, we employed a Gaussian process regressor as a surrogate model that predicts surfactin C titre and its uncertainty landscape in a 2-dimensional input space of media compositions. For the Design stage, we queried the model for new media compositions determined from a suitably chosen acquisition function that balances exploitation, i.e. select high-titre locations in the landscape, and exploration, i.e. cover areas of the design space where model predictions remain uncertain. Details of the active learning loop can be found in the Methods.
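The Learn and Design steps can be sketched with a small NumPy example. This is a simplified stand-in rather than the model used in the study: it fits a homoskedastic Gaussian process with a fixed squared-exponential kernel and scores candidates with the analytic expected improvement, whereas the Methods describe a heteroskedastic model and the q-NEI acquisition function implemented with Ax and BoTorch. All data values, the kernel length scale and the noise variance are hypothetical.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel (unit prior variance) between rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise_var=0.05):
    """Posterior mean and standard deviation of a constant-mean GP."""
    offset = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = offset + k_qt @ np.linalg.solve(k_tt, y_train - offset)
    # Variance reduction contributed by the observations at each query point.
    reduction = np.einsum("ij,ji->i", k_qt, np.linalg.solve(k_tt, k_qt.T))
    std = np.sqrt(np.clip(1.0 - reduction, 1e-12, None))
    return mean, std

def expected_improvement(mean, std, best):
    """Analytic expected improvement for a maximisation problem."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

# Hypothetical data: inputs scaled to [0, 1] (glucose, NH4Cl), titre relative to M9.
x_train = np.array([[0.1, 0.2], [0.3, 0.8], [0.5, 0.5], [0.7, 0.1],
                    [0.9, 0.6], [0.2, 0.5], [0.6, 0.9]])
y_train = np.array([0.8, 1.2, 2.1, 1.0, 1.4, 1.1, 1.6])

axis = np.linspace(0.0, 1.0, 20)
grid = np.array([[c, n] for c in axis for n in axis])
mean, std = gp_posterior(x_train, y_train, grid)
ei = expected_improvement(mean, std, best=y_train.max())
next_point = grid[np.argmax(ei)]
print("next candidate (scaled glucose, NH4Cl):", next_point)
```

The first term of the expected improvement rewards points with a high predicted titre (exploitation) and the second rewards points with a large predictive standard deviation (exploration), which is the balance described above.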
As shown in Fig. 1D, we obtained a 160 % titre improvement after three DBTL iterations relative to the baseline titre in M9 medium. The optimal titre was reached at 0.8 % glucose and 50 mM NH4Cl. Previously reported values for optimal surfactin production, including modifications to the Cooper [60] and Landy media regarding carbon and nitrogen concentrations, correspond to 0.8 % glucose and 100 mM NH4Cl [61], in agreement with the obtained maximum for carbon concentration. When updating the models after each iteration (Fig. S4), the average uncertainty in the predictions decreases from 0.45 to 0.3 (Fig. 1E), indicating that the model gains more information about the system across iterations.
Thanks to our metabolomics measurement, we were also able to examine the yield of other surfactin variants (Surfactin B and Surfactin D), as well as the lipopeptide Iturin A which is assembled by a different biosynthetic cluster. We found that their production profiles are similar to that of Surfactin C (Fig. S1), which is consistent with previous reports on co-production of fengycin and surfactin [62] and has also been observed in other categories of biosynthetic gene clusters [63].
To further examine the cellular constraints that affect surfactin production, we investigated trade-offs between lipopeptide production and maximal growth in silico (Fig. 2). Using the final machine learning models, trained on three DBTL runs, we simulated hundreds of media compositions and constructed Pareto-like constraints between lipopeptide titre and the maximum optical density (OD), which was also simulated from a respective model. The results show no evident trade-off for Surfactin C and Surfactin B in relation to growth (Fig. 2A), which suggests that Bacillus can achieve high titre with minimal sacrifice in biomass. However, we observed a more pronounced trade-off for Surfactin D and Iturin A (Fig. 2B), where an ~5 % increase in titre is associated with ~10 % reduction in growth. This suggests a diversion of metabolic resources away from growth toward lipopeptide production.
Fig. 2.
Trade-offs between lipopeptide yield and growth. A) and C) For Surfactin C and Surfactin B, growth is only minimally sacrificed to obtain maximum production; therefore, no trade-off is observed. B) and D) For Surfactin D and Iturin A, the Pareto front indicates an evident trade-off between lipopeptide production and maximum bacterial growth, measured as optical density (OD). The maximum production of Iturin A and Surfactin D occurs at lower growth. In both cases, the sample points were computed from simulations of lipopeptide production and maximum OD in the growth curve with the final Gaussian process regressor model, using a randomised sampling of the input space. Limits of the Pareto fronts are shown using red triangles.
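To make the notion of a Pareto front concrete, the following sketch extracts the non-dominated points from a set of simulated (maximum OD, relative titre) pairs when both quantities are to be maximised. The function name and the sample values are hypothetical and serve only to illustrate the computation; they are not taken from the study.

```python
def pareto_front(points):
    """Return the points not dominated when both coordinates are maximised."""
    front = []
    for i, (growth_i, titre_i) in enumerate(points):
        dominated = any(
            growth_j >= growth_i and titre_j >= titre_i
            and (growth_j > growth_i or titre_j > titre_i)
            for j, (growth_j, titre_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((growth_i, titre_i))
    return sorted(front)

# Hypothetical (maximum OD, relative titre) pairs from model simulations.
samples = [(0.9, 1.0), (0.8, 1.4), (0.6, 1.5), (0.7, 1.2), (0.5, 1.1), (0.85, 1.3)]
front = pareto_front(samples)
print(front)
```

A steep front, where a small gain in titre costs a large loss in OD, corresponds to the pronounced trade-off described for Surfactin D and Iturin A, whereas a front concentrated near a single point corresponds to the absence of a trade-off.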
The metabolic features of the loop samples show a relationship with surfactin production, and highlight the diversity of the samples. In addition to the lipopeptides, metabolites associated with carbon metabolism and the tricarboxylic acid cycle (TCA) can be measured using the flow injection method (Fig. S2). Although these are not included in the medium, we were able to detect them in the spent medium because of export processes and release of cytoplasmic content from membrane disruption caused primarily by surfactin itself.
Using data from the loop samples alone, we performed a correlation analysis amongst the 4 lipopeptides, 18 other metabolites, and 2 other signals derived from growth data (maximum OD and final OD). A hierarchical clustering dendrogram highlights two primary clusters of metabolites, further divided into four correlated subgroups (Fig. 3A). The first (6) and second groups (7) of the primary cluster contain a combination of amino acids and additional metabolites from the glycolysis/gluconeogenesis pathway. In contrast, the third (7) and fourth groups (5) consist of lipopeptides (4), and organic acids related to carbon metabolism (6), respectively. Both Canonical Correlation Analysis (CCA) and a PERMANOVA test validated the distinct nature of these groups (CCA: p-value ~1e-5 across all dimensions; PERMANOVA: p-value 1e-4). The organic acids in the fourth group show a positive correlation with lipopeptide production and are components or affiliates of the tricarboxylic acid cycle, suggesting heightened activity in this pathway.
Fig. 3.
Analysis of background metabolic changes using the mass spectrometry data. A) Spearman correlation between the measured metabolites/growth across loop samples. The rows and columns are ordered by hierarchical clustering, shown as a dendrogram. Red colours indicate high positive correlation, while bluer colours indicate high negative correlation. Two primary clusters were identified from the dendrogram, which can be subdivided into 4 groups. Each of the groups is labelled within general biochemical categories. B) PCA biplot of the active learning loop samples. Iteration samples are depicted as points with distinct colours; blue for iteration 0, orange for iteration 1, and green for iteration 2. The arrows in the biplot represent the loading scores of each metabolite, indicating their influence on the principal components. This information is also presented in the scores accompanying the metabolite labels. To facilitate reading in the biplot, the metabolite names are abbreviated when necessary.
We performed a Principal Component Analysis (PCA) (Fig. 3B) to explore the metabolic diversity of samples across different iterations, and additional relationships between metabolite measurements. The first and second components account for 44.1 % and 17.8 % of the explained variance, respectively. Notably, the lipopeptides and TCA cycle-associated organic acids account for the variance in the positive values of the first PCA component. In contrast, the metabolites from the primary cluster in Fig. 3A show the opposite trend. The associated loading scores are similar in magnitude, indicating that no individual metabolite dominates the variance contribution after the reduction. Moreover, the sparsity of the points in the PCA space suggests that samples are metabolically diverse. To visualise the conclusions of the correlation and PCA analyses, namely that TCA metabolites share a similar landscape while other metabolite categories show the opposite trend, we embedded the metabolite surfaces into a simplified pathway diagram for anabolism in Bacillus subtilis (Fig. S5) [64]. These surfaces are the predictions of the machine learning model for the abundance of each metabolite across the carbon and nitrogen concentration ranges. We found that L-arginine, which is linked with 2-oxoglutarate and lies in the same pathway as the synthesis of glutamate (an amino acid in surfactin), is highly abundant at higher carbon and nitrogen concentrations, suggesting higher activity in the glutamine synthetase-glutamate synthase (GOGAT) cycle and overflow of this compound to the medium [65], [66].
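The explained-variance fractions and loading scores that underlie a PCA biplot can be obtained from a singular value decomposition, as in the minimal sketch below. It assumes that each metabolite column is standardised before the decomposition, which the text does not state, and it uses randomly generated placeholder data rather than the measured metabolite table. The study itself used the pca package in Python.

```python
import numpy as np

def pca_summary(data):
    """Standardise columns, then return scores, loadings and explained variance."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of total variance per component
    scores = u * s                   # sample coordinates in component space
    loadings = vt.T                  # one column of weights per component
    return scores, loadings, explained

# Placeholder data: 21 samples (rows) by 5 metabolites (columns).
rng = np.random.default_rng(0)
data = rng.normal(size=(21, 5))
scores, loadings, explained = pca_summary(data)
print("explained variance fractions:", np.round(explained, 3))
```

In a biplot, the first two columns of `scores` give the sample positions and the first two columns of `loadings` give the arrow for each metabolite.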
2.2. Sensitivity of metabolite production to changes in media composition
To quantify how metabolism reacts to simultaneous changes in carbon and nitrogen, we performed a novel directional analysis. The premise is that the direction in which carbon and nitrogen concentrations change may have specific effects on metabolic enzymes and reactions, and that these effects can be measured with reference to the production surfaces generated by the models. We term this concept “metabolic anisotropy”, i.e., non-uniformity in different directions or, in experimental terms, non-uniform responses to simultaneous changes in medium composition. The approach consists of starting at the observed maximum of surfactin titre and tracing trajectories orthogonal to the level curves in different angular directions, until they reach a specified radius. This allows us to calculate the gradient of metabolite/growth/lipopeptide levels at angles from 0° to 360° with respect to the Surfactin C maximum (Fig. 4). For example, 0° corresponds to only increasing glucose, while 90° corresponds to only increasing NH4Cl concentration. Taking a radius of 0.6 % glucose and 24 mM of nitrogen around the maximum, we used the trained models to calculate the gradient along the depicted blue circle (Fig. 4A). A radar chart shows that several metabolites have near-symmetric profiles, i.e., the rate of change of their abundance is similar in every direction, while others possess unusually large gradients in certain directions (Figure 6B). After filtering out the gradient profiles that are symmetric overall across every angle, we found 6 metabolites with an asymmetric gradient profile, i.e., when stepping away from the optimum carbon/nitrogen concentration towards another combination in a specific direction, the decrease or increase in the abundance of that metabolite differs markedly from that observed in another direction.
Specifically, L-Phenylalanine, 6-Phosphogluconic acid and R-2,3-Dihydroxy-isovalerate exhibit a higher gradient (abundance change) in the 135° direction, corresponding to decreasing glucose concentration and increasing nitrogen concentration (Fig. 6C). On the other hand, Succinate, (S)-Malate, and L-Arginine show higher gradients on the right half of the angle plane, moving towards increasing glucose concentration (Fig. 6C). L-arginine, as shown above, has a production profile strongly associated with increasing carbon/nitrogen, pointing to a rapid response from nitrogen metabolism. The observed anisotropy in abundance changes for specific metabolites in the media could be related to multiple factors, including rapid enzymatic action, transient metabolic fluxes, and overexpression of export systems, among others, and to our knowledge has not been thoroughly studied before in this experimental context. However, further experimentation is needed to confirm these profiles.
Fig. 4.
Directional analysis reveals metabolites with high sensitivity to changes in carbon and nitrogen composition near the Surfactin C maximum. A) Radius around the maximum titre in the Surfactin C surface used to calculate gradients in each metabolite production surface. B) The gradients are calculated by simulating 100 samples along a path between the maximum titre point and a specific direction in the defined radius. The average gradient is then obtained by averaging the (approximated) slopes for these samples. C) Profile of the average gradient for different directions, depicted as a radar chart. Each colour corresponds to a specific metabolite. The production surface for each metabolite was used to make the calculations. D) For specific metabolites, the profile is highly asymmetric, suggesting flux redistribution or enzyme kinetic changes when changing media composition simultaneously in a specific direction. The colour legend identifies each metabolite: (S)-Malate, 6-Phosphogluconic acid, L-Arginine, L-Phenylalanine, R-2,3-Dihydroxy-isovalerate, and Succinate.
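The directional analysis can be sketched as follows. As a simplifying assumption, the sketch follows straight rays from the optimum to an ellipse with semi-axes of 0.6 % glucose and 24 mM nitrogen, sampled at 100 points, rather than trajectories orthogonal to the level curves. The two test surfaces are hypothetical closed-form functions standing in for the fitted metabolite models, and slopes are expressed per unit of scaled path length.

```python
import numpy as np

def directional_gradients(surface, centre, radii, angles_deg, n_samples=100):
    """Average finite-difference slope of `surface` along rays leaving `centre`."""
    steps = np.linspace(0.0, 1.0, n_samples)  # scaled position along each ray
    result = []
    for angle in np.deg2rad(angles_deg):
        # Scale each axis by its own radius so the ray ends on the ellipse.
        carbon = centre[0] + steps * radii[0] * np.cos(angle)
        nitrogen = centre[1] + steps * radii[1] * np.sin(angle)
        values = surface(carbon, nitrogen)
        slopes = np.diff(values) / np.diff(steps)
        result.append(slopes.mean())
    return np.array(result)

centre = (0.8, 50.0)  # reported optimum: 0.8 % glucose, 50 mM NH4Cl
radii = (0.6, 24.0)   # radius: 0.6 % glucose, 24 mM nitrogen
angles = np.arange(0, 360, 45)

# Hypothetical surfaces: one symmetric around the optimum, one glucose-driven.
def symmetric(c, n):
    return -((c - 0.8) / 0.6) ** 2 - ((n - 50.0) / 24.0) ** 2

def glucose_driven(c, n):
    return (c - 0.8) / 0.6 + 0.0 * n

iso = directional_gradients(symmetric, centre, radii, angles)
aniso = directional_gradients(glucose_driven, centre, radii, angles)
print("symmetric profile:     ", np.round(iso, 3))
print("glucose-driven profile:", np.round(aniso, 3))
```

Plotting each profile against the angle on a polar axis gives a radar chart: the symmetric surface yields a circle, while the glucose-driven surface has its largest positive slope at 0° and its most negative slope at 180°.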
3. Discussion
Large-scale surfactin production in a liquid culture remains a challenge in bioprocessing [8], since higher titres are associated with lower growth. Therefore, we identified a need for high-throughput analysis to optimize surfactin production and overcome the trade-off between cell growth and biosynthesis. However, this should be accompanied by measurement technology that can keep pace with the experimental pipeline, as we exemplified with flow injection mass spectrometry, and by advanced statistical methods that enable human-readable data inspection. The methods and recommendations included in the paper have the potential to accelerate the development of DBTL-active learning platforms. The detailed picture of the rich metabolic data embedded in surfactin-associated pathways agrees with the literature [25], [26] and provides a powerful guide for further applications in metabolic engineering. Titre variation is a fundamental factor to consider when performing a Bayesian optimisation loop [41], [67]. Sources of noise include the stochasticity of the underlying biological process, errors introduced by liquid handling, and instrumental variability, among others. Fundamentally, one can establish a trade-off where the total number of samples is reduced and the number of replicates per batch is increased, to gain confidence in the model’s predictions and thus achieve sample-efficient optimisation and reliable production data that can be used in downstream analysis. In addition, the selection of an appropriate model and acquisition function to account for this noise is paramount. Thus, our approach of using 6 biological replicates, with fewer combinations per plate than in previously reported active learning experiments, together with a noise-aware acquisition function, can greatly help the understanding of the system through the iterations.
To our knowledge, our study employs a novel combination of metabolomics and active learning for the optimisation of a bioprocess. Using mass spectrometry (MS) as the detection platform is a general approach, since it allows the optimisation of titres for compounds for which a fluorometric/colorimetric method or a biosensor is not available. In this context, MS can be complemented with high-throughput multi-omics measurements. Such measurements have been intensively developed in recent years and are an effective complement in the development of a bioprocess optimisation strategy that is at the same time metabolically informative [68]. These measurements are already contributing to synthetic biology and metabolic engineering [69], but more effort is needed to integrate these data into the algorithms and machine learning models used by active learning loops.
4. Methods
4.1. Strains and media
Bacillus subtilis DSM 3256, a surfactin-producing Bacillus strain, was acquired from the Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ) repository. M9 medium was prepared using the following recipe: Glucose: 0.4 % (carbon source), NH4Cl: 18.7 mM (nitrogen source), NaCl: 8.5 mM, MgSO4: 2 mM, CaCl2: 0.1 mM, Na2HPO4: 42.2 mM, KH2PO4: 22 mM [70]. For the optimization experiments, glucose and ammonium chloride were excluded from this recipe, giving a basal M9 salts/buffer solution to add the carbon/nitrogen sources later. Components of the M9 medium were purchased from Sigma Aldrich. The microplate experiments were performed using ultra-low attachment surface flat bottom 96-well microplates (Corning).
4.2. Microplate cultures
Bacillus subtilis DSM 3256 frozen stock was revived by culturing on an LB agar plate at 37 °C overnight. From this plate, individual colonies were inoculated into six precultures consisting of 5 ml of M9 medium. The cells were then grown in a shaking incubator at 37 °C and 180 rpm.
Conditions corresponding to the carbon-nitrogen concentrations in the medium, as well as controls, were block-randomised across a 96-well microplate, considering six replicates (blocks). These were manually pipetted into place. The layout for block randomisation was obtained from an R script utilising the agricolae package v.1.3–7 [71]. In each well, 100 µl of 2X M9 salts (M9 medium without a carbon or nitrogen source), 80 µl of the glucose/ammonium mix, and 20 µl of the preculture were added. This was adjusted to achieve an initial optical density at 600 nm (OD600) of 0.1 in each well. The microplate culture was conducted using a Tecan Infinite M200 PRO plate reader, recording an OD600 measurement every 10 min for 36 h at 37 °C. The agitation was set to a maximum of 10 mm. After culturing, the microplate was centrifuged for 20 min at 4000 rpm and 4 °C to separate the cell pellet from the supernatant. 80 µl of supernatant was taken from each well and stored in a −80 °C freezer until mass spectrometry quantitation.
4.3. Surfactin and metabolite quantification
We utilised flow injection of spent media with a ThermoFisher Dionex Ultimate 3000 autosampler. The sample volume was set at 1 µl. The mobile phase comprised a 1:1 ratio of acetonitrile to water with 0.1 % formic acid, and the flow rate was 200 µl/min. The sample acquisition time spanned 1 min, using selected reaction monitoring (SRM) on a ThermoFisher TSQ Quantiva triple quadrupole (QqQ) mass spectrometer (MS). The MS parameters, as well as the precursor/product masses table, can be found in Supplementary Tables 1 and 2, respectively.
Peak extraction was accomplished using the rawrr package v1.10.1 in R [72]. Subsequently, where required, metabolite traces underwent baseline correction using the asymmetric least squares algorithm from the Python pybaselines package v1.0.0 [73] set to default parameters. The corrected peaks were integrated over the 1-minute run using the trapezoid rule function from the NumPy package via a custom Python script. The integrated intensity data were then compiled into a table.
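The integration step can be illustrated with a short NumPy sketch. The trace below is a synthetic triangular peak sampled once per second over a 1-minute run, not real SRM data, and the baseline correction step is omitted. The helper name `integrate_trace` is hypothetical.

```python
import numpy as np

# NumPy >= 2.0 names the function `trapezoid`; older releases call it `trapz`.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

def integrate_trace(time_s, intensity):
    """Integrate an intensity trace over time with the trapezoid rule."""
    return float(trapezoid(intensity, x=time_s))

# Synthetic trace: triangular peak of height 100 centred at 30 s.
time_s = np.linspace(0.0, 60.0, 61)
intensity = 100.0 * (1.0 - np.abs(time_s - 30.0) / 30.0)
area = integrate_trace(time_s, intensity)
print("integrated intensity:", area)
```

For this piecewise-linear trace the trapezoid rule reproduces the triangle area of 0.5 × 60 × 100 = 3000 up to floating-point rounding, which makes it a convenient sanity check before applying the function to measured traces.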
Outliers were identified and removed based on the interquartile range (IQR), keeping only values within the range of median - 1.5IQR to median + 1.5IQR. These values were then normalised by dividing them by the average M9 titres observed for each batch. A table comparing the relative surfactin C titre to the observed M9 titre for every condition or combination was constructed for the subsequent active learning prediction step. Similar tables were generated for additional metabolites.
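The filtering and normalisation steps can be sketched as below, using the range of median ± 1.5 IQR stated above. The quartile convention (the default of `statistics.quantiles`), the decision to filter the M9 reference values before averaging them, and all numeric values are assumptions made for illustration.

```python
import statistics

def filter_outliers(values, k=1.5):
    """Keep values within median ± k * IQR, the range stated in the text."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    centre = statistics.median(values)
    return [v for v in values if centre - k * iqr <= v <= centre + k * iqr]

def relative_to_m9(values, m9_values):
    """Divide each value by the mean M9 reference value of the same batch."""
    m9_mean = statistics.mean(m9_values)
    return [v / m9_mean for v in values]

# Hypothetical integrated intensities for one condition and the M9 reference.
condition = [1.0, 1.1, 0.9, 1.2, 1.05, 5.0]
m9 = [0.5, 0.55, 0.45, 0.5, 0.52, 0.48]

kept = filter_outliers(condition)
relative = relative_to_m9(kept, filter_outliers(m9))
print("kept replicates:", kept)
print("titre relative to M9:", [round(v, 2) for v in relative])
```

Normalising within each batch puts every iteration on the same scale, so a relative titre above 1 means the condition outperformed the M9 reference measured on the same plate.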
4.4. Active learning loop
From the minimum and maximum concentrations of glucose and ammonium chloride that were selected for testing, a 2D design space was defined. Seven initial conditions were obtained from a Latin hypercube design (LHD). The centred LHD was implemented in Python using the pyDOE2 package [74]. After obtaining the surfactin C titre relative to the observed M9 titre from the MS quantification, these values were fitted using a heteroskedastic Gaussian process regression (GPR) model [75], [76], where the corresponding glucose/ammonium concentrations are the input variables/features and the titre is the observed output. From the GPR predictions, the q-Noisy expected improvement (q-NEI) acquisition function [67], [76] is calculated for the design space, and optimising this function retrieves seven combinations to be tested in the next iteration of the loop. The model and the acquisition function were implemented using the Ax v.0.3.2 and Botorch v0.8.4 libraries in Python [76], and default parameters were used. The default eta parameter for the acquisition is 0.01, considering equal balance between exploitation and exploration. Several additional scripts were used in intermediate steps for formatting data tables and are described in the supplementary material. Surface plots, principal component analysis (PCA) and radar charts to explore and analyse the data were implemented using the matplotlib, seaborn and plotly packages in Python. The PCA biplot was generated using the pca package in Python [77]. The surfactin molecule diagram was generated using the Pikachu package [78].
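A centred Latin hypercube design of the kind used for the initial seven conditions can be sketched in plain NumPy as follows. This reimplements the idea for illustration and does not call pyDOE2. The concentration bounds are hypothetical, since the tested minimum and maximum values are not given in this section.

```python
import numpy as np

def centred_lhd(n_samples, bounds, seed=0):
    """Centred Latin hypercube: one point at the midpoint of each stratum per axis."""
    rng = np.random.default_rng(seed)
    centres = (np.arange(n_samples) + 0.5) / n_samples  # stratum midpoints in [0, 1]
    columns = []
    for low, high in bounds:
        # An independent permutation per axis pairs the strata at random.
        columns.append(low + rng.permutation(centres) * (high - low))
    return np.column_stack(columns)

# Hypothetical bounds: glucose in % and NH4Cl in mM.
bounds = [(0.1, 2.0), (5.0, 100.0)]
design = centred_lhd(7, bounds, seed=1)
for glucose, ammonium in design:
    print(f"glucose {glucose:.2f} %, NH4Cl {ammonium:.1f} mM")
```

Each glucose level and each ammonium level appears exactly once, so seven cultures cover seven distinct levels of both variables, which is the property that makes the design attractive when only a few conditions fit on a plate.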
Author contributions
RVA: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Visualisation, Data Curation, Writing – Original Draft, Writing – Review and Editing. DAO: Conceptualization, Methodology, Writing – Original Draft, Writing – Review and Editing, Supervision. KB: Conceptualization, Methodology, Writing – Original Draft, Writing – Review and Editing, Supervision, Resources, Funding acquisition.
Declaration of Competing Interest
All authors declare that they have no conflict of interest related to this work.
Acknowledgements
RVA was supported by a Ph.D. studentship from the Darwin Trust of Edinburgh. DAO was supported by the United Kingdom Research and Innovation (grant EP/S02431X/1). KB was supported by the Engineering and Physical Sciences Research Council (grant EP/V042882/1). Access to the mass spectrometry equipment was provided by EdinOmics.
Footnotes
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.csbj.2024.02.012.
Appendix A. Supplementary material
References
- 1. Johnson P., Trybala A., Starov V., Pinfield V.J. Effect of synthetic surfactants on the environment and the potential for substitution by biosurfactants. Adv Colloid Interface Sci. 2021;288 doi: 10.1016/j.cis.2020.102340.
- 2. Nikolova C., Gutierrez T. Biosurfactants and their applications in the oil and gas industry: current state of knowledge and future perspectives. Front Bioeng Biotechnol. 2021;9 doi: 10.3389/fbioe.2021.626639.
- 3. Markande A.R., Patel D., Varjani S. A review on biosurfactants: properties, applications, and current developments. Bioresour Technol. 2021;330 doi: 10.1016/j.biortech.2021.124963.
- 4. Eras-Muñoz E., Farré A., Sánchez A., Font X., Gea T. Microbial biosurfactants: a review of recent environmental applications. Bioengineered. 2022;13:12365–12391. doi: 10.1080/21655979.2022.2074621.
- 5. Arima K., Kakinuma A., Tamura G. Surfactin, a crystalline peptide lipid surfactant produced by Bacillus subtilis: isolation, characterization, and its inhibition of fibrin clot formation. Biochem Biophys Res Commun. 1968;31:488–494. doi: 10.1016/0006-291X(68)90503-2.
- 6. Sen R. In: Biosurfactants, Advances in Experimental Medicine and Biology. Sen R., editor. Springer; New York, NY: 2010. Surfactin: biosynthesis, genetics and potential applications; pp. 316–323.
- 7. Shaligram N.S., Singhal R.S. Surfactin – a review on biosynthesis, fermentation, purification and applications. Food Technol Biotechnol. 2010;48:119–134.
- 8. Théatre A., Cano-Prieto C., Bartolini M., Laurin Y., Deleu M., Niehren J., Fida T., Gerbinet S., Alanjary M., Medema M.H., Léonard A., Lins L., Arabolaza A., Gramajo H., Gross H., Jacques P. The surfactin-like lipopeptides from Bacillus spp.: natural biodiversity and synthetic biology for a broader application range. Front Bioeng Biotechnol. 2021;9 doi: 10.3389/fbioe.2021.623701.
- 9. Théatre A., Hoste A.C.R., Rigolet A., Benneceur I., Bechet M., Ongena M., Deleu M., Jacques P. In: Biosurfactants for the Biobased Economy, Advances in Biochemical Engineering/Biotechnology. Hausmann R., Henkel M., editors. Springer International Publishing; Cham: 2022. Bacillus sp.: a remarkable source of bioactive lipopeptides; pp. 123–179.
- 10. Dobler L, Breda GC, Rocha PM, Paiva WKV de, Santos ES dos, Oliveira RR de. Surfactin and surfactin-like production, purification, and application at marine environments; 2022. doi: 10.26434/chemrxiv-2022-0x7vd-v2.
- 11. Steinke K., Mohite O.S., Weber T., Kovács Á.T. Phylogenetic distribution of secondary metabolites in the Bacillus subtilis species complex. mSystems. 2021;6 doi: 10.1128/mSystems.00057-21.
- 12. Ongena M., Jacques P. Bacillus lipopeptides: versatile weapons for plant disease biocontrol. Trends Microbiol. 2008;16:115–125. doi: 10.1016/j.tim.2007.12.009.
- 13. Seydlová G., Svobodová J. Review of surfactin chemical properties and the potential biomedical applications. Cent Eur J Med. 2008;3:123–133. doi: 10.2478/s11536-008-0002-5.
- 14. Jacques P. In: Biosurfactants: From Genes to Applications, Microbiology Monographs. Soberón-Chávez G., editor. Springer; Berlin, Heidelberg: 2011. Surfactin and other lipopeptides from Bacillus spp. pp. 57–91.
- 15. Zhen C., Ge X.-F., Lu Y.-T., Liu W.-Z. Chemical structure, properties and potential applications of surfactin, as well as advanced strategies for improving its microbial production. AIMS Microbiol. 2023;9:195–217. doi: 10.3934/microbiol.2023012.
- 16. Angelini T.E., Roper M., Kolter R., Weitz D.A., Brenner M.P. Bacillus subtilis spreads by surfing on waves of surfactant. PNAS. 2009;106:18109–18113. doi: 10.1073/pnas.0905890106.
- 17. Raaijmakers J.M., De Bruijn I., Nybroe O., Ongena M. Natural functions of lipopeptides from Bacillus and Pseudomonas: more than surfactants and antibiotics. FEMS Microbiol Rev. 2010;34:1037–1062. doi: 10.1111/j.1574-6976.2010.00221.x.
- 18. Rahman F.B., Sarkar B., Moni R., Rahman M.S. Molecular genetics of surfactin and its effects on different sub-populations of Bacillus subtilis. Biotechnol Rep. 2021;32 doi: 10.1016/j.btre.2021.e00686.
- 19. Stein T. Bacillus subtilis antibiotics: structures, syntheses and specific functions. Mol Microbiol. 2005;56:845–857. doi: 10.1111/j.1365-2958.2005.04587.x.
- 20. Chen X., Lu Yajun, Shan M., Zhao H., Lu Z., Lu Yingjian. A mini-review: mechanism of antimicrobial action and application of surfactin. World J Microbiol Biotechnol. 2022;38:143. doi: 10.1007/s11274-022-03323-3.
- 21. Hoefler B.C., Gorzelnik K.V., Yang J.Y., Hendricks N., Dorrestein P.C., Straight P.D. Enzymatic resistance to the lipopeptide surfactin as identified through imaging mass spectrometry of bacterial competition. Proc Natl Acad Sci USA. 2012;109:13082–13087. doi: 10.1073/pnas.1205586109.
- 22. Luzzatto-Knaan T., Melnik A.V., Dorrestein P.C. Mass spectrometry uncovers the role of surfactin as an interspecies recruitment factor. ACS Chem Biol. 2019;14:459–467. doi: 10.1021/acschembio.8b01120.
- 23. Koglin A., Löhr F., Bernhard F., Rogov V.V., Frueh D.P., Strieter E.R., Mofid M.R., Güntert P., Wagner G., Walsh C.T., Marahiel M.A., Dötsch V. Structural basis for the selectivity of the external thioesterase of the surfactin synthetase. Nature. 2008;454:907–911. doi: 10.1038/nature07161.
- 24. Süssmuth R.D., Mainz A. Nonribosomal peptide synthesis—principles and prospects. Angew Chem Int Ed. 2017;56:3770–3821. doi: 10.1002/anie.201609079.
- 25. Hu F., Liu Y., Li S. Rational strain improvement for surfactin production: enhancing the yield and generating novel structures. Microb Cell Fact. 2019;18:42. doi: 10.1186/s12934-019-1089-x.
- 26. Xia L., Wen J. Available strategies for improving the biosynthesis of surfactin: a review. Crit Rev Biotechnol. 2022;0:1–18. doi: 10.1080/07388551.2022.2095252.
- 27. Wu Q., Zhi Y., Xu Y. Systematically engineering the biosynthesis of a green biosurfactant surfactin by Bacillus subtilis 168. Metab Eng. 2019;52:87–97. doi: 10.1016/j.ymben.2018.11.004.
- 28. Bertrand B., Martínez-Morales F., Rosas-Galván N.S., Morales-Guzmán D., Trejo-Hernández M.R. Statistical design, a powerful tool for optimizing biosurfactant production: a review. Colloids Interfaces. 2018;2:36. doi: 10.3390/colloids2030036.
- 29. Czinkóczky R., Sakiyo J., Eszterbauer E., Németh Á. Prediction of surfactin fermentation with Bacillus subtilis DSM10 by response surface methodology optimized artificial neural network. Cell Biochem Funct. 2023;41:234–242. doi: 10.1002/cbf.3776.
- 30. Fonseca R.R., Silva A.J.R., França F.P.D., Cardoso V.L., Sérvulo E.F.C. Optimizing carbon/nitrogen ratio for biosurfactant production by a Bacillus subtilis strain. Appl Biochem Biotechnol. 2007;137:471–486. doi: 10.1007/s12010-007-9073-z.
- 31. Wei Y.-H., Lai C.-C., Chang J.-S. Using Taguchi experimental design methods to optimize trace element composition for enhanced surfactin production by Bacillus subtilis ATCC 21332. Process Biochem. 2007;42:40–45. doi: 10.1016/j.procbio.2006.07.025.
- 32. Mnif I., Ellouze-Chaabouni S., Ghribi D. Optimization of inocula conditions for enhanced biosurfactant production by Bacillus subtilis SPB1, in submerged culture, using Box–Behnken Design. Probiotics Antimicro Prot. 2013;5:92–98. doi: 10.1007/s12602-012-9113-z.
- 33. Zouari R., Ellouze-Chaabouni S., Ghribi-Aydi D. Optimization of Bacillus subtilis SPB1 Biosurfactant production under solid-state fermentation using by-products of a traditional olive mill factory. Achiev Life Sci. 2014;8:162–169. doi: 10.1016/j.als.2015.04.007.
- 34. Mohanty S.S., Koul Y., Varjani S., Pandey A., Ngo H.H., Chang J.-S., Wong J.W.C., Bui X.-T. A critical review on various feedstocks as sustainable substrates for biosurfactants production: a way towards cleaner production. Microb Cell Fact. 2021;20:120. doi: 10.1186/s12934-021-01613-3.
- 35. Valdés-Velasco L.M., Favela-Torres E., Théatre A., Arguelles-Arias A., Saucedo-Castañeda J.G., Jacques P. Relationship between lipopeptide biosurfactant and primary metabolite production by Bacillus strains in solid-state and submerged fermentation. Bioresour Technol. 2022;345 doi: 10.1016/j.biortech.2021.126556.
- 36. Wang J., Guo R., Wang W., Ma G., Li S. Insight into the surfactin production of Bacillus velezensis B006 through metabolomics analysis. J Ind Microbiol Biotechnol. 2018;45:1033–1044. doi: 10.1007/s10295-018-2076-7.
- 37. Eyke N.S., Koscher B.A., Jensen K.F. Toward machine learning-enhanced high-throughput experimentation. Trends Chem, Spec Issue: Mach Learn Mol Mater. 2021;3:120–132. doi: 10.1016/j.trechm.2020.12.001.
- 38. Snoek J., Larochelle H., Adams R.P. Practical Bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems. Curran Associates, Inc.; 2012.
- 39. Shahriari B., Swersky K., Wang Z., Adams R.P., de Freitas N. Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE. 2016;104:148–175. doi: 10.1109/JPROC.2015.2494218.
- 40. Frazier P.I. A tutorial on Bayesian optimization. 2018. doi: 10.48550/arXiv.1807.02811.
- 41. Garnett R. Cambridge University Press; 2023. Bayesian optimization.
- 42. Wang X., Jin Y., Schmitt S., Olhofer M. Recent advances in Bayesian optimization. ACM Comput Surv. 2023;55 doi: 10.1145/3582078.
- 43. Bai T., Li Y., Shen Y., Zhang X., Zhang W., Cui B. Transfer learning for Bayesian optimization: a survey. 2023. doi: 10.48550/arXiv.2302.05927.
- 44. González J., Longworth J., James D.C., Lawrence N.D. Bayesian optimization for synthetic gene design. 2015. doi: 10.48550/arXiv.1505.01627.
- 45. HamediRad M., Chao R., Weisberg S., Lian J., Sinha S., Zhao H. Towards a fully automated algorithm driven platform for biosystems design. Nat Commun. 2019;10:5150. doi: 10.1038/s41467-019-13189-z.
- 46. Radivojević T., Costello Z., Workman K., Garcia Martin H. A machine learning automated recommendation tool for synthetic biology. Nat Commun. 2020;11:4879. doi: 10.1038/s41467-020-18008-4.
- 47. Zhang J., Petersen S.D., Radivojevic T., Ramirez A., Pérez-Manríquez A., Abeliuk E., Sánchez B.J., Costello Z., Chen Y., Fero M.J., Martin H.G., Nielsen J., Keasling J.D., Jensen M.K. Combining mechanistic and machine learning models for predictive engineering and optimization of tryptophan metabolism. Nat Commun. 2020;11:4880. doi: 10.1038/s41467-020-17910-1.
- 48. Kumar P., Adamczyk P.A., Zhang X., Andrade R.B., Romero P.A., Ramanathan P., Reed J.L. Active and machine learning-based approaches to rapidly enhance microbial chemical production. Metab Eng. 2021;67:216–226. doi: 10.1016/j.ymben.2021.06.009.
- 49. Borkowski O., Koch M., Zettor A., Pandi A., Batista A.C., Soudier P., Faulon J.-L. Large scale active-learning-guided exploration for in vitro protein production optimization. Nat Commun. 2020;11:1872. doi: 10.1038/s41467-020-15798-5.
- 50. Merzbacher C., Mac Aodha O., Oyarzún D.A. Bayesian optimization for design of multiscale biological circuits. ACS Synth Biol. 2023;12:2073–2082. doi: 10.1021/acssynbio.3c00120.
- 51. Cosenza Z., Astudillo R., Frazier P.I., Baar K., Block D.E. Multi-information source Bayesian optimization of culture media for cellular agriculture. Biotechnol Bioeng. 2022;119:2447–2458. doi: 10.1002/bit.28132.
- 52. Cosenza Z., Block D.E., Baar K., Chen X. Multi-objective Bayesian algorithm automatically discovers low-cost high-growth serum-free media for cellular agriculture application. Eng Life Sci. 2023;23 doi: 10.1002/elsc.202300005.
- 53. Hashizume T., Ozawa Y., Ying B.-W. Employing active learning in the optimization of culture medium for mammalian cells. npj Syst Biol Appl. 2023;9:1–10. doi: 10.1038/s41540-023-00284-7.
- 54. Pandi A., Diehl C., Yazdizadeh Kharrazi A., Scholz S.A., Bobkova E., Faure L., Nattermann M., Adam D., Chapin N., Foroughijabbari Y., Moritz C., Paczia N., Cortina N.S., Faulon J.-L., Erb T.J. A versatile active learning workflow for optimization of genetic and metabolic networks. Nat Commun. 2022;13:3876. doi: 10.1038/s41467-022-31245-z.
- 55. Zhu L., Xu Q., Jiang L., Huang H., Li S. Polydiacetylene-based high-throughput screen for surfactin producing strains of Bacillus subtilis. PLoS One. 2014;9 doi: 10.1371/journal.pone.0088207.
- 56. Yang H., Yu H., Shen Z. A novel high-throughput and quantitative method based on visible color shifts for screening Bacillus subtilis THY-15 for surfactin production. J Ind Microbiol Biotechnol. 2015;42:1139–1147. doi: 10.1007/s10295-015-1635-4.
- 57. Ong S.A., Wu J.C. A simple method for rapid screening of biosurfactant-producing strains using bromothymol blue alone. Biocatal Agric Biotechnol. 2018;16:121–125. doi: 10.1016/j.bcab.2018.07.027.
- 58. Heuson E., Etchegaray A., Filipe S.L., Beretta D., Chevalier M., Phalip V., Coutte F. Screening of lipopeptide-producing strains of Bacillus sp. using a new automated and sensitive fluorescence detection method. Biotechnol J. 2019;14 doi: 10.1002/biot.201800314.
- 59. Kubicki S., Bator I., Jankowski S., Schipper K., Tiso T., Feldbrügge M., Blank L.M., Thies S., Jaeger K.-E. A straightforward assay for screening and quantification of biosurfactants in microbial culture supernatants. Front Bioeng Biotechnol. 2020;8 doi: 10.3389/fbioe.2020.00958.
- 60. Cooper D.G., Macdonald C.R., Duff S.J.B., Kosaric N. Enhanced production of surfactin from Bacillus subtilis by continuous product removal and metal cation additions. Appl Environ Microbiol. 1981;42:408–412. doi: 10.1128/aem.42.3.408-412.1981.
- 61. Willenbacher J., Yeremchuk W., Mohr T., Syldatk C., Hausmann R. Enhancement of surfactin yield by improving the medium composition and fermentation process. AMB Express. 2015;5:57. doi: 10.1186/s13568-015-0145-0.
- 62. Yaseen Y., Gancel F., Béchet M., Drider D., Jacques P. Study of the correlation between fengycin promoter expression and its production by Bacillus subtilis under different culture conditions and the impact on surfactin production. Arch Microbiol. 2017;199:1371–1382. doi: 10.1007/s00203-017-1406-x.
- 63. Qi Y., Nepal K.K., Blodgett J.A.V. A comparative metabologenomic approach reveals mechanistic insights into Streptomyces antibiotic crypticity. Proc Natl Acad Sci USA. 2021;118 doi: 10.1073/pnas.2103515118.
- 64. Koblitz J, Schomburg D, Neumann-Schaal M. MetaboMAPS: Pathway sharing and multi-omics data visualization in metabolic context; 2020. doi: 10.12688/f1000research.23427.2.
- 65. Gunka K., Commichau F.M. Control of glutamate homeostasis in Bacillus subtilis: a complex interplay between ammonium assimilation, glutamate biosynthesis and degradation. Mol Microbiol. 2012;85:213–224. doi: 10.1111/j.1365-2958.2012.08105.x.
- 66. He H., Li Y., Zhang L., Ding Z., Shi G. Understanding and application of Bacillus nitrogen regulation: a synthetic biology perspective. J Adv Res. 2023;49:1–14. doi: 10.1016/j.jare.2022.09.003.
- 67. Letham B., Karrer B., Ottoni G., Bakshy E. Constrained Bayesian optimization with noisy experiments. Bayesian Anal. 2019;14:495–519. doi: 10.1214/18-BA1110.
- 68. Wan S., Liu X., Sun W., Lv B., Li C. Current advances for omics-guided process optimization of microbial manufacturing. Bioresour Bioprocess. 2023;10:30. doi: 10.1186/s40643-023-00647-2.
- 69. Roy S., Radivojevic T., Forrer M., Marti J.M., Jonnalagadda V., Backman T., Morrell W., Plahar H., Kim J., Hillson N., Garcia Martin H. Multiomics Data Collection, Visualization, and Utilization for Guiding Metabolic Engineering. Front Bioeng Biotechnol. 2021;9 doi: 10.3389/fbioe.2021.612893.
- 70. M9 minimal medium (standard) Cold Spring Harb Protoc. 2010;2010 doi: 10.1101/pdb.rec12295.
- 71. Mendiburu F., Yaseen M. agricolae: Statistical Procedures for Agricultural Research, R package version 1.4.0; 2020. 〈https://myaseen208.github.io/agricolae/〉, 〈https://cran.r-project.org/package=agricolae〉.
- 72. Kockmann T., Panse C. The rawrr R Package: Direct Access to Orbitrap Data and Beyond. J Proteome Res. 2021;20:2028–2034. doi: 10.1021/acs.jproteome.0c00866.
- 73. Erb D. pybaselines: a Python library of algorithms for the baseline correction of experimental data. 2022. doi: 10.5281/zenodo.5608581.
- 74. Sjögren R., Svensson D. PyDOE2, a fork of the pyDOE package for design of experiments; 2018. 〈https://github.com/clicumu/pyDOE2/tree/master〉 (Accessed: 10/11/2023).
- 75. Williams C.K., Rasmussen C.E. MIT Press; Cambridge, MA: 2006. Gaussian processes for machine learning.
- 76. Balandat M, Karrer B, Jiang DR, Daulton S, Letham B, Wilson AG, Bakshy E. BOTORCH: a framework for efficient Monte-Carlo Bayesian optimization. In: Proceedings of the 34th international conference on neural information processing systems, NIPS’20. Red Hook, NY, USA: Curran Associates Inc; 2020. p. 21524–38.
- 77. Taskesen E. pca: A Python Package for Principal Component Analysis; 2020. 〈https://github.com/erdogant/pca/〉 (Accessed: 10/11/2023).
- 78. Terlouw B.R., Vromans S.P.J.M., Medema M.H. PIKAChU: a Python-based informatics kit for analysing chemical units. J Cheminform. 2022;14:34. doi: 10.1186/s13321-022-00616-5.