Abstract
With the rise in engineered biomolecular devices, there is an increased need for tailor-made biological sequences. Often, many similar biological sequences need to be made for a specific application, meaning numerous, sometimes prohibitively expensive, lab experiments are necessary for their optimization. This paper presents a transfer learning design of experiments workflow to make this development feasible. By combining a transfer learning surrogate model with Bayesian optimization, we show how the total number of experiments can be reduced by sharing information between optimization tasks. We demonstrate the reduction in the number of experiments using data from the development of DNA competitors for use in an amplification-based diagnostic assay. We use cross-validation to compare the predictive accuracy of different transfer learning models, and then compare the performance of the models for both single objective and penalized optimization tasks.
1. Introduction
Tailoring biological sequences, such as oligonucleotides or proteins, for specific applications is a common challenge in bioengineering. These engineered molecules have a variety of uses including in biosensors (Hua et al., 2022; Deng et al., 2023; Goertz et al., 2023), medical therapeutics (Badeau et al., 2018; Blakney et al., 2019; Ebrahimi and Samanta, 2023) and bio-computing (Siuti et al., 2013; Qian et al., 2011; Lv et al., 2021). However, development often requires expensive or time-consuming experiments, meaning good experimental design is necessary to optimize the biological sequences within the experimental budget (Cox and Reid, 2000). This also leads to better analysis, especially when there are interaction effects between input factors, which is common in biological experiments (Kreutz and Timmer, 2009; Politis et al., 2017; Papaneophytou, 2019; Fellermann et al., 2019; Narayanan et al., 2020; Gilman et al., 2021).
Iterative experimental designs have the advantage of using information from previous experiments to inform future ones. Bayesian optimization is an iterative global black box optimization strategy (Snoek et al., 2012; Shahriari et al., 2016) which has proven effective for design of biomolecular experiments including vaccine production (Rosa et al., 2022), antibody development (Khan et al., 2023), design and manufacturing of proteins and tissues (Romero et al., 2013; Mehrian et al., 2018; Narayanan et al., 2021; Gamble et al., 2021), validation of molecular networks (Sedgwick et al., 2020) and extracellular vesicle production (Bader et al., 2023). In Bayesian optimization, a surrogate model, usually a Gaussian process, of the system is built using data and an acquisition function decides which data point to collect next. Gaussian processes are a powerful tool for designing biological experiments in low data regimes due to their uncertainty estimates (Hie et al., 2020).
When many similar biological sequences need to be designed, it can be harder to optimize all the sequences within the experimental budget. Optimizing each sequence from scratch discards useful information from previous tasks, meaning more experiments are required. An alternative is to use transfer learning — a technique that improves the learning of new sequences by using knowledge gained from other optimization tasks (Zhuang et al., 2021). Transfer learning is closely related to multi-task learning, where information is shared between tasks that are optimized at the same time. The approach outlined here can be used for either, and we will use transfer learning as an umbrella term for both.
As we require our surrogate model to be data efficient and have uncertainty quantification, we consider four Gaussian process models: an average Gaussian process (AvgGP), the multi-output Gaussian process (MOGP), the linear model of coregionalization (LMC) and the latent variable multi-output Gaussian process (LVMOGP). The key difference between these Gaussian process models lies in their handling of correlations between outputs: from no correlation in the MOGP to non-linear correlation in the LVMOGP.
We apply these surrogate models in conjunction with Bayesian optimization for efficient optimization of biomolecules, as shown in Figure 1. We focus specifically on the development of a new modular diagnostic assay, based on competitive polymerase chain reaction (PCR), for measuring expression of multiple genes simultaneously, giving a single end point readout (Goertz et al., 2023). This diagnostic requires many competitor DNA sequences to be optimized to have the correct amplification properties in PCR reactions, and we believe the relationship between the responses of the competitors may be non-linear. For optimal results, these competitors should have a predefined amplification curve rate, and a nuisance drift factor should ideally be below a certain threshold to allow for a more stable readout.
Figure 1:
Design of experiments workflow for optimizing the competitor DNA molecules. (A) Data is collected in the lab using a DNA amplification reaction assay. (B) The rate and drift are then calculated by fitting amplification curves. (C) A transfer learning surrogate model uses the data to predict the rate and drift for each of the given competitors. The LVMOGP is introduced in Section 2.2.3. Information is shared through the latent space, with one point on the latent space for each competitor. The shaded regions indicate the uncertainty. The 3D plots are predictions of the model for given competitors. (D) The Bayesian optimization algorithm, introduced in Section 2.3, combines information about the rate and drift surfaces in an acquisition function to select the experiment to run for each competitor. The solid lines in the rate and drift plots represent the mean of the Gaussian process models, while the shaded regions are 2 × standard deviation. This process is repeated until all optimal competitor sequences are found or the experimental budget is exhausted.
We use synthetic data experiments to compare the Gaussian process models in different settings. We then use cross-validation to verify the benefit of the LVMOGP for modeling the response of the competitors, using data from DNA amplification experiments. We confirm that a LVMOGP surrogate model in conjunction with the design of experiments workflow speeds up optimization of the competitors both when only the single objective of rate is optimized and when rate is optimized with a penalty on drift over a given threshold.
2. Materials and Methods
2.1. Gaussian Process Regression
A Gaussian process is a stochastic process representing an infinite collection of random variables, the joint distribution of any subset of which is a multi-dimensional Gaussian distribution (Rasmussen and Williams, 2006). A Gaussian process is fully defined by its mean and covariance functions:
$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}),\ k(\mathbf{x}, \mathbf{x}')\right)$  (1)

where $\mathbf{x} \in \mathbb{R}^D$ is our input. For a full nomenclature see Appendix A. We assume our output data $y$ to be noisy evaluations of $f(\mathbf{x})$:

$y = f(\mathbf{x}) + \epsilon$  (2)

where $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$ and $\sigma_n^2$ is the noise variance.
The choice of mean function, kernel and hyperparameter initializations for a given application depends on prior information about the system; see Appendix D for more details. Often this implies setting the mean function to zero, which is what we do here. A common kernel function is the squared exponential, which is a stationary kernel that assumes the data-generating function is smooth:
$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\sum_{d=1}^{D}\frac{\left(x_d - x'_d\right)^2}{2\ell_d^2}\right)$  (3)

where $\sigma_f^2$ is the kernel variance and $\ell_d$ is the lengthscale of dimension $d$ (Rasmussen and Williams, 2006, Chapter 4). Given a set of training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, the training inputs can be aggregated into the matrix $X$ and the training observations aggregated into the vector $\mathbf{y}$. It is then possible to write a joint distribution of the training observations $\mathbf{y}$ and predicted function values $\mathbf{f}_*$ at prediction locations $X_*$. Thus, the mean and covariance of the Gaussian process at the prediction points can be calculated respectively:
$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\ \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$  (4)

$\boldsymbol{\mu}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1}\mathbf{y}$  (5)

$\Sigma_* = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1}K(X, X_*)$  (6)
The hyperparameters $\boldsymbol{\theta}$ are optimized by maximizing the marginal likelihood $p(\mathbf{y} \mid X, \boldsymbol{\theta})$, which can be calculated in closed form (Rasmussen and Williams, 2006, Chapter 2).
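As an illustrative sketch (not the paper's code), fitting a Gaussian process by type II maximum likelihood and computing the posterior of Equations (5) and (6) can be done in GPflow, the library used in this work; the toy data and settings below are ours:

```python
import gpflow
import numpy as np

# Toy data: noisy evaluations of an unknown function.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (20, 1))
Y = np.sin(6 * X) + 0.1 * rng.standard_normal((20, 1))

# GP with a squared exponential kernel and zero mean function (the default).
model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())

# Maximize the marginal likelihood with respect to the hyperparameters.
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Posterior mean and variance at prediction locations, Equations (5) and (6).
X_new = np.linspace(0, 1, 100)[:, None]
mean, var = model.predict_f(X_new)
```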
2.2. Gaussian Processes with Multiple Outputs
2.2.1. Independent Gaussian Processes with Shared Kernel
The multi-output Gaussian process (MOGP) allows for multiple outputs $f_p$, $p = 1, \dots, P$ (Álvarez et al., 2012). All outputs have the same kernel function and hyperparameters, but function values on different outputs are uncorrelated. This means the kernel of the MOGP is block diagonal, with $k_{\mathrm{MOGP}}\left((\mathbf{x}, p), (\mathbf{x}', p')\right) = k(\mathbf{x}, \mathbf{x}')$ if $p = p'$ and $0$ if $p \neq p'$, where $p$ is the output index. The joint distribution for two outputs $\mathbf{f}_p$ and $\mathbf{f}_{p'}$ evaluated at points $X_p$ and $X_{p'}$ is given by:

$\begin{bmatrix} \mathbf{f}_p \\ \mathbf{f}_{p'} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\ \begin{bmatrix} K(X_p, X_p) & \mathbf{0} \\ \mathbf{0} & K(X_{p'}, X_{p'}) \end{bmatrix}\right)$  (7)
We use the MOGP to demonstrate the setting of no transfer of information about function values.
2.2.2. Linear Model of Coregionalization
The linear model of coregionalization (LMC) extends the MOGP to model linear correlations between output surfaces by assuming they are linear combinations of Gaussian process latent functions:
$f_p(\mathbf{x}) = \mathbf{w}_p^\top \mathbf{u}(\mathbf{x}) + v_p(\mathbf{x}) + \mu_p$  (8)

where $\mathbf{w}_p$ is a vector of weights, $\mathbf{u}(\mathbf{x}) = [u_1(\mathbf{x}), \dots, u_Q(\mathbf{x})]^\top$ are shared latent functions, $v_p(\mathbf{x})$ is a latent function that allows for some independent behavior and $\mu_p$ is a learned constant (Álvarez et al., 2012; Bonilla et al., 2007).
This leads to a Kronecker structured kernel, such that the joint distribution between two functions $f_p(\mathbf{x})$ and $f_{p'}(\mathbf{x}')$ is given by:

$\mathrm{cov}\left(f_p(\mathbf{x}), f_{p'}(\mathbf{x}')\right) = \sum_{q=1}^{Q} b^{q}_{p,p'}\, k_q(\mathbf{x}, \mathbf{x}')$  (9)

where $b^{q}_{p,p'}$ is an element of $B_q$, a matrix determining the similarity between functions, and there are $Q$ different covariance functions $k_q$. If $Q = 1$, this is known as the intrinsic coregionalization model (Álvarez et al., 2012).
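As a rough sketch of how such a model can be built, GPflow's Coregion kernel implements $B = WW^\top + \mathrm{diag}(\kappa)$ over an appended output-index column; the data and settings below are illustrative, not the paper's:

```python
import gpflow
import numpy as np

P, rank = 2, 1  # number of outputs and rank of B (illustrative values)

# Product of a kernel over the input with a coregionalization kernel over
# the output index, giving cov(f_p(x), f_p'(x')) = B[p, p'] * k(x, x').
kern = gpflow.kernels.SquaredExponential(active_dims=[0]) * gpflow.kernels.Coregion(
    output_dim=P, rank=rank, active_dims=[1]
)

# Inputs are augmented with an integer output index in the last column.
X_aug = np.array([[0.1, 0.0], [0.5, 0.0], [0.2, 1.0], [0.9, 1.0]])
Y = np.array([[0.3], [0.7], [0.2], [0.8]])

model = gpflow.models.GPR((X_aug, Y), kernel=kern)
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)
```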
Coregionalization methods have successfully been used for Bayesian optimization (Cao et al., 2010; Swersky et al., 2013; Tighineanu et al., 2022) and applied to the optimization of synthetic genes (González et al., 2015) and chemical reactions (Taylor et al., 2023). However, coregionalization methods assume the response surfaces are linear combinations of a small number of latent functions, so they can fail to fit and predict well on data with non-linear similarity between surfaces.
2.2.3. Latent Variable Multi-output Gaussian Process
The latent variable multi-output Gaussian process (LVMOGP) introduced by Dai et al. (2017) can model non-linear similarities. It does so by augmenting the input domain of a Gaussian process with a $D_Z$-dimensional latent space $\mathcal{Z}$. Each output function has a latent variable, such that the latent variables are denoted by $Z = [\mathbf{z}_1, \dots, \mathbf{z}_P]^\top$. The LVMOGP assumes output $p$ is generated by:

$f_p(\mathbf{x}) = f(\mathbf{x}, \mathbf{z}_p)$  (10)

where $f \sim \mathcal{GP}\left(0, k\left((\mathbf{x}, \mathbf{z}), (\mathbf{x}', \mathbf{z}')\right)\right)$. The latent space allows the LVMOGP to automatically transfer learn between output functions, as it will cluster similar output functions together and place wildly different ones far apart in the latent space. The distance in the latent space and the latent space lengthscale determine the amount of correlation between different output functions. To account for uncertainty in the placement of the latent variables, they are treated as distributions rather than point estimates, such that $\mathbf{z}_p \sim \mathcal{N}(\boldsymbol{\mu}_p, \sigma_p^2 I)$. For more details on the implementation of the LVMOGP see Appendix B.
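The following minimal sketch (ours, not the paper's implementation) illustrates the idea with point-estimate latent variables: a single squared exponential kernel is evaluated over the augmented input $[\mathbf{x}, \mathbf{z}_p]$, so outputs with nearby latent vectors are strongly correlated:

```python
import numpy as np

def se_kernel(a, b, variance=1.0, lengthscales=1.0):
    # Squared exponential kernel between two augmented input vectors.
    d = (a - b) / lengthscales
    return variance * np.exp(-0.5 * np.sum(d * d))

def lvmogp_cov(x1, p1, x2, p2, Z):
    # Covariance between f_{p1}(x1) and f_{p2}(x2) in a point-estimate
    # LVMOGP: one GP over the augmented input [x, z_p]. Outputs whose latent
    # vectors are close (relative to the latent lengthscale) are strongly
    # correlated; distant ones are nearly independent.
    return se_kernel(np.concatenate([x1, Z[p1]]), np.concatenate([x2, Z[p2]]))

Z = np.array([[0.0, 0.1], [0.1, 0.0], [3.0, 3.0]])  # 3 outputs, 2-D latent space
x = np.array([0.5])
print(lvmogp_cov(x, 0, x, 1, Z))  # high: outputs 0 and 1 are close in latent space
print(lvmogp_cov(x, 0, x, 2, Z))  # near zero: output 2 is far from both
```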
Similar latent variable models have been used for Bayesian optimization of material development (Zhang et al., 2020) and for transfer learning across cell lines (Hutter et al., 2021). However, these methods treat the latent variables as point estimates rather than distributions as in the LVMOGP, which can cause poor uncertainty estimates, especially at low data regimes.
2.2.4. Comparison of Gaussian Process Models
In our comparisons, we include a fourth model called the average Gaussian process (AvgGP), which treats all the data as if it has come from the same response surface. Figure 2 shows predictions of the four Gaussian process models on a toy data set with linear correlation between output functions. See Appendix C for details of the data generation. As the AvgGP doesn't differentiate between surfaces, it doesn't fit any response surface well. The MOGP only shares hyperparameters but no information about function values between response surfaces, meaning it makes worse predictions and has more uncertainty on new response surfaces. The LMC has a better mean prediction than the MOGP as it shares information between response surfaces. The LVMOGP similarly has a better mean prediction than the MOGP as it shares information across response surfaces through the latent space. If $Q = 1$ and $B$ is the identity matrix, then the LMC recovers the MOGP. If a linear kernel is applied to the latent dimensions of the LVMOGP, the LMC is recovered, and by making the distance between latent variables large relative to the lengthscale, the MOGP can be recovered too. The fact there are hyperparameter settings for the LMC and LVMOGP that recover the MOGP is promising for preventing negative transfer, as in the case where there is no correlation between response surfaces they can just revert to the MOGP. However, this is only true for large data sets — in low data regimes, we may expect some negative transfer in the no correlation case, due to uncertainty in the hyperparameter values and, in the case of the LVMOGP, a prior on the existence of correlations.
Figure 2:
Predictions of the four Gaussian process models fitted to a toy dataset with linear correlation between output surfaces. The dots are the data, the dashed line is the true function, the solid line is the Gaussian process mean prediction and the shaded region is two times the predicted standard deviation, meaning around 95% of the data points should lie within the shaded region. The bottom row explains how data is transferred between the surfaces by each model. For the average Gaussian process (AvgGP), all data is assumed to be from the same surface; for the multi-output Gaussian process (MOGP), information is transferred only about the hyperparameter values, not the function values. In the linear model of coregionalization (LMC), information is transferred via the similarity matrix $B$, and in the latent variable multi-output Gaussian process (LVMOGP) it is transferred through the latent space. Theoretically, the LMC and LVMOGP can learn whether information can be transferred and, if so, how much.
2.2.5. Gaussian Process Implementation
Details of the data processing and hyperparameter initializations can be found in Appendix D. All coding was done in Python 3.9. The Gaussian process models were implemented using GPflow 2.3.0 (Matthews et al., 2017). GPflow has implementations of the standard Gaussian process, the MOGP and the LMC. Our LVMOGP was implemented as a new GPflow model class, which can be accessed via the GitHub links in Appendix E. Other packages used include PyMC3 3.11.4 (Salvatier et al., 2016) for Bayesian parameter estimation, NumPy 1.21.4 (Harris et al., 2020), SciPy 1.7.1 (Virtanen et al., 2020) and pandas 1.3.4 (The pandas development team, 2023) for data processing, and Matplotlib 3.4.3 (Droettboom et al., 2015) for visualization.
2.3. Bayesian Optimization
Bayesian optimization is a sequential experimental design strategy for finding the global minimum (or maximum) of an objective function (Shahriari et al., 2016; Snoek et al., 2012). As the objective function is unknown, a surrogate model is used to represent the posterior belief of the objective function and updated every time a new data point is observed. An acquisition function is then used to select the next data point to collect. A common acquisition function is the expected improvement which trades off exploration of regions with little data and exploitation of regions which are expected to be optimal (Jones et al., 1998; Garnett, 2023). This process is repeated until the optimum has been found or the experimental budget is exhausted.
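A generic sketch of this loop is given below; fit_surrogate, acquisition and run_experiment are hypothetical stand-ins for, e.g., fitting a Gaussian process, evaluating expected improvement and running a lab assay:

```python
import numpy as np

def bayes_opt(candidates, fit_surrogate, acquisition, run_experiment, budget):
    # Generic Bayesian optimization loop (sketch).
    X, Y = [], []
    for _ in range(budget):
        model = fit_surrogate(X, Y)                    # update surrogate
        scores = [acquisition(model, x) for x in candidates]
        x_next = candidates[int(np.argmax(scores))]    # most promising point
        X.append(x_next)
        Y.append(run_experiment(x_next))               # collect new data
    return X, Y
```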
2.3.1. Acquisition Function
Rather than maximizing or minimizing the rate, as is usual in Bayesian optimization, we wish to minimize the difference between the rate, $r(\mathbf{x})$, and the target rate, $r_T$:

$\min_{\mathbf{x}} \left| r(\mathbf{x}) - r_T \right|$  (11)
Therefore, we use the target vector optimization acquisition function, which extends the expected improvement acquisition function to minimize the Euclidean distance between a target vector and a vector of the current predicted values (Uhrenholt and Jensen, 2019). As we are only optimizing the rate, we use their formulation with scalars instead of vectors. In this formulation, a stochastic variable $g(\mathbf{x}) = \left(y(\mathbf{x}) - r_T\right)^2$ is defined, where $y(\mathbf{x})$ is the output value at input $\mathbf{x}$ and $r_T$ is our target value. The distribution of $g$ is modeled with the aim of minimizing $g$. If the response surfaces are Gaussian processes, then $g$ can be approximated using a non-central $\chi^2$ distribution (Uhrenholt and Jensen, 2019). The expected improvement for this non-central $\chi^2$ distribution is expressed as:

$EI(\mathbf{x}) = \mathbb{E}\left[\max\left(g_{\min} - g(\mathbf{x}),\ 0\right)\right]$  (12)

where $g_{\min}$ is the minimum $g$ observed so far. The expectation is evaluated in closed form using $\bar{\sigma}$, the root mean of the variances of each output evaluated at the training points, and $\tilde{F}$, an approximate cumulative non-central $\chi^2$ distribution with non-centrality parameter $\lambda$, as defined in the paper (Uhrenholt and Jensen, 2019).
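As a sketch of how this acquisition can be evaluated in the scalar case, one can use the exact non-central $\chi^2$ distribution from SciPy in place of the approximation used in the paper; the function and names below are ours:

```python
import numpy as np
from scipy.stats import ncx2

def target_ei(mu, var, target, g_min, k=1):
    # Expected improvement on g(x) = (f(x) - target)^2 when f(x) ~ N(mu, var),
    # so that g / var is non-central chi-squared with k degrees of freedom
    # and non-centrality lam.
    lam = (mu - target) ** 2 / var
    g = g_min / var  # improvement threshold in normalized units
    # E[max(g - X, 0)] = g F_k(g) - E[X 1{X < g}], using the identity
    # E[X 1{X < g}] = k F_{k+2}(g) + lam F_{k+4}(g) for non-central chi-squared.
    ei = (
        g * ncx2.cdf(g, k, lam)
        - k * ncx2.cdf(g, k + 2, lam)
        - lam * ncx2.cdf(g, k + 4, lam)
    )
    return var * ei
```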
2.3.2. Bayesian Optimization with Drift Penalty
To ensure the drift value remains below, or close to, the threshold, we use the probability of feasibility to encourage the algorithm to select points that have a high chance of being below the threshold (Schonlau et al., 1998):

$PF(\mathbf{x}) = P\left(f_d(\mathbf{x}) \leq T_d\right) = \Phi\left(\frac{T_d - \mu_d(\mathbf{x})}{\sigma_d(\mathbf{x})}\right)$  (13)

where $f_d(\mathbf{x})$ is the value of the drift function at $\mathbf{x}$, $T_d$ is the drift threshold, $\Phi$ is the standard Gaussian cumulative distribution function, and $\mu_d(\mathbf{x})$ and $\sigma_d(\mathbf{x})$ are the predictive mean and standard deviation of the drift Gaussian process. We then multiply the expected improvement by the probability of feasibility to get our final acquisition function:

$\alpha(\mathbf{x}) = EI(\mathbf{x}) \times PF(\mathbf{x})$  (14)
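A sketch of the combined acquisition, assuming hypothetical rate and drift surrogates that expose a predict(x) -> (mean, variance) method, and the target_ei function sketched above:

```python
import numpy as np
from scipy.stats import norm

def penalized_acquisition(model_r, model_d, x, target, g_min, threshold):
    # Multiply the rate expected improvement by the probability that the
    # drift Gaussian process is below the threshold, Equation (14).
    mu_r, var_r = model_r.predict(x)
    mu_d, var_d = model_d.predict(x)
    ei = target_ei(mu_r, var_r, target, g_min)
    pof = norm.cdf((threshold - mu_d) / np.sqrt(var_d))
    return ei * pof
```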
The probability of feasibility has been used for optimization applications including analog circuits (Lyu et al., 2018) and materials design (Sharpe et al., 2018).
2.3.3. Performance Metrics
For both the synthetic experiments and the cross-validation experiments we assessed the fit of Gaussian process models with two performance metrics: root mean squared error (RMSE):
$\mathrm{RMSE} = \sqrt{\frac{1}{n_*}\sum_{i=1}^{n_*}\left(y_i - \mu_i\right)^2}$  (15)
and negative log predictive density (NLPD):
$\mathrm{NLPD} = -\frac{1}{n_*}\sum_{i=1}^{n_*}\log p\left(y_i \mid \mathbf{x}_i\right)$  (16)

$\log p\left(y_i \mid \mathbf{x}_i\right) = -\frac{1}{2}\log\left(2\pi\sigma_i^2\right) - \frac{\left(y_i - \mu_i\right)^2}{2\sigma_i^2}$  (17)
These are both calculated on a test set of $n_*$ input locations, where $\mu_i$ and $\sigma_i^2$ are the predictive mean and variance at test input $\mathbf{x}_i$. The RMSE is useful for comparing the mean predictions of the Gaussian processes, while the NLPD also indicates how good the uncertainty estimate is, both of which are important for effective exploration and exploitation. For assessing the Bayesian optimization algorithm, we use cumulative regret:

$R = \sum_{t=1}^{T}\left(\left|y^{(r)}_t - r_T\right| - \left|y^{\dagger} - r_T\right| + \rho\left(y^{(d)}_t\right)\right)$  (18)

where $y^{(r)}_t$ and $y^{(d)}_t$ are the rate and drift training data, $y^{\dagger}$ is the data point closest to the target out of both training and candidate sets for that surface, and $\rho(\cdot)$ is a penalty for exceeding the drift threshold.
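Minimal reference implementations of the two predictive metrics (ours, assuming Gaussian predictive distributions):

```python
import numpy as np

def rmse(y, mu):
    # Root mean squared error of the predictive means, Equation (15).
    return np.sqrt(np.mean((y - mu) ** 2))

def nlpd(y, mu, var):
    # Negative log predictive density under Gaussian predictions,
    # Equations (16)-(17); penalizes poor means and poor uncertainty alike.
    return np.mean(0.5 * np.log(2 * np.pi * var) + (y - mu) ** 2 / (2 * var))
```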
2.4. Data Collection
Each competitor has predefined primers and fluorescent probes and a design region where the sequence can be altered. Rather than tackling the difficult combinatorial problem of optimizing the sequence directly, we reduce the problem to two key input variables: the number of base pairs (BP) and guanine-cytosine content (GC) as in Figure 3. This converts the design space into a more manageable continuous form and reduces the input dimensions, which is beneficial when data is limited. For each BP-GC combination, chosen by an expert researcher, a polymerase chain reaction (PCR) assay generates an amplification curve, from which rate and drift are calculated. In total, we have data on 34 different competitors and wish to optimize 16 of these. Across the 34 competitors, we have 592 data points at 327 unique input locations, with 1 to 6 repeats at each location. See Appendix F for a summary of the data.
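For illustration, the two inputs can be computed from a design-region sequence as follows (a sketch; the function name is ours):

```python
def bp_gc(design_region: str):
    # Number of base pairs and fractional guanine-cytosine content.
    seq = design_region.upper()
    bp = len(seq)
    gc = (seq.count("G") + seq.count("C")) / bp
    return bp, gc

bp_gc("ATGCGC")  # -> (6, 0.666...)
```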
Figure 3:
Schematic of the competitor design space. For a given competitor DNA molecule, the primers and fluorescent probe regions are fixed. We can edit the design region to ensure the sequence has a given number of base pairs and guanine-cytosine content. Changing the number of base pairs and guanine-cytosine content affects the rate and drift of the competitor, allowing us to fine-tune to the rate and drift required for the diagnostic assay.
The rate and drift for each amplification curve were calculated using the following equations:
(19)

(20)

where $F_e$ and $F_0$ are the end point and starting fluorescence, $K$ is the carrying capacity, $r$ is the rate, $d$ is the drift and $c$ is the cycle number.
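As an illustration only, curve fitting of this kind can be done with SciPy; the parametric form below is a hypothetical logistic curve with a linear drift term, not the exact model used in the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def amplification_curve(c, F0, K, r, d, c_half):
    # Hypothetical sigmoidal amplification model with linear drift.
    return F0 + K / (1.0 + np.exp(-r * (c - c_half))) + d * c

cycles = np.arange(40.0)
# fluorescence = measured curve from the PCR instrument, then:
# popt, _ = curve_fit(amplification_curve, cycles, fluorescence,
#                     p0=[0.1, 1.0, 0.5, 0.01, 20.0])
# rate, drift = popt[2], popt[3]
```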
2.4.1. Polymerase Chain Reactions
To perform the PCR reactions, we used an Applied Biosystems QuantStudio 6 Flex with Applied Biosystems MicroAmp EnduraPlate Optical 384-well plates (Thermo Fisher Scientific, Waltham, MA, USA). The thermocycling stages consisted of a melt step at 95°C for 3 seconds and an annealing step at 60°C. All reactions had a volume of 10 μL and used Applied Biosystems TaqMan Fast Advanced Master Mix. Either fluorescent probes or EvaGreen dye (Biotium, Fremont, CA, USA) were used as reporters.
2.4.2. DNA Sequences
For each BP-GC combination for a given competitor, NUPACK (Zadeh et al., 2011) was used to create a DNA sequence with the correct number of base pairs and guanine-cytosine content, as well as the correct sequences for the primer and probes. These sequences, alongside synthetic natural target analogs, were purchased from Twist Biosciences (San Francisco, CA) or as eBlock Gene Fragments from Integrated DNA Technologies (“IDT”, Coralville, IA, USA). Primers and probes were also purchased from IDT.
3. Results
3.1. Synthetic Data Experiments
To explore the performance of the MOGP, AvgGP, LMC and LVMOGP, we ran experiments on synthetic data sets representing three test cases: uncorrelated, linearly correlated and horizontally offset response surfaces. All synthetic experiments had two response surfaces each with 30 points observed and 10 new response surfaces with no points observed initially. We added one random point to each new response surface every iteration and recorded the RMSE and NLPD for the Gaussian process models’ predictions. Figure 4 shows the RMSEs and NLPDs of the Gaussian process models for these test settings.
Figure 4:
Results of experiments with synthetically-generated data. The plots on the left show example data-generating functions used for the synthetic experiments. The plots on the right show the RMSE and NLPD for the three different test response surface types for each of the Gaussian process models. New points are added randomly, and each line is the mean of 5 different randomly generated data sets, all generated from the same test functions.
For the uncorrelated test case, response surfaces were generated as independent samples of a Gaussian process prior with a squared exponential kernel. This test case was to check for negative transfer, where the sharing of information hinders rather than aids the learning process. In Figure 4, for the uncorrelated case the MOGP outperforms the other Gaussian process models for RMSE and NLPD until approximately 10 data points, although the RMSE of the MOGP at this point is still high. We expect the LMC and LVMOGP to have some negative transfer at very low data regimes as they have a prior expectation of correlations between response surfaces. However, with enough data, they should perform the same as the MOGP, which is corroborated by the results in Figure 4. Specifically, once the MOGP gets a reasonably low RMSE of < 0.25, the LMC and LVMOGP have achieved similar performance.
The response surfaces for the linearly-correlated test case were created as linear combinations of two latent functions, both generated as independent samples of a Gaussian process with the squared exponential kernel in Equation 3. The LMC outperforms the other two Gaussian process models except at very low data regimes, which is likely due to overconfidence of the LMC when it has little data. The LMC and LVMOGP outperform the MOGP even at high data regimes, showing the advantage of transfer learning.
The horizontally offset test case was chosen as a simple example where the LMC struggles to fit the data. The response surfaces were generated by offsetting a sigmoid function horizontally by a random constant. In this case, the LVMOGP outperforms the other Gaussian process models for both RMSE and NLPD. This is because the LVMOGP can learn new surfaces with very few data points, as all it needs to do is to correctly predict where the sloped region is. The LMC performs worse than the LVMOGP because the offset cannot be represented by a linear combination of its latent functions, meaning it requires more data to perform as well.
Across all the test cases, the LMC has poor NLPD at low data regimes. This is likely because it cannot express uncertainty in the deterministic similarity matrix $B$.
3.2. Prediction of DNA Amplification Experiments
The performance of the proposed design of experiments workflow was validated using data from competitor DNA amplification experiments. This was done in three parts: first cross-validation was performed to compare the predictive accuracy of the Gaussian process models; then a Bayesian optimization procedure was used to optimize only the rate; finally the Bayesian optimization with drift penalty procedure was applied.
In cross-validation, the training set consisted of all the data from the two competitors that had the most observations, as well as a random subset of the remaining data, ensuring all competitors had at least one data point. This was repeated 70 times for each percentage of data in the training set. We set both the rank of $B$ for the LMC and the number of latent dimensions of the LVMOGP to 10; see Appendix D.5 for a discussion on setting these parameters. Figure 5 shows the RMSE and NLPD of the Gaussian process models' predictions. The LVMOGP outperforms the other Gaussian process models for both RMSE and NLPD for both rate and drift. The LMC has poor NLPD in comparison to the other Gaussian process models, suggesting it has poor uncertainty estimates.
Figure 5:
Results of cross-validation on the DNA amplification data for both rate and drift. For each cross-validation run, the training set consisted of all the data from two competitors and a random subset of the data on the remaining competitors, ensuring all competitors had at least one data point. This is repeated for different percentages of data in the training set, and for each percentage, it is repeated 70 times.
The AvgGP model shows little improvement with increased amounts of training data. This shows the limitations of averaging the surfaces and justifies modeling each response surface separately.
3.3. Optimization of DNA Amplification Experiments
Ideally, for the Bayesian optimization experiments we would integrate the algorithm into the experimental loop, collecting new data with each new recommendation of each Gaussian process model. However, due to the cost of experiments, this was infeasible. Instead, we performed retrospective Bayesian optimization using the existing competitive DNA amplification dataset. The data was split into training and candidate sets, with the design of experiments algorithm only allowed to choose the next point out of the candidate set. Bayesian optimization was run iteratively until all points had been selected or up to a maximum number of iterations, whichever happened first.
Two learning scenarios were tested: the learning many scenario, where all data from the two competitors with the most data were fully observed to begin with and then 16 competitors were optimized in parallel; and the one at a time scenario, where each of the 16 competitors was optimized individually, with the 33 remaining competitors included in the training set. For a discussion of the effect of the choice of initial surfaces, see Appendix G.5. These scenarios replicate likely lab experimentation scenarios — the first for when many competitors need to be optimized at once, and the second for when many competitors are already optimized and an extra one is added. The maximum number of iterations was 15 for the rate-only optimization and 20 or 10 for the penalized optimization, depending on the learning scenario.
We also considered two methods for choosing the first experiment for a new competitor with no previously observed data. Choosing the most central data point (center in Figure 6) offers both maximum reduction in variance across the response surface and ensures all competitor response surfaces have a comparable point, which may help the transfer learning methods determine their similarities. It is also a reasonable approximation of what a human experimenter without prior knowledge of the response surface might do. The second method is to let the Gaussian process model choose the first point (model's choice in Figure 6) for a new competitor. For the AvgGP and the LVMOGP, this is possible as they can make posterior predictions on new response surfaces. For the LVMOGP, the latent variable of the new surface is determined as a weighted average of the latent variables of the response surfaces with data that have the same probe and at least one matching primer. If there are no surfaces with matching primers, we use a weighted average of the surfaces with the same probe. For the LMC and MOGP we have no posterior, so the first point is selected randomly. For these experiments, we set the rank of $B$ for the LMC and the number of latent dimensions of the LVMOGP to 2, as discussed in Appendix D.5.
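A sketch of the weighted-average initialization of a new competitor's latent variable (our names; the weighting by shared probe and primers is abstracted into a weights argument):

```python
import numpy as np

def init_new_latent(Z_matched, weights):
    # Z_matched: (n, D_Z) latent means of competitors sharing the same probe
    # and at least one primer; weights: their (hypothetical) similarity scores.
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * np.asarray(Z_matched)).sum(axis=0) / w.sum()
```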
Figure 6:
Cumulative regret of each of the Gaussian process models for single objective (left) and penalized (right) Bayesian optimization. Each line indicates the mean across 24 random seeds and all competitors, while the shaded regions indicate the upper and lower 5% quantiles by random seed. The top row is when the first point on each new surface is selected as being the center point, and the bottom is when the model is allowed to choose the first point. The learning many scenario is when many competitors are being optimized at the same time, and the one at a time scenario is when one competitor is being optimized, with all others being in the training set.
3.3.1. Single Objective Bayesian Optimization
The left panel of Figure 6 shows the results of optimizing rate without considering the drift penalty. The variance in the results comes from three sources. The first is the random selection of the next point when two points have the same expected improvement — this causes unavoidable variation. The second is due to the Gaussian process models optimizing to different hyperparameter values due to different initializations. The different values arise because the optimization of the non-convex hyperparameter loss surfaces is difficult. The final source of variation is the random starting point for the MOGP and LMC.
In all cases, Figure 6 shows the LVMOGP has much lower cumulative regret than the other models. The LVMOGP also reaches the best point first the most often: 808 times across all competitors, learning scenarios and seeds, compared to 457, 498 and 484 for the MOGP, AvgGP and LMC respectively. See Table 2 in Appendix G.1 for a breakdown of these results. The center start point allows us to compare the performance of the Gaussian process models without being skewed by the first point. In this case, the LMC and LVMOGP have the lowest cumulative regret, with mean values of 1.08 and 0.91 respectively at the end of optimization, compared to 1.21 and 1.28 for the MOGP and AvgGP in the learning many case. The ordering changes between the center and model's choice starting points, as in the latter the AvgGP and the LVMOGP are able to predict on new surfaces, giving them an advantage over the LMC and the MOGP when choosing the first point. For example, in the learning many, model's choice scenario, the mean regrets of the first points selected by the LVMOGP and the AvgGP are 0.464 and 0.499 respectively, compared to 0.651 and 0.703 for the MOGP and the LMC. Table 8 in Appendix G.3 lists the mean regrets of the first points.
As the one at a time scenario includes the data from all other competitors, the Gaussian process models start with far more data than in the learning many scenario. This means the AvgGP, the LMC and the LVMOGP all have less regret in the one at a time scenario, as they are able to transfer information about the function values of competitors to improve prediction of the target competitor's behavior. This is most notable for the model's choice start point, where the AvgGP, LMC and LVMOGP have final cumulative regrets of 0.93, 1.29 and 0.66 respectively, compared to 1.00, 1.63 and 0.80 for the learning many scenario. The MOGP does not transfer information about function values, so performs relatively worse than the other models in the one at a time scenario, with a final cumulative regret of 1.78, as opposed to 1.69 for the learning many scenario.
Across all learning scenarios and start points, the LVMOGP has the smallest mean number of iterations to get within a tolerance of 0.05 of the value of the best point, taking a mean of 2.25 iterations, while the AvgGP, MOGP and LMC take 2.89, 3.02 and 2.93 respectively. For 16 competitors, this equates to 36 experiments needed for the LVMOGP compared to 49 for the MOGP. See Appendix G.1 for a breakdown by learning scenario and starting point and Appendix G.4 for box plots of the number of experiments taken by each model. The tolerance was set at 0.05 as this is approximately the level of experimental measurement uncertainty in the lab experiments (Goertz et al., 2023).
3.3.2. Bayesian Optimization with Drift Penalty
The right-hand panel of Figure 6 shows the cumulative regret for optimization of the rate with a penalty on the drift. The LVMOGP has the lowest cumulative regret at the end for all scenarios, but doesn’t outperform the other models as much as in the single objective case. In all scenarios, the MOGP, LMC and LVMOGP fail to reach the best point for the same competitor, meaning the cumulative regret curves for these models don’t completely plateau. This is because they overestimate the value of the drift at the best point, so avoid selecting it. The AvgGP does find the best point for all competitors.
The LVMOGP barely outperforms the AvgGP for the learning many scenario with model’s choice starting point. This may be due to negative transfer in the drift predictions at very low data regimes making the selection of the first point sub-optimal.
Similar to the single objective case, the LVMOGP has the smallest mean number of iterations to get within a tolerance of 0.05 of the value of the best point, with a mean of 2.26 iterations compared to 3.38, 3.54 and 3.39 for the AvgGP, the MOGP and the LMC. For 16 competitors, this equates to 37 experiments needed for the LVMOGP compared to 57 for the MOGP, see Appendix G.4 for more details. Appendix G contains further Bayesian optimization results for both the single objective and penalized optimizations.
Figure 7 shows the rate and drift predictions and expected improvement for one iteration. Most notably, the MOGP has no transfer of information, so has almost equal expected improvement for most of the candidate points. The other three models transfer information across the competitors, meaning even with one data point, they have much more complex predictions than the MOGP. We can also see how the AvgGP, MOGP and LMC fit the drift poorly. This is because the drift is of a different order of magnitude depending on the fluorescent probe used. Most of the Gaussian process models are unable to detect this, meaning they end up with a poor fit to the data.
Figure 7:
Predictions for the rate and drift for each of the Gaussian process models. The BP and GC axes are in log and logit scales respectively. These plots show the mean of the Gaussian process model predictions and the uncertainty which here is 2 × standard deviation. The expected improvement with probability of feasibility is then plotted in the final column. This is for the case where we are optimizing competitor FP005-FP004-EvaGreen and have observed one data point so far, with the models able to choose the first point. The black contour lines on the mean plots indicate the target rate and threshold drift values.
4. Discussion
Expensive and time-consuming experiments require an intelligent design of experiments strategy. This study demonstrates how a transfer learning surrogate model can be used in conjunction with Bayesian optimization to optimize biological sequences. For the specific case of designing competitor DNA molecules for a new diagnostic, reducing the number and therefore cost of experiments can help it reach the affordability criteria for point of care settings (Land et al., 2019).
In Bayesian optimization, we need a surrogate function with reliable mean and uncertainty estimates to ensure a balance between exploration and exploitation when selecting new points. Our cross-validation results in Section 3.2 show the LVMOGP has better predictive accuracy than the other Gaussian process models for both rate and drift. These results also demonstrate one of the limitations of the LMC: it has very high NLPD at low data regimes. This implies the LMC has poor uncertainty estimates and is overfitting, a result which has been previously observed (Dai et al., 2017).
To replicate a real-life iterative design of experiments regime, we performed Bayesian optimization on DNA amplification experimental data, but only allowing the models to select new points from existing data. For the single objective optimization case, the LVMOGP has lower cumulative regret than the other Gaussian process models for all test cases and starting points and requires fewer experiments on average to get within 0.05 tolerance of the best point. Specifically, the LVMOGP requires 13 and 20 fewer experiments than the no transfer MOGP model for the single objective and penalized cases respectively. This shows the LVMOGP transfer learning approach is useful both when optimizing multiple competitors at a time, and when using the data from all previous competitors to optimize a new one.
These results also demonstrate the advantage of a surrogate model that can predict unseen surfaces — both the LVMOGP and the AvgGP see a large improvement in regret when they are allowed to select the first point, both outperforming the MOGP and LMC where the first point is chosen at random.
When optimizing new biological sequences, there are often factors we wish to keep within a certain range such as purity (Degerman et al., 2006) or biophysical properties (Khan et al., 2023). While these can be treated as constraints, sometimes we may be willing to violate them slightly if it leads to a large improvement in the objective function. In these scenarios, we can add a penalty. To apply a penalty on the nuisance drift factor, we used the probability of feasibility to penalize any point predicted to be above the threshold drift value. In the penalized optimization, the LVMOGP had less cumulative regret than the other models but the difference in performance was smaller than that of the single objective optimization. This could be due to the added challenge of dealing with the penalty on drift.
There is variation in the performance of the Gaussian process models across random seeds due to the hyperparameter initialization. The LVMOGP has more variation due to its training being a harder optimization problem. While smart initialization and random restarts helped with this issue, future work could simplify the optimization procedures. The optimization of the Gaussian process models is discussed in Appendix D.3.
While the workflow outlined here will be useful for the optimization of new competitor DNA molecules, it is not specific to this application and could be used for other applications where it is necessary to optimize many similar tasks, such as engineering DNA probes (Lopez et al., 2018; Wadle et al., 2016), optimizing conditions for different cell lines (Hutter et al., 2021), inferring pseudotime for cellular processes (Campbell and Yau, 2015) or exploring protein fitness landscapes (Hu et al., 2023). We expect this method will scale well to settings with more output surfaces, and predictions will improve with more data. However, as with most Bayesian optimization approaches, it will not scale as well to high dimensional input settings (Wang et al., 2023).
We opted to use the LVMOGP to demonstrate how we can transfer information between tasks using proximity in latent space as Gaussian processes are data efficient and give good uncertainty predictions. However, we could replace Gaussian processes with any Bayesian model that gives priors over functions such as deep Gaussian processes (Damianou and Lawrence, 2013) or Bayesian neural networks (Goan and Fookes, 2020).
With the rise in lab automation, this workflow can be integrated into a design build test pipeline similar to Carbonell et al. (2018) and HamediRad et al. (2019) which can greatly reduce the time required to optimize new biomolecular components, speeding up the creation of new devices. This method could also be incorporated into hybrid models in bio-processing and chemical engineering, for decision making for systems with many similar components (Narayanan et al., 2023; Mowbray et al., 2021; Schweidtmann et al., 2021).
5. Conclusion
We have shown how a transfer learning design of experiments workflow can be used to optimize many competitor DNA molecules for an amplification-based diagnostics device. We used cross-validation to demonstrate that the latent variable multi-output Gaussian process has the best predictive accuracy and have shown it has the least regret when Bayesian optimization is performed on the DNA amplification data. Future improvements to the optimization of the model hyperparameters would lead to faster and more consistent performance of the algorithm. Despite this, we believe this workflow is applicable to many other biotechnology applications and should be used to reduce the experimental load when there are many similar tasks to be optimized but their similarity is a priori unknown.
A. Nomenclature
Acronyms
AvgGP Average Gaussian Process
BP Number of Base Pairs
DNA Deoxyribonucleic Acid
ELBO Evidence lower bound to marginal likelihood for LVMOGP
GC Percentage Guanine-Cytosine Content
LMC Linear Model of Coregionalization
LVMOGP Latent Variable Multi-output Gaussian Process
MOGP Multi-output Gaussian Process
NLPD Negative Log Predictive Density
PCR Polymerase Chain Reaction
RMSE Root Mean Squared Error
Functions
$\alpha(\mathbf{x})$ Acquisition function including probability of feasibility
$EI(\mathbf{x})$ Expected improvement acquisition function
$u_q(\mathbf{x})$ Latent Gaussian processes in the linear model of coregionalization
$\mathcal{GP}$ Gaussian process
$f(\mathbf{x})$ Function of $\mathbf{x}$
$f_d(\mathbf{x})$ Drift function
$r(\mathbf{x})$ Rate function in competitor amplification
$k(\mathbf{x}, \mathbf{x}')$ Gaussian process covariance function
$K_{ff}$ Covariance function of the data
$K_{fu}$ Cross covariance function between the data and inducing points
$K_{uu}$ Covariance function of the inducing points
$m(\mathbf{x})$ Gaussian process mean function
$PF(\mathbf{x})$ Probability of feasibility
Parameters and Variables
$\ell_d$ Lengthscale of dimension $d$
$\epsilon$ Noise added to $f(\mathbf{x})$, where $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$
$\lambda$ Non-centrality parameter of target vector optimization expected improvement
$g$ Stochastic variable defined as the squared difference between observed outputs and the target value
$\boldsymbol{\mu}_p$ Mean of the $p$th latent variable
$\mathbf{f}_*$ Predictions at locations $X_*$
$\mathbf{z}_p$ Latent variable of the $p$th output function
$I$ Identity matrix
$\mathbf{u}$ Inducing variables
$\mathbf{w}_p$ Vector of weights of the latent functions in the linear model of coregionalization
$\mathbf{x}$ Input location such that $\mathbf{x} \in \mathbb{R}^D$
$\mathbf{y}^{(d)}$ Drift output data
$\mathbf{y}^{(r)}$ Rate output data
$\boldsymbol{\mu}_*$ Predicted mean at locations $X_*$
$K$ Carrying capacity
$\Sigma_*$ Predicted covariance at locations $X_*$
$\sigma_f^2$ Kernel variance
$\sigma_n^2$ Noise variance of Gaussian process
$\sigma_p^2$ Variance of the $p$th latent variable
$c$ Cycle number
$\boldsymbol{\theta}$ Gaussian process hyperparameters
$B$ Coregionalization matrix in the LMC
$D$ Dimensions of $\mathbf{x}$
$F$ Fluorescence in DNA amplification reaction
$F_0$ Fluorescence at the beginning of the DNA amplification reaction
$F_e$ Fluorescence at the end of the DNA amplification reaction
$Z$ Latent variables such that $Z = [\mathbf{z}_1, \dots, \mathbf{z}_P]^\top$
$M$ Mean of the variational distribution on $Z$
$P$ Number of output functions in multi-output Gaussian process
$q$ Variational distribution
$Q$ Number of covariance matrices in the LMC
$S$ Variance of the variational distribution on $Z$
$r_T$ Target rate
$T_d$ Drift threshold
$X$ Training inputs of Gaussian process
$X_*$ Locations to be evaluated
$\mathbf{y}$ Noisy evaluations of $f(\mathbf{x})$
$y^{\dagger}$ Data point which is closest to the target out of the train and test datasets for a given surface
$X_u$ Inducing points
Miscellaneous
$\mathcal{Z}$ The latent space in the LVMOGP
$\tilde{F}$ An approximation to the cumulative non-central $\chi^2$ distribution function
B. Latent Variable Multi-output Gaussian Process Implementation
Gaussian processes are normally trained by maximizing the log marginal likelihood. However, the presence of the latent variable distributions in the LVMOGP means the log marginal likelihood is no longer tractable. Instead, Dai et al. (2017) used variational inference to approximate a lower bound to this log marginal likelihood, following the method proposed by Titsias (2009) and Titsias and Lawrence (2010). In variational inference, the aim is to minimize the Kullback-Leibler divergence between an approximate posterior and a true posterior.
Our implementation of the LVMOGP takes a concatenation of the input data and their corresponding latent variables, $[X, Z_{1:N}]$, where $Z_{1:N}$ denotes the vector of latent inputs for each observed data point. All inputs for the same output dimension $p$ will have the same latent variable, $\mathbf{z}_p$.
For the LVMOGP this variational lower bound is given as:
(21)

where $\langle\cdot\rangle_{q(\mathbf{z}_i)}$ denotes a kernel expectation over the variational distribution of the latent variable of data point $i$. $K_{ff}$ and $K_{uu}$ are the covariance functions of the data and the inducing points respectively, while $K_{fu}$ is the cross covariance function between the two. $\mathrm{Tr}(\cdot)$ is the trace of a matrix. $\mathbf{m}_u$ and $S_u$ are the mean and covariance of the variational distribution over the inducing variables $\mathbf{u}$. The second term in this expression can be viewed as a data fit term, while the last term can be seen as a complexity penalty.
Two types of prediction are relevant for the LVMOGP. The first is when we have new input points $X_*$ and a new position on the latent space, $\mathbf{z}_*$. In this case, the posterior prediction can be calculated in closed form. The second, and more likely, prediction case is when we want to predict at a new input point for an output whose latent variable $\mathbf{z}_p$ we already have data for. This integration is intractable, but following Titsias and Lawrence (2010), the first and second moments can be computed in closed form if using a squared exponential kernel.
C. Toy Dataset Creation
The dataset used in Figure 2 was generated by creating two latent functions from samples of a Gaussian process prior with the squared exponential kernel in Equation 3, and multiplying them by random weights to create the output functions. To ensure the output functions could generate data anywhere, a Gaussian process was fitted to the densely sampled points for each output function. Data was then generated by evaluating the mean of these Gaussian processes at varying input locations and adding Gaussian noise. Similarly to the competitor dataset, the amount and location of data observed on each output function varies.
D. Gaussian Process Implementation
D.1. Data Standardization
For both the synthetic and competitor datasets we standardize the input and output data by subtracting the mean and dividing by the standard deviation, such that:
$\tilde{x} = \frac{x - \bar{x}}{\sigma_x}$  (22)
and similarly for the output data, where $\bar{x}$ and $\sigma_x$ are the mean and standard deviation of the input data. This is common practice for Gaussian process regression as it reduces numerical instability and allows for better interpretability of hyperparameter values, which is useful for initialization.
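For reference, a minimal version of this standardization (ours):

```python
import numpy as np

def standardize(A):
    # Subtract the column means and divide by the column standard deviations.
    return (A - A.mean(axis=0)) / A.std(axis=0)
```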
D.2. Choice of Gaussian Process Prior
When using Gaussian process models for real-world optimization tasks the Gaussian process prior should be informed by existing knowledge of the system. For example, the choice of kernel function can express belief of the smoothness or periodicity of the function and a mean function may be selected if there is a known trend in the data (Rasmussen and Williams, 2006, Chapters 2 & 4). For the competitor design task, we believed the function to be smooth, so opted for the squared exponential kernel in Equation 3. We did not have any prior information about a trend in the data so used a zero mean function.
D.3. Gaussian Process Hyperparameter Training
Gaussian processes are generally trained using the marginal likelihood, which automatically trades off data fit and model complexity, guarding against overfitting (Rasmussen and Williams, 2006, Chapter 5). Ideally, we would perform full Bayesian inference over the Gaussian process hyperparameters, however, this is often difficult and expensive due to the need to use approximation techniques to evaluate intractable integrals (Lalchand and Rasmussen, 2020).
Instead, we use type II maximum likelihood, a common approach of maximizing the marginal likelihood with respect to the hyperparameters. With sufficient data, this approach is justified based on the Laplace approximation and because in practice the posterior for the hyperparameters tends to be highly peaked (MacKay, 1999). However, at lower data regimes, the non-convexity of the marginal likelihood surface can cause overfitting due to multiple modes (Lalchand and Rasmussen, 2020). Low data regimes can also lead to hyperparameters being weakly identified, leading to flat ridges in the marginal likelihood surface, making the optimization sensitive to starting values (Warnes and Ripley, 1987). So, at low data regimes, Gaussian processes trained with type II maximum likelihood can overfit. As more data is collected, the type II maximum likelihood approach becomes a reasonable approximation and the marginal likelihood will automatically trade off model complexity and data fit, preventing overfitting (Rasmussen and Williams, 2006, Chapter 5).
The marginal likelihood optimization surface is non-convex, and therefore gradient-based optimizers will only find local optima, meaning the result is dependent on initialization (MacKay, 1998).
To overcome this, and to reduce overfitting, we use random restarts, alongside principled methods of initialization, to fit the same Gaussian process model multiple times, and then select the hyperparameter configuration with the best log marginal likelihood. These regimes, introduced in Appendix D.4, differ slightly for the different models. We use gradient descent to optimize the marginal likelihood each time.
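A sketch of this restart loop in GPflow (ours; build_model is a hypothetical factory that applies one of the random initializations, and GPflow's Scipy optimizer uses L-BFGS rather than plain gradient descent):

```python
import numpy as np
import gpflow

def fit_with_restarts(build_model, n_restarts):
    # Fit the same model from several random initializations and keep the
    # hyperparameters with the best (lowest) negative log marginal likelihood.
    best_model, best_loss = None, np.inf
    for _ in range(n_restarts):
        model = build_model()
        gpflow.optimizers.Scipy().minimize(
            model.training_loss, model.trainable_variables
        )
        loss = model.training_loss().numpy()
        if loss < best_loss:
            best_model, best_loss = model, loss
    return best_model
```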
D.4. Hyperparameter Initialization
For all models, unless otherwise stated, we initialize the lengthscales randomly, the noise variance randomly such that the noise is between 0% and 10% of the data variance, and the kernel variance to a fixed value. These settings are standard practice for Gaussian process regression (Matthews et al., 2017). For the MOGP and AvgGP we did nine random restarts with these settings.
For the LMC we used three different methods for initializing $W$ and $\kappa$, where $B = WW^\top + \mathrm{diag}(\kappa)$, with three random restarts for each:

- Both $W$ and $\kappa$ random. In this initialization, both $W$ and $\kappa$ are initialized randomly.
- $W$ random and $\kappa$ small. This initialization was chosen as we thought it would favor solutions with small $\kappa$, so it would better fit the linear correlation case, where the test functions are generated as linear combinations of shared latent functions.
- $W$ random and $\kappa$ large. We chose this initialization to favor large $\kappa$, which is useful for the uncorrelated test case, as it would encourage the output functions to behave independently of each other.

The random initializations for $W$ helped for two reasons: firstly, in the GPflow implementation, if $W$ is not initialized it defaults to a rank of 1, and secondly, by initializing $W$ to random values rather than all one value we avoid saddle points on the optimization surface.
For the LVMOGP we used three different initialization procedures, again with three random restarts for each:

- Random. In this initialization, all hyperparameters and variational parameters were initialized randomly, including the means of the latent variables.
- GPy. This is the method used in the GPy implementation of the LVMOGP (Dai et al., 2017), which has the following three steps:
  1. A sparse MOGP is fitted to the data using a set of inducing points which are common to all outputs. The mean predictions of the output function values at these inducing inputs are then calculated:
     $\hat{F} = \left[\hat{\mathbf{f}}_1, \dots, \hat{\mathbf{f}}_P\right]^\top$  (23)
     where $\hat{\mathbf{f}}_p$ is the posterior mean of output $p$ at the inducing inputs. The sparse MOGP is used to ensure all output functions are observed at the same input locations for the functional PCA, which is necessary when data is observed at different locations on different surfaces. It also serves the purpose of smoothing the data, and the trained lengthscales are used to initialize the lengthscales of the observed dimensions of the LVMOGP.
  2. The mean predictions are then used as inputs to functional PCA. The first $D_Z$ eigenvectors and eigenvalues of the covariance of $\hat{F}$ are calculated and used to project $\hat{F}$ into the latent space:
     $Z_0 = \hat{F} V$  (24)
     where $V$ is the matrix of the first $D_Z$ eigenvectors. The relative contribution of each of the eigenvalues is also calculated as:
     $\gamma_q = \frac{\lambda_q}{\sum_{q'} \lambda_{q'}}$  (25)
  3. The latent variables from the functional PCA are used to initialize the latent variables of a Bayesian Gaussian process latent variable model, whose lengthscales are initialized using the relative eigenvalue contributions. Once the Bayesian Gaussian process latent variable model is trained, its latent variables and hyperparameters are used to initialize those of the LVMOGP.
- PCA. In this initialization, the first two steps of the GPy initialization are followed (sketched below). This means fitting a sparse MOGP to the data and performing principal component analysis (PCA) on the posterior predictions at the inducing point locations. The MOGP hyperparameters were then used to initialize the LVMOGP observed lengthscales, kernel variance and noise variance. The output of the PCA was used to initialize the latent variable means and the lengthscales of the latent dimensions. This initialization was chosen as a simplified version of the GPy initialization.
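A sketch of the eigendecomposition-and-projection step shared by the GPy and PCA initializations (our variable names; F_hat holds one row of sparse MOGP mean predictions per output):

```python
import numpy as np

def pca_latent_init(F_hat, n_latent):
    # Project the per-output mean predictions into an n_latent-dimensional
    # space via the eigendecomposition of their covariance, and return the
    # relative eigenvalue contributions, as in Equations (24) and (25).
    F_c = F_hat - F_hat.mean(axis=0)
    cov = F_c @ F_c.T / F_hat.shape[1]
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_latent]
    Z0 = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
    rel = vals[order] / vals.sum()
    return Z0, rel
```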
See the GitHub repositories in Appendix E for more details.
In the synthetic experiments, we found the method of initializing the hyperparameters affected the final log marginal likelihood, with no single initialization outperforming all others for every model. Therefore, we decided to continue with all initializations for the PCR data experiments. For the PCR data experiments we did 10 random restarts for each initialization, due to the randomness of some of the initializations.
D.5. Latent Dimensions
The LMC and LVMOGP have hyperparameters that need to be set: the number of latent functions and the number of latent dimensions, respectively.
For the LMC, the rank of the similarity matrix $B$ needs to be selected. This is equivalent to the number of latent functions (Álvarez et al., 2012). If the rank is too low, the LMC will fail to explain the data well. However, if it is too high, the LMC can suffer from overfitting when data is limited. Unlike other hyperparameters, such as the lengthscale or noise variance, there is no continuous way to select this hyperparameter. Therefore, to select it for a given problem, it is necessary to fit multiple LMC models with different ranks and select the best one by comparing the marginal likelihoods or cross-validation. In reality, this isn't always feasible as it is computationally expensive, and data may be limited.
Figure 9 shows results of the cross-validation experiments introduced in Section 3.2 for the LMC with the rank of $B$ set to 2 and 10. The test setting was the same as in Section 3.2 and the cross-validation was repeated 70 times. The LMC has worse NLPD with more latent functions, most likely because more hyperparameters have been introduced, increasing the chances of overfitting when the dataset isn't large.
Similarly, for the LVMOGP, the dimensionality of the latent space needs to be selected. However, if a kernel that treats each dimension as independent is used, the LVMOGP can "switch off" unessential dimensions (Titsias and Lawrence, 2010). This is done by making the lengthscales of the unessential dimensions very large, so there is no variation across those dimensions. This means the LVMOGP can automatically reduce the number of dimensions to those that give a good trade-off between data fit and model complexity. This effect occurs in the latent dimensions of the drift parameter in Figure 8, where all the points are lined up on a single dimension.
Figure 8:
Latent space of the LVMOGP for the rate and drift. The crosses indicate competitors with fluorescent probe reporters and the dots indicate those with EvaGreen reporters. The shaded circles indicate the uncertainty in the latent positions.
Figure 10 shows results of the cross-validation experiments introduced in Section 3.2 for the LVMOGP with 2 and 10 latent dimensions. Unlike the LMC, the LVMOGP performs the same in both cases, as it can switch off unnecessary dimensions in the 10 latent dimension case.
Generally, when conducting Bayesian optimization, data is very scarce to begin with, so it is not possible to perform model selection for the rank of the LMC similarity matrix. One option would be to fit many models with different ranks and select the one with the best marginal likelihood, but this is generally too computationally expensive. This is one of the limitations of the LMC.
Figure 9:
Cross validation results for the linear model of coregionalization with similarity matrices B with rank 2 and 10. For each percentage train, 70 different train test splits were used. The LMC with rank 10 matrix has worse NLPD than that with rank 2, suggesting overfitting is occurring.
Figure 10:
Cross validation results for the latent variable multioutput Gaussian process with 2 and 10 dimensional latent spaces. For each percentage train, 70 different train test splits were used. Due to the automatic relevance determining properties of the latent space kernel, there is very little difference in performance.
optimization experiments we used a rank of 2 for the LMC and 2 dimensions for the LVMOGP as we expected these settings to give enough flexibility to transfer information while limiting chances of overfitting.
E. Data and Code Availability
Raw data is available on request from rdm-enquiries@imperial.ac.uk.
Implementation of the methods outlined in this paper requires a basic knowledge of Python. We provide two GitHub repositories, which include the methods used and Jupyter notebooks demonstrating them. The first repository, https://github.com/RSedgwick/TLGPs, contains code for each of the Gaussian process models and the synthetic experiments, together with notebooks demonstrating the use of the models and instructions for running the code. This repository is application agnostic and could easily be transferred to other use cases.
The second repository, https://github.com/RSedgwick/TL_DOE_4_DNA, is tailored specifically to the competitor use case. It contains the code for the cross validation and Bayesian optimization experiments, as well as code for data processing and notebooks for results analysis.
F. Data Summary
Each competitor is defined by its primer-reporter combination. For each of these combinations we have data at different guanine-cytosine (GC) content and number of base pairs (BP) combinations. Table 1 gives a summary of the number of unique locations for each competitor.
G. Extra Bayesian Optimization Results
The following tables contain extra results for the Bayesian optimization experiments. The first table in each section (Tables 2 and 5) shows counts of the first model to reach the best point on a surface, over all competitors and seeds; if two models reach the best point on the same iteration, both are counted as “winners”. The second table in each section (Tables 3 and 6) shows counts of the models with the lowest cumulative regret for each competitor and seed, with ties counted in the same way. For the single objective optimization, Table 4 shows the average number of iterations each model needed to get within tolerance of the target rate (±0.05). For the penalized optimization, Table 7 shows the average number of iterations each model needed to get either within tolerance of the target rate with no drift penalty, or to the best point (which may have a drift penalty). In some of the runs with the drift penalty, some models failed to reach the best point on some surfaces within the experimental budget; those surfaces were discarded and the average was taken over the surfaces where all models reached the best point within the budget.
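As a toy illustration of how these tied “winner” counts are computed (hypothetical numbers, not our results):

```python
import numpy as np

# Iteration at which each of four models first reached the best point,
# one row per (competitor, seed) run.
iters_to_best = np.array([
    [5, 3, 4, 3],   # models 2 and 4 tie, so both are counted
    [2, 6, 2, 1],   # model 4 wins outright
])
wins = iters_to_best == iters_to_best.min(axis=1, keepdims=True)
win_counts = wins.sum(axis=0)  # per-model counts, as in Tables 2 and 5
print(win_counts)              # -> [0 1 0 2]
```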
G.1. Single Objective Optimization
Extra results for the single objective Bayesian optimization. These results demonstrate that the LVMOGP reaches the best point (Table 2) and has the lowest cumulative regret (Table 3) more often than the other models. The LVMOGP also reaches the best point in the lowest number of iterations for all the learning scenarios (Table 4).
Table 1:
Summary of the amount of data we have for each competitor design surface. Each unique location refers to a unique GC-BP combination.
| Primer Reporter Combination (Not To Be Optimized) | No. Unique Locations | Primer Reporter Combination (To Be Optimized) | No. Unique Locations |
|---|---|---|---|
| FP004-RP004-EvaGreen | 28 | FP004-RP004-Probe | 53 |
| FP002-RP002x-Probe | 12 | FP001-RP001x-EvaGreen | 24 |
| FP004-RP004x-Probe | 12 | FP001-RP001x-Probe | 20 |
| FP001-RP001-Probe | 9 | RP001x-FP002-Probe | 19 |
| FP001-RP005-Probe | 8 | FP002-RP002x-EvaGreen | 15 |
| FP004-RP004x-EvaGreen | 8 | FP005-FP001-EvaGreen | 14 |
| FP003-RP008-Probe | 5 | FP004-FP005-Probe | 8 |
| FP006-RP006-Probe | 5 | FP005-FP001-Probe | 8 |
| FP005-RP005-Probe | 5 | FP005-FP004-EvaGreen | 8 |
| FP002-RP002-EvaGreen | 4 | RP002x-FP005-Probe | 8 |
| FP002-RP006-Probe | 4 | RP008x-FP001-EvaGreen | 8 |
| FP057.1.0-RP003x-Probe | 3 | RP008x-FP005-Probe | 8 |
| FP003-RP008x-EvaGreen | 3 | FP001-RP004-EvaGreen | 7 |
| FP003-RP008-EvaGreen | 3 | RP002x-FP004-EvaGreen | 6 |
| FP002-RP002-Probe | 3 | FP002-RP004-EvaGreen | 3 |
| FP001-RP001-EvaGreen | 2 | RP002x-FP002-EvaGreen | 2 |
| FP003-RP003-Probe | 1 | | |
| FP057.1.0-RP003x-EvaGreen | 1 | | |
Table 2:
Table showing counts of the first Gaussian process model to reach the best point on a surface for the single objective Bayesian optimization experiments. The counts are the number of times a Gaussian process model performed best for a given competitor and seed; if two models performed the same for a given instance, both are counted. This is for 16 competitors and 25 random seeds.
| learning scenario | starting point | MOGP | Avg GP | LMC | LVMOGP |
|---|---|---|---|---|---|
| learning many | center | 124 | 121 | 144 | 255 |
| | model’s choice | 107 | 119 | 97 | 147 |
| one at a time | center | 140 | 140 | 156 | 215 |
| | model’s choice | 86 | 118 | 87 | 191 |
Table 3:
Table showing counts of the Gaussian process model with the lowest cumulative regret on a surface for the single objective Bayesian optimization experiments. The counts are the number of times a Gaussian process model performed best for a given competitor and seed; if two models performed the same for a given instance, both are counted. This is for 16 competitors and 25 random seeds.
| learning scenario | starting point | MOGP | Avg GP | LMC | LVMOGP |
|---|---|---|---|---|---|
| learning many | center | 182 | 80 | 140 | 197 |
| | model’s choice | 85 | 94 | 83 | 117 |
| one at a time | center | 129 | 140 | 131 | 206 |
| | model’s choice | 99 | 106 | 87 | 159 |
Table 4:
Table showing the mean number of iterations needed for the models to get within tolerance of the target rate (±0.05) for the single objective optimization. This is for 16 competitors and 25 random seeds.
| learning scenario | starting point | MOGP | Avg GP | LMC | LVMOGP |
|---|---|---|---|---|---|
| learning many | center | 3.13 | 3.25 | 3.11 | 2.58 |
| | model’s choice | 3.08 | 2.63 | 3.09 | 2.44 |
| one at a time | center | 2.94 | 3.06 | 2.85 | 2.15 |
| | model’s choice | 2.94 | 2.63 | 2.63 | 1.81 |
G.2. Bayesian Optimization with Drift Penalty
Extra results for the Bayesian optimization with a penalty on drift. These results demonstrate that the LVMOGP reaches the best point (Table 5) and has the lowest cumulative regret (Table 6) more often than the other models for most of the learning scenarios. The LVMOGP also reaches the best point in the lowest number of iterations for all the learning scenarios (Table 7).
Table 5:
Table showing counts of the first Gaussian process model to reach the best point on a surface for the penalized Bayesian optimization experiments. The counts are the number of times a Gaussian process model performed best for a given competitor and seed; if two models performed the same for a given instance, both are counted. This is for 16 competitors and 24 random seeds.
| learning scenario | starting point | MOGP | Avg GP | LMC | LVMOGP |
|---|---|---|---|---|---|
| learning many | center | 142 | 157 | 123 | 165 |
| | model’s choice | 89 | 122 | 101 | 111 |
| one at a time | center | 141 | 137 | 153 | 217 |
| | model’s choice | 75 | 102 | 79 | 164 |
Table 6:
Table showing counts of the Gaussian process model with the lowest cumulative regret on a surface for the penalized Bayesian optimization experiments. The counts are the number of times a Gaussian process model performed best for a given competitor and seed; if two models performed the same for a given instance, both are counted. This is for 16 competitors and 24 random seeds.
| learning scenario | starting point | MOGP | Avg GP | LMC | LVMOGP |
|---|---|---|---|---|---|
| learning many | center | 180 | 118 | 100 | 163 |
| | model’s choice | 85 | 103 | 84 | 111 |
| one at a time | center | 173 | 118 | 139 | 204 |
| | model’s choice | 83 | 70 | 65 | 156 |
G.3. Comparison of Choice of First Point
Table 8 shows the average regret of the first data point chosen by each of the models in each learning scenario, for the single objective case. From this table it is clear that the Avg GP and the LVMOGP improve on the regret of the central point, and outperform the random first selections of the MOGP and LMC. This demonstrates that a principled method of selecting the first point is useful for reducing regret.
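One simple way to make such a principled first choice (a hypothetical sketch; not necessarily the acquisition used in our experiments) is to query the transferred surrogate's predictive mean over a candidate grid and pick the point predicted to lie closest to the target rate:

```python
import numpy as np


def choose_first_point(model, candidates, target_rate):
    """Pick the candidate whose predicted rate is closest to the target.

    model: any fitted surrogate exposing predict(X) -> (mean, var)
    candidates: (N, D) array of candidate settings (hypothetical grid)
    """
    mean, _ = model.predict(candidates)
    return candidates[np.argmin(np.abs(mean.ravel() - target_rate))]
```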
Table 7:
Table showing the mean number of iterations needed for the models to either get within tolerance of the target rate (±0.05) without drift penalty or reach the best point (which may have a penalty) for the penalized optimization. For some runs, one or more of the models did not achieve this within the experimental budget; in these cases, the number of iterations to the best point was set to the experimental budget. This is for 16 competitors and 24 random seeds.
| learning scenario | starting point | MOGP | Avg GP | LMC | LVMOGP |
|---|---|---|---|---|---|
| learning many | center | 3.56 | 4.03 | 3.96 | 3.47 |
| | model’s choice | 3.78 | 3.00 | 3.66 | 2.63 |
| one at a time | center | 3.70 | 3.44 | 3.51 | 3.56 |
| | model’s choice | 3.63 | 3.06 | 3.46 | 2.52 |
Table 8:
Table showing the mean regret of the first data point chosen by each model in each learning scenario.
| learning scenario | starting point | MOGP | Avg GP | LMC | LVMOGP |
|---|---|---|---|---|---|
| learning many | center | 0.588 | 0.588 | 0.588 | 0.588 |
| | model’s choice | 0.651 | 0.499 | 0.703 | 0.464 |
| one at a time | center | 0.588 | 0.588 | 0.588 | 0.588 |
| | model’s choice | 0.675 | 0.308 | 0.623 | 0.309 |
G.4. Comparison of Number of Experiments
Figure 11 shows box plots of the number of iterations taken by each model to reach the best point in each learning scenario for the experiments in Section 3.3. The distributions are across the 16 different competitors and 24 random restarts. In the constrained optimization case, for some seeds some of the models did not reach the best point for competitor FP004-RP004-Probe within the experimental budget; in these cases we set the number of experiments to the total experimental budget (10 or 20 depending on the scenario).
These results show that the LVMOGP on average requires fewer experiments than the other models to select the best point, demonstrating how this approach can reduce the number of experiments needed to optimize the competitors. All the models have some outliers, which could be due to unlucky initial data or suboptimal hyperparameter optimization.
G.5. Initial Surfaces for Bayesian Optimization
In the Bayesian optimization experiments outlined in Section 3.3, we start with the two competitor surfaces with the most data fully observed: FP004-RP004-EvaGreen and FP002-RP002x-Probe. This gives the Gaussian process models the chance to learn a reasonable prediction before the iterative Bayesian optimization process begins. It also replicates the case where we already have limited data on a couple of competitors and want to optimize more.
To assess whether the choice of initial surfaces affects the results in our experiments, we investigated two other combinations of two initial surfaces. We ran the learning many scenario of Bayesian optimization with 20 seeds for both the single objective and penalized optimization. We chose the learning many scenario as we expect the initial surfaces to have more impact there than in the one at a time scenario. The results of these experiments are plotted in Figures 12 and 13. It is clear the initial surfaces make some difference to the models’ performances, although the LVMOGP still performs the best most of the time. The differences in performance are most likely due to different amounts of initial data (depending on how much data we have for the initial surfaces) and differences in the similarity of the initial surfaces to the surfaces to be learned.
Figure 11:
Box plots of the number of iterations needed to reach within 0.05 tolerance of the best point for all learning scenarios for the Bayesian optimization experiments outlined in Section 3.3.
H. Funding Information
This work was supported by the UKRI CDT in AI for Healthcare Grant No. EP/S023283/1, UK Research and Innovation Grant No. EP/P016871/1, the BASF / RAEng Research Chair in Data-Driven Optimization, the US NIH Grant No. 5F32GM131594, the EPSRC IRC Next Steps Plus grant No. EP/R018707/1 and the RAEng Chair in Emerging Technologies award No. CiET202194. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
I. Author Contributions
J.G. conducted lab experiments. R.S. developed code, conducted code experiments and wrote the manuscript. R.M. and M.v.W. supervised the project, specifically giving guidance on the machine learning aspects. J.G. and M.S. also supervised the project, specifically giving guidance on the bioengineering aspects.
Figure 12:
Results of single objective Bayesian optimization experiments for the learning many scenario with a different choice of the two initial competitor surfaces. This experiment was run 20 times with different seeds.
Figure 13:
Results of penalized objective Bayesian optimization experiments for the learning many scenario with a different choice of the two initial competitor surfaces. This experiment was run 20 times with different seeds.
References
- Badeau B. A., Comerford M. P., Arakawa C. K., Shadish J. A. and DeForest C. A. (2018) Engineered modular biomaterial logic gates for environmentally triggered therapeutic delivery. Nature Chemistry, 10, 251–258. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5822735/.
- Bader J., Narayanan H., Arosio P. and Leroux J.-C. (2023) Improving extracellular vesicles production through a Bayesian optimization-based experimental design. European Journal of Pharmaceutics and Biopharmaceutics, 182, 103–114. URL: https://www.sciencedirect.com/science/article/pii/S0939641122002983.
- Blakney A. K., McKay P. F., Ibarzo Yus B., Hunter J. E., Dex E. A. and Shattock R. J. (2019) The Skin You Are In: Design-of-Experiments Optimization of Lipid Nanoparticle Self-Amplifying RNA Formulations in Human Skin Explants. ACS Nano, 13, 5920–5930. URL: https://pubs.acs.org/doi/10.1021/acsnano.9b01774.
- Bonilla E. V., Chai K. and Williams C. (2007) Multi-task Gaussian Process Prediction. Advances in Neural Information Processing Systems, 20. URL: https://proceedings.neurips.cc/paper/2007/hash/66368270ffd51418ec58bd793f2d9b1b-Abstract.html.
- Campbell K. and Yau C. (2015) Bayesian Gaussian Process Latent Variable Models for pseudotime inference in single-cell RNA-seq data.
- Cao B., Pan S. J., Zhang Y., Yeung D.-Y. and Yang Q. (2010) Adaptive Transfer Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 24. URL: https://ojs.aaai.org/index.php/AAAI/article/view/7682.
- Carbonell P., Jervis A. J., Robinson C. J., Yan C., Dunstan M., Swainston N., Vinaixa M., Hollywood K. A., Currin A., Rattray N. J. W., Taylor S., Spiess R., Sung R., Williams A. R., Fellows D., Stanford N. J., Mulherin P., Le Feuvre R., Barran P., Goodacre R., Turner N. J., Goble C., Chen G. G., Kell D. B., Micklefield J., Breitling R., Takano E., Faulon J.-L. and Scrutton N. S. (2018) An automated Design-Build-Test-Learn pipeline for enhanced microbial production of fine chemicals. Communications Biology, 1, 1–10. URL: https://www.nature.com/articles/s42003018-0076-9.
- Cox D. R. and Reid N. (2000) The Theory of the Design of Experiments. CRC Press.
- Dai Z., Álvarez M. and Lawrence N. (2017) Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes. In Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. URL: https://proceedings.neurips.cc/paper/2017/file/1680e9fa7b4dd5d62ece800239bb53bd-Paper.pdf.
- Damianou A. and Lawrence N. D. (2013) Deep Gaussian processes. In Artificial Intelligence and Statistics, 207–215. PMLR.
- Degerman M., Jakobsson N. and Nilsson B. (2006) Constrained optimization of a preparative ion-exchange step for antibody purification. Journal of Chromatography A, 1113, 92–100. URL: https://www.sciencedirect.com/science/article/pii/S0021967306003013.
- Deng F., Pan J., Liu Z., Zeng L. and Chen J. (2023) Programmable DNA biocomputing circuits for rapid and intelligent screening of SARS-CoV-2 variants. Biosensors and Bioelectronics, 223, 115025. URL: https://www.sciencedirect.com/science/article/pii/S095656632201065X.
- Droettboom M., Hunter J., Firing E., Caswell T. A., Elson P., Dale D., Lee J.-J., McDougall D., Root B., Straw A., Seppänen J. K., Nielsen J. H., May R., Varoquaux, Yu T. S., Moad C., Gohlke C., Würtz P., Hisch T., Silvester S., Ivanov P., Whitaker J., Cimarron, Hobson P., Giuca M., Thomas I., mmetz bn, Evans J., dhyams and NNemec (2015) matplotlib: v1.4.3. URL: https://zenodo.org/record/15423.
- Ebrahimi S. B. and Samanta D. (2023) Engineering protein-based therapeutics through structural and chemical design. Nature Communications, 14, 2411. URL: https://www.nature.com/articles/s41467-023-38039-x.
- Fellermann H., Shirt-Ediss B., Kozyra J., Linsley M., Lendrem D., Isaacs J. and Howard T. (2019) Design of experiments and the virtual PCR simulator: An online game for pharmaceutical scientists and biotechnologists. Pharmaceutical Statistics, 18, 402–406. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6767770/.
- Gamble C., Bryant D., Carrieri D., Bixby E., Dang J., Marshall J., Doughty D., Colwell L., Berndl M., Roberts J. and Frumkin M. (2021) Machine Learning Optimization of Photosynthetic Microbe Cultivation and Recombinant Protein Production. Preprint, Bioengineering. URL: http://biorxiv.org/lookup/doi/10.1101/2021.08.06.453272.
- Garnett R. (2023) Bayesian Optimization. 127–129. Cambridge University Press.
- Gilman J., Walls L., Bandiera L. and Menolascina F. (2021) Statistical Design of Experiments for Synthetic Biology. ACS Synthetic Biology, 10, 1–18. DOI: 10.1021/acssynbio.0c00385.
- Goan E. and Fookes C. (2020) Bayesian neural networks: An introduction and survey. Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018, 45–87.
- Goertz J. P., Sedgwick R., Smith F., Kaforou M., Wright V. J., Herberg J. A., Kote-Jarai Z., Eeles R., Levin M., Misener R., van der Wilk M. and Stevens M. M. (2023) Competitive Amplification Networks enable molecular pattern recognition with PCR. URL: https://www.biorxiv.org/content/10.1101/2023.06.29.546934v1.
- González J., Longworth J., James D. C. and Lawrence N. D. (2015) Bayesian Optimization for Synthetic Gene Design. arXiv:1505.01627 [stat]. URL: http://arxiv.org/abs/1505.01627.
- HamediRad M., Chao R., Weisberg S., Lian J., Sinha S. and Zhao H. (2019) Towards a fully automated algorithm driven platform for biosystems design. Nature Communications, 10, 5150. URL: https://www.nature.com/articles/s41467-019-13189-z.
- Harris C. R., Millman K. J., van der Walt S. J., Gommers R., Virtanen P., Cournapeau D., Wieser E., Taylor J., Berg S., Smith N. J., Kern R., Picus M., Hoyer S., van Kerkwijk M. H., Brett M., Haldane A., del Río J. F., Wiebe M., Peterson P., Gérard-Marchant P., Sheppard K., Reddy T., Weckesser W., Abbasi H., Gohlke C. and Oliphant T. E. (2020) Array programming with NumPy. Nature, 585, 357–362. URL: https://www.nature.com/articles/s41586-020-2649-2.
- Hie B., Bryson B. D. and Berger B. (2020) Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design. Cell Systems, 11, 461–477.e9. URL: https://www.cell.com/cell-systems/abstract/S2405-4712(20)30364-1.
- Hu R., Fu L., Chen Y., Chen J., Qiao Y. and Si T. (2023) Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments. Briefings in Bioinformatics, 24, bbac570. DOI: 10.1093/bib/bbac570.
- Hua Y., Ma J., Li D. and Wang R. (2022) DNA-Based Biosensors for the Biochemical Analysis: A Review. Biosensors, 12, 183. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8945906/.
- Hutter C., von Stosch M., Cruz Bournazou M. N. and Butté A. (2021) Knowledge transfer across cell lines using hybrid Gaussian process models with entity embedding vectors. Biotechnology and Bioengineering, 118, 4389–4401. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/bit.27907.
- Jones D. R., Schonlau M. and Welch W. J. (1998) Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13, 455–492. DOI: 10.1023/A:1008306431147.
- Khan A., Cowen-Rivers A. I., Grosnit A., Deik D.-G.-X., Robert P. A., Greiff V., Smorodina E., Rawat P., Akbar R., Dreczkowski K., Tutunov R., Bou-Ammar D., Wang J., Storkey A. and Bou-Ammar H. (2023) Toward real-world automated antibody design with combinatorial Bayesian optimization. Cell Reports Methods, 3, 100374.
- Kreutz C. and Timmer J. (2009) Systems biology: experimental design. The FEBS Journal, 276, 923–942.
- Lalchand V. and Rasmussen C. E. (2020) Approximate inference for fully Bayesian Gaussian process regression. In Symposium on Advances in Approximate Bayesian Inference, 1–12. PMLR.
- Land K. J., Boeras D. I., Chen X.-S., Ramsay A. R. and Peeling R. W. (2019) REASSURED diagnostics to inform disease control strategies, strengthen health systems and improve patient outcomes. Nature Microbiology, 4, 46–54. URL: https://www.nature.com/articles/s41564-018-0295-3.
- Lopez R., Wang R. and Seelig G. (2018) A molecular multi-gene classifier for disease diagnostics. Nature Chemistry, 10, 746–754. URL: https://www.nature.com/articles/s41557-018-0056-1.
- Lv H., Li Q., Shi J., Fan C. and Wang F. (2021) Biocomputing Based on DNA Strand Displacement Reactions. ChemPhysChem, 22, 1151–1166. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/cphc.202100140.
- Lyu W., Xue P., Yang F., Yan C., Hong Z., Zeng X. and Zhou D. (2018) An Efficient Bayesian Optimization Approach for Automated Optimization of Analog Circuits. IEEE Transactions on Circuits and Systems I: Regular Papers, 65, 1954–1967.
- MacKay D. J. (1999) Comparison of approximate methods for handling hyperparameters. Neural Computation, 11, 1035–1068.
- MacKay D. J. C. (1998) Introduction to Gaussian processes. In NATO ASI Series. Series F: Computer and System Sciences, 133–165.
- Matthews A. G., Van Der Wilk M., Nickson T., Fujii K., Boukouvalas A., León-Villagrá P., Ghahramani Z. and Hensman J. (2017) GPflow: a Gaussian process library using TensorFlow. The Journal of Machine Learning Research, 18, 1299–1304.
- Mehrian M., Guyot Y., Papantoniou I., Olofsson S., Sonnaert M., Misener R. and Geris L. (2018) Maximizing neotissue growth kinetics in a perfusion bioreactor: An in silico strategy using model reduction and Bayesian optimization. Biotechnology and Bioengineering, 115, 617–629. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/bit.26500.
- Mowbray M., Savage T., Wu C., Song Z., Cho B. A., Del Rio-Chanona E. A. and Zhang D. (2021) Machine learning for biochemical engineering: A review. Biochemical Engineering Journal, 172, 108054. URL: https://www.sciencedirect.com/science/article/pii/S1369703X21001303.
- Narayanan H., Dingfelder F., Condado Morales I., Patel B., Heding K. E., Bjelke J. R., Egebjerg T., Butté A., Sokolov M., Lorenzen N. and Arosio P. (2021) Design of Biopharmaceutical Formulations Accelerated by Machine Learning. Molecular Pharmaceutics, 18, 3843–3853. URL: https://pubs.acs.org/doi/10.1021/acs.molpharmaceut.1c00469.
- Narayanan H., Luna M. F., von Stosch M., Cruz Bournazou M. N., Polotti G., Morbidelli M., Butté A. and Sokolov M. (2020) Bioprocessing in the Digital Age: The Role of Process Models. Biotechnology Journal, 15, 1900172. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/biot.201900172.
- Narayanan H., von Stosch M., Feidl F., Sokolov M., Morbidelli M. and Butté A. (2023) Hybrid modeling for biopharmaceutical processes: advantages, opportunities, and implementation. Frontiers in Chemical Engineering, 5. URL: https://www.frontiersin.org/articles/10.3389/fceng.2023.1157889.
- Papaneophytou C. (2019) Design of Experiments As a Tool for Optimization in Recombinant Protein Biotechnology: From Constructs to Crystals. Molecular Biotechnology, 61, 873–891. DOI: 10.1007/s12033-019-00218-x.
- Politis S. N., Colombo P., Colombo G. and Rekkas D. M. (2017) Design of experiments (DoE) in pharmaceutical development. Drug Development and Industrial Pharmacy, 43, 889–901. DOI: 10.1080/03639045.2017.1291672.
- Qian L., Winfree E. and Bruck J. (2011) Neural network computation with DNA strand displacement cascades. Nature, 475, 368–372. URL: https://www.nature.com/articles/nature10262.
- Rasmussen C. E. and Williams C. K. I. (2006) Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.
- Romero P. A., Krause A. and Arnold F. H. (2013) Navigating the protein fitness landscape with Gaussian processes. Proceedings of the National Academy of Sciences, 110, E193–E201. URL: https://www.pnas.org/content/110/3/E193.
- Rosa S. S., Nunes D., Antunes L., Prazeres D. M. F., Marques M. P. C. and Azevedo A. M. (2022) Maximizing mRNA vaccine production with Bayesian optimization. Biotechnology and Bioengineering, 119, 3127–3139. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/bit.28216.
- Salvatier J., Wiecki T. V. and Fonnesbeck C. (2016) Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2, e55. URL: https://peerj.com/articles/cs-55.
- Schonlau M., Welch W. J. and Jones D. R. (1998) Global versus local search in constrained optimization of computer models. In New Developments and Applications in Experimental Design, vol. 34, 11–26. Institute of Mathematical Statistics. URL: https://projecteuclid.org/ebooks/institute-of-mathematical-statistics-lecture-notes-monograph-series/New-developmentsand-applications-in-experimental-design/chapter/Global-versus-local-search-in-constrained-optimization-of-computer-models/10.1214/lnms/1215456182.
- Schweidtmann A. M., Esche E., Fischer A., Kloft M., Repke J.-U., Sager S. and Mitsos A. (2021) Machine Learning in Chemical Engineering: A Perspective. Chemie Ingenieur Technik, 93, 2029–2039. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/cite.202100083.
- Sedgwick R., Goertz J., Stevens M., Misener R. and van der Wilk M. (2020) Design of Experiments for Verifying Biomolecular Networks. arXiv:2011.10575 [cs, q-bio, stat]. URL: http://arxiv.org/abs/2011.10575.
- Shahriari B., Swersky K., Wang Z., Adams R. P. and de Freitas N. (2016) Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proceedings of the IEEE, 104, 148–175.
- Sharpe C., Seepersad C. C., Watts S. and Tortorelli D. (2018) Design of Mechanical Metamaterials via Constrained Bayesian Optimization. In Volume 2A: 44th Design Automation Conference, V02AT03A029. Quebec City, Quebec, Canada: American Society of Mechanical Engineers. URL: https://asmedigitalcollection.asme.org/IDETC-CIE/proceedings/IDETC-CIE2018/51753/Quebec%20City,%20Quebec,%20Canada/273625.
- Siuti P., Yazbek J. and Lu T. K. (2013) Synthetic circuits integrating logic and memory in living cells. Nature Biotechnology, 31, 448–452. URL: https://www.nature.com/articles/nbt.2510.
- Snoek J., Larochelle H. and Adams R. P. (2012) Practical Bayesian Optimization of Machine Learning Algorithms. In Advances in Neural Information Processing Systems, vol. 25. Curran Associates, Inc. URL: https://proceedings.neurips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html.
- Swersky K., Snoek J. and Adams R. P. (2013) Multi-Task Bayesian Optimization. In Advances in Neural Information Processing Systems 26 (eds. Burges C. J. C., Bottou L., Welling M., Ghahramani Z. and Weinberger K. Q.), 2004–2012. URL: http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf.
- Taylor C. J., Felton K. C., Wigh D., Jeraal M. I., Grainger R., Chessari G., Johnson C. N. and Lapkin A. A. (2023) Accelerated Chemical Reaction Optimization Using Multi-Task Learning. ACS Central Science, 9, 957–968. URL: https://pubs.acs.org/doi/10.1021/acscentsci.3c00050.
- The pandas development team (2023) pandas-dev/pandas: Pandas. URL: https://zenodo.org/record/7979740.
- Tighineanu P., Skubch K., Baireuther P., Reiss A., Berkenkamp F. and Vinogradska J. (2022) Transfer Learning with Gaussian Processes for Bayesian Optimization. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, 6152–6181. PMLR. URL: https://proceedings.mlr.press/v151/tighineanu22a.html.
- Titsias M. (2009) Variational Learning of Inducing Variables in Sparse Gaussian Processes. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 567–574. PMLR. URL: https://proceedings.mlr.press/v5/titsias09a.html.
- Titsias M. and Lawrence N. D. (2010) Bayesian Gaussian Process Latent Variable Model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 844–851. JMLR Workshop and Conference Proceedings. URL: https://proceedings.mlr.press/v9/titsias10a.html.
- Uhrenholt A. K. and Jensen B. S. (2019) Efficient Bayesian Optimization for Target Vector Estimation. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (eds. Chaudhuri K. and Sugiyama M.), vol. 89 of Proceedings of Machine Learning Research, 2661–2670. PMLR. URL: https://proceedings.mlr.press/v89/uhrenholt19a.html.
- Virtanen P., Gommers R., Oliphant T. E., Haberland M., Reddy T., Cournapeau D., Burovski E., Peterson P., Weckesser W., Bright J., van der Walt S. J., Brett M., Wilson J., Millman K. J., Mayorov N., Nelson A. R. J., Jones E., Kern R., Larson E., Carey C. J., Polat I., Feng Y., Moore E. W., VanderPlas J., Laxalde D., Perktold J., Cimrman R., Henriksen I., Quintero E. A., Harris C. R., Archibald A. M., Ribeiro A. H., Pedregosa F. and van Mulbregt P. (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272. URL: https://www.nature.com/articles/s41592-019-0686-2.
- Wadle S., Lehnert M., Rubenwolf S., Zengerle R. and von Stetten F. (2016) Real-time PCR probe optimization using design of experiments approach. Biomolecular Detection and Quantification, 7, 1–8. URL: https://www.sciencedirect.com/science/article/pii/S2214753515300139.
- Wang X., Jin Y., Schmitt S. and Olhofer M. (2023) Recent advances in Bayesian optimization. ACM Computing Surveys, 55, 287:1–287:36. URL: https://dl.acm.org/doi/10.1145/3582078.
- Warnes J. J. and Ripley B. D. (1987) Problems with likelihood estimation of covariance functions of spatial Gaussian processes. Biometrika, 74, 640–642.
- Zadeh J. N., Steenberg C. D., Bois J. S., Wolfe B. R., Pierce M. B., Khan A. R., Dirks R. M. and Pierce N. A. (2011) NUPACK: Analysis and design of nucleic acid systems. Journal of Computational Chemistry, 32, 170–173. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.21596.
- Zhang Y., Tao S., Chen W. and Apley D. W. (2020) A Latent Variable Approach to Gaussian Process Modeling with Qualitative and Quantitative Factors. Technometrics, 62, 291–302. URL: https://www.tandfonline.com/doi/full/10.1080/00401706.2019.1638834.
- Zhuang F., Qi Z., Duan K., Xi D., Zhu Y., Zhu H., Xiong H. and He Q. (2021) A Comprehensive Survey on Transfer Learning. Proceedings of the IEEE, 109, 43–76.
- Álvarez M. A., Rosasco L. and Lawrence N. D. (2012) Kernels for Vector-Valued Functions: A Review. Foundations and Trends® in Machine Learning, 4, 195–266. URL: https://www.nowpublishers.com/article/Details/MAL-036.