The Journal of Chemical Physics
2022 Dec 22;157(24):244901. doi: 10.1063/5.0130407

Optimization of non-equilibrium self-assembly protocols using Markov state models

Anthony Trubiano 1,a), Michael F Hagan 1,a)
PMCID: PMC9788858  PMID: 36586982

Abstract

The promise of self-assembly to enable the bottom-up formation of materials with prescribed architectures and functions has driven intensive efforts to uncover rational design principles for maximizing the yield of a target structure. Yet, despite many successful examples of self-assembly, ensuring kinetic accessibility of the target structure remains an unsolved problem in many systems. In particular, long-lived kinetic traps can result in assembly times that vastly exceed experimentally accessible timescales. One proposed solution is to design non-equilibrium assembly protocols in which system parameters change over time to avoid such kinetic traps. Here, we develop a framework to combine Markov state model (MSM) analysis with optimal control theory to compute a time-dependent protocol that maximizes the yield of the target structure at a finite time. We present an adjoint-based gradient descent method that, in conjunction with MSMs for a system as a function of its control parameters, enables efficiently optimizing the assembly protocol. We also describe an interpolation approach to significantly reduce the number of simulations required to construct the MSMs. We demonstrate our approach with two examples: a simple semi-analytic model for the folding of a polymer of colloidal particles, and a more complex model for capsid assembly. Our results show that optimizing time-dependent protocols can achieve significant improvements in the yields of selected structures, including equilibrium free energy minima, long-lived metastable structures, and transient states.

I. INTRODUCTION

Designing building blocks that are pre-programmed to self-assemble into a target structure has enabled the creation of microscopic and nano-scale materials with desirable properties and promising applications.1–7 However, achieving high yields and selective assembly of the target structures remains a critical, unsolved problem in most systems. Thus, there has been intense research aimed at discovering the governing principles of self-assembly that would indicate how to optimize the yield of complex targets, particularly in multifarious systems that are capable of assembling different target structures depending on the experimental conditions. Many of these theoretical studies investigate self-assembly within the framework of equilibrium statistical mechanics, in which the thermodynamic stability of a desired target state is maximized. Successful self-assembly has been achieved in various model systems by optimizing, according to this criterion, the subunit concentrations,8 interaction strengths,9,10 particle shape,11,12 and bond specificity.13

Despite the successes of this approach, it suffers from a fundamental limitation: the thermodynamic stability of a structure does not guarantee its kinetic accessibility in the finite timescales available to experiments, due to the presence of long-lived intermediates along self-assembly pathways.14–19 Finite-time assembly yields depend on a competition between thermodynamic and kinetic effects—thermodynamic stability of a target structure and rapid nucleation are favored by strong interactions, whereas correction of misassembled subunits requires weak interactions. Identifying the optimal trade-off between these requirements can be difficult, both computationally and experimentally.20

As a result, a different route to achieving self-assembly has been employed, in which the system is driven out of equilibrium by using a time-varying protocol for system parameters.21–25 Here, we distinguish between an “equilibrium” self-assembly process, in which the system starts from an out-of-equilibrium initial condition and assembly is driven entirely by descending free energy gradients, and a non-equilibrium process, in which energy is input/extracted from the system by externally varying parameters that control assembly driving forces. For example, there is a long history of using temperature protocols with slow cooling or cycles of heating and cooling to enhance control over crystal size distribution in crystallization experiments.26,27 More recently, heating/cooling cycles have been used to selectively assemble a structure distinct from the global free-energy minimum in lattice simulations of a multi-component mixture.28 More broadly, there are diverse experimental systems for which time-dependent assembly protocols can and have been applied: strand displacement reactions can tune interaction strengths between DNA-coated colloids,29,30 light-activated interactions can be tuned via spatiotemporal intensity protocols,31 and system properties such as temperature, pressure, and concentrations can be controlled within microfluidic devices.32–34

While the technology to implement such protocols is available, the field lacks an efficient framework to rationally design and optimize them. “On-the-fly” methods,35 such as the statistical physics design engine,36 and machine learning approaches, such as the automatic differentiation of molecular dynamics trajectories,37 have been successful for select systems, but the high computational cost of sampling a large parameter space under experimentally relevant conditions is prohibitive for most systems. To bridge this gap, we seek to develop a method to efficiently compute optimal time-dependent protocols to maximize the yield of a chosen target state, even for challenging systems such as those exhibiting long assembly timescales, kinetic trapping, or competing metastable states.

Our approach relies on the construction of Markov state models (MSMs).38,39 MSMs are a powerful tool for coarse-graining the dynamics of complex systems into a reduced-order form that is tractable to analysis and allows characterizing the system dynamics on timescales that are orders of magnitude longer than those accessible to straightforward simulations.38,40–43 We show that, by taking advantage of properties satisfied by general MSMs,44,45 we can analytically and efficiently evaluate derivatives of state probabilities with respect to the tunable system parameters for use in gradient-based optimization. Although our approach has features in common with previously developed methods to compute feedback control policies for self-assembly with real-time system sensors (e.g., Refs. 46–49), it avoids the potentially expensive dynamic programming calculations in those methods by directly considering the final time target state probability in the objective function. We first demonstrate the method on a semi-analytic Markov model for the assembly of a short polymer of six colloidal particles in two dimensions. For this system, we can optimize the time-dependent interactions between different particle types to achieve selective, high-yield assembly of multiple structures, including the stable equilibrium structure, metastable rigid structures, and floppy structures that are only transient for most constant parameter sets. Our results pose an interesting tradeoff for experimental systems: we show that tuning a time-dependent protocol and increasing subunit complexity (i.e., the number of distinct subunit types) can have similar effects on yield, but one of these choices may be easier to implement experimentally.

To investigate generality, we also demonstrate our algorithm on a very different system—a two-parameter model for capsid assembly on a spherical nanoparticle. For this example, there is no analytical form for the transition matrix. We show how simulations and interpolation can be used to make such complex problems suitable for our optimization algorithm. Combining our algorithm with the use of radial basis function interpolation within parameter space, we develop a framework to dramatically reduce the computational requirements for protocol optimization. The method is also robust and reusable in comparison to existing methods; simulations do not need to be rerun to change the initial or target states in the optimization, as would be the case for “on-the-fly” methods. Using this procedure, we compute time-dependent protocols to assemble a target state that is highly kinetically inaccessible; there is a ∼60% gap between the estimated equilibrium and dynamical yields. For the timescales we are able to simulate by brute force dynamics, our computed time-dependent protocols increase the target yield by greater than twofold over the best constant protocol in the same amount of assembly time. The MSMs also allow us to probe longer timescales, for which we predict optimal time-dependent protocols that achieve yields within 1% of equilibrium in orders of magnitude less time than required to approach equilibrium under a constant protocol.

Importantly, the method is generally applicable to any system that can be approximated by an MSM and optimized by an objective function involving state probabilities, including multifarious50–55 and reconfigurable assembly systems.56–65 Moreover, it can be used as a highly efficient parameter estimation tool for both constant and time-dependent protocols.

II. METHODS

A. Theoretical setup

We begin by assuming the process of interest is described by a Markov state model over the discrete state space, S. For a given lag time, τ, we define a temporal discretization with t_n = nτ for n = 0, …, N. The time-dependent protocol will be piece-wise constant, represented as {θ_n}_{n=0}^{N−1}, where θ_n is a constant parameter value from time t_n to t_{n+1}. The transition matrices are then given by P_n = P(θ_n), which propagate the state probabilities from t_n to t_{n+1}. Extension to multiple control parameters, θ_n, as we use in the second example below, is straightforward.

Let the row vector p^n denote the probability distribution over the system states at time t_n. The evolution of this probability is governed by the forward Kolmogorov equation,

p^{n+1} = p^n P(θ_n),  p^0 = p_0. (1)

For a set of target states, B ⊂ S, we are interested in optimizing P_B^N = Σ_{i∈B} p_i^N, subject to the constraint that the protocol does not change too rapidly. To this end, we maximize the objective function

Φ[θ] = P_B^N − (λ/2) Σ_{n=0}^{N−1} ((θ_{n+1} − θ_n)/τ)², (2)

where the second term is a smoothing penalty function whose strength is controlled by λ > 0. We will report this smoothing parameter as a normalized value, λ*, scaled with respect to the timescales of the system. It is straightforward to add additional terms to Eq. (2), for example, to limit the magnitude of the control parameter or to maximize the speed of assembly.
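As a concrete illustration, the forward propagation of Eq. (1) and the objective of Eq. (2) can be sketched on a hypothetical two-state MSM whose transition matrix depends on a single scalar control parameter. The model, parameter values, and function names below are invented for illustration and are not taken from this work.

```python
import numpy as np

def transition_matrix(theta):
    # Toy 2-state row-stochastic matrix: theta in (0, 1) sets the
    # probability of hopping from state 0 into the target state 1.
    return np.array([[1.0 - theta, theta],
                     [0.1, 0.9]])

def objective(protocol, p0, target, lam, tau):
    """Phi[theta] = P_B^N - (lam/2) * sum_n ((theta_{n+1}-theta_n)/tau)^2."""
    p = p0.copy()
    for theta in protocol:                 # piecewise-constant protocol
        p = p @ transition_matrix(theta)   # p^{n+1} = p^n P(theta_n), Eq. (1)
    yield_term = p[target].sum()           # P_B^N
    dtheta = np.diff(protocol) / tau       # smoothing penalty, Eq. (2)
    return yield_term - 0.5 * lam * np.sum(dtheta**2)

protocol = np.full(50, 0.5)                # constant protocol, N = 50 steps
p0 = np.array([1.0, 0.0])                  # start entirely in state 0
phi = objective(protocol, p0, target=[1], lam=1e-3, tau=1.0)
```

For this constant protocol the penalty vanishes and the yield relaxes to the stationary probability of the target state.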

B. Protocol optimization

We compute the gradient of P_B^N via an adjoint method.66,67 The adjoint equation is the backward Kolmogorov equation,

F^n = P(θ_n) F^{n+1},  F^N = 1_B, (3)

where F^n is the adjoint variable at time t_n and 1_B is the indicator vector for the set B. This equation is prescribed by a final condition and solved backward in time. By solving Eqs. (1) and (3) for all time, the gradient components can be computed (see the supplementary material for details) as

∂P_B^N/∂θ_k = p^k (∂P_k/∂θ_k) F^{k+1}. (4)

The derivative of the penalty term with respect to θ_k can be interpreted as a discretization of the second time derivative of the protocol. The gradient descent update step can then be seen as solving a discretized diffusion equation, where the gradient of P_B^N acts as a source term. For stability reasons, we solve this equation with the IMEX68 scheme

θ_k^{j+1} = θ_k^j + h ∂P_B^{N,j}/∂θ_k + (λh/τ²) (θ_{k+1}^{j+1} − 2θ_k^{j+1} + θ_{k−1}^{j+1}), (5)

where j indexes the iteration, P_B^{N,j} = P_B^N[θ^j], and h is a step size to be chosen. See the supplementary material for details.
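A sketch of the adjoint gradient of Eq. (4) and one implicit-explicit update in the spirit of Eq. (5), again on a hypothetical two-state model. As simplifying assumptions, the matrix derivative ∂P/∂θ is taken by central finite differences rather than analytically, the implicit Laplacian is solved as a dense linear system, and the endpoint values are updated explicitly.

```python
import numpy as np

def transition_matrix(theta):
    # Same toy 2-state model as above; illustrative only.
    return np.array([[1.0 - theta, theta], [0.1, 0.9]])

def dP_dtheta(theta, eps=1e-6):
    # Central finite difference standing in for an analytic derivative
    return (transition_matrix(theta + eps) - transition_matrix(theta - eps)) / (2 * eps)

def adjoint_gradient(protocol, p0, target):
    N = len(protocol)
    ps = [p0.copy()]                       # forward pass: p^0 ... p^{N-1}
    for theta in protocol[:-1]:
        ps.append(ps[-1] @ transition_matrix(theta))
    F = np.zeros(len(p0)); F[target] = 1.0 # F^N = 1_B
    Fs = [F]
    for theta in protocol[::-1]:           # backward pass: F^n = P(theta_n) F^{n+1}
        Fs.append(transition_matrix(theta) @ Fs[-1])
    Fs = Fs[::-1]                          # Fs[k] is F^k
    # Gradient components: p^k (dP/dtheta_k) F^{k+1}, Eq. (4)
    return np.array([ps[k] @ dP_dtheta(protocol[k]) @ Fs[k + 1] for k in range(N)])

def imex_step(protocol, grad, h, lam, tau):
    # Solve (I - (lam*h/tau^2) L) theta^{j+1} = theta^j + h*grad,
    # with L the discrete Laplacian; endpoint rows are left explicit.
    N = len(protocol)
    c = lam * h / tau**2
    A = np.eye(N)
    for k in range(1, N - 1):
        A[k, k] += 2 * c
        A[k, k - 1] -= c
        A[k, k + 1] -= c
    return np.linalg.solve(A, protocol + h * grad)

protocol = np.full(20, 0.3)
grad = adjoint_gradient(protocol, np.array([1.0, 0.0]), target=1)
new_protocol = imex_step(protocol, grad, h=0.1, lam=1e-3, tau=1.0)
```

Since raising θ always increases flux into the target in this toy model, every gradient component is positive and the update increases the protocol everywhere.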

C. Model systems

We demonstrate our optimization algorithm on two systems, which involve different levels of complexity to construct the MSM. The first example is a simple model for short polymer chains constructed from colloidal particles with programmable interactions, which enables the analytical construction of an MSM. The second example is a model for viral capsid assembly in which two types of conical subunits assemble on the surface of a spherical nanoparticle. In this case, we construct an MSM by estimating transition rate matrix elements from an ensemble of short, unbiased simulations.

1. Colloidal chain folding

The first model describes the folding of an initially linear polymer made up of six colloidal particles in two dimensions. The folding problem has many features in common with assembly, including analogous thermodynamic and kinetic effects, and provides a simple illustration of the optimal control method. The interactions between each particle in the chain can be programmed, for example, by coating the surface of the particles with strands of DNA.69 Furthermore, these interactions can be varied in time by modulating the melting temperatures of different strand types70 and applying a temperature protocol. Depending on the choice of interaction strengths, the system can fold into one of three ground states as well as a number of floppy states, which have been characterized both experimentally71,72 and via theory and simulation.20 It has been shown for this system, as well as many others,9,73–75 that increasing the number of subunit types allows for more efficient assembly of a target structure. However, increasing the number of subunit types generally makes systems more susceptible to kinetic traps and thus more sensitive to parameter variations. Furthermore, additional species require greater costs in material development and synthesis. Here we aim to investigate how time-dependent interaction strengths can boost folding yields as well as identify tradeoffs between subunit complexity (number of different subunit types) and protocol complexity.

To study the minimal subunit complexity case that allows for multiple free energy minima, we allow for only two types of particles, which we will distinguish as red and blue. We assume the initial chain has an alternating ordering of these particles, i.e., red, blue, red, blue, and so on. Figure 1(a) shows this initial chain as well as five potential folded structures. Structures 1 through 3 are the rigid ground states for this system, while structures 4 and 5 are floppy structures that are typically folding intermediates but can be stabilized by setting some of the interaction strengths to zero. Figure 1(b) shows histograms of the yields of each of these structures under a number of parameter sets, which consist of fixed values of the three interaction strengths ERR, ERB, and EBB, the well depths of a short-ranged Morse potential. We denote parameter set i as the one that maximizes the finite-time yield of structure i. We find that relatively large yields are already possible for most structures using fixed protocols, except for structure 3, the triangle. While it is possible to further boost yields of some of these structures by changing the particle type distribution, we investigate what can be achieved using this fixed ordering along with time-dependent protocols.

FIG. 1.

Visualization and statistics for the colloidal polymer system. (a) A depiction of the initial colloidal chain with alternating red-blue particle types and a fixed backbone (shown in white), as well as example configurations of five states for which we optimize yields. Note that only a single representative permutation is shown for each state, but reported yields are combined over all consistent permutations. (b) Histograms of the yield for each of the five states in (a) (color-coded) for four sets of fixed parameter values. The number in the parameter set label indicates which state’s yield is being maximized. Yields are computed by averaging over 600 Brownian-dynamics trajectories with the given parameter values. The final time is Tf = 5 for state 1, Tf = 7.5 for state 3, and Tf = 10 for the others, reported as a non-dimensional simulation time. (c) Same as (b), but the parameter sets are now the optimal time-dependent protocols.

The model we use to construct an MSM for this system was previously developed to study the equilibrium folding of colloidal chains.20 To summarize the model, a state space is defined by enumerating all possible adjacency matrices describing bonds between the six particles. For each adjacency matrix, the rate of forming a new bond and the probability distribution for which bond forms first are estimated using Brownian Dynamics simulations. This specifies all the forward rates of a rate matrix, which are independent of interaction strength. For a given set of interaction strengths, the equilibrium probabilities of each adjacency matrix are measured using a Monte Carlo sampler. The backward rates are then set from the forward rates by imposing detailed balance with respect to the estimated equilibrium measure. A re-weighting procedure can then be used to evaluate the backward rates for other values of the interaction strengths. The resulting rate matrix can be converted into a probability transition matrix by exponentiation, at which point the optimization can be performed as described above.
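The detailed-balance construction described above can be illustrated on a hypothetical three-state chain (zero, one, or two bonds) standing in for the enumerated adjacency-matrix states. The forward rates, bond energy, and equilibrium weights below are invented numbers; only the structure of the calculation (backward rates from detailed balance, exponentiation to a transition matrix) follows the text.

```python
import numpy as np
from scipy.linalg import expm

k_forward = np.array([2.0, 1.0])      # bond-formation rates, energy-independent
E_bond = 4.0                          # bond energy in units of kBT (illustrative)

# Equilibrium weights grow with the number of bonds: pi_i ~ exp(n_bonds * E)
pi = np.exp(np.array([0.0, 1.0, 2.0]) * E_bond)
pi /= pi.sum()

Q = np.zeros((3, 3))
for i in range(2):
    Q[i, i + 1] = k_forward[i]
    # Detailed balance: pi_i q_{i,j} = pi_j q_{j,i}
    Q[i + 1, i] = k_forward[i] * pi[i] / pi[i + 1]
np.fill_diagonal(Q, -Q.sum(axis=1))   # rows of a rate matrix sum to zero

tau = 0.5
P = expm(tau * Q)                     # probability transition matrix for lag tau
```

By construction the equilibrium measure is stationary under P, and re-weighting to other interaction strengths amounts to recomputing pi and the backward rates.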

These simulations are performed in non-dimensionalized form. Positions, energies, and time are respectively scaled by particle diameter σ, kBT, and a reference time tc. The resulting equations have a non-dimensional diffusion coefficient ϵ = Dtc/σ2, where D is the dimensional diffusion coefficient. We set ϵ = 1 and use experimentally measured values σ = 1.3 μm and D = 0.065 μm2/s for DNA coated colloids76 to obtain an order-of-magnitude estimate of tc ≈ 17 s.

2. Conical subunit assembly on a nanoparticle

We study a system adapted from Ref. 77, consisting of two types of conical subunits that assemble into capsids on the surface of a spherical nanoparticle. The subunits are rigid bodies composed of six beads of increasing radius, consistent with a cone angle, α. Each of the four interior beads interacts attractively with the corresponding bead on other subunits through a Morse potential. The innermost bead has an attractive Morse interaction with the nanoparticle.

The two subunit types have relative diameters such that they correspond to pentamers (P) and hexamers (H) in the lowest-energy icosahedral capsid structure.78 There are two types of interactions between the subunits: an H–H attraction and an H–P attraction. The strength of these interactions can be tuned by varying the well depth of the corresponding Morse potentials, EHH and EHP, respectively. Interaction strengths are chosen in the range of 1.2kBT to 1.8kBT, which can facilitate assembly on the nanoparticle surface but are typically too weak to drive nucleation in the bulk. The strength of the attraction to the nanoparticle is held constant at ENH = 7kBT for hexamers and ENP = 6.3kBT for pentamers. The system contains one nanoparticle, 86 pentamer subunits, and 214 hexamer subunits, so that subunits are in excess and the chemical potential of free subunits remains nearly constant throughout the assembly process. We consider a cubic box with side length 120l0, where l0 = 1 nm is set as the unit length scale for the system. Assembly simulations are performed using the Brownian dynamics algorithm in HOOMD79 with periodic boundary conditions. Times are measured in terms of the HOOMD-derived unit, t0 = l0 √(m/kBT), where m is the subunit mass. We also construct an MSM for the system by estimating the transition matrix elements from these Brownian dynamics simulations (Sec. II D). Further details on the simulations are provided in the supplementary material.

The structures with the lowest potential energy (per particle) for this system are capsids with either T = 4 icosahedral symmetry or D5 symmetry, which both contain 30 hexamer and 12 pentamer subunits with the same number of H–H and H–P bonds.80 However, in finite-time dynamical simulations, assembly usually results in asymmetric structures that, although containing about 42 subunits, have one or more defects. Figure 2 depicts the five most probable end structures after Tf = 8 × 10^5 t0, for the parameters that maximized the yield of symmetric capsids (T = 4 and D5). The observation time Tf was chosen as the timescale after which the yield only grows logarithmically for most parameter sets that we focus on. We also compare the finite-time yields to the equilibrium yields, which we estimated using the MSM. This comparison shows that the system is far from reaching equilibrium at Tf, and from the MSM, we estimate that approaching equilibrium would require a timescale of ∼60Tf.

FIG. 2.

Visualization and assembly statistics for the cones system. (a) Snapshots along a typical trajectory. The first image shows the entire simulation domain, consisting of pentamer subunits (purple), hexamer subunits (orange), and a spherical nanoparticle (red). The following images show a zoomed-in view of the nanoparticle as assembly progresses. (b) Visual renderings of the most common end states for simulations with fixed interactions set to EHP = 1.35kBT and EHH = 1.475kBT, along with their finite-time yields at Tf = 8 × 10^5 t0, estimated from 150 independent simulations. The target capsid is the second most common end state. The other states differ from the target due to the presence of one or more defects, which are circled in green. The bottom row on the right lists the equilibrium yield for each structure, estimated by constructing an MSM and computing the stationary distribution.

These observations are consistent with the competition between thermodynamic and kinetic effects that arises for constant assembly driving forces. For very low subunit–subunit interactions or subunit concentrations, assembly is either thermodynamically unfavorable or does not nucleate on relevant timescales, while overly high interactions or concentrations cause the system to become trapped in metastable structures. Since the subunit–subunit interactions must be broken to exit a metastable state, reconfiguration timescales increase exponentially with interaction strengths. In our system, while the T = 4 and D5 structures are energetically favored, the other metastable states that we observe are more kinetically accessible under interaction strengths that enable nucleation on relevant timescales and thus occur in most assembly trajectories.

These competing effects suggest that assembly yields and fidelity could be enhanced by time-dependent sequences of interactions, which can drive rapid nucleation while also avoiding kinetic traps by facilitating reconfiguration from metastable states into the ground state(s). To investigate this possibility, we apply our optimization algorithm to the cones system. Our objective is to compute an optimal time-dependent protocol for EHH and EHP interactions that maximizes the yield of the symmetric T = 4 and D5 capsids at a specified observation time.

D. Constructing Markov state models

To apply the protocol optimization algorithm in Eq. (5), transition matrices need to be evaluated for the possible protocol values that are encountered during maximization of the objective function. While this analysis is computationally efficient for the colloidal polymer system because we have a semi-analytic Markov model, we are not so fortunate for the cones model. Similarly, for most relevant systems, it will not be possible to analytically evaluate the transition matrix. It would be computationally intractable to perform the sampling from unbiased simulations required to estimate the transition matrix at every candidate parameter value encountered during every candidate protocol sequence. Therefore, we describe here a computationally efficient procedure for estimating an MSM over parameter space to apply our optimization algorithm. This process has three steps: (1) We construct and validate local MSMs for fixed values of the system parameters, distributed over the region in parameter space that assembly protocols are likely to visit (which we denote as the feasibility set). (2) We perform an interpolation to construct a global MSM that can be evaluated for any intermediate value of the feasibility set. (3) During the optimization, if the protocol attempts to leave the feasibility set, we perform additional sampling from unbiased trajectories to expand the feasibility set as required. Similarly, if analysis indicates that interpolation errors are large within a subregion of the feasibility set, we perform sampling at additional parameter sets within that subregion.

To construct local MSMs for the cones system, we require a state-space discretization, i.e., a mapping between simulation configurations and MSM microstates. We use the state-space discretization (nP, nH, IH), where nP and nH are the numbers of pentamers and hexamers attached to the nanoparticle, respectively, and IH is the number of hexamers in contact with precisely two pentamers. The target state is specified by the triplet (12, 30, 30), which corresponds to a T = 4 or D5 capsid. We find that this three-variable description is insufficient for a subset of configurations, for which we augment the discretization with the variables bHH and bHP, the numbers of hexamer–hexamer and hexamer–pentamer bonds, respectively. See the supplementary material for details.
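The (nP, nH, IH) mapping can be sketched on hypothetical bookkeeping structures; in practice the subunit types and contact map would be extracted from simulation configurations, and the function name and input format below are assumptions made for illustration.

```python
def discretize(attached_types, contacts):
    """Map a configuration to the (n_P, n_H, I_H) microstate.

    attached_types: dict subunit_id -> 'P' or 'H', for subunits attached
                    to the nanoparticle (illustrative format)
    contacts:       set of frozensets {i, j} of subunits in contact
    """
    n_P = sum(1 for t in attached_types.values() if t == 'P')
    n_H = sum(1 for t in attached_types.values() if t == 'H')
    I_H = 0
    for i, t in attached_types.items():
        if t != 'H':
            continue
        # Count pentamer neighbors of hexamer i
        n_p_neighbors = sum(
            1 for pair in contacts
            if i in pair and any(attached_types.get(j) == 'P' for j in pair - {i})
        )
        if n_p_neighbors == 2:   # hexamer touching precisely two pentamers
            I_H += 1
    return (n_P, n_H, I_H)

# Tiny example: two pentamers, both touching one hexamer
types = {0: 'P', 1: 'P', 2: 'H'}
contacts = {frozenset({0, 2}), frozenset({1, 2})}
state = discretize(types, contacts)
```

Here the hexamer contacts exactly two pentamers, so the example maps to (2, 1, 1); the target capsid would map to (12, 30, 30).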

We construct MSMs from unbiased dynamics trajectories by computing a row-normalized count matrix of the number of transitions between discrete states after a lag time, τ. We determine that τ = 3125t0 is sufficient to build converged local MSMs and capture the assembly dynamics (see the supplementary material for detailed convergence and validation). We identify a region in parameter space that leads to productive assembly and perform simulations at parameter values on an unstructured grid within that region to construct a collection of local MSMs. Figure 3 shows this region for our example and the sampled parameters within it as black nodes. To avoid extrapolation errors, we restrict optimization to the feasibility set. However, as noted above, the feasibility set can be expanded by additional sampling if the protocol attempts to leave the region.
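The row-normalized count-matrix estimate of a local MSM can be sketched as follows, on short synthetic discrete trajectories; the helper name and data are illustrative.

```python
import numpy as np

def estimate_msm(trajs, n_states, lag):
    """Row-normalized transition count matrix at the given lag time."""
    counts = np.zeros((n_states, n_states))
    for traj in trajs:
        for t in range(len(traj) - lag):
            counts[traj[t], traj[t + lag]] += 1   # count observed transitions
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0                          # leave unvisited rows as zeros
    return counts / rows

# Two short synthetic trajectories over 3 discrete states
trajs = [[0, 0, 1, 1, 2, 2, 2], [0, 1, 1, 2, 2, 1, 2]]
P_hat = estimate_msm(trajs, n_states=3, lag=1)
```

In practice the lag time, state count, and trajectory ensemble are chosen so that the resulting matrix passes the convergence and validation tests referenced above.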

FIG. 3.

Feasibility set for the cones system. The allowed parameter values for the optimization are shown within the bounded region of parameter space. Black nodes represent parameter values for which we constructed local MSMs. Global transition matrices are interpolated for optimization and to evaluate and plot the assembly populations as a function of time. The red point indicates the parameters that achieve the maximum target yield with fixed interactions. The optimal time-dependent protocol is traced by the curve, from purple (T0) to light blue (Tf), with representative snapshots of the structures at various time points.

To construct an MSM during the protocol optimization at parameter sets between sampled nodes, we perform interpolation of the transition matrix entries using radial basis functions.81,82 Since each parameter set may sample a disjoint collection of states, we first define a global discretized state space as the union of all states observed in each local MSM. For each pair of states with at least one transition among all parameter sets, we construct a list of their transition probabilities for each parameter set. We assign a probability of 0 if the transition is not observed. Using a Gaussian kernel, we construct an interpolant for each non-zero transition matrix entry that can be evaluated for any parameter value in the allowed region and used to construct a global transition matrix. See the supplementary material for details on constructing and evaluating these interpolants. Note that this process can be made more efficient by using information from sampling at all parameter sets for the evaluation of a given transition matrix, for example, by using transition-based reweighting analysis methods (e.g., DHAM,83 xTRAM,84 and dTRAM85). For simplicity, we have not employed these protocols in this work.
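The per-entry interpolation over parameter space can be sketched with SciPy's RBFInterpolator and a Gaussian kernel. The sampled (EHP, EHH) nodes and the transition probabilities below are synthetic stand-ins for measured local MSM entries, and the clipping/renormalization noted in the comment is our assumption about restoring stochasticity rather than a detail from the text.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Synthetic sampled nodes in (E_HP, E_HH) space and, for one transition
# matrix entry, its measured probability at each node
nodes = np.array([[1.2, 1.3], [1.2, 1.6], [1.5, 1.3], [1.5, 1.6], [1.35, 1.45]])
p_entry = np.array([0.10, 0.25, 0.30, 0.55, 0.28])

# Gaussian-kernel RBF interpolant for this single matrix entry
interp = RBFInterpolator(nodes, p_entry, kernel='gaussian', epsilon=3.0)

# Evaluate at an intermediate parameter value; after interpolating every
# non-zero entry this way, one would clip to [0, 1] and renormalize rows
# to recover a stochastic global transition matrix (our assumption).
p_mid = float(interp([[1.35, 1.45]])[0])
```

With the default zero smoothing, the interpolant reproduces the sampled probabilities exactly at the nodes and varies smoothly between them.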

III. RESULTS

A. Colloidal chain assembly

We apply our optimization code to compute the time-dependent protocol that maximizes the yield for each structure shown in Fig. 1(a). We use different final times Tf for each structure, choosing Tf to be long enough for each phase of a protocol to approximately reach a steady state. We use a regularization parameter λ* = 2 × 10^−3 for structures 1–3 and λ* = 2 × 10^−4 for structures 4 and 5. We believe it was necessary to reduce λ* for states 4 and 5 because these are floppy structures that are difficult to stabilize. In all cases, we use a line search with a starting step size h = 1 and set the lag time to τ = Tf/50. The resulting yields are shown in Fig. 1(c), where the label number denotes which structure was being optimized for. Comparing these values to the optimal yields for fixed protocols in Fig. 1(b), we see improvements for each structure. We find that structures 1 and 2 can be formed with relatively high selectivity among the tracked structures, whereas the protocols for the other targets result in a spread of structures but with the target being the maximal probability structure in each case.

The triangle state, structure 3, sees the largest gain in yield with the time-dependent protocol among the five. The yield more than doubles from around 20% to just under 50%. The protocol is relatively simple, shown in Fig. 4(a). It begins with a large EBB value, which we know forms structure 4 with roughly 50% yield. Then, by switching on a large value for ERB, the triangle state becomes the dominant endpoint for any trajectory that enters the structure 4 state. Figure 4(b) shows the probability for the highest yield structures using this protocol, evaluated using the MSM, which shows the pathway taken to form the target. Note that the MSM predicts a nearly perfect yield, but simulation results in a yield of around 50%. This is due to a two-fold degeneracy in the adjacency matrix representation of states along this transition pathway. Whether the outer particle rotates clockwise or counter-clockwise to form bonds, the adjacency matrix representation is the same, but the resulting configurations either form the triangle state or a kinetic trap, reducing the maximum yield by a factor of two (see the supplementary material for further discussion).

FIG. 4.

Protocol optimization results and verification for the triangle state of the colloidal polymer. (a) An optimal time-dependent protocol for the three interaction parameters found by our algorithm. The initial guess was the optimal constant values for the like interactions, ERR = 0.01 kBT and EBB = 14 kBT, and a linear protocol from 1 kBT to 12 kBT for ERB. (b) State populations as a function of time for the optimal protocol, evaluated using the MSM. Only states with a maximum yield of over 0.2 are shown. (c) Time-dependent yield of the triangle state computed from Brownian dynamics. For the alternating chain, we compare the optimal fixed protocol yields with the optimal time-dependent protocol yields. We also show the yield for the optimal fixed protocol with three particle types and the corresponding optimal particle ordering. Results are averaged over 600 independent Brownian dynamics trajectories.

Finally, Fig. 4(c) shows the yield of the triangle state as a function of time for three protocols, evaluated from unbiased Brownian dynamics trajectories. We see that the optimal fixed protocol for the alternating chain quickly levels off at around 20% yield. In contrast, for the optimal time-dependent protocol, probability first accumulates in structure 4, then very quickly forms the triangle once the red-blue interaction is turned on, achieving around a 50% yield. We also compare these curves to the best fixed protocol that can be achieved by adding additional particle types. The minimal complexity solution turns out to be three particle types in the configuration shown in Fig. 4(c), which is as efficient as having all particles being unique.20 We see that we can achieve comparable assembly efficiency by using a time-dependent protocol with two particle types as we can by optimizing constant protocols with three particle types. These kinds of tradeoffs are important to characterize, as in many systems it is more economical to vary experimental conditions over time than it is to design and synthesize new subunit types with specified interactions.

The optimal protocols and the Brownian dynamics verification of their predicted yield for each of the other structures in Fig. 1(a) can be found in the supplementary material. Our computed protocols and expected yields for the rigid structures are consistent with recent experimental findings.72

B. Conical subunit capsid assembly

First, we characterize how the assembly proceeds under fixed parameter values, with an observation time Tf = 8 × 10^5 t0, or 256τ. We construct the transition matrix interpolant for parameter values inside the feasibility set (indicated by black points in Fig. 3) and then use it to compute target yields according to Eq. (1). The resulting yields are shown as a heat map in Fig. 3. We see that if the interactions are too weak, the target does not form due to thermodynamic unfavorability or slow nucleation. As the interactions strengthen, we reach the optimal assembly conditions [E=(EHP,EHH)=(1.35kBT,1.475kBT), red point], where the target is stable but subunits can rearrange to correct defects during assembly trajectories. Further strengthening the interactions reduces the target yield, as defect states can no longer rearrange on simulated timescales and the system becomes trapped in metastable states. This nonmonotonic dependence of yield on interaction strength for fixed interactions is typical of self-assembling systems.14,16,18
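Computing the yield at fixed parameters amounts to repeatedly applying the (interpolated) transition matrix to the initial state distribution. A minimal sketch of this propagation, using a hypothetical 3-state matrix in place of the interpolant evaluated at a chosen E:

```python
import numpy as np

def propagate_yield(T, p0, n_steps, target):
    """Evolve state probabilities as p_{k+1} = p_k @ T for a
    row-stochastic transition matrix T, returning the target-state
    probability after each lag time."""
    p = np.asarray(p0, dtype=float)
    yields = []
    for _ in range(n_steps):
        p = p @ T  # one lag time of Markov dynamics
        yields.append(p[target])
    return np.array(yields)

# Toy 3-state chain (unassembled -> intermediate -> target); in the
# paper, T would instead come from the transition-matrix interpolant
# evaluated at the chosen interaction strengths E = (E_HP, E_HH).
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.80, 0.15],
              [0.00, 0.02, 0.98]])
p0 = np.array([1.0, 0.0, 0.0])  # all probability in the unassembled state
y = propagate_yield(T, p0, n_steps=256, target=2)
```

The 256 steps mirror the 256τ observation window used here; the yield curve rises toward the stationary target probability of the toy matrix.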

Next, we apply our optimization algorithm to the capsid assembly problem using λ* = 4 × 10^−3 and h determined at each iteration by a backtracking line search. Our initial guess is a piece-wise constant protocol, E=(1.5kBT,1.5kBT) for the first 25 lag times and E=(1.3kBT,1.5kBT) for the remaining time. The protocol converges to the result shown in Fig. 5(a). The time-dependence of this protocol is traced out in the parameter space in Fig. 3, and representative snapshots of the system are shown along the pathway. The sequence goes from an empty nanoparticle to defective structure C, through a variety of transient structures, and finally to the target structure.
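The gradient-ascent step with a backtracking line search can be sketched as follows. The objective `J` below is a toy concave stand-in for the regularized yield of Eq. (2), and the finite-difference gradient replaces the adjoint calculation, so everything except the λ value and the Armijo backtracking logic is an illustrative assumption:

```python
import numpy as np

def backtracking_step(objective, E, grad, h0=1.0, beta=0.5, c=1e-4, max_iter=30):
    """Backtracking line search: shrink the step h until the ascent step
    E + h*grad improves the objective by at least c*h*||grad||^2
    (Armijo condition for maximization)."""
    J0 = objective(E)
    g2 = np.sum(grad**2)
    h = h0
    for _ in range(max_iter):
        if objective(E + h * grad) >= J0 + c * h * g2:
            return h
        h *= beta
    return 0.0  # no acceptable step found

lam = 4e-3  # regularization constant, as in the text

def J(E):
    """Toy stand-in for the regularized objective: a yield term peaked
    near E = 1.4 kBT minus a smoothness penalty on protocol jumps."""
    yield_proxy = -np.sum((E - 1.4) ** 2)
    smooth_penalty = lam * np.sum(np.diff(E) ** 2)
    return yield_proxy - smooth_penalty

E = np.array([1.5, 1.5, 1.3, 1.3])  # piece-wise constant initial guess
for _ in range(50):
    # finite-difference gradient for the toy problem (the paper uses adjoints)
    g = np.array([(J(E + 1e-6 * np.eye(len(E))[i]) - J(E)) / 1e-6
                  for i in range(len(E))])
    h = backtracking_step(J, E, g)
    E = E + h * g
```

For this toy objective the iteration converges to the flat protocol E = 1.4, where both the yield term and the smoothness penalty are extremal.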

FIG. 5.

Protocol optimization results and testing for the cones system. (a) The optimal time-dependent protocol found by our algorithm is shown as a solid line. The optimal constant protocol, determined in Fig. 3, is shown as a dashed line. (b) MSM estimates of the target probability as a function of time for the optimal time-dependent protocol and the optimal constant protocol. The MSM estimates are verified by performing 150 Brownian dynamics (BD) simulations with the optimal time-dependent or constant protocol, respectively, and tracking the number of trajectories in the target state as a function of time. (c) MSM estimates of all common state probabilities using the optimal constant protocol. (d) MSM estimates of all common state probabilities using the optimal time-dependent protocol.

By propagating the time evolution of state probabilities with the MSM, we obtain estimates for the target yield as a function of time for each protocol [Fig. 5(b)]. The optimal time-dependent protocol achieves a yield of about 55% at our chosen final time, Tf = 8 × 10^5 t0, compared to the optimal constant protocol yield of about 22%. Thus, the time-dependent protocol enhances yields by more than twofold in the same assembly time. To test these estimates, we perform Brownian dynamics according to the same protocols and compare the computed yield curves (averaged over 150 independent Brownian dynamics simulations) to the MSM prediction [Fig. 5(b)]. The MSM estimates match the Brownian dynamics results well, even for the time-dependent protocol. This close agreement demonstrates that, with sufficient sampling for estimating transition matrices and sufficient coverage of the feasibility set, the radial basis function interpolation enables effective MSM predictions for non-sampled parameter sets.

Figures 5(c) and 5(d) show yield curves for each of the high-population structures [those shown in Fig. 2(b)] for the optimal constant and time-dependent protocols, respectively. For the optimal constant case, we see the yield curves are all nearly flat near time zero, reflecting the length of time required for nucleation and growth of capsids. Furthermore, the yield curves for all structures increase monotonically over the simulated timescale, implying that once one of these structures is reached, transitions to the others are unlikely. Consistent with this conclusion, only 4 out of 150 simulation trajectories exhibit a transition between labeled states, which all occur from state C to the target. In contrast, the time-dependent protocol enables rapid nucleation and growth of capsids. Each tracked state accumulates ∼10% yield, at about the same rate, during the initial growth phase. During the next phase, where interaction strengths are reduced, states B, C, and D are destabilized and begin to decrease in probability, while state A continues to accumulate, but more slowly. The target state, however, quickly increases in yield, indicating that transitions are occurring from these destabilized structures to the target structure. In simulations, we observe 44 of 150 trajectories with such transitions, occurring from all of the A, B, C, and D states into the target. The distribution of times along the optimal protocol when these transitions begin is plotted in Fig. 6(a), which shows that the transitions are concentrated when the interactions first become weak and that state C is the initial state for most transitions. Figure 6(b) shows two example transition pathways. Both pathways begin with a subunit detaching from the nanoparticle, allowing rearrangement of the nearby subunits. While hexamers are involved in the rearrangements, pentamer rearrangements seem to drive the transitions, which explains why the optimal protocol decreases EHP more than EHH.

FIG. 6.

Analysis of how the optimal time-dependent protocol enhances yield. (a) Distribution of times when transitions from the competing states to the target state begin in trajectories with the optimal time-dependent protocol [from Fig. 2(b)]. (b) Examples of transition pathways from states B and C into the target state, observed under the optimal time-dependent protocol. Green Xs denote that a particle has dissociated in the next snapshot, and blue arrows denote the reconfiguration of associated subunits that results in the next snapshot.

The rapid transition between weak and strong interactions in the optimal time-dependent protocol is qualitatively consistent with Fullerton and Jack’s results on colloidal cluster growth,86 which considered simpler protocols defined by two interaction strengths and a single time to switch between them. To investigate how much additional enhancement a full time-dependence achieves in comparison to a piece-wise constant protocol, we converted our protocol to a simple two-step protocol. We used the time of the large jump as the transition time, at which point the interactions switch from E=(1.5kBT,1.5kBT) to E=(1.2kBT,1.4kBT). This protocol gives a yield of about 49.9%, so we estimate that the fully time-dependent protocol boosts the yield by an additional 5%. While this further improvement is relatively small, our efficient optimization algorithm converges to a fully time-dependent protocol far more quickly than one could sample all possible three-parameter step-function protocols. In particular, one gradient descent iteration requires approximately the same computation time as sampling two step-function protocols, and we typically observe convergence in 20–40 iterations (for a good initial guess protocol).

1. Effect of final time and regularization

Thus far, we have performed the protocol optimization using a final time Tf = 8 × 10^5 t0, since this is tractable for testing the predicted protocols with Brownian dynamics simulations. One of the benefits of constructing MSMs is that they enable probing timescales beyond what is feasible for simulation. We take advantage of this to compute optimal protocols for larger values of Tf. We find that the resulting protocols are all similar; they all spend about the same amount of time in the strong interaction phases at the beginning and end of the protocol, and the extra assembly time is allocated to the weak interaction phase as Tf increases. This is likely because we only optimize for the target yield at the final time without enforcing constraints on the assembly rate. Figure 7(a) shows the optimal time-dependent and constant target yields as a function of the final time. We see that for constant protocols, the optimal yield increases approximately linearly but slowly, while the time-dependent optimal yield increases quickly initially and then levels off. As it levels off, it approaches the estimated equilibrium yield for the target state, coming within 1% of it when Tf = 900τ. To determine what sets the timescale for this approach to equilibrium, we plot the MSM estimated probability of each common state in Fig. 7(b). States B, C, and D reach steady state probabilities along this protocol, but state A is still decreasing to its equilibrium value near the final time. State A is the slowest to transition to the target, so this transition rate sets the timescale required for the optimal time-dependent protocol to drive the system toward its equilibrium statistics. This suggests another avenue for further optimization: searching for new regions of parameter space that either minimize assembly of state A or allow for quicker transitions out of it.

FIG. 7.

Effect of the final time on optimal yields. (a) The optimal target yield from our computed constant and time-dependent protocols as a function of the final time. For reference, the yellow dashed line shows the estimated equilibrium target yield as in Fig. 2(b), and the black dashed line shows the final time used to generate our prior results. (b) MSM estimates of all common state probabilities using the optimal protocol for Tf = 900τ ≈ 28.12 × 10^5 t0.

The optimal protocol and the resulting maximum yield depend on the regularization constant λ in Eq. (2). The results shown here used λ* = 4 × 10^−3; by decreasing λ* by a few orders of magnitude, we observe maximum yields that increase by up to 2% but also involve protocols that are much less smooth than the one shown in Fig. 5(a). Thus, one can adjust λ depending on the trade-off between the difficulty of implementing abrupt changes in control parameters and the increase in yield. See the supplementary material for an analysis of how λ* affects the optimal yield and protocol smoothness.

The optimization also converges to different results based on the initial guess, suggesting that we have not found the global maximum. We performed additional optimizations that were initialized with various piece-wise constant protocols as well as a linear protocol between the optimal start and end values. The results are qualitatively similar in each case; interactions go from strong to weak to strong, but at different times, with target yields ranging from 45% to 55%. These results obtained from different initial guesses suggest that our protocol is near-optimal but that it would be beneficial to use alternative optimization algorithms better suited to finding global extrema on rugged landscapes.

IV. CONCLUSIONS

Markov state models (MSMs) have been widely adopted by the molecular dynamics community for their ability to probe long-time kinetics in a way that is human-interpretable. By leveraging these properties of MSMs as well as properties of the underlying transition matrices, we have developed an optimization algorithm that can efficiently compute time-dependent parameter protocols that maximize the finite-time probability of observing a chosen target state. Furthermore, in the (typical) case when an analytic form for the transition matrix cannot be obtained, we have described a sampling and interpolation procedure that enables constructing MSMs as a continuous function of the system control parameters after performing sampling at only discrete parameter sets. The resulting MSM can then be used for optimization.

We have tested the method and evaluated its performance on two systems. For a short polymer of colloidal particles, time-dependent interaction strengths between particle types can be tuned to drive selective folding into a number of different cluster geometries. Importantly, we show that the time-dependent protocols can be used to selectively enhance the yield of not only the equilibrium ground state but also metastable states, including floppy structures, which are typically transient under constant protocols. For some structures, it is possible to reduce the number of distinct particle types needed to achieve high folding yields under constant parameters by instead tuning a time-dependent protocol. Thus, this approach could simplify experimental realizations of folding and assembly since time-dependent protocols may be easier to implement in some systems than synthesizing new subunit types with specific interactions. For example, in systems with interactions mediated by DNA hybridization, Rogers et al.29,30 showed that DNA sequences can be designed such that multiple sets of particle–particle interactions can be simultaneously tuned by varying temperature.

We also consider a more computationally challenging system, conical subunits assembling around a spherical nanoparticle, as a model for viral capsid assembly. In this example, the target yield is suppressed because kinetic traps corresponding to defective structures prevent many trajectories from attaining the ground state configuration on relevant timescales. We demonstrate that by optimizing time-dependent subunit–subunit interaction strengths, the maximum yield can be more than doubled in comparison to that achieved with constant interactions. The optimal protocol begins with strong interactions that drive rapid nucleation, but into primarily defective structures, followed by selectively weakening interactions to enable rearrangement into the stable target state. Using unbiased simulations, we verify that the MSM estimates are accurate to within a few percent yield, which is notable considering the potentially compounding errors arising from the discretization of state space when constructing the MSM and the interpolation error associated with estimating transition matrices at discrete parameter sets. Importantly, one should not extrapolate outside of the sampled domain (feasibility set) when using the interpolation procedure. As an example, in the cones system, extrapolation outside of the feasibility set (based on the nearby transition matrix values) predicts nucleation in some parameter regions where unbiased simulations exhibit no nucleation, leading to large errors in MSM predictions. Instead, the feasibility set can be enlarged by additional sampling if the optimization algorithm identifies additional regions of parameter space as important.

To assess the computational efficiency of our method, we can compare the computational cost of applying the method to the capsid assembly problem against the cost of identifying the optimal protocol directly from Brownian dynamics simulations at the same accuracy level. For the latter, we consider using a gradient descent method to compute the optimal protocol. Our base case has a protocol time-discretization with 256 lag times (meaning timepoints at which the parameter values can be changed) for the two independent parameters, and thus a total of 512 parameters to optimize over. To estimate gradients, we would perform unbiased Brownian dynamics simulations at different parameter sets and compute finite-differences of the final yield with respect to each parameter. Achieving 10% accuracy would require about 100 simulations for each finite-difference calculation and thus a total of 5 × 10^4 simulations per iteration. As a lower bound, we assume that convergence would require 20 iterations from a good initial guess, resulting in a total of about 10^6 simulations. Since each simulation requires about 24 h on one of our CPUs (2.3 GHz AMD EPYC 7452), the total computational effort is 2.4 × 10^7 CPU hours. In comparison, the total computational effort for our MSM approach, which is dominated by the simulations used to estimate the transition matrices, was 100-fold smaller, about 2 × 10^5 CPU hours or the equivalent of about 10^4 unbiased simulations.
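The brute-force cost estimate above is simple bookkeeping; spelled out (the iteration count and 24 h per-simulation cost are the stated assumptions from the text):

```python
# Bookkeeping behind the brute-force cost estimate for direct
# finite-difference optimization over the protocol parameters.
n_lag_times = 256            # protocol time points
n_params = 2 * n_lag_times   # two interaction strengths per time point -> 512
sims_per_fd = 100            # simulations per finite-difference estimate
sims_per_iter = n_params * sims_per_fd  # ~5e4 simulations per gradient
n_iters = 20                 # assumed iterations to converge (lower bound)
total_sims = n_iters * sims_per_iter    # ~1e6 simulations
cpu_hours = 24 * total_sims             # at ~24 CPU hours per simulation
```

The exact totals (51 200 simulations per iteration, about 2.46 × 10^7 CPU hours) round to the 5 × 10^4, 10^6, and 2.4 × 10^7 figures quoted in the text.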

The MSM approach has several benefits in addition to this 100-fold reduction in computational sampling effort. First, estimation of the transition matrices involves many relatively short simulations, targeting poorly sampled transitions, and is thus trivially parallelizable. In contrast, the unbiased simulations must be run for the full simulation time, spending large amounts of time gathering data about transitions that have already been well sampled. More importantly, unbiased simulations become computationally intractable for parameter sets that lead to larger nucleation barriers. (In this example, we specifically chose parameter values for which direct simulation was computationally tractable.) Second, MSMs can efficiently and accurately extrapolate dynamics to longer timescales. Doubling the final time would double the total simulation effort for the direct method, whereas for the MSM approach, it simply doubles the number of matrix-vector and vector–vector multiplications needed to compute gradients according to Eq. (4). Finally, construction of the MSM allows for extensive analysis using transition path theory,39,87 which can reveal factors that control assembly behaviors and elucidate mechanisms by which time-dependent protocols facilitate reconfiguration of kinetic traps.
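The adjoint gradient calculation referenced above amounts to one forward propagation of the state probabilities and one backward propagation of an adjoint vector, using only matrix-vector products. A sketch under simplifying assumptions (one scalar control parameter per step, analytic derivatives dT/dE, and an illustrative two-state model, none of which are the paper's actual model):

```python
import numpy as np

def adjoint_gradient(Ts, dTs, p0, target):
    """Gradient of the final-time target probability with respect to one
    scalar control parameter per step, via a single forward pass and a
    single backward (adjoint) pass of matrix-vector products.
    Ts[k]: row-stochastic transition matrix at step k;
    dTs[k]: its derivative with respect to the step-k parameter."""
    ps = [np.asarray(p0, dtype=float)]
    for T in Ts:                       # forward pass: p_{k+1} = p_k @ T_k
        ps.append(ps[-1] @ T)
    a = np.zeros_like(ps[-1])
    a[target] = 1.0                    # adjoint seeded by d(yield)/d(p_final)
    grads = np.empty(len(Ts))
    for k in range(len(Ts) - 1, -1, -1):
        grads[k] = ps[k] @ dTs[k] @ a  # parameter enters only through step k
        a = Ts[k] @ a                  # propagate the adjoint backward in time
    return grads

# Illustrative two-state family T(E) = [[1 - E, E], [0.1, 0.9]].
def Tmat(E):
    return np.array([[1.0 - E, E], [0.1, 0.9]])

dT = np.array([[-1.0, 1.0], [0.0, 0.0]])  # dT/dE, constant for this family
Es = [0.2, 0.3, 0.25]
g = adjoint_gradient([Tmat(E) for E in Es],
                     [dT] * len(Es), np.array([1.0, 0.0]), target=1)
```

This is why doubling the final time only doubles the number of matrix-vector products: both passes are linear in the number of protocol steps.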

While we developed this algorithm in the context of folding or self-assembly protocols, the adjoint-based optimization algorithm is generically applicable to MSMs and any objective function involving a state probability. For instance, the same framework can be used as a highly efficient parameter estimation tool by determining the protocol (e.g., parameters such as temperature, interaction strengths, and concentrations) that most closely reproduces an experimental yield curve. Such estimation can be performed for time-dependent or constant protocols. Alternatively, the algorithm can be used for multifarious50–55 or reconfigurable56–65 assembly systems. By changing the initial and target states in the optimization, one can optimize for assembly of a wide range of target states or to promote transitions between particular states.

A. Combining optimal control theory with closed-loop feedback control

The method described in this article uses an MSM and a single optimal control theory computation to generate a protocol that is optimal at the ensemble level. On average, we have shown that the assembly will proceed according to the MSM predictions, but due to the stochastic nature of self-assembling systems, there will inevitably be some trajectories that quickly reach the target, some that take longer, and some that get trapped. Moreover, stochastic noise and experimental error could diminish the performance of the predicted protocol. Both of these limitations could be addressed by combining our method with closed-loop feedback control, which has been shown to make assembly more robust by allowing the protocol to respond to the current state of the system rather than its ensemble-average state (e.g., Refs. 47–49). Our method can be applied to closed-loop control by monitoring the current state of a simulation or experiment at prescribed time intervals. At each interval, the current system configuration is set as the initial state in the optimization, and a new optimal protocol is computed. Alternatively, one can incorporate machine learning approaches such as reinforcement learning and physics-informed neural networks (PINNs), which have been used to solve assembly and optimal control problems in a way that naturally encodes feedback and improves scalability.88–92 In addition, the MSM can be used to efficiently generate training data for such approaches.
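A receding-horizon sketch of this closed-loop idea: at each interval, the observed state becomes the new initial condition and the remaining protocol is re-optimized. The two candidate parameter settings, their matrices, and the toy re-optimizer (an exhaustive search over two settings rather than the full gradient-based algorithm) are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def reoptimize(p_now, candidates, horizon, target):
    """Pick, from candidate parameter settings (each with its own
    transition matrix), the one maximizing the target probability after
    `horizon` lag times starting from the *current* distribution p_now."""
    best, best_yield = None, -1.0
    for label, T in candidates.items():
        p = p_now.copy()
        for _ in range(horizon):
            p = p @ T
        if p[target] > best_yield:
            best, best_yield = label, p[target]
    return best

# "strong" favors leaving the unassembled state 0 (but can trap in 1);
# "weak" favors escape from the trapped state 1 into the target 2.
candidates = {
    "strong": np.array([[0.5, 0.4, 0.1],
                        [0.0, 0.9, 0.1],
                        [0.0, 0.0, 1.0]]),
    "weak":   np.array([[0.9, 0.05, 0.05],
                        [0.0, 0.6, 0.4],
                        [0.0, 0.0, 1.0]]),
}

# Closed-loop control of one stochastic trajectory: every interval,
# observe the current state and re-optimize over the remaining horizon.
state, schedule = 0, []
for interval in range(10):
    p_now = np.zeros(3)
    p_now[state] = 1.0                     # observed configuration
    choice = reoptimize(p_now, candidates, horizon=10 - interval, target=2)
    schedule.append(choice)
    state = rng.choice(3, p=candidates[choice][state])  # stochastic dynamics
```

Unlike the open-loop protocol, the chosen setting can differ between trajectories, since it depends on which state each trajectory actually occupies.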

B. Limitations of the method and outlook

In this work, we have considered systems in which the subunit chemical potential is irrelevant (for the folding problem) or nearly constant in time (since the subunits are in excess in the cones system). We will describe in a subsequent publication how the method can be modified to account for time-varying chemical potentials, such as those that occur due to the depletion of subunits in systems involving homogeneous nucleation or in which subunits are not in excess.

The current implementation of our method also suffers from a few limitations. The first is scaling to a larger number of control parameters. Sampling and interpolation costs scale exponentially with the number of parameters. Fortunately, there is a desire to be economical in terms of system design, so many systems of interest will have only a few independent parameters. In higher-dimensional parameter spaces, a more targeted sampling approach will be necessary; for example, sampling could be performed on-the-fly in regions indicated as important by the optimization algorithm.

State-space discretization is another important consideration, as it is for constructing an MSM in any system. A good state-space discretization must describe all relevant slow degrees of freedom, and the results of our algorithm can be sensitive to the extent of discretization error. In addition, the user must decide on a trade-off between the number of collective coordinates used to discretize the state space and the computational complexity. For example, our coordinate for the cones system lumps together the T = 4 and D5 capsids, as these structures are very similar. While it would be straightforward to include an additional coordinate that distinguishes between the structures and then optimize specifically for D5 structures, we expect the resulting protocol would not change much because the T = 4 yield is so low (5% compared to 50% for D5). The size of the state space also affects computational and memory requirements. Machine learning approaches have the potential to address both these issues. Neural network architectures such as VAMPnets93 or GraphVAMPnets94 have been shown to streamline the MSM creation process by determining a minimal state space decomposition.

Finally, there is an implicit restriction on the temporal resolution of the protocols. The lower bound is set by the smallest lag time that results in convergence for all local MSMs. If this lag time is not sufficiently small compared to the assembly time-scales, the protocols are unlikely to be informative. This issue could be avoided in three ways: by performing a finer discretization of state space such that a smaller lag time can be used; by constructing a transition rate matrix95 and exponentiating it to construct the probability transition matrix, as we do for the colloids example; or by formulating the optimization algorithm for a transition rate matrix directly, rather than a probability transition matrix.
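The second option above, constructing a transition rate matrix and exponentiating it, decouples the protocol's temporal resolution from a single MSM lag time, since T(τ) = exp(Qτ) is a valid row-stochastic transition matrix for any τ > 0. A minimal sketch with an illustrative 3-state rate matrix (not rates from the paper's systems):

```python
import numpy as np
from scipy.linalg import expm

# Toy 3-state rate matrix Q: off-diagonal Q[i, j] is the i -> j rate,
# and each row sums to zero; stands in for rates estimated from data.
Q = np.array([[-1.0,  1.0,  0.0],
              [ 0.5, -1.5,  1.0],
              [ 0.0,  0.2, -0.2]])

def transition_matrix(Q, tau):
    """Probability transition matrix at lag time tau: T = exp(Q * tau)."""
    return expm(Q * tau)

T = transition_matrix(Q, tau=0.1)
```

Because the semigroup property exp(Q·2τ) = exp(Qτ)², matrices at different lag times remain mutually consistent, which is what allows an arbitrarily fine protocol discretization.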

SUPPLEMENTARY MATERIALS

The supplementary material PDF contains details on the simulation implementation for the cones model, derivations for the adjoint method gradient calculation, details on performing gradient descent, details on MSM construction, details on transition matrix interpolation and evaluation, and the full protocol optimization results for the colloidal polymer system.

ACKNOWLEDGMENTS

We would like to thank Eric Vanden-Eijnden for the idea of applying optimal control principles to self-assembling systems modeled by MSMs and Miranda Holmes-Cerfon for helpful discussions on the method. We acknowledge support from Grant No. NIH R01GM108021 and the Brandeis NSF MRSEC, Bioinspired Soft Materials, Grant No. DMR-2011846. We also acknowledge computational support from NSF XSEDE computing resources allocation Grant No. TG-MCB090163 and the Brandeis HPCC, which is partially supported by the NSF through Grant Nos. DMR-MRSEC 2011846 and OAC-1920147.

Contributor Information

Anthony Trubiano, Email: trubiano@brandeis.edu.

Michael F. Hagan, Email: hagan@brandeis.edu.

AUTHOR DECLARATIONS

Conflict of Interest

The authors declare no conflicts of interest.

Author Contributions

Anthony Trubiano: Conceptualization (equal); Methodology (equal); Data Curation (lead); Formal Analysis (lead); Software (lead); Visualization (lead); Writing – Original Draft (equal); Writing – Review and Editing (equal). Michael Hagan: Conceptualization (equal); Methodology (equal); Funding Acquisition (lead); Supervision (lead); Writing – Original Draft (equal); Writing – Review and Editing (equal).

DATA AVAILABILITY

The code used to generate data, construct MSMs, perform optimization, and produce figures for the colloids model is publicly available at the GitHub repository: https://github.com/onehalfatsquared/CPfold, Ref. 96.

The code used to generate data, construct MSMs, perform optimization, and produce figures for the cones model is publicly available at the GitHub repository: https://github.com/onehalfatsquared/protocolOptMSM, Ref. 97.

REFERENCES

  • 1.Garg T. and Rath G., Crit. Rev. Ther. Drug Carrier Syst. 32(2), 89 (2015). 10.1615/critrevtherdrugcarriersyst.2015010159 [DOI] [PubMed] [Google Scholar]
  • 2.Beija M., Salvayre R., Lauth-de Viguerie N., and Marty J.-D., Trends Biotechnol. 30, 485 (2012). 10.1016/j.tibtech.2012.04.008 [DOI] [PubMed] [Google Scholar]
  • 3.Ebbens S., Curr. Opin. Colloid Interface Sci. 21, 14 (2016). 10.1016/j.cocis.2015.10.003 [DOI] [Google Scholar]
  • 4.Mallory S. A., Valeriani C., and Cacciuto A., Annu. Rev. Phys. Chem. 69, 59 (2018). 10.1146/annurev-physchem-050317-021237 [DOI] [PubMed] [Google Scholar]
  • 5.Fan J. A., He Y., Bao K., Wu C., Bao J., Schade N. B., Manoharan V. N., Shvets G., Nordlander P., Liu D. R., and Capasso F., Nano Lett. 11, 4859 (2011). 10.1021/nl203194m [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Huh J. H., Kim K., Im E., Lee J., Cho Y., and Lee S., Adv. Mater. 32, 2001806 (2020). 10.1002/adma.202001806 [DOI] [PubMed] [Google Scholar]
  • 7.Ke Y., Ong L. L., Shih W. M., and Yin P., Science 338, 1177 (2012). 10.1126/science.1227268 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Murugan A., Zou J., and Brenner M. P., Nat. Commun. 6, 6203 (2015). 10.1038/ncomms7203 [DOI] [PubMed] [Google Scholar]
  • 9.Hormoz S. and Brenner M. P., Proc. Natl. Acad. Sci. U. S. A. 108, 5193 (2011). 10.1073/pnas.1014094108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Zeravcic Z., Manoharan V. N., and Brenner M. P., Rev. Mod. Phys. 89, 031001 (2017). 10.1103/revmodphys.89.031001 [DOI] [Google Scholar]
  • 11.Damasceno P. F., Engel M., and Glotzer S. C., Science 337, 453 (2012). 10.1126/science.1220869 [DOI] [PubMed] [Google Scholar]
  • 12.Sacanna S., Irvine W. T. M., Chaikin P. M., and Pine D. J., Nature 464, 575 (2010). 10.1038/nature08906 [DOI] [PubMed] [Google Scholar]
  • 13.Wang Y., Wang Y., and Breed D. R., Nature, 491, 51 (2012). 10.1038/nature11564 [DOI] [PubMed] [Google Scholar]
  • 14.Hagan M. F. and Chandler D., Biophys. J. 91, 42 (2006). 10.1529/biophysj.105.076851 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Wilber A. W., Doye J. P. K., Louis A. A., Noya E. G., Miller M. A., and Wong P., J. Chem. Phys. 127, 085106 (2007). 10.1063/1.2759922 [DOI] [PubMed] [Google Scholar]
  • 16.Grant J., Jack R. L., and Whitelam S., J. Chem. Phys. 135, 214505 (2011). 10.1063/1.3662140 [DOI] [PubMed] [Google Scholar]
  • 17.Palma C.-A., Cecchini M., and Samorì P., Chem. Soc. Rev. 41, 3713 (2012). 10.1039/c2cs15302e [DOI] [PubMed] [Google Scholar]
  • 18.Whitelam S. and Jack R. L., Annu. Rev. Phys. Chem. 66, 143 (2015). 10.1146/annurev-physchem-040214-121215 [DOI] [PubMed] [Google Scholar]
  • 19.Bisker G. and England J. L., Proc. Natl. Acad. Sci. U. S. A. 115, E10531 (2018). 10.1073/pnas.1805769115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Trubiano A. and Holmes-Cerfon M., Soft Matter 17, 6797 (2021). 10.1039/d1sm00681a [DOI] [PubMed] [Google Scholar]
  • 21.Nguyen M. and Vaikuntanathan S., Proc. Natl. Acad. Sci. U. S. A. 113, 14231 (2016). 10.1073/pnas.1609983113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Sherman Z. M. and Swan J. W., ACS Nano 10, 5260 (2016). 10.1021/acsnano.6b01050 [DOI] [PubMed] [Google Scholar]
  • 23.Heinen L. and Walther A., Soft Matter 11, 7857 (2015). 10.1039/c5sm01660f [DOI] [PubMed] [Google Scholar]
  • 24.Taylor S. L., Evans R., and Patrick Royall C., J. Phys.: Condens. Matter 24, 464128 (2012). 10.1088/0953-8984/24/46/464128 [DOI] [PubMed] [Google Scholar]
  • 25.Tagliazucchi M., Weiss E. A., and Szleifer I., Proc. Natl. Acad. Sci. U. S. A. 111, 9751 (2014). 10.1073/pnas.1406122111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Snyder R. C., Studener S., and Doherty M. F., AIChE J. 53, 1510 (2007). 10.1002/aic.11174 [DOI] [Google Scholar]
  • 27.Abu Bakar M. R., Nagy Z. K., Saleemi A. N., and Rielly C. D., Cryst. Growth. Des 9, 1378 (2009). 10.1021/cg800595v [DOI] [Google Scholar]
  • 28.Bupathy A., Frenkel D., and Sastry S., Proc. Natl. Acad. Sci. U. S. A. 119, e2119315119 (2022). 10.1073/pnas.2119315119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Rogers W. B. and Manoharan V. N., Science 347, 639 (2015). 10.1126/science.1259762 [DOI] [PubMed] [Google Scholar]
  • 30.Gehrels E. W., Rogers W. B., and Manoharan V. N., Soft Matter 14, 969 (2018). 10.1039/c7sm01722g [DOI] [PubMed] [Google Scholar]
  • 31.Stenhammar J., Wittkowski R., Marenduzzo D., and Cates M. E., Sci. Adv. 2, e1501850 (2016). 10.1126/sciadv.1501850 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Dou Y., Wang B., Jin M., Yu Y., Zhou G., and Shui L., J. Micromech. Microeng. 27, 113002 (2017). 10.1088/1361-6439/aa84db [DOI] [Google Scholar]
  • 33.Chang J.-C., Swank Z., Keiser O., Maerkl S. J., and Amstad E., Sci. Rep. 8, 8143 (2018). 10.1038/s41598-018-26542-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Zhang R., Li Q., Tian L., Gong J., Li Z., Liu W., and Gui L., J. Micromech. Microeng. 31, 055013 (2021). 10.1088/1361-6439/abf1b4 [DOI] [Google Scholar]
  • 35.Lindquist B. A., Jadrich R. B., and Truskett T. M., J. Chem. Phys. 145, 111101 (2016). 10.1063/1.4962754 [DOI] [PubMed] [Google Scholar]
  • 36.Miskin M. Z., Khaira G., de Pablo J. J., and Jaeger H. M., Proc. Natl. Acad. Sci. U. S. A. 113, 34 (2016). 10.1073/pnas.1509316112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Engel M. C., Smith J. A., and Brenner M. P., “Optimal control of nonequilibrium systems through automatic differentiation” (unpublished) (2022).
  • 38.Husic B. E. and Pande V. S., J. Am. Chem. Soc. 140, 2386 (2018). 10.1021/jacs.7b12191 [DOI] [PubMed] [Google Scholar]
  • 39.Perkett M. R. and Hagan M. F., J. Chem. Phys. 140, 214101 (2014). 10.1063/1.4878494 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Pande V. S., Beauchamp K., and Bowman G. R., Methods 52, 99 (2010); part of Special Issue: Protein Folding. 10.1016/j.ymeth.2010.06.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Schwantes C. R., McGibbon R. T., and Pande V. S., J. Chem. Phys. 141, 090901 (2014). 10.1063/1.4895044 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Hummer G. and Szabo A., J. Phys. Chem. B 119, 9029 (2015). 10.1021/jp508375q [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Suárez E., Wiewiora R. P., Wehmeyer C., Noé F., Chodera J. D., and Zuckerman D. M., J. Chem. Theory Comput. 17, 3119 (2021). 10.1021/acs.jctc.0c01154 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Norris J. R. and Chains M., Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, 1997). [Google Scholar]
  • 45. Gardiner C. W., Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, 3rd ed., Springer Series in Synergetics Vol. 13 (Springer-Verlag, Berlin, 2004), pp. xviii+415.
  • 46. Juárez J. J. and Bevan M. A., Adv. Funct. Mater. 22, 3833 (2012). 10.1002/adfm.201200400
  • 47. Tang X., Rupp B., Yang Y., Edwards T. D., Grover M. A., and Bevan M. A., ACS Nano 10, 6791 (2016). 10.1021/acsnano.6b02400
  • 48. Tang X., Zhang J., Bevan M. A., and Grover M. A., J. Process Control 60, 141 (2017). 10.1016/j.jprocont.2017.06.003
  • 49. Grover M. A., Griffin D. J., Tang X., Kim Y., and Rousseau R. W., “Optimal feedback control of batch self-assembly processes using dynamic programming,” J. Process Control 88, 32–42 (2020). 10.1016/j.jprocont.2020.01.013
  • 50. Murugan A., Zeravcic Z., Brenner M. P., and Leibler S., Proc. Natl. Acad. Sci. U. S. A. 112, 54 (2015). 10.1073/pnas.1413941112
  • 51. Zwicker D. and Laan L., Proc. Natl. Acad. Sci. U. S. A. 119, e2201250119 (2022). 10.1073/pnas.2201250119
  • 52. Ben-Ari A., Ben-Ari L., and Bisker G., J. Chem. Phys. 155, 234113 (2021). 10.1063/5.0069161
  • 53. Osat S. and Golestanian R., “Non-reciprocal multifarious self-organization” (unpublished) (2022).
  • 54. Jacobs W. M., Phys. Rev. Lett. 126, 258101 (2021). 10.1103/physrevlett.126.258101
  • 55. Mohapatra L., Goode B. L., Jelenkovic P., Phillips R., and Kondev J., Annu. Rev. Biophys. 45, 85 (2016). 10.1146/annurev-biophys-070915-094206
  • 56. Das A. and Limmer D. T., “Nonequilibrium design strategies for functional colloidal assemblies” (unpublished) (2022).
  • 57. Kohlstedt K. L. and Glotzer S. C., Phys. Rev. E 87, 032305 (2013). 10.1103/physreve.87.032305
  • 58. Ortiz D., Kohlstedt K. L., Nguyen T. D., and Glotzer S. C., Soft Matter 10, 3541 (2014). 10.1039/c4sm00026a
  • 59. Phillips C. L., Jankowski E., Krishnatreya B. J., Edmond K. V., Sacanna S., Grier D. G., Pine D. J., and Glotzer S. C., Soft Matter 10, 7468 (2014). 10.1039/c4sm00796d
  • 60. Young K. L., Jones M. R., Zhang J., Macfarlane R. J., Esquivel-Sirvent R., Nap R. J., Wu J., Schatz G. C., Lee B., and Mirkin C. A., Proc. Natl. Acad. Sci. U. S. A. 109, 2240 (2012). 10.1073/pnas.1119301109
  • 61. Nguyen T. D. and Glotzer S. C., ACS Nano 4, 2585 (2010). 10.1021/nn901725b
  • 62. Nguyen M. and Vaikuntanathan S., “Dissipation induced transitions in elastic strings” (unpublished) (2018).
  • 63. Mann S., Nat. Mater. 8, 781 (2009). 10.1038/nmat2496
  • 64. Solomon M. J., Nature 464, 496 (2010). 10.1038/464496a
  • 65. Long A. W. and Ferguson A. L., J. Phys. Chem. B 118, 4228 (2014). 10.1021/jp500350b
  • 66. Plessix R.-E., Geophys. J. Int. 167, 495 (2006). 10.1111/j.1365-246x.2006.02978.x
  • 67. Giles M. B. and Pierce N. A., Flow, Turbul. Combust. 65, 393 (2000). 10.1023/A:1011430410075
  • 68. Ascher U. M., Ruuth S. J., and Wetton B. T. R., SIAM J. Numer. Anal. 32, 797 (1995). 10.1137/0732037
  • 69. Wang Y., Wang Y., Zheng X., Ducrot É., Lee M.-G., Yi G.-R., Weck M., and Pine D. J., J. Am. Chem. Soc. 137, 10760 (2015). 10.1021/jacs.5b06607
  • 70. Khandelwal G. and Bhyravabhotla J., PLoS One 5, e12433 (2010). 10.1371/journal.pone.0012433
  • 71. McMullen A., Holmes-Cerfon M., Sciortino F., Grosberg A. Y., and Brujic J., Phys. Rev. Lett. 121, 138002 (2018). 10.1103/physrevlett.121.138002
  • 72. McMullen A., Muñoz Basagoiti M., Zeravcic Z., and Brujic J., Nature 610, 502 (2022). 10.1038/s41586-022-05198-8
  • 73. Russo J., Romano F., Kroc L., Sciortino F., Rovigatti L., and Šulc P., J. Phys.: Condens. Matter 34, 354002 (2022). 10.1088/1361-648x/ac5479
  • 74. Jacobs W. M. and Frenkel D., J. Am. Chem. Soc. 138, 2457 (2016). 10.1021/jacs.5b11918
  • 75. Zeravcic Z., Manoharan V. N., and Brenner M. P., Proc. Natl. Acad. Sci. U. S. A. 111, 15918 (2014). 10.1073/pnas.1411765111
  • 76. Perry R. W., Holmes-Cerfon M. C., Brenner M. P., and Manoharan V. N., Phys. Rev. Lett. 114, 228301 (2015). 10.1103/physrevlett.114.228301
  • 77. Lázaro G. R., Dragnea B., and Hagan M. F., Soft Matter 14, 5728 (2018). 10.1039/c8sm00129d
  • 78. Caspar D. L. D. and Klug A., in Cold Spring Harbor Symposia on Quantitative Biology (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1962), Vol. 27, p. 1. 10.1101/sqb.1962.027.001.005
  • 79. Anderson J. A., Glaser J., and Glotzer S. C., Comput. Mater. Sci. 173, 109363 (2020). 10.1016/j.commatsci.2019.109363
  • 80. Measuring the average energy of assembled capsids in simulations indicates that the D5 structures have slightly (<0.1%) lower total energy than T = 4 capsids.
  • 81. Fasshauer G. E. and Zhang J. G., Numer. Algorithms 45, 345 (2007). 10.1007/s11075-007-9072-8
  • 82. Hines T., “RBF,” https://github.com/treverhines/rbf, 2016.
  • 83. Rosta E. and Hummer G., J. Chem. Theory Comput. 11, 276 (2015). 10.1021/ct500719p
  • 84. Mey A. S. J. S., Wu H., and Noé F., Phys. Rev. X 4, 041018 (2014). 10.1103/physrevx.4.041018
  • 85. Wu H., Mey A. S. J. S., Rosta E., and Noé F., J. Chem. Phys. 141, 214106 (2014). 10.1063/1.4902240
  • 86. Fullerton C. J. and Jack R. L., J. Chem. Phys. 145, 244505 (2016). 10.1063/1.4972861
  • 87. Prinz J.-H., Wu H., Sarich M., Keller B., Senne M., Held M., Chodera J. D., Schütte C., and Noé F., J. Chem. Phys. 134, 174105 (2011). 10.1063/1.3565032
  • 88. Whitelam S. and Tamblyn I., Phys. Rev. E 101, 052604 (2020). 10.1103/physreve.101.052604
  • 89. Demo N., Strazzullo M., and Rozza G., “An extended physics informed neural network for preliminary analysis of parametric optimal control problems” (unpublished) (2021).
  • 90. Lu L., Pestourie R., Yao W., Wang Z., Verdugo F., and Johnson S. G., SIAM J. Sci. Comput. 43, B1105 (2021). 10.1137/21m1397908
  • 91. Mowlavi S. and Nabi S., J. Comput. Phys. 473, 111731 (2023). 10.1016/j.jcp.2022.111731
  • 92. Yan J. and Rotskoff G. M., J. Chem. Phys. 157, 074101 (2022). 10.1063/5.0095593
  • 93. Mardt A., Pasquali L., Wu H., and Noé F., Nat. Commun. 9 (2018). 10.1038/s41467-017-02388-1
  • 94. Ghorbani M., Prasad S., Klauda J. B., and Brooks B. R., J. Chem. Phys. 156, 184103 (2022). 10.1063/5.0085607
  • 95. Crommelin D. and Vanden-Eijnden E., Multiscale Model. Simul. 7, 1751 (2009). 10.1137/080735977
  • 96. https://github.com/onehalfatsquared/CPfold, 2021.
  • 97. https://github.com/onehalfatsquared/protocolOptMSM, 2022.

Associated Data


Supplementary Materials

The supplementary material PDF contains details of the simulation implementation for the cones model, derivations of the adjoint-method gradient calculation, details of the gradient descent procedure, details of MSM construction, details of transition matrix interpolation and evaluation, and the full protocol optimization results for the colloidal polymer system.
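To illustrate the transition matrix interpolation step, the sketch below builds MSM transition matrices at a few sampled control-parameter values, interpolates each matrix entry across parameter space with radial basis functions, and renormalizes rows so the interpolated matrix remains row-stochastic. This is only a minimal illustration, not the paper's implementation (which is detailed in the supplementary material and uses the RBF package of Ref. 82); here SciPy's `RBFInterpolator` stands in for that package, and the function name, toy matrices, and parameter values are hypothetical.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator  # stand-in for the RBF package of Ref. 82


def interpolate_transition_matrix(params, matrices, query):
    """Estimate an MSM transition matrix at an unsampled control-parameter value.

    params   : (k, d) array of k sampled control-parameter points.
    matrices : (k, n, n) array of row-stochastic transition matrices,
               one estimated from simulations at each sampled point.
    query    : length-d parameter value at which to interpolate.
    """
    params = np.asarray(params, dtype=float)
    mats = np.asarray(matrices, dtype=float)
    k, n, _ = mats.shape

    # One vector-valued RBF interpolant over all n*n matrix entries at once.
    interp = RBFInterpolator(params, mats.reshape(k, n * n))
    T = interp(np.atleast_2d(np.asarray(query, dtype=float))).reshape(n, n)

    # Interpolation can yield small negative entries and rows that do not
    # sum to one; clip and renormalize to restore stochasticity.
    T = np.clip(T, 0.0, None)
    T /= T.sum(axis=1, keepdims=True)
    return T
```

Interpolating entries independently and then renormalizing is one simple way to keep each interpolated matrix a valid stochastic matrix; in practice the interpolant would be built once and queried repeatedly as the optimizer varies the protocol.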

Data Availability Statement

The code used to generate data, construct MSMs, perform optimization, and produce figures for the colloids model is publicly available in the GitHub repository https://github.com/onehalfatsquared/CPfold (Ref. 96).

The code used to generate data, construct MSMs, perform optimization, and produce figures for the cones model is publicly available in the GitHub repository https://github.com/onehalfatsquared/protocolOptMSM (Ref. 97).


Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
