Discovering optimal kinetic pathways for self-assembly using automatic differentiation

Adip Jhaveri; Spencer Loggia; Yian Qian; Margaret E Johnson

doi:10.1073/pnas.2403384121

2024 May 1;121(19):e2403384121. doi: 10.1073/pnas.2403384121

PMCID: PMC11087789 PMID: 38691585

Significance

Self-assembly of protein subunits into macromolecular complexes is ubiquitous and essential for living systems. A common obstacle during self-assembly is the formation of kinetically trapped intermediates that dramatically reduce functional yield. We show here that a large solution space for avoiding kinetic traps exists when rate-based protocols are employed. While evolution may favor symmetry within complexes, diversity in subunits expands this solution space, enhancing design for efficient and robust assembly. We applied automatic differentiation algorithms from deep learning as a powerful tool to search these large parameter spaces and “train” our kinetic models. Our results reveal how high-yield complexes that easily become kinetically trapped can instead be steered to efficiently assemble, exploiting nonequilibrium control of these ubiquitous dynamical systems.

Keywords: macromolecular assembly, kinetic trapping, optimization, dynamical systems

Abstract

Macromolecular complexes are often composed of diverse subunits. The self-assembly of these subunits is inherently nonequilibrium and must avoid kinetic traps to achieve high yield over feasible timescales. We show how the kinetics of self-assembly benefits from diversity in subunits because it generates an expansive parameter space that naturally improves the “expressivity” of self-assembly, much like a deeper neural network. By using automatic differentiation algorithms commonly used in deep learning, we searched the parameter spaces of mass-action kinetic models to identify classes of kinetic protocols that mimic biological solutions for productive self-assembly. Our results reveal how high-yield complexes that easily become kinetically trapped in incomplete intermediates can instead be steered by internal design of rate-constants or external and active control of subunits to efficiently assemble. Internal design of a hierarchy of subunit binding rates generates self-assembly that can robustly avoid kinetic traps for all concentrations and energetics, but it places strict constraints on selection of relative rates. External control via subunit titration is more versatile, avoiding kinetic traps for any system without requiring molecular engineering of binding rates, albeit less efficiently and robustly. We derive theoretical expressions for the timescales of kinetic traps, and we demonstrate our optimization method applies not just for design but inference, extracting intersubunit binding rates from observations of yield-vs.-time for a heterotetramer. Overall, we identify optimal kinetic protocols for self-assembly as a powerful mechanism to achieve efficient and high-yield assembly in synthetic systems whether robustness or ease of “designability” is preferred.

Self-assembled protein complexes evolved to function not only under selection for specific structures, but under selection for specific kinetics. Disrupted kinetics of virion assembly significantly reduces infectivity (1), and ribosome biogenesis fails without extensive kinetic control (2, 3). Living systems are therefore quite adept at assembling highly stable complexes even with many distinct subunits, taking advantage of variability in binding rates (4), time-dependent control via subunit “activation” [e.g. post-translational modification (5) or cofactor binding (6)] or enzymatically driven disassembly of intermediates (7, 8). With inspiration from these biologically evolved mechanisms of self-assembly, we ask what constraints on kinetic parameters are then required to achieve robust and efficient self-assembly? Improving the rational design space for assembly kinetics will significantly enhance efforts to optimize assembly function for synthetic biology and drug delivery (9).

However, a significant challenge for design of self-assembly is the size of the parameter space that controls assembly kinetics and the nonlinear dependence of assembly on diverse components. We model the kinetics of self-assembly for N = 3 to 7 subunit complexes using coupled sets of ordinary differential equations (ODEs) based on mass-action kinetics, and these dynamical systems cannot be solved analytically. Here, we therefore turn to gradient-based optimization using automatic differentiation (AD), which is similar to backpropagation used in machine learning and is a remarkably efficient and flexible approach for high-dimensional parameter optimization (10). Rather than using AD (or backpropagation) to train an artificial neural network (ANN), we use it to directly “train” our kinetic models by identifying model parameters that improve yield and efficiency. Our approach is much faster than developing an ANN to perform optimization as it requires no training data. It also ensures a direct mapping of optimized parameters to the physicochemical properties of constituent subunits needed for inference, and by introducing constraints to our optimization function, we can steer the model toward distinct assembly mechanisms without any retraining required. AD does not require knowledge of analytical gradients, and recent applications to design optimized structures (11) and kinetics of self-assembling crystals and small clusters (12) demonstrate the flexibility of applying AD with diverse simulation methods.

A fundamental physical barrier for macromolecular self-assembly is kinetic trapping, which dramatically delays productive assembly due to the formation of incompatible intermediates. We focus on kinetic traps that occur due to depletion of monomers before intermediates complete growth (13), thus assuming no mis-assembled intermediates as nonspecific interactions (14–16) are eliminated. For single-component assemblies like viruses, the onset of kinetic trapping can be theoretically estimated (17, 18), but the timescales of kinetically trapped systems have not been characterized to our knowledge. We derive timescales here that exhibit universal scaling with subunit free energies and concentrations, which is useful for extracting binding rates from experimental observations of kinetic trapping. Avoiding kinetic traps is a major aim of synthetic assembly and one we expect living systems to have resolved. One approach to avoid kinetic traps is to identify an optimal subunit–subunit interaction strength, ΔG (19, 20). However, for viruses (6, 20) and reversible heteromeric assemblies (21), this optimal energy occurs over a narrow range that shrinks with increasing assembly size (19, 22), and identifying this optimum cannot be done a priori (17, 21). Trapping can be alternatively avoided with diverse subunits by either introducing variations in subunit energetics (22, 23), or by variation in stoichiometries of monomer subunits (14). Subunit diversity also allows for improved control over equilibrium assembly yield by selecting against mis-assembled (14) or morphologically variant complexes (24). Here, we consider variations in rate-constants as a distinct strategy for avoiding traps via both internal (optimize subunit binding rates) or external and active control of subunits (fixed subunit binding rates). 
While recent work quantified how slowing dimerization rates relative to higher-order reactions could produce efficient assembly (21), the parameter space was limited to a single rate-constant. With our results, we systematically explore large parameter spaces of kinetic control with AD, showing how subunit diversity can improve designability of self-assembly without compromising efficiency.

Models and Methods

In our approach, we 1) set up a differential equation-based model of a self-assembly topology with initial concentrations $C_{init}$ and rate-constants $k_{j}$, with each subunit concentration distributed across all equilibrium species as $C_{mono}$. 2) We numerically integrate these ODEs to a predefined time t_stop. To use AD routines in PyTorch, we implemented a differentiable numerical integration algorithm. During integration we store variables and partial derivatives at each step for AD to construct the gradient of our objective function L following the chain rule under any assembly protocol. The objective function is the yield of completed complexes at t_stop plus regularization terms to constrain our rates to physical regimes (SI Appendix, Table S1). We modify the objective function in the last section of Results to instead evaluate the sum of the absolute error between a model-predicted yield-vs.-time and observed yield-vs.-time. 3) We modify our rate parameters $k_{j}$ by following the gradients $\frac{\partial L}{\partial k_{j}}$ that we automatically constructed, keeping all subunit–subunit binding free energies $ΔG$ unchanged. This means that as an association rate is modified, the corresponding dissociation rate must change by the same factor to keep $ΔG$ fixed. 4) We return to step (2) and iterate until we reach the optimal yield by t_stop. We verified our results are not sensitive to our selection of t_stop (SI Appendix, Fig. S1) or the learning rate that controls convergence (SI Appendix, Fig. S2). A detailed description of the algorithm, validation, and the application to inferring rate parameters from observed data is provided in SI Appendix. Our code is open-source via github.com/mjohn218/KineticAssembly_AD.
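The four steps above can be illustrated with a minimal, self-contained sketch. It is not the authors' PyTorch implementation: it swaps in forward-mode AD via dual numbers (rather than reverse-mode backpropagation), uses forward Euler integration, omits the regularization terms, and models a toy homotrimer (M + M ⇌ D, D + M ⇌ T) in reduced units where concentrations are scaled by C_mono and time by 1/(k_tri C_mono). All names (`Dual`, `trimer_yield`, `optimize`) and parameter values are hypothetical choices for illustration.

```python
import math

class Dual:
    """Value plus derivative w.r.t. one parameter (forward-mode AD)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)
    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __rsub__(self, other):
        return Dual.lift(other) - self
    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def dual_exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.dot)

def trimer_yield(log_k_dim, k_tri=1.0, dg_kt=-20.0, c0_rel=1.0e4,
                 t_stop=20.0, dt=0.02):
    """Step 2: integrate the toy ODEs to t_stop and return the trimer yield."""
    k_dim = dual_exp(log_k_dim)                  # the rate being optimized
    kd1 = c0_rel * math.exp(dg_kt)               # K_D / C_mono for one bond
    kd2 = c0_rel * math.exp(2.0 * dg_kt)         # two bonds broken (Eq. 1, m = 2)
    off_dim = k_dim * kd1                        # off-rate tracks on-rate: ΔG fixed
    off_tri = k_tri * kd2
    mono, dim, tri = Dual(1.0), Dual(0.0), Dual(0.0)
    for _ in range(int(round(t_stop / dt))):
        r_dim = k_dim * mono * mono - off_dim * dim   # net dimerization flux
        r_tri = k_tri * dim * mono - off_tri * tri    # net trimerization flux
        mono = mono + dt * (-2.0 * r_dim - r_tri)
        dim = dim + dt * (r_dim - r_tri)
        tri = tri + dt * r_tri
    return 3.0 * tri                              # fraction of subunits in trimers

def optimize(log_k=0.0, lr=2.0, steps=40):
    """Steps 3 and 4: gradient ascent on yield(t_stop) w.r.t. ln(k_dim)."""
    for _ in range(steps):
        y = trimer_yield(Dual(log_k, 1.0))        # seed d/d(ln k_dim) = 1
        log_k += lr * y.dot
    return log_k, trimer_yield(Dual(log_k)).val

y_equal = trimer_yield(Dual(0.0)).val             # k_dim = k_tri (equal rates)
log_k_opt, y_opt = optimize()
print("equal-rate yield:", round(y_equal, 3))
print("optimized k_tri/k_dim:", round(math.exp(-log_k_opt), 2),
      "yield:", round(y_opt, 3))
```

In this toy setting the optimizer lowers k_dim relative to k_tri, in the same direction as the rate-growth result reported in Results. Optimizing ln(k_dim) rather than k_dim keeps the rate positive without an explicit constraint, which is a design choice of this sketch.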

Our three classes of biologically motivated protocols have multiple rate-constants free for optimization, but vary in their ease of molecular designability or external control (Fig. 1). The internal protocols (labeled A) optimize pairwise binding rates, thus reflecting optimal internal design of the molecular interactions that drive assembly. In our “rate growth” model (A1), we allow association rates to accelerate as assemblies grow, which biophysically must reflect cooperativity (25), conformational changes (26), or active modification of interfaces (27). All dimerization reactions occur with the same rate, k₂, all trimerization reactions with the same rate, k₃, etc., so this model can apply to both homo- and heterosubunit assembly. With N subunits there are N − 1 free rates to optimize. In our “diversification” model (A2), we allow independent binding rates between distinct dimers (e.g. $k_{12} \neq k_{23}$), reflecting heterogeneity in binding interfaces. We constrain all higher-order steps by the dimer rates of participating interfaces, implying that no molecular modifications are necessary as intermediates grow (Fig. 1), giving N(N − 1)/2 dimerization rates to optimize. The external protocols (labeled B and C) keep pairwise binding rates fixed, and instead introduce external factors to control assembly kinetics. In protocol B, we introduce external control via titration of individual subunits with distinct titration rates for each type of subunit, changing the initial conditions to $C_{init} = 0$. Titration can be interpreted as controlling the physical appearance of subunits (e.g. via in vivo translation or in vitro titration), or as a rate of activating subunits into assembly-competent conformations. We contrast a multi-rate model with N distinct rates for all subunits, α₁, α₂, …, α_N, vs. a single-rate model with one rate α for all subunits, as that model applies to homo-subunit assemblies. Subunit titration is stopped once it reaches the target concentration $C_{mono}$.
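To make the size of each search space concrete, the short sketch below counts the free parameters quoted above for a fully connected complex of N subunits. The function and dictionary key names are illustrative only and are not taken from the authors' code.

```python
from itertools import combinations

def free_parameters(n_subunits: int) -> dict:
    """Count optimizable rates per protocol for a fully connected N-mer."""
    # Every distinct pair of subunits shares one dimerization interface.
    pairs = list(combinations(range(1, n_subunits + 1), 2))
    return {
        "rate_growth": n_subunits - 1,   # k_2 ... k_N, one rate per assembly size
        "diversification": len(pairs),   # N(N-1)/2 pairwise dimerization rates
        "titration_multi": n_subunits,   # alpha_1 ... alpha_N
        "titration_single": 1,           # one shared alpha
    }

for n in range(3, 8):
    print(n, free_parameters(n))
```

The diversification count grows quadratically with N while the other protocols grow at most linearly, which is why subunit diversity enlarges the design space most strongly for that protocol.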

Fig. 1. Description of kinetic protocols optimized to avoid trapping and produce efficient and robust assembly. In the *Left* column, we illustrate the three kinetic protocols that we contrast to prevent (A1, A2, and B) or correct (C) kinetic trapping. In A1 and A2, we optimized pairwise binding rates between subunits (e.g. $k_{dim}$ or $k_{12}$). In B and C, pairwise rates are fixed and equal, but in B, we titrate in subunits at distinct rates ($α_{1}, α_{2}, \dots$), and in C, we actively disassemble intermediates with an enzyme to recycle monomer subunits. On the *Right*, we show a graphical representation of trimer assembly and the time-dependent yield with two different sets of rates. For the overstabilized system simulated here (ΔG < ΔG_opt), equal binding rates (k_dim = k_tri) result in kinetic trapping (as shown by a long plateau) as monomers are completely used up in both dimers and trimers. The derivative of assembly yield with respect to ln(t) identifies two maxima that separate entry τ₁ and exit τ₂ from the plateau region, which we use to define the trapping factor TF = τ₂/τ₁. In the optimized rate-growth model (A1), we slowed down the dimer rate relative to the trimer rate (k_dim < k_tri) to efficiently achieve high yield in a short time. The optimized model eliminates trapping, which is denoted by TF = 1 (τ₁ = τ₂), and shortens the time taken to reach 95% yield, τ₉₅ (black dashed vertical line).

In protocol C, we recycle trapped intermediates via their irreversible dissociation by an enzyme, returning monomers into the pool ( $S_{1} S_{2} + E ⇌ E S_{1} S_{2} \to E + S_{1} + S_{2}$ ). We evaluate enzyme intervention via a) Single substrate, where the enzyme reacts with only one intermediate and b) Multisubstrate, where the enzyme reacts with multiple intermediates but only one of each size (2 ≤ intermediate < N). We optimize the enzyme concentration [E]₀ and association rate of substrate binding (k₁) and catalysis rate (k_cat) for all substrates while keeping the ΔG fixed, giving 2 parameters. Protocols B and C both provide time-dependent quality control rather than preoptimized design. These solutions are critical in cell biology, as in vitro reconstitution has shown that simply mixing subunits at time zero often fails for multicomponent self-assembly (28–30).

We contrast two limiting assembly topologies, the fully connected graph where all subunits contact one another, and the ring topology where a single cycle incorporates all subunits. We exclude linear topologies as they have zero cycles and do not become trapped with equal stoichiometries, and we stop at size N = 7 to focus on more compact structures where the fully connected topology offers a reasonable limit on subunit interactions. All intermediates can combine if sterically possible (i.e., dimer + dimer ⇌ tetramer), except in the rate-growth model, where we prohibit these nonmonomer growth pathways. To focus on rate variation, we use equal concentrations of subunits $C_{mono}$ and equal free energies $ΔG$ for all pairwise interactions. We assume no cooperativity, and thus the free energies sum across bonds with a stability for m bonds given by

$$K_{D,m} = c_{0} \exp\left(\frac{m ΔG}{k_{B} T}\right) = K_{D} \exp\left(\frac{(m - 1) ΔG}{k_{B} T}\right), \tag{1}$$

where $K_{D} = c_{0} \exp(\frac{ΔG}{k_{B} T})$ is the binding affinity between two subunits, $k_{B}$ is the Boltzmann constant, $T$ is the temperature, $c_{0}$ is the standard state concentration (1 M), and $ΔG = G_{bound} - G_{unb}$ is the free energy of binding. We use $K_{D}$ and $ΔG$ interchangeably given they are a log transform apart. We bound $k_{f}$ due to the limits of diffusion and the measured kinetics of protein–protein association at 10 or 1 μM⁻¹s⁻¹ (31). The formation of additional bonds during association must therefore slow the off-rates, which become

$$k_{-,m} = k_{f} K_{D,m} = k_{-} \exp\left(\frac{(m - 1) ΔG}{k_{B} T}\right), \tag{2}$$

where $k_{-}$ is the off-rate for a single bond and $K_{D} = \frac{k_{-}}{k_{f}}$ . We reproduce the assembly kinetics using structure-resolved reaction–diffusion simulations (32) that can readily scale up to large N assemblies (33), and where association involving two bonds (i.e., dimer+monomer) proceeds in two steps unlike the 1-step model used for the deterministic model (SI Appendix, Table S2). By dividing those association rates by two, we recover essentially identical kinetics (SI Appendix, Fig. S3).
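As a worked illustration of Eqs. 1 and 2, the sketch below evaluates the multi-bond dissociation constant and off-rate for a few bond counts m. Free energies are given in units of k_BT and concentrations in µM; the function names and the example values (k_f = 1 µM⁻¹s⁻¹, ΔG = −20 k_BT) are chosen here for illustration.

```python
import math

C0_UM = 1.0e6  # standard-state concentration c0 = 1 M, expressed in µM

def kd_multibond(delta_g_kt: float, m: int) -> float:
    """Eq. 1: K_{D,m} = c0 * exp(m * ΔG / kBT), returned in µM."""
    return C0_UM * math.exp(m * delta_g_kt)

def off_rate_multibond(kf: float, delta_g_kt: float, m: int) -> float:
    """Eq. 2: k_{-,m} = kf * K_{D,m}, in 1/s when kf is in 1/(µM s)."""
    return kf * kd_multibond(delta_g_kt, m)

kf, delta_g = 1.0, -20.0
for m in (1, 2, 3):
    print(f"m={m}: K_D,m = {kd_multibond(delta_g, m):.3e} µM, "
          f"k_-,m = {off_rate_multibond(kf, delta_g, m):.3e} 1/s")
```

Each additional bond multiplies the off-rate by exp(ΔG/k_BT), so for strongly negative ΔG the lifetime of multiply bonded intermediates grows very rapidly with m.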

Results

With Equal Association Rates, Kinetic Trapping Emerges for All N ≥ 3 Systems with High Yield.

Because trapping is a kinetic phenomenon, it depends not only on the parameters that control equilibrium yield ($N$ subunits, topology, $C_{mono}$, and $ΔG$), but on initial conditions and rates. In this section, we make all association rates equal, and all subunits initialized as monomers in the bulk, $C_{init} = C_{mono}$. Kinetic trapping will occur for all $N \geq 3$, as dimers lack intermediates and thus cannot become trapped in our model. A hallmark of kinetic trapping is a plateau in the yield-vs.-time (e.g., trimers(t)/trimers_MAX) that is eventually escaped to reach equilibrium yield (Fig. 1). We quantify trapping in two ways: 1) by the time needed to reach 95% yield, $τ_{95}$, and 2) by a trapping factor that we have defined, $TF = τ_{2} / τ_{1}$. The two timescales $τ_{1}$ and $τ_{2}$ delineate the entry into and exit from the trapped or plateau regime (Fig. 1). When there is no trapping, $τ_{1} = τ_{2}$ and TF = 1.
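The two trapping measures defined above can be extracted from any sampled yield-vs.-time curve. The following sketch is one possible way to do so, assuming log-spaced time samples and a normalized yield; the peak-detection threshold `min_slope`, the choice of first and last peak as τ₁ and τ₂, and the synthetic two-step curve are assumptions of this example rather than the authors' procedure.

```python
import math

def tau95(times, yields, threshold=0.95):
    """First sampled time at which the normalized yield reaches the threshold."""
    for t, y in zip(times, yields):
        if y >= threshold:
            return t
    return math.inf

def trapping_factor_from_curve(times, yields, min_slope=0.05):
    """TF = tau2 / tau1 from the first and last maxima of d(yield)/d(ln t)."""
    ln_t = [math.log(t) for t in times]
    # Central differences; slope[j] belongs to times[j + 1].
    slope = [(yields[i + 1] - yields[i - 1]) / (ln_t[i + 1] - ln_t[i - 1])
             for i in range(1, len(times) - 1)]
    peaks = [times[j + 1] for j in range(1, len(slope) - 1)
             if slope[j] > slope[j - 1] and slope[j] >= slope[j + 1]
             and slope[j] > min_slope]
    # A single maximum means no plateau, i.e. tau1 = tau2 and TF = 1.
    return peaks[-1] / peaks[0] if len(peaks) >= 2 else 1.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Synthetic curves on a log-spaced grid from 1e-3 to 1e6 (arbitrary time units).
times = [10 ** (-3 + 9 * i / 900) for i in range(901)]
# Trapped: 60% of the yield forms near t = 1, the remainder near t = 1e4.
trapped = [0.6 * sigmoid(math.log(t) / 0.5)
           + 0.4 * sigmoid(math.log(t / 1e4) / 0.5) for t in times]
# Untrapped: a single rise near t = 1.
untrapped = [sigmoid(math.log(t) / 0.5) for t in times]

print("trapped:   TF =", trapping_factor_from_curve(times, trapped),
      " tau95 =", tau95(times, trapped))
print("untrapped: TF =", trapping_factor_from_curve(times, untrapped),
      " tau95 =", tau95(times, untrapped))
```

Differentiating with respect to ln(t) rather than t puts the fast entry into the plateau and the much slower exit on a comparable footing, so both appear as distinct maxima.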

From simulations, we see that trapping sets in once the equilibrium yield of the system exceeds ~99%, as measured by an increase in $τ_{95}$ or when the $T F > 1$ (Fig. 2A and SI Appendix, Fig. S4). With such high yield, the system becomes starved of monomers before intermediates can complete assembly (Fig. 1), whereas with lower yield a pool of monomers is available and trapping disappears (SI Appendix, Fig. S5). Without available monomers, the target complex grows only as existing intermediates dissociate, producing the plateau and corresponding delay between $τ_{1}$ and $τ_{2}$ . With increasing stabilization of pairwise contacts (lower ΔG) the efficiency of self-assembly is dramatically impaired (Fig. 2A).

Kinetic Trapping Is Increasingly Problematic as N Increases.

As more subunits are added to an assembly, trapping still sets in at ~99% yield, which corresponds to weaker pairwise free-energy ΔG (Fig. 2B), or lower concentration (SI Appendix, Fig. S4). However, with larger N it is increasingly challenging to both avoid traps and achieve high yield, as the regime of ΔG values where both occur shrinks significantly (Fig. 2B). Reduced efficiency is not the only problem with trapping; the yield in the trapped state also drops significantly as N increases, from 67.39% for the trimer (see SI Appendix for derivation) to ~55% for the tetramer and only ~20% for the 7-mer (SI Appendix, Fig. S6). We note that while accelerating all rates uniformly will shorten $τ_{95}$ , trapping persists and the TF is unchanged; faster rates speed-up the timescales to both enter $τ_{1}$ and exit $τ_{2}$ the trapped regime, rather than eliminating it (SI Appendix, Fig. S7), which we prove in SI Appendix.

Timescales of Trapped Systems Predict a Nearly Universal Dependence on N and ΔG.

For an assembly system with equal rates and initialized monomers, our results show it is experimentally easy to drive the system into kinetic traps by increasing the concentration (SI Appendix, Fig. S4) or stabilizing subunit interactions [e.g., via salt or pH (34)] (Fig. 2). Timescales of kinetically trapped systems have not been previously quantified to our knowledge, and to make use of experimental observations of trapping, we here derive approximate expressions for the TF and $τ_{95}$. We find that the TF has a universal power-law dependence on N, ΔG, and $C_{mono}$ (Fig. 2C)

$$TF = \frac{τ_{2}}{τ_{1}} = a_{1} \left(\frac{K_{D} \exp\left(\frac{(m - 1) ΔG}{k_{B} T}\right)}{C_{mono}}\right)^{-1}. \tag{3a}$$

The entry time τ₁ is driven by the speed of association $k_{f} C_{mono}$, causing an increasing TF with concentration as the exit time τ₂ is unaffected (SI Appendix, Fig. S7). The exit time τ₂ is dominated by the lifetime of the most stable intermediate, which has a dissociation rate $k_{-} \exp(\frac{(m - 1) ΔG}{k_{B} T})$ (Eq. 2), with m the number of bonds broken for a monomer to dissociate from this intermediate (e.g., m = N − 2). The positive dimensionless constant $a_{1}$ is relatively well-predicted by the free energy when the efficiency is fastest, which we define as $ΔG_{opt}$ (SI Appendix, Table S3 and Fig. 2A), and which varies with number of subunits N and $C_{mono}$, giving

$$a_{1} = \frac{c_{0}}{C_{mono}} \exp\left(\frac{m ΔG_{opt}(N, C_{mono})}{k_{B} T}\right), \tag{3b}$$

or together, $TF = \exp(\frac{m (ΔG_{opt}(N, C_{mono}) - ΔG)}{k_{B} T})$. From this equation, we see the transition to $TF > 1$ when ΔG < ΔG_opt. To define an expression for the efficiency τ₉₅, we find that it largely follows the exit time τ₂ once the system becomes strongly trapped, resulting in the asymptotic scaling relationship (SI Appendix, Fig. S5)

$$τ_{95} = a_{2} \left(k_{-} \exp\left(\frac{(m - 1) ΔG}{k_{B} T}\right)\right)^{-1}, \tag{4}$$

where $a_{2}$ is a dimensionless constant that depends on N subunits and is independent of $C_{mono}$. Surprisingly, in this strongly trapped regime, the efficiency of assembly cannot be improved by altering initial subunit concentrations, unlike for the weakly or nontrapped systems.
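The scaling relations in Eqs. 3 and 4 are simple enough to evaluate directly. The sketch below, with free energies in units of k_BT and concentrations in µM, checks that inserting the prefactor of Eq. 3b into Eq. 3a reproduces the combined expression for TF, and evaluates how the exponential factor in Eq. 4 grows with N for the fully connected topology (m = N − 2). Function names are illustrative, $a_{2}$ is treated as a user-supplied fitted constant, and the example values (N = 5, ΔG = −20 k_BT, ΔG_opt = −7 k_BT, C_mono = 100 µM) are used only for demonstration.

```python
import math

C0_UM = 1.0e6  # standard-state concentration c0 = 1 M, in µM

def trapping_factor(dg_kt, dg_opt_kt, m):
    """Combined Eq. 3: TF = exp(m (ΔG_opt - ΔG) / kBT), valid for ΔG <= ΔG_opt."""
    return math.exp(m * (dg_opt_kt - dg_kt))

def trapping_factor_from_a1(dg_kt, dg_opt_kt, m, c_mono_um):
    """Same quantity assembled from Eq. 3a with the prefactor a1 of Eq. 3b."""
    a1 = (C0_UM / c_mono_um) * math.exp(m * dg_opt_kt)
    kd = C0_UM * math.exp(dg_kt)                      # single-bond K_D
    return a1 * c_mono_um / (kd * math.exp((m - 1) * dg_kt))

def tau95_trapped(a2, kf, dg_kt, m):
    """Eq. 4: tau95 = a2 / k_{-,m}, with k_- = kf * K_D for a single bond."""
    k_off_single = kf * C0_UM * math.exp(dg_kt)
    return a2 / (k_off_single * math.exp((m - 1) * dg_kt))

# N = 5, fully connected: a monomer leaving the 4-mer breaks m = N - 2 = 3 bonds.
print("TF (combined):", trapping_factor(-20.0, -7.0, 3))
print("TF (3a + 3b): ", trapping_factor_from_a1(-20.0, -7.0, 3, 100.0))

# Growth of the exponential factor in Eq. 4 from N = 3 (m = 1) to N = 7 (m = 5).
ratio = tau95_trapped(1.0, 1.0, -20.0, 5) / tau95_trapped(1.0, 1.0, -20.0, 1)
print("tau95 ratio, N = 7 vs. N = 3 at fixed a2:", ratio)
```

At ΔG = −20 k_BT the last ratio equals exp(80), which is the roughly 10³⁴-fold increase quoted for the equal-rate model in the caption of Fig. 3.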

These equations are useful for several reasons. They quantify how larger assemblies with more bonds per subunit will have exponentially worse trapping at identical ΔG values (Fig. 2B), due to slower dissociation times of higher intermediates. This also means that topology of the assembly matters: For ring topologies, all subunits have only two binding partners and therefore all intermediates have identical lifetimes, with m = 1. All rings therefore have a TF scaling with $\frac{K_{D}}{C_{mono}}$ that is now independent of N (SI Appendix, Fig. S8). Rings are therefore much better relative to fully connected topologies when it comes to kinetic trapping. The trade-off, however, is that larger ring structures need increasingly more stable pairwise $ΔG$ values to reach high yield, and their efficiency at $ΔG_{opt}$ is significantly worse due to the ease of dissociation for all intermediates (SI Appendix, Fig. S9). Last, Eq. 3 provides a relatively simple formula to infer pairwise free energies if experiments can observe timescales of kinetic trapping at multiple $C_{mono}$.

Kinetic Traps Can Be Robustly Avoided by Either Varying Internal Binding Rates, or by External Control of Subunits.

Our above results showing kinetic trapping for highly stable systems depend on our model assumptions of equal association rates for all binding, initial bulk concentrations of monomers, and no external activity on intermediates. Here we show that all of these kinetically trapped outcomes can be avoided or prevented without modifying pairwise $ΔG$, $C_{mono}$, or yield (Fig. 3). Using AD (Models and Methods), we find optimal parameter sets that efficiently achieve high yield for both internal and external (Fig. 1) protocols. To globally compare all protocols, we measure normalized efficiency via $τ_{95}^{*} = τ_{95} k_{f}^{MAX} C_{mono}$, where $k_{f}^{MAX}$ is the fastest of all association rates. The orders-of-magnitude improvement in efficiency we find for these protocols (Fig. 3A) dramatically expands regimes where high-yield and efficient assembly are met for all N.

Fig. 3. Optimal kinetic protocols found through AD are not only efficient but also robust to perturbations in ΔG and C_mono. (A) Optimal (normalized) timescales compared for all protocols show orders-of-magnitude speed-up with respect to assemblies with the unoptimized equal rate scenario (*Inset*) with k_f = 1 µM⁻¹ s⁻¹, ΔG = −20 k_BT, C_mono = 100 µM. With equal rates prior to optimization (*Inset*), the timescale $τ_{95}^{*}$ grows exponentially with N as predicted by Eq. 4, $τ_{95}^{*} = a_{2} \frac{C_{mono}}{K_{D}} \exp(\frac{-(N - 3) ΔG}{k_{B} T})$. Although $a_{2}$ decreases by ~10⁴ from N = 3 to 7, this is small compared to the exponential factor, which increases by 10³⁴. For ease of comparison, an approximate fit of $τ_{95}^{*}$ to a power law $N^{γ}$ gives $γ \approx 90$. Internal kinetic protocols (blues: rate growth and diversification) are most efficient. They are comparable to the efficiency at ΔG_opt (brown) where trapping has not set in and rates are equal; the value of ΔG_opt and corresponding $τ_{95}^{*}$ for this model are identified for each N numerically (*SI Appendix*, Table S3). External protocols (greens: titration and enzyme recycling) are shown with their most efficient (multi-rate) solutions. All results are for the fully connected topology. (B) Robustness of kinetic protocols was evaluated by efficiency of assembly under perturbations to ΔG and C_mono from optimized conditions. For ΔG, off-rates are slowed and on-rates kept at the optimal values for each protocol. Results are shown for N = 5 assembly. For all kinetic protocols, ΔG = −20 k_BT at the “optimum” point. For the ΔG_opt condition, its optimum point is where ΔG = −7 k_BT and efficiency is best. C_mono for all the protocols at the optimum point is 100 µM.

Internal Design of Binding Rates Is Most Efficient and Robust.

The most efficient solutions we find are the internal protocols that only modify binding rates between subunits and/or intermediates, keeping $ΔG$ fixed. Their efficiency is comparable to the $τ_{95}^{*}$ found at the model optimized to the exact value of $ΔG_{opt}$, where trapping does not occur even for equal rates. The great advantage of our kinetically designed protocols compared to this equal-rate $ΔG_{opt}$ model, however, is that kinetic protocols do not rely on reversibility and unbinding to prevent trapping (13, 35), but work even in the limit of irreversible binding (SI Appendix, Fig. S10). This imparts robustness to our optimal models such that they maintain equally fast $τ_{95}^{*}$ under perturbations to both $ΔG$ (via $k_{-}$) and $C_{mono}$ that might occur due to mutations in interfaces or altered gene expression, respectively. The same perturbations to the equal-rates $ΔG_{opt}$ model immediately result in kinetic trapping (Fig. 3B).

Although we find the external protocols of subunit titration (B) and enzymatic recycling (C) are both less efficient and less robust to perturbations in concentration (Fig. 3 A and B), they have the advantage that they do not require the evolution or molecular engineering of association rate-constants. In external protocols, we kept association rates fixed at equal values, where $Δ G$ was stable and led to trapping under bulk monomer initial conditions. The success of these external protocols implies that any biologically evolved system can be steered to assemble without kinetic trapping by these time-dependent (B) and active measures (C), and the efficiency will be robust to increased stabilization of $Δ G$ (Fig. 3B).

Internal Protocols to Avoid Traps Place Strict Design Constraints on Hierarchy of Association Rates.

Our protocols to design association rates are maximally robust and efficient but do require interface binding selection/design to create a hierarchy of rates. In our rate-growth pathway (Fig. 1A), optimal efficiency is achieved by slowing dimerization compared to higher-order growth, e.g., $\frac{k_{tri}}{k_{dim}} > 1$ (Fig. 4A). Slower dimerization ensures that monomers are conserved long enough to be incorporated into a fully formed complex. For larger N, fully formed assemblies take longer to form, requiring more extreme slow-downs in dimerization, or larger ratios of $\frac{k_{tri}}{k_{dim}}$ (Fig. 4B). Somewhat surprisingly, a hierarchy of rates for further growth steps does not improve $τ_{95}^{*}$: The highest efficiency is when $k_{tri} = k_{tetra} = k_{penta} = \dots$. This result is thus comparable to recent work which only allowed a single rate to change (21), explicitly preventing any additional hierarchy. However, this solution requires the largest value for $\frac{k_{tri}}{k_{dim}}$, and significantly accelerating trimerization relative to dimerization would require, at a minimum, conformational changes to existing interfaces in monomer vs. dimer form (25, 26). The ratio $\frac{k_{tri}}{k_{dim}}$ can be significantly reduced if we introduce an additional hierarchy where $\frac{k_{tetra}}{k_{tri}} = 2$, for example, providing more design flexibility for growth rates with a <twofold loss in efficiency (SI Appendix, Fig. S11). Topology also impacts the design constraints on $\frac{k_{tri}}{k_{dim}}$, with a lower separation required for the ring topology, albeit with minimal change in efficiency (Fig. 4C).

Fig. 4. Kinetic protocols that exploit internal design of subunit binding rates require a hierarchy of rates that stratifies as N subunits increase. (A) Illustration of the optimized rates for the rate-growth model. Dimerization is slowed relative to the rate of trimerization and subsequent steps. (B) The optimal ratios of $\frac{k_{tri}}{k_{dim}} > 1$ increase with N and are higher for the fully connected topology. (C) In the rate-growth model, efficiency is similar for both the fully connected (green) and ring (blue) topology. A power-law fit of $τ_{95}^{*} \propto N^{γ}$ gives $γ \approx 0.7$. (D) The optimal $\frac{k_{tri}}{k_{dim}}$ ratio initially increases with more stable (*Lower*) K_D, but reaches a plateau as the system approaches irreversibility ($K_{D} \to 0$) for all N = 3 to 7. (E) Illustration of the optimal rates in a diversification model. If all interfaces on one subunit (red subunit) dimerize faster (k_fast: black dashed) than all other interfaces at k_slow (gold dashed), traps are avoided. The number of interfaces that can be fast (#k_fast) can be chosen to ease designability. (F) Optimal ratios of k_fast/k_slow increase with N. They also must increase if fewer interfaces can be accelerated, as can be seen if #k_fast = N − 2 (green) or #k_fast = 1 (light green) as compared to when all interfaces on one subunit are accelerated (gold). (G) When N − 1 interfaces are accelerated (gold), we can achieve similar efficiency to the rate-growth, with $γ \approx 1$. With N − 2 (green) or 1 interface (light green) optimized, the efficiency is worse but still a significant improvement over unoptimized models.

Diverse Subunits Expand and Simplify the Design Space of Internal Protocols.

A limitation of this rate-growth model is that it requires conformational or post-translational modifications to achieve rate hierarchy during elongation. An alternative rate hierarchy can be achieved by diversification between dimerization reactions. This model thus requires diversity in subunits, as homomers are restricted to a single dimerization rate and thus can only take advantage of the rate-growth model. When some dimers form slowly relative to others, this ensures that a subset of monomers remain in the system long enough to complete assembly of fast-forming intermediates. Just as in the rate-growth, the separation between fast and slow dimerization rates must increase with N to avoid traps (Fig. 4F). Importantly, all higher-order assembly steps are constrained by the constituent dimerization rates, meaning that no conformational changes or cooperativity are required as assemblies grow. This provides a simpler design goal of selecting for dimerization rates between distinct subunits which can be assumed relatively rigid throughout the assembly process (Fig. 4E).

Because our optimal solution requires that all the interfaces on one of the subunits must have fast binding (#k_fast = N − 1), we considered more “designable” solutions where a smaller subset of interfaces (#k_fast < N − 1), even #k_fast = 1, must be maximally accelerated (SI Appendix, Fig. S12). These solutions still provide a vast improvement over uniform rate efficiency (Fig. 4G). The trade-off is that as fewer interfaces are accelerated, the rate separation must be more extreme to avoid trapping (Fig. 4F). Diverse subunits again help, with a hierarchy of rates (not just k_fast and k_slow) improving efficiency (SI Appendix, Fig. S12).

Robustness of Internal Protocols Stems from Their Ultimate Insensitivity to Dissociation Times and Concentration.

The optimal rate ratios shown for both internal protocols ($\frac{k_{tri}}{k_{dim}}$ and $\frac{k_{fast}}{k_{slow}}$) (Fig. 4) were all evaluated at ΔG = −20 k_BT, which is strongly trapped under equal rates. In the weakly trapped regime, as $ΔG \to ΔG_{opt}$, we note both ratios can shrink to improve efficiency, eventually reaching $\frac{k_{tri}}{k_{dim}} = 1$ when $ΔG = ΔG_{opt}$, for all N. A key finding is that in the opposite direction, however, as $ΔG \to -\infty$, the optimal rate ratio does not keep rising but reaches a plateau [which we derive for the trimer system (SI Appendix)], and this is what provides the robustness (Fig. 4D, shown for the rate-growth protocol). When using the ratio at the plateau, the assembly kinetics are completely independent of dissociation times (set by $k_{-}$), and therefore any perturbations or mutations that slow dissociation and stabilize ΔG will not impact $τ_{95}^{*}$. This optimal ratio is also independent of concentration (SI Appendix), so while higher concentrations assemble faster, the normalized $τ_{95}^{*}$ remains unchanged under perturbations (Fig. 3B).
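To make the link between stability and dissociation time concrete, the short sketch below converts a pairwise ΔG into an off-rate at fixed on-rate using the standard relation $K_{D} = c_{0} e^{ΔG/k_{B}T}$ and $k_{-} = k_{f} K_{D}$. The 1 M standard-state concentration `C0_UM` is our assumption and may differ from the convention used in the SI Appendix; the function name is hypothetical.

```python
import math

C0_UM = 1.0e6  # assumed standard-state concentration: 1 M expressed in uM


def dissociation_rate(delta_g_kbt, k_f):
    """Off-rate (1/s) from a pairwise free energy (units of k_BT) and on-rate (1/(uM s))."""
    k_d = C0_UM * math.exp(delta_g_kbt)  # dissociation constant K_D in uM
    return k_f * k_d


# At fixed k_f, each additional -5 k_BT of stability lengthens the bond
# lifetime by a factor of exp(5), roughly 150-fold.
for dg in (-10, -15, -20, -25):
    k_off = dissociation_rate(dg, k_f=1.0)
    print(f"dG = {dg} k_BT: k_off = {k_off:.3e} 1/s, lifetime = {1.0 / k_off:.3e} s")
```

With these assumed conventions, stabilizing ΔG at fixed $k_{f}$ changes only $k_{-}$, which is the perturbation the plateau ratio is insensitive to.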

External Protocols Can Exploit Diversity of Subunits to Hierarchically Accelerate Assembly.

For our titration protocol (B), we only optimized titration rates (α₁, α₂, α₃, …) to control subunit concentration in time, keeping all other parameters fixed (protocol B in Fig. 1). So here $C_{init} = 0$ for all subunits, but the total concentration reached, $C_{mono}$, is the same as before, and therefore so is the equilibrium. We first imposed an equal titration rate, α, for all subunits. We find an optimal titration rate α* exists for all systems; it cannot be too fast or the system will still get trapped (SI Appendix, Fig. S13), and if it is too slow it unnecessarily delays assembly. We then ask, can we improve assembly efficiency with a hierarchy of rates? In this multi-rate scheme, two subunits can be present in the bulk (α₁* = ∞, α₂* = ∞), with the remaining components titrated over a hierarchy of rates that are all faster than $α^{*}$ from the single-rate scheme (Fig. 5A), thus improving efficiency (Fig. 5B). This efficiency gain increases with N, underlining how additional parameters improve control over the assembly process. We note that while complexes with diverse subunits have access to more assembly pathways and are thus more designable, they are not necessarily more efficient than an equivalent homomer. Homomers have intrinsically faster assembly kinetics because every subunit copy interacts with every other copy present.
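The single-rate titration idea can be sketched with a reduced model of an equal-rate heterotrimer. This is a simplified illustration under our own assumptions, not the model used for the figures: binding is irreversible, units are dimensionless, all three subunits are titrated at the same rate so that by symmetry one monomer variable `m` and one dimer variable `d` suffice, and integration is forward Euler. The function name `titrated_trimer_yield` is hypothetical.

```python
def titrated_trimer_yield(alpha, k=1.0, c_mono=1.0, t_end=1000.0, dt=0.01):
    """Yield of an irreversible, equal-rate heterotrimer when each subunit is
    titrated in at rate alpha until its total concentration reaches c_mono."""
    m = d = trimer = 0.0  # each monomer species, each dimer species, full trimer
    added = 0.0           # concentration of each subunit delivered so far
    for _ in range(round(t_end / dt)):
        # Deliver subunits at rate alpha, stopping once c_mono has been added.
        inflow = max(0.0, min(alpha * dt, c_mono - added))
        added += inflow
        dimerize = k * m * m  # flux into each of the three dimer species
        complete = k * d * m  # flux of each dimer + monomer completion step
        m += inflow + dt * (-2.0 * dimerize - complete)
        d += dt * (dimerize - complete)
        trimer += dt * 3.0 * complete
    return trimer / c_mono


fast = titrated_trimer_yield(alpha=10.0)  # nearly instantaneous delivery
slow = titrated_trimer_yield(alpha=0.01)  # delivery spread over 100 time units
print(f"fast titration yield = {fast:.3f}")
print(f"slow titration yield = {slow:.3f}")
```

In this irreversible sketch slower delivery always raises the final yield, because incoming monomers mostly find an existing dimer to complete; the cost is the delivery time `c_mono / alpha`, which is the trade-off that gives rise to an optimal α* when efficiency is the objective.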

Fig. 5. External protocols achieve higher efficiency when generating rate hierarchies, with titration benefiting from diversity of subunits. (A) The optimal titration rates $α_{i}^{*}$ for all schemes must slow as N increases. If only a single rate is used, $α^{*}$ (blue curve), titration is slower. When each subunit is titrated at a distinct rate, the optimal titration rates follow a hierarchy [from α_fast (black) to α_slow (gray)], which allows for faster titration. (*Inset*) When only a single rate is optimized, the optimal titration rate ($α^{*}$) can speed up with increased C_mono, as C_mono² (Eq. 5, dashed black line). (B) With faster titration possible for the multi-rate model (black line), efficiency is improved over single-rate (blue); the multi-rate can only be implemented if subunits are distinct. Fully connected topologies. (C) With enzymatic recycling, for efficient assembly, there is an optimal rate to dissociate the encounter complex (k_cat) that decreases with assembly size for both modes of recycling, similar to $α^{*}$. The trimer is an exception, with efficiency increasing monotonically with k_cat. We show the value that first reaches 95% yield. (D) Efficiency is improved with multiple substrates, as intermediates can be disassembled faster, leading to shorter assembly times. For all systems, ΔG = −20 k_BT, C_mono = 100 µM, with equal rates k_f = 1 µM⁻¹s⁻¹.

Our other external protocol C of enzymatic recycling shows a similar benefit in efficiency when we expand from one substrate to a family of substrates. Here we do not try to prevent trapping, but instead perform quality control by an energy-consuming enzyme that recycles subunits from incomplete intermediates (the substrate) back into monomer form. If the enzyme can only dissociate a single substrate, we find it must be an N − 1 sized intermediate to eliminate traps, but even this approach fails for $N \geq 7$ due to the combinatorial growth in alternative intermediates. This recycling method is then rescued by allowing the enzyme to react with multiple substrates, driving more efficient assembly overall. Like the titration protocol, where α* cannot be too fast, assembly efficiency here is limited by k_cat, which also cannot be too fast or recycled monomers will immediately reassemble. With multiple substrates, the optimal k_cat values can be faster than the single-substrate value while avoiding trapping (Fig. 5C), meaning faster release of monomers and more efficient assembly (Fig. 5D). The rate of enzyme binding to the substrate, k₁[E₀], does not directly predict efficiency like k_cat, but it is constrained to react faster than the competing subunit (SI Appendix, Fig. S14).

External Protocols Are Sensitive to Monomer Concentrations, But Similarly Robust to Dissociation Times.

To provide better mechanistic insight into the optimal titration protocol and explain why it is less robust to changes in concentration, we interrogate the dependence of $α^{*}$ on $C_{mono}$, $k_{f}$, and ΔG. Conceptually, $α^{*}$ must be slow enough that each titrated monomer has sufficient delay time to add to an existing intermediate before new monomers enter, and $α^{*}$ is thus constrained by the speed of assembly. Faster assembly therefore supports faster titration, and we can show this means that for a single titration rate (SI Appendix and ref. 6), the optimal value should scale as

$$α^{*} \propto k_{f} C_{mono}^{2}, \tag{5}$$

which we recapitulated independently through our optimization approach (Fig. 5A). In our robustness evaluation (Fig. 3B), we found that keeping $α^{*}$ fixed and increasing $C_{mono}$ slows $τ_{95}^{*}$. Assembly in this case is now being delayed by titration, as a faster route with larger $α^{*}$ is available. In contrast, the efficiency is generally robust to perturbations of ΔG, when keeping $k_{f}$ fixed and slowing the off-rate $k_{-}$. The origin is the same as in our internal protocols; $α^{*}$ can be faster in weakly trapped systems to improve efficiency, but it also reaches a plateau value as $ΔG \to -\infty$ (SI Appendix, Fig. S15). Using this plateau rate ensures that assembly efficiency will be independent of dissociation times $k_{-}$ and thus robust to perturbations. The enzyme recycling protocol displays this same behavior, where now the optimal $k_{cat}$ values are independent of dissociation times.
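As a worked illustration of the scaling in Eq. 5 (our own arithmetic, assuming $k_{f}$ and all other parameters are held fixed, and writing $C_{1}$ and $C_{2}$ for two hypothetical total concentrations), the ratio of optimal single-rate titration speeds is

$$\frac{α^{*}(C_{2})}{α^{*}(C_{1})} = \left(\frac{C_{2}}{C_{1}}\right)^{2},$$

so doubling $C_{mono}$ from 100 µM to 200 µM would permit a roughly fourfold faster titration, and a tenfold increase would permit a roughly hundredfold faster one. A titration rate tuned at the lower concentration therefore becomes the bottleneck at the higher one, consistent with the concentration sensitivity described above.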

A unique feature of the titration protocol is that the rate $α^{*}$ is directly coupled to the desired yield. We asked AD to optimize $α$ or the hierarchy of α₁, α₂, α₃, … for 95% yield, but if we ask it for 85% yield, it will select for even faster titration, ensuring that exact yield is reached before trapping resumes (SI Appendix, Fig. S16). A benefit of the coupling between titration rates and yield, $C_{mono}$, and $k_{f}$ (Eq. 5) is in inferring rates from observations of experimental assembly kinetics. The yield-vs.-time as it changes with variations to the titration rate provides quantitative constraints on subunit association kinetics, as exploited in recent work on virus assembly (6).

AD-Based Optimization Can Be Used to Infer Rates and Assembly Mechanism from Observed Yield-vs.-Time.

Thus far we have used our optimization framework for the forward design of rates that maximize yield at a particular time. AD has the flexibility to deal with arbitrary objective functions, and here we apply our same framework to infer the subunit binding rates (and subunit ΔG) for three distinct heterotetramers given observations of yield-vs.-time at five concentrations. As a proof of principle, we generate the yield-vs.-time data ourselves using randomized binding and dissociation rates for distinct subunits following either a rate-growth or diversification pathway, but these data can be collected experimentally from approaches like dynamic light scattering. For the first two heterotetramers, we applied AD to predict the binding rates while assuming the ΔG are known, and in the third, we predicted both on-rates and ΔG, finding excellent agreement between inferred and actual rates for all (Fig. 6A). Importantly, we tested both a rate-growth and a diversification mechanism for each heterotetramer. Our method discriminated the correct model as producing the best fit to the data, systematically finding a lower error fit across a range of initial guesses for the rates (SI Appendix, Fig. S17).

Fig. 6. AD applied to kinetic models predicts binding rates and ΔG from observed “experimental” yield-vs.-time data. (A) We accurately predicted kinetic rate-constants for 3 distinct heterotetramers using AD given observations of their yield-vs.-time kinetics. All tetramers are fully connected. AD predicted the correct rates from multiple sets of randomized initial values. By trying to optimize both the rate-growth and diversification kinetic models to the data, we correctly identified that the best fit for tetramer 1 is rate-growth (3 parameters), for tetramer 2 is diversification (6 parameters), and for tetramer 3 is rate-growth (*SI Appendix*, Fig. S17). For tetramer 3, we do not know ΔG, and thus predict 4 parameters including the dimer dissociation rate; the predicted ΔG is in good agreement with the true value (*Inset*). (B) For tetramer 3, we used observed yield-vs.-time data (Exp data, circles) that showed clear kinetic trapping, as the timescales for entry into and exit from the trapped region provide complementary information on association and dissociation rates, respectively. Predicted model results are shown as solid lines. Results are shown for C_mono = 100 µM (pink), 1,000 µM (purple), and 10⁴ µM (blue). Two additional curves at 500 and 5,000 µM were used in the optimization with similarly excellent agreement.

Because predicting both on- and off-rates presented a stiffer challenge, we here took advantage of the model to design the most information-rich data to use for inference. Kinetics with trapping provides direct information on both on-rates, via the short-time kinetics, and off-rates or ΔG, via the exit times from the trapped regions (Eq. 3). By selecting high values for initial concentrations C_mono, we observe trapping (Fig. 6B). By using distinct time-windows of the yield-vs.-time, we efficiently learn first on- and then off-rates from only 5 distinct concentrations (Fig. 6B). AD coupled directly to kinetic models thus offers a powerful approach for learning underlying kinetic parameters and mechanisms from the emergent yield of self-assembly.
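The inference idea can be illustrated on a toy problem with a hand-rolled forward-mode AD. The sketch below is not the authors' implementation (which is available in the deposited repository); it is a minimal stand-in that assumes a single irreversible dimerization A + B → AB with equal initial concentrations, dimensionless units, forward-Euler integration, synthetic noise-free data at three hypothetical concentrations, and plain gradient descent on log k. The `Dual` class and all function names are hypothetical.

```python
import math


class Dual:
    """Forward-mode AD scalar: a value and its derivative w.r.t. one parameter."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)


def dual_exp(x):
    v = math.exp(x.val)
    return Dual(v, v * x.der)


def yields(k, c0, sample_steps, dt=0.02):
    """Forward-Euler yield of A + B -> AB (irreversible, [A] = [B] = a)."""
    a = Dual(c0)
    out = []
    for step in range(1, max(sample_steps) + 1):
        a = a - k * a * a * dt
        if step in sample_steps:
            out.append(a * (-1.0 / c0) + 1.0)  # yield = 1 - a / c0
    return out


def loss(theta, data, sample_steps):
    """Mean squared error between modeled and observed yields, with k = exp(theta)."""
    k = dual_exp(theta)
    total, n = Dual(0.0), 0
    for c0, observed in data.items():
        for y, obs in zip(yields(k, c0, sample_steps), observed):
            r = y - obs
            total, n = total + r * r, n + 1
    return total * (1.0 / n)


SAMPLE_STEPS = (10, 25, 50, 100, 200)
K_TRUE = 1.0
# Synthetic "observations" generated from the same model at the true rate.
data = {c0: [y.val for y in yields(Dual(K_TRUE), c0, SAMPLE_STEPS)]
        for c0 in (0.5, 1.0, 2.0)}

theta = math.log(0.1)  # deliberately poor initial guess, k = 0.1
for _ in range(100):
    current = loss(Dual(theta, 1.0), data, SAMPLE_STEPS)  # seed d(theta)/d(theta) = 1
    theta -= 5.0 * current.der  # gradient descent step on log k
k_fit = math.exp(theta)
print(f"inferred k = {k_fit:.4f} (true value {K_TRUE})")
```

Optimizing log k rather than k keeps the rate positive without constraints. Because the derivative is propagated through every integration step, the same pattern extends in principle to several rates and to off-rates, although a production implementation would use a reverse-mode AD library and a stiff-capable integrator.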

Discussion

An underlying theme in our optimized kinetic solutions is that a hierarchy of timescales between competing pathways allows for efficient assembly that evades kinetic traps and is robust to perturbations in stability $ΔG$. Counterintuitively, slowing down some association rate-constants is therefore essential for efficiency. Biologically, kinetic control of assembly pathways often exploits subunit diversity to drive specificity in highly ordered assembly via post-translational modification (36), or chaperone-guided control of subunit binding (37) or disassembly (38). Sequential translation of distinct assembly components (39) or spatial localization further regulate the assembly hierarchy (40). Because in vitro reconstitution of macromolecular complexes lacks the supporting biological machinery of living systems, assembly yield often suffers dramatically (41). Our computational toolbox here provides a user-friendly optimization approach to design titration protocols that will dramatically improve yield and efficiency, potentially improving stepwise mixing strategies that are laboriously found experimentally (42). Titration need not rely on physical appearance but instead on cofactor activation, as was shown to significantly alleviate kinetic trapping in viral assembly (6) and 30S ribosome assembly (43). Our same toolbox can also be usefully applied to infer biochemical parameters from experimental datasets of emergent properties, bypassing the need for individual pairwise measurements as we showed above. Our AD-driven optimization approach enables sampling in large parameter spaces regardless of internal or external control elements, and thus immediately expands to more complex nonequilibrium protocols.

Far-from-equilibrium protocols are particularly important for evolved systems, as although internal design of binding kinetics provides optimal efficiency “for free” (Fig. 3), evolution via gene fusion, for example (44), often produces complexes with repeated subunits and symmetry (45, 46), limiting inherent diversification of binding kinetics. Repeated subunit building blocks also more naturally lend themselves to nonmonomer growth (23), which we found is not as efficient as monomer growth, but can still effectively avoid kinetic traps. Such modular growth of stable subcomplexes effectively reduces N, and is used by ATP synthase (47) and the transcription preinitiation complex, where N reduces from 46 to 10 (48).

Our optimization approach also provides an orthogonal design strategy as compared to fine-tuning of binding free energies for artificial self-assembly systems, as those solutions operate efficiently only over narrow concentration regimes. For designed systems, achieving the bimolecular rate-constants of our internal protocols does present a challenge, however; predicting rates from molecular interfaces requires theoretical approximations about the free energy barrier between unbound and transition states (49), or the use of computer simulations that can directly quantify binding rates of atomistic systems (50) but are limited by the accuracy of molecular dynamics force-fields. Nonetheless, molecular contacts have been successfully designed for binding kinetics using AD (12) and rational design of electrostatic complementarity (51), with 5,000-fold differences in rates possible (52). Variation of interface sizes can produce diversification of rates (31), making the diversification pathway particularly straightforward. Although rate-growth pathways require higher-order design, phosphorylation can be designed to trigger structural changes that promote oligomerization via rate-growth (53). Accessing protein design for diverse asymmetric structures (54) requires understanding the multitude of ways that kinetics can ensure high-yield assemblies, and our study emphasizes the benefits of incorporating kinetics into rational design of protein binding interfaces.

Supplementary Material

Appendix 01 (PDF)

pnas.2403384121.sapp.pdf (3 MB, PDF)

Acknowledgments

M.E.J. gratefully acknowledges funding from an NIH MIRA Award R35GM133644. We acknowledge use of the Advanced Research Computing at Hopkins (ARCH) supercomputer at Johns Hopkins. We thank Dr. Yiben Fu for helping with plotting, and Dr. Alex Sodt and Johnson Lab members for feedback.

Author contributions

M.E.J. designed research; A.J. and S.L. performed research; A.J., S.L., Y.Q., and M.E.J. contributed new reagents/analytic tools; A.J., S.L., and M.E.J. analyzed data; and A.J. and M.E.J. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

Code and input data have been deposited in github.com/mjohn218/KineticAssembly_AD (https://doi.org/10.5281/zenodo.10674384) (55).

References

1. Mallery D. L., et al., Cellular IP6 levels limit HIV production while viruses that cannot efficiently package IP6 are attenuated for infection and replication. Cell Rep. 29, 3983–3996.e4 (2019).
2. Earnest T. M., Chen K., Lai J., Luthey-Schulten Z., Towards a whole-cell model of ribosome biogenesis: Kinetic modeling of SSU assembly. Biophys. J. 109, 1117–1135 (2015).
3. Kim H., et al., Protein-guided RNA dynamics during early ribosome assembly. Nature 506, 334–338 (2014).
4. Whitelam S., Jack R. L., The statistical mechanics of dynamic pathways to self-assembly. Annu. Rev. Phys. Chem. 66, 143–163 (2015).
5. Chacko B. M., et al., Structural basis of heteromeric smad protein assembly in TGF-beta signaling. Mol. Cell 15, 813–823 (2004).
6. Qian Y., et al., Temporal control by cofactors prevents kinetic trapping in retroviral Gag lattice assembly. Biophys. J. 122, 3173–3190 (2023).
7. Freeman B. C., Yamamoto K. R., Disassembly of transcriptional regulatory complexes by molecular chaperones. Science 296, 2232–2235 (2002).
8. Weith M., et al., Ubiquitin-independent disassembly by a p97 AAA-ATPase complex drives PP1 holoenzyme formation. Mol. Cell 72, 766–777.e6 (2018).
9. Ogden P. J., Kelsic E. D., Sinai S., Church G. M., Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design. Science 366, 1139–1143 (2019).
10. Baydin A. G., Pearlmutter B. A., Radul A. A., Siskind J. M., Automatic differentiation in machine learning: A survey. J. Mach. Learn. Res. 18, 1–43 (2018).
11. Minkov M., Inverse design of photonic crystals through automatic differentiation. ACS Photon. 7, 1729–1741 (2020).
12. Goodrich C. P., King E. M., Schoenholz S. S., Cubuk E. D., Brenner M. P., Designing self-assembling kinetics with differentiable statistical physics models. Proc. Natl. Acad. Sci. U.S.A. 118, e2024083118 (2021).
13. Rapaport D. C., Role of reversibility in viral capsid growth: A paradigm for self-assembly. Phys. Rev. Lett. 101, 186101 (2008).
14. Murugan A., Zou J., Brenner M. P., Undesired usage and the robust self-assembly of heterogeneous structures. Nat. Commun. 6, 6203 (2015).
15. Johnson M. E., Hummer G., Nonspecific binding limits the number of proteins in a cell and shapes their interaction networks. Proc. Natl. Acad. Sci. U.S.A. 108, 603–608 (2011).
16. Haxton T. K., et al., Competing thermodynamic and dynamic factors select molecular assemblies on a gold surface. Phys. Rev. Lett. 111, 265701 (2013).
17. Hagan M. F., Elrad O. M., Understanding the concentration dependence of viral capsid assembly kinetics–the origin of the lag time and identifying the critical nucleus size. Biophys. J. 98, 1065–1074 (2010).
18. Lazaro G. R., Hagan M. F., Allosteric control of icosahedral capsid assembly. J. Phys. Chem. B 120, 6306–6318 (2016).
19. Zenk J., Schulman R., An assembly funnel makes biomolecular complex assembly efficient. PLoS One 9, e111233 (2014).
20. Hagan M. F., Modeling viral capsid assembly. Adv. Chem. Phys. 155, 1–68 (2014).
21. Gartner F. M., Graf I. R., Frey E., The time complexity of self-assembly. Proc. Natl. Acad. Sci. U.S.A. 119, e2116373119 (2022).
22. Deeds E. J., Bachman J. A., Fontana W., Optimizing ring assembly reveals the strength of weak interactions. Proc. Natl. Acad. Sci. U.S.A. 109, 2348–2353 (2012).
23. Villar G., et al., Self-assembly and evolution of homomeric protein complexes. Phys. Rev. Lett. 102, 118106 (2009).
24. Sartori P., Leibler S., Lessons from equilibrium statistical physics regarding the assembly of protein complexes. Proc. Natl. Acad. Sci. U.S.A. 117, 114–120 (2020).
25. Williamson J. R., Cooperativity in macromolecular assembly. Nat. Chem. Biol. 4, 458–465 (2008).
26. Marsh J. A., Teichmann S. A., Structure, dynamics, assembly, and evolution of protein complexes. Annu. Rev. Biochem. 84, 551–575 (2015).
27. Bunner A. E., Beck A. H., Williamson J. R., Kinetic cooperativity in Escherichia coli 30S ribosomal subunit reconstitution reveals additional complexity in the assembly landscape. Proc. Natl. Acad. Sci. U.S.A. 107, 5417–5422 (2010).
28. Kucharska I., et al., Biochemical reconstitution of HIV-1 assembly and maturation. J. Virol. 94, e01844-19 (2020).
29. Rohl R., Nierhaus K. H., Assembly map of the large subunit (50S) of Escherichia coli ribosomes. Proc. Natl. Acad. Sci. U.S.A. 79, 729–733 (1982).
30. Jewett M. C., Fritz B. R., Timmerman L. E., Church G. M., In vitro integration of ribosomal RNA synthesis, ribosome assembly, and translation. Mol. Syst. Biol. 9, 678 (2013).
31. Schreiber G., Haran G., Zhou H. X., Fundamental aspects of protein-protein association kinetics. Chem. Rev. 109, 839–860 (2009).
32. Varga M. J., Fu Y., Loggia S., Yogurtcu O. N., Johnson M. E., NERDSS: A nonequilibrium simulator for multibody self-assembly at the cellular scale. Biophys. J. 118, 3026–3040 (2020).
33. Guo S. K., Sodt A. J., Johnson M. E., Large self-assembled clathrin lattices spontaneously disassemble without sufficient adaptor proteins. PLoS Comput. Biol. 18, e1009969 (2022).
34. Zlotnick A., Johnson J. M., Wingfield P. W., Stahl S. J., Endres D., A theoretical model successfully identifies features of hepatitis B virus capsid assembly. Biochemistry 38, 14644–14652 (1999).
35. Whitesides G. M., Boncheva M., Beyond molecules: Self-assembly of mesoscopic and macroscopic components. Proc. Natl. Acad. Sci. U.S.A. 99, 4769–4774 (2002).
36. Satoh K., Sasajima H., Nyoumura K. I., Yokosawa H., Sawada H., Assembly of the 26S proteasome is regulated by phosphorylation of the p45/Rpt6 ATPase subunit. Biochemistry 40, 314–319 (2001).
37. Marshall R. S., Vierstra R. D., Dynamic regulation of the 26S proteasome: From synthesis to degradation. Front. Mol. Biosci. 6, 40 (2019).
38. Makhnevych T., Houry W. A., The role of Hsp90 in protein complex assembly. Biochim. Biophys. Acta 1823, 674–682 (2012).
39. Wells J. N., Bergendahl L. T., Marsh J. A., Operon gene order is optimized for ordered protein complex assembly. Cell Rep. 14, 679–685 (2016).
40. Ghosh S., Vassilev A. P., Zhang J., Zhao Y., DePamphilis M. L., Assembly of the human origin recognition complex occurs through independent nuclear localization of its components. J. Biol. Chem. 286, 23831–23841 (2011).
41. Mulder A. M., et al., Visualizing ribosome biogenesis: Parallel assembly pathways for the 30S subunit. Science 330, 673–677 (2010).
42. Fujiwara R., Murakami K., In vitro reconstitution of yeast RNA polymerase II transcription initiation with high efficiency. Methods 159–160, 82–89 (2019).
43. Tamaru D., Amikura K., Shimizu Y., Nierhaus K. H., Ueda T., Reconstitution of 30S ribosomal subunits in vitro using ribosome biogenesis factors. RNA 24, 1512–1519 (2018).
44. Marsh J. A., et al., Protein complexes are under evolutionary selection to assemble via ordered pathways. Cell 153, 461–470 (2013).
45. Levy E. D., Erba E. B., Robinson C. V., Teichmann S. A., Assembly reflects evolution of protein complexes. Nature 453, U1262–U1266 (2008).
46. Ahnert S. E., Marsh J. A., Hernandez H., Robinson C. V., Teichmann S. A., Principles of assembly reveal a periodic table of protein complexes. Science 350, aaa2245 (2015).
47. Ruhle T., Leister D., Assembly of F1F0-ATP synthases. Biochim. Biophys. Acta 1847, 849–860 (2015).
48. Nguyen V. Q., et al., Spatiotemporal coordination of transcription preinitiation complex assembly in live cells. Mol. Cell 81, 3560–3575.e6 (2021).
49. Zhou H. X., Szabo A., Theory and simulation of the time-dependent rate coefficients of diffusion-influenced reactions. Biophys. J. 71, 2440–2457 (1996).
50. Saglam A. S., Chong L. T., Protein-protein binding pathways and calculations of rate constants using fully-continuous, explicit-solvent simulations. Chem. Sci. 10, 2360–2372 (2019).
51. Selzer T., Albeck S., Schreiber G., Rational design of faster associating and tighter binding protein complexes. Nat. Struct. Biol. 7, 537–541 (2000).
52. Pang X., Qin S., Zhou H. X., Rationalizing 5000-fold differences in receptor-binding rate constants of four cytokines. Biophys. J. 101, 1175–1183 (2011).
53. Signarvic R. S., DeGrado W. F., De novo design of a molecular switch: Phosphorylation-dependent association of designed peptides. J. Mol. Biol. 334, 1–12 (2003).
54. Sahtoe D. D., et al., Reconfigurable asymmetric protein assemblies through implicit negative design. Science 375, eabj7662 (2022).
55. Bass D., Jhaveri A., Johnson M., mjohn218/KineticAssembly_AD: AD kinetic models. Zenodo. 10.5281/zenodo.10674384. Deposited 17 February 2024.

[r1] 1.Mallery D. L., et al. , Cellular IP6 levels limit HIV production while viruses that cannot efficiently package IP6 are attenuated for infection and replication. Cell Rep. 29, 3983–3996.e4 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r2] 2.Earnest T. M., Chen K., Lai J., Luthey-Schulten Z., Towards a whole-cell model of ribosome biogenesis: Kinetic modeling of SSU assembly. Biophys. J. 109, 1117–1135 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r3] 3.Kim H., et al. , Protein-guided RNA dynamics during early ribosome assembly. Nature 506, 334–338 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r4] 4.Whitelam S., Jack R. L., The statistical mechanics of dynamic pathways to self-assembly. Annu. Rev. Phys. Chem. 66, 143–163 (2015). [DOI] [PubMed] [Google Scholar]

[r5] 5.Chacko B. M., et al. , Structural basis of heteromeric smad protein assembly in TGF-beta signaling. Mol. Cell 15, 813–823 (2004). [DOI] [PubMed] [Google Scholar]

[r6] 6.Qian Y., et al. , Temporal control by cofactors prevents kinetic trapping in retroviral Gag lattice assembly. Biophys. J. 122, 3173–3190 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r7] 7.Freeman B. C., Yamamoto K. R., Disassembly of transcriptional regulatory complexes by molecular chaperones. Science 296, 2232–2235 (2002). [DOI] [PubMed] [Google Scholar]

[r8] 8.Weith M., et al. , Ubiquitin-independent disassembly by a p97 AAA-ATPase complex drives PP1 holoenzyme formation. Mol. Cell 72, 766–777.e6 (2018). [DOI] [PubMed] [Google Scholar]

[r9] 9.Ogden P. J., Kelsic E. D., Sinai S., Church G. M., Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design. Science 366, 1139–1143 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r10] 10.Baydin A. G., Pearlmutter B. A., Radul A. A., Siskind J. M., Automatic Differentiation in Machine Learning: A Survey. J. Mach Learn Res. 18, 1–43 (2018). [Google Scholar]

[r11] 11.Minkov M., Inverse design of photonic crystals through automatic differentiation. ACS Photon. 7, 1729–1741 (2020). [Google Scholar]

[r12] 12.Goodrich C. P., King E. M., Schoenholz S. S., Cubuk E. D., Brenner M. P., Designing self-assembling kinetics with differentiable statistical physics models. Proc. Natl. Acad. Sci. U.S.A. 118, e2024083118 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r13] 13.Rapaport D. C., Role of reversibility in viral capsid growth: A paradigm for self-assembly. Phys. Rev. Lett. 101, 186101 (2008). [DOI] [PubMed] [Google Scholar]

[r14] 14.Murugan A., Zou J., Brenner M. P., Undesired usage and the robust self-assembly of heterogeneous structures. Nat. Commun. 6, 6203 (2015). [DOI] [PubMed] [Google Scholar]

[r15] 15.Johnson M. E., Hummer G., Nonspecific binding limits the number of proteins in a cell and shapes their interaction networks. Proc. Natl. Acad. Sci. U.S.A. 108, 603–608 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r16] 16.Haxton T. K., et al. , Competing thermodynamic and dynamic factors select molecular assemblies on a gold surface. Phys. Rev. Lett. 111, 265701 (2013). [DOI] [PubMed] [Google Scholar]

[r17] 17.Hagan M. F., Elrad O. M., Understanding the concentration dependence of viral capsid assembly kinetics–the origin of the lag time and identifying the critical nucleus size. Biophys. J. 98, 1065–1074 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r18] 18.Lazaro G. R., Hagan M. F., Allosteric control of icosahedral capsid assembly. J. Phys. Chem. B 120, 6306–6318 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r19] 19.Zenk J., Schulman R., An assembly funnel makes biomolecular complex assembly efficient. PLoS One 9, e111233 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r20] 20.Hagan M. F., Modeling viral capsid assembly. Adv. Chem. Phys. 155, 1–68 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r21] 21.Gartner F. M., Graf I. R., Frey E., The time complexity of self-assembly. Proc. Natl. Acad. Sci. U.S.A. 119, e2116373119 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r22] 22.Deeds E. J., Bachman J. A., Fontana W., Optimizing ring assembly reveals the strength of weak interactions. Proc. Natl. Acad. Sci. U.S.A. 109, 2348–2353 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[r23] 23. Villar G., et al., Self-assembly and evolution of homomeric protein complexes. Phys. Rev. Lett. 102, 118106 (2009).

[r24] 24. Sartori P., Leibler S., Lessons from equilibrium statistical physics regarding the assembly of protein complexes. Proc. Natl. Acad. Sci. U.S.A. 117, 114–120 (2020).

[r25] 25. Williamson J. R., Cooperativity in macromolecular assembly. Nat. Chem. Biol. 4, 458–465 (2008).

[r26] 26. Marsh J. A., Teichmann S. A., Structure, dynamics, assembly, and evolution of protein complexes. Annu. Rev. Biochem. 84, 551–575 (2015).

[r27] 27. Bunner A. E., Beck A. H., Williamson J. R., Kinetic cooperativity in Escherichia coli 30S ribosomal subunit reconstitution reveals additional complexity in the assembly landscape. Proc. Natl. Acad. Sci. U.S.A. 107, 5417–5422 (2010).

[r28] 28. Kucharska I., et al., Biochemical reconstitution of HIV-1 assembly and maturation. J. Virol. 94, e01844-19 (2020).

[r29] 29. Rohl R., Nierhaus K. H., Assembly map of the large subunit (50S) of Escherichia coli ribosomes. Proc. Natl. Acad. Sci. U.S.A. 79, 729–733 (1982).

[r30] 30. Jewett M. C., Fritz B. R., Timmerman L. E., Church G. M., In vitro integration of ribosomal RNA synthesis, ribosome assembly, and translation. Mol. Syst. Biol. 9, 678 (2013).

[r31] 31. Schreiber G., Haran G., Zhou H. X., Fundamental aspects of protein-protein association kinetics. Chem. Rev. 109, 839–860 (2009).

[r32] 32. Varga M. J., Fu Y., Loggia S., Yogurtcu O. N., Johnson M. E., NERDSS: A nonequilibrium simulator for multibody self-assembly at the cellular scale. Biophys. J. 118, 3026–3040 (2020).

[r33] 33. Guo S. K., Sodt A. J., Johnson M. E., Large self-assembled clathrin lattices spontaneously disassemble without sufficient adaptor proteins. PLoS Comput. Biol. 18, e1009969 (2022).

[r34] 34. Zlotnick A., Johnson J. M., Wingfield P. W., Stahl S. J., Endres D., A theoretical model successfully identifies features of hepatitis B virus capsid assembly. Biochemistry 38, 14644–14652 (1999).

[r35] 35. Whitesides G. M., Boncheva M., Beyond molecules: Self-assembly of mesoscopic and macroscopic components. Proc. Natl. Acad. Sci. U.S.A. 99, 4769–4774 (2002).

[r36] 36. Satoh K., Sasajima H., Nyoumura K. I., Yokosawa H., Sawada H., Assembly of the 26S proteasome is regulated by phosphorylation of the p45/Rpt6 ATPase subunit. Biochemistry 40, 314–319 (2001).

[r37] 37. Marshall R. S., Vierstra R. D., Dynamic regulation of the 26S proteasome: From synthesis to degradation. Front. Mol. Biosci. 6, 40 (2019).

[r38] 38. Makhnevych T., Houry W. A., The role of Hsp90 in protein complex assembly. Biochim. Biophys. Acta 1823, 674–682 (2012).

[r39] 39. Wells J. N., Bergendahl L. T., Marsh J. A., Operon gene order is optimized for ordered protein complex assembly. Cell Rep. 14, 679–685 (2016).

[r40] 40. Ghosh S., Vassilev A. P., Zhang J., Zhao Y., DePamphilis M. L., Assembly of the human origin recognition complex occurs through independent nuclear localization of its components. J. Biol. Chem. 286, 23831–23841 (2011).

[r41] 41. Mulder A. M., et al., Visualizing ribosome biogenesis: Parallel assembly pathways for the 30S subunit. Science 330, 673–677 (2010).

[r42] 42. Fujiwara R., Murakami K., In vitro reconstitution of yeast RNA polymerase II transcription initiation with high efficiency. Methods 159–160, 82–89 (2019).

[r43] 43. Tamaru D., Amikura K., Shimizu Y., Nierhaus K. H., Ueda T., Reconstitution of 30S ribosomal subunits in vitro using ribosome biogenesis factors. RNA 24, 1512–1519 (2018).

[r44] 44. Marsh J. A., et al., Protein complexes are under evolutionary selection to assemble via ordered pathways. Cell 153, 461–470 (2013).

[r45] 45. Levy E. D., Erba E. B., Robinson C. V., Teichmann S. A., Assembly reflects evolution of protein complexes. Nature 453, 1262–1265 (2008).

[r46] 46. Ahnert S. E., Marsh J. A., Hernandez H., Robinson C. V., Teichmann S. A., Principles of assembly reveal a periodic table of protein complexes. Science 350, aaa2245 (2015).

[r47] 47. Ruhle T., Leister D., Assembly of F1F0-ATP synthases. Biochim. Biophys. Acta 1847, 849–860 (2015).

[r48] 48. Nguyen V. Q., et al., Spatiotemporal coordination of transcription preinitiation complex assembly in live cells. Mol. Cell 81, 3560–3575.e6 (2021).

[r49] 49. Zhou H. X., Szabo A., Theory and simulation of the time-dependent rate coefficients of diffusion-influenced reactions. Biophys. J. 71, 2440–2457 (1996).

[r50] 50. Saglam A. S., Chong L. T., Protein-protein binding pathways and calculations of rate constants using fully-continuous, explicit-solvent simulations. Chem. Sci. 10, 2360–2372 (2019).

[r51] 51. Selzer T., Albeck S., Schreiber G., Rational design of faster associating and tighter binding protein complexes. Nat. Struct. Biol. 7, 537–541 (2000).

[r52] 52. Pang X., Qin S., Zhou H. X., Rationalizing 5000-fold differences in receptor-binding rate constants of four cytokines. Biophys. J. 101, 1175–1183 (2011).

[r53] 53. Signarvic R. S., DeGrado W. F., De novo design of a molecular switch: Phosphorylation-dependent association of designed peptides. J. Mol. Biol. 334, 1–12 (2003).

[r54] 54. Sahtoe D. D., et al., Reconfigurable asymmetric protein assemblies through implicit negative design. Science 375, eabj7662 (2022).

[r55] 55. Bass D., Jhaveri A., Johnson M., mjohn218/KineticAssembly_AD: AD kinetic models. Zenodo. doi:10.5281/zenodo.10674384. Deposited 17 February 2024.
