Abstract
Markov state models (MSMs) have been demonstrated to be a powerful method for computationally studying intramolecular processes such as protein folding and macromolecular conformational changes. In this article, we present a new approach to construct MSMs that is applicable to modeling a broad class of multi-molecular assembly reactions. Distinct structures formed during assembly are distinguished by their undirected graphs, which are defined by strong subunit interactions. Spatial inhomogeneities of free subunits are accounted for using a recently developed Gaussian-based signature. Simplifications to this state identification are also investigated. The feasibility of this approach is demonstrated on two different coarse-grained models for virus self-assembly. We find good agreement between the dynamics predicted by the MSMs and long, unbiased simulations, and that the MSMs can reduce overall simulation time by orders of magnitude.
INTRODUCTION
The assembly of basic units into structures with increased size and complexity is central to biology, where examples of assembled structures include viruses (e.g., Refs. 1, 2, 3, 4), cell membranes, cytoskeletal filaments,5 and ordered layers of proteins on bacterial surfaces.6 Assembly is also increasingly important to nanoscience, where interactions between colloidal particles are being engineered to drive assembly into sophisticated, functional materials (e.g., Refs. 7, 8, 9, 10, 11, 12, 13, 14, 15) and DNA origami promises the ability to build structures of nearly limitless complexity (e.g., Refs. 16, 17, 18). An important focus of current research is understanding how the interactions between individual components determine assembly pathways, timescales, and fidelity for a target structure. Computational modeling can play a key role in determining assembly pathways and mechanisms, since most intermediates are transient and thus not readily characterized in experiments. However, simulating assembly is challenging because target structures can be orders of magnitude larger than their constituent components and assembly pathways typically surmount large free energy barriers, leading to timescales which greatly exceed computational limitations.
This paper is concerned with using Markov State Models (MSMs) to overcome the gap between assembly times and computationally accessible timescales. Many powerful enhanced sampling techniques have been developed to efficiently harvest computational trajectories that include barrier crossings or other rare events (e.g., Refs. 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34). However, many of these methods are limited by requiring a priori knowledge of reaction coordinates, one or few pathways to completion, or only several metastable minima. In contrast, MSMs can be used to study assembly reactions characterized by multiple free energy barriers, a diverse ensemble of pathways, and pleomorphic products. Furthermore, MSMs are one of only a few methods35 that can describe non-stationary, out-of-equilibrium dynamical processes.
Previous works on assembly of capsids or other structures have postulated Markov state models in which the state space and transition rates are pre-assumed based on physical considerations.36, 37, 38, 39, 40, 41, 42, 43, 44, 45 In the approach described here, the state space and transition rates emerge from particle-based dynamics simulations, and the validity of the Markov assumption is explicitly tested against the microscopic dynamics.
While such MSMs have been extensively developed in the context of protein folding,46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60 existing approaches cannot describe the assembly of disconnected, permutable subunits. Here, we present a method to construct MSMs that is applicable to a wide variety of such assembly reactions. We test our approach on two models, which respectively describe the assembly of viral proteins around rigid nanoparticles and flexible polymers. While straightforward dynamics simulations with similar models have led to important insights about capsid assembly (e.g., Refs. 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, reviewed in Refs. 4, 77), these investigations have been limited to parameters for which nucleation barriers are small. We show that the MSM approach enables simulation over a much wider range of experimentally relevant parameter values.
METHODS
Existing implementations of MSMs
In this section, we review how relatively short, unbiased simulations can be used to build an MSM and how this technique has been applied to study protein folding or conformational transitions. The procedure begins by partitioning configurations from the short simulations into states such that conformations which interconvert rapidly are collected into the same state. The separation of timescales resulting from this partitioning ensures that the model is Markovian on timescales longer than a “lag time” τ, meaning that the probability of transitioning to a new state depends only upon the current state. Taking $\mathbf{P}(0)$ to be the vector of probabilities of being in each of the possible states of the system at time t = 0, the state probability at time tf is given by $\mathbf{P}(t_f) = \mathbf{T}^n(\tau)\,\mathbf{P}(0)$, where n = tf/τ and $\mathbf{T}(\tau)$ is the stochastic matrix of interstate transition probabilities estimated from the simulations at lag time τ.
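As a minimal sketch of this propagation step, the snippet below applies a transition matrix n times to an initial probability vector. It assumes a column-stochastic matrix (element `T[j, i]` is the probability of going from state i to state j in one lag time); the three-state matrix and the function name `propagate` are hypothetical and chosen only for illustration.

```python
import numpy as np

def propagate(T, p0, n_steps):
    """Return T^n p0 for a column-stochastic transition matrix T."""
    T = np.asarray(T, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    return np.linalg.matrix_power(T, n_steps) @ p0

# Hypothetical 3-state model; each column sums to one.
# State 2 is absorbing (it plays the role of a completed structure).
T_example = np.array([[0.90, 0.10, 0.00],
                      [0.10, 0.80, 0.00],
                      [0.00, 0.10, 1.00]])
p0_example = np.array([1.0, 0.0, 0.0])   # everything starts in state 0

# State probabilities after n = t_f / tau = 50 lag times
p_final = propagate(T_example, p0_example, n_steps=50)
```

Because each application of `T` conserves probability, the entries of `p_final` still sum to one.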
Determining a state decomposition that achieves the separation of timescales described above is a crucial aspect of building an MSM. If the states do not sufficiently distinguish values of all of the slow degrees of freedom in the system, then the lag time τ at which the system becomes Markovian will be comparable to its longest relaxation timescale. Since simulations must be longer than τ, the “short” simulations will then approach the length of long, unbiased trajectories and the method will offer no computational savings.
Several approaches to determining state decompositions have been developed in the context of all-atom protein simulations.46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93 In cases where a set of collective coordinates describing all of the slow degrees of freedom is known (or guessed) a priori, biased sampling can be used to determine the free energy landscape as a function of these coordinates. States can then be defined based on local free energy minima (e.g., Refs. 49 and 78, 79, 80, 81). Since it is rare to have a priori knowledge of good collective coordinates, alternative approaches have been developed in which configurations are clustered based on geometric criteria, such as structural similarity (e.g., Refs. 82, 83, 84, 85). Chodera et al.59 developed an algorithm to refine an initial geometric partitioning of “microstates” into “macrostates” based on kinetics. Open source software packages such as MSMBuilder86, 87 and EMMA94 provide a suite of tools for building MSMs in this fashion and analyzing them. This algorithm and similar approaches have been shown to be extremely powerful for the study of proteins, allowing for prediction of folding pathways and rates on even the supra-millisecond timescale (e.g., Refs. 54 and 87, 88, 89, 90) as well as identification of hidden allosteric sites.91 Recently, systematic approaches to find optimal coordinates for constructing MSMs have been developed.92, 93
Building MSMs for assembly systems
In contrast to protein folding, where each residue in the protein has a unique index, assembly subunits are permutable and thus cannot be indexed. Therefore, existing algorithms for determining state decompositions cannot be directly applied to self-assembly. Here, we describe several approaches to creating a state decomposition based on the network of subunit interactions and their positions relative to heterogeneous nucleation sites.
We consider systems in which subunits can assemble either through homogeneous nucleation to form a single component structure (e.g., an empty virus capsid shell) or through heterogeneous nucleation around a scaffold (e.g., a polymer or a nanoparticle) to form a multicomponent structure. To simplify the presentation, we consider one type of subunit and focus on only the largest assembled structure in the system at any given time, but the approach can be generalized to multiple subunit species and assemblies. Our approach can be applied to systems which assemble via reversible or irreversible interactions; in both cases, we will describe a strong interaction as a “bond.”
To generate a state decomposition, we categorize subunits into three classes: class I, bonded subunits in the assemblage; class II, subunits bonded to the scaffold but not the assemblage; and class III, unbonded, free subunits. In the case of homogeneous nucleation, there are no class II subunits (since there is no scaffold). Class III subunits can be ignored if subunit association to the cluster or scaffold is sufficiently reaction-rate limited that the density of free subunits is spatially uniform.
For systems that assemble into well-defined structures, fluctuations in bond distances and angles are fast compared to bond formation and breakage. By averaging over these short timescale fluctuations, the unique structure of a growing cluster can be defined by the class I subunit bonding network. More precisely, each cluster is converted into an undirected graph with nodes corresponding to subunits and edges corresponding to bonds between the subunits (see Fig. 11 in Appendix B). This ensures a consistent state definition that is unaffected by exchanges of subunits with different indices, short-timescale conformational fluctuations, or rigid body motions of the assemblage. Class I subunits can be further sub-partitioned by including the distance from the scaffold to each subunit, but this was unnecessary for the systems that we considered. Class II subunits can be handled in a similar manner by considering the subunit-scaffold bonding network, but we found that it was only necessary to track the total number of class II subunits.
Figure 11.
(a) and (c) show clusters of patchy spheres growing on the nanoparticle during assembly while (b) and (d) show their respective graph representations. Attractors are not shown, but bonds (strong attractions) between subunits are depicted as teal cylinders. In this work, only the largest cluster was considered when constructing the MSM, which would exclude the lower dimer in (c) and (d) from consideration.
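The idea of a label-independent cluster description can be illustrated with a short sketch. Given a list of bonds between subunit indices, it extracts the largest connected cluster and computes a signature that does not change when subunits are relabeled. The signature used here (cluster size, bond count, sorted degree sequence) is an assumption made for brevity: it is a coarse graph invariant, not a complete graph-isomorphism test, and it is not taken from the Appendix B procedure. The function names are hypothetical.

```python
from collections import defaultdict, deque

def largest_cluster(bonds):
    """Return (nodes, edges) of the largest connected cluster in a bond list."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over bonded neighbors
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    edges = {frozenset((a, b)) for a, b in bonds if a in best and b in best}
    return best, edges

def cluster_signature(bonds):
    """Label-independent (size, n_bonds, sorted degrees) of the largest cluster."""
    nodes, edges = largest_cluster(bonds)
    degree = defaultdict(int)
    for edge in edges:
        for u in edge:
            degree[u] += 1
    return (len(nodes), len(edges), tuple(sorted(degree[u] for u in nodes)))

# A bonded trimer plus a separate dimer, with two different labelings
sig_a = cluster_signature([(0, 1), (1, 2), (2, 0), (5, 6)])
sig_b = cluster_signature([(7, 3), (3, 9), (9, 7), (1, 4)])
```

Both labelings give the same signature because only the largest cluster is considered and subunit indices never enter the result. Subunits without any bond do not appear in the bond list, so free monomers are ignored by construction.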
Class III subunits must be included in the state decomposition when their association to the cluster or scaffold approaches the diffusion limit, which results in density inhomogeneities. Free subunit positions fluctuate rapidly since they are not involved in strong interactions, and thus we use their density distribution rather than their positions to decompose states. We follow the approach developed by Gu et al.95 to include solvent degrees of freedom in protein folding MSMs. We define a vector w, in which each index corresponds to a subunit in the cluster or a residue of the scaffold and each component corresponds to a distance-weighted density of free subunits around the indexed subunit,
$$ w_i = \sum_j \exp\left(-\frac{|\mathbf{r}_i - \mathbf{r}_j|^2}{2\sigma_d^2}\right) \qquad (1) $$
with $\mathbf{r}_i$ and $\mathbf{r}_j$ as the positions of cluster/scaffold subunit i and the free subunit j, respectively. σd is an adjustable decay length that sets the scale for relevant interactions between free subunits and the scaffold or growing cluster. This definition weighs nearby subunits more heavily since they are more likely to associate with the cluster or scaffold. Whether cluster subunits, scaffold residues, or both need to be considered in this definition depends on which association reactions approach the diffusion limit. For example, in the case considered below where subunit adsorption onto the nanoparticle approaches the diffusion limit but subunit-subunit associations do not, it is only necessary to include the nanoparticle in w.
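A distance-weighted density of this kind can be sketched in a few lines of NumPy. The Gaussian kernel with decay length `sigma_d` and its normalization are assumptions here, as is the neglect of periodic boundary conditions; the function name `free_subunit_density` is hypothetical.

```python
import numpy as np

def free_subunit_density(site_positions, free_positions, sigma_d):
    """Gaussian-weighted count of free subunits around each cluster/scaffold site.

    site_positions: (M, 3) array of cluster or scaffold site positions
    free_positions: (K, 3) array of free subunit positions (K >= 1)
    sigma_d:        decay length of the Gaussian weight (assumed kernel)
    """
    sites = np.atleast_2d(np.asarray(site_positions, dtype=float))
    free = np.atleast_2d(np.asarray(free_positions, dtype=float))
    # Pairwise squared distances, shape (M, K); no periodic images are applied
    diff = sites[:, None, :] - free[None, :, :]
    dist2 = np.sum(diff**2, axis=-1)
    return np.exp(-dist2 / (2.0 * sigma_d**2)).sum(axis=1)

# One site at the origin; one free subunit on top of it, one a distance sigma_d away
w = free_subunit_density([[0.0, 0.0, 0.0]],
                         [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
                         sigma_d=2.0)
```

To use such a vector in a discrete state definition, its components would still need to be binned or clustered; that step is not shown.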
Finally, if scaffold internal degrees of freedom (e.g., conformations of a polymer) evolve slowly, these should also be included in the state definition. Because the scaffold residues (i.e., polymer segments) are indexable, these degrees of freedom can be treated via the existing RMSD-based approach (e.g., in MSMBuilder86, 87 and EMMA94).
Reducing the number of states
As the size of the target structure increases, the number of distinct assembly intermediates and hence the number of unique graphs grows rapidly. The number of states could become intractable if class II or class III subunits were included in the state definition. There are two routes to reduce the number of states. First, kinetic data can be used to group microstates that interconvert rapidly into macrostates by following the approach based on Perron cluster analysis in MSMBuilder60, 86, 87 and EMMA.94 Second, a priori knowledge of the system can be used to reduce the number of unique states. Since assembly into a target structure with high fidelity generally requires weak subunit-subunit interactions,4, 96, 97 subunits in a cluster with only one bond rapidly dissociate. Thus, the edges corresponding to these interactions can generally be neglected when building graphs. For the models considered here, we found that fluctuations about the most compact, highly bonded structures are rapid enough that it was sufficient to consider only the number of subunits in a cluster. Several alternative simplified descriptions are discussed in Appendix B.
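The simplification of neglecting edges to singly bonded subunits can be sketched as a single pruning pass over the bond list. Whether pruning should be applied once or repeated until no singly bonded subunits remain is not specified here; the single pass below is an assumption, and `prune_singly_bonded` is a hypothetical name.

```python
from collections import Counter

def prune_singly_bonded(bonds):
    """Drop every bond that involves a subunit with only one bond (single pass)."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return [(a, b) for a, b in bonds if degree[a] > 1 and degree[b] > 1]

# A trimer (0, 1, 2) with one dangling subunit (3) attached by a single bond
pruned = prune_singly_bonded([(0, 1), (1, 2), (2, 0), (2, 3)])
```

In this example only the dangling bond `(2, 3)` is removed, so the trimer with and without the weakly attached subunit maps onto the same graph.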
Generating the transition matrix
The transition probability matrix T(τ) is calculated by column-normalizing the count matrix C(τ), in which each element Cji gives the total number of transitions from state i to state j measured at a lag time τ. The count matrix can be calculated from many, relatively short, unbiased trajectories run in parallel. Because of the Markov property, the initial conditions for these trajectories can be chosen to efficiently generate good statistics for all of the relevant transition elements.
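The counting and column normalization can be sketched as follows, assuming each short trajectory has already been converted to a sequence of integer state indices saved at unit time intervals. Counts are accumulated with a sliding window, which is one common choice and an assumption here; the function names are hypothetical.

```python
import numpy as np

def count_matrix(trajectories, n_states, lag):
    """C[j, i] = number of observed transitions i -> j at the given lag (lag >= 1)."""
    C = np.zeros((n_states, n_states))
    for traj in trajectories:
        traj = np.asarray(traj)
        for i, j in zip(traj[:-lag], traj[lag:]):
            C[j, i] += 1
    return C

def transition_matrix(C):
    """Column-normalize C; columns of unvisited states are left as zeros."""
    C = np.asarray(C, dtype=float)
    col_sums = C.sum(axis=0, keepdims=True)
    T = np.zeros_like(C)
    np.divide(C, col_sums, out=T, where=col_sums > 0)
    return T

# Two hypothetical short trajectories over three states
trajs = [[0, 0, 1, 1, 2, 2], [0, 1, 2, 2]]
C = count_matrix(trajs, n_states=3, lag=1)
T = transition_matrix(C)
```

Note that `C` is deliberately not symmetrized before normalization.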
When no information is available a priori about which transition elements are most significant, one can use a ratcheting procedure (Appendix C). Many simulations are run in parallel for a time ts, which must be longer than the lag time τ but can be much shorter than the longest relaxation timescale. Microstates are then determined from coordinates saved during these trajectories, and a new ensemble of trajectories is started with initial conditions preferentially chosen from the microstates with the poorest sampling. This procedure is repeated until T has satisfactorily converged. Once sufficient statistics have been gathered to crudely estimate T, more systematic adaptive sampling47 can be used to choose initial conditions that will reduce the statistical uncertainty of the MSM. However, the initial ratcheting procedure already allows for tremendous speed up in comparison to long, brute force (without enhanced sampling) simulations as it enables the system to cross free energy barriers in linear rather than exponential simulation times. Note that because the protocol does not generate initial conditions according to the equilibrium distribution, the count matrix C should not be symmetrized when simulating assembly dynamics. In fact, even if C is estimated from long, unbiased trajectories that achieve formation of the target structure it should not be symmetrized when calculating dynamics, since assembly to the target structure is an out-of-equilibrium process.
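The selection of new initial conditions in such a ratcheting round can be illustrated with a simple weighting scheme. The inverse-count weighting below is an assumption made for illustration and is not the protocol of Appendix C; the function names and the visit counts are hypothetical.

```python
import numpy as np

def restart_weights(visit_counts):
    """Probability of restarting from each microstate, favoring poorly sampled ones.

    States that were never visited get zero weight, since no saved
    configuration exists from which a trajectory could be started.
    """
    counts = np.asarray(visit_counts, dtype=float)
    w = np.zeros_like(counts)
    visited = counts > 0
    w[visited] = 1.0 / counts[visited]
    return w / w.sum()

def choose_restart_states(visit_counts, n_new, rng):
    """Draw the microstates from which the next batch of trajectories starts."""
    p = restart_weights(visit_counts)
    return rng.choice(len(p), size=n_new, p=p)

# Hypothetical visit counts for five microstates after one round
counts = [5000, 800, 40, 2, 0]
weights = restart_weights(counts)
rng = np.random.default_rng(0)
new_starts = choose_restart_states(counts, n_new=100, rng=rng)
```

With these counts, most new trajectories start from the rarely visited state 3, which is the behavior that lets the ensemble advance over a barrier without waiting for a single trajectory to cross it.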
Analysis of MSMs
In this subsection, we briefly review analysis of constructed MSMs and discuss an application which is useful for analyzing assembly reactions. Upon spectral decomposition of the transition matrix, the time-dependent state probabilities can be written as
$$ \mathbf{P}(t) = \sum_i \omega_i^{t/\tau}\, \mathbf{R}_i \left( \mathbf{L}_i \cdot \mathbf{P}(0) \right) \qquad (2) $$
where $\omega_i$ is the ith eigenvalue of T(τ) and $\mathbf{L}_i$ and $\mathbf{R}_i$ are the corresponding left/right eigenvectors, which are assumed to be normalized. Since T(τ) is generally not Hermitian, the left and right eigenvectors are not equivalent. Because the transition matrix is stochastic, there is only one unit eigenvalue, whose associated right eigenvector corresponds to the equilibrium distribution, while all other eigenvalues are positive and real.50 The implied timescale, $t_i = -\tau / \ln \omega_i$, corresponds to the relaxation timescale for eigenmode i. For lag times on which the system satisfies the Markov assumption, the calculated implied timescales are nearly independent of τ.88 Checking the convergence of the implied timescales is useful for selecting an appropriate τ, but does not guarantee that the model is Markovian, which also requires converged eigenvectors.88
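As a minimal sketch, implied timescales can be obtained from the eigenvalues of a transition matrix with the standard relation t_i = -τ / ln|ω_i|. Taking the magnitude of each eigenvalue is an assumption that guards against small imaginary parts in noisy estimates; the two-state matrix and the function name are hypothetical.

```python
import numpy as np

def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln|omega_i|, slowest first."""
    evals = np.linalg.eigvals(np.asarray(T, dtype=float))
    mags = np.sort(np.abs(evals))[::-1]
    # Drop unit eigenvalues (stationary or absorbing modes) and numerically zero ones
    mags = mags[(mags < 1.0 - 1e-10) & (mags > 1e-12)]
    return -lag / np.log(mags)

# Hypothetical column-stochastic two-state model with eigenvalues 1 and 0.7
T_two_state = np.array([[0.9, 0.2],
                        [0.1, 0.8]])
timescales = implied_timescales(T_two_state, lag=1.0)

# For an exactly Markovian model, doubling the lag leaves the timescale unchanged
timescales_lag2 = implied_timescales(np.linalg.matrix_power(T_two_state, 2), lag=2.0)
```

The second call illustrates the lag-time independence that is checked in practice by re-estimating the transition matrix from data at several lag times.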
Self-assembly reactions
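It is often useful to calculate the completion fraction fc(t), which is defined as the fraction of structures in the target state as a function of time. This quantity can be compared to light scattering or size exclusion chromatography experiments.43, 98, 99, 100, 101, 102, 103, 104, 105 With $\mathbf{P}$ ordered such that index 1 corresponds to the initial, unassembled state and the largest index N corresponds to the target state, fc is given by PN(t). Inserting Pi(0) = δi,1, with δ the Kronecker delta, into Eq. 2 gives
$$ f_c(t) = \sum_i \omega_i^{t/\tau}\, R_i^{(N)} L_i^{(1)} \qquad (3) $$
where $R_i^{(n)}$ indicates the nth index of the right eigenvector $\mathbf{R}_i$ (and $L_i^{(n)}$ likewise for the left eigenvector $\mathbf{L}_i$). The mean completion time then follows as
$$ \langle t \rangle = \int_0^\infty t\, \frac{d f_c}{dt}\, dt = -\sum_{i \ge 2} t_i\, R_i^{(N)} L_i^{(1)} \qquad (4) $$
with $t_i = -\tau / \ln \omega_i$ and the sum running over the non-unit eigenvalues.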
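The completion fraction and mean completion time can also be evaluated numerically without an explicit eigendecomposition. The sketch below assumes a column-stochastic transition matrix whose last state is absorbing; it propagates the probability vector step by step for fc and uses the standard fundamental-matrix result for absorbing Markov chains for the mean time. That is a discrete-time alternative to the eigenvalue sum, swapped in here for simplicity. The three-state matrix and function names are hypothetical.

```python
import numpy as np

def completion_fraction(T, n_steps, start=0, target=-1):
    """f_c at times 0, tau, ..., n_steps*tau for a system started in `start`."""
    T = np.asarray(T, dtype=float)
    p = np.zeros(T.shape[0])
    p[start] = 1.0
    fc = np.empty(n_steps + 1)
    for n in range(n_steps + 1):
        fc[n] = p[target]
        p = T @ p
    return fc

def mean_completion_time(T, lag, start=0, target=-1):
    """Mean first-passage time to an absorbing target via (I - Q)^-1."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    target = target % n
    keep = [s for s in range(n) if s != target]
    Q = T[np.ix_(keep, keep)]                    # transitions among transient states
    N = np.linalg.inv(np.eye(len(keep)) - Q)     # N[j, i]: expected visits to j from i
    return lag * N[:, keep.index(start)].sum()

# Hypothetical 3-state model with an absorbing final state
T_abs = np.array([[0.90, 0.10, 0.00],
                  [0.10, 0.80, 0.00],
                  [0.00, 0.10, 1.00]])
fc = completion_fraction(T_abs, n_steps=2000)
t_mean = mean_completion_time(T_abs, lag=1.0)
```

For this example the mean completion time is 30 lag times, which matches the sum of the survival probabilities 1 - fc over all steps.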
Analysis using transition path theory
Insight into assembly mechanisms can be obtained from MSMs using Transition Path Theory (TPT),106, 107 which has been developed in the context of MSMs in Refs. 54, 90, 108, 109. Here we state two of these results that are particularly useful for analyzing assembly reactions. The microstates that correspond to the transition state ensemble can be identified by calculating the committor probability for each state, which is the probability that a dynamical trajectory initiated from a given state will subsequently visit the target state.110, 111 We define A as the set of mostly unassembled states that rapidly interconvert (the reactant states), B as the target structure (the product state), and I as all other states (the intermediate states). The forward committor probability, $q_i^+$, is the probability that a trajectory started in state i will visit B before A and is given by solving54
$$ -q_i^+ + \sum_{k \in I} T_{ki}\, q_k^+ = -\sum_{k \in B} T_{ki} \quad \text{for } i \in I \qquad (5) $$
with $q_i^+ = 0$ for i ∈ A and $q_i^+ = 1$ for i ∈ B, where $T_{ki}$ is the probability of a transition from state i to state k.
Similarly, the backward committor probability $q_i^-$ is the probability that the system was more recently in state A than B. For an equilibrium system, the committor probabilities are related by54 $q_i^- = 1 - q_i^+$, but for the models considered here, the target structure acts as an absorbing state (see Appendix C), which gives $q_i^- = 1$ for i∉B.
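Solving for the forward committor amounts to a small linear system over the intermediate states. The sketch below assumes a column-stochastic matrix (`T[k, i]` is the probability of i → k) and transposes it internally so that the equations read in the more familiar row form; the four-state random-walk example and the function name are hypothetical.

```python
import numpy as np

def forward_committor(T, A, B):
    """Probability of reaching set B before set A from each state."""
    P = np.asarray(T, dtype=float).T          # P[i, k] = probability of i -> k
    n = P.shape[0]
    A, B = sorted(set(A)), sorted(set(B))
    I = [s for s in range(n) if s not in A and s not in B]
    q = np.zeros(n)                           # q = 0 on A by construction
    q[B] = 1.0
    if I:
        lhs = np.eye(len(I)) - P[np.ix_(I, I)]
        rhs = P[np.ix_(I, B)].sum(axis=1)     # one-step probability of entering B
        q[I] = np.linalg.solve(lhs, rhs)
    return q

# Hypothetical unbiased random walk on a 4-state chain, written in row form
P_rw = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.5, 0.0, 0.5, 0.0],
                 [0.0, 0.5, 0.0, 0.5],
                 [0.0, 0.0, 0.0, 1.0]])
q_plus = forward_committor(P_rw.T, A=[0], B=[3])
```

For this chain the committor increases linearly from the reactant to the product state, with values 0, 1/3, 2/3, and 1.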
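The relative probabilities of different assembly pathways can be calculated from the flux between states, which is given by54 $f_{ij} = \pi_i\, q_i^-\, T_{ji}\, q_j^+$, with $q_i^{\pm}$ the forward and backward committor probabilities, πi the stationary probability of being found in state i, and $T_{ji}$ the probability of a transition from state i to state j. Since fij contains non-productive loops that are not on the pathway to completion, the forward flux is defined by subtracting out these contributions:54
$$ f_{ij}^+ = \max\left(0,\; f_{ij} - f_{ji}\right) \qquad (6) $$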
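The flux and forward (net) flux can be sketched directly from these definitions. The state weights `pi` are passed in as an input because the way they are obtained for an absorbing target is not specified in this section; the example therefore uses a hypothetical reversible random walk with uniform stationary probabilities and its analytically known committors. All names are illustrative.

```python
import numpy as np

def net_flux(T, pi, q_plus, q_minus):
    """Forward flux f+[i, j] = max(0, f[i, j] - f[j, i]) between states i -> j."""
    P = np.asarray(T, dtype=float).T          # P[i, j] = probability of i -> j
    pi = np.asarray(pi, dtype=float)
    q_plus = np.asarray(q_plus, dtype=float)
    q_minus = np.asarray(q_minus, dtype=float)
    f = (pi * q_minus)[:, None] * P * q_plus[None, :]
    np.fill_diagonal(f, 0.0)                  # self-transitions carry no flux
    return np.maximum(0.0, f - f.T)

# Hypothetical symmetric random walk on 4 states (row form), uniform stationary weights
P_sym = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])
pi_sym = np.full(4, 0.25)
q_fwd = np.array([0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0])   # A = {0}, B = {3}
q_bwd = 1.0 - q_fwd                                   # equilibrium relation
f_plus = net_flux(P_sym.T, pi_sym, q_fwd, q_bwd)
```

In this linear chain there is a single pathway, so the same net flux of 1/24 passes through every link from state 0 to state 3.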
MODELS
To test and benchmark our MSM framework, we consider two previously studied models for viral assembly, which differ in their level of detail and more importantly in the type of cargo being packaged. Both models represent capsid protein subunits as rigid bodies with excluded volume geometries and orientation-dependent interactions (following, e.g., Refs. 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, and 112, 113, 114), designed such that the lowest energy structure is an icosahedron with 20 subunits. Each subunit can be thought of as describing a trimer of proteins that form a T = 1 capsid.4
Patchy sphere model
The first model is motivated by experiments in which capsid proteins from brome mosaic virus (BMV) or hepatitis B virus (HBV) assemble around nanoparticles functionalized with negative charge.115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126 Following the approach developed by Schwartz et al.,61 the subunit excluded volume is spherically symmetric and three attractive patches (bond vectors) are rigidly fixed to the subunit, with each pair of bond vectors forming an angle of 108° (see Fig. 1 and Eq. A1). There is a favorable interaction between subunits when (1) the ends of bond vectors nearly overlap, (2) the bond vectors are nearly anti-parallel, and (3) the secondary bond vectors are nearly coplanar. Twenty subunits realizing these conditions result in the minimum energy target structure (a complete capsid) shown in Fig. 1. The interaction strength is tuned by the parameter ɛB. The nanoparticle has a spherical excluded volume and short-range attractive interactions with capsid subunits, which are tuned by the parameter εS and qualitatively represent screened electrostatic attractions. More details about this model can be found in Appendix A1.
Figure 1.

Patchy sphere model geometry. (a) Geometry of two attracting subunits with the bond vectors depicted as arrows and attractors colored teal. The angle between each of the subunit bond vectors is 108°, and the interactions are described in Appendix A (Eq. A1). The dihedral angle is not shown. (b) A cutaway view of the complete capsid showing the nanoparticle. (c) The complete capsid, which contains 20 subunits arranged with icosahedral symmetry.
Triangles model
This model represents a capsid protein subunit using multiple, spherical “excluders” that enforce excluded volume and spherical “attractors” with short-range, pairwise attractions tuned by the parameter ɛcc. Excluders and attractors are arranged so that the minimum energy capsid is an icosahedron (Fig. 2). The cargo considered here is a self-avoiding bead-spring polymer with a persistence length comparable to that of single-stranded RNA (ssRNA) with no base pairing. Polymer beads experience short range attractive interactions with polymer-attractors located on the bottom of protein subunits (see Fig. 2); the interaction strength is tuned by the parameter ɛcp. This model has previously been used to study assembly around a polymer72 and empty capsid assembly.73 Similar models were applied to empty capsids by Rapaport69, 70, 71 and Nguyen et al.63 More details are given in Appendix A2.
Figure 2.

Triangle model geometry. (a) Subunit geometry with grey excluders, green polymer attractors, and teal subunit attractors. Subunits experience an attractive interaction when the subunit attractors nearly overlap. (b) A cutaway view of the complete capsid with the encapsulated polymer shown in blue. (c) The complete capsid, which contains 20 subunits.
Simulations and units
Subunit positions and orientations are propagated using overdamped Brownian dynamics according to a second order predictor-corrector algorithm.127 To represent an experiment with excess capsid protein, each simulation includes a single scaffold (nanoparticle or polymer) and is coupled to a bulk solution by performing grand canonical Monte Carlo moves, in which subunits at the periphery of the simulation box are exchanged with a reservoir at fixed chemical potential with a frequency consistent with the diffusion limited rate.72, 122 To obtain dimensionless units, we rescale energies by kBT and times by a characteristic diffusion timescale (see Appendix A).
RESULTS
We performed simulations over a wide range of parameters to test the ability of the MSMs to accurately reproduce assembly dynamics and to determine the extent of computational speed up in comparison to brute force calculations. To evaluate accuracy, we compare the MSM (Eq. 3) and brute force dynamics predictions for the cumulative distribution of assembly times, fc(t), in Figs. 3 and 4. We find that this comparison provides a stringent test of the MSM, as it requires an accurate estimate of all statistically relevant elements of the transition matrix. In particular, capturing the assembly lag phase (the time before the first target structures appear, see Fig. 3) requires accurate estimates for as many as 20 implied timescales and their associated eigenvectors.
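The brute-force estimate of fc(t) is an empirical cumulative distribution of completion times, which can be sketched as below. Passing the total number of trajectories separately is an assumption that allows for runs that never reached the target structure; the function name and the sample times are hypothetical.

```python
import numpy as np

def empirical_fc(completion_times, n_trajectories, t_grid):
    """Fraction of trajectories that have completed assembly by each time in t_grid."""
    times = np.sort(np.asarray(completion_times, dtype=float))
    # Number of completion times <= t, divided by all trajectories that were run
    return np.searchsorted(times, t_grid, side="right") / n_trajectories

# Hypothetical example: 4 trajectories, 3 of which completed
fc_brute = empirical_fc([1.0, 2.0, 4.0], n_trajectories=4,
                        t_grid=np.array([0.0, 1.0, 3.0, 10.0]))
```

Evaluating the MSM prediction on the same time grid allows a point-by-point comparison of the two curves.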
Figure 3.
Patchy sphere model: Fraction complete (fc) as a function of time from the brute-force calculations (symbols) and the MSM calculations (lines). The subunit-subunit binding energy parameter is ɛB = 10 for all simulations and the values of the subunit-nanoparticle interaction parameter εS are indicated on the plot. MSM states are defined by the number of subunits adsorbed to the nanoparticle and the largest cluster size, except for εS = 9, which also includes Eq. 1 for density variations around the nanoparticle with σd = 4σ. Notice that a semi-log scale is used to accommodate a wide range of assembly times. The brute force estimates of fc(t) are calculated from the completion times for 500–1000 unbiased simulations.
Figure 4.
Triangles model: Fraction complete as a function of time for the brute-force calculations (symbols) and the MSM calculations (lines) for indicated values of the subunit-subunit (ɛcc) and subunit-polymer (ɛcp) interaction parameters. It was not possible to simulate assembly using brute-force calculations for ɛcp = 3.0. MSM states are defined by the number of subunits adsorbed to the polymer and the largest cluster size.
The assembly time distributions are accurately predicted from the MSMs for a wide range of parameter values that represent different assembly mechanisms (Figs. 3 and 4). In the patchy sphere model, assembly for the weakest subunit-nanoparticle interaction (εS = 6.6) is heavily nucleation-dominated, while for the strongest value (εS = 9) the nucleation timescale is comparable to the elongation timescale128 (i.e., the time required for a critical nucleus to grow to completion129). These two scenarios are distinguished by the spectrum of implied timescales (Fig. 5). MSMs corresponding to nucleation-dominated parameter sets are characterized by a wide separation between the two largest implied timescales, whereas the MSM corresponding to εS = 9 yields a dense spectrum of implied timescales. As discussed in Sec. 4C, the different parameter values give rise to very different assembly pathways as well.
Figure 5.
Implied timescales for the patchy sphere model. The five largest timescales are shown for (a) slow, nucleation dominated assembly (ɛB = 10, εS = 6.6) and (b) rapid assembly (ɛB = 10, εS = 9). For (a), the largest implied timescale (excluding the unit eigenvalue) corresponds to the nucleation timescale.
For most parameter sets, we found that the minimal state definition capable of reproducing assembly dynamics included the number of subunits in the largest cluster and the number of subunits adsorbed to the nanoparticle or polymer. However, it is worth noting that the ratcheting procedure used to estimate the transition matrix considered not only the cluster size, but also the number of intra-cluster bonds (Appendix C). Ratcheting was less efficient when only the cluster size was considered, which suggests that the bonding network within a cluster is important. For the parameter set with the strongest subunit-nanoparticle interaction strength (εS = 9 in Fig. 3), subunits rapidly adsorbed onto the nanoparticle and it was also necessary to include the free subunit density distribution (Eq. 1) when building the MSM.
Although a simple state definition yields accurate results for these models, a more detailed order parameter that describes the cluster structure will be required in other situations. To show that MSM construction is feasible even with our most general definition, we also generated MSMs using the graph coordinate (described in Appendix B). As shown in Fig. 6, the predicted assembly time distributions are identical to those predicted from the simpler coordinates.130
Figure 6.
Feasibility of the graph coordinate. The fraction complete for the patchy sphere model calculated from MSMs defined using the number of subunits adsorbed to the nanoparticle and either the largest cluster size (dashed line) or the graph coordinate (solid line) for indicated parameter values. Data from long, unbiased simulations is shown as points.
Testing convergence
To test that a system is Markovian on timescales corresponding to the lag time τ used to build the transition matrix, one typically checks that the implied timescales are nearly lag-time-independent for lag times equal to or exceeding τ (Fig. 5). For assembly systems, convergence can be more stringently tested by determining if the predicted assembly time distribution fc, which depends on all of the implied timescales and associated eigenvectors, becomes independent of lag time (Fig. 7).
Figure 7.
MSM convergence data for patchy spheres (ɛB = 10, εS = 6.6). (a) The scaled error (Eq. 7) of the mean completion time as a function of total simulation time tT for the MSM calculation and straightforward dynamics. (b) Convergence of the fraction complete fc as more trajectories are used to build the MSM. The mean completion time is about 100 times the length of the short simulations used for this parameter set. The MSM has converged after about 1000 short simulations, which corresponds to the total CPU time of 10 brute force simulations. Without the MSM, hundreds of brute force simulations are required to achieve comparable precision.
Simulation time
To assess the computational speed up afforded by MSMs for our systems, we calculated a scaled error Θ for the estimated mean assembly time $\langle t \rangle$:
| (7) |
Here tT is the total simulation time accrued during the short trajectories used to estimate the transition probability matrix; tT was varied by changing the number of short trajectories. We neglected computational overhead associated with initializing short trajectories and spectral decomposition of the transition matrix, as these factors were negligible. We also calculated Θ as a function of simulation time for straightforward dynamics, with tT varied by changing the number of trajectories used to estimate $\langle t \rangle$. As shown in Fig. 7, the MSM calculation converges with an order of magnitude less simulation time than the estimate based on straightforward dynamics. As expected, the magnitude of speedup depends on the separation of timescales, with greater speed up for large nucleation barriers and only limited speed up in growth-dominated regimes with dense timescale spectra. Importantly, the method is not hindered by multiple, large nucleation barriers such as those that tend to occur for low values of ɛB in these capsid assembly systems.62 The MSM calculation shown for the lowest polymer-subunit interaction in Fig. 4 (ɛcp = 3.0) demonstrates that efficient convergence can be achieved for parameters which are inaccessible to unbiased simulations; a typical unbiased simulation at these parameters would require 2 × 10⁴ CPU hours (over 2 CPU years).
Analysis using transition path theory
As a further test of the MSM approach and its ability to provide mechanistic insight, we applied Transition Path Theory (TPT)54 (see Sec. 2E) to MSMs for several patchy sphere parameter sets. First, we calculated committor probabilities as functions of the cluster size n and the number of adsorbed subunits ns. Committors calculated using the MSM (Fig. 8a) closely agree with those calculated from straightforward dynamical trajectories (Fig. 8b). Provided that the chosen collective variables are suitable reaction coordinates,110 states with committor probabilities near qi = 1/2 correspond to critical nuclei, or coordinates from which complete assembly or complete disassembly are equally probable. Figure 8 reveals that the ensemble of critical nuclei depends sensitively on parameter values. For instance, with moderate strengths of subunit-subunit and subunit-nanoparticle interactions (Fig. 8a) critical nuclei occur for a narrow range of adsorbed subunits ns ≈ 15 but include a broad range of n = 10 − 15 subunits in the cluster, indicating that subunit adsorption is the controlling degree of freedom, while the assembly state of the adsorbed subunits is fluctuating rapidly during nucleation. In contrast, for weak subunit-subunit and strong subunit-nanoparticle interactions (Fig. 8d), for which subunit adsorption is rapid but assembly requires a rare fluctuation to a large cluster size, the critical nuclei include a narrow range of n.
Figure 8.

Patchy sphere committor probabilities (Eq. 5). The top row compares committor probabilities calculated using (a) the MSM and (b) brute force dynamics for the same parameter set, while (c) and (d) show committor probabilities calculated using the MSM for two parameter sets for which assembly proceeds by different pathways. Parameter values are indicated on each plot.
To determine how parameter values influence assembly mechanisms, we used Eq. 6 to calculate the total forward flux through each Markov state (Fig. 9). The sequence of states with the largest flux connecting the initial and target configurations corresponds to the most probable assembly pathway. We see that for nucleation dominated parameters (ɛB = 10, εS = 6.6, Figs. 9a, 9c) on average 15 of the 20 subunits required to form a capsid have adsorbed onto the nanoparticle in a disordered manner before any significant assembly occurs, indicating that assembly proceeds according to the en masse mechanism defined in Refs. 72, 122, 131. For weaker subunit-nanoparticle interactions and stronger subunit-subunit interactions (εS = 4, ɛB = 13.5, Figs. 9b, 9d), the predominant assembly pathway changes to a nucleation and growth scenario in which the number of adsorbed subunits and the cluster size proceed in lockstep, because adsorption of unassembled subunits is only transient. The calculated assembly pathways closely agree with those determined from brute force dynamics.
Figure 9.

(a) and (b): The total forward flux (Eq. 6) through each state as calculated from the MSM for patchy spheres, with indicated parameter values. Note that values of the forward flux are colored according to a log scale. (c) and (d): Snapshots depicting the typical assembly pathways for the parameter sets in (a) and (b), respectively. The Roman numeral labels indicate points on the flux diagrams to which the snapshots correspond. Bonded subunits are colored red, unbonded subunits adsorbed to the nanoparticle are colored blue, and the nanoparticle is rendered translucent.
DISCUSSION AND CONCLUSIONS
In this paper, we propose a general approach to construct MSMs for assembly systems, in which undirected graphs categorize the interaction network of growing clusters. We also discuss several simplified descriptors of the assembly state. A Gaussian-based signature95 can be used to describe the density distribution of free subunits; however, this parameter can be neglected when subunit association is reaction-rate limited. We find that MSMs constructed from trajectories that are orders of magnitude shorter than the mean assembly time accurately describe dynamics on all assembly timescales for two different model systems. Analysis of the constructed MSMs using transition path theory54 reveals predominant assembly pathways and the transition state ensembles as functions of control parameter values. This information can be used to infer mechanisms and to identify key intermediates that could be targeted to effect changes in assembly yields or structures. Furthermore, the method scales trivially to many processors, which should enable simulation of larger assembling structures than is currently possible.
Limitations and possible extensions of the method
A key requirement of the MSM approach is determining a state decomposition for which the system is Markovian on a timescale short in comparison to the assembly time, which requires that the state decomposition distinguishes values of all slow degrees of freedom. Based on the success of our MSM calculation for two different models, we expect our approach to state decomposition to apply to a wide variety of assembly reactions. Our structure-based coordinates can be applied to any reaction in which subunit association and dissociation timescales are slow in comparison to cluster internal vibrations (i.e., fluctuations around preferred interaction distances and angles). The coordinates described in this paper do not explicitly describe the polymer degrees of freedom and thus require that these are fast in comparison to assemblage properties. However, since polymer segments can be indexed, polymer configurations can be categorized via the existing RMSD-based approaches.86, 87, 94 Other degrees of freedom which might potentially be included in MSM definitions include nucleic acid secondary and tertiary structure and capsid protein conformational states (e.g., Ref. 124).
Expected scaling of the method
The minimum time required for simulating assembly of a target structure with N subunits without enhanced sampling, in the absence of nucleation barriers, scales as ∼N². This assumes that the elongation time (defined in Sec. 4) scales ∼N (see Ref. 129) and that the average number of particles in each simulation (and thus the CPU time/step) scales ∼N. As shown in Appendix D, the approach described in this article achieves similar scaling even in the presence of nucleation barriers. Use of more sophisticated adaptive sampling can be expected to further reduce the required CPU time.47 In addition to overcoming slow relaxation timescales, a significant advantage of the MSM approach over unbiased dynamics is that the short trajectories used to estimate the transition rates are trivially parallelizable, and thus can be extended to many more processors than unbiased dynamics.
Similar estimates can be applied to scaling with other parameters. For example, while the CPU time for unbiased dynamics increases with decreasing total subunit concentration c0 according to with n* the critical nucleus size, the MSM approach will scale as the minimum assembly time, (Ref. 129). This scaling could be further improved upon using coordinates that consider positions of free subunits (e.g., w described in Sec. 2B) or methods that implicitly account for diffusive dynamics (e.g., Refs. 45, 132).
Application to non-templated assembly
The state definitions described here can be readily generalized to account for multiple clusters to describe multiple scaffolds or empty capsid assembly. However, the size of the state space will increase exponentially with the number of clusters. Thus the method is most practical when only a few clusters assemble at any given time in a spatially correlated region, so that tracking only the largest few clusters is sufficient for state decomposition. Also see Ref. 141 for further discussion on simulations of small assembly systems.133 This situation occurs in empty capsid assembly when nucleation barriers are sufficiently large that nucleation timescales are slow in comparison to growth times.128, 129 While this condition might seem limiting, it is precisely the regime in which enhanced sampling methods can be expected to provide significant speedup, since there is a significant separation of timescales. Furthermore, most experiments are performed under these conditions, with concentrations of intermediates often so low as to be undetectable, because assembly is most robust against kinetic traps when nucleation is rate limiting.4, 99, 128, 129 The MSM approach described here enables simulating assembly at these experimentally relevant parameter values.
ACKNOWLEDGMENTS
We thank Vijay Pande for enlightening discussions and for suggesting the development of a coordinate based on subunit densities. This work was supported by Award Number R01GM108021 from the National Institute of General Medical Sciences. We also acknowledge support by NSF-MRSEC-0820492. Computational resources were provided by the National Science Foundation through XSEDE computing resources (Open Science Grid and Trestles) and the Brandeis HPCC.
APPENDIX A: MODEL DETAILS
Patchy sphere model
In this model, adapted from Ref. 122, the minimum energy structure is a complete capsid of 20 subunits encapsulating the nanoparticle. Subunits have a spherical excluded volume with three attractive patches, or bond vectors, that are separated by 108° and rotate rigidly with the subunit. The attractive interaction between two complementary bond vectors on respective subunits i and j is maximized when (1) the distance between the attractors is minimized, (2) the angle between bond vectors is minimized, and (3) the dihedral angle calculated from two secondary bond vectors, which are not involved in the primary interaction, is minimized. The schematic of subunit interactions is shown in Fig. 1. Minimizing the dihedral angle creates an interaction that resists torsion and enforces angular specificity commensurate with a complete capsid. The potentials are given by Eq. A1 (Ref. 122):
| (A1) |
with a generalized truncated and shifted Lennard-Jones function:
| (A2) |
In Eq. A1, the index b sums over pairs of complementary bond vectors, Θ(x) is the Heaviside step function and Rij is the subunit center-to-center distance.
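For orientation, the following minimal sketch shows the standard 12-6 Lennard-Jones potential, truncated at a cutoff and shifted so that it vanishes there. This is an assumption made for illustration: the generalized function of Eq. A2 may differ in its exponents or arguments. The function name and default arguments are hypothetical; the default cutoff of 2.5σ matches the attractor cutoff rc quoted for this model.

```python
def lj_truncated_shifted(r, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Standard 12-6 Lennard-Jones energy, truncated and shifted at r_cut.

    Returns 0 for r >= r_cut; otherwise the bare potential minus its
    value at the cutoff, so the energy is continuous at r_cut.
    """
    def bare(x):
        s6 = (sigma / x) ** 6
        return 4.0 * epsilon * (s6 * s6 - s6)

    if r >= r_cut:
        return 0.0
    return bare(r) - bare(r_cut)


# Energy at the minimum of the bare potential, r = 2^(1/6) sigma.
u_min = lj_truncated_shifted(2.0 ** (1.0 / 6.0))
```

Because of the shift, the well depth is slightly smaller than ε (about 0.984ε for a cutoff of 2.5σ).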
The nanoparticle is modeled as a spherical excluded volume with a Lennard-Jones potential whose argument is shifted so that the minimum occurs when subunits are on the surface:
| (A3) |
with r as the nanoparticle-subunit center-to-center distance, RS as the radius of the nanoparticle, and εS as a tuneable subunit-nanoparticle interaction strength. This potential qualitatively represents subunit electrostatic interactions with functionalized gold nanoparticles.115, 116, 117, 118, 119, 120, 121
Units and parameter values
Lengths have units of σ, the subunit diameter, energies have units of kBT, and times have units of t0 = σ²/D, where D is the subunit diffusion constant. The box side length is 18σ, the nanoparticle radius is RS = 0.9σ, grand canonical particle exchanges occur at least 9σ from the nanoparticle surface, and the reservoir subunit concentration is c0 = 0.005σ⁻³ (defined as c0 = Nσ³/L³ with N the number of subunits). To qualitatively estimate the subunit concentration, following the approach described in Ref. 72, we set the diameter of a subunit to σ = 7 nm based on the satellite tobacco mosaic virus (STMV) structure. The reservoir concentration is then 25 μM. The attractor cutoff values are rc = 2.5σ, θc = 1 and ϕc = π.
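The conversion from the reduced concentration to molar units can be checked with a short calculation. The sketch below assumes only the values stated above (c0 = 0.005 in units of σ⁻³ and σ = 7 nm) together with the Avogadro constant; the function name is illustrative.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def reduced_to_micromolar(c_reduced, sigma_nm):
    """Convert a number density in units of sigma^-3 to micromolar."""
    sigma_dm = sigma_nm * 1e-8                # 1 nm = 1e-8 dm, and 1 dm^3 = 1 L
    particles_per_litre = c_reduced / sigma_dm ** 3
    return particles_per_litre / AVOGADRO * 1e6


c_uM = reduced_to_micromolar(0.005, 7.0)
```

With these inputs the result is approximately 24 μM, consistent with the rounded value of 25 μM quoted above.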
Triangles model
In this model, taken from Ref. 72, the minimum energy structure is a complete capsid of 20 subunits assembled around a bead-spring polymer. The truncated pyramidal subunits are composed of two layers of excluders, which enforce excluded volume interactions, and two layers of capsomer attractors on the edges, which mimic hydrophobic and electrostatic attractions (see Fig. 10). The attractive interaction between subunits is maximized when the capsomer attractors are perfectly overlapping. In this situation, excluders on either side of the interface are separated by the cutoff distance of their potential.
Figure 10.
Triangles model subunit geometry with distances marked. Excluders are gray, subunit attractors are teal, and the black circles depict the overlapping arrangement of excluders (polymer attractors not shown). Note that the position of the kth attractor on subunit i is given by , and the position of the lth excluder on subunit j is given by (as used in Eqs. A4, A5).
The polymeric cargo is a freely jointed chain of spherical monomers, which have an excluded volume that represents steric interactions and screened electrostatic repulsions. The model represents the polymer in good solvent, which behaves as a self-avoiding random walk with radius of gyration ; Np is the number of monomers and σb is the monomer diameter. The polymer has short-range interactions with the polymer attractors on the bottom of each subunit, which represent screened electrostatic interactions. All interactions in this model can be decomposed into pairwise potentials.
Capsomer-capsomer interactions
Capsomer subunit interactions decompose into pairwise interactions between their constituent building blocks: the excluders and attractors. Excluders in a subunit i experience a truncated Lennard-Jones-like potential when within a cutoff radius rc of excluders in subunit j ≠ i (i.e., subunit excluders repel each other when overlapping). Attractors in subunit i experience an attractive Lennard-Jones-like potential with commensurate attractors from another subunit j ≠ i. Commensurate attractors are defined to be the attractors which would overlap in a complete capsid:72
| (A4) |
where the first sum is over all pairs of excluders between subunits i and j and the second sum is over all commensurate attractor pairs between subunits i and j, with ɛcc as an adjustable parameter that tunes the attraction strength. The coordinates of the kth excluder/attractor are given by and is given by Eq. A2.
Capsomer-polymer interactions
The capsomer-polymer attraction is similar to the capsomer-capsomer interaction, with an attractive potential that is minimized when polymer beads overlap subunit attractors and a repulsive potential to account for excluded volume. For a capsid subunit i and polymer subunit j, the potential is
| (A5) |
with the first sum over all subunit excluders and the second sum over all polymer attractors on the subunit. ɛcp parameterizes the attraction strength for each attractor. σp is the diameter of a polymer bead and is set to 0.4σb. ξk is a factor that decreases the interaction strength for the outmost three polymer attractors on the subunit () to compensate for these sites overlapping in a complete capsid. ξk is set to one for all other attractors.
Polymer-polymer interactions
The polymer potential includes a “bonded” interaction between monomers that are nearest neighbors along the chain and a nonbonded, excluded volume interaction with all other polymer beads. For bead coordinates and (i ≠ j)
| (A6) |
Units and parameter values
The capsomer subunits have anisotropic translational and rotational diffusion constants calculated using Hydrosub7.C.72, 134 Lengths are scaled by σb and times are scaled by t0, which is the Brownian time for a sphere with diameter σb. The box side length is 40σb, grand canonical moves are performed at least 10σb from the center of the polymer, the reservoir concentration is , and the polymer length is 200. To qualitatively estimate the subunit concentration, we set the pseudoatom diameter to σb = 2.36 nm based on the STMV structure (see Ref. 72), which results in a reservoir concentration of 59 μM. The attractor and polymer bead diameters are σa = 0.2σb and σp = 0.4σb, respectively.
APPENDIX B: COLLECTIVE COORDINATES
Graph isomorphism
Here we describe a general approach to state decomposition for self-assembly reactions, which represents the bonding network of a growing cluster as an undirected graph (Fig. 11), with subunits as the nodes and strong interactions, which we denote as “bonds,” as the edges. Structures which differ only in the indices that label their subunits or in fluctuations of bond geometries have the same graphs; to account for subunit index changes, isomorphic graphs are assigned to the same state. We only applied this order parameter to class I subunits (i.e., the largest growing cluster) as described in Sec. 2B, but it can be generalized to class II subunits by considering subunit interactions with the scaffold. Isomorphic graphs were identified using the algorithm in the Boost C++ libraries.135, 136 While identification was fast for all cases we considered, faster algorithms are available (e.g., Refs. 137, 138), with further improvements possible in the case where all graphs are planar. The search process can be trivially parallelized by initially separating structures based on quantities such as number of subunits, bonds, and cycles.
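The state assignment described above can be sketched as follows. This is an illustrative brute-force version that is practical only for very small graphs; it is not the Boost algorithm used for the results in this article. Graphs are assumed to be simple (no repeated edges) with nodes labeled 0 to n − 1, and all function names are hypothetical. Cheap invariants (numbers of nodes and edges and the sorted degree sequence) are compared first, mirroring the pre-separation of structures mentioned above.

```python
from itertools import permutations


def invariants(n, edges):
    """Cheap isomorphism invariants: node count, edge count, degree sequence."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return (n, len(edges), tuple(sorted(degree)))


def isomorphic(n1, edges1, n2, edges2):
    """Brute-force isomorphism test for small simple undirected graphs."""
    if invariants(n1, edges1) != invariants(n2, edges2):
        return False
    target = {frozenset(e) for e in edges2}
    for perm in permutations(range(n1)):
        # Edge counts already match, so mapping every edge of graph 1
        # onto an edge of graph 2 is sufficient.
        if all(frozenset((perm[a], perm[b])) in target for a, b in edges1):
            return True
    return False


def assign_state(n, edges, known):
    """Return the index of the matching known graph, adding a new state if none."""
    for index, (m, other_edges) in enumerate(known):
        if isomorphic(n, edges, m, other_edges):
            return index
    known.append((n, list(edges)))
    return len(known) - 1


known_states = []
path = assign_state(3, [(0, 1), (1, 2)], known_states)        # new state 0
relabeled = assign_state(3, [(1, 0), (0, 2)], known_states)   # same graph, relabeled
triangle = assign_state(3, [(0, 1), (1, 2), (0, 2)], known_states)  # new state 1
```

The two three-subunit chains differ only in subunit labels and therefore map to the same state, whereas the closed triangle is a distinct state.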
Alternative structure-based order parameters
Through a priori knowledge of the system, it is possible to devise state decompositions which result in fewer states. For example, subunits attached to a cluster by only a single interaction are transient under typical assembly conditions.62 Graphs can therefore be simplified by pruning nodes which are connected to the rest of the graph by only a single edge. Further simplifications can be made by only recording features of the graph. For systems in which single bonds are relatively unstable, clusters tend to grow by sequential completion of polygons.73 One can then record the set of all cycles (which correspond to complete polygons) in the graph, or one can account only for the number of subunits n and number of bonds nb within a growing cluster. These two quantities can be combined into a single coordinate as
| (B1) |
with a parameter a that prevents degeneracy among typical structures by separating structures with different numbers of complete polygons. We used a = 5 or a = 10.
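The simplifications described above can be illustrated with a short sketch. It assumes a single connected cluster given as a list of bonds, prunes singly bonded subunits in one pass (repeated pruning is an alternative that this sketch does not implement), and reports the number of subunits n, the number of bonds nb, and the number of independent cycles nb − n + 1. The cycle count is used here only as a simple proxy for the number of complete polygons, and the function names are hypothetical. The combined coordinate of Eq. B1 is not reproduced.

```python
from collections import Counter


def prune_singly_bonded(edges):
    """Remove subunits attached to the cluster by a single bond (one pass)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    dangling = {node for node, d in degree.items() if d == 1}
    return [(a, b) for a, b in edges if a not in dangling and b not in dangling]


def cluster_features(edges):
    """Return (n, nb, independent cycles) for one connected cluster."""
    nodes = {node for edge in edges for node in edge}
    n, nb = len(nodes), len(edges)
    independent_cycles = nb - n + 1 if n else 0
    return n, nb, independent_cycles


# A trimer ring with one extra subunit attached by a single bond.
bonds = [(0, 1), (1, 2), (0, 2), (2, 3)]
pruned = prune_singly_bonded(bonds)
features = cluster_features(pruned)
```

After pruning, the example reduces to the three-subunit ring with n = 3, nb = 3, and one cycle.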
APPENDIX C: SIMULATIONS AND PROTOCOL FOR ESTIMATING THE TRANSITION MATRIX
This section describes the ratcheting procedure that we used to estimate the elements of the transition matrix. We began by performing ns simulations (with ns = 50 − 100), each started from initial configurations in which subunits had random positions and orientations (except that subunits were not allowed to overlap). Each simulation was run for a time ts, and snapshots were saved regularly to a database. Saved configurations were then classified into states. This classification could be based on any of the state decomposition approaches described in this article; we used the coordinate defined in Eq. B1. A second iteration was then started, in which ns new simulations of length ts were performed with snapshots regularly saved to the database. To efficiently estimate important elements of the transition matrix, initial configurations were preferentially chosen from states from which fewer simulations had already been initialized. Iterations were then repeated until the MSM converged (Fig. 7). This procedure is useful when no a priori knowledge is available about the system, since it both focuses sampling on poorly sampled states and identifies new states that arise through the natural dynamics of the system. However, once enough states have been gathered to construct an MSM, more sophisticated adaptive sampling is possible.47
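The preferential choice of starting states can be sketched as follows. The text above specifies only that states with fewer previously initialized simulations are favored; the particular weighting used here, 1/(1 + number of launches), and all names are assumptions made for illustration.

```python
import random


def choose_seed_states(launch_counts, n_new, rng):
    """Pick n_new starting states, favoring states with fewer prior launches.

    launch_counts maps a state label to the number of simulations already
    started from it; the counts are updated in place after sampling.
    """
    states = sorted(launch_counts)
    weights = [1.0 / (1 + launch_counts[s]) for s in states]  # assumed weighting
    picks = rng.choices(states, weights=weights, k=n_new)
    for state in picks:
        launch_counts[state] += 1
    return picks


# Hypothetical bookkeeping: state 0 is heavily sampled, state 1 is newly found.
counts = {0: 1000, 1: 0}
seeds = choose_seed_states(counts, 200, random.Random(0))
```

In each iteration one would then draw a saved configuration belonging to each selected state and start a new simulation of length ts from it.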
Optimal values of the parameters ns and ts depend on both the system being simulated and the available computational resources and thus need to be chosen through trial and error. The simulation time ts must be longer than the lag time τ, but should not be too long in order to efficiently ratchet the system over free energy barriers. We found that a ts of around 20τ − 100τ usually worked well. The total number of simulations required to generate a converged MSM varied depending on the parameters, but we found that about 1000 simulations were sufficient for most parameter sets. We used a modified version of MSMBuilder86 to construct the transition matrix and to calculate its eigenvalues and eigenvectors. For most parameter sets, it was sufficient to calculate only the top 20 eigenvectors.
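For readers unfamiliar with the estimation step, the sketch below shows the simplest way to obtain transition probabilities from classified trajectories: transitions are counted at a lag expressed in saved frames, using a sliding window, and each row of the count matrix is normalized. This is an assumed baseline estimator with hypothetical function names, not the modified MSMBuilder procedure used for the results in this article.

```python
def count_transitions(trajectories, lag, n_states):
    """Count transitions i -> j separated by `lag` frames (sliding window)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for t in range(len(traj) - lag):
            counts[traj[t]][traj[t + lag]] += 1
    return counts


def row_normalize(counts):
    """Convert a count matrix to a row-stochastic transition matrix."""
    n = len(counts)
    T = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No outgoing data: keep the row stochastic with a self-loop.
            T.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            T.append([c / total for c in row])
    return T


# One short trajectory of state labels (hypothetical data).
C = count_transitions([[0, 0, 1, 1, 0]], lag=1, n_states=2)
T = row_normalize(C)
```

Trajectories shorter than or equal to the lag contribute no counts, which is one reason ts must exceed the lag time τ.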
For the models that we consider here, dissociation of a subunit from a complete capsid occurs only on timescales long compared to the assembly time for most parameter sets. Thus, well-formed target states are effectively absorbing states on simulated timescales (see Refs. 4, 129, 139 for discussion); however, on rare occasions, capsids would form in strained configurations and eventually dissociate. When building the transition matrix, it was important to acquire sufficient statistics from short trajectories in the target state to balance the rare dissociations from strained target states. This need could be avoided by either more stringently defining the target state or by manually defining the target state as an absorbing state during transition matrix construction. Note that if one were interested in capturing the much longer time scale associated with dissociation of a well-formed target structure, additional coordinates describing fluctuations of the subunit-subunit bonds would be required to efficiently sample this transition.
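The second option mentioned above, manually defining the target state as absorbing, amounts to replacing the corresponding rows of the transition matrix. The following minimal sketch assumes a row-stochastic matrix stored as a list of lists; the function name and the example matrix are hypothetical.

```python
def make_absorbing(T, target_states):
    """Return a copy of T in which each target state only transitions to itself."""
    n = len(T)
    absorbing = []
    for i, row in enumerate(T):
        if i in target_states:
            absorbing.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            absorbing.append(list(row))
    return absorbing


# Hypothetical three-state matrix in which the target (state 2) rarely dissociates.
T = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.0, 0.01, 0.99]]
T_abs = make_absorbing(T, {2})
```

Rows belonging to all other states are left unchanged, so the assembly dynamics leading into the target are unaffected.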
APPENDIX D: EXPECTED SCALING OF THE METHOD WITH TARGET STRUCTURE SIZE
In this appendix, we consider scaling of the specific approaches described in this article with target structure size N. However, we note that the size of the state space and the number of simulations used to estimate transition rates can be considerably reduced by lumping and adaptive sampling.47
The total CPU time required depends on the number of transition rates to be estimated and the number of interacting particles in the short simulations used for estimation. We begin with the most general state decomposition, based on the graphs, which leads to the largest state space. While the number of possible graphs for a given model is exponential in the number of subunits N in the target structure, productive assembly reactions only visit a relatively small subset of possible structures.73, 140 Since the MSM state space includes only those states which are visited during estimation of the transition rate matrix, we expect only polynomial scaling. Although additional degrees of freedom, such as the number of subunits interacting with the scaffold (ns), increase the size of the state space, only a subset of ns are encountered for most values of the structural coordinate. Finally, since most transitions involve association or dissociation of one or a few subunits, the transition matrix is sparse, with the number of nonzero elements roughly proportional to its linear size.
To approximately quantify these scaling estimates, we built MSMs for patchy sphere models with sizes N = 12, 20, 30, 60. This analysis was complicated by the fact that the number of structures visited depends sensitively on the parameters (more aggressive assembly reactions visit more structures) and the details of how ratcheting is performed (longer or poorly designed ratcheting protocols find more statistically irrelevant structures). To minimize these variations as much as possible, we used parameter sets that led to similar assembly pathways at each size, and we built MSMs from long unbiased simulations. The numbers of graph structures and the total number of non-zero transition matrix elements for each model size N are shown in Fig. 12. For comparison, and to illustrate the effectiveness of other approaches, the data for MSMs built using ns and n are also plotted. The number of states scales between N and N², whereas the number of unique transitions scales roughly linearly with N. It is worth noting that by eliminating transition rate elements with 5 or fewer recorded transitions, the total number of transition elements in the graph estimate was reduced by a factor of about 20 without affecting the accuracy of the MSM. A comparable reduction was achieved for the simplified state decomposition.
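The elimination of poorly sampled transition elements can be sketched as a simple threshold on the recorded counts. The sparse dictionary representation, the function name, and the example counts below are illustrative assumptions; diagonal elements are treated in the same way as off-diagonal ones in this sketch.

```python
def drop_rare_transitions(counts, max_dropped=5):
    """Keep only transitions observed more than max_dropped times.

    counts maps a pair (i, j) to the number of recorded i -> j transitions.
    """
    return {pair: c for pair, c in counts.items() if c > max_dropped}


# Hypothetical sparse count data.
observed = {(0, 0): 50, (0, 1): 3, (1, 2): 6, (2, 1): 5}
kept = drop_rare_transitions(observed)
```

The remaining counts would then be normalized row by row to obtain the transition probabilities.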
Figure 12.
(a) The number of states as a function of completed capsid size N. (We have included the trivial point N = 1 for which there are two structures.) (b) The number of unique transitions between states as a function of N. Both plots are from MSMs built using either (ns, n) or (ns, graphIso). ns is the number of subunits adsorbed to the nanoparticle, n is the largest cluster size, and graphIso is the graph isomorphism. The model for a size 20 capsid is described in Sec. 3; sizes 30 and 60 were developed in Refs. 62, 122 (respectively). Size 12 is the same as the size 20 model, except with modified bond vectors for pentamer subunits to make a dodecahedron.
To estimate the total CPU time required to build an MSM, we note that the average number of particles in each simulation will scale ∼N. Thus, based on Fig. 12, the total CPU time required will scale as ∼N².
References
- Caspar D. L. D. and Klug A., Cold Spring Harbor Symp. Quant. Biol. 27, 1 (1962). doi:10.1101/SQB.1962.027.001.005
- Zlotnick A. and Mukhopadhyay S., Trends Microbiol. 19, 14 (2011). doi:10.1016/j.tim.2010.11.003
- Speir J. A. and Johnson J. E., Curr. Opin. Struct. Biol. 22, 65 (2012). doi:10.1016/j.sbi.2011.11.002
- Hagan M., Adv. Chem. Phys. 155, 1 (2014).
- Yang Y., Meyer R., and Hagan M. F., Phys. Rev. Lett. 104, 258102 (2010). doi:10.1103/PhysRevLett.104.258102
- Whitelam S., Phys. Rev. Lett. 105, 088102 (2010). doi:10.1103/PhysRevLett.105.088102
- Sacanna S., Irvine W. T. M., Chaikin P. M., and Pine D. J., Nature (London) 464, 575 (2010). doi:10.1038/nature08906
- Duguet E., Desert A., Perro A., and Ravaine S., Chem. Soc. Rev. 40, 941 (2011). doi:10.1039/c0cs00048e
- Yang Y., Gao L., Lopez G. P., and Yellen B. B., ACS Nano 7, 2705 (2013). doi:10.1021/nn400118e
- Whitelam S., Tamblyn I., Haxton T. K., Wieland M. B., Champness N. R., Garrahan J. P., and Beton P. H., Phys. Rev. X 4, 011044 (2014). doi:10.1103/PhysRevX.4.011044
- Wang Y., Wang Y., Breed D. R., Manoharan V. N., Feng L., Hollingsworth A. D., Weck M., and Pine D. J., Nature (London) 491, 51 (2012). doi:10.1038/nature11564
- Wang Y., Hollingsworth A. D., Yang S. K., Patel S., Pine D. J., and Weck M., J. Am. Chem. Soc. 135, 14064 (2013). doi:10.1021/ja4075979
- Walther A. and Mueller A. H. E., Chem. Rev. 113, 5194 (2013). doi:10.1021/cr300089t
- van Anders G., Ahmed N. K., Smith R., Engel M., and Glotzer S. C., ACS Nano 8, 931 (2014). doi:10.1021/nn4057353
- Groeschel A. H., Walther A., Loebling T. I., Schacher F. H., Schmalz H., and Mueller A. H. E., Nature (London) 503, 247 (2013). doi:10.1038/nature12610
- Rothemund P., Nature (London) 440, 297 (2006). doi:10.1038/nature04586
- Sacca B. and Niemeyer C. M., Angew. Chem., Int. Ed. 51, 58 (2012). doi:10.1002/anie.201105846
- Torring T., Voigt N. V., Nangreave J., Yan H., and Gothelf K. V., Chem. Soc. Rev. 40, 5636 (2011). doi:10.1039/c1cs15057j
- Pan A. C., Sezer D., and Roux B., J. Phys. Chem. B 112, 3432 (2008). doi:10.1021/jp0777059
- Ovchinnikov V., Karplus M., and Vanden-Eijnden E., J. Chem. Phys. 134, 085103 (2011). doi:10.1063/1.3544209
- Bolhuis P. G., Chandler D., Dellago C., and Geissler P. L., Annu. Rev. Phys. Chem. 53, 291 (2002). doi:10.1146/annurev.physchem.53.082301.113146
- Fischer S., Olsen K. W., Nam K., and Karplus M., Proc. Natl. Acad. Sci. U.S.A. 108, 5608 (2011). doi:10.1073/pnas.1011995108
- Elber R., Biophys. J. 92, L85 (2007). doi:10.1529/biophysj.106.101899
- Pietrucci F., Marinelli F., Carloni P., and Laio A., J. Am. Chem. Soc. 131, 11811 (2009). doi:10.1021/ja903045y
- Lei M., Zavodszky M. I., Kuhn L. A., and Thorpe M. F., J. Comput. Chem. 25, 1133 (2004). doi:10.1002/jcc.20041
- Moroni D., Bolhuis P. G., and van Erp T. S., J. Chem. Phys. 120, 4055 (2004). doi:10.1063/1.1644537
- Dickson A. and Dinner A. R., Annu. Rev. Phys. Chem. 61, 441 (2010). doi:10.1146/annurev.physchem.012809.103433
- Allen R. J., Warren P. B., and ten Wolde P. R., Phys. Rev. Lett. 94, 018104 (2005). doi:10.1103/PhysRevLett.94.018104
- Pfaendtner J., Branduardi D., Parrinello M., Pollard T. D., and Voth G. A., Proc. Natl. Acad. Sci. U.S.A. 106, 12723 (2009). doi:10.1073/pnas.0902092106
- Barducci A., Bonomi M., and Parrinello M., Biophys. J. 98, L44 (2010). doi:10.1016/j.bpj.2010.01.033
- Zhang B. W., Jasnow D., and Zuckerman D. M., Proc. Natl. Acad. Sci. U.S.A. 104, 18043 (2007). doi:10.1073/pnas.0706349104
- Huber G. A. and Kim S., Biophys. J. 70, 97 (1996). doi:10.1016/S0006-3495(96)79552-8
- Ferguson A. L., Panagiotopoulos A. Z., Kevrekidis I. G., and Debenedetti P. G., Chem. Phys. Lett. 509, 1 (2011). doi:10.1016/j.cplett.2011.04.066
- van Erp T. S., “Dynamical rare event simulation techniques for equilibrium and nonequilibrium systems,” in Kinetics and Thermodynamics of Multistep Nucleation and Self-Assembly in Nanoscale Materials, Advances in Chemical Physics Vol. 151, edited by Nicolis G. and Maes D. (Wiley & Sons, 2012), pp. 27–60. doi:10.1002/9781118309513.ch2
- Becker N. B., Allen R. J., and ten Wolde P. R., J. Chem. Phys. 136, 174118 (2012). doi:10.1063/1.4704810
- Jamalyaria F., Rohlfs R., and Schwartz R., J. Comput. Phys. 204, 100 (2005). doi:10.1016/j.jcp.2004.10.004
- Keef T., Micheletti C., and Twarock R., J. Theor. Biol. 242, 713 (2006). doi:10.1016/j.jtbi.2006.04.023
- Hemberg M., Yaliraki S., and Barahona M., Biophys. J. 90, 3029 (2006). doi:10.1529/biophysj.105.076737
- Dykeman E. C., Stockley P. G., and Twarock R., Phys. Rev. E 87, 022717 (2013). doi:10.1103/PhysRevE.87.022717
- Sweeney B., Zhang T., and Schwartz R., Biophys. J. 94, 772 (2008). doi:10.1529/biophysj.107.107284
- Zhang T. Q. and Schwartz R., Biophys. J. 90, 57 (2006). doi:10.1529/biophysj.105.072207
- Misra N., Lees D., Zhang T. Q., and Schwartz R., Comput. Math. Method Med. 9, 277 (2008). doi:10.1080/17486700802168379
- Kumar M. S. and Schwartz R., Phys. Biol. 7, 045005 (2010). doi:10.1088/1478-3975/7/4/045005
- Xie L., Smith G., Feng X., and Schwartz R., Biophys. J. 103, 1545 (2012). doi:10.1016/j.bpj.2012.08.057
- Smith G. R., Xie L., Lee B., and Schwartz R., Biophys. J. 106, 310 (2014). doi:10.1016/j.bpj.2013.11.022
- Bowman G. R., Voelz V. A., and Pande V. S., J. Am. Chem. Soc. 133, 664 (2011). doi:10.1021/ja106936n
- Bowman G. R., Ensign D. L., and Pande V. S., J. Chem. Theory Comput. 6, 787 (2010). doi:10.1021/ct900620b
- Bowman G. R., Beauchamp K. A., Boxer G., and Pande V. S., J. Chem. Phys. 131, 124101 (2009). doi:10.1063/1.3216567
- Swope W. C., Pitera J. W., Suits F., Pitman M., Eleftheriou M., Fitch B. G., Germain R. S., Rayshubski A., Ward T. J. C., Zhestkov Y., and Zhou R., J. Phys. Chem. B 108, 6582 (2004). 10.1021/jp037422q [DOI] [Google Scholar]
- Swope W. C., Pitera J. W., and Suits F., J. Phys. Chem. B 108, 6571 (2004). 10.1021/jp037421y [DOI] [Google Scholar]
- Prinz J.-H., Chodera J. D., Pande V. S., Swope W. C., Smith J. C., and Noe F., J. Chem. Phys. 134, 244108 (2011). 10.1063/1.3592153 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Park S., Ensign D. L., and Pande V. S., Phys. Rev. E 74, 066703 (2006). 10.1103/PhysRevE.74.066703 [DOI] [PubMed] [Google Scholar]
- Pande V. S., Beauchamp K., and Bowman G. R., Methods 52, 99 (2010). 10.1016/j.ymeth.2010.06.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Noe F., Schutte C., Vanden-Eijnden E., Reich L., and Weikl T. R., Proc. Natl. Acad. Sci. U.S.A. 106, 19011 (2009). 10.1073/pnas.0905466106 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lane T. J., Bowman G. R., Beauchamp K., Voelz V. A., and Pande V. S., J. Am. Chem. Soc. 133, 18413 (2011). 10.1021/ja207470h [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jayachandran G., Vishal V., and Pande V. S., J. Chem. Phys. 124, 164902 (2006). 10.1063/1.2186317 [DOI] [PubMed] [Google Scholar]
- Chodera J. D., Swope W. C., Noe F., Prinz J.-H., Shirts M. R., and Pande V. S., J. Chem. Phys. 134, 244107 (2011). 10.1063/1.3592152 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hinrichs N. S. and Pande V. S., J. Chem. Phys. 126, 244101 (2007). 10.1063/1.2740261 [DOI] [PubMed] [Google Scholar]
- Chodera J. D., Singhal N., Pande V. S., Dill K. A., and Swope W. C., J. Chem. Phys. 126, 155101 (2007). 10.1063/1.2714538 [DOI] [PubMed] [Google Scholar]
- Deuflhard P. and Weber M., Linear Algebra Its Appl. 398, 161 (2005). 10.1016/j.laa.2004.10.026 [DOI] [Google Scholar]
- Schwartz R., Shor P. W., Prevelige P. E., and Berger B., Biophys. J. 75, 2626 (1998). doi:10.1016/S0006-3495(98)77708-2
- Hagan M. F. and Chandler D., Biophys. J. 91, 42 (2006). doi:10.1529/biophysj.105.076851
- Nguyen H. D., Reddy V. S., and Brooks C. L., Nano Lett. 7, 338 (2007). doi:10.1021/nl062449h
- Wilber A. W., Doye J. P. K., Louis A. A., Noya E. G., Miller M. A., and Wong P., J. Chem. Phys. 127, 085106 (2007). doi:10.1063/1.2759922
- Nguyen H. and Brooks C., Nano Lett. 8, 4574 (2008). doi:10.1021/nl802828v
- Nguyen H. D., Reddy V. S., and Brooks C. L., J. Am. Chem. Soc. 131, 2606 (2009). doi:10.1021/ja807730x
- Johnston I. G., Louis A. A., and Doye J. P. K., J. Phys.: Condens. Matter 22, 104101 (2010). doi:10.1088/0953-8984/22/10/104101
- Wilber A. W., Doye J. P. K., Louis A. A., and Lewis A. C. F., J. Chem. Phys. 131, 175102 (2009). doi:10.1063/1.3243581
- Rapaport D. C., Johnson J. E., and Skolnick J., Comput. Phys. Commun. 121–122, 231 (1999). doi:10.1016/S0010-4655(99)00319-7
- Rapaport D., Phys. Rev. E 70, 051905 (2004). doi:10.1103/PhysRevE.70.051905
- Rapaport D., Phys. Rev. Lett. 101, 186101 (2008). doi:10.1103/PhysRevLett.101.186101
- Elrad O. and Hagan M. F., Phys. Biol. 7, 045003 (2010). doi:10.1088/1478-3975/7/4/045003
- Hagan M. F., Elrad O. M., and Jack R. L., J. Chem. Phys. 135, 104115 (2011). doi:10.1063/1.3635775
- Mahalik J. P. and Muthukumar M., J. Chem. Phys. 136, 135101 (2012). doi:10.1063/1.3698408
- Perlmutter J. D., Qiao C., and Hagan M. F., eLife 2, e00632 (2013). doi:10.7554/eLife.00632
- Rapaport D. C., Phys. Rev. E 86, 051917 (2012). doi:10.1103/PhysRevE.86.051917
- Hicks S., “Statistical mechanical models of virus capsid assembly,” Ph.D. thesis (Cornell University, 2010).
- Sriraman S., Kevrekidis I. G., and Hummer G., J. Phys. Chem. B 109, 6479 (2005). doi:10.1021/jp046448u
- Chodera J. D., Swope W. C., Pitera J. W., and Dill K. A., Multiscale Model. Simul. 5, 1214 (2006). doi:10.1137/06065146X
- Sorin E. J. and Pande V. S., Biophys. J. 88, 2472 (2005). doi:10.1529/biophysj.104.051938
- Elmer S. P., Park S., and Pande V. S., J. Chem. Phys. 123, 114903 (2005). doi:10.1063/1.2008230
- Andrec M., Felts A. K., Gallicchio E., and Levy R. M., Proc. Natl. Acad. Sci. U.S.A. 102, 6801 (2005). doi:10.1073/pnas.0408970102
- de Groot B. L., Daura X., Mark A. E., and Grubmüller H., J. Mol. Biol. 309, 299 (2001). doi:10.1006/jmbi.2001.4655
- Karpen M. E., Tobias D. J., and Brooks C. L., Biochemistry 32, 412 (1993). doi:10.1021/bi00053a005
- Singhal N., Snow C. D., and Pande V. S., J. Chem. Phys. 121, 415 (2004). doi:10.1063/1.1738647
- Bowman G. R., Huang X., and Pande V. S., Methods 49, 197 (2009). doi:10.1016/j.ymeth.2009.04.013
- Beauchamp K. A., Bowman G. R., Lane T. J., Maibaum L., Haque I. S., and Pande V. S., J. Chem. Theory Comput. 7, 3412 (2011). doi:10.1021/ct200463m
- Prinz J.-H., Wu H., Sarich M., Keller B., Senne M., Held M., Chodera J. D., Schütte C., and Noé F., J. Chem. Phys. 134, 174105 (2011). doi:10.1063/1.3565032
- Beauchamp K. A., McGibbon R., Lin Y.-S., and Pande V. S., Proc. Natl. Acad. Sci. U.S.A. 109, 17807 (2012). doi:10.1073/pnas.1201810109
- Voelz V. A., Bowman G. R., Beauchamp K., and Pande V. S., J. Am. Chem. Soc. 132, 1526 (2010). doi:10.1021/ja9090353
- Bowman G. R. and Geissler P. L., Proc. Natl. Acad. Sci. U.S.A. 109, 11681 (2012). doi:10.1073/pnas.1209309109
- Perez-Hernandez G., Paul F., Giorgino T., De Fabritiis G., and Noé F., J. Chem. Phys. 139, 015102 (2013). doi:10.1063/1.4811489
- Schwantes C. R. and Pande V. S., J. Chem. Theory Comput. 9, 2000 (2013). doi:10.1021/ct300878a
- Senne M., Trendelkamp-Schroer B., Mey A. S., Schütte C., and Noé F., J. Chem. Theory Comput. 8, 2223 (2012). doi:10.1021/ct300274u
- Gu C., Huang-Wei C., Maibaum L., Pande V. S., Carlsson G. E., and Guibas L. J., BMC Bioinf. 14, 1 (2013). doi:10.1186/1471-2105-14-1
- Grant J., Jack R. L., and Whitelam S., J. Chem. Phys. 135, 214505 (2011). doi:10.1063/1.3662140
- Zlotnick A., Virology 315, 269 (2003). doi:10.1016/S0042-6822(03)00586-5
- Prevelige P. E., Thomas D., and King J., Biophys. J. 64, 824 (1993). doi:10.1016/S0006-3495(93)81443-7
- Zlotnick A., Johnson J. M., Wingfield P. W., Stahl S. J., and Endres D., Biochemistry 38, 14644 (1999). doi:10.1021/bi991611a
- Zlotnick A., Aldrich R., Johnson J. M., Ceres P., and Young M. J., Virology 277, 450 (2000). doi:10.1006/viro.2000.0619
- Casini G. L., Graham D., Heine D., Garcea R. L., and Wu D. T., Virology 325, 320 (2004). doi:10.1016/j.virol.2004.04.034
- Chen C., Kao C. C., and Dragnea B., J. Phys. Chem. A 112, 9405 (2008). doi:10.1021/jp802498z
- Berthet-Colominas C., Cuillel M., Koch M. H. J., Vachette P., and Jacrot B., Eur. Biophys. J. 15, 159 (1987). doi:10.1007/BF00263680
- Kler S., Asor R., Li C., Ginsburg A., Harries D., Oppenheim A., Zlotnick A., and Raviv U., J. Am. Chem. Soc. 134, 8823 (2012). doi:10.1021/ja2110703
- Tsiang M., Niedziela-Majka A., Hung M., Jin D., Hu E., Yant S., Samuel D., Liu X., and Sakowicz R., Biochemistry 51, 4416 (2012). doi:10.1021/bi300052h
- E W. and Vanden-Eijnden E., Annu. Rev. Phys. Chem. 61, 391 (2010). doi:10.1146/annurev.physchem.040808.090412
- Metzner P., Schütte C., and Vanden-Eijnden E., J. Chem. Phys. 125, 084110 (2006). doi:10.1063/1.2335447
- Noé F. and Fischer S., Curr. Opin. Struct. Biol. 18, 154 (2008). doi:10.1016/j.sbi.2008.01.008
- Metzner P., Schütte C., and Vanden-Eijnden E., Multiscale Model. Simul. 7, 1192 (2009). doi:10.1137/070699500
- Dellago C., Bolhuis P. G., and Geissler P. L., Adv. Chem. Phys. 123, 1 (2002). doi:10.1002/0471231509.ch1
- Prinz J.-H., Held M., Smith J. C., and Noé F., Multiscale Model. Simul. 9, 545 (2011). doi:10.1137/100789191
- Levandovsky A. and Zandi R., Phys. Rev. Lett. 102, 198102 (2009). doi:10.1103/PhysRevLett.102.198102
- Hicks S. D. and Henley C. L., Phys. Rev. E 74, 031912 (2006). doi:10.1103/PhysRevE.74.031912
- Wales D. J., Philos. Trans. R. Soc. A 363, 357 (2005). doi:10.1098/rsta.2004.1497
- Tsvetkova I., Chen C., Rana S., Kao C. C., Rotello V. M., and Dragnea B., Soft Matter 8, 4571 (2012). doi:10.1039/c2sm00024e
- Sun J., DuFort C., Daniel M. C., Murali A., Chen C., Gopinath K., Stein B., De M., Rotello V. M., Holzenburg A., Kao C. C., and Dragnea B., Proc. Natl. Acad. Sci. U.S.A. 104, 1354 (2007). doi:10.1073/pnas.0610542104
- Huang X., Bronstein L. M., Retrum J., Dufort C., Tsvetkova I., Aniagyei S., Stein B., Stucky G., McKenna B., Remmes N., Baxter D., Kao C. C., and Dragnea B., Nano Lett. 7, 2407 (2007). doi:10.1021/nl071083l
- Dixit S. K., Goicochea N. L., Daniel M. C., Murali A., Bronstein L., De M., Stein B., Rotello V. M., Kao C. C., and Dragnea B., Nano Lett. 6, 1993 (2006). doi:10.1021/nl061165u
- Daniel M.-C., Tsvetkova I. B., Quinkert Z. T., Murali A., De M., Rotello V. M., Kao C. C., and Dragnea B., ACS Nano 4, 3853 (2010). doi:10.1021/nn1005073
- Chen C., Kwak E. S., Stein B., Kao C. C., and Dragnea B., J. Nanosci. Nanotechnol. 5, 2029 (2005). doi:10.1166/jnn.2005.506
- Chen C., Daniel M. C., Quinkert Z. T., De M., Stein B., Bowman V. D., Chipman P. R., Rotello V. M., Kao C. C., and Dragnea B., Nano Lett. 6, 611 (2006). doi:10.1021/nl0600878
- Hagan M. F., Phys. Rev. E 77, 051904 (2008). doi:10.1103/PhysRevE.77.051904
- Hagan M. F., J. Chem. Phys. 130, 114902 (2009). doi:10.1063/1.3086041
- Elrad O. M. and Hagan M. F., Nano Lett. 8, 3850 (2008). doi:10.1021/nl802269a
- Siber A., Zandi R., and Podgornik R., Phys. Rev. E 81, 051919 (2010). doi:10.1103/PhysRevE.81.051919
- He L., Porterfield Z., van der Schoot P., Zlotnick A., and Dragnea B., ACS Nano 7, 8447 (2013). doi:10.1021/nn4017839
- Branka A. and Heyes D., Phys. Rev. E 60, 2381 (1999), doi:10.1103/PhysRevE.60.2381; Heyes D. and Branka A., Mol. Phys. 98, 1949 (2000), doi:10.1080/00268970009483398
- Endres D. and Zlotnick A., Biophys. J. 83, 1217 (2002). doi:10.1016/S0006-3495(02)75245-4
- Hagan M. F. and Elrad O., Biophys. J. 98, 1065 (2010). doi:10.1016/j.bpj.2009.11.023
- Convergence required a 50% increase in total simulation time because statistics were reduced by the increased number of states (18000 states with the graph compared to 220 with the simple state definition). However, this increase could be at least partially eliminated through lumping of microstates.
- Garmann R. F., Comas-Garcia M., Gopal A., Knobler C. M., and Gelbart W. M., J. Mol. Biol. 426, 1050 (2013). doi:10.1016/j.jmb.2013.10.017
- van Zon J. and ten Wolde P., J. Chem. Phys. 123, 234910 (2005). doi:10.1063/1.2137716
- Note that characterization of the full time course of assembly kinetics for a system in which subunits are the limiting reagent will require multiple simulations at different subunit concentrations.
- de la Torre J. G. and Carrasco B., Biopolymers 63, 163 (2002). doi:10.1002/bip.10013
- Fortin S., “The graph isomorphism problem,” Technical Report No. 96-20 (Department of Computer Science, University of Alberta, 1996).
- Reingold E. M., Nievergelt J., and Deo N., Combinatorial Algorithms: Theory and Practice (Prentice Hall, 1977).
- McKay B. D., Practical Graph Isomorphism (Congressus Numerantium, 1981).
- Luks E. M., J. Comput. Syst. Sci. 25, 42 (1982). doi:10.1016/0022-0000(82)90009-5
- Zlotnick A., J. Mol. Biol. 366, 14 (2007). doi:10.1016/j.jmb.2006.11.034
- Moisant P., Neeman H., and Zlotnick A., Biophys. J. 99, 1350 (2010). doi:10.1016/j.bpj.2010.06.030
- Ouldridge T. E., Louis A. A., and Doye J. P. K., J. Phys.: Condens. Matter 22, 104102 (2010). doi:10.1088/0953-8984/22/10/104102








