The Journal of Chemical Physics. 2015 Nov 9;143(24):243123. doi: 10.1063/1.4935066

CAMELOT: A machine learning approach for coarse-grained simulations of aggregation of block-copolymeric protein sequences

Kiersten M Ruff 1, Tyler S Harmon 2, Rohit V Pappu 3,a)
PMCID: PMC4644154  PMID: 26723608

Abstract

We report the development and deployment of a coarse-graining method that is well suited for computer simulations of aggregation and phase separation of protein sequences with block-copolymeric architectures. Our algorithm, named CAMELOT for Coarse-grained simulations Aided by MachinE Learning Optimization and Training, leverages information from converged all atom simulations that is used to determine a suitable resolution and parameterize the coarse-grained model. To parameterize a system-specific coarse-grained model, we use a combination of Boltzmann inversion, non-linear regression, and a Gaussian process Bayesian optimization approach. The accuracy of the coarse-grained model is demonstrated through direct comparisons to results from all atom simulations. We demonstrate the utility of our coarse-graining approach using the block-copolymeric sequence from the exon 1 encoded sequence of the huntingtin protein. This sequence comprises 17 residues from the N-terminal end of huntingtin (N17) followed by a polyglutamine (polyQ) tract. Simulations based on the CAMELOT approach are used to show that the adsorption and unfolding of the wild type N17 and its sequence variants on the surface of polyQ tracts engender a patchy-colloid-like architecture that promotes the formation of linear aggregates. These results provide a plausible explanation for experimental observations, which show that N17 accelerates the formation of linear aggregates in block-copolymeric N17-polyQ sequences. The CAMELOT approach is versatile and is generalizable for simulating the aggregation and phase behavior of a range of block-copolymeric protein sequences.

I. INTRODUCTION

Protein sequences with block-copolymeric architectures can aggregate and drive reversible liquid-liquid demixing or sol-gel transitions that give rise to dense liquid or gel-like phases.1–17 A subset of these sequences also form amorphous or semi-crystalline solids such as amyloid fibers.1,18–30 The sequence blocks in these proteins are either well-folded domains or intrinsically disordered regions (IDRs). The latter refer to regions that fail to fold into unique three-dimensional structures as autonomous units.31–34 Accordingly, depending on their architecture, protein sequences that drive aggregation and phase separation may be classified as being intrinsically disordered or partially disordered block-copolymers (Figure 1). Molecular simulations can play a prominent role in uncovering the physical principles that govern the phase behavior of block-copolymeric proteins. Since aggregation and phase separation are driven by collective interactions among large numbers of polymeric molecules, the underlying physics of phase transitions and the demands of computational efficiency mandate the development and deployment of coarse-grained simulations.35–38 Here, we report the design and implementation of a method for systematic coarse-graining that can be adapted to the two classes of block-copolymeric protein sequences depicted in Figure 1. The algorithm is named CAMELOT and it stands for Coarse-grained simulations Aided by MachinE Learning Optimization and Training. We demonstrate the utility of the CAMELOT method by deploying it in Langevin dynamics simulations of archetypal block-copolymeric sequences that are based on the exon 1 encoded intrinsically disordered region of the huntingtin protein.

FIG. 1.


Example (a) intrinsically disordered block-copolymeric and (b) partially disordered block-copolymeric sequences implicated in aggregation and phase separation. Sequences can be broken up into blocks consisting of ordered domains (circles) or intrinsically disordered regions with compositional biases (rounded rectangles). (a) Htt Exon 1 (UniProt ID: P42858) and Sup35p NM (Uniprot ID: P05453) are archetypal intrinsically disordered block-copolymeric sequences. Htt Exon 1 forms insoluble inclusions that are the pathological hallmarks of Huntington’s disease.78 Sup35p NM forms insoluble amyloid fibrils and propagates as a prion.128,129 (b) NCK1 (Uniprot ID: P16333) and Whi3p (Uniprot ID: Q75E28) are archetypal partially disordered block-copolymeric sequences. NCK1 is involved in the formation of liquid-like micron-sized membraneless organelles important for signal integration.130 Whi3p is involved in the formation of functional assemblies important for branching and cell cycle control in filamentous fungi.18

There are two distinct paradigms for coarse-graining.39 One is based on fixed resolution transferrable models with consensus parameters.40–43 The CAMELOT method follows a different paradigm whereby system-specific coarse-grained models are designed using information gleaned from prior finer-grained simulations. The general outline of the CAMELOT method is as follows: For the block-copolymeric sequence of interest, we first perform converged all atom simulations aided by enhanced sampling methods to obtain descriptions of conformational ensembles for individual molecules and oligomers. We use the information gleaned from the all atom simulations to prescribe the resolution, forcefield architecture, and parameters for the coarse-grained model. The questions we wish to answer and the length scales that we wish to access in our simulations will dictate the choice of resolution. The parameterization of the model of choice is based on a combination of Boltzmann inversion,44,45 regression analysis, and a Gaussian process Bayesian optimization approach.46 Our design of CAMELOT is guided by the successes of previous methods that include the force-matching algorithm of Voth and coworkers,47–58 the Yvon-Born-Green formalism of Noid and coworkers,59–61 and the relative entropy method of Shell and coworkers.62–64

The remainder of the text is organized as follows: Section II describes the CAMELOT algorithm. Section III describes the numerical methods we use for all atom and coarse-grained simulations. In Section IV, we apply the CAMELOT method in coarse-grained simulations of archetypal block-copolymeric sequences. Our results help to establish the accuracy of the CAMELOT method. Section V demonstrates our ability to answer specific questions regarding the impact of N-terminal flanking sequences as modulators of the aggregation of polyglutamine tracts. Section VI discusses the physical insights that emerge regarding the polyglutamine containing systems and possible generalizations of the CAMELOT approach to studies of aggregation and phase behavior in other block-copolymeric systems.

II. CAMELOT METHOD

A. Choosing the resolutions for coarse-grained simulations

In coarse-grained simulations we wish to achieve the optimal balance of relevance, efficiency, and accuracy. Relevance refers to the fact that aggregation is a multimolecular phenomenon and requires the incorporation of O(10³) molecules in each simulation. However, such simulations need to be efficient while also being accurate. This tripartite balance among relevance, efficiency, and accuracy is not achievable using all atom representations of molecules and this in turn necessitates the use of coarse-grained models. However, the latter can become generic and inaccurate for the specific system in question if the model is not designed to achieve both efficiency and accuracy. Block-copolymeric systems are well suited for coarse-grained simulations because the degree of coupling within and between blocks is distinctly different. The block-copolymeric nature of IDRs emerges from compositional-to-conformational relationships of these sequences.34,65 These relationships are governed by specific compositional biases such as the fraction of charged residues and the presence or absence of stretches of polar amino acids. Depending on the compositional biases, IDRs can adopt either globular or random-coil-like conformations. In contrast to IDRs, the ordered domains are essentially semi-rigid globules. Accordingly, we deploy two categories of coarse-grained beads whereby the beads mimic either individual residues or groups of residues (domains) that are modeled as colloidal beads.

A residue bead represents a single amino acid residue that is centered on its center-of-mass. In the CAMELOT model, there are three distinct groups of residue beads. The charged amino acids (Arg, Asp, Glu, Lys) belong to group-Ch, the strongly interacting residues (Asn, His, Ile, Leu, Gln, Met, Phe, Trp, Tyr, Val) belong to group-S, and the weakly interacting amino acids (Ala, Cys, Gly, Pro, Ser, Thr) belong to group-W. We derived these groupings from the amino acid solubility data of Auton and Bolen.66 These data quantify the free energy changes associated with the transfer of an amino acid from an aqueous solution to a crystalline state. The solubility data are relevant because they capture the competition that characterizes the transfer of proteins from their dispersed states in aqueous solutions to aggregates that are rich in protein-protein as opposed to protein-solvent interactions.
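The three-way grouping of residue beads can be written down as a small lookup. The sketch below is illustrative only: the function names `bead_group` and `map_sequence` are hypothetical, the three-letter residue names from the text are translated to one-letter codes, and the example sequence is the E(KE)8 N-block that appears later in the paper.

```python
# Residue-to-group assignment as listed in the text (one-letter codes).
GROUPS = {
    "Ch": set("RDEK"),        # charged: Arg, Asp, Glu, Lys
    "S": set("NHILQMFWYV"),   # strongly interacting residues
    "W": set("ACGPST"),       # weakly interacting residues
}


def bead_group(residue: str) -> str:
    """Return the bead group ('Ch', 'S', or 'W') for a one-letter residue code."""
    for group, members in GROUPS.items():
        if residue in members:
            return group
    raise ValueError(f"unknown residue: {residue!r}")


def map_sequence(sequence: str) -> list:
    """Map every residue of a sequence to its bead group."""
    return [bead_group(residue) for residue in sequence.upper()]


# E(KE)8: Glu followed by eight Lys-Glu repeats (17 residues, all charged).
eke8 = "E" + "KE" * 8
print(len(eke8), set(map_sequence(eke8)))
```

Each of the 20 standard amino acids falls into exactly one group, so the lookup raises an error only for non-standard codes.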

We use colloidal beads for groups of residues that correspond to ordered globular domains or IDRs that adopt globular conformations. We capture the effects of internal structures within colloidal beads using suitable inter-particle potentials that we glean from all atom simulations of pairwise associations of ordered domains and/or IDRs. The use of colloidal beads reduces the number of beads in the system enabling the inclusion of O(10³) to O(10⁴) molecules in coarse-grained simulations. This allows us to access length scales that are directly relevant to experimental studies of aggregation and phase separation of block-copolymeric sequences.

B. Effective energy functions for coarse-grained simulations

The effective energy functions for CAMELOT based coarse-grained simulations take the form

$$W_{\mathrm{eff}} = W_b + W_\theta + W_\phi + W_{\mathrm{LJ}} + W_{\mathrm{el}} + W_C. \tag{1}$$

In Equation (1), the terms on the right hand side, respectively, correspond to bond length, bond angle, dihedral angle, Lennard-Jones, electrostatic, and colloidal bead interaction potentials. The choice of the resolution for coarse-grained simulations fixes the terms in Equation (1). If each residue is modeled as a single bead, then WC = 0. Conversely, if every molecule is modeled as a single colloidal bead, then all terms except WC become zero. All of the terms in Equation (1) are part of the effective energy function for hybrid resolutions that include residue specific beads and domain specific colloidal beads that correspond to groups of residues. The effective energy function is parameterized against data gathered from all atom simulations that are obtained at a specific temperature T. The specific parameters for Weff are, therefore, valid for the particular temperature T at which the all atom simulations are performed. Each of the terms in Equation (1) is discussed in detail below.

1. Bonded interactions

Flexible bonds connect pairs of consecutive beads and there are preferred values for the lengths of these bonds and the angles that define the junctions at pairs of bonds. In terms of Nb and Nθ, the total number of bonds and bond angles in the system, the bond length and bond angle terms in Equation (1) are written as

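$$W_b = \sum_{i=1}^{N_b} \frac{K_i \left(b_i - b_{0i}\right)^2}{2}, \qquad W_\theta = \sum_{i=1}^{N_\theta} \frac{L_i \left(\theta_i - \theta_{0i}\right)^2}{2}. \tag{2}$$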

In Equation (2), Ki and Li are the force constants that quantify the penalties associated with deforming bonds and bending bond angles beyond the corresponding equilibrium values of b0i and θ0i. The energies associated with rotations around bonds that connect four consecutive beads are defined in terms of the dihedral angle potential, which is a Fourier series of the form

$$W_\phi = \sum_{i=1}^{N_\phi} \sum_{n=1}^{3} V_{ni} \left[1 - \cos\left(n\phi_i - \phi_{ni}\right)\right]. \tag{3}$$

In Equation (3), Nϕ refers to the number of dihedral angles, n indexes the three terms in the Fourier series, Vni is the amplitude, and ϕni is the phase.
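As a minimal sketch of the bonded terms, the functions below evaluate the harmonic bond and angle energies of Equation (2) and a three-term Fourier dihedral energy for a single coordinate. The function names are hypothetical, and the sign convention in the dihedral term, 1 − cos(nϕ − ϕn), is an assumed reading of Equation (3); other conventions differ only by a shift of the phases.

```python
import math


def bond_energy(b, b0, K):
    """Harmonic bond term: K * (b - b0)^2 / 2."""
    return 0.5 * K * (b - b0) ** 2


def angle_energy(theta, theta0, L):
    """Harmonic angle term: L * (theta - theta0)^2 / 2 (angles in radians)."""
    return 0.5 * L * (theta - theta0) ** 2


def dihedral_energy(phi, amplitudes, phases):
    """Three-term Fourier series for one dihedral angle (radians).

    Assumed form: sum over n = 1..3 of V_n * [1 - cos(n * phi - phase_n)].
    """
    return sum(
        V * (1.0 - math.cos(n * phi - phase))
        for n, (V, phase) in enumerate(zip(amplitudes, phases), start=1)
    )


# Placeholder parameters, not values from the paper.
print(bond_energy(4.0, 3.8, 10.0))
print(dihedral_energy(math.pi / 3, [0.5, 0.2, 0.1], [0.0, 0.0, 0.0]))
```

The total Wb, Wθ, and Wϕ are sums of these per-coordinate energies over all bonds, angles, and dihedrals in the system.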

2. Lennard-Jones interactions

The WLJ term, which is calculated over non-bonded pairs of beads, is written as

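$$W_{\mathrm{LJ}} = \sum_{i=1}^{N_{nb}} \sum_{j<i} \begin{cases} 4\varepsilon_{ij}\left[\left(\dfrac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\dfrac{\sigma_{ij}}{r_{ij}}\right)^{6}\right], & r_{ij} < r_{\mathrm{vdW}}^{(ij)}, \\ 0, & r_{ij} \ge r_{\mathrm{vdW}}^{(ij)}. \end{cases} \tag{4}$$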

In Equation (4), Nnb is the number of interacting beads for the Lennard-Jones potential, rij is the distance between beads i and j, σij is the distance at which the inter-bead potential is zero, and εij is the strength of the interaction. The set of free parameters for WLJ is chosen to be pvdW ≡ [σChCh, εSS, εSC1, εSC2, …, εSCN, εWW, εWC1, εWC2, …, εWCN]. Here, the subscript Ch refers to residue beads with a net charge, S refers to strongly interacting residue beads, W refers to weakly interacting residue beads, and CN refers to colloidal beads of type N. All other parameters for the WLJ term are prescribed either a priori or are determined by Lorentz-Berthelot mixing rules. The ε-values for the interactions between pairs of charged residue beads or between charged residue beads and colloidal beads are set to be 0.01 kcal/mol. The σ-value for beads corresponding to neutral residues is set to be equal to the average radii of gyration (Rg) values of these residues in the all atom simulations. The σ-values for interactions between any residue bead X and any colloidal bead C are set using the relation

$$\sigma_{XC} = \frac{R_g(C)}{2^{1/6}}. \tag{5}$$

In Equation (5), Rg(C) is the average Rg of the group of residues that corresponds to the colloidal bead, C. For each interaction pair ij, the cutoff is set using the relation rvdW(ij)=2.5σij.
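The truncated Lennard-Jones pair term and the Lorentz-Berthelot mixing rules can be sketched as follows. This is an illustrative example with hypothetical function names. The cutoff of 2.5σ follows the text, and the mixing rules are the standard ones (arithmetic mean for σ, geometric mean for ε).

```python
import math


def lj_pair(r, eps, sigma, cutoff_factor=2.5):
    """Truncated 12-6 Lennard-Jones energy for one pair of beads.

    The energy is set to zero at and beyond r = cutoff_factor * sigma.
    """
    if r >= cutoff_factor * sigma:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lorentz_berthelot(eps_i, sigma_i, eps_j, sigma_j):
    """Standard mixing rules: geometric mean for eps, arithmetic mean for sigma."""
    return math.sqrt(eps_i * eps_j), 0.5 * (sigma_i + sigma_j)


# Placeholder values, not parameters from the paper.
eps_ij, sigma_ij = lorentz_berthelot(0.2, 5.0, 0.1, 6.0)
print(lj_pair(6.0, eps_ij, sigma_ij))
```

The pair energy vanishes at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, which is why a relation between σ and a bead radius fixes where the attractive well sits.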

3. Electrostatic interactions

Residue beads that correspond to Lys, Arg, Glu, or Asp will have a net charge of ±1. The net charge of a colloidal bead is the sum over all charged amino acids within the colloid. The interactions, Wel, between beads with excess charge are written in terms of a Yukawa potential,

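$$W_{\mathrm{el}} = \sum_{i=1}^{n_c} \sum_{j<i} \begin{cases} \dfrac{C q_i q_j}{\varepsilon r_{ij}} \exp\left(-\dfrac{r_{ij}}{l_D}\right), & r_{ij} < r_c, \\ 0, & r_{ij} \ge r_c. \end{cases} \tag{6}$$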

Here, nc denotes the number of beads that have an excess charge, qi is the charge on bead i, ε = 78 is the dielectric constant, lD = 10 Å is the Debye length for physiologically relevant ionic strengths, and rc is the distance past which the mean-field electrostatic interactions are zeroed out.
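A screened Coulomb (Yukawa) pair term with the dielectric constant and Debye length quoted above might look like the sketch below. The prefactor C is not given a numerical value in the text; the value 332.06 kcal Å mol⁻¹ e⁻², the usual Coulomb conversion factor for these units, is an assumption here, as is the function name. The cutoff `r_cut` has no default because the text does not state one.

```python
import math

# Assumed Coulomb conversion factor in kcal * Angstrom / (mol * e^2).
COULOMB_CONSTANT = 332.06


def yukawa_pair(r, qi, qj, r_cut, dielectric=78.0, debye_length=10.0,
                prefactor=COULOMB_CONSTANT):
    """Screened electrostatic energy (kcal/mol) for one pair of charged beads.

    Distances are in Angstrom and charges in units of e. The energy is
    zeroed at and beyond r_cut.
    """
    if r >= r_cut:
        return 0.0
    return prefactor * qi * qj / (dielectric * r) * math.exp(-r / debye_length)


# Like charges repel, opposite charges attract; 30 Angstrom is a placeholder cutoff.
print(yukawa_pair(10.0, +1, +1, r_cut=30.0))
print(yukawa_pair(10.0, +1, -1, r_cut=30.0))
```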

4. Interactions between colloidal beads

The coarse-grained model for a block-copolymeric sequence may have one or more colloidal beads. These beads are either identical to or different from one another. The number and types of colloidal beads are specified by the sequence encoded conformational properties of the block-copolymeric sequence of interest. The collective interactions among colloidal beads are modeled using the WC term in Equation (1), where $W_C = \sum_{i=1}^{n_t} \sum_{k=1}^{g_i} \sum_{j=i+1}^{n_t} \sum_{l=1}^{g_j} w_C\left(\mathbf{r}_k^{(i)}, \mathbf{r}_l^{(j)}\right)$. Here, nt is the number of distinct types of colloidal beads, gi is the number of colloidal beads of type i, and rk(i) is the position vector of bead k of type i. The functional form for wC is not defined a priori but is determined using data gathered from simulations of pairs of colloidal beads represented in all atom detail. An example of the implementation of our approach for defining wC is described in Subsection IV B for the sequence of an archetypal block-copolymer.

C. Parameterization of uncoupled Weff terms: Wb, Wθ, Wϕ, and wC

We use a Boltzmann inversion procedure44,45 to extract parameters for uncoupled terms within Weff. For a given observable, such as the bond length x, the observed probability distribution from all atom simulations is written as ρ(x) ∝ exp[ − W(x)/kBT]. Here, ρ(x) is the probability density associated with the observable x, W(x) is the effective potential in terms of the observable, kB is the Boltzmann constant, and T is the simulation temperature.

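Since Wb and Wθ are modeled using harmonic potentials, inversion of the Boltzmann relationship yields the analytical relationships for the parameters of the potentials in terms of the first and second moments, viz., ⟨bi⟩, ⟨θi⟩, ⟨bi²⟩, and ⟨θi²⟩, of the relevant probability distributions,

$$b_{0i} = \langle b_i \rangle, \quad K_i = \frac{k_B T}{\langle b_i^2 \rangle - \langle b_i \rangle^2}, \quad \theta_{0i} = \langle \theta_i \rangle, \quad L_i = \frac{k_B T}{\langle \theta_i^2 \rangle - \langle \theta_i \rangle^2}. \tag{7}$$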

Parameters for Wϕ and wC are obtained via non-linear regression analysis by fitting Wϕ and wC to the equation

$$W(x) = -k_B T \ln P(x). \tag{8}$$

In Equation (8), P(x) is either the probability distribution for each dihedral angle for Wϕ or the pair correlation function between centers-of-mass of grouped residues that correspond to a given pair of colloidal beads for wC.
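The Boltzmann inversion step can be sketched in a few lines. The example below is a hedged illustration, not the authors' implementation: the function names are hypothetical, the samples are synthetic Gaussian data standing in for bond lengths from all atom simulations, and the force constant of 10 kcal mol⁻¹ Å⁻² is a placeholder.

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def harmonic_from_samples(samples, temperature):
    """Equilibrium value and force constant from the first two moments.

    x0 = <x> and K = kB * T / (<x^2> - <x>^2).
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, KB * temperature / variance


def boltzmann_invert(probabilities, temperature):
    """Effective potential W(x) = -kB * T * ln P(x) for each histogram bin.

    Empty bins map to +infinity. W is defined up to an additive constant.
    """
    return [
        -KB * temperature * math.log(p) if p > 0 else math.inf
        for p in probabilities
    ]


# Synthetic "bond length" data drawn from a known harmonic well (placeholder values).
random.seed(0)
T, K_true, b0_true = 315.0, 10.0, 3.8
spread = math.sqrt(KB * T / K_true)
samples = [random.gauss(b0_true, spread) for _ in range(20000)]
b0_fit, K_fit = harmonic_from_samples(samples, T)
print(round(b0_fit, 3), round(K_fit, 2))
```

For the non-harmonic terms, the output of `boltzmann_invert` would be the target to which a Fourier series or a colloid pair potential is fitted by non-linear regression.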

D. Parameterization of WLJ

1. Primary objective function

As described in Subsection II B 2, the set of free parameters for WLJ is chosen to be pvdW ≡ [σChCh, εSS, εSC1, εSC2, …, εSCN, εWW, εWC1, εWC2, …, εWCN]. Given this set of free parameters, we seek values for pvdW that minimize the objective function Ω(pvdW), which is defined as

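$$\Omega\left(p_{\mathrm{vdW}}\right) = 1 - \frac{1}{m} \sum_{i=1}^{m} \frac{2 - \sum_{j=1}^{n_{\mathrm{bin}}} \left|\rho_{ij}^{\mathrm{AA}} - \rho_{ij}^{\mathrm{CG}}\left(p_{\mathrm{vdW}}\right)\right|}{2}. \tag{9}$$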

Here, m is the number of distinct inter-bead distances in the coarse-grained model that correspond to pairs of beads that are separated by at least five bonds. Data for the histograms that quantify the probability densities for distances between pairs of interacting beads are recorded in simulations based on the all atom (AA) and coarse-grained (CG) models. For a pair of beads designated by a single index i, the densities within each of the bins j of the corresponding histograms are denoted as ρijAA and ρijCG. The optimal parameters for pvdW should minimize Ω(pvdW) and hence maximize the collective overlap among m pairs of histograms where m = (n − 5)(n − 4)/2. Here, n denotes the number of beads in the coarse-grained model.
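To make the objective function concrete, the sketch below counts the bead pairs that enter the sum and scores the mismatch between paired histograms. It assumes one reading of Equation (9), in which the overlap of two normalized histograms is 1 minus half the summed absolute difference of their bin densities. The function names are hypothetical.

```python
def count_pairs(n_beads, min_separation=5):
    """Number of bead pairs (i, j) in a linear chain separated by >= min_separation bonds."""
    return sum(
        1
        for i in range(n_beads)
        for j in range(i + min_separation, n_beads)
    )


def histogram_overlap(rho_aa, rho_cg):
    """Overlap in [0, 1] of two normalized histograms defined on the same bins."""
    return 1.0 - 0.5 * sum(abs(a - c) for a, c in zip(rho_aa, rho_cg))


def objective(histograms_aa, histograms_cg):
    """One minus the mean overlap over all paired distance histograms."""
    m = len(histograms_aa)
    mean_overlap = sum(
        histogram_overlap(a, c) for a, c in zip(histograms_aa, histograms_cg)
    ) / m
    return 1.0 - mean_overlap


# An 18-bead chain gives (18 - 5)(18 - 4)/2 = 91 pairs.
print(count_pairs(18))
```

With this form, identical all atom and coarse-grained histograms give Ω = 0 and histograms with no shared bins give Ω = 1.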

2. Auxiliary objective function

The design of the primary objective function assumes that the overlap between probability densities of distances between all atom and coarse-grained models should be weighted equally for all pairs of beads. However, depending on the question of interest, the overlap of distributions for certain pairs of beads may be more important than others. In addition, it might be important to include observables that go beyond distance distributions. In these situations, an auxiliary objective function should be used in a post-processing step to refine some or all of the pvdW parameters.

3. Gaussian process Bayesian optimization procedure for obtaining pvdW

Minimization of Ω(pvdW) is non-trivial for two reasons. First, the objective function does not have an analytical form. Therefore, we cannot readily use gradient-based methods for optimization of Ω(pvdW) to predict the optimal values of pvdW. Second, the objective function is expensive to evaluate because the calculation of Ω(pvdW) requires that we perform Langevin dynamics (LD) simulations of individual molecules in the coarse-grained representation to calculate ρijCG. Bayesian optimization methods are well suited for such problems. These methods are efficient because they use prior knowledge that is generated during the optimization process to direct the sampling in high-dimensional spaces. Bayesian methods strike an optimal balance between exploration (sampling the parameter space, albeit with high uncertainty regarding the objective function) and exploitation (using prior knowledge and sampling where the objective function is likely to be minimized).

Our primary goal is to identify values of pvdW that minimize the primary objective function Ω. In order to minimize Ω, we use a Bayesian optimization procedure. This requires the collection of observations O1:t = [(pvdW,1:t, Ω1:t)]. Here, O1:t denotes the set of consecutive observations [O1, …, Ot] from sampling the multidimensional space of parameters. The accumulation of observations is used to generate the likelihood function, P(O1:t|Ω). The likelihood function is combined with the prior distribution P(Ω) to generate the posterior distribution, P(Ω|O1:t), for the unknown objective function Ω. The posterior probability for obtaining the desired low value of Ω given a collection of observations O1:t is calculated using Bayes theorem,

$$P\left(\Omega \mid O_{1:t}\right) \propto P\left(O_{1:t} \mid \Omega\right) P\left(\Omega\right). \tag{10}$$

In order to use Bayesian optimization, we need a model to estimate Ω. Here, we model Ω using a Gaussian process, which is a distribution of objective functions that is defined by its mean and covariance functions. A Gaussian process predicts the most likely values for objective functions and the uncertainties in the estimates for the most likely values. Each Gaussian process Bayesian optimization (GPBO) iteration involves three steps: (i) The algorithm utilizes the posterior distribution and the expected improvement acquisition function to determine the next set of pvdW values. (ii) Given a choice for pvdW, we perform LD simulations for a single chain based on the coarse-grained model. (iii) Data from these simulations are used to calculate Ω(pvdW) and the posterior distribution is updated based on the Gaussian process.67
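The three-step GPBO iteration can be illustrated with a one-parameter toy problem. Everything in the sketch below is an assumption made for illustration: the squared-exponential kernel, its length scale, the grid search over candidates, and above all `toy_objective`, which is a cheap stand-in for Ω. In the actual procedure each evaluation requires an LD simulation of the coarse-grained chain.

```python
import math
import random


def rbf(a, b, length=0.3):
    """Squared-exponential covariance (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a constant-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean_y = sum(ys) / len(ys)
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sd, best):
    """Expected improvement acquisition function for minimization."""
    z = (best - mu) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sd * pdf


def toy_objective(eps):
    """Hypothetical stand-in for Omega; the real one needs an LD simulation."""
    return (eps - 0.62) ** 2


def gpbo(objective, lo=0.0, hi=1.0, n_init=3, n_iter=15, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [lo + (hi - lo) * i / 100 for i in range(101)]
    for _ in range(n_iter):
        best = min(ys)
        # Step (i): choose the next parameter value by maximizing expected improvement.
        scores = [expected_improvement(*gp_posterior(xs, ys, g), best) for g in grid]
        x_next = grid[max(range(len(grid)), key=scores.__getitem__)]
        # Steps (ii) and (iii): evaluate the objective and add the observation.
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


best_eps, best_omega = gpbo(toy_objective)
print(best_eps, best_omega)
```

The posterior is recomputed from all observations at every iteration, which is affordable here because the number of observations stays small. For several parameters, the grid search over candidates would be replaced by a numerical optimizer over the acquisition function.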

Depending on the complexity of the system, the GPBO procedure can be run in parallel with independent trials to ensure convergence to the same parameter space and/or over generations in which each subsequent generation utilizes information from the previous generation to shrink the parameter search space. Such a hierarchical approach is illustrated for an archetypal system in Subsection IV D 3. In general, a single generation, within which hundreds of GPBO iterations are conducted, appears to be reasonable for most systems containing fewer than seven parameters.

III. SIMULATION SETUP

A. All atom simulations

All atom simulations were performed using the CAMPARI modeling package (http://campari.sourceforge.net) utilizing the ABSINTH implicit solvation model68–71 and forcefield paradigm. This model is accurate, as measured against experimental data, and efficient for all atom simulations of conformational properties and intermolecular associations of intrinsically disordered proteins.72 In this work, the all atom simulations were based on the abs3.2_opls.prm parameter set. Additional details of the all atom simulations are provided in the supplementary material.73

B. Coarse-grained Langevin dynamics simulations

We present the details of the LD simulations in Cartesian space that are used to guide the parameterization of the WLJ terms and for simulations that use the optimized coarse-grained model. All LD simulations were performed using the LAMMPS simulation package (http://lammps.sandia.gov). In these simulations, the Langevin equation shown in Equation (11) is integrated to propagate the positions and velocities of the residue specific and/or domain specific colloidal beads. The force on each of the beads labeled i is written as

$$\mathbf{F}_i = -\nabla_i W_{\mathrm{eff}} - \frac{m_i}{\gamma_i} \mathbf{v}_i + \mathbf{R}_i. \tag{11}$$

In Equation (11), Fi is the force exerted on bead i; it is a sum of the negative gradient of the effective interactions specified by Weff, the frictional force proportional to the velocity of the bead i, vi, and the random force Ri exerted by collisions of the beads with the bath. The random forces have a white noise spectrum in accord with the fluctuation dissipation theorem. The equation of motion is integrated using a velocity Verlet algorithm with an integration time step of 2 fs. The damping term is written as $\gamma_i = \alpha \tilde{\gamma}_i$. Here, α = 2 ps is a scaling factor and $\tilde{\gamma}_i = \dfrac{m_i}{6\pi\eta R_{g_i}} \gamma_{\mathrm{LYS}}^{-1}$, mi is the mass of bead i, η = 6.29 × 10⁻⁴ kg m⁻¹ s⁻¹ is the viscosity of water at 315 K, Rgi is the average radius of gyration of bead i as calculated from data based on all atom simulations, and $\gamma_{\mathrm{LYS}}^{-1}$ is the inverse of the damping parameter for the lysine bead that is calculated using the Stokes-Einstein relationship. LD simulations of individual coarse-grained molecules were initiated by mapping equilibrated conformations drawn from the all atom simulations to the coarse-grained model.

For multi-chain simulations at finite concentrations, there are at least O(10³) coarse-grained molecules in each simulation cell. These simulations were performed in the canonical ensemble at constant concentrations, which are controlled using periodic boundary conditions. We initiate the simulations by drawing an initial conformation at random from the equilibrated ensembles of all atom simulations and replicating it on three-dimensional lattices. This is followed by energy minimization using steepest descent and a subsequent equilibration based on 10⁶ steps of LD simulations performed with time steps of 2 fs. Each final simulation involves 10⁸ integration steps and for every combination of sequence and peptide concentration we performed multiple independent simulations.

IV. APPLICATION OF CAMELOT FOR SIMULATIONS OF BLOCK-COPOLYMERS WITH POLYGLUTAMINE TRACTS

Several proteins with polyglutamine tracts or glutamine-rich regions have been identified as drivers of aggregation and phase separation.74–76 The translation of genes with expanded CAG trinucleotide repeats leads to proteins with expanded polyglutamine (polyQ) tracts.77 These form insoluble inclusions that are the pathological hallmarks of several neurodegenerative diseases including Huntington’s disease.78 Water is a poor solvent for homopolymeric polyQ tracts as well as polypeptide backbones.79,80 Accordingly, individual polyQ molecules form globular structures to minimize the chain-solvent interface. For polyQ molecules of a particular chain length, there exists a well-defined saturation concentration (cs) that defines the boundary between soluble and insoluble phases.81 The insoluble phase is enriched in fibrillar aggregates and the measured values of cs decrease with increasing polyQ length. Additionally, for concentrations below cs, there exists a second saturation concentration (cc), which corresponds to the formation of spherical aggregates. Kinetics experiments initiated from fully disaggregated solutions that are supersaturated with respect to cs indicate the early formation of spherical aggregates, 10-30 nm in size, that are precursors of fibrillar aggregates. Fibril formation is barrier-limited and appears to proceed via nucleated conformational conversion within liquid-like spheres82–84 in accord with the mechanism for crystallization that was proposed by ten Wolde and Frenkel.85 In contrast, the formation of metastable liquid-like spheres does not involve discernible free energy barriers.81

Flanking sequence modules modulate the driving forces for and mechanisms of polyQ aggregation.76,81,86–93 This has been observed for the N-terminal stretch of huntingtin. For a given polyQ tract, the presence of N17, the 17-residue N-terminal flanking sequence module, leads to a lowering of cs vis-à-vis the values measured for polyQ tracts. N17 narrows the gap between cs and cc and decreases the metastability of spherical aggregates. Accordingly, N17 helps accelerate the formation of fibrillar aggregates.94,95 For a given polyQ length, fibril formation proceeds without a discernible lag time if N17 is appended N-terminally to the polyQ tract. The curious effects of N17 are attributable to a domain cross talk between N17 and polyQ.96,97 According to this model, the N17 module adsorbs and unfolds on the surface of the polyQ domain. This engenders a patchy colloid98–108 architecture for the N17-polyQ block-copolymer. In direct analogy with the physics of patchy colloids, the presence of an adsorbed N17 patch, with charged groups exposed on the surface, leads to a diminution of non-specific interactions of polyQ molecules.96,97 The patch on the colloidal particle breaks the spherical symmetry and imparts directional preferences to intermolecular encounters, thus promoting a distinct preference for linear aggregates. Here, we deploy our CAMELOT approach to test the applicability of the patchy colloid model for explaining the N17 enhanced formation of linear aggregates in block-copolymer sequences with polyQ tracts.

A. Determining the coarse-grained resolution from all atom simulations

1. Sequences of interest

For simplicity, the N-terminal and polyQ blocks are denoted as N- and Q-blocks, respectively (see Figure 2). We used sequence design to generate different types of N-block sequences. The different N-block sequences (see Figure 2) can be distinguished by the degree of adsorption between the N- and Q-blocks as shown in Figure 3. Here, we use data from all atom simulations to organize the different N-Q sequences along the ordinate of Figure 3 in ascending order of the degree of adsorption (dA) between different N-blocks and a Q40-block. The parameter dA is calculated as

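$$d_A = \frac{V_I}{V_N + V_Q}, \qquad V_I = \begin{cases} \dfrac{\pi \left(R_{g,N} + R_{g,Q} - r_c\right)^2 \left(r_c^2 + 2 r_c R_{g,Q} - 3 R_{g,Q}^2 + 2 r_c R_{g,N} + 6 R_{g,N} R_{g,Q} - 3 R_{g,N}^2\right)}{12 r_c} & \text{if } R_{g,N} + R_{g,Q} > r_c, \\ 0 & \text{if } R_{g,N} + R_{g,Q} \le r_c. \end{cases} \tag{12}$$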

In Equation (12), VI is the volume of the intersection between spherical envelopes corresponding to the N- and Q-blocks. The terms VN and VQ in the denominator were calculated using the conformation-specific Rg values for the N- and Q-blocks. In the definition of VI, rc is the distance between the centers-of-mass of the N- and Q-blocks whereas Rg,N and Rg,Q are the conformation-specific radii of gyration calculated over the atoms of the N- and Q-blocks, respectively. When dA = 0, the spherical envelopes of the N- and Q-blocks do not intersect. The maximal degree of adsorption between a 17-residue N-terminal stretch and a globular polyQ domain is ∼0.4.
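The degree of adsorption for a single conformation can be computed directly from the two radii of gyration and the center-of-mass separation. The sketch below uses hypothetical function names. The guard for one sphere lying entirely inside the other is an addition that is not part of Equation (12); it is included because the lens-volume formula only holds for partially overlapping spheres.

```python
import math


def sphere_volume(radius):
    return 4.0 / 3.0 * math.pi * radius ** 3


def intersection_volume(rg_n, rg_q, rc):
    """Volume shared by spheres of radii rg_n and rg_q whose centers are rc apart."""
    if rg_n + rg_q <= rc:
        return 0.0  # the spherical envelopes do not intersect
    if rc <= abs(rg_n - rg_q):
        # Added guard (not in Eq. 12): the smaller sphere is fully enclosed.
        return sphere_volume(min(rg_n, rg_q))
    return (
        math.pi
        * (rg_n + rg_q - rc) ** 2
        * (rc ** 2 + 2 * rc * rg_q - 3 * rg_q ** 2
           + 2 * rc * rg_n + 6 * rg_n * rg_q - 3 * rg_n ** 2)
        / (12.0 * rc)
    )


def degree_of_adsorption(rg_n, rg_q, rc):
    """d_A = V_I / (V_N + V_Q) for one conformation."""
    return intersection_volume(rg_n, rg_q, rc) / (
        sphere_volume(rg_n) + sphere_volume(rg_q)
    )


# Placeholder radii and separation in Angstrom, not values from the simulations.
print(degree_of_adsorption(7.0, 10.0, 9.0))
```

In practice, dA would be evaluated per conformation with conformation-specific radii and then averaged over the ensemble.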

FIG. 2.


N-Q sequences used for this study. N-block denotes different 17-residue N-terminal sequences and Q-block denotes a polyQ tract with 40 Gln residues. N17 denotes the wild type sequence (UniProt ID: P42858). Sequences of the N-block were designed in order to modulate the amino acid sequence (N17W and N17S) or to modulate the amino acid composition (E(KE)8). Designs that modulate amino acid sequence maintain the wild type N17 composition but scramble the sequence. The choice of the polyampholytic sequence E(KE)8 allowed for examination of N-block properties that could not be accessed using the N17 composition. In the sequences, hydrophobic residues are in black, polar residues are in green, and positively and negatively charged residues are shown in blue and red, respectively.

FIG. 3.


Degree of adsorption (dA) to Q40 versus the probability of forming N-block dimers for each N sequence. Results were extracted from at least 3 independent all atom simulations of monomers and monomer-dimer equilibria simulations, respectively. The dotted orange line quantifies the probability of Q40 forming dimers. Dimers are defined by any two residues of differing molecules that are less than or equal to 3.5 Å apart. Insets show representative snapshots for each N-Q sequence. The black and orange translucent spheres correspond to the Rg’s of the N- and Q-blocks, respectively. The overlap between translucent spheres serves as a visualization of dA.

The values for dA were calculated from ABSINTH-based all atom simulations of monomeric variants of different N-Q sequences. The abscissa in Figure 3 corresponds to the probabilities of forming dimers of N-blocks as autonomous units. These results were also extracted from ABSINTH-based all atom simulations with pairs of N-block molecules. The results summarized in Figure 3 make several points: The N-blocks show negligibly low likelihoods for self-interactions when compared to the high probability of self-associations between pairs of Q40 molecules. By scrambling the sequence of N17, we were able to design sequences that adsorb more strongly (N17S) and more weakly (N17W) to the polyQ domain when compared to the sequence of N17 that is drawn from the N-terminus of the huntingtin protein. Since a value of dA = 0 cannot be achieved with the composition of N17, we designed a synthetic polyampholytic sequence, Glu-(Lys-Glu)8 denoted as E(KE)8, that helps us to achieve zero adsorption between the N- and Q-blocks. The four N-Q sequences allow us to titrate the effects of varying dA on the sequence-encoded bias toward linear aggregates.

2. Choice of resolution

To test the hypothesis that the degree of adsorption between N- and Q-blocks modulates the bias towards linear aggregates, we need to understand the interplay between sequence-specific properties of flanking sequences and polyQ-mediated aggregation. Specifically, we seek a model that maintains sequence-specific properties of the flanking sequences while enabling simulations of O(10³) molecules in order to study the early stages of polyQ-mediated aggregation.

Experimental data and atomistic simulation results show that polyQ constructs adopt globular conformations in aqueous solutions.79 Previous simulations have shown that Rg = R0 N^ν for polyQ tracts. Here, Rg refers to the ensemble-averaged radius of gyration, N is the number of residues, ν = 0.33 for globules, and the pre-factor is R0 = 3.0 Å. We analyzed the results from all atom simulations for T = 315 K to establish that the Q-block adopts globular conformations in the context of N-Q sequences. Further, in all of the N-Q constructs, the values of Rg calculated over the Q-blocks are concordant with a scaling exponent of ν = 0.33.
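As a quick illustration of the scaling relation quoted above, the following sketch evaluates Rg = R0 N^ν for a 40-residue polyQ tract using the stated values R0 = 3.0 Å and ν = 0.33. The function name `globule_rg` is hypothetical and is introduced here only for illustration.

```python
def globule_rg(n_residues: int, r0: float = 3.0, nu: float = 0.33) -> float:
    """Return the radius of gyration (in Å) from Rg = R0 * N**nu."""
    return r0 * n_residues ** nu


# Estimate for a Q40 tract, using the pre-factor and exponent quoted in the text.
rg_q40 = globule_rg(40)
print(f"Estimated Rg for Q40: {rg_q40:.1f} Å")
```

The estimate is roughly 10 Å, which sets the length scale for the single colloidal bead that represents the Q-block.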

Since the Q-block maintains its preference for globular conformations in all N-Q sequences examined here, we used the following architecture for the coarse-grained model: the residues of the N-block are modeled as residue beads whereas the residues of the Q-block are lumped together as a single colloidal bead (Figure 4). This choice allows for sequence-specific properties of the flanking sequences to be maintained while enabling significant computational efficiency in simulations of O(10^3) N-Q molecules.

FIG. 4.

Architecture of the coarse-grained model compared to the all atom model for N17-Q40. For the coarse-grained model, each residue in the N-block is modeled as a single bead, whereas all the residues of Q-block are modeled as a single colloidal bead. Here, Gln residues are in orange, hydrophobic residues are in black, positively and negatively charged residues are in blue and red, respectively, and other polar residues are in green.

In order to justify our choice of resolution, we deployed a network-based approach to provide an unbiased assessment of groups of residues that interact preferentially among themselves as opposed to interacting across groups. The idea is that residues that prefer to interact among themselves in the all atom simulations can be treated as a single entity upon coarse-graining. In network-based approaches, such groups are referred to as communities.109,110 The overall strategy, which is adapted from the work of Sethi et al.,111 is as follows: each conformation drawn from the ensemble of conformations generated in an atomistic simulation is converted to a network of nodes and edges. The nodes are positions of Cα atoms and edges are drawn between pairs of nodes that are within 13 Å of each other. Communities based on these networks are determined using the Girvan-Newman algorithm combined with the network modularity score, Q. The final community structure for each conformation (network) is taken as the sub-network that yielded the highest Q-score.73
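The network construction and the modularity score can be sketched in a few lines. The example below is a simplified illustration, not the analysis pipeline used in the paper: it builds an unweighted contact network from a list of positions with a 13 Å cutoff and evaluates the Newman modularity Q for a given partition. The Girvan-Newman edge-removal step that proposes candidate partitions is not implemented, and edge weights are ignored. The function names and the toy coordinates are assumptions made for illustration.

```python
import math
from itertools import combinations


def contact_edges(coords, cutoff=13.0):
    """Return edges (i, j) between nodes whose positions lie within `cutoff` Å."""
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if math.dist(coords[i], coords[j]) <= cutoff]


def modularity(edges, labels):
    """Newman modularity Q of a partition; labels[i] is the community of node i."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra = {}   # number of edges inside each community
    degree = {}  # summed node degree of each community
    for i, j in edges:
        degree[labels[i]] = degree.get(labels[i], 0) + 1
        degree[labels[j]] = degree.get(labels[j], 0) + 1
        if labels[i] == labels[j]:
            intra[labels[i]] = intra.get(labels[i], 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy example: two tight clusters of three nodes, placed 100 Å apart.
toy_coords = [(0, 0, 0), (5, 0, 0), (0, 5, 0),
              (100, 0, 0), (105, 0, 0), (100, 5, 0)]
toy_edges = contact_edges(toy_coords)
q_split = modularity(toy_edges, [0, 0, 0, 1, 1, 1])   # two communities
q_merged = modularity(toy_edges, [0, 0, 0, 0, 0, 0])  # one community
```

In this toy case the two-community partition scores higher than the merged one, which is the sense in which the highest-Q partition is taken as the community structure of a conformation.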

Panels (a)–(d) in Figure 5 summarize the results obtained from analysis of the community structures for all N-Q constructs. Specifically, each panel plots the probability that two residues are observed to be in the same community. Hotter colors imply that two residues have a higher probability of being part of the same community, whereas cooler colors imply that two residues have a low probability of being part of the same community. For E(KE)8-Q40, N17W-Q40, and N17-Q40, the community analysis clearly shows that distinct communities involve residues within the N- and Q-blocks, respectively. Figure 6 provides further quantitative evidence in support of the choice of the coarse-grained model. Here, we quantify the probability that a residue from block X (N- or Q-) belongs to the same community as a residue from block Y (N- or Q-). Even though the coupling increases between the N- and Q-blocks for sequences such as N17S-Q40, residues prefer to be in communities with other residues that are part of the same block. Given the low probabilities for N- and Q-block residues to be part of the same community and the lack of evidence for sub-communities within the Q-block, the coarse-grained resolution in which each N-block residue is modeled as a residue bead and the Q-block is modeled as a colloidal bead is justified for the N-Q constructs.

FIG. 5.

Community analysis for each of the N-Q sequences. Conformations from all atom simulations were converted into networks wherein each Cα position was considered a node and a weighted edge ew = dij was drawn between two nodes if the distance between the Cα positions of residues i and j, dij, was less than or equal to 13 Å. Communities based on these networks were determined using the Girvan-Newman algorithm combined with the network modularity score, Q. The final community structure for each conformation (network) was taken as the sub-network that yielded the highest Q. Panels (a)-(d) show the probability that any two residues are in the same community for E(KE)8-Q40, N17W-Q40, N17-Q40, and N17S-Q40, respectively. The bottom left corner of each plot corresponds to probabilities of intra-N communities (residue numbers 1-17), whereas the top right corner of each plot corresponds to the probabilities of intra-Q communities (residue numbers 18-57). For E(KE)8-Q40, N17W-Q40, and N17-Q40, intra-block communities are clearly more favorable than inter-block communities (white or blue colors for inter-block pairs versus hotter colors for intra-N and intra-Q pairs). Additionally, within the N- and Q-blocks there are no well-defined sub-communities.

FIG. 6.

Average probabilities that a residue from block X is in the same community as a residue from block Y. Here, block X and block Y can refer to either N- or Q-blocks. The first two columns show the average intra-block probabilities for each N-Q sequence and the third column shows the inter-block probabilities. For all N-Q sequences, the intra-block probabilities are greater than the inter-block probabilities (hotter versus cooler colors). This implies that even as the coupling increases between residues across the N- and Q-blocks, residues still prefer to be in communities with other residues within the same block.

B. Effective energy function for coarse-grained simulations

The effective energy function for the coarse-grained model of N-Q constructs is given by Equation (1). In the current example, wC denotes the colloidal potential between Q-blocks. We used data from 18 independent all atom simulations for pairs of Q40 molecules to obtain the functional form for the wC potential. These data yielded a pair correlation function g(r) at the target temperature for the distance r between the centers-of-mass of the polyQ molecules with two minima that correspond to the two categories of interactions between homopolymeric globules, viz., docking and entanglement. The functional form of wC is a sum of two terms, a Mie potential (wE) to model entanglements and a Gaussian potential (wD) to model the docking of globules.112 Explicitly,

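$$
w_C = w_E + w_D, \quad
w_E = \frac{\gamma_r}{\gamma_r - \gamma_a}\left(\frac{\gamma_r}{\gamma_a}\right)^{\gamma_a/(\gamma_r - \gamma_a)} \varepsilon_E \left[\left(\frac{\sigma_E}{r}\right)^{\gamma_r} - \left(\frac{\sigma_E}{r}\right)^{\gamma_a}\right], \quad
w_D = -\varepsilon_D \exp\left[-\frac{(r - r_d)^2}{2\delta_d^2}\right]. \tag{13}
$$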

In Equation (13), εE is the well depth of the entanglement potential, σE is the distance r between the colloidal beads for which the entanglement potential is zero, εD is the well depth of the docking potential, rd is the inter-particle separation at which the docking potential is minimized, and δd controls the width of the well.
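The two-term colloidal potential can be written compactly in code. The sketch below assumes the standard Mie form, whose prefactor makes the well depth equal to εE, and a negative Gaussian well of depth εD centered at rd. The sign conventions are inferred from the description of εE and εD as well depths. The default exponents of 6 and 2 are the values the text assigns to the repulsive and attractive terms; the function names are hypothetical.

```python
import math


def mie(r, eps_e, sigma_e, g_rep=6.0, g_att=2.0):
    """Mie (entanglement) term; zero at r = sigma_e, minimum value -eps_e."""
    prefactor = (g_rep / (g_rep - g_att)) * (g_rep / g_att) ** (g_att / (g_rep - g_att))
    return prefactor * eps_e * ((sigma_e / r) ** g_rep - (sigma_e / r) ** g_att)


def docking(r, eps_d, r_d, delta_d):
    """Gaussian (docking) term; minimum value -eps_d at r = r_d (assumed sign)."""
    return -eps_d * math.exp(-((r - r_d) ** 2) / (2.0 * delta_d ** 2))


def w_c(r, eps_e, sigma_e, eps_d, r_d, delta_d):
    """Total colloidal potential between two Q-block beads at separation r."""
    return mie(r, eps_e, sigma_e) + docking(r, eps_d, r_d, delta_d)
```

With exponents 6 and 2, the Mie minimum sits at r = 3^(1/4) σE, so the entanglement well lies just outside σE while the docking well is placed independently through rd.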

C. Parameterization of uncoupled Weff terms: Wb, Wθ, Wϕ, and wC

The target potential of mean force wC,target(r) was obtained by Boltzmann inversion of g(r) such that wC,target(r) = − kBT ln[g(r)]. This potential of mean force, extracted from the all atom simulations, was used in a non-linear regression procedure to obtain the parameters for wC in the coarse-grained model. The values of γrep and γatt were set equal to 6 and 2, respectively. The final parameters from the regression analysis are as follows:

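$$
\varepsilon_E = 3.72\ \mathrm{kcal\ mol^{-1}},\quad
\sigma_E = 0.32R_g\ \text{Å},\quad
\varepsilon_D = 3.92\ \mathrm{kcal\ mol^{-1}},\quad
r_d = 1.78R_g\ \text{Å},\quad
\delta_d = 5.94\ \text{Å}. \tag{14}
$$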

In Equation (14), Rg refers to the ensemble-averaged radius of gyration for monomeric forms of Q40 molecules that we calculate from all atom simulations. Figure 7 shows the potential of mean force extracted from all atom simulations of pairs of Q40 molecules at 315 K—the target potential wC,target(r)—and a comparison of the fit to this potential obtained via non-linear regression analysis that leads to the parameters summarized in Equation (14).
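The Boltzmann inversion step is simple enough to illustrate directly. The sketch below converts sampled values of a pair correlation function g(r) into a potential of mean force in kcal/mol at 315 K. The treatment of empty bins (g = 0) as infinite energy is a choice made here for illustration, and the function name is hypothetical. The non-linear regression against the functional form of wC is not shown.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_invert(g_values, temperature=315.0):
    """Return w(r) = -kB*T*ln[g(r)] for each sampled value of g(r)."""
    kt = KB_KCAL_PER_MOL_K * temperature
    # Bins that were never sampled (g = 0) map to an infinite free energy.
    return [-kt * math.log(g) if g > 0.0 else math.inf for g in g_values]


# g > 1 marks a favorable separation (negative w); g < 1 an unfavorable one.
pmf = boltzmann_invert([0.0, 0.5, 1.0, math.e])
```

At 315 K, kBT is about 0.63 kcal/mol, so wells of 3 to 4 kcal/mol correspond to strongly populated separations.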

FIG. 7.

Interaction potential, wC, for pairs of Q40 molecules. The potential of mean force for pairs of Q40 molecules was extracted from 18 independent all atom simulations at 315 K (orange curve) and was fit to Equation (13) using a non-linear regression procedure (black curve); wC captures the two modes of interactions observed for pairs of Q40 molecules, namely, entanglement and docking. Representative snapshots of entanglement and docking states are shown as insets. In these snapshots, Q40 molecules are shown in atomic detail in orange and grey. Additionally, translucent spheres that correspond to the radius of gyration, Rg, of each molecule are drawn as a visual aid to distinguish between entanglement (large overlap between spheres) and docking (limited, if any, overlap between spheres) states.

Parameters for the bonded and dihedral angle terms were also determined using the Boltzmann inversion procedure. Specifically, for each N-Q construct, we first calculated the positions of the centers-of-mass for residues within the N-block and over all of the residues within the Q-block. The simulation results at 315 K were used to extract the probability distributions for each of the bond lengths, bond angles, and dihedral angles that define the coarse-grained model. The all atom probability distributions were then used to generate the bond length, bond angle, and dihedral angle parameters as described in Subsection II C.

D. Parameterization of WLJ

As described in Subsection II B 2, the set of free parameters for WLJ is chosen to be pvdW ≡ [σChCh, εSS, εSC1, εSC2, …, εSCN, εWW, εWC1, εWC2, …, εWCN], where Ch denotes a charged residue bead, S denotes a strongly interacting residue bead, W denotes a weakly interacting residue bead, and CN denotes the colloidal bead of type N. Given that the N-Q constructs have only one colloidal bead, pvdW reduces to five free parameters. Explicitly, pvdW ≡ [σChCh, εSS, εSC, εWW, εWC] ≡ [p1, p2, p3, p4, p5]. All other parameters were prescribed as described in Subsection II B 2 or determined through Lorentz-Berthelot mixing rules.

1. Primary objective function

For each of the N-Q sequences, we strive to obtain an optimal set of parameters for pvdW such that the primary objective function, Ω, given by Equation (9) is minimized. The choice of parameters should maximize the overlaps between all distance pairs separated by at least five bonds between the all atom and coarse-grained models.

2. Auxiliary objective function

For the N-Q constructs, the Q-block is modeled as a colloidal bead in the coarse-grained model. Accordingly, for a sequence with n beads, only (n − 5) out of the m pairwise distances involve the Q-block. In machine learning approaches, additional consideration is given to specific features within the primary data through the inclusion of an auxiliary objective function. We added an auxiliary objective function Ωaux in a post-processing step to ensure that we give extra weight to the interactions between the N- and Q-blocks in generating the optimal values for pvdW. This weighting scheme is important since our goal is to determine how the degree of adsorption between N- and Q-blocks modulates Q-block driven aggregation. The auxiliary objective function is defined as

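$$
\Omega_{aux}(p_{vdW}) = \frac{\Omega(p_{vdW}) + \Omega_{N\text{-}Q}(p_{vdW})}{2}, \quad
\Omega_{N\text{-}Q}(p_{vdW}) = 1 - \frac{1}{n-3}\sum_{i=1}^{n-3} \frac{2 - \sum_{j=1}^{n_{bin}} \left|\rho_{ij}^{AA} - \rho_{ij}^{CG}(p_{vdW})\right|}{2}. \tag{15}
$$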

The term ΩN−Q is evaluated over distances between the Q-block colloidal bead and beads of the N-block involving residues that are separated by three or more bonds from the colloidal bead.
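The way the auxiliary objective combines the two scores can be sketched as follows. The averaging of Ω and ΩN−Q follows the definition of Ωaux. The per-distance score used here, one minus the overlap between normalized all atom and coarse-grained histograms, is an assumed form chosen to be consistent with the stated goal of maximizing overlaps, and it should be checked against Equation (9). All function names are hypothetical.

```python
def mean_mismatch(hists_aa, hists_cg):
    """One minus the mean overlap between paired, normalized histograms.

    Assumed form: overlap_i = (2 - sum_j |rho_aa[i][j] - rho_cg[i][j]|) / 2.
    """
    overlaps = []
    for rho_aa, rho_cg in zip(hists_aa, hists_cg):
        l1_distance = sum(abs(a - c) for a, c in zip(rho_aa, rho_cg))
        overlaps.append((2.0 - l1_distance) / 2.0)
    return 1.0 - sum(overlaps) / len(overlaps)


def omega_aux(omega, omega_nq):
    """Auxiliary objective: the average of the primary and N-Q scores."""
    return 0.5 * (omega + omega_nq)


# Identical histograms give a score of 0; non-overlapping ones give 1.
same = mean_mismatch([[0.5, 0.5, 0.0]], [[0.5, 0.5, 0.0]])
disjoint = mean_mismatch([[1.0, 0.0]], [[0.0, 1.0]])
```

Because ΩN−Q is computed only over distances that involve the colloidal bead, averaging it with Ω raises the weight of N-Q distances relative to their small share of the m pairwise distances.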

3. GPBO procedure for parameterization of pvdW

For each N-Q sequence of interest, we use a multi-step procedure that involves several generations (see Figure 8). The search space for finding the optimal parameters is systematically narrowed from one generation to the next and the procedure terminates if the range does not narrow any further. For a given range in parameter space, we apply the GPBO procedure iteratively, using the accumulated observations to refine the model and identify the most likely values of pvdW that will minimize Ω. We used the MATLAB implementation of Gardner and Weinberger to calculate the posterior probabilities and generate refined parameters.67 Each trial involves 500 GPBO iterations and each generation involves 20 independent trials. Within each generation, we collect 20 separate estimates for pvdW, one from each of the 20 trials. We then apply a post-processing step to generate x subsets of values for pvdW by choosing the parameters that minimize the auxiliary objective function Ωaux. Here, x = 20 if the generation number is less than three, x = 10 if the generation number is between three and six, and x = 5 if the generation number is greater than six. The subsets are used to narrow the parameter range for the next generation of the overall optimization procedure. The parameterization procedure is terminated if the post-processing step does not lead to a further narrowing of the parameter range as defined by a numerical threshold (see Figure 8). The optimal final values for pvdW are drawn from the final subset and correspond to the parameters that lead to the lowest value of Ωaux. Figure 9 shows how the parameter space is explored in a single trial that involves 500 iterations of the GPBO procedure. Table I shows the final values for pvdW that we obtain for each of the four N-Q sequences. Figure 10 summarizes the evolution of Ω and Ωaux for N17-Q40 for each generation of the optimization procedure. These plots demonstrate the minimization of the two objective functions as the optimization progresses.
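The post-processing step between generations can be sketched as follows. The subset sizes follow the schedule stated above, with "between three and six" read as inclusive and generations numbered from one (both are assumptions). Taking the new search range as the per-parameter minimum and maximum over the retained parameter sets is also an assumption, since the text does not spell out how the range is narrowed. The GPBO trials themselves are not shown, and all names are hypothetical.

```python
def subset_size(generation):
    """Number of parameter sets retained after post-processing."""
    if generation < 3:
        return 20
    if generation <= 6:
        return 10
    return 5


def select_subsets(estimates, generation):
    """Keep the x estimates with the lowest auxiliary objective.

    `estimates` is a list of (omega_aux, params) pairs, one per trial.
    """
    ranked = sorted(estimates, key=lambda item: item[0])
    return ranked[:subset_size(generation)]


def narrowed_range(selected):
    """Assumed rule: bounding box of the retained parameter sets."""
    columns = zip(*(params for _, params in selected))
    return [(min(col), max(col)) for col in columns]


# Toy data: 20 trials with two parameters each (values are placeholders).
trials = [(0.01 * i, (8.5 + 0.05 * i, 1.0 - 0.02 * i)) for i in range(20)]
kept = select_subsets(trials, generation=7)
next_range = narrowed_range(kept)
```

Under this reading, no filtering occurs in the first two generations (x = 20 out of 20 estimates), and the selection becomes progressively stricter in later generations.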

FIG. 8.

Flow chart of the multi-step procedure used for the parameterization of WLJ. This procedure combines 500 Gaussian process Bayesian optimization (GPBO) iterations based on the primary objective function, Ω, with a post-processing filtering of the optimal pvdW sets from 20 independent trials based on the auxiliary objective function, Ωaux. The post-processing filtering leads to a refinement of the parameter search space used for the next generation. The parameterization procedure is terminated if at least 8 generations have been completed and the overall change in the parameter range between consecutive generations is less than or equal to 0.05. The initial parameter range for each parameter pi was p1 = [8.5 Å 10.0 Å], p2 = [0.05 kcal/mol 1.5 kcal/mol], p3 = [0.3 kcal/mol 2.0 kcal/mol], p4 = [0.01 kcal/mol 0.3 kcal/mol], and p5 = [0.05 kcal/mol 0.75 kcal/mol].

FIG. 9.

Visualization of the exploration of the 2-dimensional parameter space (p2, p3) in a single trial for N17-Q40. This single trial contained 500 GPBO iterations. Although the GPBO procedure explored the 5-dimensional parameter space (p1, p2, p3, p4, p5), visualization of the GPBO procedure is only shown for the parameters p2 and p3 since these were the most sensitive parameters. (a) The binning in both dimensions was 0.2 kcal/mol. The color of each square indicates the average objective score, Ω(pvdW), for iterations in which (p2, p3) fall within this bin. Light blue colors indicate low average Ω(pvdW) scores. The number within each square indicates the first iteration in which (p2, p3) falls within this bin. The white squares indicate regions of the (p2, p3) parameter space that were not sampled within the 500 GPBO iterations. (b) Each dot corresponds to the (p2, p3) parameter set at a given iteration. As the iteration number increases, the colors move from cool (dark blue) to hot (dark red). Many regions of the (p2, p3) parameter space are sampled early on. However, a majority of the iterations sample a similar (p2, p3) parameter space. This parameter space corresponds to the low average objective score region of (a).

TABLE I.

Final pvdW values for each N-Q sequence.

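| N-Q sequence | p1 (Å) | p2 (kcal/mol) | p3 (kcal/mol) | p4 (kcal/mol) | p5 (kcal/mol) |
| --- | --- | --- | --- | --- | --- |
| N17-Q40 | 9.62 | 1.15 | 1.23 | 0.06 | 0.43 |
| N17W-Q40 | 8.54 | 0.41 | 0.66 | 0.02 | 0.65 |
| N17S-Q40 | 9.53 | 0.58 | 1.53 | 0.02 | 0.53 |
| E(KE)8-Q40 | 9.50 | ... | ... | ... | ... |

FIG. 10.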

Evolution of the average Ω(pvdW,o) (a) and average Ωaux(pvdW,o) (b) over subsequent generations of the optimization procedure for N17-Q40. Here, pvdW,o corresponds to the optimal parameter set, i.e., the set of parameters that yielded min(Ω) for each of the 20 independent trials in the generation. Here, Ω(pvdW,o) and Ωaux(pvdW,o) generally improve upon subsequent generations. However, comparison of panels (a) and (b) shows that Ω(pvdW,o) and Ωaux(pvdW,o) are not always correlated. Specifically, generation 7 has one of the lowest Ω(pvdW,o) values, but an intermediate Ωaux(pvdW,o) value. Panels (c) and (d) plot the average Ω(pvdW,o) and average Ωaux(pvdW,o) versus the overall change in the parameter range between consecutive generations. Colors move from cool to hot as the generations increase. Panels (c) and (d) show that at generation 12 the overall change in the parameter range is less than or equal to 0.05 and thus the optimization procedure was terminated.

In order to interpret the relatively small changes to the magnitudes of Ω and Ωaux, we performed a sensitivity analysis to assess how the objective functions change as we perturb individual parameters from the values that we obtain using the optimization procedure. Specifically, for a given parameter pi (i = 1, …, 5), each panel in Figure 11 shows how the different objective functions, viz., Ω, ΩN−Q, and Ωaux change as a function of changes to individual pi values. The objective functions are most sensitive to changes in parameters p1–p3 whereas changes to p4 and p5 do not have a material impact on the objective functions. Deviation from min(Ω) is most pronounced for p1 and p2. This suggests that p1 and p2 are important for capturing N-N intramolecular interactions correctly. Such a result is reasonable given that p1 defines the σ-value for charged beads and p2 defines the ε-value between strongly interacting residue beads within the N-block. For p3, positive fraction changes lead to similar Ω values but lead to appreciable changes in ΩN−Q. This result demonstrates the importance of including Ωaux in a post-processing step to focus the optimization procedure to regions in the parameter space that minimize Ω as well as Ωaux. The additional weight given to N-Q interactions highlights a central feature of machine learning approaches, whereby the learning of specific features that are relevant to the question of interest leads to systematic improvement of the objective function.

FIG. 11.

Parameter sensitivity analysis for N17-Q40. Given the final values for pvdW obtained from the optimization procedure, individual parameters were systematically varied in increments of 0.05 in the fractional change in order to determine the sensitivity of Ω, ΩN-Q, and Ωaux to each individual parameter. Here, fractional change is defined as (pi^test − pi^final)/pi^final, where pi^final is the optimal value for parameter pi and pi^test is the parameter used for the given simulation. Four independent 200 ns monomer simulations were conducted and the average objective scores over the independent simulations are plotted. For each subplot, an individual parameter (p1, p2, p3, p4, or p5) was varied from its final value, while all other parameters were held fixed at their final value. The objective scores are found to be least sensitive to parameters p4 and p5.
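The perturbation scheme used in the sensitivity analysis can be illustrated with a short sketch that generates test values for one parameter in fractional increments of 0.05 around its final value. The maximum fractional change of ±0.5 is an assumption made here, since the scanned range is not stated in the text. The function names are hypothetical, and the example value is p2 for N17-Q40 from Table I.

```python
def fractional_change(p_test, p_final):
    """Fractional change of a test value relative to the final value."""
    return (p_test - p_final) / p_final


def perturbed_values(p_final, max_fraction=0.5, step=0.05):
    """Test values spanning -max_fraction to +max_fraction in steps of `step`."""
    n_steps = round(max_fraction / step)
    return [p_final * (1.0 + k * step) for k in range(-n_steps, n_steps + 1)]


# Scan p2 for N17-Q40 (1.15 kcal/mol) while all other parameters stay fixed.
p2_scan = perturbed_values(1.15)
```

Each test value would then be used in its own set of coarse-grained monomer simulations, with the other four parameters held at their final values.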

Prior to using the coarse-grained model for comparative assessments of the influence of different N-block sequences on polyQ aggregation, we tested the quality of final coarse-grained models for each sequence. Figure 12 summarizes results comparing the conformational properties obtained using the coarse-grained model to results obtained from the all atom model at 315 K. The comparisons are shown for N17-Q40 and E(KE)8-Q40.

FIG. 12.

Comparison of conformational properties between all atom and coarse-grained models for N17-Q40 and E(KE)8-Q40. A monomer’s size and shape can be captured by its Rg and asphericity, δ, respectively. If δ = 0, then the monomer adopts globular conformations, whereas if δ = 1, then the monomer adopts rod-like conformations. Rg and δ values were determined from the eigenvalues of the gyration tensor over three and four independent monomer simulations for the all atom and coarse-grained models, respectively. For the all atom model, the gyration tensors were calculated over the centers-of-mass of each N-block residue and the center-of-mass of the Q-block. This choice allowed for a direct comparison to the gyration tensors calculated over the positions of the centers of beads in the coarse-grained model. Two-dimensional histograms of Rg versus δ are plotted for N17-Q40 all atom (1.a), N17-Q40 coarse-grained (1.b), E(KE)8-Q40 all atom (2.a), and E(KE)8-Q40 coarse-grained (2.b) models. Panels (1.c) and (2.c) quantify the integrated probabilities for the all atom and coarse-grained models within each cell of the 2-dimensional Rg versus δ plots for N17-Q40 and E(KE)8-Q40, respectively. Insets of (1.c) and (2.c) show representative snapshots for the all atom and coarse-grained models. Here, Gln is colored orange, other polar residues green, hydrophobic residues black, and positively and negatively charged residues red and blue, respectively. In both the all atom and coarse-grained models, N17-Q40 prefers compact/globular conformations with N17 adsorbed on Q40. In contrast, in both the all atom and coarse-grained models, E(KE)8-Q40 samples larger and more elongated conformations compared to N17-Q40.

For each conformation drawn from the all atom and coarse-grained ensembles, we calculated the eigenvalues of the gyration tensor. These eigenvalues (λi, i = 1, 2, 3) were used to compute the radius of gyration (Rg) and the asphericity δ as shown below,113

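$$
R_g = \sqrt{\lambda_1 + \lambda_2 + \lambda_3}, \quad
\delta = 1 - 3\,\frac{\lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3}{(\lambda_1 + \lambda_2 + \lambda_3)^2}. \tag{16}
$$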
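Both quantities in Equation (16) depend on the eigenvalues only through their sum and the sum of their pairwise products. These are the trace and the second invariant (sum of principal 2 × 2 minors) of the gyration tensor, so they can be evaluated without an explicit diagonalization. The sketch below uses this identity. It assumes equally weighted positions, for example bead centers or residue centers-of-mass, and the function names are hypothetical.

```python
import math


def gyration_tensor(coords):
    """3 x 3 gyration tensor of equally weighted positions."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return [[sum((c[a] - center[a]) * (c[b] - center[b]) for c in coords) / n
             for b in range(3)] for a in range(3)]


def rg_and_asphericity(coords):
    """Return (Rg, delta) from the invariants of the gyration tensor."""
    s = gyration_tensor(coords)
    trace = s[0][0] + s[1][1] + s[2][2]             # l1 + l2 + l3
    minors = (s[0][0] * s[1][1] - s[0][1] ** 2
              + s[0][0] * s[2][2] - s[0][2] ** 2
              + s[1][1] * s[2][2] - s[1][2] ** 2)   # l1*l2 + l1*l3 + l2*l3
    return math.sqrt(trace), 1.0 - 3.0 * minors / trace ** 2


# Limiting cases: a two-point rod and six points on the axes of a sphere.
rod = rg_and_asphericity([(-1, 0, 0), (1, 0, 0)])
ball = rg_and_asphericity([(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)])
```

The two limiting cases reproduce δ = 1 for a rod and δ = 0 for a spherically symmetric arrangement. The function is undefined when all positions coincide, because the trace is then zero.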

Figure 12 shows the joint distributions of Rg and δ values derived from all atom simulations and the coarse-grained model for N17-Q40 and E(KE)8-Q40. In each case, the two-dimensional space is tiled into a 3 × 3 grid. Figure 12 also shows the integrated probabilities within each cell of the 3 × 3 grid that are quantified for the two models. These results demonstrate that the global conformational properties of the coarse-grained model are congruent with those of the all atom model. Importantly, the quality of the comparative assessments is favorable and equivalent for N-Q sequences with very different types of N-block sequences. The computational efficiency of coarse-grained simulations increases by at least three orders of magnitude vis-à-vis the ABSINTH-based all atom simulations. The increased efficiency combined with the verifiable accuracy of the model justifies the deployment of the model in simulations of aggregation that involves ca. 10^3 N-Q molecules.

V. COARSE-GRAINED SIMULATIONS OF N-Q AGGREGATION

For each N-Q sequence, we performed eight independent LD simulations based on the coarse-grained model with residue beads for each N-block residue and a single colloidal bead for the Q-block residues. Each of these isothermal-isochoric simulations was performed in cubic boxes, 65.7 nm to a side, with periodic boundary conditions at 315 K. There were 512 N-Q molecules in the simulation cell and this corresponds to supersaturated solutions with a concentration of 3 mM. Each simulation was run for 200 ns. The total simulation time across eight independent simulations is 1.6 μs. For a given N-Q sequence, a 200 ns simulation takes four weeks to complete on a single core with compute nodes comprising 8 Intel E5430 Central Processing Units that share 4 GB of RAM. When comparing the simulation time scales to those for in vitro experiments, it is worth noting that the diffusivity of colloidal beads is O(10^2) larger than in experiments. This combined with the reduction in the number of degrees of freedom implies that the simulation time scales access processes that are relevant for the formation of precursors of large-scale aggregates.
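The stated concentration follows directly from the box size and the number of molecules. The short calculation below is included as a consistency check; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration(n_molecules, box_edge_nm):
    """Concentration in mol/L for molecules in a cubic box of the given edge."""
    volume_liters = box_edge_nm ** 3 * 1e-24  # 1 nm^3 = 1e-24 L
    return n_molecules / AVOGADRO / volume_liters


concentration_mM = molar_concentration(512, 65.7) * 1e3
total_time_us = 8 * 200 / 1000.0  # eight simulations of 200 ns each
```

The result is approximately 3 mM, and the aggregate simulation time per sequence is 1.6 μs, in agreement with the values quoted above.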

Aggregates are distinguishable by the number of molecules, the overall sizes, shapes, and packing densities. Our focus is on comparative assessments of the morphologies of aggregates that are the precursors of large individual aggregates that form over long time scales either due to barrier-limited processes or as a consequence of phenomena such as Ostwald ripening.114–116 Given the finite sizes of the simulation systems, the aggregates of interest are those that incorporate at least an order of magnitude of the molecules in the central simulation cell. We analyzed the morphologies of aggregates that comprise at least nine molecules using the last 120 ns of each simulation.73 The robustness of this approach was established by applying similar analyses to all eight independent simulations for each construct.

In order to test the hypothesis that the degree of adsorption between N- and Q-blocks modulates the bias towards linear aggregates, we determined the preference for linear aggregates for each N-Q construct. We applied methods derived from network-based analysis described in Section IV A to distinguish between compact and linear aggregate types based on the number of polyQ communities. The packing density of these communities determines the morphologies of aggregates. Pairs of polyQ colloidal beads are in the same community if the distance between the beads is less than or equal to 7 Å. The centroids of each polyQ-community within an aggregate are used to calculate the size of the aggregate defined as its radius of gyration and denoted as RA. The values of RA and the average size of an individual polyQ-community defined as the average radius of gyration of communities within the aggregate and denoted as RQ are related to each other via the parameter χ, which is the packing density of polyQ communities. The quantities RA, RQ, χ, and the number of polyQ communities nQ follow the relation

$$
\chi R_A^3 = n_Q R_Q^3 \;\Rightarrow\; R_A^3 = \frac{n_Q R_Q^3}{\chi}. \tag{17}
$$

The size of an aggregate is governed by the size, number, and packing density of polyQ communities. Setting χ = 0.5 yields a threshold value for RA that allows us to distinguish between compact versus linear aggregates in terms of the average size and number of individual polyQ communities nQ within an aggregate. The threshold value of RA takes the form

$$
R_t = (2 n_Q)^{1/3} R_Q. \tag{18}
$$

Aggregates are compact if RA ≤ Rt and they are linear if RA > Rt. The threshold values of Rt are useful for distinguishing between the morphologies of aggregates with similar or identical numbers of polyQ communities.
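The classification rule can be expressed in a few lines. The sketch below computes the threshold radius from the packing-density relation, with χ = 0.5 as the default, and labels an aggregate as compact or linear. The function names and the numerical inputs in the example are hypothetical.

```python
def threshold_radius(n_q, r_q, chi=0.5):
    """Threshold R_t = (n_Q / chi)**(1/3) * R_Q; equals (2 n_Q)**(1/3) R_Q for chi = 0.5."""
    return (n_q / chi) ** (1.0 / 3.0) * r_q


def classify_aggregate(r_a, n_q, r_q, chi=0.5):
    """Label an aggregate as 'compact' (R_A <= R_t) or 'linear' (R_A > R_t)."""
    return "compact" if r_a <= threshold_radius(n_q, r_q, chi) else "linear"


# Placeholder example: four polyQ communities with an average size of 10 Å.
r_t = threshold_radius(4, 10.0)
labels = [classify_aggregate(r_a, 4, 10.0) for r_a in (15.0, 25.0)]
```

For four communities of 10 Å each, the threshold is 20 Å, so an aggregate with RA = 15 Å is labeled compact and one with RA = 25 Å is labeled linear.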

Figure 13 summarizes the properties of the aggregates for different constructs by quantifying the joint probability histograms in the two-parameter space of nQ and RA. The histograms are drawn from simulation results for polyQ molecules with and without different N-block sequences. In each panel of Figure 13, the red curve is a plot of Rt. For a given value of nQ, linear and compact aggregates lie above and below the curve for Rt, respectively. For each construct, we integrated the probabilities for forming linear aggregates (PLA) defined as those that lie above the reference Rt curve. The corresponding probability for forming compact aggregates is (1 − PLA) and the parameter of interest is the ratio S = PLA/(1 − PLA). In Figure 14, we plot S obtained from the coarse-grained simulations against dA, the degree of adsorption between N- and Q-blocks that we quantified from our analysis of the all atom simulations for the four constructs with 17-residue N-blocks. The Pearson product moment correlation coefficient, which measures the linear correlation between S and dA, is 0.98. The implication is that the degree of adsorption between the N- and Q-blocks determines the sequence-encoded extent of interactions between N- and Q-blocks and this in turn determines the bias toward forming linear aggregates.
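The ratio S and the correlation coefficient can be computed as in the sketch below. The input values in the example are placeholders chosen only to exercise the functions; they are not the data plotted in Figure 14. The function names are hypothetical.

```python
import math


def linear_to_compact_ratio(p_la):
    """S = P_LA / (1 - P_LA), the odds of linear versus compact aggregates."""
    return p_la / (1.0 - p_la)


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient of two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder inputs (not data from the paper): adsorption values and P_LA.
d_a_example = [0.0, 0.2, 0.4, 0.6]
s_example = [linear_to_compact_ratio(p) for p in (0.1, 0.3, 0.5, 0.7)]
r_example = pearson_r(d_a_example, s_example)
```

Because S is an odds ratio, it grows rapidly as PLA approaches one, so a high linear correlation between S and dA indicates a strong dependence of the linear-aggregate bias on adsorption.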

FIG. 13.

Two-dimensional histograms of the number of polyQ communities, nQ, within an aggregate versus the radius of gyration of the aggregate, RA. The final 120 ns of eight independent simulations was analyzed for each N-Q sequence. Only aggregates with greater than eight molecules were considered for these analyses. The trends are found to be the same if the cutoff for aggregate size was set to be six or ten (data not shown). In each plot, the red curve defines the cutoff between compact and linear aggregates, Rt, for a given nQ based on Equation (18). As can be observed from panel (b), the choice of this threshold separates the two distinct density regions of aggregate sizes, RA, for nQ = 3 and 4. N17W-Q40, N17-Q40, and N17S-Q40 form linear aggregates, whereas E(KE)8-Q40 and Q40 form compact aggregates.

FIG. 14.

The average degree of adsorption between N- and Q-blocks from all atom simulations of N-Q monomers (dA) versus the relative probability of forming linear aggregates in the coarse-grained model (S). Here, r denotes the Pearson product moment correlation coefficient. The probability of forming linear aggregates is highly correlated with the degree of adsorption between N- and Q-blocks from all atom monomer simulations of N-Q sequences.

Figure 15 summarizes the frequencies of forming aggregates with different numbers of polyQ communities, nQ, that correspond to different morphologies. Each panel includes representative structures of aggregates formed by different constructs and these pictures provide visual support for the results summarized in Figure 14. The aggregate distributions for N17W-Q40, N17-Q40, and N17S-Q40 are bimodal with preferences for small compact aggregates (red bars, nQ = 1 or 2) and linear aggregates (green bars). The preference for forming linear aggregates increases from N17W-Q40 to N17-Q40 to N17S-Q40 and this correlates with an increase in dA. In contrast, Q40 and E(KE)8-Q40 prefer larger compact aggregates and the distributions are essentially monodisperse.

FIG. 15.

Probability of forming aggregates with different numbers of polyQ communities, nQ, and different morphologies (compact versus linear) for (a) N17W-Q40, (b) N17-Q40, (c) N17S-Q40, (d) E(KE)8-Q40, and (e) Q40. The black vertical line delineates between compact versus linear morphologies. The abscissa quantifies the number of polyQ communities, i.e., the number of distinct clusters of Q-beads within an aggregate using a cutoff of 7 Å. For example, if nQ = 3, then the Q-beads cluster into 3 distinct communities (see bars labeled 3 in both the compact (red) and linear (green) regions for representative aggregates). Communities can consist of one to O(10) molecules given the ability of the Q-beads to entangle with each other. Red bars represent the probabilities of compact aggregates and green bars represent the probabilities of linear aggregates with a given nQ. Probabilities of forming distinct morphologies and nQ values were taken from the last 120 ns of eight independent coarse-grained simulations containing 512 molecules for each of the N-Q sequences. Only aggregates with at least nine molecules were considered for this analysis. Insets for each panel show representative aggregates for the most probable aggregate types.

The observations from the coarse-grained simulations yield predictive expectations regarding the outcomes of kinetics experiments. We expect a smaller barrier and smaller lag times associated with forming large linear aggregates if the sequence in question shows a pronounced bias toward forming linear aggregates. Our results indicate that sequences lacking the N-block favor compact aggregates as shown in panel (e) of Figures 13 and 15. This is consistent with pronounced lag times that have been observed for forming linear, fibrillar aggregates for polyQ molecules.81 Our results also show that the N17-Q40 sequence has a clear bias toward forming linear aggregates. These findings are also consistent with the recent experimental data, which show that N17 engenders a sequence-encoded bias for the rapid formation of linear aggregates.

We tested the sensitivity of aggregate distributions to the choice of parameters (pvdW). For this, we selected the optimal set of pvdW values from five randomly selected generations. Figure S6 of the supplementary material73 shows the optimal pvdW set for each of the five generations for N17-Q40. The results of seven independent coarse-grained simulations containing 512 molecules at 750 μM were analyzed for parameter sets drawn from each randomly chosen generation. The resultant aggregate distributions are shown in Figure S7 of the supplementary material.73 Each distribution was compared to the distribution resulting from the final optimal parameter set for N17-Q40. To quantify the correlation between aggregate distributions obtained using “suboptimal parameters” and the best set of parameters, we computed the Pearson product-moment correlation coefficients between the reference distribution and the distributions obtained using the supposedly suboptimal sets of parameters. These values range between 0.97 and 0.99, indicating that the final choice of pvdW values is relatively insensitive to the generation number. Therefore, a single generation of the GPBO procedure would have been sufficient for parameterizing the coarse-grained model for N-Q constructs.
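The comparison of aggregate distributions can be sketched as follows. This is a minimal illustration that assumes each distribution is stored as a vector of probabilities over the same ordered set of aggregate types (morphology and nQ). The function name `pearson_r` and the two vectors are hypothetical placeholders, not data from this work.

```python
import numpy as np

def pearson_r(p, q):
    """Pearson product-moment correlation between two equal-length vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    dp = p - p.mean()
    dq = q - q.mean()
    return float(dp @ dq / np.sqrt((dp @ dp) * (dq @ dq)))

# Hypothetical probabilities over the same bins of (morphology, nQ);
# the numbers are illustrative only.
reference = np.array([0.05, 0.10, 0.05, 0.30, 0.35, 0.15])
candidate = np.array([0.06, 0.12, 0.04, 0.28, 0.33, 0.17])

r = pearson_r(reference, candidate)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates that the two parameter sets populate the aggregate types in nearly the same proportions. Note that the coefficient is insensitive to a uniform rescaling or shift of one distribution, so it measures similarity in shape rather than agreement in absolute probabilities.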

VI. DISCUSSION

A. Physical significance of the findings from coarse-grained simulations of block-copolymeric molecules with polyQ tracts

Two distinct models have been put forth for the observed effects of N17 on polyQ aggregation. According to the helix bundling model, N17 modules form oligomeric helical bundles that enable an increase in the local concentration of polyQ domains.92 A subsequent rate-limiting step corresponds to the conversion of polyQ domains to beta-sheet conformations that leads to the formation of linear aggregates, although the details of this process are not fully defined. The helix bundling model is appealing for its conceptual simplicity. However, it has two weaknesses: First, the bundling of N17 helices is observed at concentrations that are at least two orders of magnitude larger than the saturation concentration of typical polyQ domains.81 Hence, it becomes difficult to explain how the weak self-association of N17 modules contributes to increasing the effective concentration of polyQ modules when the measured driving forces suggest the opposite scenario. Second, none of the published kinetics data indicates the presence of discernible lag times for the aggregation of N17-polyQ molecules. Such lag times would be expected given the significant barrier that should accompany conversion from oligomers lacking beta-sheet structure to linear aggregates that are rich in beta-sheet structure. An alternative is the domain cross talk model97 that is not focused on the internal conformational properties of individual blocks but focuses instead on generic properties that are reminiscent of the physics of patchy colloids. The model has four components: First, unfolding and adsorption of the N-block on the Q-block is driven by interactions between uncharged residues within the N-block.96 Second, adsorption of the N-block engenders a disordered patchy colloidal architecture106 for N17-polyQ molecules with the charged residues in the N-block being exposed to solvent.96 Third, the patch limits random intermolecular encounters between polyQ modules. And fourth, the conformation of the N-block patch is not rigid and this enables intermolecular interactions between the N-block of one N17-polyQ molecule and the Q-block of another N17-polyQ molecule.

The physics of patchy colloids is directly relevant for explaining the observed aggregation properties of N17-polyQ molecules. Isotropic attractions between uniformly sticky colloidal beads will lead to compact, hexagonally close-packed aggregates as opposed to linear aggregates. In patchy colloids, the interactions between surface adsorbed patches are repulsive. The morphologies of aggregates formed by patchy colloids are determined by a combination of the range of the repulsive interactions vis-à-vis the attractions, the number of patches per particle, and the relative sizes of repulsive patches versus those of colloidal beads.105

Our findings for sequences with designed N-blocks support the transferability of the tenets of patchy colloids to explain the observations for N-Q sequences. The approach of using designed N-block sequences also results in a set of testable predictions. These predictions suggest that the polyampholytic N-block, E(KE)8, will engender a strong preference for distinct architectures at the monomer level whereby the N-block forms an extended tail and the polyQ globule forms a compact head. The degree of adsorption between the N- and Q-blocks is zero for this sequence. Consequently, these sequences form a homogeneous distribution of compact as opposed to linear aggregates in simulations based on the coarse-grained model. Therefore, the aggregates formed by E(KE)8-polyQ should be monodisperse species with solvent exposed polyampholytic tails resembling the architectures for reverse micelles. The conversion to large, linear fibrillar aggregates should be the slowest for such sequences. In contrast, the architectures of aggregates formed by N17W-polyQ and N17S-polyQ should show weaker and stronger intrinsic biases, respectively, toward linear aggregates when compared to the N17-polyQ sequence. We predict that these differences should be reflected in the measured driving forces for forming linear versus spherical aggregates and in the time scales for the conversion from spherical to linear aggregates. Importantly, these predictions require simulations that can access the relevant spatial scales. Our coarse-graining methodology makes this possible because it affords computational efficiency while preserving the ability to capture the effects of sequence-specific interactions.

B. Improvements and generalizations to the CAMELOT approach

It is imperative that we extend our current simulations to access at least O(10^4) molecules to avoid finite size artifacts and make direct contact with the spatial scales that prevail in aggregation experiments. Such simulations combined with systematic concentration titrations will enable predictions of phase diagrams for block-copolymeric sequences. In this work, we used a hybrid model that includes residue-specific beads for the N-block and a single colloidal bead for the residues of the Q-block. The interactions between Q-blocks were modeled using a colloidal potential that captures the two main features of the self-associations of homopolymeric globules, namely, entanglement and docking. The pair potential for colloidal beads can be generalized through additional considerations: It is conceivable that three-body effects will modulate the balance between docking and entanglement that are captured using pair potentials. We should be able to test for the presence of cooperative effects by parameterizing the WC term against all atom simulations that include more than two polyQ molecules. It is conceivable that the barrier for converting to linear aggregates is lowered at larger length scales due to conformational fluctuations that enable the modulation of the range of attractive interactions between colloidal beads while also altering the balance between entanglements and docking. The CAMELOT approach incorporates the necessary versatility for modeling such emergent properties. This can be achieved by fixing the number of molecules and performing a series of independent simulations, each characterized by a quenched degree of adsorption and distinct functional forms for colloidal potentials to model differences in the range of attractive interactions and the energy scales for entanglements versus docking. The results of each set of independent simulations can be stitched together using appropriate methods to generate predictive phase diagrams to quantify how a specific block-copolymeric sequence encodes its aggregation landscape as a function of the degree of adsorption between N- and Q-blocks and features of the colloidal potentials.

The hybrid resolution used in this work has general applicability for other block-copolymeric systems involved in aggregation and phase separation (see Figure 1). This includes sequences with globular folded domains as established in Section II.73 Our analysis of communities of residues gleaned from all atom simulation results for a sequence with a disordered bristle tethered to an ordered SH3 domain shows that a hybrid resolution with residue beads for the disordered bristle and a colloidal bead for the SH3 domain would be justifiable for efficient simulations of the phase behavior of these classes of molecules (see Figure 1). The colloidal bead could have specific residue beads adsorbed on its surface as patches or the interactions between SH3 domains could be modeled using directional potentials between colloidal beads. It is also possible to design hybrid resolutions with fewer residues per colloidal bead, thus interpolating between the purely bead per residue resolution on one end and the single colloidal bead for the entire sequence on the other end. Such interpolations would leverage the blob concept introduced by de Gennes,117 which has proven to be useful in understanding the phase behavior of synthetic polymers.118

The CAMELOT based coarse-grained simulations provide insights regarding the distributions of molecular aggregates. A particularly important question pertains to the conformational properties of individual molecules within aggregates and the barriers for interconversion between different conformational basins. These questions should be addressable using mixed resolution generalizations of the CAMELOT approach that combine atomistic descriptions of an individual chain interacting with an explicitly modeled bath of coarse-grained molecules. The coupling between the all atom representation and the coarse-grained molecules in the bath can be modeled via a direct mapping between the all atom representation and the single bead per residue model. This would represent a generalized adaptation of the multiscale coarse-graining approach of Izvekov and Voth.119 Bottlenecks in conformational sampling for mixed resolution models can be overcome by adapting recent generalizations of the resolution exchange approach120 that have been proposed by Chen and coworkers.121,122

C. General connections between coarse-graining and machine learning

The CAMELOT approach is guided by the precepts that underlie force matching and other information-driven algorithms for systematic coarse-graining. Our use of machine learning aided methods is reminiscent of recent successes in reparameterizing molecular mechanics forcefields.123 We envisage the prospect of using deep learning algorithms that draw on data from simulations performed at multiple resolutions.124–126 The tenets of deep learning can be anchored in the framework of the renormalization group theory.127 This should allow automated discoveries of the couplings between, and the correlated fluctuations of, collective degrees of freedom that determine the sequence-specific phase diagrams for different block-copolymeric systems. As noted above, we would also like to be able to mix distinct resolutions with each other in order to extract finer structures, specifically conformational changes that are compatible with coarse-grained results. Such approaches will require a well-defined structure for the coarse-grained potentials across all resolutions, and the functional form proposed for Weff and the overall structure of the CAMELOT algorithm would appear to provide the necessary ingredients.

D. Final summary

We have developed a new method, named CAMELOT, that combines Boltzmann inversion, non-linear regression, and machine learning to develop a coarse-grained model whereby the appropriate resolution and optimal parameters are, respectively, inferred and learned from detailed all atom simulations. The CAMELOT algorithm is available as a general toolbox that can be plugged into generic molecular modeling suites such as the CAMPARI package (http://campari.sourceforge.net). This enables the automated transfer of information from the all atom simulations into a suitable coarse-grained model that is then run using the LAMMPS package (http://lammps.sandia.gov). In this work, we introduced the general framework for the CAMELOT approach and demonstrated its utility by deploying a hybrid resolution for coarse-grained simulations of molecules with polyQ tracts. The results obtained from the coarse-grained simulations are in accord with the hypothesis that adsorption of flanking sequences on the polyQ domain generates patchy colloid architectures. The deployment of the proposed approach to study aggregation and phase separation in a range of block-copolymeric sequences should enable improved understanding of the sequence determinants of phase behavior such as the biases for liquids comprised of random coils versus solids based on linear, fibrillar aggregates.
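As a reminder of the first ingredient named in this summary, the sketch below shows the generic textbook form of Boltzmann inversion, U(x) = -kB T ln P(x), applied to a one-dimensional coordinate sampled from a reference simulation. It is an illustration under stated assumptions, not the CAMELOT implementation: the function name `boltzmann_invert`, the temperature of 298 K, the bin count, and the synthetic Gaussian samples are all hypothetical, and Jacobian corrections (for example the r^2 factor for radial distances) are omitted.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def boltzmann_invert(samples, bins=50, temperature=298.0):
    """Estimate a potential of mean force from samples of one coordinate.

    Returns bin centers and U(x) = -kB*T*ln P(x), shifted so min(U) = 0.
    Empty bins are dropped because ln(0) is undefined.
    """
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    populated = hist > 0
    u = -KB * temperature * np.log(hist[populated])
    return centers[populated], u - u.min()

# Synthetic stand-in for an all atom distribution of a bond-like
# coordinate (hypothetical mean 3.8 and width 0.1, in Angstrom).
rng = np.random.default_rng(0)
samples = rng.normal(loc=3.8, scale=0.1, size=200_000)

x, u = boltzmann_invert(samples)

# For a Gaussian distribution the inverted potential is harmonic, so a
# quadratic fit near the minimum recovers an effective force constant.
core = np.abs(x - 3.8) < 0.2
k_eff = 2.0 * np.polyfit(x[core], u[core], 2)[0]
print(f"effective force constant ~ {k_eff:.1f} kcal/(mol A^2)")
```

The inverted potential is only as good as the sampling of the reference distribution, which is one reason converged all atom simulations are a prerequisite for this kind of parameterization.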

Acknowledgments

The National Institutes of Health supported this work through Grant No. 5R01NS056114. We acknowledge usage of computational resources provided by the Center for High Performance Computing at Washington University in St. Louis. We are grateful to Jacob Gardner and Kilian Weinberger for introducing us to Bayesian optimization methods and providing the initial code for implementing the GPBO part of this work. We thank Rahul Das, Alex Holehouse, and Jason Wagoner for helpful discussions during the early stages of this work. Gregory Bowman, Jianhan Chen, Gerhard Hummer, and Andreas Vitalis contributed several insightful critiques and suggestions that helped in revising the manuscript.

REFERENCES

  • 1. Schmidt H. B. and Goerlich D., Elife 4, e04251 (2015). doi: 10.7554/elife.04251
  • 2. Beun L. H., Storm I. M., Werten M. W. T., de Wolf F. A., Stuart M. A. C., and de Vries R., Biomacromolecules 15, 3349 (2014). doi: 10.1021/bm500826y
  • 3. Ando D., Zandi R., Kim Y. W., Colvin M., Rexach M., and Gopinathan A., Biophys. J. 106, 1997 (2014). doi: 10.1016/j.bpj.2014.03.021
  • 4. Li L., Tong Z., Jia X., and Kiick K. L., Soft Matter 9, 665 (2013). doi: 10.1039/C2SM26812D
  • 5. Ge Z. and Liu S., Chem. Soc. Rev. 42, 7289 (2013). doi: 10.1039/c3cs60048c
  • 6. DiMarco R. L. and Heilshorn S. C., Adv. Mater. 24, 3923 (2012). doi: 10.1002/adma.201200051
  • 7. Humenik M., Magdeburg M., and Scheibel T., J. Struct. Biol. 186, 431 (2014). doi: 10.1016/j.jsb.2014.03.010
  • 8. Huang W., Rollett A., and Kaplan D. L., Expert Opin. Drug Delivery 12, 779 (2015). doi: 10.1517/17425247.2015.989830
  • 9. Tokareva O., Jacobsen M., Buehler M., Wong J., and Kaplan D. L., Acta Biomater. 10, 1612 (2014). doi: 10.1016/j.actbio.2013.08.020
  • 10. Rabotyagova O. S., Cebe P., and Kaplan D. L., Biomacromolecules 12, 269 (2011). doi: 10.1021/bm100928x
  • 11. Valluzzi R., Winkler S., Wilson D., and Kaplan D. L., Philos. Trans. R. Soc., B 357, 165 (2002). doi: 10.1098/rstb.2001.1032
  • 12. Srinivasan N. and Kumar S., Wiley Interdiscip. Rev.: Nanomed. Nanobiotechnol. 4, 204 (2012). doi: 10.1002/wnan.1160
  • 13. Rauscher S. and Pomes R., in Fuzziness: Structural Disorder in Protein Complexes, edited by Fuxreiter M. and Tompa P. (Springer Science & Business Media, 2012), p. 159.
  • 14. Buell A. K., Galvagnion C., Gaspar R., Sparr E., Vendruscolo M., Knowles T. P. J., Linse S., and Dobson C. M., Proc. Natl. Acad. Sci. U. S. A. 111, 7671 (2014). doi: 10.1073/pnas.1315346111
  • 15. Pappu R. V., Wang X., Vitalis A., and Crick S. L., Arch. Biochem. Biophys. 469, 132 (2008). doi: 10.1016/j.abb.2007.08.033
  • 16. Treusch S. and Lindquist S., J. Cell Biol. 197, 369 (2012). doi: 10.1083/jcb.201108146
  • 17. Toretsky J. A. and Wright P. E., J. Cell Biol. 206, 579 (2014). doi: 10.1083/jcb.201404124
  • 18. Lee C., Occhipinti P., and Gladfelter A. S., J. Cell Biol. 208, 533 (2015). doi: 10.1083/jcb.201407105
  • 19. Weber S. C. and Brangwynne C. P., Curr. Biol. 25, 641 (2015). doi: 10.1016/j.cub.2015.01.012
  • 20. Elbaum-Garfinkle S., Kim Y., Szczepaniak K., Chen C. C.-H., Eckmann C. R., Myong S., and Brangwynne C. P., Proc. Natl. Acad. Sci. U. S. A. 112, 7189 (2015). doi: 10.1073/pnas.1504822112
  • 21. Chiu Fan L., Brangwynne C. P., Gharakhani J., Hyman A. A., and Julicher F., Phys. Rev. Lett. 111, 088101 (2013). doi: 10.1103/physrevlett.111.088101
  • 22. Brangwynne C. P., J. Cell Biol. 203, 875 (2013). doi: 10.1083/jcb.201308087
  • 23. Uversky V. N., Kuznetsova I. M., Turoverov K. K., and Zaslavsky B., FEBS Lett. 589, 15 (2015). doi: 10.1016/j.febslet.2014.11.028
  • 24. Patel A., Lee H. O., Jawerth L., Maharana S., Jahnel M., Hein M. Y., Stoynov S., Mahamid J., Saha S., Franzmann T. M., Pozniakovski A., Poser I., Maghelli N., Royer L. A., Weigert M., Myers E. W., Grill S., Drechsel D., Hyman A. A., and Alberti S., “A liquid-to-solid phase transition of the ALS protein FUS accelerated by disease mutation,” Cell 162(5), 1066 (2015). doi: 10.1016/j.cell.2015.07.047
  • 25. Srinivasan N., Bhagawati M., Ananthanarayanan B., and Kumar S., Nat. Commun. 5, 5145 (2014). doi: 10.1038/ncomms6145
  • 26. Malinovska L., Palm S., Gibson K., Verbavatz J. M., and Alberti S., “Dictyostelium discoideum has a highly Q/N-rich proteome and shows an unusual resilience to protein aggregation,” Proc. Natl. Acad. Sci. U. S. A. 112(20), E2620 (2015). doi: 10.1073/pnas.1504459112
  • 27. Kwon I., Xiang S., Kato M., Wu L., Theodoropoulos P., Wang T., Kim J., Yun J., Xie Y., and McKnight S. L., Science 345, 1139 (2014). doi: 10.1126/science.1254917
  • 28. Lai J., Koh C. H., Tjota M., Pieuchot L., Raman V., Chandrababu K. B., Yang D., Wong L., and Jedd G., Proc. Natl. Acad. Sci. U. S. A. 109, 15781 (2012). doi: 10.1073/pnas.1207467109
  • 29. Kato M., Han T. W., Xie S., Shi K., Du X., Wu L. C., Mirzaei H., Goldsmith E. J., Longgood J., Pei J., Grishin N. V., Frantz D. E., Schneider J. W., Chen S., Li L., Sawaya M. R., Eisenberg D., Tycko R., and McKnight S. L., Cell 149, 753 (2012). doi: 10.1016/j.cell.2012.04.017
  • 30. Han T. W., Kato M., Xie S., Wu L. C., Mirzaei H., Pei J., Chen M., Xie Y., Allen J., Xiao G., and McKnight S. L., Cell 149, 768 (2012). doi: 10.1016/j.cell.2012.04.016
  • 31. van der Lee R., Buljan M., Lang B., Weatheritt R. J., Daughdrill G. W., Dunker A. K., Fuxreiter M., Gough J., Gsponer J., Jones D. T., Kim P. M., Kriwacki R. W., Oldfield C. J., Pappu R. V., Tompa P., Uversky V. N., Wright P. E., and Babu M. M., Chem. Rev. 114, 6589 (2014). doi: 10.1021/cr400525m
  • 32. Uversky V. N., Biotechnol. J. 10, 356 (2015). doi: 10.1002/biot.201400374
  • 33. Li N. K., Quiroz F. G., Hall C. K., Chilkoti A., and Yingling Y. G., Biomacromolecules 15, 3522 (2014). doi: 10.1021/bm500658w
  • 34. Das R. K., Ruff K. M., and Pappu R. V., Curr. Opin. Struct. Biol. 32, 102 (2015). doi: 10.1016/j.sbi.2015.03.008
  • 35. Binder K., Muller M., Virnau P., and MacDowell L. G., in Advanced Computer Simulation Approaches for Soft Matter Sciences I, edited by Holm C. and Kremer K. (Springer, 2005), p. 1.
  • 36. Murtola T., Bunker A., Vattulainen I., Deserno M., and Karttunen M., Phys. Chem. Chem. Phys. 11, 1869 (2009). doi: 10.1039/b818051b
  • 37. Brini E., Algaer E. A., Ganguly P., Li C., Rodriguez-Ropero F., and van der Vegt N. F. A., Soft Matter 9, 2108 (2013). doi: 10.1039/C2SM27201F
  • 38. Li Y., Abberton B. C., Kroeger M., and Liu W. K., Polymers 5, 751 (2013). doi: 10.3390/polym5020751
  • 39. Noid W. G., J. Chem. Phys. 139, 090901 (2013). doi: 10.1063/1.4818908
  • 40. Marrink S. J., Risselada H. J., Yefimov S., Tieleman D. P., and de Vries A. H., J. Phys. Chem. B 111, 7812 (2007). doi: 10.1021/jp071097f
  • 41. Kar P., Gopal S. M., Cheng Y. M., Predeus A., and Feig M., J. Chem. Theory Comput. 9, 3769 (2013). doi: 10.1021/ct400230y
  • 42. Frembgen-Kesner T., Andrews C. T., Li S., Nguyet Anh N., Shubert S. A., Jain A., Olayiwola O. J., Weishaar M. R., and Elcock A. H., J. Chem. Theory Comput. 11, 2341 (2015). doi: 10.1021/acs.jctc.5b00038
  • 43. Andrews C. T. and Elcock A. H., J. Chem. Theory Comput. 10, 5178 (2014). doi: 10.1021/ct5006328
  • 44. van Hoof B., Markvoort A. J., van Santen R. A., and Hilbers P. A. J., Biophys. J. 100, 309a (2011). doi: 10.1016/j.bpj.2010.12.1888
  • 45. Moore T. C., Iacovella C. R., and McCabe C., J. Chem. Phys. 140, 224104 (2014). doi: 10.1063/1.4880555
  • 46. Seeger M., Int. J. Neural Syst. 14, 69 (2004). doi: 10.1142/S0129065704001899
  • 47. Davtyan A., Dama J. F., Voth G. A., and Andersen H. C., J. Chem. Phys. 142, 154104 (2015). doi: 10.1063/1.4917454
  • 48. Lanyuan L., Dama J. F., and Voth G. A., J. Chem. Phys. 139, 121906 (2013). doi: 10.1063/1.4811667
  • 49. Dama J. F., Sinitskiy A. V., McCullagh M., Weare J., Roux B., Dinner A. R., and Voth G. A., J. Chem. Theory Comput. 9, 2466 (2013). doi: 10.1021/ct4000444
  • 50. R. D. Hills, Jr., Lu L., and Voth G. A., PLoS Comput. Biol. 6, e1000827 (2010). doi: 10.1371/journal.pcbi.1000827
  • 51. Pu L., Qiang S., Daume III H., and Voth G. A., J. Chem. Phys. 129, 214114 (2008). doi: 10.1063/1.3033218
  • 52. Izvekov S., Swanson J. M. J., and Voth G. A., J. Phys. Chem. B 112, 4711 (2008). doi: 10.1021/jp710339n
  • 53. Noid W. G., Chu J.-W., Ayton G. S., and Voth G. A., J. Phys. Chem. B 111, 4116 (2007). doi: 10.1021/jp068549t
  • 54. Chu J. W. and Voth G. A., Biophys. J. 90, 1572 (2006). doi: 10.1529/biophysj.105.073924
  • 55. Chu J. W., Izvekov S., and Voth G. A., Mol. Simul. 32, 211 (2006). doi: 10.1080/08927020600612221
  • 56. Wang Y. and Voth G. A., J. Phys. Chem. B 114, 8735 (2010). doi: 10.1021/jp1007768
  • 57. Hone T. D., Izvekov S., and Voth G. A., J. Chem. Phys. 122, 054105 (2005). doi: 10.1063/1.1836731
  • 58. Izvekov S., Parrinello M., Burnham C. J., and Voth G. A., J. Chem. Phys. 120, 10896 (2004). doi: 10.1063/1.1739396
  • 59. Rudzinski J. F. and Noid W. G., J. Phys. Chem. B 118, 8295 (2014). doi: 10.1021/jp501694z
  • 60. Mullinax J. W. and Noid W. G., J. Phys. Chem. C 114, 5661 (2010). doi: 10.1021/jp9073976
  • 61. Mullinax J. W. and Noid W. G., Phys. Rev. Lett. 103, 198104 (2009). doi: 10.1103/physrevlett.103.198104
  • 62. Chaimovich A. and Shell M. S., J. Chem. Phys. 134, 094112 (2011). doi: 10.1063/1.3557038
  • 63. Chaimovich A. and Shell M. S., Phys. Rev. E 81, 060104(R) (2010). doi: 10.1103/physreve.81.060104
  • 64. Shell M. S., J. Chem. Phys. 129, 144108 (2008). doi: 10.1063/1.2992060
  • 65. Das R. K. and Pappu R. V., Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013). doi: 10.1073/pnas.1304749110
  • 66. Auton M. and Bolen D. W., Methods Enzymol. 428, 397 (2007). doi: 10.1016/s0076-6879(07)28023-1
  • 67. Gardner J., Kusner M., Xu Z., Weinberger K., and Cunningham J., in Proceedings of the 31st International Conference on Machine Learning (ICML-14) (JMLR Workshop and Conference Proceedings, 2014), p. 937.
  • 68. Mittal A., Das R. K., Vitalis A., and Pappu R. V., in Computational Approaches to Protein Dynamics: From Quantum to Coarse-Grained Methods, edited by Fuxreiter M. (CRC Press, Boca Raton, FL, 2015).
  • 69. Vitalis A. and Pappu R. V., J. Comput. Chem. 30, 673 (2009). doi: 10.1002/jcc.21005
  • 70. Radhakrishnan A., Vitalis A., Mao A. H., Steffen A. T., and Pappu R. V., J. Phys. Chem. B 116, 6862 (2012). doi: 10.1021/jp212637r
  • 71. Vitalis A. and Pappu R. V., Annu. Rep. Comput. Chem. 5, 49 (2009). doi: 10.1016/s1574-1400(09)00503-9
  • 72. Mao A. H., Crick S. L., Vitalis A., Chicoine C. L., and Pappu R. V., Proc. Natl. Acad. Sci. U. S. A. 107, 8183 (2010). doi: 10.1073/pnas.0911107107
  • 73. See supplementary material at http://dx.doi.org/10.1063/1.4935066 E-JCPSA6-143-021598 for supplementary figures and additional analysis as well as description of methods.
  • 74. Lee C. C., Walters R. H., and Murphy R. M., Biochemistry 46, 12810 (2007). doi: 10.1021/bi700806c
  • 75. Landrum E. and Wetzel R., J. Biol. Chem. 289, 10254 (2014). doi: 10.1074/jbc.C114.552943
  • 76. Wetzel R., J. Mol. Biol. 421, 466 (2012). doi: 10.1016/j.jmb.2012.01.030
  • 77. Walker F. O., Lancet 369, 218 (2007). doi: 10.1016/S0140-6736(07)60111-1
  • 78. Becher M. W., Kotzuk J. A., Sharp A. H., Davies S. W., Bates G. P., Price D. L., and Ross C. A., Neurobiol. Dis. 4, 387 (1998). doi: 10.1006/nbdi.1998.0168
  • 79. Crick S. L., Jayaraman M., Frieden C., Wetzel R., and Pappu R. V., Proc. Natl. Acad. Sci. U. S. A. 103, 16764 (2006). doi: 10.1073/pnas.0608175103
  • 80. Holehouse A. S., Garai K., Lyle N., Vitalis A., and Pappu R. V., J. Am. Chem. Soc. 137, 2984 (2015). doi: 10.1021/ja512062h
  • 81. Crick S. L., Ruff K. M., Garai K., Frieden C., and Pappu R. V., Proc. Natl. Acad. Sci. U. S. A. 110, 20075 (2013). doi: 10.1073/pnas.1320626110
  • 82. Serio T. R., Cashikar A. G., Kowal A. S., Sawicki G. J., Moslehi J. J., Serpell L., Arnsdorf M. F., and Lindquist S. L., Science 289, 1317 (2000). doi: 10.1126/science.289.5483.1317
  • 83. Walters R. H. and Murphy R. M., J. Mol. Biol. 412, 505 (2011). doi: 10.1016/j.jmb.2011.07.003
  • 84. Vitalis A. and Pappu R. V., Biophys. Chem. 159, 14 (2011). doi: 10.1016/j.bpc.2011.04.006
  • 85. ten Wolde P. R. and Frenkel D., Science 277, 1975 (1997). doi: 10.1126/science.277.5334.1975
  • 86. Tobelmann M. D. and Murphy R. M., Biophys. J. 100, 2773 (2011). doi: 10.1016/j.bpj.2011.04.028
  • 87. Duennwald M. L., Jagadish S., Muchowski P. J., and Lindquist S., Proc. Natl. Acad. Sci. U. S. A. 103, 11045 (2006). doi: 10.1073/pnas.0604547103
  • 88. Sahoo B., Singer D., Kodali R., Zuchner T., and Wetzel R., Biochemistry 53, 3897 (2014). doi: 10.1021/bi500300c
  • 89. Mishra R., Jayaraman M., Roland B. P., Landrum E., Fullam T., Kodali R., Thakur A. K., Arduini I., and Wetzel R., J. Mol. Biol. 415, 900 (2012). doi: 10.1016/j.jmb.2011.12.011
  • 90. Mishra R., Hoop C. L., Kodali R., Sahoo B., van der Wel P. C. A., and Wetzel R., J. Mol. Biol. 424, 1 (2012). doi: 10.1016/j.jmb.2012.09.011
  • 91. Jayaraman M., Mishra R., Kodali R., Thakur A. K., Koharudin L. M. I., Gronenborn A. M., and Wetzel R., Biochemistry 51, 2706 (2012). doi: 10.1021/bi3000929
  • 92. Jayaraman M., Kodali R., Sahoo B., Thakur A. K., Mayasundari A., Mishra R., Peterson C. B., and Wetzel R., J. Mol. Biol. 415, 881 (2012). doi: 10.1016/j.jmb.2011.12.010
  • 93. Sivanandam V. N., Jayaraman M., Hoop C. L., Kodali R., Wetzel R., and van der Wel P. C. A., J. Am. Chem. Soc. 133, 4558 (2011). doi: 10.1021/ja110715f
  • 94. Tam S., Spiess C., Auyeung W., Joachimiak L., Chen B., Poirier M. A., and Frydman J., Nat. Struct. Mol. Biol. 16, 1279 (2009). doi: 10.1038/nsmb.1700
  • 95. Thakur A. K., Jayaraman M., Mishra R., Thakur M., Chellgren V. M., Byeon I.-J. L., Anjum D. H., Kodali R., Creamer T. P., Conway J. F., Gronenborn A. M., and Wetzel R., Nat. Struct. Mol. Biol. 16, 380 (2009). doi: 10.1038/nsmb.1570
  • 96. Williamson T. E., Vitalis A., Crick S. L., and Pappu R. V., J. Mol. Biol. 396, 1295 (2010). doi: 10.1016/j.jmb.2009.12.017
  • 97. Kokona B., Rosenthal Z. P., and Fairman R., Biochemistry 53, 6738 (2014). doi: 10.1021/bi500449a
  • 98. Vissers T., Smallenburg F., Munao G., Preisler Z., and Sciortino F., J. Chem. Phys. 140, 144902 (2014). doi: 10.1063/1.4869834
  • 99. Preisler Z., Vissers T., Munao G., Smallenburg F., and Sciortino F., Soft Matter 10, 5121 (2014). doi: 10.1039/c4sm00505h
  • 100. Preisler Z., Vissers T., Smallenburg F., Munao G., and Sciortino F., J. Phys. Chem. B 117, 9540 (2013). doi: 10.1021/jp404053t
  • 101. Sciortino F. and Zaccarelli E., Curr. Opin. Solid State Mater. Sci. 15, 246 (2011). doi: 10.1016/j.cossms.2011.07.003
  • 102. Ruzicka B., Zaccarelli E., Zulian L., Angelini R., Sztucki M., Moussaid A., Narayanan T., and Sciortino F., Nat. Mater. 10, 56 (2011). doi: 10.1038/nmat2921
  • 103. Sciortino F., Giacometti A., and Pastore G., Phys. Chem. Chem. Phys. 12, 11869 (2010). doi: 10.1039/c0cp00504e
  • 104. Sciortino F., Collect. Czech. Chem. Commun. 75, 349 (2010). doi: 10.1135/cccc2009109
  • 105. Giacometti A., Lado F., Largo J., Pastore G., and Sciortino F., J. Chem. Phys. 132, 174110 (2010). doi: 10.1063/1.3415490
  • 106. Bianchi E., Tartaglia P., Zaccarelli E., and Sciortino F., J. Chem. Phys. 128, 144504 (2008). doi: 10.1063/1.2888997
  • 107. Foffi G. and Sciortino F., J. Phys. Chem. B 111, 9702 (2007). doi: 10.1021/jp074253r
  • 108. Bianchi E., Largo J., Tartaglia P., Zaccarelli E., and Sciortino F., Phys. Rev. Lett. 97, 168301 (2006). doi: 10.1103/physrevlett.97.168301
  • 109. Newman M. E. J., Proc. Natl. Acad. Sci. U. S. A. 103, 8577 (2006). doi: 10.1073/pnas.0601602103
  • 110. Leicht E. A. and Newman M. E. J., Phys. Rev. Lett. 100, 118703 (2008). doi: 10.1103/physrevlett.100.118703
  • 111. Sethi A., Tian J., Vu D. M., and Gnanakaran S., Biophys. J. 103, 748 (2012). doi: 10.1016/j.bpj.2012.06.052
  • 112. Ruff K. M., Khan S. J., and Pappu R. V., Biophys. J. 107, 1226 (2014). doi: 10.1016/j.bpj.2014.07.019
  • 113. Steinhauser M. O., J. Chem. Phys. 122, 094901 (2005). doi: 10.1063/1.1846651
  • 114. Zhang J. and Muthukumar M., J. Chem. Phys. 130, 035102 (2009). doi: 10.1063/1.3050295
  • 115. Baldan A., J. Mater. Sci. 37, 2171 (2002). doi: 10.1023/A:1015388912729
  • 116. Baldan A., J. Mater. Sci. 37, 2379 (2002). doi: 10.1023/A:1015408116016
  • 117. de Gennes P.-G., Scaling Concepts in Polymer Physics (Cornell University Press, Ithaca, London, 1979).
  • 118. Uematsu T., Svanberg C., and Jacobsson P., Macromolecules 38, 6227 (2005). doi: 10.1021/ma050478t
  • 119. Izvekov S. and Voth G. A., J. Phys. Chem. B 109, 2469 (2005). doi: 10.1021/jp044629q
  • 120. Lyman E., Ytreberg F. M., and Zuckerman D. M., Phys. Rev. Lett. 96, 028105 (2006). doi: 10.1103/PhysRevLett.96.028105
  • 121. Zhang W. and Chen J., J. Chem. Theory Comput. 10, 918 (2014). doi: 10.1021/ct500031v
  • 122. Lee K. H. and Chen J., J. Comput. Chem. (published online 2015). doi: 10.1002/jcc.23957
  • 123. Wang L.-P., Martinez T. J., and Pande V. S., J. Phys. Chem. Lett. 5, 1885 (2014). doi: 10.1021/jz500737m
  • 124.Bengio Y., Courville A., and Vincent P., IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798 (2013). 10.1109/TPAMI.2013.50 [DOI] [PubMed] [Google Scholar]
  • 125.Salakhutdinov R. and Hinton G., Neural Comput. 24, 1967 (2012). 10.1162/NECO_a_00311 [DOI] [PubMed] [Google Scholar]
  • 126.Larochelle H., Mandel M., Pascanu R., and Bengio Y., J. Mach. Learn. Res. 13, 643 (2012). [Google Scholar]
  • 127.Mehta P. B. and Schwab D. J., e-print arXiv:1410.3831 (2014).
  • 128.Alberti S., Halfmann R., King O., Kapila A., and Lindquist S., Cell 137, 146 (2009). 10.1016/j.cell.2009.02.044 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Halfmann R., Alberti S., Krishnan R., Lyle N., O’Donnell C. W., King O. D., Berger B., Pappu R. V., and Lindquist S., Mol. Cell 43, 72 (2011). 10.1016/j.molcel.2011.05.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Li P., Banjade S., Cheng H. C., Kim S., Chen B., Guo L., Llaguno M., Hollingsworth J. V., King D. S., Banani S. F., Russo P. S., Jiang Q. X., Nixon B. T., and Rosen M. K., Nature 483, 336 (2012). 10.1038/nature10879 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Data Citations

  1. See supplementary material at http://dx.doi.org/10.1063/1.4935066 E-JCPSA6-143-021598 for supplementary figures and additional analysis as well as description of methods.

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
