Yang et al. 10.1073/pnas.0707284105.

Supporting Information

Files in this Data Supplement:

SI Text
SI Figure 5
SI Figure 6
SI Figure 7
SI Figure 8
SI Figure 9
SI Figure 10
SI Figure 11




SI Figure 5

Fig. 5. Clustering results with various rmsd cutoffs, $d_c$, for protein A (A) and Villin (B). For clarity, only clusters consisting of more than 10 structures are shown. In all cases, the structures in the cluster with the highest average connectivity, $\langle k\rangle$, have a very low average rmsd, $\langle \mathrm{rmsd}\rangle$, from the experimental structure. The cluster with the second-highest $\langle k\rangle$ is composed of mirror images.
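The clustering in Fig. 5 can be sketched as a graph construction. Conformations are nodes, an edge joins any pair whose rmsd is below $d_c$, and clusters are the connected components. The sketch below is a minimal illustration that rests on our own assumptions. It takes a precomputed symmetric rmsd matrix. It also reads the average connectivity $\langle k\rangle$ of a cluster as the mean number of edges (node degree) per member. The function names are hypothetical.

```python
from itertools import combinations


def cluster_by_cutoff(dist, d_c):
    """Connected components of the graph with an edge wherever dist < d_c.

    dist: symmetric n x n matrix (list of lists) of pairwise rmsd values,
    assumed to be precomputed. Returns (clusters, degree): clusters are
    sorted lists of node indices; degree[i] is the number of edges of node i.
    """
    n = len(dist)
    adj = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if dist[i][j] < d_c:  # structurally similar pair -> edge
            adj[i].append(j)
            adj[j].append(i)

    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], []
        seen[start] = True
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        clusters.append(sorted(component))
    return clusters, [len(neighbors) for neighbors in adj]


def rank_by_connectivity(clusters, degree, min_size=1):
    """(<k>, cluster) pairs for clusters with >= min_size members, highest <k> first."""
    ranked = [
        (sum(degree[i] for i in c) / len(c), c)
        for c in clusters
        if len(c) >= min_size
    ]
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)
```

The figure shows only clusters of more than 10 structures. To apply the same display filter, call `rank_by_connectivity(clusters, degree, min_size=11)`.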





SI Figure 6

Fig. 6. Relaxation behavior of the rmsd, rmsd($t$), and the total energy, $E(t)$, for protein A and Villin at $T \approx 300$ K. Double-exponential fits (red curves) describe all four cases well. The relaxation times $\tau_{\text{slow}}$ and $\tau_{\text{fast}}$ for each curve are given in the plots.
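A double-exponential fit like the one in Fig. 6 can be sketched with NumPy. The model assumes a constant offset: $y(t) = c + a_{\text{fast}} e^{-t/\tau_{\text{fast}}} + a_{\text{slow}} e^{-t/\tau_{\text{slow}}}$. The sketch does not use a general nonlinear optimizer. It scans a user-supplied grid of candidate relaxation times and solves for the amplitudes by linear least squares at each grid point, a separable least-squares approach. It illustrates the idea and is not the fitting procedure used for the figure.

```python
import numpy as np


def fit_double_exponential(t, y, tau_grid):
    """Fit y(t) ~ c + a_fast*exp(-t/tau_fast) + a_slow*exp(-t/tau_slow).

    For every pair tau_fast < tau_slow drawn from tau_grid, the amplitudes
    (c, a_fast, a_slow) enter linearly and are solved by least squares;
    the pair with the smallest sum of squared residuals is returned.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    taus = sorted(tau_grid)
    best = None
    for i, tau_fast in enumerate(taus):
        for tau_slow in taus[i + 1:]:
            design = np.column_stack(
                [np.ones_like(t), np.exp(-t / tau_fast), np.exp(-t / tau_slow)]
            )
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((design @ coef - y) ** 2))
            if best is None or sse < best["sse"]:
                best = {
                    "tau_fast": tau_fast,
                    "tau_slow": tau_slow,
                    "offset": float(coef[0]),
                    "a_fast": float(coef[1]),
                    "a_slow": float(coef[2]),
                    "sse": sse,
                }
    return best
```

A grid scan is crude, but it always returns the global optimum over the grid. It can also seed a finer nonlinear refinement.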





SI Figure 7

Fig. 7. A schematic view of the concept of a structural graph and flux, $F$. Each node (colored oval) represents a single conformation, and edges (solid lines) connect structurally similar conformations, as determined by the criterion $d < d_c$, where $d$ is a similarity measure. Different colors indicate conformations from different trajectories, and wavy lines indicate the direction of time $t$. Nodes that are linked by edges, either directly or in several steps, belong to the same cluster. To determine the kinetic significance of a cluster, we use the flux, $F$, defined as the fraction of all trajectories that pass through the cluster. Because all trajectories eventually fold into the native state, the native cluster has $F = 1$ (rightmost cluster). Other clusters with $F = 1$, through which all trajectories must pass, can be interpreted as obligatory intermediate states.
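The flux $F$ defined in Fig. 7 can be computed directly once each stored conformation carries two labels: its cluster and the trajectory it came from. The sketch below assumes that input as two parallel lists, a hypothetical data layout. It also assumes every trajectory contributes at least one conformation. Clusters with $F = 1$ are flagged as candidate obligatory states.

```python
from collections import defaultdict


def cluster_flux(cluster_of_node, trajectory_of_node):
    """F(c) = fraction of all trajectories with at least one conformation in cluster c."""
    n_trajectories = len(set(trajectory_of_node))
    visitors = defaultdict(set)
    for cluster, trajectory in zip(cluster_of_node, trajectory_of_node):
        visitors[cluster].add(trajectory)
    return {c: len(trajs) / n_trajectories for c, trajs in visitors.items()}


def obligatory_clusters(flux):
    """Clusters visited by every trajectory (F = 1)."""
    return sorted(c for c, f in flux.items() if f == 1.0)
```

Counting distinct trajectories per cluster, rather than conformations, matters here. A trajectory that lingers in a cluster still contributes only once to its flux.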





SI Figure 8

Fig. 8. Structural superimpositions of all TSE structures obtained by $p_{\text{fold}}$ analysis for protein A (Left) and Villin (Right). The H1-H2 hairpin is formed in the TSE of both proteins. The average total helicity of the whole chain, measured by the criterion of Kabsch and Sander (6), is 81% for protein A and 65% for Villin. The average radius of gyration of the ensembles, $\langle R_g\rangle$, is 14.2 Å for protein A and 10.3 Å for Villin.
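For reference, the radius of gyration quoted in Fig. 8 is the root-mean-square distance of the atoms from their centroid: $R_g = \sqrt{\frac{1}{M}\sum_{m=1}^{M} |\mathbf{r}_m - \mathbf{r}_{\text{cm}}|^2}$. The sketch below assumes equal weights for all atoms and leaves open which atoms are included (for example, Cα only). The ensemble average $\langle R_g\rangle$ is taken as the plain mean over structures. The function names are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted R_g of one structure; coords is a list of (x, y, z) tuples in Å."""
    m = len(coords)
    center = [sum(p[axis] for p in coords) / m for axis in range(3)]
    mean_sq = sum(
        sum((p[axis] - center[axis]) ** 2 for axis in range(3)) for p in coords
    ) / m
    return math.sqrt(mean_sq)


def ensemble_rg(structures):
    """<R_g> over an ensemble given as a list of coordinate lists."""
    return sum(radius_of_gyration(s) for s in structures) / len(structures)
```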





SI Figure 9

Fig. 9. Experimental $\Phi$ values for different types of mutations in protein A. Because our calculation method counts contacts only in the wild type, it can be regarded as modeling WT→Gly mutations. Surprisingly, this method also reproduces the experimental $\Phi$ values for other types of mutations.





SI Figure 10

Fig. 10. Computational counterpart of FRET signal distributions for protein A: probability distributions of $1/r^6$, where $r$ is the end-to-end distance, in the native and unfolded states. $r$ is the Cα-Cα distance between residues 10 and 56 (numbering as in PDB ID code 1BDD). The native ensemble consists of the 147 lowest-energy conformations, one from each of the 147 trajectories used to study the folding of protein A. The unfolded ensemble consists of all snapshots taken at times $t < 20$ million MC steps, excluding the initial configurations.
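The quantity histogrammed in Fig. 10 can be computed per snapshot as sketched below. One way to see why $1/r^6$ tracks a FRET signal is the Förster relation $E = 1/[1 + (r/R_0)^6]$. For $r \gg R_0$ it reduces to $E \approx (R_0/r)^6$. The data layout is assumed to be a mapping from residue number to Cα coordinates. That layout, the binning choice and the function names are illustrative assumptions. Residues 10 and 56 are the pair named in the caption.

```python
import math

import numpy as np


def inverse_r6(ca_coords, res_a=10, res_b=56):
    """1/r^6 for the Cα-Cα distance r (Å) between two residues of one snapshot.

    ca_coords: dict mapping residue number (PDB 1BDD numbering) to an (x, y, z) tuple.
    """
    r = math.dist(ca_coords[res_a], ca_coords[res_b])
    return r ** -6


def probability_distribution(values, bins=50):
    """Normalized histogram (probability density) of per-snapshot 1/r^6 values."""
    density, edges = np.histogram(values, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density
```

With `density=True`, the histogram integrates to one over its range. This lets the native and unfolded ensembles, which differ greatly in size, be compared on the same axis.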





SI Figure 11

Fig. 11. Simulated $\Phi$ values obtained with a folding cutoff of 2 Å instead of 3 Å. The TSEs contain 65 structures for protein A and 109 for Villin. The results are comparable to those obtained with the 3-Å cutoff, indicating that they are insensitive to the cutoff value (for protein A, $|\Phi_{\text{exp}} - \Phi_{\text{sim}}| = 0.14$ on average).





SI Text

Simulation. Protein A (1BDD) and Villin (1VII) were chosen as model proteins. For protein A, the unstructured tails (residues 1-9 and 57-60) were truncated; for Villin, W64 was mutated to A64 and the C-terminal hydrophobic residue F76 was truncated. For each protein, 2,000 independent Monte Carlo (MC) simulations starting from different random-coil configurations were conducted at  for $10^8$ steps. The melting temperature of protein A, , was obtained from simulations at different temperatures (data not shown), whereas the experimental melting temperature is 346 K (1). The MC temperature thus corresponds to . Global and local moves were used for backbone rotation (2). To preserve detailed balance, the knowledge-based move (2, 3) was not used, and the local move set was modified (4, 5) (see below). For each trajectory, snapshots were stored every $10^6$ MC steps, and the minimum-energy structure was recorded at the end of the simulation.

The Side-Chain Torsional Angle Energy.

The side-chain torsional angle energy (J.S.Y., P. S. Kutchukian, and E.I.S., unpublished results) was obtained from the same database as our previous study (2) by

 [S1]

where  and  are, respectively, the number of observations in the $j$th bin of a side-chain torsional angle $\chi$ of residue $A_i$ and the total number of observations minus  for a triplet consisting of $A_{i-1}$, $A_i$, and $A_{i+1}$. The bin width was 30°, and the value of  was chosen to make the net interaction zero.
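The binning and counting behind this knowledge-based potential can be sketched as follows. The sketch assumes a generic inverse-Boltzmann form, $E_j = -\ln(N_j/\langle N\rangle)$, with a uniform reference count $\langle N\rangle$ per bin and an added pseudocount. It is not the exact expression of Eq. S1, whose normalization (including the constant chosen to make the net interaction zero) may differ. The function names and the input format are hypothetical. Only the 30° bin width comes from the text.

```python
import math
from collections import defaultdict

BIN_WIDTH = 30.0                   # degrees, as stated in the text
N_BINS = int(360.0 / BIN_WIDTH)    # 12 bins covering one full turn


def chi_bin(chi_deg):
    """Index j of the 30° bin containing a torsion angle given in degrees."""
    return int((chi_deg % 360.0) // BIN_WIDTH)


def torsion_energy_table(observations, pseudocount=1.0):
    """Generic inverse-Boltzmann table E_j = -ln(N_j / <N>) per (A_{i-1}, A_i, A_{i+1}) triplet.

    observations: iterable of (triplet, chi_deg) pairs, where triplet is a tuple of
    three residue names. The pseudocount (an assumption) avoids ln(0) for empty bins.
    """
    counts = defaultdict(lambda: [0] * N_BINS)
    for triplet, chi_deg in observations:
        counts[triplet][chi_bin(chi_deg)] += 1

    table = {}
    for triplet, bins in counts.items():
        smoothed = [n + pseudocount for n in bins]
        mean = sum(smoothed) / N_BINS  # expected count per bin for a uniform distribution
        table[triplet] = [-math.log(n / mean) for n in smoothed]
    return table
```

Frequently populated bins receive negative (favorable) energies and sparsely populated bins receive positive ones.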

Detailed Balance for the Local Move.

Dodd et al. (5) developed a local move composed of a concerted rotation of seven adjacent bonds. They also showed that, because changes in these seven degrees of freedom are correlated, a modified acceptance criterion, rather than the conventional Metropolis rule, must be used to preserve detailed balance. We follow their procedure, in which the probability of accepting a move from the old state o to the new state n is given by (5)

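$$P_{\text{acc}}(o \to n) = \min\left[1,\ \frac{N(n)\,J(n)\,\exp[-U(n)/T]}{N(o)\,J(o)\,\exp[-U(o)/T]}\right], \qquad \text{[S2]}$$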

where N is the number of solutions, U is the potential energy, T is temperature, and J is the Jacobian determinant.
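The acceptance rule of Dodd et al. (5) can be sketched as a small function. The form assumed here is $P_{\text{acc}}(o \to n) = \min\{1, [N(n)J(n)e^{-U(n)/T}]/[N(o)J(o)e^{-U(o)/T}]\}$. In it, $N(n)$ is the number of chain-closure solutions found in the forward move (from which $n$ is picked uniformly), and $N(o)$ is the number found in the reverse move. $J$ is the absolute value of the Jacobian determinant. The ratio is evaluated in log space to avoid overflow. The function names and argument list are hypothetical.

```python
import math
import random


def concerted_rotation_acceptance(n_old, n_new, jac_old, jac_new, u_old, u_new, temperature):
    """Acceptance probability min(1, N(n) J(n) e^{-U(n)/T} / (N(o) J(o) e^{-U(o)/T})).

    n_old, n_new: numbers of closure solutions for the reverse and forward moves (>= 1).
    jac_old, jac_new: absolute Jacobian determinants (> 0).
    u_old, u_new: potential energies; temperature: T in the same energy units.
    """
    log_ratio = (
        math.log(n_new) + math.log(jac_new) - u_new / temperature
        - math.log(n_old) - math.log(jac_old) + u_old / temperature
    )
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def accept_move(p_acc, rng=random):
    """Accept with probability p_acc (rng.random() is uniform on [0, 1))."""
    return rng.random() < p_acc
```

With $N(n) = N(o)$ and $J(n) = J(o)$, the rule reduces to the ordinary Metropolis criterion. The extra factors correct for the correlated, non-uniform proposal of the concerted rotation.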

$p_{\text{fold}}$ Analysis.
The transition state ensemble (TSE) and $\Phi$ values were obtained by the following procedure. From representative trajectories, structures just before entry into the native cluster were selected as putative transition state structures. Starting from each of these structures, 100 short MC simulations ($10^6$ MC steps each) were run. A run is considered folded if it reaches a structure whose rmsd from the top-$k$ structure is <3 Å. The simulated $\Phi$ values are not sensitive to this criterion (see SI Fig. 11). A structure with  is regarded as a member of the TSE, where $p_{\text{fold}}$ is the fraction of the 100 runs that folded. The $\Phi$ values are calculated by

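$$\Phi_i = \frac{\langle N_i^{\text{TSE}}\rangle}{\langle N_i^{\text{top-}k}\rangle}, \qquad \text{[S3]}$$

where $\langle N_i^{\text{top-}k}\rangle$ and $\langle N_i^{\text{TSE}}\rangle$ are the average numbers of contacts of residue $i$ in the 10 top-$k$ structures and in the TSE, respectively.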
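The $p_{\text{fold}}$ and $\Phi$ calculations described above reduce to a few lines. The sketch makes two assumptions. First, the minimum rmsd from the top-$k$ structure has already been recorded for each short run. Second, per-residue contact counts are available for each structure, with the contact definition left to the caller. The $p_{\text{fold}}$ window that defines TSE membership is a required argument and is not given a value. All function names are hypothetical. A helper for the average $|\Phi_{\text{exp}} - \Phi_{\text{sim}}|$, the comparison quoted for Fig. 11, is also included.

```python
def p_fold(min_rmsds, cutoff=3.0):
    """Fraction of short runs that folded, i.e. reached rmsd < cutoff (Å) from the top-k structure."""
    return sum(r < cutoff for r in min_rmsds) / len(min_rmsds)


def in_tse(pfold_value, window):
    """True if p_fold lies inside the (low, high) window that defines the TSE."""
    low, high = window
    return low <= pfold_value <= high


def mean_contacts(structures):
    """Per-residue average contact count; each structure is a list of counts, one per residue."""
    n = len(structures)
    return [sum(s[i] for s in structures) / n for i in range(len(structures[0]))]


def phi_values(top_k_contacts, tse_contacts):
    """Phi_i = <N_i^TSE> / <N_i^top-k>; None where the residue has no native contacts."""
    native = mean_contacts(top_k_contacts)
    tse = mean_contacts(tse_contacts)
    return [t / n if n > 0 else None for t, n in zip(tse, native)]


def mean_abs_deviation(phi_exp, phi_sim):
    """Average |Phi_exp - Phi_sim| over residues with both values available."""
    pairs = [(e, s) for e, s in zip(phi_exp, phi_sim) if e is not None and s is not None]
    return sum(abs(e - s) for e, s in pairs) / len(pairs)
```

Passing `cutoff=2.0` to `p_fold` corresponds to the stricter folding criterion examined in SI Fig. 11.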

1. Vu DM, Myers JK, Oas TG, Dyer RB (2004) Biochemistry 43:3582-3589.

2. Yang JS, Chen WW, Skolnick J, Shakhnovich EI (2007) Structure (London) 15:53-63.

3. Chen WW, Yang JS, Shakhnovich EI (2007) Proteins 66:682-688.

4. Coutsias EA, Seok C, Jacobson MP, Dill KA (2004) J Comput Chem 25:510-528.

5. Dodd LR, Boone TD, Theodorou DN (1993) Mol Phys 78:961-996.

6. Kabsch W, Sander C (1983) Biopolymers 22:2577-2637.