Biophysical Journal. 2007 Dec 7;94(7):2516–2528. doi: 10.1529/biophysj.107.113225

Analyzing Forced Unfolding of Protein Tandems by Ordered Variates, 2: Dependent Unfolding Times

E Bura*, D K Klimov, V Barsegov
PMCID: PMC2267135  PMID: 18065466

Abstract

Statistical analyses of forced unfolding data for protein tandems, i.e., unfolding forces (force-ramp) and unfolding times (force-clamp), used in single-molecule dynamic force spectroscopy rely on the assumption that the unfolding transitions of individual protein domains are independent (uncorrelated) and characterized, respectively, by identically distributed unfolding forces and unfolding times. In our previous work, we showed that in the experimentally accessible piconewton force range, this assumption, which holds at a lower constant force, may break at an elevated force level, i.e., the unfolding transitions may become correlated when force is increased. In this work, we develop much needed statistical tests for assessing the independence of the unobserved forced unfolding times for individual protein domains in the tandem and equality of their parent distributions, which are based solely on the observed ordered unfolding times. The use and performance of these tests are illustrated through the analysis of unfolding times for computer models of protein tandems. The proposed tests can be used in force-clamp atomic force microscopy experiments to obtain accurate information on protein forced unfolding and to probe data on the presence of interdomain interactions. The order statistics-based formalism is extended to cover the analysis of correlated unfolding transitions. The use of order statistics leads naturally to the development of new kinetic models, which describe the probabilities of ordered unfolding transitions rather than the populations of chemical species.

INTRODUCTION

Most mechanically active proteins perform their biological function in linear tandems of “head-to-tail” connected repeats. For example, ubiquitin (Ub), a naturally occurring multimer of identical Ub repeats, is involved in protein degradation and several signaling pathways (1,2). The giant protein titin plays a crucial role in muscle contraction and relaxation. Titin spans almost half of the muscle sarcomere and consists of ∼300 domains and 30,000 amino acids (3,4). There are two types of titin domains, immunoglobulin (Ig) and fibronectin (Fn) modules, which are linked in a tandem. The number of Ig domains varies from 37 to 90 in different titin molecules (5,6). Fibronectin is composed of ∼20 distinct Fn domains of type FnI–FnIII. FnIII contains multiple binding sites for integrin receptors of the extracellular matrix (5). In ddFLN, a dimeric filamin from Dictyostelium discoideum, and in human filamin (A protein), a single chain is composed of a rod-like tandem of several Ig domains (6–8). Filamins, which also form multidomain tandems, play an important role in cellular locomotion (6,7).

In single-molecule atomic force microscopy (AFM) experiments, the consecutive unfolding transitions of protein domains in a tandem or a polyprotein are analyzed by applying constant mechanical force (force-clamp mode) or time-dependent force (force-ramp) (9–14). In force-ramp AFM experiments, the force-induced unraveling of protein tandems results in sawtooth profiles of the unfolding forces, {f1, f2, …, fn}, which correspond to the unfolding of individual protein domains. In force-clamp AFM probes, the force-induced tension in the tandem chain results in the stepwise elongation of the tandem end-to-end distance, X. For the polyubiquitin chain (Ubn, 3 < n < 12), elongation of X in steps of ΔX ≈ 20 nm was used to identify the unfolding transitions in the individual Ub domains (15,16).

Current statistical analyses of forced unfolding data for protein tandems ((D)n) rely on the assumption that (a) the forced unfolding transitions of individual domains (D) are mutually independent (uncorrelated), and that (b) the recorded unfolding forces (force-ramp) and unfolding times (force-clamp) are realizations of the same probability density function (pdf) (13,14,16–18). Said differently, these analyses are based on the assumption that the unfolding times and forces form a set of independent identically distributed (iid) random variables. In our previous computer simulation studies of forced unfolding, hereafter referred to as Study 1 (19), we tested the validity of the “iid assumption” in an experimentally accessible piconewton range of applied constant force. We showed that the uncorrelated forced unfolding transitions, observed for the model tandem S2–S2–S2, become correlated when the applied force is increased (19).

In a typical force-clamp AFM experiment on a tandem D1–D2–…–Dn, the recorded first, second, etc. forced unfolding times, t1:n, t2:n, …, tn:n, are ordered, i.e., t1:n ≤ t2:n ≤ … ≤ tn:n (19). Because any domain Di, i = 1, 2, …, n, could have unfolded at any time, there is no direct correspondence between the observed ordered unfolding time data, {t1:n, t2:n, …, tn:n}, and the unobserved parent unfolding times {t1, t2, …, tn} for individual domains D1 (t1), D2 (t2), …, Dn (tn). The main goal of unfolding time data analysis is to characterize the forced unfolding times of individual domains. This is equivalent to inferring the parent unfolding time distributions for individual domains, ψ1(t) (D1), ψ2(t) (D2), …, ψn(t) (Dn), from the distributions of ordered time variates, t1:n, t2:n, …, tn:n. As we showed in Study 1 (19), the connection between ordered unfolding times and the parent densities is direct only when the unfolding times are iid, which is the case for the uncorrelated unfolding times for a homogeneous tandem (D)n; then ψ (t) = ψ1(t) = ψ2(t) = … = ψn(t) can be estimated by combining all ordered time variates into a single histogram. However, when the parent distributions are nonidentical, i.e., ψ1(t) ≠ ψ2(t) ≠ … ≠ ψn(t) (heterogeneous tandem D1–D2–…–Dn), and/or the unfolding times t1, t2, …, tn are correlated (dependent), the relationship between the observed ordered time data and the unobserved parent time data is more complex, and data analysis based on the iid assumption is inappropriate. We will show in this study that when the unfolding times are correlated, the use of the iid assumption could result in an inaccurate description of protein unfolding. Hence, statistical tools for testing whether the iid assumption holds are much needed.

In the case of noninteracting domains, such as domains S2 in tandem S2–S2–S2 (Study 1), the emergence of correlations among the unfolding transitions is due to dynamic competition between the unfolding kinetics and tension propagation along the tandem chain (19). However, in wild-type protein tandems, correlations can also build up due to interdomain interactions. Recent experiments on tandems of I27–I28 repeats showed enhanced domain stabilization against applied pulling force, which increases the average unfolding force from 260 pN (for the tandem of domains I27) to 300 pN (for the tandem of I27–I28 repeats) (20). A similar domain stabilization effect has been reported for the tandem of FnIII domains (21). Also, recent force-ramp AFM measurements on the homogeneous tandems of fibrinogen, performed at a pulling speed of 1 μm/s, revealed that the consecutive unfolding transitions are strongly correlated (A. Brown and J. Weisel, University of Pennsylvania Medical School, private communication, 2008). This behavior is most likely due to interaction between fibrinogen's αC-domains and its central region (22).

These experimental findings demonstrate the importance of the inter- and intramolecular protein-protein interactions and show that current AFM technology can be used to probe these interactions by analyzing correlated (dependent) forced unfolding transitions in protein tandems. In force-ramp AFM measurements on protein tandems, mutual independence between the unfolding transitions can be assessed by applying standard tests for independence, such as the Pearson correlation (23), the Spearman rank correlation coefficient (23), or a test based on Hoeffding's D statistic (24,25), to the recorded unfolding forces. In the case of force-clamp AFM measurements, however, the observed forced unfolding times are ordered. To assess independence of the parent unfolding times, one would have to use statistical tests designed to detect possible correlations of the unobserved unfolding time data by analyzing the observed ordered unfolding times. Yet such tests do not exist. Standard tests for independence can only be applied to the unobserved parent unfolding times. In this study, we develop statistical tools for assessing (1) independence of the forced unfolding times and (2) equality of their (parent) pdfs from observed ordered time data. We illustrate the use of these tests by analyzing the unfolding times for a model of the homogeneous dimer S2–S2 and the heterogeneous dimer S2–S1 of connected domains S2 and S1.

To model correlated unfolding transitions and interdomain interactions in protein tandems, novel theoretical approaches that go beyond the iid assumption are needed. In Study 1, we introduced an order statistics-based approach to analyze the ordered unfolding transitions in protein tandems (19). The key elements of the order statistics formalism are the cumulative distribution function (cdf) of the r-th order statistic (r = 1, …, n) in a tandem of length n, Φr:n(t) ≡ Prob(tr:n ≤ t), and the corresponding probability density function (pdf), φr:n(t) = dΦr:n(t)/dt. Because the order statistics cdfs and pdfs, Φr:n(t) and φr:n(t), depend on the parent cdfs and pdfs, Ψ(t) and ψ (t), order statistics-based theory can be used to infer Ψ(t) and ψ (t) from the ordered time data. In this study, we extend the use of order statistics to analyzing correlated unfolding transitions in model tandems S2–S2 and S2–S1, characterized by dependent and identically distributed (did) and dependent and nonidentically distributed (dnid) unfolding times, respectively. In our test studies, we use single domains S2 and S1 to represent protein tandems of short length, and the dimers S2–S2 and S2–S1 to represent tandems of long length. The order statistics-based analysis presented here can be performed by using experimental unfolding time data for homogeneous as well as heterogeneous tandems of any length. In AFM experiments on a tandem (D)N of length, say, N = 12, the unfolding data for short (long) tandems can be obtained by grouping together and analyzing separately the unfolding times for tandems of length n = 1–3 (n = 9–12). Because in a typical AFM experiment the cantilever tip randomly picks up a tandem of any length n, 1 ≤ n ≤ N, this can always be done.

The rest of this study is organized as follows. First, we describe Langevin dynamics simulations of the forced unfolding for single domains S2 and S1, and tandems S2–S2 and S2–S1. Second, we model the unfolding time distributions for single domains S2 and S1. The models of forced unfolding for single domains are used to assess the prediction accuracy of the order statistics-based analysis. Third, we perform a preliminary analysis of the forced unfolding times for tandems S2–S2 and S2–S1. Because in computer simulations we can access the parent unfolding times, we use standard tests for independence, based on Spearman rank correlation coefficient and Hoeffding's D statistic, and the quantile-quantile (Q-Q) plots to probe, respectively, the independence of unfolding times and their distributional equality. This allows us to classify the forced unfolding times as iid, inid, did, and dnid random variables (Table 5, Study 1) (19). Next, we use these data to generate ordered time variates, as observed in force-clamp experiments. The ordered unfolding times are then used to assess the performance of proposed tests for independence of the unobserved (parent) unfolding times and equality of their (parent) distributions. Finally, the dependent (did and dnid) unfolding times are used to illustrate the order statistics-based analysis of correlated unfolding transitions in tandems S2–S2 and S2–S1.

METHODS

Langevin dynamics simulations of tandem S2–S2 and S2–S1

We performed Langevin simulations of forced unfolding using coarse-grained models of the homogeneous dimer S2–S2 and the heterogeneous dimer S2–S1, formed by domains S2 and S1 (Fig. 1) (26,27). The off-lattice Cα-based coarse-grained model of protein tandems serves as a conceptual representation of the wild-type multidomain proteins (27–30).

FIGURE 1.


(a) Model β-barrel proteins S1 (left) and S2 (right), formed by the hydrophobic (in blue), hydrophilic (in red), and neutral Gly residues (in gray). In the native state of S1, the terminal strands β1 and β4 (shown by yellow circles) form a rigid and highly stable native core; the native core of S2 involves the nonterminal strands β2 and β3, and the terminal strand β4 is flexible. (b and c) The homogeneous tandem S2–S2 (b) and the heterogeneous tandem S2–S1 (c) of S2 domain (shown in red) and S1 domain (yellow), connected “head-to-tail” by a flexible linker (shown in green). The linker is composed of five Gly residues. Constant mechanical force, f, is applied to the N-terminal of the first domain S2 and the C-terminal of the second domain S2 (S1) in the tandem S2–S2 (S2–S1). The arrow indicates the direction of applied force.

Tandem construction

The domains S2 and S1 consist of 46 hydrophobic (B), hydrophilic (L), and neutral (N) residues. Each bead is represented by a united atom at the position of the Cα atom (Fig. 1). The distance between Cα carbons is a = 3.8 Å. The tandems S2–S2 and S2–S1 are constructed by connecting domains S2 and S1 “head-to-tail” by a flexible linker of five Gly residues (Fig. 1) (19). The potential energy V = VBL + VBA + VDIH + VNB includes the bond-length potential VBL, bond-angle potential VBA, dihedral angle potential VDIH, and nonbonded potential VNB (26,30). The nonbonded distance R dependent interaction between a pair of B residues is given by Inline graphic where λ accounts for variation in the strength of hydrophobic interactions, and ɛh = 1.25 kcal/mol is the average strength of hydrophobic contacts. In the native state, S2 and S1 form four-stranded β-barrels, stabilized by Q0 = 106 native contacts (6.8 Å cut-off), with the potential energies of −85.5 kcal/mol and −88.0 kcal/mol, respectively. Interdomain interactions are limited to steric repulsion.

Forced unfolding

The forced unfolding kinetics are obtained by integrating the Langevin equations for each residue coordinate xj, subject to the total potential Vtot = V − fX, i.e., η dxj/dt = −∂Vtot/∂xj + gj(t), where η is the friction coefficient and gj is Gaussian white noise. The force f = f·n, of magnitude f, is applied to the C- and N-terminals of the tandem in the direction n of the end-to-end vector X (Fig. 1). Numerical integration is performed with a step size δt = 0.05τL, where τL = (ma²/ɛh)^(1/2) = 3 ps is the unit of time, and m ≈ 3 × 10⁻²² g is the residue mass. The simulation temperature Ts = 0.69ɛh/kB < TF ≈ 0.79ɛh/kB, where TF is the equilibrium folding temperature for S1 and S2, is defined as the temperature at which the average fraction of contacts 〈Q(Ts)〉 ≈ 0.7Q0. The unfolding time for domain S2 (or S1) is defined as the time at which all contacts are disrupted. Throughout this study, the unfolding times and rates are expressed in terms of the number of integration steps Ntot (t = Ntot δt).
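As an illustration of the integration scheme described above, the following minimal sketch applies an Euler-Maruyama update to a single coordinate. It is not the simulation code used in this study: the one-dimensional harmonic potential, the function names, and the noise amplitude sqrt(2·kT·δt/η) (the usual fluctuation-dissipation choice) are assumptions made only for the example.

```python
import math
import random


def langevin_step(x, grad_v, force, eta, kT, dt, rng):
    """One Euler-Maruyama step of eta*dx/dt = -dV/dx + force + g(t).

    The noise amplitude sqrt(2*kT*dt/eta) is an assumed
    fluctuation-dissipation form; it is not quoted from the paper.
    """
    drift = (-grad_v(x) + force) / eta
    noise = math.sqrt(2.0 * kT * dt / eta) * rng.gauss(0.0, 1.0)
    return x + drift * dt + noise


def simulate(x0, grad_v, force, eta, kT, dt, n_steps, seed=0):
    """Integrate n_steps steps and return the final coordinate."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = langevin_step(x, grad_v, force, eta, kT, dt, rng)
    return x


# Hypothetical toy potential V(x) = 0.5*kappa*x**2 under constant force f,
# i.e. V_tot = V - f*x, whose minimum sits at x = f/kappa.
KAPPA = 2.0


def harmonic_gradient(x):
    return KAPPA * x
```

With the noise switched off (kT = 0), `simulate(0.0, harmonic_gradient, 1.0, 1.0, 0.0, 0.05, 2000)` relaxes to the minimum of Vtot at f/κ = 0.5, which is a quick sanity check on the sign convention of the force term.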

Preliminary analysis of the unfolding times for S2–S2 and S2–S1

To prepare the stage for the use of order statistics, we analyze the forced unfolding times for single S2 and S1 domains, and characterize their parent pdfs, ψS2(t) and ψS1(t). We also analyze the parent unfolding times for first (S21) and second (S22) domain in tandem S2–S2, and first (S21) domain and second (S12) domain in tandem S2–S1 for independence and equality of their parent pdfs. The tests used in this section should not be confused with the statistical tests for independence and distributional equality for ordered unfolding times introduced in the following section.

Unfolding times for single domains S2 and S1

Histograms of the unfolding times for single S2 and S1 domains, obtained at constant force f = 66 pN and f = 88 pN, and corresponding nonparametric density estimates are presented in Fig. 2. A nonparametric density estimate provides a visual assessment of the distribution and fits the density by locally weighting the observations (19,31,32). In force-clamp AFM experiments on a protein tandem of length n, a suitable model for the parent unfolding time pdfs can be obtained by using trial densities for the distribution of the first unfolding times, φ1:n(t), and fitting φ1:n(t) to the histograms of the first unfolding times {t1:n} (see Eqs. 7 and 8 in the next section). Here, as in Study 1, we used the Gamma density to describe the parent unfolding time pdfs for single domains S2 and S1,

$$\psi(t) = \frac{k^{\alpha}\, t^{\alpha-1}}{\Gamma(\alpha)}\, e^{-kt} \qquad (1)$$

where α and k are the shape parameter and unfolding rate, respectively, and Γ(α) is the Gamma function, which reduces to (α − 1)! for integer α (19). The Q-Q plots of the unfolding times for single domains S2 and S1 versus unfolding times for the Gamma distribution (Eq. 1) are displayed in Fig. 3. A Q-Q plot is a graphical technique for determining whether two data sets come from populations with a common distribution (19). If the two sets have the same distribution, the points fall along the 45° reference line. The Gamma density provides a good fit to the unfolding times for S2 and S1 domains. The parameters of the Gamma distribution were computed using the maximum likelihood estimation method described in Study 1 (19). The maximum likelihood estimates of α and k for single S2 and S1 domains, which were used to compute the Gamma quantiles in the Q-Q plots, are reported in Table 1. The difference in the obtained parameter values shows clearly that the unfolding times for single S2 and S1 domains are nonidentically distributed.
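For readers who want to reproduce a Gamma fit of this kind, the sketch below evaluates the density of Eq. 1 and computes maximum likelihood estimates of α and k with the Python standard library only. It is a generic textbook estimator, not necessarily the procedure of Study 1: the likelihood equation ln α − digamma(α) = ln(mean t) − mean(ln t) is solved by bisection, the digamma function is approximated by a finite difference of `math.lgamma`, and all function names are hypothetical.

```python
import math
import random


def gamma_pdf(t, alpha, k):
    """Gamma density: k**alpha * t**(alpha-1) * exp(-k*t) / Gamma(alpha)."""
    if t <= 0:
        return 0.0
    log_pdf = (alpha * math.log(k) + (alpha - 1.0) * math.log(t)
               - k * t - math.lgamma(alpha))
    return math.exp(log_pdf)


def digamma(x, h=1e-5):
    """Central-difference approximation to d/dx log Gamma(x)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)


def fit_gamma_mle(times):
    """Return (alpha, k) maximizing the Gamma likelihood of positive times."""
    n = len(times)
    mean_t = sum(times) / n
    s = math.log(mean_t) - sum(math.log(t) for t in times) / n
    lo, hi = 1e-3, 1e3                  # assumed search bracket for alpha
    for _ in range(200):
        mid = math.sqrt(lo * hi)        # bisection on a log scale
        # ln(alpha) - digamma(alpha) decreases with alpha
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    return alpha, alpha / mean_t        # k follows from mean = alpha / k


if __name__ == "__main__":
    # Synthetic data only: alpha = 2, k = 4e-6 per integration step.
    rng = random.Random(1)
    data = [rng.gammavariate(2.0, 1.0 / 4e-6) for _ in range(5000)]
    print(fit_gamma_mle(data))
```

Note that `random.gammavariate` takes a scale parameter, so the rate k of Eq. 1 enters as 1/k.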

FIGURE 2.


Histograms (bars) of the forced unfolding times for single S2 domain (a and b) and single S1 domain (c and d) obtained at constant force f = 66 pN (a and c) and f = 88 pN (b and d). The overlaid curves are the nonparametric density estimates. The bandwidth bw = 0.9 × min(SD, IQR/1.34) × n^(−1/5) used in the calculations is the default value used in the R software for statistical computing (36), where SD is the standard deviation, IQR is the interquartile range, and n is the number of data points (32). In the histograms presented here and in Figs. 4 and 7, the number of bins nb and the bandwidth bw are estimated as described above. In this figure and in Figs. 3–7, the time t is expressed in units of the number of integration steps Ntot (t = Ntot × 0.15 ps).
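The bandwidth rule quoted in the caption corresponds to R's default selector (`bw.nrd0`), 0.9 × min(SD, IQR/1.34) × n^(−1/5). A minimal Python sketch of that rule and of a Gaussian kernel density estimate is given below; the Gaussian kernel and the function names are assumptions made for illustration, and the quartiles use the "inclusive" interpolation, which matches R's default quantile type.

```python
import math
import statistics


def bw_nrd0(data):
    """Rule-of-thumb bandwidth: 0.9 * min(SD, IQR/1.34) * n**(-1/5)."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        # Same fallback order as R: SD, then |first value|, then 1
        spread = sd or abs(data[0]) or 1.0
    return 0.9 * spread * len(data) ** (-0.2)


def gaussian_kde(data, bw):
    """Return a function t -> kernel density estimate with bandwidth bw."""
    norm = 1.0 / (len(data) * bw * math.sqrt(2.0 * math.pi))

    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - x) / bw) ** 2) for x in data)

    return density
```

For example, `gaussian_kde(times, bw_nrd0(times))` returns a callable that can be evaluated on a grid of times and overlaid on a histogram.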

FIGURE 3.


Q-Q plots of the forced unfolding times for single S2 domain (a and b) and S1 domain (c and d), obtained at f = 66 pN (a and c) and f = 88 pN (b and d) versus quantiles of the Gamma density (Eq. (1)). The dashed line is the 45° reference line.

TABLE 1.

Maximum likelihood estimates and 95% standard errors for the dimensionless shape parameter α and unfolding rate k (in units of integration step) for single domains S2 and S1 obtained at f = 66 pN and f = 88 pN

| Force, pN | αS2 | αS1 | kS2 | kS1 |
|---|---|---|---|---|
| 66 | 0.98 ± 0.09 | 4.62 ± 0.18 | (2.27 ± 0.26) × 10⁻⁷ | (1.52 ± 0.06) × 10⁻⁵ |
| 88 | 2.02 ± 0.14 | 7.03 ± 0.23 | (3.84 ± 0.31) × 10⁻⁶ | (4.59 ± 0.16) × 10⁻⁵ |

Unfolding times for tandems S2–S2 and S2–S1

The unfolding time histograms for domains S2 and S1 are shown in Fig. 4. The parent unfolding times for the first S21 domain (t1) and second S22 domain (t2) in tandem S2–S2, and unfolding times for the first S21 domain (t1), and second S12 domain (t2) in tandem S2–S1, were analyzed for independence and equality of their parent distributions.

FIGURE 4.


Histograms (bars) and nonparametric density estimates (curves) of the unfolding times for the first S21 domain (a) and second S22 domain (b) in tandem S2–S2, and for the first S21 domain (c) and second S12 domain (d) in tandem S2–S1, obtained at f = 88 pN.

Test for independence of unfolding times

In Study 1, we used the Spearman rank correlation coefficient (23,33), a nonparametric and scale-invariant measure of dependence. This measure detects linear and some nonlinear yet always monotonic relationships between two data sets {t1} and {t2}, when the sets either change in the same or in the opposite direction, i.e., when the values {t1} and {t2} both increase or decrease, or when the values {t1} always increase (decrease) while the values {t2} decrease (increase). Hoeffding's nonparametric test for independence, described in Appendix A (24,25), and its asymptotic equivalent (34) detect all dependence alternatives, including highly nonmonotonic relationships. The values of D range from −0.5 to 1, with larger D(t1, t2) values signifying stronger dependence between t1 and t2. In statistical data analyses, both tests of independence are typically carried out so that monotonic as well as nonmonotonic associations between two variables can be detected.

The values of D(t1, t2) and the Spearman rank correlations for the unfolding times {t1} and {t2} obtained at f = 66 pN and f = 88 pN for tandems S2–S2 and S2–S1 are reported in Table 2. The associated p-values for testing independence are given in parentheses. The threshold p-value, which represents the level of tolerance for rejecting the independence hypothesis, was set to 0.01 (in statistical hypothesis testing, the null is rejected if the p-value does not exceed the threshold). At f = 66 pN, both dependence measures conclude that domains S21 and S22 in tandem S2–S2 and domains S21 and S12 in tandem S2–S1 unfold independently. In contrast, at f = 88 pN, Hoeffding's test for independence finds the forced unfolding times for the same domains in the tandems S2–S2 and S2–S1 to be dependent. The Spearman rank correlation coefficient test, on the other hand, is not significant at level 0.01 for either tandem and does not detect dependence. Since Hoeffding's test is significant, the dependence between the unfolding times for the two domains in both tandems, obtained at f = 88 pN, is nonmonotonic. This result supports our previous finding (Study 1) that increasing the magnitude of applied force, f, may result in dependent unfolding transitions (19).
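The distinction between monotonic and nonmonotonic dependence can be made concrete with a small sketch of the Spearman rank correlation (Pearson correlation of average ranks). The function names are hypothetical, no p-value is computed, and Hoeffding's D (Appendix A) is not implemented here.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation coefficient of two equal-length samples."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

For x = [−2, −1, 0, 1, 2] and y = x², the two variables are perfectly dependent, yet `spearman_rho(x, y)` is zero because the relationship is not monotonic. This is the kind of association that a rank correlation misses and that an omnibus test such as Hoeffding's is designed to detect.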

TABLE 2.

Preliminary analysis of the forced unfolding times

| Tandem | Hoeffding's D (f = 66 pN) | Spearman correlation (f = 66 pN) | Hoeffding's D (f = 88 pN) | Spearman correlation (f = 88 pN) |
|---|---|---|---|---|
| S2–S2 | 0.0003 (0.25) | −0.06 (0.15) | 0.0032 (0.01) | −0.03 (0.59) |
| S2–S1 | −6.08291 × 10⁻⁶ (0.37) | −0.05 (0.26) | 0.0043 (0.0052) | −0.10 (0.02) |

Hoeffding's D statistics and Spearman rank correlation coefficients of the unfolding times for domains S21 and S22 in tandem S2–S2, and domains S21 and S12 in tandem S2–S1, obtained at f = 66 pN and f = 88 pN. The numbers in parentheses are the p-values for testing for independence of the two variables.

Test for equality of unfolding time pdfs

Q-Q plots were used for the empirical assessment of the equality of the unfolding time pdfs for domains S2 and S1 in tandems S2–S2 and S2–S1 (Fig. 5). The Q-Q plot for the first S21 domain against the second S22 domain in tandem S2–S2, obtained at f = 66 pN, shows that almost all data points fall on the reference line, indicating equality of the parent pdfs, i.e., $\psi_{S2_1}(t) = \psi_{S2_2}(t)$. A small parallel deviation of the time quantiles from the reference line for the same domains, obtained at increased force f = 88 pN, indicates only approximate distributional equality, i.e., $\psi_{S2_1}(t) \approx \psi_{S2_2}(t)$. Indeed, the unfolding times of the S21 domain are consistently shorter than the unfolding times of the S22 domain by a small time constant, Δt ≈ 0.4 × 10⁶. This can also be seen by comparing the unfolding time histograms (Fig. 4, a and b). This time difference (Δt) induces the dependence detected by Hoeffding's D statistic. The Q-Q plots for the first S21 domain against the second S12 domain in tandem S2–S1 strongly indicate lack of equality of the parent pdfs both at f = 66 pN and f = 88 pN, i.e., $\psi_{S2_1}(t) \neq \psi_{S1_2}(t)$ (Fig. 5, c and d). This can also be seen from the bimodal shape of the unfolding time density for the S2 domain (Fig. 4 c).
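A two-sample Q-Q comparison of this kind reduces to pairing empirical quantiles. The sketch below, with hypothetical function names and an arbitrary choice of 99 quantile levels, returns the quantile pairs together with two summary numbers: the median offset from the 45° line (an estimate of a parallel shift Δt) and the range of offsets (near zero for a purely parallel shift, large when the quantiles diverge from the line).

```python
import statistics


def qq_pairs(sample_x, sample_y, n_quantiles=99):
    """Pair the empirical quantiles of two samples at equally spaced levels."""
    qx = statistics.quantiles(sample_x, n=n_quantiles + 1, method="inclusive")
    qy = statistics.quantiles(sample_y, n=n_quantiles + 1, method="inclusive")
    return list(zip(qx, qy))


def shift_summary(pairs):
    """Median offset from the 45-degree line and the spread of the offsets."""
    diffs = [qy - qx for qx, qy in pairs]
    return statistics.median(diffs), max(diffs) - min(diffs)
```

If the second sample is the first one displaced by a constant, every quantile pair lies on a line parallel to the reference line and the spread returned by `shift_summary` is zero up to rounding.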

FIGURE 5.


Q-Q plots of the unfolding times for the first domain S21 (t1) versus the second domain S22 (t2) in tandem S2–S2, obtained at f = 66 pN (a) and f = 88 pN (b), and for the first domain S21 (t1) versus the second domain S12 (t2) in tandem S2–S1, obtained at f = 66 pN (c) and f = 88 pN (d).

To summarize this section, we showed that the parent unfolding times for S2 domains in tandem S2–S2 are iid for f = 66 pN and did for f = 88 pN, whereas the parent unfolding times for S2 and S1 domains in tandem S2–S1 are inid for f = 66 pN and dnid for f = 88 pN.

RESULTS

We use simulated unfolding time data for model tandems S2–S2 and S2–S1 to assess the performance of the proposed tests for independence of the (parent) unfolding times and equality of the parent unfolding time pdfs from the ordered time data, t1:n ≤ t2:n ≤ … ≤ tn:n. To generate ordered time variates as observed in force-clamp AFM experiments, the unfolding times {t1} and {t2} for domains S21 ({t1}) and S22 ({t2}) in tandem S2–S2, and domains S21 ({t1}) and S12 ({t2}) in tandem S2–S1 were rearranged in increasing time order. That is, tmin < tmax, where tmin = min(t1, t2) and tmax = max(t1, t2) are the minimum and maximum unfolding times, respectively. The ordered variates from 500 runs for each dimer were grouped into ordered sets of the first {tmin} = {t1:2}, and second {tmax} = {t2:2} unfolding times.
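The rearrangement of parent times into ordered variates is a per-trajectory sort. A minimal sketch follows; the function name and the input layout (one tuple of parent unfolding times per simulation run) are assumptions.

```python
def order_runs(runs):
    """Convert per-run parent times into sets of order statistics.

    runs: list of tuples (t_1, ..., t_n), one tuple per trajectory.
    Returns n lists; list r holds the (r+1)-th ordered unfolding time
    of every run, i.e. {t_(r+1):n}.
    """
    n = len(runs[0])
    ordered = [sorted(run) for run in runs]
    return [[run[r] for run in ordered] for r in range(n)]


# For a dimer, the two returned lists are {t_min} = {t_1:2} and {t_max} = {t_2:2}.
t_min, t_max = order_runs([(3.0, 1.0), (2.0, 5.0)])
```

After ordering, the identity of the domain that unfolded first is lost, which is exactly the information that is unavailable in a force-clamp experiment.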

Testing equality of the parent unfolding time pdfs by analyzing ordered time data

A simple empirical test for assessing distributional equality of the parent unfolding time pdfs for individual domains in a tandem D1–D2–…–Dn can be based on a recurrence relation for order statistics (19). When the forced unfolding times are iid, the pdfs of the r-th and (r + 1)-st unfolding times (order statistics) in a tandem of length n are related to the pdf of the r-th unfolding times in a tandem of length n − 1 via the recurrence relation (34,35)

$$r\,\varphi_{r+1:n}(t) + (n-r)\,\varphi_{r:n}(t) = n\,\varphi_{r:n-1}(t) \qquad (2)$$

Equation 2 also holds when the unfolding times are “exchangeable”, i.e., when they are identically distributed but could be dependent (did) (34,35), and when the parent unfolding time pdfs are identical in the sense that they have the same shape but may differ in the location of the peak, which quantifies the most probable unfolding time t*. This is the case for tandem S2–S2, at f = 88 pN. Hence, Eq. 2 applies both when the parent unfolding times for domains Di and Dj are strictly identically distributed, and when the unfolding times for, say, domain Dj are “shifted” from the unfolding times for domain Di by a time constant Δt = |tj* − ti*|.

By applying Eq. 2 recursively, we can obtain the parent unfolding time pdf for a single domain D, ψ (t) ≡ φ1:1(t), i.e.,

graphic file with name M8.gif (3)

Equation 3 provides a means to infer the parent distribution for a domain in a tandem from the order statistics pdfs φr:n, 1 ≤ r ≤ n, when the forced unfolding times are iid or did; that is, regardless of their dependence structure. In particular, Eq. 3 implies that when the unfolding times are identically distributed with common parent pdf ψ (t), then the latter can be obtained by “mixing” all the order statistics pdfs, φr:n, r = 1, 2, …, n, with equal weight 1/n, i.e.,

$$\psi(t) = \frac{1}{n}\sum_{r=1}^{n}\varphi_{r:n}(t) \qquad (4)$$
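For the iid case, the equal-weight mixture of Eq. 4 can be checked directly. The short derivation below assumes the standard textbook expression for the pdf of the r-th order statistic of n iid variates with parent cdf Ψ(t) and pdf ψ(t); that expression is not quoted from this paper.

```latex
\begin{align*}
\varphi_{r:n}(t) &= \frac{n!}{(r-1)!\,(n-r)!}\,
    \Psi(t)^{\,r-1}\,[1-\Psi(t)]^{\,n-r}\,\psi(t), \\
\frac{1}{n}\sum_{r=1}^{n}\varphi_{r:n}(t)
  &= \psi(t)\sum_{r=1}^{n}\binom{n-1}{r-1}\,
    \Psi(t)^{\,r-1}\,[1-\Psi(t)]^{\,n-r} \\
  &= \psi(t)\,\bigl[\Psi(t) + 1 - \Psi(t)\bigr]^{\,n-1} = \psi(t).
\end{align*}
```

The second line uses n!/[(r − 1)!(n − r)!] = n·C(n − 1, r − 1), and the third line is the binomial theorem. For n = 2 this reduces to ψ(t) = [φ1:2(t) + φ2:2(t)]/2.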

A simple test for equality of the parent unfolding time pdfs for individual domains in a tandem can be constructed as follows. First, the ordered unfolding times, collected at a fixed force, are grouped into two time sets, one for unfolding times for a shorter tandem of length, say, n1 = 1–3, and the other for unfolding times for a longer tandem of length, say, n2 = 9–12. As noted in the introduction, in AFM experiments, the cantilever tip randomly picks up a tandem of any length, so that this separation is implementable in practice. The corresponding pdfs, $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$, are estimated by using Eq. 4. Next, $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$ are compared via a Q-Q plot. If the time quantiles for $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$ fall close to the reference line, then the parent unfolding times for individual domains in tandems of length n1 and n2 are identically distributed; if they fall far from it, the parent unfolding times are nonidentically distributed. The difference between the time quantiles for $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$, if any, can be used as a signature of the distributional inequality of the parent pdfs.

Test for equality of parent pdfs

The above arguments lead us to the following computational algorithm:

  • Step 1. Collect the forced unfolding times, $\{t_{1:n_1}, t_{2:n_1}, \ldots, t_{n_1:n_1}\}$, for a tandem of shorter length n1.

  • Step 2. Generate a random number U in the interval (0, 1).

  • Step 3. If U ∈ (0, 1/n1), randomly select a point from the first order statistic, $t_{1:n_1}$. If U ∈ (1/n1, 2/n1), randomly select a point from the second order statistic, $t_{2:n_1}$, and so on.

  • Step 4. Repeat Steps 2 and 3 M times to obtain a sample of size M from $\psi_{n_1}(t)$ (Eq. 4).

  • Step 5. Collect the forced unfolding times, $\{t_{1:n_2}, t_{2:n_2}, \ldots, t_{n_2:n_2}\}$, for a tandem of longer length n2, and repeat Steps 2–4 to obtain a sample of size M from $\psi_{n_2}(t)$.

  • Step 6. Draw the Q-Q plot for the time quantiles of $\psi_{n_1}(t)$ against the time quantiles of $\psi_{n_2}(t)$, and estimate the distance of the time quantiles from the reference line.

If the unfolding time quantiles fall close to the reference line, i.e., they are either aligned with or are parallel and close to the reference line, then Eq. 4 is satisfied and the parent unfolding times for individual domains (Ds) in a tandem D1–D2–…–Dn are identically distributed, regardless of whether they are dependent. Significant nonlinear divergence from the reference line would indicate their distributional inequality.
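Steps 2–4 of the algorithm amount to sampling from an equal-weight mixture of the observed order statistics. A minimal sketch is shown below; the function name, the input layout (one list of observed times per order statistic), and the synthetic exponential data in the demonstration are assumptions, not the data of this study.

```python
import random
import statistics


def sample_parent_mixture(order_stats, m, rng):
    """Draw m values from the equal-weight mixture of Eq. 4.

    order_stats[r] is the list of observed (r+1)-th unfolding times for
    a tandem of length n = len(order_stats).
    """
    n = len(order_stats)
    draws = []
    for _ in range(m):
        u = rng.random()                # Step 2: U uniform on [0, 1)
        r = min(int(u * n), n - 1)      # Step 3: U in (r/n, (r+1)/n) selects statistic r+1
        draws.append(rng.choice(order_stats[r]))
    return draws                        # Step 4: sample of size m


if __name__ == "__main__":
    # Synthetic iid check: dimers with exponential parent times (rate 1).
    rng = random.Random(7)
    runs = [sorted(rng.expovariate(1.0) for _ in range(2)) for _ in range(4000)]
    order_stats = [[run[0] for run in runs], [run[1] for run in runs]]
    mix = sample_parent_mixture(order_stats, 4000, rng)
    # The mixture median should be close to the parent median ln 2.
    print(statistics.median(mix))
```

Running the same function on the ordered times of the shorter and of the longer tandem gives the two samples whose quantiles are compared in Step 6.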

Application of the algorithm to the ordered unfolding times of S2–S2 and S2–S1

We tested the performance of the proposed algorithm by using ordered unfolding time data for tandems S2–S2 and S2–S1. For two-domain tandems, Eq. 4 becomes

$$\psi(t) = \frac{1}{2}\left[\varphi_{1:2}(t) + \varphi_{2:2}(t)\right] \qquad (5)$$

The Q-Q plots of the time quantiles for single domain S2 versus the quantiles for tandem S2–S2, sampled from the mixture of the order statistics pdfs (Eq. 5), are displayed in Fig. 6. At f = 66 pN, the unfolding time quantiles run almost parallel to the reference line, indicating an approximate distributional equality (up to the time shift Δt) of the parent unfolding times for the first S21 domain and the second S22 domain, i.e., $\psi_{S2_1}(t) \approx \psi_{S2_2}(t)$. The time shift at the median (50% quantile) from the reference line is Δt ≈ 3 × 10⁶ integration steps (Fig. 6 a). At f = 88 pN, the time quantiles show a shorter time shift, Δt ≈ 0.5 × 10⁶ integration steps, still running almost parallel to the reference line, which indicates an approximate distributional equality (up to Δt) of the parent unfolding times for S2 domains in S2–S2, i.e., $\psi_{S2_1}(t) \approx \psi_{S2_2}(t)$ (Fig. 6 b).

FIGURE 6.


(a and b) Q-Q plots of the unfolding times for single S2 domain versus the unfolding times for tandem S2–S2, generated by “mixing” the first and second order statistics pdfs via $\psi(t) = [\varphi_{1:2}(t) + \varphi_{2:2}(t)]/2$ (Eq. 5) for the ordered unfolding times t1:2 and t2:2, obtained at f = 66 pN (a) and f = 88 pN (b). (c and d) Q-Q plots of the unfolding times for single S1 domain versus the unfolding times for tandem S2–S1, generated by mixing the first and second order statistics pdfs for t1:2 and t2:2, obtained at f = 66 pN (c) and f = 88 pN (d).

FIGURE 7.


Probability density functions for the first order (min) statistic, t1:2 = tmin, for tandems S2–S2 (a) and S2–S1 (b), obtained at f = 88 pN. The histograms (bars) of tmin are superimposed with the theoretical pdfs, φmin(t) ≡ φ1:2(t) (Eqs. 7 and 8). The parameter values, obtained from the fit, are given in Table 4.

The observed time shift Δt is due to the tension drop in the tandem chain, which occurs after the first unfolding transition in one of the two domains at time t = t1:2. The resulting chain elongation lowers the force-induced tension and the instantaneous force to a lower value, f′ < f = 66 pN, and hence it takes time Δt to ramp the force back up to the initial level (f′ → f). As a result, the time quantiles obtained for the longer tandem (S2–S2) lie above the reference line, indicating prolonged unfolding for S2 domains in the tandem compared to a single S2 domain. Although in our case study we used a single S2 domain and the dimer S2–S2 to represent, respectively, the tandems of shorter and longer length, this algorithm can be used to analyze protein tandems of any lengths n1 and n2 > n1. The Q-Q plots of the time quantiles for single domain S1 versus the quantiles for tandem S2–S1, sampled from the mixture of the order statistics pdfs (Eq. 5), are also displayed in Fig. 6 for comparison. Compared with tandem S2–S2, we observe a much greater nonparallel divergence from the reference line and a larger time shift at the 50% quantile, Δt ≈ 8 × 10^6 integration steps (f = 66 pN) and Δt ≈ 1 × 10^6 integration steps (f = 88 pN). Such strong nonlinear divergence indicates that the forced unfolding times for domains S2 and S1 in tandem S2–S1 are differently distributed at both f = 66 pN and f = 88 pN.

The results of the proposed test for distributional equality of the parent unfolding time pdfs, applied to the ordered unfolding times, agree with the results of preliminary data analysis, and confirm that the parent unfolding times, obtained at f = 66 pN and f = 88 pN, are identically distributed for tandem S2–S2 and nonidentically distributed for tandem S2–S1. The proposed algorithm can be used in statistical analyses of unfolding data available from force-clamp AFM measurements. In addition, for homogeneous tandems, the difference between the unfolding time quantiles for tandems of short and long length, parameterized by Δt, can be used to estimate the timescale of force-induced tension propagation along the tandem chain, τf. Indeed, there are n − 1 intervals of dropped tension of duration Δt in a tandem of length n. When the parent pdfs for tandems of different lengths n1 ≠ n2 are compared via Q-Q plots, τf can be estimated as τf ≈ Δt/|n2 − n1|.
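The quantile comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`quantile`, `qq_shifts`, `tension_propagation_time`) and the synthetic data are hypothetical, linear interpolation is one of several possible quantile conventions, and the shift is read off at the median as in the text.

```python
def quantile(sorted_values, p):
    """Linearly interpolated p-quantile (0 <= p <= 1) of an ascending list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def qq_shifts(short_times, long_times, probs=(0.25, 0.5, 0.75)):
    """Offsets of the Q-Q points from the reference line y = x.

    Nearly equal offsets at all probabilities mean the Q-Q points run
    parallel to the reference line (equality up to a time shift).
    """
    a, b = sorted(short_times), sorted(long_times)
    return [quantile(b, p) - quantile(a, p) for p in probs]


def tension_propagation_time(short_times, long_times, n_short, n_long):
    """tau_f ~ Delta_t / |n_long - n_short|, with Delta_t taken at the median."""
    if n_short == n_long:
        raise ValueError("tandem lengths must differ")
    (delta_t,) = qq_shifts(short_times, long_times, probs=(0.5,))
    return delta_t / abs(n_long - n_short)


# Hypothetical data: parent times for a monomer and for a dimer whose
# distribution is the same up to a constant shift of 3 time units.
single = [float(t) for t in range(1, 102)]
tandem = [t + 3.0 for t in single]

shifts = qq_shifts(single, tandem)
tau_f = tension_propagation_time(single, tandem, n_short=1, n_long=2)
```

A spread in `shifts` that is comparable to the shifts themselves would correspond to the nonparallel divergence reported for tandem S2–S1.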

Testing independence of the parent unfolding times by analyzing ordered time data

In this section, we propose a permutation test for iid versus did parent unfolding times and an overlap fraction test for inid versus dnid unfolding times using the ordered unfolding times for tandems S2–S2 and S2–S1.

Permutation test for iid versus did unfolding times

Let us assume that we record n ordered unfolding times sampled from the joint distribution Ψ(t1, …, tn) with joint pdf ψ(t1, …, tn), where, as before, ti denotes the unfolding time of the ith domain (i = 1, …, n) in a tandem of length n. Suppose we observe the unfolding time order statistics, t1:n ≤ t2:n ≤ … ≤ tn:n. We want to infer from these order statistics whether the (unobserved) parent data, t1, t2, …, tn, are uncorrelated. Suppose now that the parent unfolding time data are indeed iid; that is, Ψ(t1, …, tn) = Ψ(t1)Ψ(t2)…Ψ(tn), where Ψ(t) is their common cdf, and ψ(t1, …, tn) = ψ(t1)ψ(t2)…ψ(tn), where ψ(t) = dΨ(t)/dt is their common pdf. This factorization implies that if the parent data were iid, then the order statistics, t1:n ≤ t2:n ≤ … ≤ tn:n, could have resulted from any permutation of the original data with equal probability. For example, the parent sample t1, t2, …, tn could have resulted in t1:n ≤ t2:n ≤ … ≤ tn:n with the same probability as the sample t1, t3, …, tn or the sample tn, t3, …, t1, and so on. The order in which the n-tuple (t1, …, tn) is arranged is irrelevant because all n! permutations of the n parent data points are equally likely to be observed, since they are independent realizations of the same distribution.

Let us generalize the above arguments to M measurements. Suppose M ordered n-tuples of unfolding times are observed, one per experiment i = 1, …, M. If the parent unfolding time data were iid, the unfolding time order statistics obtained in the ith experiment could have resulted from any permutation of the parent data with equal probability. For each i = 1, …, M, all n! permutations of the n data points are equally likely to be the parent sample of the observed order statistics. This leads to the following algorithm for testing pairwise independence:

  • Step 1. For each experiment i = 1, …, M, randomly permute the n-tuple of recorded unfolding time order statistics; b denotes the permutation number. Store the M permuted n-tuples as the rows of a matrix Tb of dimension M × n.

  • Step 2. Repeat Step 1 B times, i.e., b = 1, …, B to obtain matrices T1, …, TB.

  • Step 3. For b = 1, …, B, carry out n(n − 1)/2 pairwise tests for independence of all pairs of the n columns of Tb at a fixed significance level. Compute and store the fraction of rejections of the null hypothesis of independence.

In Step 3, both Spearman's rank correlation and Hoeffding's D statistic should be used so that most types of dependence are checked for (23–25). Both measures are based on test statistics with known asymptotic distributions, which allow the computation of the p-values for testing independence. If the parent unfolding time data are independent, the test for independence in Step 3 will not be significant. An illustration of the algorithm is given in Appendix B.
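Steps 1–3 can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: it uses only Spearman's rank correlation (Hoeffding's D is omitted), the p-value comes from the large-sample normal approximation ρ√(M − 1) ~ N(0, 1), and the function names and synthetic data are hypothetical.

```python
import math
import random


def ranks(values):
    """Ranks 1..M (ties broken by position; rare for continuous times)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_pvalue(x, y):
    """Two-sided p-value for Spearman's rho (normal approximation; an
    assumption of this sketch, adequate only for large M)."""
    m = len(x)
    rx, ry = ranks(x), ranks(y)
    mean = (m + 1) / 2.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    rho = cov / var
    return math.erfc(abs(rho) * math.sqrt(m - 1) / math.sqrt(2.0))


def permute_rows(ordered_times, rng):
    """Step 1: independently permute each ordered n-tuple (one row of T_b)."""
    return [rng.sample(list(row), len(row)) for row in ordered_times]


def permutation_test(ordered_times, n_perm=200, level=0.05, seed=0):
    """Steps 1-3: fraction of rejections of independence over all
    n_perm permutations and all n(n-1)/2 column pairs."""
    rng = random.Random(seed)
    n = len(ordered_times[0])
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    rejections = 0
    for _ in range(n_perm):                       # Step 2: b = 1..B
        t_b = permute_rows(ordered_times, rng)    # Step 1
        for j, k in pairs:                        # Step 3
            col_j = [row[j] for row in t_b]
            col_k = [row[k] for row in t_b]
            if spearman_pvalue(col_j, col_k) <= level:
                rejections += 1
    return rejections / (n_perm * len(pairs))


# Hypothetical data for a dimer (n = 2), M = 300 experiments.
rng = random.Random(1)
iid = [sorted((rng.expovariate(1.0), rng.expovariate(1.0))) for _ in range(300)]
dep = []
for _ in range(300):
    t1 = rng.expovariate(1.0)
    dep.append(sorted((t1, t1 + 0.01 * rng.expovariate(1.0))))

frac_iid = permutation_test(iid)   # fraction of rejections, iid parents
frac_dep = permutation_test(dep)   # fraction of rejections, dependent parents
```

Note that Table 3 reports the complementary quantity, the fraction of p-values above 0.05.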

Application of the algorithm to the ordered unfolding times of S2–S2

Table 3 summarizes the results of applying the permutation algorithm to the ordered unfolding times for tandem S2–S2. The entries are the fractions of p-values > 0.05 over 500 replicates (B = 500). We used a 5% cutoff, i.e., we assumed that a p-value ≤ 0.05 indicates statistically significant dependence among the parent unfolding times for the S2 domains. At f = 66 pN, Hoeffding's test rejected independence only 100 − 99.6 = 0.4% of the time, thus providing strong support for the independence of the unfolding times for the first domain S2_1 (t1) and the second domain S2_2 (t2) in tandem S2–S2. The Spearman rank correlation test likewise found no significant dependence in any of the replicates (Table 3). At f = 88 pN, the fraction of the p-values exceeding 0.05 for Hoeffding's test is 0, and the same holds for the Spearman test (Table 3). That is, all 500 p-values for testing independence were below the 5% cutoff, providing strong evidence for lack of independence between the parent unfolding times for the first domain S2_1 (t1) and the second domain S2_2 (t2) in tandem S2–S2. Thus, the permutation test for independence, applied to iid and did unfolding times for tandem S2–S2, recovers the results of the preliminary data analysis.

TABLE 3.

Results of the permutation test for independence of the parent unfolding times for domains S2_1 (t1) and S2_2 (t2) in tandem S2–S2. Entries are the fractions of p-values > 0.05.

Test                    f = 66 pN    f = 88 pN
Hoeffding's D           0.996        0
Spearman correlation    1            0

An empirical test for inid versus dnid unfolding times

An empirical approach for deducing independence of the parent inid and dnid unfolding times can be based on the overlap fraction F(r, r + 1;n), r = 1, …, n − 1, defined as the fraction of values shared by the r-th order statistic, tr:n, and the (r + 1)-st order statistic, tr+1:n, in a heterogeneous tandem (D1D2)n/2 of length n. That is,

graphic file with name M38.gif (6)

If F(r, r + 1;n) is smaller than a threshold value F*, then the unfolding times for, say, domain D1, differ from the unfolding times of domain D2 in a consistent fashion. Since domains D1 and D2 have different (parent) pdfs, this would mean that unfolding of D1 domains does not affect unfolding of D2 domains, and that these domains unravel independently. For example, the forced unfolding of domain S1 occurs on a faster timescale compared to the unfolding of the S2 domain (Fig. 5). Hence, the first unfolding transitions (t1:2) occur more frequently for domain S1 than for domain S2, and the consecutive unfolding transitions t1:2 and t2:2 are separated in time (uncorrelated). On the other hand, large values of F(r, r + 1;n), i.e., F(r, r + 1;n) > F*, would indicate mixing among the unfolding times for domains D1 and D2 and signify their dependence.

Application of the overlap fraction test to the ordered unfolding times of S2–S1

We applied the overlap fraction test to assess independence of the parent unfolding times for the S2_1 domain (t1) and the S1_2 domain (t2) in tandem S2–S1. We set the threshold value for the overlap fraction to F* = 50%. For a heterogeneous tandem of length n = 2, the heuristic argument that led to this choice runs as follows: if there were perfect mixing, that is, if the first order statistic originated with equal probability from either domain, then the ordered pair (t_D1, t_D2) would be observed 50% of the time, and the ordered pair (t_D2, t_D1) would be observed 50% of the time as well, where t_Di denotes the unfolding time of domain Di, i = 1, 2. This would lead to no separation between the values of the two order statistics (they would fall in the same range) and the overlap fraction would be close to one. Lack of mixing would mean that, say, the pair (t_D1, t_D2) would be observed nearly always and the complementary pair (t_D2, t_D1) almost never, so that the overlap fraction would be close to zero. Of course, because of sampling variability, the overlap fraction would never be exactly equal to zero or one but rather close to either value. The closeness would depend on the magnitude of correlations and the size of the sample. The cutoff of 50% is simply the midpoint of the unit interval. In principle, one can estimate the cutoff much more accurately using resampling methods; here we simply use this subjective cutoff. For this choice, values of F(1, 2;2) < 50% would imply that one of the two domains S2 and S1 unfolds on a faster timescale than the other. In the opposite case, i.e., when F(1, 2;2) > 50%, we would conclude that the S2 and S1 domains unfold on a similar timescale and that the unfolding times are correlated. We found that at f = 66 pN, F(1, 2;2) = 24% < 50%, and at f = 88 pN, F(1, 2;2) = 61% > 50%. Hence, we recover the results of the preliminary analysis for tandem S2–S1, namely that the parent unfolding times for S2 and S1 domains in the tandem are independent at f = 66 pN, but dependent at f = 88 pN.

For tandems of length larger than two, if the tandem is fully heterogeneous, i.e., all its domains are distinct, perfect mixing is equivalent to all permutations of the n-tuple (t1:n, …, tn:n) being equally likely, and the overlap fraction of any two order statistics would be close to one. In particular, the overlap fraction of any two consecutive order statistics, F(r, r + 1;n), would also be close to one. In the other extreme of no mixing, the overlap fraction would be close to zero. Thus, even when the tandem consists of more than two domains, the midpoint cutoff of 50% can also be used. To conclude independence, all overlap fractions F(r, r + 1;n), r = 1, …, n − 1, must be smaller than the cutoff. We plan to examine the more general case of tandems composed of a mix of the same and distinct domains in a separate study.
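The decision rule itself (all consecutive overlap fractions below the cutoff) is simple enough to state as code. The sketch below takes already computed overlap fractions as input and does not attempt to compute F(r, r + 1;n) from raw data. The function name is hypothetical, and treating a fraction exactly equal to the cutoff as "dependent" is an assumption, since the text does not cover that case.

```python
def classify_by_overlap(overlap_fractions, cutoff=0.5):
    """Apply the midpoint-cutoff rule to F(r, r+1; n), r = 1, ..., n-1.

    Independence is concluded only if every consecutive overlap
    fraction is smaller than the cutoff.
    """
    if not overlap_fractions:
        raise ValueError("need at least one overlap fraction")
    if all(f < cutoff for f in overlap_fractions):
        return "independent"
    return "dependent"


# Values reported in the text for tandem S2-S1 (n = 2, one fraction each).
result_66pN = classify_by_overlap([0.24])
result_88pN = classify_by_overlap([0.61])
```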

Order statistics-based analysis of did and dnid unfolding times

The application of the test for distributional equality to ordered unfolding times obtained at f = 88 pN revealed a pronounced time shift, Δt ≈ 0.5 × 10^6 integration steps for tandem S2–S2 and Δt ≈ 1 × 10^6 integration steps for tandem S2–S1 (Fig. 6). As we argued before, the origin of Δt is a tension drop in the tandem chain, which accompanies each unfolding transition. As a result, each subsequent unfolding transition (t2:n, t3:n, …, tn:n) after the first transition (t1:n) in a tandem of length n is delayed by Δt. This builds up correlations (dependence). However, the dependence structure, defined by the time shift Δt, is trivial and affects only the second (t2:n), third (t3:n), etc., unfolding transitions, but does not affect the first transition (t1:n). Therefore, for correlated unfolding events characterized by did and dnid unfolding times with such trivial dependence, the first order statistic t1:n can be described by using the order statistics for iid and inid unfolding times (Study 1) (19).

To illustrate our approach, here we use previously generated ordered time variates, i.e., the first unfolding times, {tmin} = {t1:2}, and second unfolding times, {tmax} = {t2:2}, for tandems S2–S2 and S2–S1 of length n = 2, to analyze did and dnid unfolding times for these tandems. Clearly, this approach can be generalized to a homogeneous ((D)n) and heterogeneous tandem ((D1D2)n/2) of any length n. The first order statistics pdfs, φ1:2(t), for tandems S2–S2 and S2–S1 are given by

graphic file with name M45.gif (7)

and

graphic file with name M46.gif (8)

respectively, where ΨS2(t) (ψS2(t)) and ΨS1(t) (ψS1(t)) represent the cdfs (pdfs) for domains S2 and S1 (19). To model φ1:2(t), we used the Gamma density (Eq. 1) with shape parameter α and unfolding rate k, which determine the most probable unfolding time, t* = (α − 1)/k, and the unfolding timescale τ = Γ(α + 1)/(Γ(α)k) = α/k for protein domains (see below). We used Eqs. 7 and 8 to fit the theoretical pdf for the first (min) order statistics, φmin(t) = φ1:2(t), to the histograms of the first unfolding time, tmin = t1:2, for tandems S2–S2 and S2–S1, obtained at f = 88 pN. The results of the fit are displayed in Fig. 7, and the obtained values of the model parameters are summarized in Table 4. In general, these agree with the maximum likelihood estimates of the same quantities for single domains S2 and S1 (Table 1). However, the values of α are slightly larger and the values of k are somewhat smaller for tandems S2–S2 and S2–S1, compared to the same quantities for single S2 and S1 domains. The same effect was observed in our previous study of forced unfolding in trimers S2–S2–S2 and S2–S1–S2 (Study 1) (19).
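As an illustration of how a first order statistic pdf can be evaluated for Gamma-distributed parent times, the sketch below uses the standard result for the minimum of two independent variables, φ1:2(t) = ψ1(t)[1 − Ψ2(t)] + ψ2(t)[1 − Ψ1(t)], which reduces to 2ψ(t)[1 − Ψ(t)] for identical parents. That this matches the exact form of Eqs. 7 and 8 is an assumption, as is the rate parameterization of the Gamma density (chosen because it reproduces t* = (α − 1)/k). All function names are hypothetical, and the parameter values are those listed in Table 4.

```python
import math


def gamma_pdf(t, alpha, k):
    """Gamma density k^alpha t^(alpha-1) exp(-k t) / Gamma(alpha)."""
    if t <= 0.0:
        return 0.0
    return math.exp(alpha * math.log(k) + (alpha - 1.0) * math.log(t)
                    - k * t - math.lgamma(alpha))


def gamma_cdf(t, alpha, k, terms=500):
    """Regularized lower incomplete gamma P(alpha, k t) via its power series."""
    x = k * t
    if x <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    for n in range(1, terms):
        term *= x / (alpha + n)
        total += term
        if term < 1e-16 * total:
            break
    return min(1.0, total * math.exp(-x + alpha * math.log(x) - math.lgamma(alpha)))


def first_order_pdf(t, params_1, params_2):
    """pdf of min(t1, t2) for two independent Gamma parents (alpha, k)."""
    (a1, k1), (a2, k2) = params_1, params_2
    return (gamma_pdf(t, a1, k1) * (1.0 - gamma_cdf(t, a2, k2))
            + gamma_pdf(t, a2, k2) * (1.0 - gamma_cdf(t, a1, k1)))


def most_probable_time(alpha, k):
    """t* = (alpha - 1) / k."""
    return (alpha - 1.0) / k


def mean_time(alpha, k):
    """tau = Gamma(alpha + 1) / (Gamma(alpha) k) = alpha / k."""
    return alpha / k


# Table 4 values for tandem S2-S1 (rates in inverse integration steps).
s2 = (2.6, 2.9e-6)
s1 = (9.4, 3.3e-5)
t_star_s2 = most_probable_time(*s2)   # in integration steps
tau_s2 = mean_time(*s2)               # in integration steps
phi_at_t_star = first_order_pdf(t_star_s2, s2, s1)
```

Fitting such a curve to a histogram of t1:2 would additionally need an optimizer, which is outside the scope of this sketch.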

TABLE 4.

Numerical values of the shape parameter, α, and unfolding rate, k (in units of inverse integration steps), for domains S2_1 and S2_2 in tandem S2–S2, and domains S2_1 and S1_2 in tandem S2–S1

Tandem    Domains (1st, 2nd)    α (1st)    α (2nd)    k (1st)        k (2nd)
S2–S2     S2_1, S2_2            2.5        2.6        2.8 × 10^−6    2.9 × 10^−6
S2–S1     S2_1, S1_2            2.6        9.4        2.9 × 10^−6    3.3 × 10^−5

Values are obtained from the fit of the first order (min) statistics pdf, φ1:2(t) = φmin(t) to the histograms of the ordered unfolding times, t1:2, obtained at f = 88 pN (Fig. 7).

The increased (decreased) values of α (k), inferred from the order statistics pdf φ1:2(t) for domains S2 and S1 in tandems S2–S2 and S2–S1, are due to the presence of a short linker, which tends to prolong the forced unfolding times of protein domains in tandems. We estimated the effect of linkers on the unfolding timescale for domains S2 in tandem S2–S2 by taking the difference between the average unfolding time for domain S2 in tandem S2–S2 and the average unfolding time τS2 for a single S2 domain, i.e.,

graphic file with name M56.gif (9)

where the parameter values for the single S2 domain were taken from Table 1 and those for the S2 domains in the tandem from Table 4. Applying Eq. 9 yields ΔτS2 ≈ 8.3 ns. Although for the models of protein dimers connected by a short linker of five Gly residues this time is negligible compared to the average unfolding time of the S2 domain in the dimer and of a single S2 domain (τS2 ≈ 0.08 μs), the effect of linkers may become more pronounced in long protein tandems, especially at a low force and/or for longer linkers. In force-clamp AFM experiments on a protein tandem of length n, the influence of linkers on the unfolding kinetics can be estimated by comparing the average first unfolding time (first order statistics) for a linker of a shorter length l1, τ1:n(l1), and a longer length l2 > l1, τ1:n(l2). The ratio (τ1:n(l2) − τ1:n(l1))/(l2 − l1) can then be used as an estimate for the unfolding time delay per unit length of the linker.

Let us now calculate the error in the estimates of the shape parameter, α, and unfolding rate, k, that we would make if we used the iid assumption in the analysis of did unfolding times for tandem S2–S2 obtained at f = 88 pN. When the unfolding times are iid, the parent unfolding time pdf, ψ(t), is obtained by pooling all unfolding times into a single histogram, i.e., ψ(t) = (1/n) Σ_{r=1}^{n} φr:n(t) (Eq. 6 in Study 1) (19). For n = 2, ψ(t) = φ1:2(t)/2 + φ2:2(t)/2. By fitting the Gamma density (Eq. 1) to the histogram of combined first and second unfolding times (t1:2 and t2:2), we obtain αS2 = 2.4 and kS2 = 2.2 × 10^−6. The relative difference between the estimates obtained by using order statistics (αS2 = 2.55, kS2 = 2.85 × 10^−6 (Table 4)) and by using the iid assumption is small, ∼6%, for the shape parameter αS2, but fairly large, ≈23%, for the unfolding rate kS2. This comparison indicates that employing the iid assumption when the data are not iid may result in substantial estimation error of the forced unfolding rate.
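The quoted relative differences can be checked directly. The order statistics values 2.55 and 2.85 × 10^−6 are consistent with the averages of the two S2–S2 entries in Table 4; this averaging is an inference from the numbers and is not stated explicitly in the text. The iid-fit rate is taken as 2.2 × 10^−6, the value that reproduces the quoted 23%.

```latex
\begin{align*}
\alpha_{S2} &= \tfrac{1}{2}(2.5 + 2.6) = 2.55, &
k_{S2} &= \tfrac{1}{2}(2.8 + 2.9)\times 10^{-6} = 2.85\times 10^{-6},\\[4pt]
\frac{|2.55 - 2.4|}{2.55} &\approx 0.059 \;(\sim 6\%), &
\frac{|2.85 - 2.2|\times 10^{-6}}{2.85\times 10^{-6}} &\approx 0.228 \;(\approx 23\%).
\end{align*}
```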

DISCUSSION AND CONCLUSION

In our previous work (Study 1) (19), we proposed what to our knowledge is a new theory for describing the forced unfolding transitions in wild-type protein tandems and engineered polyproteins, available from force-clamp AFM experiments. The theory is inspired by the experimental AFM setup, in which only the ordered, i.e., first, second, etc., unfolding times in a tandem D1–D2–…–Dn of length n are recorded. Given the stochastic nature of unfolding, it is not possible to tell which domain Di (i = 1, 2, …, n) has unfolded at any given time, t1:n, t2:n, …, tn:n. Order statistics overcomes this difficulty by analyzing ordered variates, and because the distributions of ordered unfolding times, φ1:n, φ2:n, …, φn:n, depend on the parent distributions for protein domains, the order statistics-based theory can be used to infer the parent pdfs (ψ) from the order statistics pdfs (φ).

We showed in Study 1 (19) that the iid assumption, that the (parent) unfolding times are independent (uncorrelated) and identically distributed (iid), may or may not hold depending on the tandem composition, the presence of interdomain interactions, and the magnitude of applied force. For example, in the heterogeneous tandems (D1D2)n the unfolding times of nonidentical domains D1 and D2 are expected to be nonidentically distributed. Also, the domain stabilization effect, observed in the heterogeneous tandems of I27–I28 repeats of titin, in tandems of FnIII domains (20,21), and in the homogeneous tandems of fibrinogen, makes the forced unfolding transitions strongly correlated. We showed that in tandems with no interdomain interactions, such as the model trimers S2–S2–S2 and S2–S1–S2 (Study 1, (19)) and dimers S2–S2 and S2–S1, analyzed here, the dynamic competition between tension propagation along the tandem chain and forced unfolding may couple the consecutive unfolding transitions at an elevated force level (f = 88 pN). As we argued in Study 1, in force-clamp AFM experiments on protein tandems, the forced unfolding transitions can be characterized by four different types of unfolding times, namely iid, inid, did, or dnid unfolding times (Table 5 in Study 1) (19). Only when the parent unfolding times are iid, which is not known a priori, can conventional unfolding data analyses, in which the unfolding times are pooled together into a single histogram, be used. However, when the parent unfolding times are correlated and/or nonidentically distributed, i.e., when the unfolding data are did, inid, or dnid, this approach is inappropriate. To illustrate the latter, we showed that the use of iid assumption in analyzing dependent unfolding times results in large estimation errors for the forced unfolding rate.

To take advantage of the proposed formalism, the unfolding times must be first classified as iid or inid or did or dnid unfolding times. In this study, we developed statistical tests for assessing the independence of parent unfolding times and their distributional equality. These tests allow one to gain information on the unobserved (parent) unfolding times for individual domains by analyzing the observed ordered unfolding times. The tests can be used in statistical analysis of unfolding data available from force-clamp AFM measurements to assess the validity of the iid assumption and to classify the forced unfolding transitions. We assessed the performance of these tests against the results of computer simulations of forced unfolding for the model dimers S2–S2 and S2–S1. We recovered the results of preliminary analysis, namely that the parent unfolding times for the homogeneous dimer S2–S2 are iid at f = 66 pN and did at f = 88 pN, whereas the parent unfolding times for the heterogeneous dimer S2–S1 are inid at f = 66 pN and dnid at f = 88 pN, which validates the order statistics-based theory. Although in our studies we employed the dimers (n2 = 2) and single domains (n1 = 1) to represent protein tandems of longer and shorter length, the tests can be used to assess the validity of the iid assumption and to classify the forced unfolding transitions for tandems of arbitrary lengths n1 and n2 > n1. The monomers and dimers serve as prototypes for tandems of short and long lengths as observed in force-clamp AFM probes on a protein tandem, (D)N, where unfolding data are available for tandems of different length, 1 < n < N. For the convenience of the reader, in Fig. 8 we outline the main steps for testing the distributional equality of the parent unfolding times and their mutual independence. We also give reference to the relevant Eqs. 3 and 10 presented in Study 1 (19), and Eqs. 7 and 8 in this study, which can be used to model the parent unfolding time distributions.

FIGURE 8.

Flowchart for characterization (Steps 1 and 2) and modeling (Step 3) of the forced unfolding times for a protein tandem.

In tandems formed by the noninteracting domains, such as domains S2 and S1 in dimers S2–S2 and S2–S1, the dependence among the consecutive unfolding transitions can be induced by the dynamic competition between the force-induced tension propagation along the tandem chain and the forced unfolding kinetics. It is likely that the dynamic coupling between tension propagation and unfolding kinetics occurs in wild-type tandems and engineered polyproteins as well. As we showed in this study, in such a case the dependence structure between the consecutive unfolding transitions is rather trivial, namely that each unfolding transition after the first one in a tandem of length n, i.e., the second (t2:n), third (t3:n), etc., is delayed by the constant time Δt of dropped tension. The test for distributional equality can be used to estimate the timescale for tension propagation, τf. This can be done, e.g., by comparing via a Q-Q plot the parent unfolding time pdfs generated by using recurrence relation (3) for tandems of different lengths n1 and n2 > n1. Specifically, τf can be estimated from the time shift, Δt, as τf ≈ Δt/(n2 − n1). For the tandem S2–S2, we found that τf ≈ 0.5 μs for f = 66 pN and τf ≈ 0.07 μs for f = 88 pN. Hence, a moderate 33% change in applied force shifts τf by an order of magnitude.

We showed that in protein tandems with no interdomain interaction, yet characterized by correlated unfolding transitions with a constant time shift, the first unfolding events (t1:n) are unaffected by the tension drop. Because of this, the pdf of the first order statistic, φ1:n(t), can still be described by the order statistics for independent random variables (iid and inid, Study 1 (19)). To illustrate this point, we modeled φ1:2(t) for tandems S2–S2 and S2–S1 by using Eqs. 3 and 10 of Study 1. The shape parameter, α, and unfolding rate, k, obtained from the fit of φ1:2(t) to the histograms of the first unfolding times (t1:2) for tandems S2–S2 and S2–S1 (Table 4) agree with the same quantities obtained for single domains S2 and S1 (Table 1), thus validating our theory. We also showed that due to the presence of flexible linkers, the unfolding times for domains S2 and S1 in the model tandems S2–S2 and S2–S1 are slightly longer than the unfolding times for single S2 and S1 domains. This result corroborates our previous findings for longer tandems S2–S2–S2 and S2–S1–S2 (Study 1) (19). In wild-type protein tandems, the tension drop in the tandem chain and the presence of flexible linkers could slow down the protein unfolding kinetics, especially for large proteins and/or long linkers at a low stretching force. Here, we showed how the order statistics-based approach can be used to access the dynamics of tension propagation in the tandem chain and to estimate the effect of linkers.

The advantage of the order statistics-based approach is that it can be used to describe correlated as well as uncorrelated unfolding transitions in both homogeneous tandems (D)n of identical repeats (Ds) and heterogeneous tandems D1–D2–…–Dn formed by nonidentical domains (D1, D2, …, Dn). Hence, the proposed formalism offers a unified framework for analyzing the forced unfolding transitions in protein tandems and polyproteins probed in force-clamp AFM experiments. Recent AFM probes on tandems of immunoglobulin I27–I28 repeats (20), a heterogeneous tandem of FnIII domains (21), and homogeneous tandems of fibrinogen domains (22) show enhanced domain stabilization, possibly due to intra- and/or interdomain interactions. In these tandems, the unfolding transitions are strongly correlated and the dependence structure is most likely nonmonotonic. Development of the order statistics-based theory for analyzing intra- and interdomain interactions in protein tandems is under way.

Acknowledgments

This work was supported by National Science Foundation grant DMS-0204563 (E.B.), and a start-up fund from the University of Massachusetts, Lowell (V.B.).

APPENDIX A: HOEFFDING'S D STATISTIC

Hoeffding's D statistic is a measure of the distance between the joint cdf of the two variables, Ψ(t1, t2), and the product of their marginal cdfs, Ψ1(t1)Ψ2(t2). When t1 and t2 are independent, Ψ(t1, t2) = Ψ1(t1)Ψ2(t2). In practice, the test is implemented as follows. Let (x1, y1), …, (xn, yn) be a random sample from the joint pdf f(x, y), n ≥ 5. To test the hypothesis that X is independent of Y, let ri denote the rank of xi in the sample x1, …, xn, let si be the rank of yi in the sample y1, …, yn, and let ci denote the number of sample pairs (xa, ya) for which both xa < xi and ya < yi. That is,

$$c_i = \sum_{a=1}^{n} \nu(x_a, x_i)\,\nu(y_a, y_i),$$

where ν(a, b) = 1 if a < b; 0 otherwise. Hoeffding's D statistic is defined by

$$D = \frac{A - 2(n-2)B + (n-2)(n-3)C}{n(n-1)(n-2)(n-3)(n-4)} \qquad (A1)$$

where $A = \sum_{i=1}^{n}(r_i - 1)(r_i - 2)(s_i - 1)(s_i - 2)$, $B = \sum_{i=1}^{n}(r_i - 2)(s_i - 2)c_i$, and $C = \sum_{i=1}^{n} c_i(c_i - 1)$. If D ≥ d(β, n), X and Y are found to be statistically significantly dependent at level β. The critical values d(β, n) can be obtained from Table A.25 in Hollander and Wolfe (25). In the free access statistical software R (36), the package Hmisc computes the test statistic and associated p-values for testing that two variables are independent.
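The construction of the ranks ri, si and the counts ci described above can be sketched in Python. Because the formula for D is not legible in this text, the code uses the standard form of Hoeffding's D found in nonparametric statistics texts such as Hollander and Wolfe (25); that this matches Eq. A1 term by term is an assumption. The function name is hypothetical, ties are not handled, and no p-value or critical value is computed.

```python
def hoeffding_d(x, y):
    """Hoeffding's D for paired samples without ties (n >= 5).

    D = [A - 2(n-2)B + (n-2)(n-3)C] / [n(n-1)(n-2)(n-3)(n-4)],
    with A, B, C built from the ranks r_i, s_i and the counts c_i.
    """
    n = len(x)
    if n != len(y) or n < 5:
        raise ValueError("need paired samples with n >= 5")
    # r_i, s_i: ranks of x_i and y_i within their own samples
    r = [1 + sum(1 for a in x if a < xi) for xi in x]
    s = [1 + sum(1 for b in y if b < yi) for yi in y]
    # c_i: number of pairs (x_a, y_a) with x_a < x_i and y_a < y_i
    c = [sum(1 for a, b in zip(x, y) if a < xi and b < yi)
         for xi, yi in zip(x, y)]
    big_a = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2)
                for ri, si in zip(r, s))
    big_b = sum((ri - 2) * (si - 2) * ci for ri, si, ci in zip(r, s, c))
    big_c = sum(ci * (ci - 1) for ci in c)
    numerator = big_a - 2 * (n - 2) * big_b + (n - 2) * (n - 3) * big_c
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return numerator / denominator


# Perfectly increasing and perfectly decreasing relationships (n = 5).
d_increasing = hoeffding_d([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
d_decreasing = hoeffding_d([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

Both monotone examples give the same value of D, which illustrates that the statistic responds to dependence regardless of its direction.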

APPENDIX B: ILLUSTRATION OF THE PERMUTATION TEST

Suppose we collect unfolding time data from a protein tandem of size n = 3 and repeat the experiment two times (M = 2). Suppose that the observed ordered unfolding times of the first sample are t1:3 = 5 μs, t2:3 = 7 μs, and t3:3 = 15 μs. If the unfolding times were iid, the observed ordered times could have originated from any of the following six observations with equal probability 1/6: (t1 = 5 μs, t2 = 7 μs, t3 = 15 μs), or (t1 = 7 μs, t2 = 5 μs, t3 = 15 μs), or (t1 = 7 μs, t2 = 15 μs, t3 = 5 μs), or (t1 = 15 μs, t2 = 7 μs, t3 = 5 μs), or (t1 = 15 μs, t2 = 5 μs, t3 = 7 μs), or (t1 = 5 μs, t2 = 15 μs, t3 = 7 μs). Suppose now that the observed ordered unfolding times of the second sample are t1:3 = 2 μs, t2:3 = 10 μs, and t3:3 = 11 μs. Similarly, they could have originated from any of the following six observations with equal probability 1/6: (t1 = 2 μs, t2 = 10 μs, t3 = 11 μs), or (t1 = 10 μs, t2 = 2 μs, t3 = 11 μs), or (t1 = 10 μs, t2 = 11 μs, t3 = 2 μs), or (t1 = 11 μs, t2 = 10 μs, t3 = 2 μs), or (t1 = 11 μs, t2 = 2 μs, t3 = 10 μs), or (t1 = 2 μs, t2 = 11 μs, t3 = 10 μs). The permutation algorithm, applied to this example, would involve the following steps:

  • Step 1. Suppose the first permutation (b = 1) of the first and second samples resulted in the following observations, (Inline graphic μs, Inline graphic μs, Inline graphic μs), and (Inline graphic μs, Inline graphic μs, Inline graphic μs), where b is the permutation number. Store the result in matrix Tb = T1 of order M × n = 2 × 3 = 6,
    graphic file with name M80.gif
  • Step 2. Repeat Step 1 B times, i.e., b = 1, …, B, to obtain matrices T1, …, TB.

  • Step 3. For b = 1, …, B, carry out three pairwise tests for independence of all pairs of the three columns of matrix Tb at a fixed level β. For b = 1, compute Hoeffding's D statistic and Spearman's rank correlation for the unfolding time pairs (5 μs, 15 μs) and (15 μs, 2 μs), (5 μs, 15 μs) and (5 μs, 11 μs), and (15 μs, 2 μs) and (5 μs, 11 μs), and record the p-values of the three tests for independence.
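The counting in this illustration can be reproduced with the Python standard library. The sketch below lists the six equally likely parent arrangements of each sample and draws one permuted matrix. The variable names are hypothetical, and the particular matrix depends on the random seed, so it will not in general coincide with the T1 of the example.

```python
import random
from fractions import Fraction
from itertools import permutations

first_sample = (5, 7, 15)     # ordered unfolding times, in microseconds
second_sample = (2, 10, 11)

# Under the iid hypothesis each ordered triple has 3! = 6 equally
# likely parent arrangements.
candidates_1 = list(permutations(first_sample))
candidates_2 = list(permutations(second_sample))
probability = Fraction(1, len(candidates_1))

# Step 1 for a single b: one random arrangement per experiment gives
# an M x n = 2 x 3 matrix (rows are experiments, columns are domains).
rng = random.Random(0)
t_b = [rng.sample(first_sample, 3), rng.sample(second_sample, 3)]

# Step 3 tests every pair of columns: 3 pairs for n = 3.
column_pairs = [(j, k) for j in range(3) for k in range(j + 1, 3)]
columns = [[row[j] for row in t_b] for j in range(3)]
```

With only M = 2 experiments each column has two entries, so the example serves to show the bookkeeping only; Hoeffding's D, for instance, requires at least five pairs.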

Editor: Feng Gai.

References

  • 1.Pickart, C. M. 2001. Mechanisms underlying ubiquitination. Annu. Rev. Biochem. 70:503–533. [DOI] [PubMed] [Google Scholar]
  • 2.Weissman, A. M. 2001. Themes and variations on ubiquitylation. Nat. Rev. Mol. Cell Biol. 2:169–178. [DOI] [PubMed] [Google Scholar]
  • 3.Labeit, S., M. Gautel, A. Lakey, and J. Trinick. 1992. Towards a molecular understanding of titin. EMBO J. 11:1711–1716. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Trinick, J., P. Knight, and A. Whiting. 1984. Purification and properties of native titin. J. Mol. Biol. 180:331–356. [DOI] [PubMed] [Google Scholar]
  • 5.Schwarzbauer, J. E., and J. L. Sechler. 1999. Fibronectin fibrillogenesis: a paradigm for extracellular matrix assembly. Curr. Opin. Cell Biol. 11:622–627. [DOI] [PubMed] [Google Scholar]
  • 6.Stossel, T. P., J. Condeelis, L. Cooley, J. H. Hartwig, A. Noegel, M. Schleicher, and S. S. Shapiro. 2001. Filamins as integrators of cell mechanics and signalling. Nat. Rev. Mol. Cell Biol. 2:138–145. [DOI] [PubMed] [Google Scholar]
  • 7.Feng, Y., and C. A. Walsh. 2004. The many faces of filamin: A versatile molecular scaffold for cell motility and signalling. Nat. Cell Biol. 6:1034–1038. [DOI] [PubMed] [Google Scholar]
  • 8.Popowicz, G. M., R. Muller, A. A. Noegel, M. Schleicher, R. Huber, and T. A. Holak. 2004. Molecular structure of the rod domain of Dictyostelium filamin. J. Mol. Biol. 342:1637–1646. [DOI] [PubMed] [Google Scholar]
  • 9.Carrion-Vazquez, M., A. F. Oberhauser, S. B. Fowler, P. E. Marszalek, S. E. Proedel, J. Clarke, and J. M. Fernandez. 1999. Mechanical and chemical unfolding of a single protein: a comparison. Proc. Natl. Acad. Sci. USA. 96:3694–3699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Rief, M., M. Gautel, F. Oesterhelt, J. M. Fernandez, and H. E. Gaub. 1997. Reversible unfolding of individual titin immunoglobulin domains by AFM. Science. 276:1109–1112.
  • 11.Zinober, R. C., D. J. Brockwell, G. S. Beddard, A. W. Blake, P. D. Olmsted, S. E. Radford, and D. A. Smith. 2002. Mechanically unfolding proteins: The effect of unfolding history and the supramolecular scaffold. Protein Sci. 11:2759–2765.
  • 12.Rounsevell, R. W. S., A. Steward, and J. Clarke. 2005. Biophysical investigations of engineered polyproteins: Implications for force data. Biophys. J. 88:2022–2029.
  • 13.Schlierf, M., H. Li, and J. M. Fernandez. 2004. The unfolding kinetics of ubiquitin captured with single-molecule force-clamp techniques. Proc. Natl. Acad. Sci. USA. 101:7299–7304.
  • 14.Fernandez, J. M., and H. Li. 2004. Force-clamp spectroscopy monitors the folding trajectory of a single protein. Science. 303:1674–1678.
  • 15.Brujic, J., R. I. Z. Hermans, K. A. Walther, and J. M. Fernandez. 2006. Single-molecule force spectroscopy reveals signatures of glassy dynamics in the energy landscape of ubiquitin. Nature Physics. 2:282–286.
  • 16.Oberhauser, A. F., P. K. Hansma, M. Carrion-Vazquez, and J. M. Fernandez. 2001. Stepwise unfolding of titin under force-clamp atomic force microscopy. Proc. Natl. Acad. Sci. USA. 98:468–472.
  • 17.Zhang, B., G. Xu, and J. S. Evans. 1999. A kinetic molecular model of the reversible unfolding and refolding of titin under force extension. Biophys. J. 77:1306–1315.
  • 18.Hummer, G., and A. Szabo. 2003. Kinetics from nonequilibrium single-molecule pulling experiments. Biophys. J. 85:5–15.
  • 19.Bura, E., D. K. Klimov, and V. Barsegov. 2007. Analyzing forced unfolding of protein tandems by ordered variates: 1. Independent unfolding times. Biophys. J. 93:1100–1115.
  • 20.Li, H., A. F. Oberhauser, S. B. Fowler, J. Clarke, and J. M. Fernandez. 2000. Atomic force microscopy reveals the mechanical design of a modular protein. Proc. Natl. Acad. Sci. USA. 97:6527–6531.
  • 21.Oberhauser, A. F., C. Badilla-Fernandez, M. Carrion-Vazquez, and J. M. Fernandez. 2002. The mechanical hierarchies of fibronectin observed with single-molecule AFM. J. Mol. Biol. 319:433–447.
  • 22.Brown, A. E. X., R. I. Litvinov, D. E. Discher, and J. W. Weisel. 2007. Forced unfolding of coiled-coils in fibrinogen by single-molecule AFM. Biophys. J. 92:L39–L41.
  • 23.Gibbons, J. D., and S. Chakraborti. 2003. Nonparametric Statistical Inference, 4th ed. Marcel Dekker, New York.
  • 24.Hoeffding, W. 1948. A non-parametric test of independence. Ann. Math. Stat. 19:546–557.
  • 25.Hollander, M., and D. A. Wolfe. 1973. Nonparametric Statistical Methods. John Wiley & Sons, New York.
  • 26.Klimov, D. K., and D. Thirumalai. 2000. Native topology determines force-induced unfolding pathways in globular proteins. Proc. Natl. Acad. Sci. USA. 97:7254–7259.
  • 27.Raman, E. P., V. Barsegov, and D. K. Klimov. 2007. Folding of tandem-linked domains. Proteins. 67:795–810.
  • 28.Onuchic, J. N., and P. G. Wolynes. 2004. Theory of protein folding. Curr. Opin. Struct. Biol. 14:70–75.
  • 29.Dill, K. A., S. Bromberg, K. Yue, K. M. Fiebig, D. P. Yee, P. D. Thomas, and H. S. Chan. 1995. Principles of protein folding - A perspective from simple exact models. Protein Sci. 4:561–602.
  • 30.Veitshans, T., D. K. Klimov, and D. Thirumalai. 1997. Protein folding kinetics: Time scales, pathways, and energy landscapes in terms of sequence dependent properties. Fold. Des. 2:1–22.
  • 31.Silverman, B. W. 1986. Density Estimation. Chapman and Hall, London.
  • 32.Scott, D. W. 1992. Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, New York.
  • 33.Kendall, M. G., and J. D. Gibbons. 1990. Rank Correlation Methods, 5th ed. Edward Arnold, London.
  • 34.Blum, J. R., J. Kiefer, and M. Rosenblatt. 1961. Distribution free tests of independence based on the sample distribution function. Ann. Math. Stat. 35:138–149.
  • 35.David, H. A., and H. N. Nagaraja. 2003. Order Statistics. Wiley Interscience, New York.
  • 36.R Development Core Team. 2006. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society
