Author manuscript; available in PMC: 2025 Mar 1.
Published in final edited form as: Biopolymers. 2023 May 31;115(2):e23553. doi: 10.1002/bip.23553

Analyzing paramagnetic NMR data on target DNA search by proteins using a discrete-state kinetic model for translocation

Binhan Yu 1, Junji Iwahara 1,*
PMCID: PMC10687310  NIHMSID: NIHMS1906010  PMID: 37254885

Abstract

Before reaching their targets, sequence-specific DNA-binding proteins nonspecifically bind to DNA through electrostatic interactions and stochastically change their locations on DNA. Investigating the dynamics of DNA scanning by proteins is nontrivial because multiple translocation mechanisms operate simultaneously and the protein can bind nonspecifically to many sites on DNA. Nuclear magnetic resonance (NMR) spectroscopy can provide information about the target DNA search processes at an atomic level. Paramagnetic relaxation enhancement (PRE) is particularly useful for studying how proteins scan DNA in the search process. Previously, relatively simple two-state or three-state exchange models were used to explain PRE data reflecting the target search process. In this work, using more realistic discrete-state stochastic kinetic models embedded into an NMR master equation, we analyzed the PRE data for the HoxD9 homeodomain interacting with DNA. The kinetic models that incorporate sliding, dissociation, association, and intersegment transfer can reproduce the PRE profiles observed at different ionic strengths. The analysis confirms the previous interpretation of the PRE data and shows that the protein’s probability distribution among nonspecific sites is nonuniform during the target DNA search process.

Keywords: dynamics, electrostatic interaction, kinetics, paramagnetic NMR, protein-DNA interaction

Introduction

Specific binding to particular sequences or structural signatures of DNA is crucial for transcription factors and DNA-repair enzymes.1,2 To regulate genes, transcription factors must locate specific sequences among numerous nonspecific sequences on genomic DNA.3,4 To fix damaged sites, DNA-repair enzymes must locate lesions in the overwhelming presence of undamaged sites on DNA.5,6 Efficient search for the targets is required for the biological functions of these DNA-binding proteins.1 Over the five decades since the discovery of the remarkably rapid binding of the lac repressor to its operator DNA,7 the mechanisms for rapid search by DNA-binding proteins have been studied via various approaches.3,4,8–19

During the target search processes, the proteins stochastically scan DNA and change their locations on DNA through several mechanisms, such as sliding, hopping, and intersegment transfer.20–23 In the 20th century, these translocation mechanisms were merely hypothetical concepts.20 In the 21st century, however, they were experimentally confirmed through direct observations in vitro and in living cells by single-molecule techniques.4,13–19 Nevertheless, how the proteins scan DNA is not well understood at an atomic level.

Unlike single-molecule techniques, nuclear magnetic resonance (NMR) spectroscopy can in principle provide atomic-level insight into target DNA search processes.24,25 Paramagnetic relaxation enhancement (PRE) is particularly useful for investigating the search processes.26–28 PRE arises from dipole-dipole interactions between the observed nucleus and unpaired electrons.29 In exchange systems involving multiple states, PRE rates are sensitive to the states in which the distances between the unpaired electrons and the observed nuclei are short.26 Even a state with an equilibrium population as small as 1% can govern the observed PRE rates. Owing to these features, PRE allows for not only the detection of low-population states, but also the analysis of their structures.30 To investigate target DNA search processes, intermolecular PRE can be measured using 15N/13C-labeled proteins together with DNA to which an EDTA-Mn2+ group is covalently tethered in a site-specific manner.26–28,31,32 The PRE data can indicate which parts of the protein transiently become proximal to the paramagnetic group in the target search process. Such information can help us understand how the protein scans DNA before reaching its target.

Previously, PRE data on the target DNA search process were interpreted in a semi-quantitative manner using two or three states.26,30 Although such interpretations are convenient and provide useful insights, the actual search process involves a considerably larger number of states because proteins can nonspecifically bind to various sites on DNA. Many NMR studies suggest that the manner in which proteins nonspecifically interact with DNA in the search process is fundamentally similar to that in the target-bound state in terms of electrostatic interactions with the DNA backbone.28,33–44 As multiple translocation mechanisms (including sliding, dissociation/association, and intersegment transfer) coexist and affect apparent PRE rates differently, more realistic kinetic models may provide us with greater insights into the target DNA search.

From this perspective, a discrete-state kinetic model involving a realistic number of states and translocation mechanisms is used in our current study to interpret the previously published NMR PRE data on the target DNA search process of the HoxD9 homeodomain.26,28 Similar kinetic models were previously used in some theoretical studies on target DNA search12,45–48 as well as in fluorescence and NMR line-shape studies.49–51 As shown below, our model is well suited to be incorporated into the McConnell equations52,53 to consider the apparent PRE rates in exchange systems. Our approach confirms the previous conclusions drawn from the PRE data and provides additional information regarding how proteins change locations on DNA during the target search process.

Theory and Computation

Previous experimental data and interpretation

Before describing our analysis, we summarize the experimental data of Iwahara and Clore.26 They measured 1H transverse PRE rates Γ2 for the HoxD9 homeodomain interacting with a 24-base pair (bp) DNA duplex containing a HoxD9 recognition sequence and an EDTA-Mn2+ group conjugated to a thymine base. The PRE profiles of HoxD9 bound to the 24-bp cognate DNA duplex in 20 mM NaCl were consistent with those predicted from the static structure of the specific complex with the target site. However, for 100 and 160 mM NaCl, the PRE data of many residues were clearly inconsistent with the Mn2+–1HN distance rMn in the specific complex structure. For instance, for the dT-EDTA-Mn2+ group shown in Figure 1a, the 1HN nuclei of the HoxD9 homeodomain residues 26-29 and 42-43 exhibited Γ2 > 20 s−1, even though they are located on the opposite side of the protein with respect to Mn2+ and rMn > 35 Å in the complex with the target. Although the PRE data were inconsistent with the structure, the residual dipolar coupling data for the HoxD9-DNA complex were consistent with the structure of the specific complex in both 20 mM and 160 mM NaCl.30

Figure 1.


(a) PRE Γ2 data reflecting the target DNA search process of the HoxD9 homeodomain.26 Although the data with 20 mM NaCl were consistent with the structural model for the specific complex with the target site, the data with 100 and 160 mM NaCl were not. When the exchange between the ground state and the target search state is fast due to a high ionic strength, the apparent PRE rates reflect the target DNA search process. Asterisks in the graph indicate that apparent Γ2 rates were too large to determine. (b) Apparent PRE rate Γ2 as a function of the exchange rate kex for a two-state exchange model. The apparent Γ2 reflects the minor state when kex is sufficiently large. Only when kex >> |Γ2B – Γ2A| does the apparent Γ2 rate become equal to the population-weighted average (pAΓ2A + pBΓ2B).30 Panel a was adapted from Ref. 26 with permission from Springer Nature.

The PRE data were interpreted as evidence of the target DNA search process in which the protein can transiently bind to various nonspecific sites, including those near the EDTA-Mn2+ group.26 Due to the pseudo-C2 symmetry of the DNA double helix, the protein can adopt two opposite orientations at each site. When the exchange between the ground state (i.e., the target-bound state) and the intermediates is fast due to a high ionic strength, the apparent PRE rates reflect the low-population intermediates for which the Mn2+–1HN distance is short. Figure 1b shows the apparent Γ2 rate as a function of the exchange rate kex for a two-state exchange system simulated with the McConnell equations. The apparent Γ2 rate approaches the population average of the Γ2 rates for the two species only when the exchange rate is considerably greater than the difference in relaxation rates (i.e., kex >> |Γ2B – Γ2A|, the fast exchange regime on the relaxation time scale).30 The rate kex for exchange between the ground state and the nonspecifically bound state can substantially increase at higher ionic strength, as electrostatic interactions between the protein and DNA become weaker. In fact, the z-exchange NMR data at low ionic strengths showed that even a slight increase in ionic strength drastically accelerates the translocation of the HoxD9 homeodomain on DNA.26 This explains why the PRE data for the HoxD9 homeodomain at 100 and 160 mM NaCl reflected the target DNA search, whereas the PRE data at 20 mM NaCl did not.

Discrete-state model for a target DNA search process

Our goal in the current study is to interpret the PRE data using a more realistic kinetic model of the target DNA search process. Although the two- or three-state exchange model can offer a good explanation of the experimental PRE data on the HoxD9 homeodomain-DNA interactions,26,30 the actual target DNA search process involves a considerably larger number of nonspecific DNA-protein states. During the search process, each protein molecule moves from one location to another on DNA through three major mechanisms:20 namely, sliding, dissociation/reassociation, and intersegment transfer. Sliding is a nondirectional shift from a site to an adjacent site while remaining bound to the DNA. Sliding in the proximity of targets is particularly important because it causes the so-called antenna effect and accelerates the search process.9,12,54–56 Furthermore, the proteins dissociate from DNA, undergo three-dimensional diffusion, and reassociate with another site. When this occurs between two proximal sites, it is called hopping. Intersegment transfer is direct translocation from one DNA duplex to another without going through the intermediary of free protein.

We use a discrete-state kinetic model involving a large number of overlapping sites for nonspecific DNA association (Figure 2a). The rate constants for sliding (ksl,j), dissociation (koff,j), association (kon,j), and intersegment transfer (kit,j) are defined for each binding site (represented by an index j) with affinity represented by the dissociation constant Kd,j = koff,j/kon,j. Based on the principle of detailed balance,57 the sliding kinetic rate constants ksl,j and ksl,j+1 for two adjacent sites are related as follows:

$$k_{sl,j+1} = k_{sl,j}\left(\frac{K_{d,j+1}}{K_{d,j}}\right)\frac{[D_j]}{[D_{j+1}]} \qquad (1)$$

Figure 2.


Discrete-state kinetic model used in this study. (a) Translocation mechanisms and relevant rate constants (ksl for sliding; koff for dissociation; kon for association; and kit for intersegment transfer). Each rate constant is separately defined for translocation of the protein that is specifically bound to the target (i.e., ksl,s, kon,s, koff,s, and kit,s). The overlapping nonspecific and specific binding sites for the HoxD9 homeodomain are indicated by arrows below the DNA sequence used for the PRE experiments. (b) Structural models for individual DNA-bound states. The structures were drawn with ChimeraX.60 (c) Spherical representation of the Mn2+ spatial distribution. The EDTA-Mn2+ group is tethered via a flexible linker.31,32 The spatial distribution of the paramagnetic center is relatively wide with the radius Rp = 7 Å.61

In Eq. 1, [Dj] and [Dj+1] are the concentrations of sites j and j+1 in the free state, respectively. Strictly speaking, intersegment transfer is supposed to involve a DNA-bridging intermediate; however, it can be regarded as a single-step second-order process characterized by an apparent second-order rate constant kit,j.49,50,58 Due to the principle of detailed balance, the rate constants kit,j and kit,n for intersegment transfers from the sites j and n satisfy:

$$k_{it,n} = k_{it,j}\left(\frac{K_{d,n}}{K_{d,j}}\right) \qquad (2)$$

Since the concentrations of all chemical species remain constant during the NMR experiments, second-order processes in the current system can be treated as pseudo-first-order processes using a kinetic matrix.59 We construct kinetic matrices Ksl, Kda, and Kit for sliding, dissociation/association, and intersegment transfer, respectively, as described in our previous study on NMR line-shape analysis for nonspecific DNA-protein interactions.51 The overall kinetic matrix K is defined as:51

$$\mathbf{K} = \mathbf{K}_{sl} + \mathbf{K}_{da} + \mathbf{K}_{it} \qquad (3)$$

Each of these matrices is a square matrix whose dimension equals the total number of states for the protein, including the target-bound, nonspecifically bound, and unbound states.
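The detailed-balance condition of Eq. 1 can be illustrated with a minimal Python sketch. It is not the in-house MATLAB code used in this study; the function names, the four-site chain, and the Kd and free-site values are hypothetical and chosen only for illustration. The sketch propagates the sliding rate constant along a chain of sites with Eq. 1, assembles a sliding-only kinetic matrix (column j to row i), and checks that the equilibrium populations, which are proportional to [Dj]/Kd,j, are stationary.

```python
def sliding_rates(k_sl_first, kd, free_site):
    """Propagate the sliding rate constant along the chain with Eq. 1."""
    rates = [k_sl_first]
    for j in range(len(kd) - 1):
        rates.append(rates[j] * (kd[j + 1] / kd[j]) * (free_site[j] / free_site[j + 1]))
    return rates


def sliding_matrix(rates):
    """Sliding-only kinetic matrix; element [i][j] is the rate from site j to site i."""
    n = len(rates)
    k = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in (j - 1, j + 1):          # sliding reaches adjacent sites only
            if 0 <= i < n:
                k[i][j] += rates[j]       # gain of site i from site j
                k[j][j] -= rates[j]       # matching loss of site j
    return k


# Hypothetical four-site example (Kd in nM, free-site concentrations in uM).
kd = [6000.0, 3000.0, 12000.0, 6000.0]
free = [500.0, 480.0, 510.0, 500.0]
rates = sliding_rates(1.0e5, kd, free)    # 1e5 s^-1 assigned to the first site
K_sl = sliding_matrix(rates)

# Bound-state populations are proportional to [D_j] / K_d,j.
pops = [d / k for d, k in zip(free, kd)]
net_flux = [sum(K_sl[i][j] * pops[j] for j in range(len(pops))) for i in range(len(pops))]
print(net_flux)  # every element should vanish within rounding error
```

Because Eq. 1 makes the product of rate constant and population identical for every site, the forward and backward fluxes between neighbors cancel pairwise, which is what the vanishing net flux shows.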

McConnell equation for an exchange system

The discrete-state kinetic model for protein translocation on DNA can readily be incorporated into the McConnell equation for chemical exchange involving an arbitrary number of states:52,53

$$\frac{d}{dt}\mathbf{m}(t) = (i\mathbf{W} - \mathbf{R} + \mathbf{K})\,\mathbf{m}(t) \qquad (4)$$

This equation describes the time dependence of transverse nuclear magnetization for each state. The vector m(t) is a column vector comprising transverse magnetizations for individual states at a time t. Each vector element, mj, is a complex number (i.e., mj = mj,x + imj,y, where i is the imaginary unit) representing transverse magnetization at a time t. The matrices R, K, and W represent the relaxation matrix, the kinetic matrix, and the chemical shift matrix, respectively.59 R is a diagonal matrix with each diagonal element being the intrinsic transverse relaxation rate for each state. W is also a diagonal matrix with elements of 2πδj, where δj is the chemical shift in Hz units for the jth state.

The solution of Eq. 4 is obtained using a matrix exponential as follows:52

$$\mathbf{m}(t) = \exp(\mathbf{G}t)\,\mathbf{m}(0) \qquad (5)$$

In Eq. 5, m(0) represents the initial state at time 0, and its elements are proportional to the equilibrium populations of individual states. The matrix exponential exp(Gt) is the time propagator, where G = iW − R + K.

Simulation of apparent PRE rates for an exchange system

Through experiments on a paramagnetic sample and a corresponding diamagnetic sample, an apparent transverse PRE rate Γ2app is measured as:

$$\Gamma_2^{app} = R_{2,para}^{app} - R_{2,dia}^{app} \qquad (6)$$

where R2,paraapp and R2,diaapp are the apparent transverse relaxation rates for the paramagnetic and diamagnetic samples, respectively. To simulate the apparent PRE rates for an exchange system, the matrix G in Eq. 5 should be defined separately for the paramagnetic and diamagnetic samples (Gpara and Gdia, respectively) as follows:

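$$\mathbf{G}_{para} = i\mathbf{W} - \mathbf{R}_{para} + \mathbf{K} \qquad (7)$$
$$\mathbf{G}_{dia} = i\mathbf{W} - \mathbf{R}_{dia} + \mathbf{K} \qquad (8)$$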

in which Rpara and Rdia are diagonal matrices with the intrinsic relaxation rates R2,para,j and R2,dia,j, respectively, for the jth state. For paramagnetic systems with an isotropic χ tensor (e.g., Mn2+ and nitroxide), pseudo-contact shifts do not occur,30 thereby rendering the chemical shift matrix W identical for both the paramagnetic and diamagnetic samples. The kinetic matrix K is also identical for both the samples.

Transverse PRE rates are typically measured using a pulse sequence involving a spin-echo scheme.30,62,63 In such a case, the vector m following a spin-echo scheme with a duration of T is:

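$$\mathbf{m}_{dia}(T) = \exp\left(\mathbf{G}_{dia}\frac{T}{2}\right)\left[\exp\left(\mathbf{G}_{dia}\frac{T}{2}\right)\mathbf{m}_{dia}(0)\right]^{*} \qquad (9)$$
$$\mathbf{m}_{para}(T) = \exp\left(\mathbf{G}_{para}\frac{T}{2}\right)\left[\exp\left(\mathbf{G}_{para}\frac{T}{2}\right)\mathbf{m}_{para}(0)\right]^{*} \qquad (10)$$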

In these equations, the asterisk (*) denotes complex conjugation (i.e., each imaginary part is multiplied by −1), which corresponds to the effect of a 180° pulse along x. In PRE measurements with a two-timepoint approach, an apparent PRE rate is determined by:62

$$\Gamma_2^{app} = \frac{1}{T_b - T_a}\ln\frac{I_{para}(T_a)\,I_{dia}(T_b)}{I_{para}(T_b)\,I_{dia}(T_a)} \qquad (11)$$

where Ipara and Idia are the peak intensities for the paramagnetic and diamagnetic samples, respectively, and Ta and Tb are the two time points for the spin-echo scheme. The peak intensities Ipara and Idia can be simulated with Eqs. 9 and 10. Thus, Eqs. 3 and 7–11 allow us to simulate the apparent transverse PRE rate Γ2app for the target DNA search system.
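The workflow of Eqs. 7–11 can be illustrated for the simplest case of two states (a major state A and a minor state B), in the spirit of Figure 1b. The following Python sketch is not the MATLAB code used in this study. The populations, intrinsic Γ2 rates, diamagnetic relaxation rate, and chemical shift are hypothetical values, the peak intensity is approximated by the magnitude of the major-state magnetization (an assumption made here for simplicity), and the sign convention G = iW − R + K with the propagator exp(Gt) is assumed. Only Ta = 4.6 ms and Tb = 18.6 ms are taken from the Computation section.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Multiply a square matrix by a column vector."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]


def mat_exp(a, terms=18):
    """Matrix exponential by scaling and squaring with a Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** squarings for x in row] for row in a]
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def spin_echo(g, m0, total_time):
    """Eqs. 9 and 10: evolve for T/2, conjugate (180 degree pulse), evolve for T/2."""
    half = mat_exp([[x * total_time / 2 for x in row] for row in g])
    refocused = [x.conjugate() for x in mat_vec(half, m0)]
    return mat_vec(half, refocused)


def apparent_pre(kex, p_b=0.01, gamma_a=1.0, gamma_b=2000.0,
                 r2_dia=50.0, shift_b_hz=200.0, t_a=4.6e-3, t_b=18.6e-3):
    """Apparent Gamma2 (s^-1) of a two-state system from Eqs. 7-11."""
    p_a = 1.0 - p_b
    k_ab, k_ba = kex * p_b, kex * p_a            # A->B and B->A rate constants
    kin = [[-k_ab, k_ba], [k_ab, -k_ba]]         # element [i][j]: rate from j to i
    shifts = [0.0, 2 * math.pi * shift_b_hz]     # diagonal of W (rad/s)

    def build(pre):                              # G = iW - R + K
        return [[(1j * shifts[i] - (r2_dia + pre[i]) if i == j else 0) + kin[i][j]
                 for j in range(2)] for i in range(2)]

    g_dia, g_para = build([0.0, 0.0]), build([gamma_a, gamma_b])
    m0 = [p_a, p_b]                              # equilibrium populations

    def intensity(g, t):                         # assumed proxy for peak intensity
        return abs(spin_echo(g, m0, t)[0])

    ratio = (intensity(g_para, t_a) * intensity(g_dia, t_b)) / (
        intensity(g_para, t_b) * intensity(g_dia, t_a))
    return math.log(ratio) / (t_b - t_a)         # Eq. 11


for kex in (1e-2, 1e2, 1e4, 1e6):
    print(kex, apparent_pre(kex))
```

With these hypothetical parameters, the apparent rate stays near the major-state value for slow exchange and approaches the population-weighted average (about 21 s−1 here) only when kex greatly exceeds the difference between the two intrinsic Γ2 rates. A production implementation would more likely rely on a library matrix exponential; the pure-Python version is used here only to keep the sketch self-contained.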

Calculations of intrinsic PRE rates from structural models

Our discrete-state kinetic model assumes that the DNA has N distinct overlapping binding sites (Figure 2a). For the 24-bp DNA used in the PRE experiments, our model involves 20 overlapping sites for the binding of the HoxD9 homeodomain to DNA, as shown in Figure 2b. The structural models for protein-DNA complexes at individual sites were generated using the crystal structure of the highly homologous Antp homeodomain-DNA complex (Protein Data Bank [PDB] 9ANT).64 Using the Xplor-NIH software,65 B-form DNA was extended at each end to generate a 54-bp DNA complex, and the Antp homeodomain was replaced with the HoxD9 homeodomain through homology modeling. Then, 20 complexes of a 24-bp DNA with the homeodomain located at different positions were generated from the 54-bp DNA complex by cutting it at different positions with a 24-bp window. Each structural model represents the homeodomain located at the corresponding binding site with the same binding orientation and binding surface as in the specific complex. Due to the pseudo-C2 symmetry of the DNA double helix, the bound protein molecule can adopt two opposite orientations at each of the 20 sites. Therefore, our kinetic model for the HoxD9 homeodomain involved a total of 41 states, including the free state.

The intrinsic 1H PRE rates Γ2 for individual states of the DNA-bound protein were calculated using the 1H-Mn2+ distances from the structural models along with:29

$$\Gamma_2 = \frac{1}{15}\left(\frac{\mu_0}{4\pi}\right)^2 \gamma_H^2 g^2 \mu_B^2 s(s+1)\left[4J(0) + 3J(\omega_H)\right] \qquad (12)$$

where μ0 is the permeability of free space; γH, the 1H nuclear gyromagnetic ratio; g, the electron spin g-factor; μB, the Bohr magneton; s, the spin quantum number (s = 5/2 for Mn2+); ωH, the 1H Larmor frequency; and J(ω), the reduced spectral density function for the dipole-dipole interaction between the observed nucleus and the unpaired electrons.

We simulated intermolecular PRE arising from an EDTA-Mn2+ group covalently attached to a DNA thymine base. As shown in the previous studies,32,61 this paramagnetic group is floppy due to rotatable bonds within the linker connecting the EDTA-Mn2+ group to a thymine base (Figure 2c). The spatial distribution of the paramagnetic center should be considered when predicting the PRE from a structure. We used the following spectral density function for a uniform spherical distribution of the paramagnetic center:61

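$$J(\omega) = R^{-6}\frac{\tau_c}{1+\omega^2\tau_c^2} + \left[\left(1 - \frac{R_p^2}{R^2}\right)^{-3} - 1\right] R^{-6}\frac{\tau_t}{1+\omega^2\tau_t^2} \qquad (13)$$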

in which R is the distance between a 1H nucleus and the center of the spherical distribution of Mn2+ and Rp is the distribution radius. The correlation times in Eq. 13 are τc = (τr−1 + τs−1)−1 and τt = (τc−1 + τi−1)−1, where τs is the electron relaxation time; τr, the molecular rotational correlation time; and τi, the correlation time for internal motions.

The center of the spatial distribution of the Mn2+ ion was transferred to the corresponding position of each generated structure, as described previously.61 The intrinsic PRE rates for each state were calculated using Eqs. 12 and 13 with the distribution radius Rp = 7 Å based on a previous study on PRE arising from EDTA-Mn2+.61 For the states in which the protein is closest to the EDTA-Mn2+ group, a smaller radius of 5 Å was used to account for the steric effects of the protein on the spatial distribution of the EDTA-Mn2+ group.
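As an illustration of Eqs. 12 and 13, the following Python sketch computes an intrinsic Γ2 rate from the distance R between a 1H nucleus and the center of the Mn2+ distribution. The correlation times (τr, τs, τi) and the 600 MHz 1H Larmor frequency are hypothetical values that are not taken from this study, and the function names are introduced only for this sketch. The physical constants are standard SI values, with s = 5/2 for Mn2+ and Rp = 7 Å as stated in the text.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T m A^-1
GAMMA_H = 2.6752218744e8     # 1H gyromagnetic ratio, rad s^-1 T^-1
G_ELECTRON = 2.0023          # electron g-factor
MU_B = 9.2740100783e-24      # Bohr magneton, J T^-1
SPIN = 2.5                   # s = 5/2 for Mn2+


def spectral_density(omega, r, r_p, tau_c, tau_t):
    """Eq. 13 for a uniform spherical distribution of the paramagnetic center."""
    if r <= r_p:
        raise ValueError("Eq. 13 requires the nucleus to lie outside the sphere")
    rigid = r ** -6 * tau_c / (1 + (omega * tau_c) ** 2)
    spread = ((1 - r_p ** 2 / r ** 2) ** -3 - 1) * r ** -6 * tau_t / (1 + (omega * tau_t) ** 2)
    return rigid + spread


def intrinsic_pre(r_angstrom, r_p_angstrom=7.0, tau_r=10e-9, tau_s=10e-9,
                  tau_i=0.5e-9, larmor_hz=600e6):
    """Eq. 12: intrinsic transverse PRE rate (s^-1); the time constants are assumptions."""
    r, r_p = r_angstrom * 1e-10, r_p_angstrom * 1e-10      # convert to meters
    tau_c = 1 / (1 / tau_r + 1 / tau_s)
    tau_t = 1 / (1 / tau_c + 1 / tau_i)
    omega_h = 2 * math.pi * larmor_hz
    prefactor = (MU0_OVER_4PI ** 2 * GAMMA_H ** 2 * G_ELECTRON ** 2
                 * MU_B ** 2 * SPIN * (SPIN + 1)) / 15
    j0 = spectral_density(0.0, r, r_p, tau_c, tau_t)
    jw = spectral_density(omega_h, r, r_p, tau_c, tau_t)
    return prefactor * (4 * j0 + 3 * jw)


for distance in (15.0, 25.0, 35.0):
    print(distance, intrinsic_pre(distance))
```

Setting the distribution radius to zero recovers the usual point-dipole R−6 dependence, whereas a finite radius always increases the predicted rate because the second term of Eq. 13 is positive.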

Salt concentration dependence

The salt dependence of dissociation constants Kd for protein-DNA complexes is empirically given by:66–68

$$\log K_d = a \log[\mathrm{NaCl}] + b \qquad (14)$$

This relationship has been explained by the counterion condensation theory.67–71 We modeled the salt dependence of the dissociation rate constant koff based on the salt dependence of Kd. Previous studies also showed a log-log linear relationship for the rate constant kit for intersegment transfer:26,50

$$\log k_{it} = a_k \log[\mathrm{NaCl}] + b_k \qquad (15)$$

Based on the experimental data,26 we set ak = 3 and chose the value of bk that gives a specified value of kit at 100 mM NaCl. We used a corresponding log-log linear relationship for sliding as well.
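The log-log relationships of Eqs. 14 and 15 amount to a power-law scaling from a reference salt concentration. The short Python sketch below is an illustration rather than the code used in this study; the function name is hypothetical. It fixes the intercept from a value specified at 100 mM NaCl and evaluates the quantity at other NaCl concentrations, using the slope of 3 and the reference values quoted in the Computation section.

```python
import math


def scale_with_salt(value_at_ref, slope, salt_mM, ref_salt_mM=100.0):
    """Evaluate log(y) = slope * log[NaCl] + intercept, with the intercept
    fixed so that y equals value_at_ref at the reference concentration."""
    intercept = math.log10(value_at_ref) - slope * math.log10(ref_salt_mM)
    return 10 ** (slope * math.log10(salt_mM) + intercept)


for salt in (20.0, 100.0, 160.0):
    k_it = scale_with_salt(1.0e5, 3.0, salt)      # M^-1 s^-1, 1e5 at 100 mM NaCl
    kd_n = scale_with_salt(6000.0, 3.0, salt)     # nM, 6000 at 100 mM NaCl
    print(salt, k_it, kd_n)
```

With a slope of 3, lowering the salt concentration from 100 to 20 mM NaCl reduces each quantity by a factor of 125, and raising it to 160 mM increases each by a factor of about 4.1.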

Computation

All computations were performed using MATLAB software (version 2022a; MathWorks) with in-house scripts. The total concentrations of protein and DNA used in the simulations were 400 μM and 600 μM, respectively. Based on the experimental data on the apparent DNA affinity of the HoxD9 homeodomain,28 the dissociation constant Kd,s = 30 nM was used for the specific DNA complex, and Kd,n = 6000 nM was used for nonspecific complexes at 100 mM NaCl. Assuming a diffusion-controlled process with the Smoluchowski limit along with an orientation constraint for macromolecular association,72,73 the intrinsic association rate constants kon and kon,T were set to 10^7 M−1 s−1. The rate constant for intersegment transfer kit and the rate constant for sliding ksl were set to 10^5 M−1 s−1 and 10^5 s−1, respectively, for a site with a dissociation constant Kd,n = 6000 nM at 100 mM NaCl. When nonuniform populations were used for nonspecific sites (see below), Kd,n was adjusted according to the populations, and the rate constants kit and ksl were set using Eqs. 1 and 2 to maintain the principle of detailed balance. The values of kit and ksl were based on the previous studies on nonspecific DNA complexes of the HoxD9 homeodomain.28,51 The values of these rate constants at different salt concentrations were calculated using Eq. 15 and ak = 3, based on the data of Fig. S1 of Ref. 26. The same value was used for the parameter a in Eq. 14 (a = 3). The dissociation rate constants were calculated with koff = konKd. The chemical shift of the free protein was set to 0 Hz, whereas the chemical shifts of the DNA-bound states were randomly chosen from a Gaussian distribution with a mean of 200 Hz and a standard deviation of 100 Hz.51 It should be noted that the apparent Γ2 rates are virtually independent of chemical shifts unless an unrealistically wide variation is used for the chemical shifts. The intrinsic transverse 1H relaxation rates for free and DNA-bound states were set to 30 s−1 and 50 s−1, respectively, according to experimental data of the DNA complex and the free HoxD9 homeodomain.51 In Eqs. 9 and 10, Ta = 4.6 ms and Tb = 18.6 ms were used based on the experimental conditions.26

Results and Discussion

To examine whether the kinetic model of protein translocation can explain the observed PRE data, we conducted McConnell equation-based simulations of the apparent PRE rates Γ2 for individual residues of the HoxD9 homeodomain bound to DNA. The exchange system was defined with 41 states, including 40 DNA-bound states (i.e., 1 specific and 39 nonspecific complexes; Figure 2b) and a free state. It was assumed that the nonspecific and specific complexes share the same structural features in terms of the binding mode.

Simulations assuming uniform binding distribution for nonspecific sites

We first conducted the simulations under the assumption that the protein is uniformly distributed among all nonspecific sites with an equal affinity. We calculated the equilibrium populations of the individual states by solving the following simultaneous equations:

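$$K_{d,s} = \frac{[T]_{eq}[P]_{eq}}{[PT]_{eq}} \qquad (16)$$
$$K_{d,n} = \frac{[D]_{eq}[P]_{eq}}{[PD]_{eq}} \qquad (17)$$
$$P_{tot} = [P]_{eq} + [PT]_{eq} + (2N-1)[PD]_{eq} \qquad (18)$$
$$D_{tot} = [T]_{eq} + [PT]_{eq} \qquad (19)$$
$$D_{tot} = [D]_{eq} + [PD]_{eq} \qquad (20)$$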

Kd,s is the intrinsic dissociation constant for the specific site (i.e., target). Kd,n is the intrinsic dissociation constant for each nonspecific site. Ptot and Dtot are the total concentrations of protein and DNA, respectively. [P]eq and [T]eq are the equilibrium concentrations of the free protein and the free target, respectively. [D]eq is the equilibrium concentration of each nonspecific site in the free state. [PT]eq and (2N–1)[PD]eq are the equilibrium concentrations of the target-bound protein and the proteins that are nonspecifically bound to DNA, respectively. Using Eqs. 16–20, we calculated the equilibrium concentrations [P]eq, [T]eq, [D]eq, [PT]eq, and [PD]eq. Next, we constructed the vector m(0) for Eqs. 9 and 10 using the equilibrium populations of the free protein ([P]eq/Ptot), the target-bound protein ([PT]eq/Ptot), and each of the 39 states of the nonspecifically bound protein ([PD]eq/Ptot).
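Eqs. 16–20 can be reduced to a single equation in [P]eq, because Eqs. 16 and 19 give [PT]eq = Dtot[P]eq/(Kd,s + [P]eq) and Eqs. 17 and 20 give the analogous expression for [PD]eq. The Python sketch below, which is illustrative and not the code used in this study, solves the mass balance of Eq. 18 by bisection; the function name and the choice of bisection are assumptions made for this example. The input values (400 μM protein, 600 μM DNA, Kd,s = 30 nM, Kd,n = 6000 nM, N = 20) are those stated in the Computation section.

```python
def equilibrium_populations(p_tot, d_tot, kd_s, kd_n, n_sites):
    """Solve Eqs. 16-20 for the free-protein concentration by bisection.
    All concentrations and dissociation constants share one unit (uM here)."""
    n_nonspecific = 2 * n_sites - 1

    def bound(p_free):
        pt = d_tot * p_free / (kd_s + p_free)    # from Eqs. 16 and 19
        pd = d_tot * p_free / (kd_n + p_free)    # from Eqs. 17 and 20
        return pt, pd

    low, high = 0.0, p_tot
    for _ in range(200):                         # total protein rises monotonically with [P]
        mid = 0.5 * (low + high)
        pt, pd = bound(mid)
        if mid + pt + n_nonspecific * pd > p_tot:   # Eq. 18
            high = mid
        else:
            low = mid
    p_free = 0.5 * (low + high)
    pt, pd = bound(p_free)
    return {
        "free": p_free / p_tot,
        "target": pt / p_tot,
        "nonspecific_each": pd / p_tot,
        "nonspecific_total": n_nonspecific * pd / p_tot,
    }


pops = equilibrium_populations(p_tot=400.0, d_tot=600.0, kd_s=0.030, kd_n=6.0, n_sites=20)
print(pops)
```

For these inputs the target-bound and nonspecifically bound fractions come out close to the 73% and 27% reported for the uniform-distribution simulations, with a negligible free fraction.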

The apparent PRE rates were simulated at 20, 100, and 160 mM NaCl. Figure 3a shows the simulation results for the system with a uniform distribution for nonspecific protein binding. Under the simulation conditions, 73% of the protein was bound to the target and 27% was nonspecifically bound to the 24-bp DNA duplex. The populations of the protein bound to individual nonspecific sites were assumed to be identical. The blue solid line in Figure 3a shows the intrinsic Γ2 rates calculated from the structure of the specific complex alone. The apparent Γ2 rates at 20 mM NaCl calculated using the McConnell equation for the exchange system (black circles) were close to these intrinsic Γ2 rates for the target-bound state. Thus, as observed in the experiment (Figure 1a), the simulated apparent Γ2 rates at 20 mM NaCl were consistent with the structural model of the specific complex with the target (Figure 3b).

Figure 3.


McConnell equation-based simulations of apparent Γ2 rates for 1HN nuclei of the HoxD9 homeodomain in the presence of 24-bp DNA containing a target site. The kinetic model involving the 41 states (Figure 2) was used for the simulations. The results shown are from the simulations under the assumption that the HoxD9 homeodomain is uniformly distributed among nonspecific sites in the target search process. (a) Simulated apparent Γ2 rates for 1HN nuclei at 20, 100, and 160 mM NaCl shown in black, green, and red, respectively. Asterisks represent apparent Γ2 rates larger than 85 s−1. Eqs. 14 and 15 were used to account for the salt dependence of the dissociation constants and the rate constants (see the main text). The solid blue lines represent the intrinsic Γ2 rates calculated for the structural model of the specific complex with the target. (b, c) Mapping of the simulated apparent Γ2 rates on the structure in the same way as Figure 1a. Data at 20 mM NaCl and 160 mM NaCl are shown in Panels b and c, respectively.

When the salt concentration was raised in the simulation, the apparent Γ2 rates for some residues deviated substantially from the intrinsic Γ2 rates and increased drastically. This salt-dependent increase is qualitatively consistent with the experimental observations. The increase appears to be mainly due to intersegment transfer and sliding, because removing the matrices Kit and Ksl from the matrix K in Eq. 3 eliminated the increase in Γ2 rates.

Comparison of Figure 3 with Figure 1a reveals significant differences between the simulated data and the experimental data at high ionic strengths. For example, the experimental data showed that residues 26-30 exhibited a large increase in apparent Γ2 rates upon the increase in NaCl concentration, whereas the simulations showed only a marginal increase in the apparent Γ2 rates for these residues. Conversely, the experimental data showed only a marginal increase in the apparent Γ2 rates for residues 55-60, while the simulation data showed a large increase. These discrepancies suggest that the probability distribution of the HoxD9 homeodomain at nonspecific sites is nonuniform in the target search process.

Simulations with nonuniform binding distribution for nonspecific sites

To account for nonuniform distribution of the protein that is nonspecifically bound to DNA, we performed another set of simulations using the same kinetic model. Before these simulations, we optimized the populations at individual nonspecific sites, as previously done for the PRE data on the nonspecific DNA-protein complexes of the HMGB1 A-box domain.27 In this approach, we used the experimental PRE data arising from two different EDTA-Mn2+ groups conjugated to two sites on the opposite ends of the 24-bp DNA (i.e., sites 1 and 4 in Figure 2 of Ref. 26) for the HoxD9 homeodomain at 160 mM NaCl. Assuming that the apparent Γ2 rates were virtually equal to the population-weighted average of the intrinsic Γ2 rates of different states at 160 mM NaCl, we optimized the populations (pj) through constrained linear least-squares fitting to minimize the following function:27

$$E = \sum_n \left(\Gamma_{2,obs,n} - \sum_j p_j \Gamma_{2,j,n}\right)^2 \qquad (21)$$

in which Γ2,obs,n is an observed PRE rate and Γ2,j,n represents the intrinsic Γ2 rate for the nth nucleus calculated from the structural model of the jth state. The sum of the populations pj was constrained to 1. The population of the target-bound protein was constrained to [PT]eq/Ptot, whereas the populations of nonspecifically bound proteins were constrained to a range between 10^−4 and 1 − [PT]eq/Ptot. Data of 98 Γ2 rates (including those for sites 1 and 4) were used for the minimization. The number of fitting parameters was 38 in this minimization because the populations of the free and target-bound states were kept fixed, and the sum of populations for all 41 states was constrained to 1. Therefore, the number of degrees of freedom was 60 (= 98 – 38) for this fitting. Using Eq. 21, the residual sum of squares for the nonuniform distribution model (Enon) and that for the uniform distribution model (Euni) were assessed for an F-test.74 F = (60/38)(Euni − Enon)/Enon was calculated to be 551, which was far greater than the critical value (1.96) at the 1% significance level of the F distribution with (38, 60) degrees of freedom. These results suggest that the nonuniform distribution model is better than the uniform distribution model.
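The objective function of Eq. 21 and the F statistic can be written compactly, as in the following Python sketch. It is an illustration with made-up toy numbers, not the data or code of this study; the function names are hypothetical, and the constrained least-squares optimization itself is not reproduced. Only the numbers of fitted parameters (38) and degrees of freedom (60) are taken from the text.

```python
def residual_sum_of_squares(observed, intrinsic, populations):
    """Eq. 21: observed[n] is the measured rate of nucleus n and
    intrinsic[j][n] is the intrinsic rate of nucleus n in state j."""
    total = 0.0
    for n, gamma_obs in enumerate(observed):
        predicted = sum(p * intrinsic[j][n] for j, p in enumerate(populations))
        total += (gamma_obs - predicted) ** 2
    return total


def f_statistic(e_uni, e_non, n_params=38, dof=60):
    """F = (dof / n_params) * (E_uni - E_non) / E_non."""
    return (dof / n_params) * (e_uni - e_non) / e_non


# Toy example with two states and two nuclei (hypothetical values in s^-1).
observed = [10.0, 20.0]
intrinsic = [[0.0, 40.0], [20.0, 0.0]]
print(residual_sum_of_squares(observed, intrinsic, [0.5, 0.5]))   # perfect fit
print(residual_sum_of_squares(observed, intrinsic, [0.9, 0.1]))   # poorer fit
print(f_statistic(e_uni=3.0, e_non=1.0))                          # toy residuals
```

A large F value indicates that the reduction in the residual sum of squares achieved by the extra population parameters exceeds what would be expected by chance.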

The use of a nonuniform distribution in the simulations also requires that the kinetic matrices Ksl, Kda, and Kit be set carefully so that they do not violate the principle of detailed balance. Following the population optimizations, Eqs. 1 and 2 were used to meet this requirement for the nonspecific and target sites. Using the updated kinetic matrices and equilibrium populations, we simulated the apparent PRE Γ2 rates using the McConnell equation. Figure 4a shows the simulation results. Compared to the case with the uniform distribution, the profiles of the simulated apparent Γ2 rates at 100 and 160 mM NaCl were more consistent with the experimental data. The improvement due to the nonuniform distribution was particularly obvious for residues 26-30 and 55-60, as seen by comparing Figures 3 and 4 with Figure 1a. These data suggest that there are some preferential sites for nonspecific binding of the protein during the target search process. This is a new insight from our current model.

Figure 4.


Results from the simulations with nonuniform distribution of the protein in the target search process. The populations of individual states for the nonspecifically bound protein were optimized as described in the main text. The results are shown in the same manner as in Figure 3.

It is not surprising that the distribution of proteins bound to nonspecific sites is nonuniform. In fact, in the literature on theoretical investigation of target DNA search by proteins, it is typically assumed that the free-energy landscape for proteins that are nonspecifically bound to DNA depends on the sequence.45,75,76 If the dependence of binding free energy on DNA sequence is known from experimental data such as ΔΔG values77–79 or position weight matrices (PWMs),80–82 the equilibrium populations at individual nonspecific sites can be predicted from the sequence. Variation in nucleotide sequence among nonspecific sites can explain the nonuniform distribution of the protein during the search process.

Limitations of the discrete-state model

Although our current model can reproduce some key features of experimental PRE data at different salt concentrations, some limitations of this model should be noted. In our current model, protein translocation on DNA is described in a “discrete” manner with all bound states sharing the same structural properties. The structural models of individual nonspecific complexes shown in Figure 2b are based on the same structural model for the homeodomain-target complex. However, the assumption of the same structure at every nonspecific site is merely for the sake of convenience and may be unrealistic. Proteins may switch between different binding modes at nonspecific sites on DNA.40,45–47 The remaining discrepancies between the observed and simulated PRE data may arise from the presence of states that cannot be represented by any of the structure models used in our current simulations. Nonetheless, since our current simulations reproduced the experimental observations, it seems likely that the nonspecific complexes during the target search process are structurally similar to the specific complex.

Conclusions

Through the use of the McConnell equation incorporating a realistic kinetic model for protein translocation on DNA, we have analyzed the previously published PRE data on the HoxD9 homeodomain bound to 24-bp DNA containing a target sequence. This model accounts for protein translocation through sliding, dissociation/re-association, and intersegment transfer. The model successfully reproduced the experimental PRE data and confirmed the previous interpretation of the salt-dependent PRE data regarding the target DNA search process. Before reaching the target, the HoxD9 homeodomain nonspecifically binds to various DNA sites through interactions that are structurally similar to those in the specific complex. Both the intersegment transfer and sliding processes appear to make significant contributions to the target search process influencing the PRE data at high ionic strength. The probability distribution of the protein among the nonspecific sites is nonuniform.

Acknowledgments

This work was supported by Grant R35-GM130326 from the National Institutes of Health (to J.I.) and Grant H-2104-20220331 from the Welch Foundation (to J.I.). The current paper was submitted to celebrate the Murray Goodman Memorial Prize to Dr. Marius Clore, who has inspired us for many years. One of us (J.I.) worked in the Clore laboratory for 5 years from 2002 until 2007. The experimental data used in this paper were published in 2006 as a product of the research training in the Clore laboratory.26 We would like to dedicate this paper to Dr. Marius Clore.

Footnotes

Conflict of Interest Statement

The authors declare no conflicts of interest.

Data Availability Statement

All computational data are shown in the figures. The MATLAB scripts used to produce these data are available upon request to the corresponding author.

References
