Bioinformatics. 2012 Aug 31;28(22):2971–2978. doi: 10.1093/bioinformatics/bts537

Biologistics—Diffusion coefficients for complete proteome of Escherichia coli

Tomasz Kalwarczyk 1, Marcin Tabaka 1, Robert Holyst 1,*
PMCID: PMC3496334  PMID: 22942021

Abstract

Motivation: Biologistics provides data for quantitative analysis of transport (diffusion) processes and their spatio-temporal correlations in cells. Mobility of proteins is one of the few parameters necessary to describe reaction rates for gene regulation. Although understanding of diffusion-limited biochemical reactions in vivo requires mobility data for the largest possible number of proteins in their native forms, currently, there is no database that would contain the complete information about the diffusion coefficients (DCs) of proteins in a given cell type.

Results: We demonstrate a method for the determination of in vivo DCs for any molecule, regardless of its molecular weight, size and structure, in any type of cell. We exemplify the method with the database of in vivo DCs for all proteins (4302 records) from the proteome of the K12 strain of Escherichia coli, together with examples of DCs of amino acids, sugars, RNA and DNA. The database follows from the scale-dependent viscosity reference curve (sdVRC). Construction of the sdVRC for a prokaryotic or eukaryotic cell requires ~20 in vivo measurements using techniques such as fluorescence correlation spectroscopy (FCS), fluorescence recovery after photobleaching (FRAP), nuclear magnetic resonance (NMR) or particle tracking. The shape of the sdVRC would be different for each organism, but the mathematical form of the curve remains the same. The presented method has a high predictive power, as measurements of the DCs of several inert, properly chosen probes in a single cell type allow one to determine the DCs of thousands of proteins. Additionally, the obtained mobility data allow quantitative study of biochemical interactions in vivo.

Contact: rholyst@ichf.edu.pl

Supplementary information: Supplementary data are available at Bioinformatics Online.

1 INTRODUCTION

Biologistics and biochemistry in a crowded environment are two emerging interdisciplinary fields of science. They provide quantitative analysis of transport of proteins and their spatio-temporal correlations involved in gene expression and regulation. According to the current state-of-the-art theory of gene expression (activation or repression) in bacteria (Elf et al., 2007; Li et al., 2009), mobility of proteins is one of the few parameters necessary to describe reaction rates of gene regulation. The mobility is understood as a three-dimensional diffusion or one-dimensional sliding along DNA (for prokaryotes and eukaryotes), or by velocity of molecular motors (in eukaryotic cells). Understanding of diffusion-limited biochemical reactions requires accurate in vivo mobility data for the largest possible number of proteins in their native forms. The three-dimensional diffusion of different types of macromolecules in the cytoplasm of Escherichia coli has been experimentally studied in several cases (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; Derman et al., 2008; Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007), but experimental determination of the mobility of all proteins is technically an impossible task because of their large number in a given cell. For example, the proteome of the K12 strain of E. coli (Blattner et al., 1997) contains more than 4300 proteins. Moreover, most of the recent studies concern measurements mainly performed with the use of green fluorescent protein (GFP) (Elowitz et al., 1999; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007) or GFP fusion proteins (Jennifer et al., 2001).

Attempts to study the diffusion of many proteins simultaneously, under conditions resembling the interior of the cells, were performed in silico by McGuffee and Elcock (2010). Computational methods, however, have limitations arising from the speed and capacity of computing hardware and small number of interacting proteins in the system (~50 different types of proteins) (McGuffee and Elcock, 2010). An alternative approach is the quantitative analysis of available literature data. Mika and Poolman (2011) gathered literature data of diffusion coefficients (DCs) of ~20 different types of proteins in E. coli and proposed a power law dependence of the DC on the molecular weight of proteins. This power law, however (Mika and Poolman, 2011), can be applied only for the proteins in a narrow range of molecular weights, i.e. between 20 and 30 kDa.

In this work, we present a method for predictions of the DCs of proteins for the proteome of any cell. We collected all available literature data (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; Derman et al., 2008; Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007) on diffusion of various probes, including small molecules (water, glucose), proteins and plasmids, in the cytoplasm of E. coli. We used those data and the scaling function of viscosity (Hołyst et al., 2009; Kalwarczyk et al., 2011; Szymański et al., 2006a, b) to predict the mobility of macromolecules in the bacterial cytoplasm. We also predicted the DCs of amino acids, sugars, proteins and DNA. We created a unique database, including the DCs of all proteins of strain K12 of E. coli (4302 proteins), their oligomers and their potential complexes with translocation proteins; 6600 records in total.

2 METHODS

2.1 A brief description of the method

Our predictions of DCs of proteins in the bacterial cytoplasm are based on experimental data on diffusion in the cytoplasm of E. coli available in the literature (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; Derman et al., 2008; Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007). The method relies on the dependence $D_0/D_{cyto} = \eta/\eta_0$, where $D_0$ is the DC of the macromolecule in water of viscosity $\eta_0$, and $D_{cyto}$ is the DC of the macromolecule in the cytoplasm. $\eta$ is the effective viscosity experienced by the macromolecule during diffusion in the cytoplasm. The protocol for determining DCs is graphically represented in Figure 1.

Fig. 1.

Diagram of a method of predicting the DC of any molecule in the cell cytoplasm. To predict the DCs of molecules in the cytoplasm, it is essential to correctly select the probes that will be used to determine the reference curve. Next, one needs to measure the DCs of the selected probes in water (buffer) $D_0$ and the DCs in the cytoplasm of the studied cell $D_{cyto}$. Using $D_0$ and $D_{cyto}$, we create the sdVRC. To predict the DC of a given molecule, it is necessary to know its hydrodynamic radius $r_p$ or $D_0$. Although the sdVRC depends on both $r_p$ and $D_0$, in practice, both parameters can be calculated knowing only one of them. Finally, by substituting the values of $r_p$ and $D_0$ into the sdVRC, the DC in the cytoplasm $D_{cyto}$ can be determined.

2.2 Calculation of hydrodynamic radii and DCs in water

Hydrodynamic radius of proteins was determined using the following formula (Dill et al., 2011):

$$ r_p = 0.0514\,M_w^{0.392}\ \mathrm{nm} \tag{1} $$

while for RNA we used Equation (2) (Werner, 2011).

$$ r_p = 0.0566\,M_w^{0.38}\ \mathrm{nm} \tag{2} $$

Dependence of the hydrodynamic radii of linear, circular or super coiled DNA on molecular weight [Equations (3)–(5), respectively] was obtained from DCs of DNA constructs (Robertson et al., 2006) using Equation (6).

$$ r_p = 0.024\,M_w^{0.57}\ \mathrm{nm} \tag{3} $$
$$ r_p = 0.0125\,M_w^{0.59}\ \mathrm{nm} \tag{4} $$
$$ r_p = 0.0145\,M_w^{0.57}\ \mathrm{nm} \tag{5} $$
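The power laws of Equations (1)–(5) share the single form $r_p = C M_w^{\alpha}$, so they can be evaluated with one small helper. The following is a minimal Python sketch, not the authors' code: the (C, α) pairs are those listed in the caption of Figure 4, the names `POWER_LAWS` and `hydrodynamic_radius_nm` are illustrative, and expressing $M_w$ in Da is an assumption (chosen because it reproduces the 2.8 nm radius listed for the 27 kDa GFP in Table 1).

```python
# Coefficients (C, alpha) of r_p = C * M_w**alpha [nm], as listed in the Figure 4 caption.
# Assumption: M_w is expressed in Da (g/mol).
POWER_LAWS = {
    "protein": (0.0514, 0.392),         # Equation (1)
    "rna": (0.0566, 0.38),              # Equation (2)
    "dna_linear": (0.024, 0.57),        # Equation (3)
    "dna_circular": (0.0125, 0.59),     # Equation (4)
    "dna_supercoiled": (0.0145, 0.57),  # Equation (5)
}


def hydrodynamic_radius_nm(mw_da: float, kind: str = "protein") -> float:
    """Return the hydrodynamic radius (nm) of a molecule of weight mw_da (Da)."""
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    c, alpha = POWER_LAWS[kind]
    return c * mw_da ** alpha


if __name__ == "__main__":
    # GFP (27 kDa) and the 6000 kDa mRNA construct from Table 1
    print(round(hydrodynamic_radius_nm(27_000), 1))
    print(round(hydrodynamic_radius_nm(6_000_000, "rna"), 1))
```

Keeping the coefficients in one table makes it easy to switch between molecule classes without duplicating the formula.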

Radii of amino acids and sugars have been calculated, assuming that the hydrodynamic radius $r_p$ corresponds to the van der Waals radius Inline graphic calculated according to the procedure described elsewhere (Zhao et al., 2003).

For each probe, we use the literature values of $D_{cyto}$, while the values of $D_0$ (if not available) were calculated using the Stokes–Sutherland–Einstein equation [Equation (6)].

$$ D_0 = \frac{k_B T}{6\pi\eta_0 r_p} \tag{6} $$
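The Stokes–Sutherland–Einstein relation, $D_0 = k_B T / (6\pi\eta_0 r_p)$, is straightforward to evaluate numerically. The sketch below is illustrative only: the function name is hypothetical, and the default water viscosity of about 6.9e-4 Pa·s near 310 K is an assumed approximate value, not a number taken from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_einstein_um2_per_s(r_p_nm: float,
                              temperature_k: float = 310.0,
                              eta0_pa_s: float = 6.9e-4) -> float:
    """D_0 = k_B*T / (6*pi*eta_0*r_p), returned in um^2/s.

    eta0_pa_s defaults to an approximate viscosity of water near 310 K
    (an assumption of this sketch).
    """
    if r_p_nm <= 0:
        raise ValueError("radius must be positive")
    r_m = r_p_nm * 1e-9                       # nm -> m
    d_m2_s = K_B * temperature_k / (6.0 * math.pi * eta0_pa_s * r_m)
    return d_m2_s * 1e12                      # m^2/s -> um^2/s


if __name__ == "__main__":
    # DC in water for a probe with the 2.8 nm radius listed for GFP in Table 1
    print(stokes_einstein_um2_per_s(2.8))
```

Because $D_0$ is inversely proportional to $r_p$, doubling the radius halves the DC in water.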

2.3 Calculation of DCs of various molecules in the cytoplasm of E. coli

Using the molecular weights from the Uniprot protein database (Apweiler et al., 2011; Jain et al., 2009), we calculated the DCs for the complete proteome of E. coli (K12 strain). We identified the cellular localization of each protein as well as its quaternary structure (a single polypeptide chain or multiple chain aggregates or complexes). In the case of membrane or periplasmic proteins, we adopted the assumption that, after synthesis, the proteins diffuse via the cytoplasm to their target in the membrane, through one of two transport pathways [twin-arginine translocation (TAT) or the general secretion system (Sec)] (Driessen and Nouwen, 2008; Sargent, 2007). Consequently, these proteins were considered as single polypeptide chains (the TAT pathway) or protein complexes with SecB or Tig proteins (the Sec pathway). The hydrodynamic radius of proteins was determined using Equation (1). When the protein was composed of several subunits, the molecular weights of all polypeptide chains comprising the protein were added together. On the basis of the cumulative molecular weight of the complex, the hydrodynamic radius of the protein $r_p$ and then its DC in water $D_0$ were calculated [Equations (1) and (6)]. Then, using Equation (7), we calculated the relative DCs for all analysed proteins, and from them the DCs of proteins in the cytoplasm $D_{cyto}$. The calculated DCs of all proteins in the cytoplasm are summarized in Supplementary Table S1.
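The chain of calculations described above (summed molecular weight, then radius, then DC in water, then DC in the cytoplasm) can be sketched in Python as follows. Several things here are assumptions rather than statements from the text: $M_w$ is taken in Da; the water viscosity is an approximate value near 310 K; Equation (7) is assumed to have the form $\ln(D_0/D_{cyto}) = (\xi^2/R_h^2 + \xi^2/r_p^2)^{-a/2}$; and the parameter values in the demo (`xi_nm=0.5`, `r_h_nm=40.0`, `a=0.5`) are placeholders for illustration, not the fitted values. All function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def protein_radius_nm(mw_da):
    """Equation (1) with the coefficients from the Figure 4 caption; M_w in Da (assumption)."""
    return 0.0514 * mw_da ** 0.392


def d_water_um2_s(r_p_nm, temperature_k=310.0, eta0_pa_s=6.9e-4):
    """Equation (6); eta0_pa_s is an assumed approximate water viscosity near 310 K."""
    return K_B * temperature_k / (6 * math.pi * eta0_pa_s * r_p_nm * 1e-9) * 1e12


def log_viscosity_ratio(r_p_nm, xi_nm, r_h_nm, a):
    """Assumed form of Equation (7): ln(D0/Dcyto) = (xi^2/R_h^2 + xi^2/r_p^2)^(-a/2)."""
    return ((xi_nm / r_h_nm) ** 2 + (xi_nm / r_p_nm) ** 2) ** (-a / 2)


def d_cyto_complex(subunit_mw_da, xi_nm, r_h_nm, a):
    """Cytoplasmic DC (um^2/s) of a complex given the weights (Da) of its chains."""
    mw_total = sum(subunit_mw_da)            # cumulative molecular weight
    r_p = protein_radius_nm(mw_total)        # radius of the whole complex
    d0 = d_water_um2_s(r_p)                  # DC in water
    return d0 * math.exp(-log_viscosity_ratio(r_p, xi_nm, r_h_nm, a))


if __name__ == "__main__":
    # Hypothetical 38 kDa chain, as a monomer and as a homotetramer;
    # the sdVRC parameters below are placeholders, not fitted values.
    params = dict(xi_nm=0.5, r_h_nm=40.0, a=0.5)
    print(d_cyto_complex([38_000], **params))
    print(d_cyto_complex([38_000] * 4, **params))
```

Passing the chain weights as a list keeps monomers, oligomers and complexes with translocation proteins on the same code path.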

3 RESULTS AND DISCUSSION

3.1 Construction of the scale-dependent viscosity reference curve

We collected the literature data (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007) for DCs of different solutes and macromolecules in the cytoplasm of E. coli (Fig. 2 and Table 1). We used the least squares method to fit those data with Equation (7) (Kalwarczyk et al., 2011).

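$$ \ln\left(\frac{D_0}{D_{cyto}}\right) = \ln\left(\frac{\eta}{\eta_0}\right) = \left(\frac{\xi^2}{R_h^2} + \frac{\xi^2}{r_p^2}\right)^{-a/2} \tag{7} $$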

Here, $r_p$ is the hydrodynamic radius of the probe, and $\xi$ and $R_h$ are length scales characterizing the cytoplasm. $\xi$ (an average distance between surfaces of proteins), $R_h$ (average hydrodynamic radius of the biggest crowders) and $a$ (a constant of the order of one) are the fitting parameters whose values for the cytoplasm of E. coli are as follows: Inline graphic nm, Inline graphic nm and Inline graphic. From the scale-dependent viscosity reference curve (sdVRC), we directly determined the macroscopic viscosity $\eta_{macro}$ of the cytoplasm. We found that Inline graphic (26 000 times greater than the viscosity of water, Inline graphic at 310 K). $R_h$ is comparable to the radius of the loops (Kim et al., 2004) of DNA covered with proteins. The second length scale determined from the sdVRC, $\xi$, is comparable to the average distance between surfaces of proteins. $R_h$ determines the length scale above which the viscosity ceases to depend on the size of the probe and reaches the macroscopic value. For a probe smaller than $\xi$, the experienced viscosity has a value comparable to the viscosity of water.
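To illustrate what a least-squares fit of the reference curve involves, the sketch below minimizes the sum of squared residuals over a coarse parameter grid. It is only an illustration under stated assumptions: the brute-force grid search stands in for a proper nonlinear least-squares optimizer, the model is the assumed form $\ln(D_0/D_{cyto}) = (\xi^2/R_h^2 + \xi^2/r_p^2)^{-a/2}$, the data are a subset of the rows of Table 1 (with the fourth column read as $\ln(D_0/D_{cyto})$), and the grid bounds are arbitrary. The result is therefore not expected to reproduce the published parameters.

```python
import itertools

# (r_p in nm, ln(D0/Dcyto)) for a subset of the probes listed in Table 1
DATA = [
    (0.16, 0.1), (0.53, 2.1), (2.8, 2.4), (3.3, 2.8), (4.1, 3.2),
    (5.3, 2.8), (9.4, 3.5), (16.6, 6.0), (21.3, 6.2), (203.9, 10.1),
]


def model(r_p, xi, r_h, a):
    """Assumed form of Equation (7)."""
    return ((xi / r_h) ** 2 + (xi / r_p) ** 2) ** (-a / 2)


def sse(params, data=DATA):
    """Sum of squared residuals for params = (xi, r_h, a)."""
    xi, r_h, a = params
    return sum((model(r, xi, r_h, a) - y) ** 2 for r, y in data)


def frange(start, stop, step):
    """Inclusive list of evenly spaced floats."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def grid_fit(data=DATA):
    """Return the grid point (xi, r_h, a) with the smallest sum of squares."""
    grid = itertools.product(
        frange(0.1, 1.5, 0.1),     # xi, nm (arbitrary search range)
        frange(10.0, 100.0, 5.0),  # R_h, nm (arbitrary search range)
        frange(0.3, 1.0, 0.05),    # a (arbitrary search range)
    )
    return min(grid, key=lambda p: sse(p, data))


if __name__ == "__main__":
    best = grid_fit()
    print(best, sse(best))
```

A gradient-based optimizer would refine the parameters beyond the grid resolution and could also supply uncertainty estimates.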

Fig. 2.

The sdVRC. The logarithm of viscosity $\eta$ divided by the viscosity of water $\eta_0$ [$\ln(\eta/\eta_0)$] as a function of the hydrodynamic radius $r_p$ of various probes (Table 1) of radii from 0.16 nm to 203 nm (closed square). The cytoplasmic DCs $D_{cyto}$ of probes were taken from the literature (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007) (cf. Table 1). By fitting the data with Equation (7) (solid line), we determined two length scales: Inline graphic nm and Inline graphic nm. We also determined the macroscopic viscosity of the cytoplasm Inline graphic, i.e. 26 000 times higher than the viscosity of water Inline graphic at 310 K. Shading represents the maximum error of fitting.

Table 1.

Data used in the construction of sdVRC—cf. Figure 2

Probe                    M_w (kDa)   r_p (nm)   ln(D_0/D_cyto)   Reference
Water                    0.018       0.16       0.1              Jasnin et al. (2008)
Glucose                  0.423       0.53       2.1              Mika et al. (2010)
mEos2                    26          2.8        2.1              English et al. (2011)
EYFP                     27          2.8        2.4              Kumar et al. (2010)
GFP                      27          2.8        2.4              Elowitz et al. (1999)
GFP                      27          2.8        3.2              Elowitz et al. (1999)
GFP                      27          2.8        2.2              van den Bogaart et al. (2007)
GFP                      27          2.8        2.6              Slade et al. (2009)
GFP2                     27          2.8        2.3              Nenninger et al. (2010)
GFP                      27          2.8        3.2              Mika et al. (2010)
GFP                      27          2.8        2.7              Konopka et al. (2006)
GFP-His6                 28          2.8        3.1              Elowitz et al. (1999)
torA-GFP                 30          2.9        2.5              Mullineaux et al. (2006)
CheY-GFP                 41          3.3        2.8              Cluzel et al. (2000)
NlpA-GFP                 55          3.7        3.4              Nenninger et al. (2010)
NlpAInline graphic-GFP   55          3.7        3.2              Nenninger et al. (2010)
torA-GFP2                57          3.8        2.2              Nenninger et al. (2010)
torA-GFP2                57          3.8        2.1              Nenninger et al. (2010)
AmiA-GFP                 58          3.8        3.6              Nenninger et al. (2010)
AmiA-GFP                 58          3.8        3.6              Nenninger et al. (2010)
AmiAInline graphic-GFP   58          3.8        2.2              Nenninger et al. (2010)
CFP-CheW-YFP             71          4.1        3.5              Kumar et al. (2010)
cMBP-GFP                 72          4.1        3.2              Elowitz et al. (1999)
torA-GFP3                84          4.4        2.2              Nenninger et al. (2010)
CFP-CheR-YFP             86          4.4        3.3              Kumar et al. (2010)
torA-GFP4                111         4.9        2.2              Nenninger et al. (2010)
torA-GFP5                138         5.3        2.8              Nenninger et al. (2010)
(β-Gal-GFP)4             582         9.4        3.5              Mika et al. (2010)
Ribosome 70S             2,500       16.6       6.0              Bakshi et al. (2012)
mRNA-GFP                 6,000       21.3       6.2              Golding and Cox (2004)
Plasmid-GFP              18,480      203.9      10.1             Campbell and Mullins (2007)

We used the sdVRC obtained in this way [Equation (7)] as a tool for predicting the DCs of all known proteins of the K12 strain (Blattner et al., 1997) of E. coli as well as other molecules and macromolecules.

3.2 Interpretation of sdVRC

For more than a decade, diffusion of various proteins in the cytoplasm of E. coli has been studied (Table 1) (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007). Those experimental data show that the DCs depend exponentially on the size of the diffusing molecule. For example, GFP with a molecular weight $M_w$ = 27 kDa and hydrodynamic radius $r_p$ = 2.8 nm (Table 1) is characterized by cytoplasmic DC (Elowitz et al., 1999) Inline graphic. On the other hand, the DC of a large oligomeric protein consisting of four subunits of GFP-tagged β-galactosidase, (β-gal-GFP)4, of radius almost three times greater than GFP ($M_w$ = 582 kDa, $r_p$ = 9.4 nm; Table 1), is equal to Inline graphic (Mika et al., 2010). The above differences are explained in terms of scale-dependent viscosity (Kalwarczyk et al., 2011) experienced by the diffusing molecule [cf. sdVRC, Equation (7)]. Equation (7) is an empirical equation primarily found for synthetic systems such as polymer or micellar solutions (Hołyst et al., 2009; Kalwarczyk et al., 2011; Szymański et al., 2006a, b). Interpretation of the four parameters in Equation (7) ($r_p$, $\xi$, $R_h$ and $a$) is taken from those studies (Hołyst et al., 2009; Kalwarczyk et al., 2011; Szymański et al., 2006a, b). In synthetic systems, $\xi$ is the average distance between macromolecular components of the complex liquid and $R_h$ is equal to the hydrodynamic radius of a polymer random coil or of a micelle. In the sdVRC, both $\xi$ and $R_h$ determine the viscosity experienced by a probe diffusing in the investigated liquid. For $r_p$ larger than $R_h$, the probe experiences the macroscopic viscosity $\eta_{macro}$. A probe of radius $r_p$ smaller than $\xi$ moving in the liquid experiences the viscosity of the solvent $\eta_0$. On the other hand, a probe of $r_p > \xi$ will experience a viscosity higher than the viscosity of the solvent. Finally, the effective viscosity $\eta$ experienced by a probe of radius between $\xi$ and $R_h$ ($\xi < r_p < R_h$) depends exponentially on $r_p$.

In the case of the cytoplasm of mammalian cells, $R_h$ corresponds to the hydrodynamic radius of the filaments forming the cellular cytoskeleton in the volume of the cytoplasm (Kalwarczyk et al., 2011). The bacterial cytoskeleton (Shih and Rothfield, 2006), however, is located directly next to the inner membrane (Pogliano, 2008). We can therefore assume that it should not have a large contribution to the viscosity experienced by the proteins diffusing across the cytoplasm. This assumption is also supported by the value of Inline graphic nm determined from fitting, which is similar to the radius of the objects identified as fragments of the bacterial nucleoid (around 40 nm) (Kim et al., 2004), i.e. loops of DNA covered with structural proteins. This value can be compared with the value of the hydrodynamic radius of the filaments forming the bacterial cytoskeleton (Hou et al., 2012; Pogliano, 2008) (fragments of length L = 100 nm and a radius r = 2.5 nm), which is ~17 nm (Vandesande and Persoons, 1985), well below $R_h$ obtained from the fit. Therefore, the length scale $R_h$ is correlated neither with the hydrodynamic radius of the filaments nor with the proteins, whose highest hydrodynamic radius is about 10 nm. $\xi$ in the cytoplasm of E. coli equals Inline graphic nm and is comparable with the average distance between proteins. Parameters of the sdVRC ($\xi$ and $R_h$) depend on the internal structure of the cytoplasm (protein density, size of the nucleoid, etc.). Thus, each cell type will be characterized by a different shape of the reference curve (due to differences in parameters $\xi$ and $R_h$), while the mathematical form of the sdVRC will not change, and such a curve can be constructed for other cell types.
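The limiting regimes described in this section can be written out explicitly. The following is a short worked sketch, assuming Equation (7) has the form $\ln(\eta/\eta_0) = (\xi^2/R_h^2 + \xi^2/r_p^2)^{-a/2}$, with the symbols as defined in the text:

```latex
\ln\frac{\eta}{\eta_0}
  = \left(\frac{\xi^{2}}{R_h^{2}} + \frac{\xi^{2}}{r_p^{2}}\right)^{-a/2}
  \approx
  \begin{cases}
    \left(\dfrac{r_p}{\xi}\right)^{a}, & r_p \ll R_h, \\[2ex]
    \left(\dfrac{R_h}{\xi}\right)^{a} = \ln\dfrac{\eta_{macro}}{\eta_0}, & r_p \gg R_h .
  \end{cases}
```

Under this assumed form, the viscosity grows as a stretched exponential of $r_p/\xi$ for small and intermediate probes and saturates once $r_p$ exceeds $R_h$. The quoted macroscopic viscosity of 26 000 times that of water would then correspond to $(R_h/\xi)^a = \ln 26\,000 \approx 10.2$.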

3.3 Other models of diffusion in the cytoplasm

We compared our results with three models of diffusion in the cytoplasm of E. coli, available in the literature (Figures 3 and 4). McGuffee and Elcock (2010) proposed two models of diffusion in the cytoplasm: the ‘steric’ model, which takes into account only steric interactions between diffusing proteins, and the ‘full’ model, which includes steric, electrostatic and hydrodynamic interactions between diffusing entities. Comparison of the results (Figure 3) shows that the model we propose takes into account possible interactions between the diffusing probes and the surrounding environment. Moreover, we show that the full information needed to build the sdVRC can be obtained only after taking into account the probes whose Inline graphic greatly exceeds Inline graphic. For example, simulations conducted by McGuffee and Elcock (2010) include proteins that are most abundant in the cytoplasm, but the absence of large objects such as the nucleoid leads to underestimated values of Inline graphic. The effect starts to be meaningful for probes whose Inline graphic nm. In that case, the values of Inline graphic are lower by an order of magnitude with respect to experimental results.

Fig. 3.

The comparison of sdVRC with other existing models. The plot shows the literature values for the logarithm of Inline graphic (open squares) (Bakshi et al., 2012; Campbell and Mullins, 2007; Cluzel et al., 2000; Elowitz et al., 1999; English et al., 2011; Golding and Cox, 2004; Jasnin et al., 2008; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009; van den Bogaart et al., 2007). Black solid line represents Equation (7) with parameters: Inline graphic nm, Inline graphic nm and Inline graphic. We compared our results with data generated by McGuffee and Elcock (2010) and Mika and Poolman (2011). The data generated by McGuffee and Elcock (2010) were fitted by Equation (7), yielding the following parameters: for the ‘full’ model Inline graphic nm, Inline graphic nm and Inline graphic (dotted circle, dotted line), for the ‘steric’ model Inline graphic nm, $R_h$ = 17 ± 6 nm and Inline graphic (open diamond, dashed line). The model proposed by Mika and Poolman (2011), where Inline graphic, is plotted as a dashed–dotted line.

Fig. 4.

Comparison of measured and predicted $D_{cyto}$ as a function of molecular weight of the investigated probes. Predicted dependencies shown in the graph are expressed by Equation (7). The hydrodynamic radius $r_p$ of each type of macromolecule is given by the relationship $r_p = C M_w^{\alpha}$ nm, where $M_w$ is the molecular weight of the macromolecule. For proteins, C = 0.0514 and α = 0.392 [Equation (1)]; for RNA, C = 0.0566 and α = 0.38 [Equation (2)]; for linear DNA, C = 0.024 and α = 0.57 [Equation (3)]; for circular DNA, C = 0.0125 and α = 0.59 [Equation (4)]; for super coiled DNA, C = 0.0145 and α = 0.57 [Equation (5)]. For comparison, we present experimental data on DCs of proteins (Cluzel et al., 2000; Elowitz et al., 1999; English et al., 2011; Konopka et al., 2006; Kumar et al., 2010; Mika et al., 2010; Mullineaux et al., 2006; Nenninger et al., 2010; Slade et al., 2009), RNA (Golding and Cox, 2004), plasmid (Campbell and Mullins, 2007) and ribosomes 30S and 70S (Bakshi et al., 2012). The dashed–dotted straight line indicates the relationship Inline graphic proposed by Mika and Poolman (2011). The dependence of $D_{cyto}$ on $M_w$ proposed by Mika and Poolman (2011), when applied to large plasmids (Inline graphic kDa), yields an overestimation of the DC by several orders of magnitude.

We also compared our results with the model proposed by Mika and Poolman (2011), where Inline graphic. As can be seen, the power law dependence of $D_{cyto}$ on $M_w$ may also lead to underestimated values of $D_{cyto}$. For example, for the ribosome 70S, $D_{cyto}$ measured experimentally is five times higher than predicted using the power law dependence. Therefore, the power law dependence proposed by Mika and Poolman (2011) holds for proteins in a small range of molecular weights, 20–30 kDa, and, moreover, is not applicable to macromolecules other than proteins. This is because each type of macromolecule (DNA, RNA, proteins, polymers, etc.) has a different shape and thus a different dependence of $r_p$ on $M_w$ [Equations (1)–(5)]. The shape of the macromolecule, and in consequence its radius, translates into the DC. The dependence of the DC $D_{cyto}$ of different types of macromolecules (proteins, RNA and DNA) on their molecular weight is shown in Figure 4.

3.4 Accuracy of the model

Accuracy in determination of the sdVRC strongly depends on the amount of available data. One would expect that increasing the amount of data for probes of Inline graphic and Inline graphic, would significantly decrease the maximum error of the sdVRC (compare Fig. 2).

To test the accuracy of the presented method, we performed an analysis of the error of calculation of the DC $D_{cyto}$ for GFP as a function of the number of experimental data points. Using Equation (7), we generated 10 datasets, where the number of data points ranges from 6 to 100. The generated data were uniformly distributed on a logarithmic scale and were randomly drawn on the assumption that the measurement error is described by a normal distribution with standard deviation Inline graphic. We assumed that the error of Inline graphic equals 5%. We found that 20 data points are sufficient to obtain Inline graphic at the level of 20% for GFP (averaged over 10 generated datasets). In comparison, Inline graphic obtained from the analysis of the literature data was at the level of 40% (cf. Fig. 2). This is mainly because of the small number of available experimental data. Furthermore, most of the experimental data are available for a narrow range of hydrodynamic radii (around 3 nm, cf. Fig. 2), which is not preferred in this type of analysis. To date, however, there are no experimental data that would improve the accuracy of the sdVRC. Therefore, to improve the accuracy, additional experiments are needed to cover a wider range of Inline graphic of the probes, and the uncertainties of Inline graphic should be minimized.
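The data-generation step of such an accuracy test can be sketched as follows. This is an illustration under explicit assumptions, not the authors' procedure: radii are drawn uniformly on a logarithmic scale between arbitrary bounds, the noiseless value is the assumed form $\ln(D_0/D_{cyto}) = (\xi^2/R_h^2 + \xi^2/r_p^2)^{-a/2}$, Gaussian noise with a 5% relative standard deviation is applied to that value, and the parameter values in the demo are placeholders. The subsequent refitting of each dataset is omitted.

```python
import math
import random


def model(r_p, xi, r_h, a):
    """Assumed form of Equation (7): returns ln(D0/Dcyto)."""
    return ((xi / r_h) ** 2 + (xi / r_p) ** 2) ** (-a / 2)


def synthetic_dataset(n_points, xi, r_h, a,
                      r_min=0.1, r_max=200.0, rel_sigma=0.05, rng=None):
    """Return n_points noisy (r_p, ln(D0/Dcyto)) pairs.

    Radii are log-uniform in [r_min, r_max] nm; noise is Gaussian with a
    standard deviation of rel_sigma times the noiseless value (assumptions).
    """
    rng = rng or random.Random(0)
    data = []
    for _ in range(n_points):
        r_p = math.exp(rng.uniform(math.log(r_min), math.log(r_max)))
        y = model(r_p, xi, r_h, a)
        data.append((r_p, rng.gauss(y, rel_sigma * y)))
    return data


if __name__ == "__main__":
    # Placeholder parameters, not the fitted values
    for r_p, y in synthetic_dataset(6, xi=0.5, r_h=40.0, a=0.5):
        print(f"{r_p:8.3f} nm  {y:6.3f}")
```

Passing an explicit `random.Random` instance makes each generated dataset reproducible, which helps when averaging errors over several datasets.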

3.5 DCs of proteins

When preparing a database of DCs of the entire proteome, one should keep in mind that about 45% of the proteome consists of proteins forming larger macromolecular complexes (homo-, hetero-oligomers and complexes of membrane proteins with translocation proteins). Thus, the calculation of DCs of proteins should also be carried out for protein complexes. The Uniprot protein database (Apweiler et al., 2011; Jain et al., 2009) contains information on the molecular weight of proteins, their quaternary structure and their location in the cell. Using these data and the sdVRC (cf. Fig. 2), we calculated the DCs $D_{cyto}$ of all proteins in E. coli (Supplementary Table S1) present in the cytoplasm as monomers (single polypeptide chains), as multimers (homo- or hetero-oligomers) or as complexes composed of many chains (see Fig. 5). Figure 5A shows the histogram of molecular weights of cytoplasmic proteins, including homo- and hetero-multimers. The distribution of molecular weights of proteins is given by a log-normal distribution with probability density function Inline graphicInline graphic, where standard deviation Inline graphic and mean molecular weight Inline graphic kDa. The relationship between the DC and the molecular weight of a protein is expressed by Equations (1) and (7). A histogram of DCs of cytoplasmic proteins is shown in Figure 5B (same proteins as in Fig. 5A). The distribution follows the curve given by the probability density function: Inline graphic.

Fig. 5.

Distributions of molecular weights and DCs of cytoplasmic proteins in E. coli. (A) Histogram of molecular weights of cytoplasmic proteins (created using data from the Uniprot database). The histogram is described by log-normal distribution Inline graphic with standard deviation Inline graphic and the mean molecular weight Inline graphic kDa. (B) Histogram of DCs of cytoplasmic proteins (from our database) and the probability density function Inline graphic (solid line).

We also calculated $D_{cyto}$ of membrane proteins, which make up ~30% of the proteome of E. coli. Membrane proteins, after synthesis by the ribosome, are transported to the membrane by one of two pathways: the TAT pathway (Sargent, 2007), in which proteins are transported as single polypeptides in a folded state, and the Sec pathway (Driessen and Nouwen, 2008), in which unfolded proteins are complexed mainly by one of two proteins, SecB or Tig.

We created a database (Supplementary Table S1) listing the DCs of all proteins, including their monomeric forms, the possible homo- and hetero-multimers, and in the case of membrane proteins also the complexes with translocation proteins (SecB and Tig). Apart from DCs of proteins, we calculated $D_{cyto}$ of small molecules such as amino acids or sugars and of macromolecules such as RNA or DNA (linear, circular and super coiled). Calculated values of DCs are listed in Table 2.

Table 2.

Predicted, cytoplasmic DCs of small amino acids, sugars, selected proteins and ribosomes and DNA constructs

Molecule          r_p (nm)   D_cyto (µm²/s)
Guanine           0.29       539
Histidine         0.32       478
Galactose         0.33       458
Arginine          0.34       428
Lactose           0.41       328
ATP               0.43       302
TrpR–Monomer      2.1        19.71
TrpR–Dimer        2.7        10.92
LacI–Monomer      3.2        7.28
LacI–Tetramer     5.6        1.79
RNAP Holoenzyme   8.5        0.5
Ribosome 30S      11.6       0.18
Ribosome 50S      13.2       0.11
Ribosome 70S      16.6       0.05
Pyes2             142a       1.13Inline graphic
CTD-2657L24       802b       1.62Inline graphic

aHydrodynamic radius calculated using Equation (3). bHydrodynamic radius calculated using Equation (5).

The predicted DCs refer only to three-dimensional diffusion. In cells, particularly eukaryotes, there are also other types of transport such as molecular motors (Vale, 2003). Nevertheless, mobility, irrespective of the type of motion, is inversely proportional to the viscosity of the surrounding environment. Since the viscosity is dependent on the scale (Hołyst et al., 2009; Kalwarczyk et al., 2011; Szymański et al., 2006a, b), each type of motion will depend exponentially [Equation (7)] on the size of a moving object.

3.6 Application of DC database in studies of biochemical processes occurring in cells

Using the database of DCs, one can determine quantitatively whether the protein diffuses freely or interacts and forms complexes with much larger macromolecules, e.g. plasmids. Capoulade et al. (2011) performed diffusion measurements and showed that, in the nucleus of eukaryotic cell, euchromatin creates domains of high and low affinity for heterochromatin protein (HP1α).

Another kind of analysis was performed by Elf et al. (2007). The authors compared the in vivo DCs of the lactose repressor in its native form and of the lactose repressor devoid of the DNA-binding domain. An order-of-magnitude difference between the DCs of the two proteins led to the conclusion that the native lactose repressor spends 87% of the time attached to the DNA. This shows that attractive interactions between diffusing particles slow down the diffusion of molecules.

To clarify the method, consider a hypothetical protein of hydrodynamic radius Inline graphic nm. The DC of this protein Inline graphic (calculated from the sdVRC) is approximately equal to Inline graphic. The time required by the protein to visit every place in the cell volume [for E. coli V Inline graphic (Kubitschek, 1990)] is approximately equal to Inline graphicInline graphic. Now suppose that the protein binds to a plasmid whose molecular weight equals 10 000 kDa; the DC of the plasmid is of the order of Inline graphic Inline graphic. Suppose further that the protein spends one-tenth of the time diffusing freely Inline graphic, and the remaining 90% of the time Inline graphic as a complex with the plasmid (Inline graphic). The effective DC of the complexes Inline graphic, defined as Inline graphic, and under the assumption that Inline graphic, will be nearly an order of magnitude lower than the predicted one (Inline graphic): Inline graphic. According to the above analysis, we can assume that any deviation of an experimentally measured DC from the proposed sdVRC will result from intermolecular interactions such as specific or non-specific binding.
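The arithmetic behind this argument can be sketched in a few lines of Python. The sketch assumes that the effective DC is the time-weighted average of the free and bound DCs, which is consistent with the "one-tenth of the time" reasoning above but is an assumption of this example; the function name and the numerical values are hypothetical.

```python
def effective_dc(d_free: float, d_bound: float, fraction_free: float) -> float:
    """Time-weighted average DC of a molecule that is free for fraction_free
    of the time and bound (diffusing with d_bound) for the rest.

    The averaging rule is an assumption of this sketch.
    """
    if not 0.0 <= fraction_free <= 1.0:
        raise ValueError("fraction_free must lie between 0 and 1")
    return fraction_free * d_free + (1.0 - fraction_free) * d_bound


if __name__ == "__main__":
    # Hypothetical values in um^2/s: a freely diffusing protein and a much
    # slower plasmid-bound complex, with the protein free 10% of the time.
    d_free, d_bound = 2.0, 0.002
    d_eff = effective_dc(d_free, d_bound, 0.1)
    print(d_eff, d_eff / d_free)
```

When the bound complex diffuses much more slowly than the free protein, the effective DC is close to the free DC multiplied by the fraction of time spent unbound.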

3.7 Diffusion in the cytoplasm and the diffusion in organelles of eukaryotes

Prokaryotic cells are characterized by small sizes [volume of E. coli is approximately V Inline graphic (Kubitschek, 1990)]. Measurements of diffusion in the cytoplasm of these cells are performed for the entire volume of the cytoplasm. Thereby, the effective DC measured in these experiments is the value averaged over the entire volume of the cytoplasm. Because the sdVRC was found on the basis of DCs, in the case of E. coli, this curve is also averaged over the entire volume of the cell. At this point, it should be stressed that the sdVRC should not be used to describe diffusion on the cell membrane due to structural differences between membrane and cytoplasm, and the two-dimensional nature of such diffusion.

The small size of prokaryotic cells also affects the long-time behaviour of diffusing objects, a phenomenon known as confined diffusion (Ochab-Marcinek and Holyst, 2011). Nevertheless, useful conclusions can be drawn from the normal, three-dimensional DCs (short-time diffusion). For example, English et al. (2011) characterized the catalytic cycle of the RelA protein on the basis of short-time diffusion measurements.
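
The distinction between short-time and confined diffusion can be made concrete with a rough estimate. The sketch below assumes the standard relation for free three-dimensional diffusion (mean square displacement equal to 6Dt) and estimates the time after which the displacement becomes comparable to the size of the confining volume. The cell dimension and the DC used here are hypothetical round numbers chosen only for illustration.

```python
def msd_free_3d(dc, t):
    """Mean square displacement for free 3D diffusion: <r^2> = 6 D t."""
    return 6.0 * dc * t


def crossover_time(dc, confinement_length):
    """Time at which the free-diffusion MSD reaches confinement_length**2.

    Beyond this time the confining walls can no longer be ignored and the
    simple 6 D t law stops describing the motion.
    """
    if dc <= 0 or confinement_length <= 0:
        raise ValueError("dc and confinement_length must be positive")
    return confinement_length ** 2 / (6.0 * dc)


if __name__ == "__main__":
    dc = 10.0       # hypothetical DC, um^2/s
    length = 1.0    # hypothetical confinement length, um
    t_c = crossover_time(dc, length)
    print(f"crossover time: {t_c * 1e3:.1f} ms")
    print(f"MSD at crossover: {msd_free_3d(dc, t_c):.2f} um^2")
```

Measurements on time scales well below this crossover probe the normal three-dimensional DC, which is the quantity described by the sdVRC.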

Eukaryotic cells are much larger than bacteria. Therefore, measurements of diffusion in these cells are easier and can be performed in individual organelles [e.g. the nucleus (Pederson, 2000)]. In previous work, we showed that it is possible to construct a reference curve for the cytoplasm of mammalian HeLa and Swiss 3T3 cells (Kalwarczyk et al., 2011). However, based on a comparison of the results obtained by Lukacs et al. (2000) for the cytoplasm and the nucleus of HeLa cancer cells, we expect the sdVRC determined for each cellular organelle to be different. Furthermore, as the sdVRC depends on the structure of the environment in which diffusion occurs, it should be unique for a given cell or even organelle.

4 CONCLUSION

The method presented above has a high predictive power. Although the error of the method is so far large (40% for proteins), the experimentally measured DCs coincide remarkably well with the predicted DCs (cf. Fig. 4). Therefore, measurements of the DCs of several inert probes in a single cell type allow one to determine the DCs of thousands of proteins and other (macro)molecules. A correctly designed experiment would require different experimental techniques (NMR, FRAP, FCS, particle tracking) and probes covering a wide range of sizes. One needs to know the DC of a given probe in water and/or the hydrodynamic radius of this probe. Additionally, the diffusion of the same probe in the cytoplasm of the cell should be measured. The sizes of the selected probes should be uniformly distributed along the logarithmic scale of sizes. We showed that only 20 measurements are required to predict the cytoplasmic DC of the typical protein with 20% accuracy.
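
The design rule for probe sizes can be illustrated with a short sketch. It generates probe radii that are uniformly spaced on a logarithmic scale and, for each radius, estimates the DC in water from the Stokes-Sutherland-Einstein relation. The size range (0.1 to 100 nm), the temperature and the function names are assumptions made for illustration; the viscosity is the usual value for water at 25 °C.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def log_spaced_radii(r_min_nm, r_max_nm, n):
    """Return n radii (nm) uniformly distributed on a logarithmic scale."""
    if n < 2 or r_min_nm <= 0 or r_max_nm <= r_min_nm:
        raise ValueError("need n >= 2 and 0 < r_min_nm < r_max_nm")
    ratio = r_max_nm / r_min_nm
    return [r_min_nm * ratio ** (i / (n - 1)) for i in range(n)]


def dc_in_water(radius_nm, temperature_k=298.15, viscosity_pa_s=0.89e-3):
    """Stokes-Sutherland-Einstein DC of a sphere in water, in um^2/s."""
    radius_m = radius_nm * 1e-9
    d_m2_s = K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)
    return d_m2_s * 1e12  # convert m^2/s to um^2/s


if __name__ == "__main__":
    # Hypothetical design: 20 probes between 0.1 nm and 100 nm.
    for r in log_spaced_radii(0.1, 100.0, 20):
        print(f"r = {r:8.3f} nm   D_water = {dc_in_water(r):10.2f} um^2/s")
```

Each probe in such a list would then be measured in the cytoplasm, and the ratio of the DC in water to the DC in the cytoplasm would provide one point of the reference curve.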

Analysis of the sdVRC allows one to determine the characteristic length scales Inline graphic and Inline graphic, and the DC of any (macro)molecule in the cytoplasm. For the cytoplasm of E. coli, we found that Inline graphic is surprisingly well correlated with the average radius of the DNA loops forming the nucleoid. This suggests that the nucleoid is the main crowding agent (responsible for the macroscopic viscosity) in the cytoplasm of E. coli.

Finally, it should be noted that there are no additional requirements (except experimental data) for constructing an analogous database of DCs in other systems such as the nucleus or mitochondria of eukaryotic cells. We also believe that the sdVRC can be easily adapted to calculate other types of mobility, including one-dimensional sliding, velocity of molecular motors, etc., as they are all inversely proportional to the viscosity.

Supplementary Material

Supplementary Data

ACKNOWLEDGEMENTS

The authors would like to thank Prof. Marcin Fialkowski for inspiring discussions. R.H. conceived the study; R.H. directed the project with input from T.K. and M.T.; T.K. performed the data analysis and processing with input from R.H. and M.T.; T.K. and R.H. wrote the manuscript.

Funding: T.K. thanks the National Science Center for funding the project from the funds granted on the basis of the decision number: DEC1-2011/01/N/ST3/00865, and Foundation for Polish Science for START scholarship. M.T. thanks the Ministry of Science of Poland for support within the Iuventus-Plus program IP2010 052570 (2011). R.H. thanks the National Science Center for funding the project from the funds granted on the basis of the decision number: 2011/02/A/ST3/00143 (Maestro grant).

Conflict of Interest: none declared.

REFERENCES

  1. Apweiler R, et al. Ongoing and future developments at the universal protein resource. Nucleic Acids Res. 2011;39:D214–D219. doi: 10.1093/nar/gkq1020.
  2. Bakshi S, et al. Superresolution imaging of ribosomes and RNA polymerase in live Escherichia coli cells. Mol. Microbiol. 2012;85:21–38. doi: 10.1111/j.1365-2958.2012.08081.x.
  3. Blattner F, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997;277:1453–1462. doi: 10.1126/science.277.5331.1453.
  4. Campbell CS, Mullins RD. In vivo visualization of type II plasmid segregation: bacterial actin filaments pushing plasmids. J. Cell Biol. 2007;179:1059–1066. doi: 10.1083/jcb.200708206.
  5. Capoulade J, et al. Quantitative fluorescence imaging of protein diffusion and interaction in living cells. Nat. Biotechnol. 2011;29:835–842. doi: 10.1038/nbt.1928.
  6. Cluzel P, et al. An ultrasensitive bacterial motor revealed by monitoring signaling proteins in single cells. Science. 2000;287:1652–1655. doi: 10.1126/science.287.5458.1652.
  7. Derman AI, et al. Intracellular mobility of plasmid DNA is limited by the ParA family of partitioning systems. Mol. Microbiol. 2008;67:935–946. doi: 10.1111/j.1365-2958.2007.06066.x.
  8. Dill KA, et al. Physical limits of cells and proteomes. Proc. Natl. Acad. Sci. USA. 2011;108:17876–17882. doi: 10.1073/pnas.1114477108.
  9. Driessen AJM, Nouwen N. Protein translocation across the bacterial cytoplasmic membrane. Annu. Rev. Biochem. 2008;77:643–667. doi: 10.1146/annurev.biochem.77.061606.160747.
  10. Elf J, et al. Probing transcription factor dynamics at the single-molecule level in a living cell. Science. 2007;316:1191–1194. doi: 10.1126/science.1141967.
  11. Elowitz M, et al. Protein mobility in the cytoplasm of Escherichia coli. J. Bacteriol. 1999;181:197–203. doi: 10.1128/jb.181.1.197-203.1999.
  12. English BP, et al. Single-molecule investigations of the stringent response machinery in living bacterial cells. Proc. Natl. Acad. Sci. USA. 2011;108:E365–E373. doi: 10.1073/pnas.1102255108.
  13. Golding I, Cox E. RNA dynamics in live Escherichia coli cells. Proc. Natl. Acad. Sci. USA. 2004;101:11310–11315. doi: 10.1073/pnas.0404443101.
  14. Hołyst R, et al. Scaling form of viscosity at all length-scales in poly(ethylene glycol) solutions studied by fluorescence correlation spectroscopy and capillary electrophoresis. Phys. Chem. Chem. Phys. 2009;11:9025–9032. doi: 10.1039/b908386c.
  15. Hou S, et al. Characterization of Caulobacter crescentus FtsZ protein using dynamic light scattering. J. Biol. Chem. 2012;287:23878–23886. doi: 10.1074/jbc.M111.309492.
  16. Jain E, et al. Infrastructure for the life sciences: design and implementation of the UniProt website. BMC Bioinformatics. 2009;10:136. doi: 10.1186/1471-2105-10-136.
  17. Jasnin M, et al. Down to atomic-scale intracellular water dynamics. EMBO Rep. 2008;9:543–547. doi: 10.1038/embor.2008.50.
  18. Jennifer L-S, et al. Studying protein dynamics in living cells. Nat. Rev. Mol. Cell Biol. 2001;2:444–456. doi: 10.1038/35073068.
  19. Kalwarczyk T, et al. Comparative analysis of viscosity of complex liquids and cytoplasm of mammalian cells at the nanoscale. Nano Lett. 2011;11:2157–2163. doi: 10.1021/nl2008218.
  20. Kim J, et al. Fundamental structural units of the Escherichia coli nucleoid revealed by atomic force microscopy. Nucleic Acids Res. 2004;32:1982–1992. doi: 10.1093/nar/gkh512.
  21. Konopka MC, et al. Crowding and confinement effects on protein diffusion in vivo. J. Bacteriol. 2006;188:6115–6123. doi: 10.1128/JB.01982-05.
  22. Kubitschek H. Cell-volume increase in Escherichia coli after shifts to richer media. J. Bacteriol. 1990;172:94–101. doi: 10.1128/jb.172.1.94-101.1990.
  23. Kumar M, et al. Mobility of cytoplasmic, membrane, and DNA-binding proteins in Escherichia coli. Biophys. J. 2010;98:552–559. doi: 10.1016/j.bpj.2009.11.002.
  24. Li G-W, et al. Effects of macromolecular crowding and DNA looping on gene regulation kinetics. Nat. Phys. 2009;5:294–297.
  25. Lukacs G, et al. Size-dependent DNA mobility in cytoplasm and nucleus. J. Biol. Chem. 2000;275:1625–1629. doi: 10.1074/jbc.275.3.1625.
  26. McGuffee SR, Elcock AH. Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLoS Comput. Biol. 2010;6:e1000694. doi: 10.1371/journal.pcbi.1000694.
  27. Mika JT, Poolman B. Macromolecule diffusion and confinement in prokaryotic cells. Curr. Opin. Biotechnol. 2011;22:117–126. doi: 10.1016/j.copbio.2010.09.009.
  28. Mika JT, et al. Molecular sieving properties of the cytoplasm of Escherichia coli and consequences of osmotic stress. Mol. Microbiol. 2010;77:200–207. doi: 10.1111/j.1365-2958.2010.07201.x.
  29. Mullineaux C, et al. Diffusion of green fluorescent protein in three cell environments in Escherichia coli. J. Bacteriol. 2006;188:3442–3448. doi: 10.1128/JB.188.10.3442-3448.2006.
  30. Nenninger A, et al. Size dependence of protein diffusion in the cytoplasm of Escherichia coli. J. Bacteriol. 2010;192:4535–4540. doi: 10.1128/JB.00284-10.
  31. Ochab-Marcinek A, Holyst R. Scale-dependent diffusion of spheres in solutions of flexible and rigid polymers: mean square displacement and autocorrelation function for FCS and DLS measurements. Soft Matter. 2011;7:7366–7374.
  32. Pederson T. Diffusional protein transport within the nucleus: a message in the medium. Nat. Cell Biol. 2000;2:E73–E74. doi: 10.1038/35010501.
  33. Pogliano J. The bacterial cytoskeleton. Curr. Opin. Cell Biol. 2008;20:19–27. doi: 10.1016/j.ceb.2007.12.006.
  34. Robertson RM, et al. Diffusion of isolated DNA molecules: dependence on length and topology. Proc. Natl. Acad. Sci. USA. 2006;103:7310–7314. doi: 10.1073/pnas.0601903103.
  35. Sargent F. The twin-arginine transport system: moving folded proteins across membranes. Biochem. Soc. Trans. 2007;35(Part 5):835–847. doi: 10.1042/BST0350835. (Focus Topic at Life Sciences 2007 Conference, Glasgow, Scotland, July 9–12, 2007)
  36. Shih Y-L, Rothfield L. The bacterial cytoskeleton. Microbiol. Mol. Biol. Rev. 2006;70:729–754. doi: 10.1128/MMBR.00017-06.
  37. Slade KM, et al. Quantifying green fluorescent protein diffusion in Escherichia coli by using continuous photobleaching with evanescent illumination. J. Phys. Chem. B. 2009;113:4837–4845. doi: 10.1021/jp810642d.
  38. Szymański J, et al. Movement of proteins in an environment crowded by surfactant micelles: anomalous versus normal diffusion. J. Phys. Chem. B. 2006a;110:7367–7373. doi: 10.1021/jp055626w.
  39. Szymański J, et al. Diffusion and viscosity in a crowded environment: from nano- to macroscale. J. Phys. Chem. B. 2006b;110:25593–25597. doi: 10.1021/jp0666784.
  40. Vale R. The molecular motor toolbox for intracellular transport. Cell. 2003;112:467–480. doi: 10.1016/s0092-8674(03)00111-9.
  41. van den Bogaart G, et al. Protein mobility and diffusive barriers in Escherichia coli: consequences of osmotic stress. Mol. Microbiol. 2007;64:858–871. doi: 10.1111/j.1365-2958.2007.05705.x.
  42. Vandesande W, Persoons A. The size and shape of macromolecular structures—determination of the radius, the length, and the persistence length of rodlike micelles of dodecyldimethylammonium chloride and bromide. J. Phys. Chem. 1985;89:404–406.
  43. Werner A. Predicting translational diffusion of evolutionary conserved RNA structures by the nucleotide number. Nucleic Acids Res. 2011;39:e17. doi: 10.1093/nar/gkq808.
  44. Zhao YH, et al. Fast calculation of van der Waals volume as a sum of atomic and bond contributions and its application to drug compounds. J. Org. Chem. 2003;68:7368–7373. doi: 10.1021/jo034808o.

Supplementary Materials

Supplementary Data
supp_bts537_Table_S1.pdf (789.9KB, pdf)

Articles from Bioinformatics are provided here courtesy of Oxford University Press
