Biophysical Journal. 2016 Jun 7;110(11):2320–2327. doi: 10.1016/j.bpj.2016.04.020

Contact Statistics Highlight Distinct Organizing Principles of Proteins and RNA

Lei Liu, Changbong Hyeon
PMCID: PMC4906362  PMID: 27276250

Abstract

Although both RNA and proteins have densely packed native structures, the chain organizations of these two biopolymers are fundamentally different. Motivated by recent discoveries in chromatin folding that interphase chromosomes have a territorial organization with signatures pointing to metastability, we analyzed the biomolecular structures deposited in the Protein Data Bank and found that the intrachain contact probability $P(s)$, as a function of the arc length $s$, decays as a power law, $P(s)\sim s^{-\gamma}$, over the intermediate range of $s$, 10 ≲ s ≲ 110. We found that the contact probability scaling exponent is γ ≈ 1.11 for large RNA (N > 110), γ ≈ 1.41 for small RNA (N < 110), and γ ≈ 1.65 for proteins. Given that Gaussian statistics are expected for a fully equilibrated chain in polymer melts, the deviation of γ from 1.5 for the subchains of large RNA in the native state suggests that the chain configuration of RNA is not fully equilibrated. Visually, the folded structures of large RNA (N ≳ 110) adopt crumpled structures, partitioned into modular multidomains assembled from proximal sequences along the chain, whereas the polypeptide chain of folded proteins appears better mixed with the rest of the structure. Our finding of γ ≈ 1 for large RNA might be an ineluctable consequence of the hierarchical ordering of secondary to tertiary elements in the folding process.

Introduction

RNA and proteins, under appropriate environmental conditions, adopt compact three-dimensional (3D) native folds that are essential for a variety of biological functions. Despite general similarities in their folding principles, namely that both biopolymers are made of sequences that fold into a functionally competent structure as an outcome of evolutionary selection (1, 2, 3, 4, 5), the overall shape of native RNA differs from that of proteins in several respects. Proteins are in general more compact, globular, and flexible than RNA (6). Such differences may originate from the distinct nature of the building blocks. The energy scale of the binary interactions that pair nucleotides is typically greater than that of the interactions between amino acids. Furthermore, the requirement of charge neutralization (or screening) along the backbone differentiates the foci of RNA dynamics, especially at the early stage of folding (7), from those of proteins.

As highlighted by recent studies of chromatin folding using fluorescence in situ hybridization (8, 9) and chromosome conformation capture techniques (10, 11, 12), human chromosomes in interphase have a territorial organization (9), and each chromosome is further partitioned into a number of topologically associated domains (TADs), possibly mediated by proteins such as CTCF and cohesin (13). The contact probability P(s) of two loci separated by the genomic distance s can provide glimpses into the arrangement of the chromatin chain. From the polymer perspective, a test chain in a fully equilibrated homogeneous polymer melt is expected to obey Gaussian statistics because of the screening of excluded volume interactions (14), thus satisfying $P(s)\sim s^{-3/2}$. It was, however, shown that P(s) of human chromatin in the cell nucleus displays $P(s)\sim s^{-1.08}$ at genomic scales of 1 Mb < s < 10 Mb (11). To account for the origins of the human genome organization, its characteristic scaling exponent γ = 1.08, and the contact-map patterns demonstrating TADs, several models have been put forward, including the crumpled (fractal) globule (11, 15, 16), the random loop model (17), the strings and binders switch model (18), and confinement-induced glassy dynamics (19).

Besides the overall shape, the chain organizations of the native folds of RNA and proteins are in general visually different from each other. Compared with proteins, in which α-helices, β-strands, and loops thread through one another to form a native structure, a folded RNA with large N looks more crumpled: a number of secondary structure elements (helices, bulges, loops) form independently stable modular contact domains, which are further assembled into a compact 3D structure. Here, borrowing several statistical measures that have been used to study genome/chromosome organization inside the cell nucleus, we substantiate the fundamental differences between the chain organizations of RNA and proteins in their native states and discuss their significance in connection with their folding mechanisms.

Materials and Methods

Calculation of contact probability and extraction of scaling exponent

Using atomic coordinates of RNA and proteins from the Protein Data Bank (PDB), we consider two residues i and j to be in contact if the minimum distance between any two heavy atoms of these residues, located at $\mathbf{r}_i$ and $\mathbf{r}_j$, is smaller than a cutoff distance $d_c$ (= 4 Å). The contact probability for a biomolecule α with chain length $N_\alpha$ (the number of residues) is thus determined by calculating

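$$P_\alpha(s)=\frac{\sum_{i<j}^{N_\alpha}\delta(|i-j|-s)\,\Theta\!\left(d_c-\min|\mathbf{r}_i-\mathbf{r}_j|\right)}{\sum_{i<j}^{N_\alpha}\delta(|i-j|-s)}, \tag{1}$$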

where Θ(x) = 1 for x ≥ 0; otherwise, Θ(x) = 0. Two examples of P(s) are given in Fig. 1, B and C. The power-law relation $P(s)\sim s^{-\gamma}$ is observed over the intermediate scale. We determined the value of γ by fitting P(s) over the range $s_{\min}<s<s_{\max}$. The details of the fitting procedure are discussed in the Supporting Material.
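The contact definition and Eq. 1 can be illustrated with a short Python sketch. It assumes, hypothetically, that each residue has already been parsed into a NumPy array of heavy-atom coordinates in Å (PDB parsing is not shown), and it follows the Θ convention above, so a minimum distance exactly equal to $d_c$ counts as a contact. The function names are illustrative, not the authors' code.

```python
import numpy as np


def contact_matrix(residues, d_c=4.0):
    """Symmetric boolean N x N contact matrix.

    residues: list of (n_atoms, 3) arrays of heavy-atom coordinates (Angstrom),
    one array per residue (hypothetical input format).
    Residues i and j are in contact when Theta(d_c - min|r_i - r_j|) = 1,
    i.e. when their minimum heavy-atom distance is <= d_c.
    """
    n = len(residues)
    C = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            # all atom-atom difference vectors between residue i and residue j
            diff = residues[i][:, None, :] - residues[j][None, :, :]
            d_min = np.sqrt((diff ** 2).sum(axis=-1)).min()
            C[i, j] = C[j, i] = d_min <= d_c
    return C


def contact_probability(C):
    """Eq. 1: P(s) = fraction of residue pairs (i, i + s) that are in contact.

    Returns an array P with P[s - 1] = P(s) for s = 1, ..., N - 1.
    """
    n = C.shape[0]
    return np.array([np.diagonal(C, offset=s).mean() for s in range(1, n)])
```

For example, three single-atom residues placed 3.8 Å apart on a line give P(1) = 1 and P(2) = 0. The double loop is O(N²) in residue pairs, which is adequate for a sketch; a production version would prefilter pairs with a neighbor search.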

Figure 1.


Contact probability scaling exponent, γ, and chain configurations of RNA and proteins. (A) γ versus N obtained for 60 individual RNA (data in red) and 324 proteins (data in cyan), whose γ-values are obtained from power-law fits to P(s) with correlation coefficient (c.c.) > 0.9 (see Fig. S4 for a γ versus N plot with error bars, and Tables S1 and S2 in the Supporting Material for the PDB entries used here). The data points for FhuA and GPCRs are included for further discussion, although their c.c. < 0.9. Histograms of γ, p(γ), for RNA (γ ≈ 1.30 ± 0.44) and proteins (γ = 1.65 ± 0.50) are shown on the right. (B) Representative structures of RNA in a rainbow coloring scheme from 5′ (blue) to 3′ (red), indexed with the numbers in the γ versus N plot. Depicted are the structures of 1) a large subunit of rRNA (PDB: 2O45, γ = 1.11) (66); 2) a small subunit of rRNA (PDB: 2YKR, γ = 1.28) (67); 3) Twort group I ribozyme (PDB: 1Y0Q, γ = 1.24) (68); 4) A-type ribonuclease P (PDB: 1U9S, γ = 0.85) (69); 5) TPP-riboswitch (PDB: 3D2G, γ = 1.29) (70); and 6) tRNA (PDB: 1VTQ, γ = 2.18). P(s) for the large subunit of rRNA, from which γ is obtained, is shown in the left corner. The scaling exponent γ of $P(s)\sim s^{-\gamma}$ is obtained from the fit (dashed line) to the data points in green; the data in gray are excluded from the fit (see the Supporting Material for details of the fitting procedure). (C) Protein structures in a rainbow coloring scheme from the N- (blue) to the C-terminus (red). Depicted in (C) are the structures of 1) FhuA (PDB: 1QJQ, γ = 1.49) (71); 2) an actin monomer (PDB: 1J6Z, γ = 1.56) (72); 3) metacaspase (PDB: 4AF8, γ = 1.64) (73); 4) green fluorescent protein (PDB: 1EMA, γ = 1.45) (74); 5) T4 lysozyme (PDB: 2LZM, γ = 1.68) (32, 75); and 6) chondroitin sulfate ABC lyase I (PDB: 1HN0, γ = 0.73) (25). P(s) for FhuA is shown in the left corner. (D) The mean contact probabilities, $\bar{P}(s)$, calculated over the RNA and protein structures in the PDB.

Mean contact probability

Each structure in the PDB has a different chain size $N_\alpha$ ($\alpha=1,2,\ldots,I_{\max}$). Thus, to account for the nonuniform distribution of chain sizes in computing the mean contact probability, we calculated the following N-dependent probability averaged over the total number of distinct chain sizes:

$$\bar{P}(s)=\frac{1}{N_{\max}-N_{\min}+1}\sum_{N=N_{\min}}^{N_{\max}}\bar{P}(s|N), \tag{2}$$

where $\bar{P}(s|N)\equiv\sum_{\alpha=1}^{I_{\max}}\delta(N_\alpha-N)P_\alpha(s)\big/\sum_{\alpha=1}^{I_{\max}}\delta(N_\alpha-N)$ is the mean contact probability for the structures with chain size N, and we used the value of $P_\alpha(s)$ only in the range $4\le s\le N_\alpha^{2/3}$. The values of $\bar{P}(s)$ for RNA and proteins are shown in Fig. 1 D. $\langle M(s)\rangle$, $\langle n_s(s)\rangle$, $\langle\mathrm{DOP}\rangle$, $\langle\mathrm{DOS}\rangle$, and $\langle R(s)\rangle$ were calculated using definitions analogous to Eq. 2. A cautionary note is in order. Unlike the contact probability exponent calculated for each macromolecular structure, these mean properties, obtained by averaging over each ensemble of proteins and RNA, are meant for understanding the general differences between RNA and proteins as two distinct classes of macromolecules.
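The two-level average of Eq. 2 (first over structures of the same size N, then over chain sizes) can be sketched in Python as follows. This is a sketch under stated assumptions: at each s it averages over the chain sizes that actually contribute, which is our reading of how sizes without structures are handled, and `s_window` is a hypothetical hook for the s-range restriction described in the text.

```python
import warnings
from collections import defaultdict

import numpy as np


def mean_contact_probability(P_list, s_max, s_window=None):
    """Size-weighted mean contact probability in the spirit of Eq. 2.

    P_list   : per-structure arrays with P[s - 1] = P_alpha(s), s = 1..N_alpha - 1
    s_max    : largest s to report
    s_window : optional (hypothetical) callable N -> (s_lo, s_hi); values of
               P_alpha(s) outside [s_lo, s_hi] are not used
    Returns Pbar with Pbar[s - 1] = Pbar(s) (NaN where no chain size contributes).
    """
    by_size = defaultdict(list)
    for P in P_list:
        by_size[len(P) + 1].append(np.asarray(P, dtype=float))  # N_alpha = len(P) + 1

    rows = []
    for N, group in sorted(by_size.items()):
        mean_N = np.mean(group, axis=0)  # Pbar(s|N): average over structures of size N
        s = np.arange(1, len(mean_N) + 1)
        keep = s <= s_max
        if s_window is not None:
            s_lo, s_hi = s_window(N)
            keep &= (s >= s_lo) & (s <= s_hi)
        row = np.full(s_max, np.nan)
        row[s[keep] - 1] = mean_N[keep]
        rows.append(row)

    # Average over the distinct chain sizes contributing at each s, so every
    # chain size carries equal weight regardless of how many structures it has.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN columns
        return np.nanmean(np.vstack(rows), axis=0)
```

With two structures of size N = 3 and one of size N = 4, the single N = 4 structure receives the same weight as the pair of N = 3 structures combined; this is the point of averaging over chain sizes rather than over structures.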

Results

Power-law exponent γ of contact probability

The contact probability P(s) calculated for individual biopolymers (Eq. 1) exhibits power-law decay over the intermediate range of s, $10\lesssim s\lesssim\mathcal{O}(10^2)$ (the left panels of Fig. 1, B and C). The scaling exponent γ from the fit using $P(s)\sim s^{-\gamma}$ was obtained for each biopolymer (see text and Figs. S1–S4 in the Supporting Material for details, where we discuss the accuracy of γ and show its error bar for each macromolecule), and the distributions p(γ) for RNA and proteins are contrasted in Fig. 1 A. For proteins, p(γ) is broadly distributed from 0.5 to 2.5 and centered around γ ≈ 1.5, whereas p(γ) for RNA is sharply peaked at γ ≈ 1.1. No clear correlation is found between γ and the chain length (N) in proteins; in RNA, however, γ-values are broadly distributed at small N but sharply centered around γ ≈ 1.1 when N ≳ 100 (see also Fig. S6).
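A per-structure γ can be extracted with a straight-line fit in log-log coordinates. The Python sketch below assumes an unweighted least-squares fit of $\log P(s)$ against $\log s$ over a chosen window; the authors' actual procedure (window selection, weighting) is described in the Supporting Material and may differ. The acceptance test reads the "c.c. > 0.9" criterion of the Fig. 1 caption as a bound on the magnitude of the correlation coefficient, which is an assumption.

```python
import numpy as np


def fit_gamma(P, s_min, s_max):
    """Fit log P(s) = log q - gamma * log s over s_min <= s <= s_max.

    P[s - 1] = P(s). Entries with P(s) = 0 are dropped (log is undefined).
    Returns (gamma, q, cc), where cc is the Pearson correlation coefficient
    of the log-log data (negative for a decaying P(s)).
    """
    P = np.asarray(P, dtype=float)
    s = np.arange(1, len(P) + 1)
    mask = (s >= s_min) & (s <= s_max) & (P > 0)
    x, y = np.log(s[mask]), np.log(P[mask])
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    cc = np.corrcoef(x, y)[0, 1]
    return -slope, np.exp(intercept), cc


def accept_fit(cc, threshold=0.9):
    """Keep a structure only if the log-log fit is tight, |cc| > threshold."""
    return abs(cc) > threshold
```

On synthetic data $P(s)=2s^{-1.1}$ the fit over 10 ≤ s ≤ 100 recovers γ = 1.1 and q = 2 with |c.c.| = 1; on real contact data, zero entries at large s are common, which is why they are masked out before taking logarithms.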

The distinct scaling exponents, γ ≈ 1.11 for the P(s) of 23S rRNA (P(s) in the left corner of Fig. 1 B) and γ ≈ 1.49 for FhuA (P(s) in the left corner of Fig. 1 C), deserve special attention. The value γ ≈ 1.0, found especially for large RNA, arises from their characteristic chain organization: similar to TADs in chromosomes, proximal sequences along the chain are stabilized by basepairing to form independently stable modular contact domains consisting of hairpins, bulges, and loops. These contact domains are further assembled through a number of tertiary interactions (base triples, kissing loops, coaxial stacking, ribose zippers, A-minor motifs, and metal-ion interactions) (20, 21). The distal contacts created by this hierarchical chain assembly likely raise the frequency of long-range contacts, giving rise to γ ≈ 1.11 for 23S rRNA on the scale of 10 ≲ s ≲ 300 (see the next section). The distinct chain organizations of RNA and proteins become more evident when the molecules are visualized using a rainbow coloring scheme along the chain (Fig. 1, B and C). The overall chain topology of 23S rRNA resembles a crumpled globule (22, 23) that retains clearly demarcated contact domains held together by distal interdomain contacts. The territorial organization of contact domains made of proximal sequences is highlighted in large RNA structures (see the large and small subunits of rRNA in Fig. 1 B).

In stark contrast to rRNA, typical proteins with γ ≈ 1.5 (indexed with black labels from 1 to 5 in Fig. 1, A and C) retain chain conformations whose subchains look topologically more intermingled with the rest of the structure, lacking visually distinct domains of a similar color. The intermingled chain configurations of native proteins, together with the contact probability scaling exponent γ ≈ 1.5, point to an equilibrium globule configuration, a conclusion also reached by investigating the loop size distribution of native protein structures (24). Of particular note are the proteins with γ < 1.0, which are outliers of p(γ). For example, γ = 0.73 for chondroitin sulfate ABC lyase I (the protein indexed with 6) (25), whose chain configuration has clearly demarcated contact domains.

Instead of calculating the s-dependent contact probability for individual molecules ($P_\alpha(s)$, $\alpha=1,2,\ldots,I_{\max}$), one can also consider ensemble-averaged characteristics of native RNA and protein organization, $\bar{P}(s)$ (Eq. 2 and Fig. 1 D). The mean contact probability calculated for each ensemble of RNA and proteins exhibits a power-law decay, $\bar{P}(s)\sim s^{-\gamma}$, with γ ≈ 1.1 for RNA and γ ≈ 1.6 for proteins on the scale of (20–30) ≲ s ≲ 100, which helps in understanding the general difference in structural ensembles between RNA and proteins as two distinct classes of macromolecules.

Cautionary remarks are in order regarding the power-law scaling of P(s). The characteristic power-law decay of 23S rRNA with γ ≈ 1.1 is valid only for the intermediate range of s. For small s, P(s) decays with a different power-law exponent (see the two panels of P(s) in Fig. 1, B and C). As reported by Lua and Grosberg (22), on local scales both RNA and proteins have a chain organization different from that on larger scales, which is also confirmed in our study by the distinct scaling exponents γ ≈ 0.4 for RNA and γ ≈ 1.4 for proteins with s < 20 (Fig. S7). Hence, strictly speaking, the chain organizations of both RNA and proteins are not scale-invariant, but neither is that of any real polymer: depending on the length scale of interest, real polymer chains reveal different pictures. Of note, the new scaling exponent γ = 0.75 recently discovered for chromatin organization at a higher resolution (10 kb ≲ s ≲ 1 Mb) (26) than the previous study (s ≳ 700 kb) (11) implies that the self-similarity found at intermediate resolution ($P(s)\sim s^{-1.08}$) cannot be extended to the internal structure of contact domains.

Long-range contacts from contact map

Contact maps, along with the 3D structures, offer more concrete insight into the distinct chain organizations of biopolymers with different γ. For instance, the contact maps of 23S rRNA (γ = 1.11; Fig. 2 A) and FhuA (γ = 1.49; Fig. 2 B) reveal that 23S rRNA has a greater density of long-range contacts than FhuA. Interestingly, in 23S rRNA, the modular contact domains spanning i = 500–1000 (magenta) and i = 1500–1750 (orange), as well as those spanning i = 500–1000 (magenta) and i = 2000–2500 (cyan), form extensive interfaces (Fig. 2 A). In comparison, FhuA has a β-barrel structure, with long-range tertiary contacts formed between the subdomain (blue) made of N-terminal sequences (i = 1–150) and the β-strands (i = 200–700) surrounding it (Fig. 2 B).

Figure 2.


Analysis of long-range contacts. (A and B) Contact maps of 23S rRNA and FhuA, whose P(s) values are provided in Fig. 1. In FhuA, the long-range contacts (s ≳ 100), enclosed by a yellow box, are formed between the structure made of N-terminal sequences (i = 1–150) and the surrounding β-strands forming the barrel. In 23S rRNA, the clusters of long-range contacts formed at the interfaces between contact domains are highlighted in a different color for each sequence range, along with the 3D structures. (C) Histogram of the density of long-range contacts calculated for RNA and protein structures in the PDB.

To generalize this finding for RNA and proteins, for each structure we calculated the proportion of long-range contacts (ϕ) between any sites i and j satisfying $j-i\ge s_{\min}$, defined as the ratio between the observed number of long-range contacts and the maximum possible number of long-range contacts, i.e., $\phi=\mathcal{N}^{-1}\sum_{j-i\ge s_{\min}}^{N}\Theta(d_c-|\mathbf{r}_i-\mathbf{r}_j|)$, where Θ(⋅⋅⋅) is the Heaviside step function and the normalization constant is $\mathcal{N}=(N-s_{\min}+1)(N-s_{\min})/2$. The corresponding histograms p(ϕ) for RNA and proteins are shown in Fig. 2 C with $s_{\min}=30$. The finding that p(ϕ) for RNA extends to larger ϕ-values than for proteins indicates that a significant number of tertiary contacts are used to assemble the secondary structure elements abundant in RNA. This result is robust to variation of the $s_{\min}$ value.
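The normalization $\mathcal{N}$ is simply the number of residue pairs with $j-i\ge s_{\min}$, which the Python sketch below makes explicit. It assumes a symmetric boolean contact matrix as input (hypothetical; any contact definition consistent with Eq. 1 would do) and uses NumPy's `triu_indices` with offset `k = s_min` to enumerate exactly those pairs.

```python
import numpy as np


def long_range_fraction(C, s_min=30):
    """phi: observed long-range contacts (j - i >= s_min) divided by the
    number of such pairs, (N - s_min + 1)(N - s_min) / 2."""
    n = C.shape[0]
    if n <= s_min:
        raise ValueError("chain must be longer than s_min")
    i, j = np.triu_indices(n, k=s_min)  # every pair (i, j) with j - i >= s_min
    norm = (n - s_min + 1) * (n - s_min) // 2
    assert len(i) == norm  # the normalization equals the number of candidate pairs
    return C[i, j].sum() / norm
```

For a five-residue chain with $s_{\min}=2$ there are 3·4/2 = 6 candidate pairs, so two long-range contacts give ϕ = 1/3.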

Inter-subchain interactions and surface roughness

To quantify further the distinct chain organizations of RNA and proteins, we borrow analytic tools developed in studies of chromosome organization (22, 23). The number of contacts, M(s), that a subchain of length s has with the rest of the structure (see Fig. 3 A) (16) scales as $\langle M(s)\rangle\sim s^{\beta_1}$ for both RNA and proteins, where 〈⋅⋅⋅〉 denotes an average over the chain size frequency (see Materials and Methods). The exponent β1 differs between RNA ($\beta_1^{\rm RNA}=0.9$) and proteins ($\beta_1^{\rm prot}=0.6$), and 〈M〉 is greater for RNA when s ≳ 40, indicating that RNA has more inter-subchain contacts for s ≳ 40. The same conclusion was drawn by computing the roughness of the subchain surface (23), which is quantified using $n_s(s)$, the number of monomers in a subchain that are in contact with at least one monomer belonging to other subchains (see Fig. 3 B). We find $\langle n_s(s)\rangle_L\sim s^{\beta_2}$ with $\beta_2^{\rm RNA}=0.9>\beta_2^{\rm prot}=0.7$, suggesting that RNAs have rougher subchain surfaces. The scaling relationships of the inter-subchain interactions ($M\sim s^{0.9}$) and the surface monomers ($n_s\sim s^{0.9}$) for RNA compare well with those of crumpled globules ($M,\,n_s\sim s^{1}$) (16, 23).
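For a single structure, M(s) and $n_s(s)$ can be computed from a contact matrix by sliding a window of length s along the chain. The Python sketch below is a minimal version under stated assumptions: it averages over all contiguous subchains of one structure (the chain-size-frequency average 〈⋅⋅⋅〉 of the text would be applied on top), and it does not exclude chain-adjacent pairs across the subchain boundary, a choice the text does not specify.

```python
import numpy as np


def intersubchain_stats(C, s):
    """Mean M(s) and n_s(s) over all contiguous subchains [a, a + s).

    M   : number of contacts between the subchain and the rest of the chain
    n_s : number of subchain monomers with at least one contact outside it
    C is a symmetric boolean contact matrix. Chain-adjacent pairs across the
    subchain boundary are counted like any other contact (no exclusion).
    """
    n = C.shape[0]
    M_vals, ns_vals = [], []
    for a in range(n - s + 1):
        inside = np.zeros(n, dtype=bool)
        inside[a:a + s] = True
        cross = C[inside][:, ~inside]  # rows: subchain, columns: rest of chain
        M_vals.append(cross.sum())
        ns_vals.append(cross.any(axis=1).sum())
    return float(np.mean(M_vals)), float(np.mean(ns_vals))
```

Evaluating these averages for a range of s and fitting a line in log-log coordinates would then give estimates of $\beta_1$ and $\beta_2$. For a four-residue chain in which only sequence neighbors touch, subchains of length 2 have on average 4/3 outside contacts and 4/3 surface monomers.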

Figure 3.


Chain-size frequency weighted number of inter-subchain interactions M (A) and the number of surface monomers ns (B) as a function of subchain size s in RNA (red) and proteins (blue).

The values M(s) and $n_s(s)$ are related by $M(s)\sim Q\,n_s(s)$, where $Q\sim s^{\nu d}/s$ is the proportionality constant, with $s^{\nu d}$ being the total number of possible monomers that can fill the volume defined by a blob consisting of s monomers; this gives the scaling relation $\beta_1=\nu d-1+\beta_2$ (23). From this relation and the values of $\beta_{1,2}$ (with d = 3), we obtain the Flory exponent ν = 1/3 for native RNA and ν = 0.3 for proteins, in close agreement with the values of ν obtained from an independent analysis of macromolecular structures in the PDB, ν ≈ 0.33 for RNA and ν = 0.31 for proteins in $R_G\sim N^{\nu}$ (6).
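Solving the scaling relation for ν makes the arithmetic explicit. The derivation below assumes d = 3 and uses the exponent values quoted in the previous subsection.

```latex
\begin{aligned}
M(s) &\sim Q\,n_s(s), \qquad Q \sim \frac{s^{\nu d}}{s} = s^{\nu d - 1}
\;\Longrightarrow\; \beta_1 = \nu d - 1 + \beta_2
\;\Longrightarrow\; \nu = \frac{\beta_1 - \beta_2 + 1}{d},\\
\text{RNA } (d = 3):&\quad \nu = \frac{0.9 - 0.9 + 1}{3} = \frac{1}{3},\\
\text{proteins } (d = 3):&\quad \nu = \frac{0.6 - 0.7 + 1}{3} = \frac{0.9}{3} = 0.3.
\end{aligned}
```

Because ν depends only on the difference $\beta_1-\beta_2$, RNA's equal exponents ($\beta_1=\beta_2=0.9$) immediately give the compact-globule value ν = 1/d.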

Degree of interpenetration and segregation

Next, we calculate the degree of interpenetration (DOP) (22): the fraction of residues from other subchains found in the ellipsoidal volume enclosing a subchain, averaged over all subchains of length s. The degree of segregation, $\mathrm{DOS}=d_{A,B}/(2R_G^{AB})$, is defined as the ratio between $d_{A,B}$, the distance between the center positions of two nonoverlapping subchains A and B, and $2R_G^{AB}$, where $R_G^{AB}$ is the radius of gyration of the union of these two subchains; it is averaged over all pairs of subchains A and B with the same length s. DOP and DOS as functions of s for both RNA and proteins (Fig. 4) indicate that, although subchains separated by a large arc length s are well separated from each other in RNA, the subchains in RNA penetrate the volume of other subchains more deeply than those in proteins. This explains why P(s) declines more slowly for RNA than for proteins (Fig. 2), which leads to a smaller exponent, γ.
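The DOS definition can be sketched directly in Python. This is a minimal illustration under stated assumptions: each residue is represented by a single position (for example, a centroid), the "center position" of a subchain is taken to be the centroid of its residues, and the per-pair ratios are averaged (mean of ratios). DOP is not sketched because the construction of the enclosing ellipsoid is not specified here.

```python
import numpy as np


def radius_of_gyration(X):
    """R_G of a set of points X with shape (n, 3)."""
    return np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=1).mean())


def degree_of_segregation(X, s):
    """Mean of d_AB / (2 R_G^{AB}) over all pairs of nonoverlapping subchains
    A = X[a:a+s], B = X[b:b+s] with b >= a + s.

    X: (N, 3) array with one representative position per residue (assumption).
    d_AB is the distance between the subchain centroids; R_G^{AB} is the
    radius of gyration of the union of A and B.
    """
    n = len(X)
    if n < 2 * s:
        raise ValueError("need at least two nonoverlapping subchains of length s")
    ratios = []
    for a in range(n - 2 * s + 1):
        for b in range(a + s, n - s + 1):
            A, B = X[a:a + s], X[b:b + s]
            d_ab = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
            ratios.append(d_ab / (2.0 * radius_of_gyration(np.vstack([A, B]))))
    return float(np.mean(ratios))
```

Two well-separated dimers, for instance, give a DOS just below 1, whereas fully interpenetrating subchains, whose centroids nearly coincide, give a DOS near 0.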

Figure 4.


The mean (A) DOP and (B) DOS as a function of subchain size s in RNA (red) and in proteins (blue).

The number of long-range contacts

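The total number of contacts over a given range of s, $s_{\min}<s<s_{\max}$, can be estimated using $P(s)\simeq q s^{-\gamma}$: $n_c(N)=\int_{s_{\min}}^{s_{\max}}(N-s)P(s)\,ds\simeq q\int_{s_{\min}}^{s_{\max}}(N-s)s^{-\gamma}\,ds$, and hence

$$\frac{n_c(N)}{q}\simeq N\left(\frac{s_{\max}^{1-\gamma}-s_{\min}^{1-\gamma}}{1-\gamma}\right)-\left(\frac{s_{\max}^{2-\gamma}-s_{\min}^{2-\gamma}}{2-\gamma}\right). \tag{3}$$

Notably, (1) $n_c/q$ scales linearly with N for both RNA and proteins, regardless of γ; and (2) for fixed $s_{\min}$ and $s_{\max}$, the coefficients of $n_c/q$ depend only on γ. For $s_{\min}=30$ and $s_{\max}=100$, Eq. 3 gives $n_c^{\gamma=1.1}(N)/q\simeq 0.81N-46$ and $n_c^{\gamma=1.6}(N)/q\simeq 0.11N-6.0$.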
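The coefficients of the linear relation $n_c(N)/q = aN + b$ follow from integrating $(N-s)s^{-\gamma}$ between $s_{\min}$ and $s_{\max}$: the N-term gives the slope, and the $-s$ part of the $(N-s)$ weight gives a negative, N-independent intercept. The Python sketch below evaluates this closed form; the function name is illustrative, and the formula is valid for γ ≠ 1, 2.

```python
def contact_count_coefficients(gamma, s_min=30.0, s_max=100.0):
    """Slope a and intercept b of n_c(N)/q = a*N + b obtained by integrating
    (N - s) * s**(-gamma) from s_min to s_max (valid for gamma != 1, 2)."""
    a = (s_max ** (1 - gamma) - s_min ** (1 - gamma)) / (1 - gamma)
    # the -s part of the (N - s) weight contributes a negative, N-independent term
    b = -(s_max ** (2 - gamma) - s_min ** (2 - gamma)) / (2 - gamma)
    return a, b


for g in (1.1, 1.6):
    a, b = contact_count_coefficients(g)
    print(f"gamma = {g}: n_c/q ~ {a:.2f} N {b:+.1f}")
# gamma = 1.1: n_c/q ~ 0.81 N -46.4
# gamma = 1.6: n_c/q ~ 0.11 N -6.0
```

The slope shrinks rapidly as γ grows (0.81 versus 0.11 here), which is why a smaller contact exponent translates into many more contacts in the intermediate range for chains of the same length.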

Meanwhile, from the plots of $n_c(N)$ using structures in the PDB (see Fig. 5), we obtain

$$n_c^{\rm RNA}(N)/q_{\rm RNA}\simeq 0.77N+65,\qquad n_c^{\rm pro}(N)/q_{\rm pro}\simeq 0.11N+4.7, \tag{4}$$

where the prefactors $q_{\rm RNA}\simeq 0.48$ and $q_{\rm pro}\simeq 4.71$ from the fits to $\bar{P}(s)$ in Fig. 1 are used. Note that for a given N, $n_c^{\gamma=1.1}(N)>n_c^{\gamma=1.6}(N)$ and $n_c^{\rm RNA}(N)>n_c^{\rm pro}(N)$. Together with the other quantities, i.e., 〈DOP〉, 〈DOS〉, 〈M(s)〉, and $\langle n_s(s)\rangle$, the number of contacts $n_c(N)$ calculated here consistently indicates that RNA has a greater number of long-range contacts than proteins of the same size.

Figure 5.


Scatter plots of the number of contacts for (A) RNA and (B) proteins over the intermediate range with smin = 30 and smax = 100.

Note that the analyses presented in Figs. 3, 4, and 5 differ from investigating each macromolecule individually (Fig. 1) to find structure-function relationships. Given that the ensemble in question is the product of evolution, clarifying the difference between the two classes of macromolecules (RNA and proteins) is promising as far as evolutionary questions are concerned.

Discussion

Owing to the intramolecular forces stabilizing the chain, both native RNA and protein molecules retain compact, space-filling structures satisfying $R_G\sim N^{1/3}$ (6, 27), which, from the polymer physics perspective, is regarded as the property of polymers in poor solvent conditions. It is, however, critical to note that the size of a subchain surrounded by other subchains should scale as $R(s)\sim s^{1/2}$, which is indeed confirmed for the proteins with γ = 1.5 (Fig. S5). According to the “Flory theorem” (14, 28), a test chain in a fully equilibrated homogeneous semidilute or concentrated polymer melt (29), in spherical confinement (30), or even in a globule, is expected to obey Gaussian statistics because of the screening of excluded volume interactions or the counterbalance between attraction and repulsion (14), thus satisfying $R(s)\sim s^{1/2}$ or $P(s)\sim s^{-3/2}$ (see the Supporting Material). Our analysis highlights distinct contact probability exponents: γ ∼ 1.0 for large RNA versus γ ∼ 1.5 for small RNA or globular proteins over the intermediate range 20 ≲ s ≲ 100. As evident from the rRNA structure (Fig. 1), subchains of RNA at scales s > 20 are assembled into modular contact domains, which are better demarcated (in the form of stem-loop helices) than those of proteins, and are stitched together through long-range tertiary contacts (Fig. 2 A). Evidence for this characteristic multimodular architecture of RNA is vividly visualized in the form of multiple rupture events in single-molecule pulling experiments on Tetrahymena ribozymes (31), whereas many proteins display cooperative, effectively all-or-none unfolding under force (32, 33).

What causes the crumpled structures of large RNA at the scale of 20 ≲ s ≲ 100? Here, the statistical rarity of knots in native RNA (34, 35), in contrast to proteins or DNA (22, 36), is worth noting. In general, knots are unavoidable when a long polymer chain ($N\gg N_e\approx 200$, where $N_e$ is the entanglement length (29)) is folded into an equilibrium globule (16, 37). The topological knot-free constraints inherent to ring polymers, however, have been shown to organize melts of unconcatenated polymer rings, or a single long polymer ring, into crumpled globules, preventing entanglements (23, 38). Because large RNA molecules, assembled from a number of secondary structural elements (hairpin loops, stems), resemble a collection of small and large rings, it can be surmised that knot-free constraints are effectively imposed during the folding process. Such constraints are more likely to apply to RNA because the energy scale associated with secondary structure elements ($\varepsilon^{\rm sec}$) is in general well separated from that of tertiary interactions ($\varepsilon^{\rm ter}$), such that $\sum_i\varepsilon_i^{\rm sec}\gg\sum_k\varepsilon_k^{\rm ter}\gtrsim k_BT$ (39), which makes secondary structure elements independently stable. By contrast, after the initial collapse (40), proteins fold through a reptation-like process, which may take place with ease because the secondary structure elements of proteins (α-helix, β-sheet) are only marginally stable relative to the thermal energy. If necessary, these motifs can be reassembled into thermodynamically more stable structures.

While local and remote contacts are mixed in the folding nuclei of proteins, the formation of secondary structures in RNA folding usually precedes the formation of tertiary contacts, so that RNA folding is hierarchical (2, 41). Folding under kinetic control produces thermodynamically metastable and kinetically trapped intermediates, which occur ubiquitously in RNA folding (42), especially in cotranscriptional folding (43, 44). A decision made at an early stage of folding, such as the formation of independently stable secondary structure elements, is difficult to reverse, although cofactors such as metal ions (45, 46), metabolites (47), and RNA chaperones (48) can still induce secondary structure rearrangements when needed. Hence, a more appropriate way to understand the conformational dynamics of a large RNA molecule with N ≳ 100 is to consider an ensemble of multiple functional states (49, 50, 51, 52) rather than a unique, thermodynamically determined native state. It is noteworthy that RNA secondary structure prediction algorithms that search for the minimum free energy structure (53, 54, 55) fail to predict the correct secondary structure when N ≳ 100 and require comparative sequence analysis or experimental constraints (56, 57). This could be ascribed to errors that accumulate in predicting structures of RNA with large N, but it is also possible that the (free) energy minimization principle cannot be extended to account for the folding process of large RNA. The contact statistics of large RNA, $P(s)\sim s^{-1}$, could be used as an additional constraint or guideline for structure prediction.

A situation analogous to the hierarchical folding of large RNA is prevalent in two-stage membrane protein folding, where the insertion of transmembrane α-helices, guided by translocons, is followed by postinsertion folding (58, 59). We indeed find that the contact probabilities of class A G-protein coupled receptors (GPCRs) give γ ≈ 1 (blue circles in the middle panel of Fig. 1). Because γ ≈ 1 means that the chain organization of native GPCRs is not in an entropy-maximum state, thermodynamically guided, spontaneous in vitro refolding of GPCRs into the native form is not expected to be possible. In an atomic force microscopy experiment, an α-helical membrane protein, the antiporter (N ≈ 380), for which we find γ ≈ 1.1, could not be refolded to its original form after mechanical unfolding (60). However, a recent single-molecule force experiment (61) has shown that GlpG, an α-helical transmembrane protein with N ≈ 270, can reversibly fold in bicelles even after the entire structure, including the transmembrane helices, is disrupted by mechanical forces. Remarkably, we find γ ≈ 1.5 for GlpG. For membrane proteins with known native structures, γ-values may thus be used to judge whether spontaneous in vitro refolding is possible.

Because the time required for equilibrium sampling of conformations ($\tau_{\rm eq}$) increases exponentially with system size N, as $\tau_{\rm eq}\sim e^{N}$ (62), signatures of metastability or nonequilibration in chain conformation could be ubiquitous in macromolecular structures with large N. Through a statistical analysis of structures in the PDB, our study suggests that the crumpled chain organization with γ ≈ 1 of large native RNA and of some classes of proteins is an ineluctable outcome of folding under kinetic control.

Our results, based on the structures available in the PDB, might be subject to sample bias because current structural information in the PDB is limited: it underrepresents intrinsically disordered proteins and membrane proteins among proteins, and long noncoding and intron RNAs, which are abundant in the cell, among RNA (63, 64). Nevertheless, we expect our general conclusions on the difference in organizing principles between proteins and RNA to hold as the PDB expands. In particular, we expect that the inclusion of long noncoding RNA structures (N > 200), which should become possible in the near future, will make our conclusions more robust, because the hierarchical nature of the RNA folding process should become more evident for RNA with larger N and reinforce the territorial (crumpled-like) organization of RNA.

Author Contributions

L.L. and C.H. designed and performed the research, analyzed the data, and wrote the article.

Acknowledgments

We thank the Korea Institute for Advanced Study for providing computing resources (KIAS Center for Advanced Computation, Linux Cluster System) for this work.

Editor: Rohit Pappu.

Footnotes

Supporting Materials and Methods, seven figures, and three tables are available at http://www.biophysj.org/biophysj/supplemental/S0006-3495(16)30215-6.

Supporting Citations

Reference (65) appears in the Supporting Material.

Supporting Material

Document S1. Supporting Materials and Methods, Figs. S1–S7, and Tables S1–S3 (mmc1.pdf)
Document S2. Article plus Supporting Material (mmc2.pdf)

References

1. Schuster P., Fontana W., Hofacker I.L. From sequences to shapes and back: a case study in RNA secondary structures. Proc. Biol. Sci. 1994;255:279–284. doi: 10.1098/rspb.1994.0040.
2. Tinoco I., Jr., Bustamante C. How RNA folds. J. Mol. Biol. 1999;293:271–281. doi: 10.1006/jmbi.1999.3001.
3. Thirumalai D., Hyeon C. RNA and protein folding: common themes and variations. Biochemistry. 2005;44:4957–4970. doi: 10.1021/bi047314+.
4. Chen S.J., Dill K.A. RNA folding energy landscapes. Proc. Natl. Acad. Sci. USA. 2000;97:646–651. doi: 10.1073/pnas.97.2.646.
5. Morcos F., Schafer N.P., Wolynes P.G. Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc. Natl. Acad. Sci. USA. 2014;111:12408–12413. doi: 10.1073/pnas.1413575111.
6. Hyeon C., Dima R.I., Thirumalai D. Size, shape, and flexibility of RNA structures. J. Chem. Phys. 2006;125:194905. doi: 10.1063/1.2364190.
7. Thirumalai D., Lee N., Klimov D. Early events in RNA folding. Annu. Rev. Phys. Chem. 2001;52:751–762. doi: 10.1146/annurev.physchem.52.1.751.
8. Langer-Safer P.R., Levine M., Ward D.C. Immunological method for mapping genes on Drosophila polytene chromosomes. Proc. Natl. Acad. Sci. USA. 1982;79:4381–4385. doi: 10.1073/pnas.79.14.4381.
9. Cremer T., Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat. Rev. Genet. 2001;2:292–301. doi: 10.1038/35066075.
10. Dekker J., Rippe K., Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. doi: 10.1126/science.1067799.
11. Lieberman-Aiden E., van Berkum N.L., Dekker J. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
12. Dekker J., Marti-Renom M.A., Mirny L.A. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 2013;14:390–403. doi: 10.1038/nrg3454.
13. Zuin J., Dixon J.R., Wendt K.S. Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proc. Natl. Acad. Sci. USA. 2014;111:996–1001. doi: 10.1073/pnas.1317788111.
14. Grosberg A.Y., Khokhlov A.R. Statistical Physics of Macromolecules. AIP Press; New York: 1994.
15. Grosberg A., Nechaev S., Shakhnovich E. The role of topological constraints in the kinetics of collapse of macromolecules. J. Phys. 1988;49:2095–2100.
16. Mirny L.A. The fractal globule as a model of chromatin architecture in the cell. Chromosome Res. 2011;19:37–51. doi: 10.1007/s10577-010-9177-0.
17. Bohn M., Heermann D.W., van Driel R. Random loop model for long polymers. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2007;76:051805. doi: 10.1103/PhysRevE.76.051805.
18. Barbieri M., Chotalia M., Nicodemi M. Complexity of chromatin folding is captured by the strings and binders switch model. Proc. Natl. Acad. Sci. USA. 2012;109:16173–16178. doi: 10.1073/pnas.1204799109.
19. Kang H., Yoon Y.-G., Hyeon C. Confinement-induced glassy dynamics in a model for chromosome organization. Phys. Rev. Lett. 2015;115:198102. doi: 10.1103/PhysRevLett.115.198102.
20. Batey R.T., Rambo R.P., Doudna J.A. Tertiary motifs in RNA structure and folding. Angew. Chem. Int. Ed. Engl. 1999;38:2326–2343. doi: 10.1002/(sici)1521-3773(19990816)38:16<2326::aid-anie2326>3.0.co;2-3.
21. Nissen P., Ippolito J.A., Steitz T.A. RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc. Natl. Acad. Sci. USA. 2001;98:4899–4903. doi: 10.1073/pnas.081082398.
22. Lua R.C., Grosberg A.Y. Statistics of knots, geometry of conformations, and evolution of proteins. PLOS Comput. Biol. 2006;2:e45. doi: 10.1371/journal.pcbi.0020045.
23. Halverson J.D., Smrek J., Grosberg A.Y. From a melt of rings to chromosome territories: the role of topological constraints in genome folding. Rep. Prog. Phys. 2014;77:022601. doi: 10.1088/0034-4885/77/2/022601.
24. Berezovsky I.N., Grosberg A.Y., Trifonov E.N. Closed loops of nearly standard size: common basic element of protein structure. FEBS Lett. 2000;466:283–286. doi: 10.1016/s0014-5793(00)01091-7.
25. Huang W., Lunin V.V., Cygler M. Crystal structure of Proteus vulgaris chondroitin sulfate ABC lyase I at 1.9 Å resolution. J. Mol. Biol. 2003;328:623–634. doi: 10.1016/s0022-2836(03)00345-0.
26. Sanborn A.L., Rao S.S., Aiden E.L. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. USA. 2015;112:E6456–E6465. doi: 10.1073/pnas.1518552112.
  • 27.Flory P.J. Interscience Publishers; New York: 1969. Statistical Mechanics of Chain Molecules. [Google Scholar]
  • 28.Flory P.J. The configuration of real polymer chains. J. Chem. Phys. 1949;17:303–310. [Google Scholar]
  • 29. de Gennes P.G. Scaling Concepts in Polymer Physics. Cornell University Press; Ithaca, NY: 1979.
  • 30. Cacciuto A., Luijten E. Self-avoiding flexible polymers under spherical confinement. Nano Lett. 2006;6:901–905. doi: 10.1021/nl052351n.
  • 31. Onoa B., Dumont S., Bustamante C. Identifying kinetic barriers to mechanical unfolding of the T. thermophila ribozyme. Science. 2003;299:1892–1895. doi: 10.1126/science.1081338.
  • 32. Shank E.A., Cecconi C., Bustamante C. The folding cooperativity of a protein is controlled by its chain topology. Nature. 2010;465:637–640. doi: 10.1038/nature09021.
  • 33. Mickler M., Dima R.I., Rief M. Revealing the bifurcation in the unfolding pathways of GFP by using single-molecule experiments and simulations. Proc. Natl. Acad. Sci. USA. 2007;104:20268–20273. doi: 10.1073/pnas.0705458104.
  • 34. Micheletti C., Di Stefano M., Orland H. Absence of knots in known RNA structures. Proc. Natl. Acad. Sci. USA. 2015;112:2052–2057. doi: 10.1073/pnas.1418445112.
  • 35. Burton A.S., Di Stefano M., Micheletti C. The elusive quest for RNA knots. RNA Biol. 2016;13:134–139. doi: 10.1080/15476286.2015.1132069.
  • 36. Noel J.K., Sułkowska J.I., Onuchic J.N. Slipknotting upon native-like loop formation in a trefoil knot protein. Proc. Natl. Acad. Sci. USA. 2010;107:15403–15408. doi: 10.1073/pnas.1009522107.
  • 37. Grosberg A.Y. Critical exponents for random knots. Phys. Rev. Lett. 2000;85:3858–3861. doi: 10.1103/PhysRevLett.85.3858.
  • 38. Imakaev M.V., Tchourine K.M., Mirny L.A. Effects of topological constraints on globular polymers. Soft Matter. 2015;11:665–671. doi: 10.1039/c4sm02099e.
  • 39. Thirumalai D., Hyeon C. Theory of RNA folding: from hairpins to ribozymes. In: Non-Protein Coding RNAs. Springer; New York: 2008.
  • 40. Thirumalai D. From minimal models to real proteins: time scales for protein folding kinetics. J. Phys. I (Fr.) 1995;5:1457–1467.
  • 41. Greenleaf W.J., Frieda K.L., Block S.M. Direct observation of hierarchical folding in single riboswitch aptamers. Science. 2008;319:630–633. doi: 10.1126/science.1151298.
  • 42. Treiber D.K., Williamson J.R. Beyond kinetic traps in RNA folding. Curr. Opin. Struct. Biol. 2001;11:309–314. doi: 10.1016/s0959-440x(00)00206-2.
  • 43. Repsilber D., Wiese S., Steger G. Formation of metastable RNA structures by sequential folding during transcription: time-resolved structural analysis of potato spindle tuber viroid (−)-stranded RNA by temperature-gradient gel electrophoresis. RNA. 1999;5:574–584. doi: 10.1017/s1355838299982018.
  • 44. Lutz B., Faber M., Schug A. Differences between cotranscriptional and free riboswitch folding. Nucleic Acids Res. 2014;42:2687–2696. doi: 10.1093/nar/gkt1213.
  • 45. Wu M., Tinoco I., Jr. RNA folding causes secondary structure rearrangement. Proc. Natl. Acad. Sci. USA. 1998;95:11555–11560. doi: 10.1073/pnas.95.20.11555.
  • 46. Koculi E., Cho S.S., Woodson S.A. Folding path of P5abc RNA involves direct coupling of secondary and tertiary structures. Nucleic Acids Res. 2012;40:8011–8020. doi: 10.1093/nar/gks468.
  • 47. Montange R.K., Batey R.T. Riboswitches: emerging themes in RNA structure and function. Annu. Rev. Biophys. 2008;37:117–133. doi: 10.1146/annurev.biophys.37.032807.130000.
  • 48. Russell R., Jarmoskaite I., Lambowitz A.M. Toward a molecular understanding of RNA remodeling by DEAD-box proteins. RNA Biol. 2013;10:44–55. doi: 10.4161/rna.22210.
  • 49. Al-Hashimi H.M., Walter N.G. RNA dynamics: it is about time. Curr. Opin. Struct. Biol. 2008;18:321–329. doi: 10.1016/j.sbi.2008.04.004.
  • 50. Solomatin S.V., Greenfeld M., Herschlag D. Multiple native states reveal persistent ruggedness of an RNA folding landscape. Nature. 2010;463:681–684. doi: 10.1038/nature08717.
  • 51. Hyeon C., Lee J., Thirumalai D. Hidden complexity in the isomerization dynamics of Holliday junctions. Nat. Chem. 2012;4:907–914. doi: 10.1038/nchem.1463.
  • 52. Hyeon C., Hinczewski M., Thirumalai D. Evidence of disorder in biological molecules from single molecule pulling experiments. Phys. Rev. Lett. 2014;112:138101. doi: 10.1103/PhysRevLett.112.138101.
  • 53. Rivas E., Eddy S.R. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 1999;285:2053–2068. doi: 10.1006/jmbi.1998.2436.
  • 54. Hofacker I.L. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429–3431. doi: 10.1093/nar/gkg599.
  • 55. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415. doi: 10.1093/nar/gkg595.
  • 56. Gutell R.R., Lee J.C., Cannone J.J. The accuracy of ribosomal RNA comparative structure models. Curr. Opin. Struct. Biol. 2002;12:301–310. doi: 10.1016/s0959-440x(02)00339-1.
  • 57. Mathews D.H., Sabina J., Turner D.H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 1999;288:911–940. doi: 10.1006/jmbi.1999.2700.
  • 58. Popot J.-L., Gerchman S.-E., Engelman D.M. Refolding of bacteriorhodopsin in lipid bilayers. A thermodynamically controlled two-stage process. J. Mol. Biol. 1987;198:655–676. doi: 10.1016/0022-2836(87)90208-7.
  • 59. Bowie J.U. Solving the membrane protein folding problem. Nature. 2005;438:581–589. doi: 10.1038/nature04395.
  • 60. Kedrov A., Ziegler C., Müller D.J. Controlled unfolding and refolding of a single sodium-proton antiporter using atomic force microscopy. J. Mol. Biol. 2004;340:1143–1152. doi: 10.1016/j.jmb.2004.05.026.
  • 61. Min D., Jefferson R.E., Yoon T.-Y. Mapping the energy landscape for second-stage folding of a single membrane protein. Nat. Chem. Biol. 2015;11:981–987. doi: 10.1038/nchembio.1939.
  • 62. Palmer R. Broken ergodicity. Adv. Phys. 1982;31:669–735.
  • 63. Rinn J.L., Chang H.Y. Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 2012;81:145–166. doi: 10.1146/annurev-biochem-051410-092902.
  • 64. Carninci P., Kasukawa T., Hayashizaki Y. The transcriptional landscape of the mammalian genome. Science. 2005;309:1559–1563. doi: 10.1126/science.1112014.
  • 65. Friedman B., O’Shaughnessy B. Short time behavior and universal relations in polymer cyclization. J. Phys. II. 1991;1:471–486.
  • 66. Pyetan E., Baram D., Yonath A. Chemical parameters influencing fine-tuning in the binding of macrolide antibiotics to the ribosomal tunnel. Pure Appl. Chem. 2007;79:955–968.
  • 67. Guo Q., Yuan Y., Gao N. Structural basis for the function of a small GTPase RsgA on the 30S ribosomal subunit maturation revealed by cryoelectron microscopy. Proc. Natl. Acad. Sci. USA. 2011;108:13100–13105. doi: 10.1073/pnas.1104645108.
  • 68. Golden B.L., Kim H., Chase E. Crystal structure of a phage Twort group I ribozyme-product complex. Nat. Struct. Mol. Biol. 2005;12:82–89. doi: 10.1038/nsmb868.
  • 69. Krasilnikov A.S., Xiao Y., Mondragón A. Basis for structural diversity in homologous RNAs. Science. 2004;306:104–107. doi: 10.1126/science.1101489.
  • 70. Thore S., Frick C., Ban N. Structural basis of thiamine pyrophosphate analogues binding to the eukaryotic riboswitch. J. Am. Chem. Soc. 2008;130:8116–8117. doi: 10.1021/ja801708e.
  • 71. Ferguson A.D., Braun V., Welte W. Crystal structure of the antibiotic albomycin in complex with the outer membrane transporter FhuA. Protein Sci. 2000;9:956–963. doi: 10.1110/ps.9.5.956.
  • 72. Otterbein L.R., Graceffa P., Dominguez R. The crystal structure of uncomplexed actin in the ADP state. Science. 2001;293:708–711. doi: 10.1126/science.1059700.
  • 73. McLuskey K., Rudolf J., Mottram J.C. Crystal structure of a Trypanosoma brucei metacaspase. Proc. Natl. Acad. Sci. USA. 2012;109:7469–7474. doi: 10.1073/pnas.1200885109.
  • 74. Ormö M., Cubitt A.B., Remington S.J. Crystal structure of the Aequorea victoria green fluorescent protein. Science. 1996;273:1392–1395. doi: 10.1126/science.273.5280.1392.
  • 75. Weaver L.H., Matthews B.W. Structure of bacteriophage T4 lysozyme refined at 1.7 Å resolution. J. Mol. Biol. 1987;193:189–199. doi: 10.1016/0022-2836(87)90636-x.

Associated Data

Supplementary Materials

Document S1. Supporting Materials and Methods, Figs S1–S7, and Tables S1–S3
mmc1.pdf (952.8KB, pdf)
Document S2. Article plus Supporting Material
mmc2.pdf (2.7MB, pdf)

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society
