Pairing statistics and melting of random DNA oligomers: Finding your partner in superdiverse environments

Simone Di Leo; Stefano Marni; Carlos A Plata; Tommaso P Fraccia; Gregory P Smith; Amos Maritan; Samir Suweis; Tommaso Bellini

doi:10.1371/journal.pcbi.1010051

. 2022 Apr 11;18(4):e1010051. doi: 10.1371/journal.pcbi.1010051

Pairing statistics and melting of random DNA oligomers: Finding your partner in superdiverse environments

Simone Di Leo ^1,^#, Stefano Marni ^1,^#, Carlos A Plata ^2,^#, Tommaso P Fraccia ³, Gregory P Smith ⁴, Amos Maritan ⁵, Samir Suweis ⁵, Tommaso Bellini ^1,^*

Editor: Eugene I Shakhnovich⁶

PMCID: PMC9022813 PMID: 35404933

Abstract

Understanding of the pairing statistics in solutions populated by a large number of distinct solute species with mutual interactions is a challenging topic, relevant in modeling the complexity of real biological systems. Here we describe, both experimentally and theoretically, the formation of duplexes in a solution of random-sequence DNA (rsDNA) oligomers of length L = 8, 12, 20 nucleotides. rsDNA solutions are formed by 4^L distinct molecular species, leading to a variety of pairing motifs that depend on sequence complementarity and range from strongly bound, fully paired defectless helices to weakly interacting mismatched duplexes. Experiments and theory coherently combine revealing a hybridization statistics characterized by a prevalence of partially defected duplexes, with a distribution of type and number of pairing errors that depends on temperature. We find that despite the enormous multitude of inter-strand interactions, defectless duplexes are formed, involving a fraction up to 15% of the rsDNA chains at the lowest temperatures. Experiments and theory are limited here to equilibrium conditions.

Author summary

Several biological processes require that specific partner molecules succeed in binding after negotiating their way through a huge number of interactions with other molecules. How such molecular recognition emerges among millions distinct molecular species is an open problem. We have studied, both experimentally and theoretically, such process of “molecular recognition” in pools of highly diverse random DNA oligomers, which binds preferentially, but not exclusively, to its perfect complementary sequence. We find a complex behavior, in which some perfect pairing takes place with a non-trivial temperature dependence that we understand thorough statistical mechanics modelling. The pairing pattern of short random DNA is relevant in the context of the origin of life since the so-called “RNA World” was most probably based on the mutual recognition of random chains.

Introduction

One of the defining features of biomolecules is the specificity and selectivity of their mutual interactions. Selectivity is at the heart of virtually all biological processes, including cell signalling, immune response, genetic transmission and regulation of gene expression. These processes are based on the presence of partner molecules that succeed in finding and docking to each other after negotiating their way through a huge number of collisions and interactions with other molecules, some of which exert attraction [1]. Indeed, in all actual cases, specific biomolecular interactions take place in “superdiverse” environments, i.e. in contexts of enormous variety of molecular species. Succeeding in pairing to the target is thus depending not only on the binding strength between partner molecules, but on the whole network of pair interactions between concurring molecular species, possible cooperativity, concentration and degeneracy. While the complexity of interactions in the contexts of biomolecular crowding is universally acknowledged [2, 3], the thermodynamics and statistical physics of large pools of distinct interacting molecules have been discussed only in the frame of phase transitions [4, 5], and molecular models of such systems have not been presented yet.

DNA oligonucleotides mixtures are one of the pre-eminent systems to experimentally recreate the conditions described above in a controlled way, given the natural selectivity of base pairing [6] and the possibility to artificially synthesize large ensembles of distinct DNA sequences with controlled distributions [7]. Herein, we consider systems formed by aqueous solutions of DNA oligonucleotides of length L in which the four bases (Cytosine, C; Guanine, G; Adenine, A; Thymine, T) are present with equal probability in each position in the sequence as sketched in Fig 1a. Thus, in each of these random-sequence DNA oligomers (rsDNA) solutions, 4^L distinct molecules can be found with approximately the same probability. We consider solutions of rsDNA oligomers with L = 8, 12 and 20 (8N, 12N and 20N), corresponding to mixtures of ≈ 7 ⋅ 10⁴, 2 ⋅ 10⁷ and 10¹² distinct sequences, respectively; in these systems we study interaction selectivity, defect distribution and equilibration properties as a function of the temperature.

Fig 1 — (a): Solutions of random-sequence DNA (rsDNA) oligomers of length L are mixtures made of 4^L distinct molecules, obtained by all the combinations of the four nucleobases, which are present at any position in the sequence with equal probability. (b): Each rsDNA oligomer can interact with 4^L different rsDNA oligomers, leading to a 4^L × 4^L interaction matrix. Each dot in the matrix represents the most energetically favorable pairing between the two selected rsDNA oligomers, among all the possible mutual shifts. (c) Each position in the interaction matrix corresponds to a specific duplex motif, characterized by pairing errors which are here described by the parameters in α. The most probable duplex in the matrix is highly defected, as the last example in the panel.

Because of its relevance, the hybridization of nucleic acids has been a topic of continuous investigation since their identification, which led to well-established tools and models to calculate free energy, melting temperature and secondary structure directly from the DNA (or RNA) sequences involved [8–11]. In particular, the thermodynamics of duplex formation is commonly evaluated in the frame of the so-called “Nearest Neighbor” (NN) model, in which the hybridization free energy is obtained by the summation of elemental contributions. These are extracted from database of melting temperatures of DNA oligomers and effectively take into account both WC and non-canonical pairings [8]. However, available models are accurate only in the case of pools of low numbers of distinct sequences [12], and do not have the capability of predicting the behavior of complex systems such as the rsDNA solutions considered here.

In a rsDNA solution, upon colliding, pairs of rsDNA molecules bind to each other with a strength and resultant stability that is mainly determined by the level of complementarity of the base sequences according to the Watson-Crick (WC) pairing rules. This results in an interaction matrix, sketched in Fig 1b, where each position represents the most stable duplex conformation for a given pair. The most probable outcome of a random collision between two rsDNA molecules is a quite unstable pair, with a small number of consequent paired bases (3 in case of L = 12, as in the bottom sketch of Fig 1c). However, even if less probable, more stable structures are formed, ranging from full complementarity, a condition that yields defectless helical duplexes (Fig 1c, top sketch), to duplexes with pairing mismatches of various types, listed in Fig 1c. To each duplex, with any pattern of defects, corresponds a binding free energy that can be computed from the sequences involved with standard tools. At the opposite end of the spectrum, there are rsDNA strands with no complementarity at all, as between an oligo made of only Ts and one made of only Cs.

In rsDNA at fixed L, the variety of binding possibilities increases with the number of allowed mismatches, while the binding energy decreases. As shown here, these two factors nearly compensate, leading to non-trivial competition between interaction strength and degeneracy. In a previous study of rsDNA [13], it was observed that, when L > 12, rsDNA solutions self-organize into liquid crystal phases. Given the mechanism by which these phases are formed in solutions of DNA oligomers [14, 15], this finding suggests that hybridization of rsDNA leads to duplexes with fairly well-paired terminals. This feature of rsDNA solutions remained speculative, with no experimental or statistical support.

Besides its value as a platform for exploring hybridization in crowded nucleic acid environments and for describing the network of interactions in superdiverse mixtures, the study of rsDNA is also relevant to the evaluation of scenarios for the origin of life. Indeed, if the RNA world hypothesis is correct, such a state had to be anticipated by a condition in which RNA oligomers were abiotically synthesized with a large degree of randomness, from which ribozyme sequences could have been subsequently selected [16, 17]. Whether such molecular mixtures could form WC pairs or whether hybridization was instead prevented by the large variety of species is an information that can shape the RNA world model itself, clarifying the role of complementarity and duplex formation in the prebiotic environment. [18].

In this paper, we study rsDNA solutions by a combination of three complementary strategies: (i) measurement of the overall degree of hybridization by UV absorbance as a function of temperature (T); (ii) measurements of the degree of hybridization and thermal stability of pairs of mutually complementary sequences mixed with rsDNA by fluorescence Contact-Quenching (CQ); (iii) development of a theoretical framework, based on a re-parametrization of the NN thermodynamic parameters, enabling quantitative predictions on rsDNA hybridization and pairing error statistics.

Materials and methods

Description of the system

8N, 12N and 20N rsDNA oligomers were synthesized on solid phase by an Äkta Oligopilot. The products were purified via dialysis against a 25mM NaCl solution and lyophilized. The samples were characterized by HPLC (see S2 Text). An analogous previous synthesis was characterized by MALDI-TOF mass spectroscopy [13]. Stock aqueous solutions were prepared at c_rsDNA ≈ 50g/l, from which the final samples concentrations were obtained by dilution: ranging from c_rsDNA = 0.04g/l to c_rsDNA = 25g/l and with ionic strengths of c_NaCl = 0.15M, c_NaCl = 0.45M and c_NaCl = 1.0M.

The statistics of the pairing quality have been explored by mixing rsDNA with a tagged pair of complementary DNA strands. Specifically, we have used the 8-base-long couple 8A* and 8B* and the 12-base-long couple 12A* and 12B*, modified by 5’-Texas Red (A* oligomers) and 3’-6-FAM (Fluorescein) moieties (B* oligomers), respectively, so that upon hybridization the two fluorophores come in contact (see Fig G in S2 Text). Specific sequences are as follows. 8A*: TexasRed-5’-ACAGTCCT-3’. 8B*: 5’-AGGACTGT-3’-FAM. 12A*: TexasRed-5’-ACGACAGTCCTG-3’. 12B*: 5’-CAGGACTGTCGT-3’-FAM. 8A*, 8B*, 12A* and 12B* were purchased from IDT.

Using UV hyperchromicity to detect ensemble rsDNA melting

The overall degree of hybridization in rsDNA was evaluated by measuring the absorbance A at the wavelength λ = 258nm. A is obtained by averaging over an interval Δλ = 3nm. Experiments were performed with the Evolution 300 UV-Vis spectrophotometer from Thermo Scientific customized with a Quantum Northwest peltier hot/cold stage with hold temperature accuracy of ±0.05°C. Experiments have been performed with 1^o C/min heating and cooling rate. The cell holder was capable of hosting two different types of cell: standard quartz cuvettes with optical path length ℓ = 1cm, adequate to investigate DNA solutions with c ≈ 0.02 − 0.04g/l; microfluidic cells with ℓ = 10μm from Starna Scientific Ltd to investigate low volumes (< 50μl) of more concentrated DNA solutions (c ≈ 20 − 35g/l) (see S2 Text). Absorbance data as function of temperature, A(T), were treated according to standard protocols [19] (see S2 Text) to extract the “ensemble” melting curve θ_e(T), i.e. the fraction of rsDNA oligomers forming duplexes (of any quality) at temperature T. The melting temperature T_m is defined by θ_e(T_m) = 1/2.

Contact-quenching detects the pairing of specific sequences

Fluorescence-based measurements were used to detect how frequently a specific sequence is able to find its exact complementary strand in the midst of the rsDNA solution. This was done by mixing 8A* and 8B*, in equal amount in a 8N solution, and similarly with 12A* and 12B* in 12N. Although the pair of fluorophores Texas Red and FAM were originally chosen to obtain FRET signal, we found that the dominant effect signaling their interaction is the so called “Contact Quenching” (CQ), i.e. the drop in fluorescence quantum yield of both fluorophores when the two fluorophores are in close proximity [20] (see S2 Text). In the case considered here, the quenching is deep (about 80% reduction for Texas Red and 50% for FAM) and can be easily exploited to extract the fraction θ_AB of A* oligomers that forms a defectless duplex with its complementary partner B*.

To extract θ_AB(T) we monitored the quenching of the Texas Red emission, which is known to have a small T dependence [21]. Fluorescence emission vs. T was measured using the Applied Biosystems QuantStudio 5, a Real-Time PCR Instrument by Thermo Fisher Scientific. Calibration and normalization procedures to extract θ_AB from raw data are provided in S2 Text. They also include a thermodynamic characterization of the 8A*-8B* and 12A*-12B* duplexes, since their binding free energy ΔG_A*B* is slightly modified with respect to their untagged analogs because of the stabilizing effect of the fluorophores at the terminals. CQ experiments were performed by mixing fixed concentrations of both A* and B*, c_fluo = 100nM, with rsDNA solutions prepared at 0.04g/l < c_rsDNA < 25g/l, and ionic strengths c_NaCl = 0.15M and c_NaCl = 1.0M. Measurements were performed in 15mM TRIS HCl at a pH of 7.4, to minimize fluorescence drift due to pH sensitivity of FAM [21].

In rsDNA solutions, each specific sequence is present at a concentration c_rsDNA/4^L. The stoichiometric ratio of each added fluorescent sequence (A* or B*) and the same non-fluorescent sequence already present in the solution of rsDNA is:

\begin{matrix} ϕ = \frac{c_{f l u o}}{c_{r s D N A} / 4^{L}} . \end{matrix}

(1)

When ϕ = 1, i.e. the amount of fluorescently labeled sequence equals that of the same sequence without tag, θ_AB offer an approximate evaluation of the degree in which errorless pairing is present within rsDNA solutions. At the same time, the measurement of the ϕ dependence of θ_AB(ϕ) enables a detailed comparison with theoretical predictions.

Experimental results

Ensemble melting of rsDNA

Hyperchromicity in ultraviolet enables accessing the overall degree of hybridization in rsDNA solutions. Fig 2, green symbols, shows θ_e(T) for 12N, c_rsDNA = 0.04g/l, c_NaCl = 1M, which we compare, as reference, with the melting curves of binary mixtures of complementary DNA 12mers computed, with standard approaches, at two concentrations: c_DNA = 0.02g/l (blue dashed line) and c_DNA = 0.04/4¹² g/l (red dashed line), both at c_NaCl = 1M. As visible, θ_e exhibits a behavior intermediate between the two. rsDNA duplexes are way more unstable (of ≈ 30°C) than duplexes of complementary strands at equal total concentration, a clear manifestation of the selectivity of rsDNA pairing. At the same time, rsDNA duplexes appears about 5°C more thermally stable than the 12mer complementary duplexes when solubilized at the same concentration at which they are present in the rsDNA solution, an indication that the formation of defected pairing is a relevant feature of the hybridization of rsDNA, as also suggested by the milder slope of θ_e(T).

In Fig 3a, θ_e(T) measured for 12N and 20N at c_rsDNA ≈ 0.04g/l are shown for three different salt concentrations c_NaCl. Since T_m decreases with L but increases with c_rsDNA, to obtain reliable θ_e(T) for 8N, we performed melting experiments at a larger concentration, c_rsDNA ≈ 25g/l, shown in Fig 3b. We find in all conditions θ_e(T) to depend on T more mildly than in typical melting curves in binary solutions of complementary strands. Fig 3 also shows that T_m of rsDNA grows with c_NaCl, with L and with c_rsDNA, as it appears by comparing panels (a) and (b), in agreement with DNA melting in less complex systems [8, 22].

Experimental data shown here were taken at equilibrium. Attaining this condition is not trivial, since the lifetime of DNA duplexes dramatically depends on the length of the oligomers and on the concentration of the solutions [23, 24]. To approach equilibrium of 20N in dilute conditions, we considered only θ_e(T) measured upon heating after a long equilibration time at low T. Similar attention had to be given to the behavior of concentrated 12N solutions, as mentioned in the next Section. Further information on equilibrium conditions is provided in the S1 Text. The non-equilibrium behavior of rsDNA will be the topic of a future work.

Probability of defectless duplexes

The study of θ_e(T) hints at hybridization in rsDNA as a combination of selectivity with a certain degree of defects in the duplex formation. However, θ_e(T) does not offer much insight on how probable it is to find duplexes with a certain pairing quality. Aiming at this kind of information, CQ experiments in solutions of 8N and 12N were performed. We did not perform analogous measurement for the 20N both because of the artifacts due to non-equilibrium pairing and because of the small accessible range of ϕ (4²⁰ is a large number!).

Fig 4 shows the fraction of hybridized 8A* and 8B* in 8N, θ_A*B*(T), for various ϕ. In the case of 8N we could reach ϕ = 1, corresponding to c_rsDNA = 16g/l. At ϕ = 1 (red dots) the fraction of paired A*B* reaches about 30% at low T. As ϕ increases, the fraction of A*B* duplexes increases, as expected. The dependence of θ_A*B*(T) on ϕ is a useful tool to test our theoretical model, as discussed below.

Similar experiments, shown in S1 Text, have been carried on for 12A* and 12B* in 12N, where however in some conditions (largest c_rsDNA) we cannot reach equilibrium.

Theoretical framework

Although the combination of the measured θ_e(T) and θ_A*B*(T) offers important insight on the quality of pairing within the rsDNA system, a deeper understanding of the driving mechanisms in this superdiverse environment requires a statistical model able to take into account the balance between binding energy and degeneracy. Indeed, the strongest binding energy is achieved in defectless duplexes, which are only formed with a fraction 1/4^L of the total number of sequences. On the contrary, defected duplexes are more weakly bound, but they can be assembled with a larger variety of sequences, i.e. the weaker the binding energy, generally, the larger its degeneracy.

The hybridization in simple systems formed by a limited number of sequences is well described by the current thermodynamic approaches, such as the NN model. Despite their accuracy, there is yet no theoretical frame to apply this knowledge to systems with high complexity such as the rsDNA, where more than 4^L × 4^L interactions are involved. Explicitly computing the free energy for all pairs involved would be too computationally expensive. We thus develop a mean-field-type theoretical approach that uses averages of the thermodynamic parameters of the NN model and their re-parametrization on a simple counting of defects. With this approach we can compute θ_e(T) and θ_CQ(T) at equilibrium with no free parameters. Hence, a direct comparison between theory and experiments is achieved.

Ensemble melting of rsDNA

We consider a rsDNA mixture containing N chains in a volume V (with a total concentration c = N/V). We assume the mixture to be perfectly balanced (see S2 Text), that is, each sequence is present through N/4^L copies. We also assume on-off hybridization, with no intermediate state between unbound and paired, which is justified given the limited length (L ≤ 20) of the oligomers here considered [11]. Thus, a given couple (i and j) interact with a set of 2L − 1 distinct binding free energies $Δ G_{i j}^{(α_{s})}$ , depending on their mutual alignment: we use the shift parameter α_s with −(L − 1) ≤ α_s ≤ L − 1 to express such alignment. Specifically, α_s = 0 stands for the condition of perfect alignment, i.e. the 3’ terminal base of strand i is aligned with the 5’ terminal base of strand j and vice versa. Positive or negative α_s indicate an overhang on the 3’ or 5’ terminal, respectively.

We define the Boltzmann factor

\begin{matrix} ζ_{i j}^{(α_{s})} \equiv [c] exp (- β Δ G_{i j}^{(α_{s})}), \end{matrix}

(2)

where the brackets around the concentration denotes that it is measured in mol/L ([c] = c/(mol/L)), and β = (k_BT)⁻¹ with k_B being the Boltzmann constant. In real mixtures, some duplexes are very unlikely, with large ΔG and thus small ζ.

In this setting, we can formally define, using the canonical distribution, the probability of any given hybridization state and the related partition function, which contains all the relevant statistical information of the system (see S3 Text). However, the partition function of the rsDNA system in the thermodynamic limit (N → ∞) is not in a closed form. Consequently, an exact derivation of the fraction of paired oligomers θ_e in such a limit is not simple to obtain. This is thoroughly discussed in the S3 Text, where two opposite limits (high and low temperatures) are carried out allowing the derivation of analytical expressions to bypass this obstacle. Furthermore, an Unification Ansatz that matches the two approximations in their range of validity has been worked out. This theoretical approach allows us to compute $θ_{e}^{(i)} (T)$ , the approximated melting curve for the oligomers with sequence i in the midst of all the 4^L species in the rsDNA solution (see S3 Text for details on the derivation).

\begin{matrix} θ_{e}^{(i)} = 1 - \frac{2}{1 + \sqrt{1 + \frac{4}{4^{L}} \sum_{j = 1}^{4^{L}} \sum_{α_{s} = - (L - 1)}^{L - 1} ζ_{i j}^{(α_{s})}}} . \end{matrix}

(3)

This analytical expression has the same structure of the melting curve as a solution of self-complementary sequences (Eq 18a in Ref. [25]), with two relevant differences: the factor 1/4^L normalizing the concentration (the concentration of any specific sequence is c/4^L), and the double summation of the pairing weight of i: for all possible partner sequence j, and for all the possible shifts.

Since the explicit computation of all (2L − 1) × 4^L × 4^L binding energies is prohibitive, and since our aim is to provide a statistical insight on the qualities of the double helices, we introduce a parametrization of the duplex quality, defining the pairing errors vector

\begin{matrix} α \equiv (α_{s}, α_{e 1}, α_{e 2}, α_{i}), \end{matrix}

(4)

which describes the number and type of pairing errors in the rsDNA duplexes. Besides the shift parameter α_s, α includes the number of consecutive base mismatches at the two duplex terminals (α_e1, α_e2) and the number α_i of mismatches inside the duplex, as sketched in Fig 1c. This parametrization is useful to compute the degeneracy g(L, α), i.e., the number of sequences of length L that can form a duplex characterized by α with a given reference sequence:

\begin{matrix} g (L, α) = 4^{| α_{s} |} 3^{(α_{e 1} + α_{e 2} + α_{i})} (\binom{L - 2 - | α_{s} | - α_{e 1} - α_{e 2}}{α_{i}}) . \end{matrix}

(5)

Let us note that the exponential factors with base 3 and 4 indicate the number of different nucleobases that can occupy a site in the dangling end of the shift or lead to an internal or external mismatch, respectively, whereas the binomial coefficient accounts for all possible dispositions of the α_i mismatches in the internal region of the duplex.

The same set of parameters forming α is used to evaluate the free energy of the pairings. The hybridization free energy is commonly computed on the base of the “Nearest Neighbor” approach [26], by which the binding enthalpy and entropy are obtained as a summation of quartets of neighboring nucleobases, their values depending on the specific bases that form it [8]. We propose a parametrization of the NN thermodynamics obtained by averaging over the quartets that allows us to compute $Δ G_{f_{C G}}^{(α)}$ , an approximated value of $Δ G_{i j}^{(α_{s})}$ based, not on detailed knowledge of the involved sequences but, just on α and the fraction f_CG of C or G bases in the sequence i. The reader can find in see S3 Text the computation of the average energetic parameters from the literature ones [8] and the salt correction [22]. An estimation of the error introduced in this simplification is shown in Fig 5, in which we show the melting curve of a solution of two 8mers with perfect complementarity. Therein, the melting curve computed using $Δ G_{f_{C G}}^{(α)}$ with f_CG = 0.5 and α = 0 (green line) is compared to the family of melting curves, computed with the traditional NN protocol, corresponding to a set of specific sequences with f_CG = 0.5 (dashed pink line and shadow). Our energetic description approximates the average melting curve with standard SantaLucia protocol with low discrepancy ΔT < 1°C.

Fig 5 — The green solid line stands for the melting curve predicted for a pair of complementary strands when using the energy obtained from our parametrization, with α = 0 and f_CG = 0.5. The purple dashed-line and related shading are, respectively, the mean value and the standard deviation computed using the NN model over a set of 40 melting curves of distinct DNA sequences with fixed f_CG = 0.5. The dotted lines are the melting curve $θ_{e}^{(f_{C G})}$ of 8N with different values of f_CG (Eq (6)), as specified in the colorbar. The solid dark red line is the ensemble melting curve θ_e, obtained by the average of $θ_{e}^{(f_{C G})}$ over the possible values of f_CG (as given by Eq (7)).

By inserting these parametrizations in Eq (3), the melting curve $θ_{e}^{(i)}$ can be generalized to the melting curve of all the sequences in the rsDNA with a certain CG content:

\begin{matrix} θ_{e}^{(f_{C G})} = 1 - \frac{2}{1 + \sqrt{1 + \frac{4}{4^{L}} \sum_{α} g (L, α) ζ_{f_{C G}}^{(α)}}}, \end{matrix}

(6)

where the product $g (L, α) ζ_{f_{C G}}^{(α)}$ expresses the statistical weight of the pairings of a sequence with CG presence of f_CG with all the sequences leading to the formation of a duplex with quality α. Note that ∑_α refers to summation over all the possible pairs yielding α. Fig 5 shows the set of $θ_{e}^{(f_{C G})}$ with f_CG ranging from 0 to 1 in the case of 8N.

The ensemble melting is obtained by the average of $θ_{e}^{(f_{C G})}$ over the possible values of f_CG, weighted with the probability p_L(f_CG) of having a sequence with f_CG in the rsDNA:

\begin{matrix} \begin{matrix} θ_{e} & = \sum_{f_{C G}} p_{L} (f_{C G}) θ_{e}^{f_{(C G)}} \\ = 1 - \frac{1}{2^{L}} \sum_{f_{C G}} (\binom{L}{f_{C G} \cdot L}) \\ \times \frac{2}{1 + \sqrt{1 + \frac{4}{4^{L}} \sum_{α} g (L, α) ζ_{f_{C G}}^{(α)}}} . \end{matrix} \end{matrix}

(7)

The summation is over all possible fractions, that is, the product f_CG ⋅ L sweeps the integer numbers from 0 to L. Fig 5 shows θ_e (red line) for 8N. Two features are clearly noticeable: the ensemble T_m predicted for rsDNA is much lower than those of complementary strands at the same concentration and ionic strength (dashed line), in agreement with experimental observations; the T dependence of θ_e is milder, reflecting the variety of energies involved in the various duplexes that can be formed in rsDNA solutions. The choice of splitting the statistical summations in two steps enlightens the relevance of both averaging the free energy for fixed f_CG and of averaging the melting curves over f_CG. While the latter is apparent upon inspecting Fig 5, further insight on the former can be obtained by Fig A in S3 Text, where the melting curve inclusive of all summations forf_CG = 0.5 is compared with simplified approaches, which are found to differ both in T_m and in the shape of the melting curve.

Probability of defectless and defected duplexes

Neat rsDNA solutions

By using the parametrization introduced above, it is possible to evaluate the fraction $θ_{α}^{(f_{C G})}$ of strands involved in duplexes having any specific type of pairing α, for sequences with f_CG. This is done by weighting the total fraction of paired oligomers $θ_{e}^{(f_{C G})}$ with the statistical weight of the specific defect class, i.e.,

\begin{matrix} θ_{α}^{(f_{C G})} = \frac{g (L, α) ζ_{f_{C G}}^{(α)}}{\sum_{α^{'}} g (L, α^{'}) ζ_{f_{C G}}^{(α^{'})}} θ_{e}^{(f_{C G})} . \end{matrix}

(8)

The fraction θ_α of duplexes having a given α in the whole rsDNA solution is the averaged $θ_{α}^{(f_{C G})}$ , weighted with the probability p_L(f_CG),

\begin{matrix} \begin{matrix} θ_{α} & = \sum_{f_{C G}} p_{L} (f_{C G}) θ_{α}^{(f_{C G})} = \\ = \frac{1}{2^{L}} \sum_{f_{C G}} (\binom{L}{f_{C G} \cdot L}) \cdot \frac{g (L, α) ζ_{f_{C G}}^{(α)}}{\sum_{α^{'}} g (L, α^{'}) ζ_{f_{C G}}^{(α^{'})}} θ_{e}^{(f_{C G})} \end{matrix} \end{matrix}

(9)

Having access to θ_α, we can rank the defects based on their probability. This is shown in Fig 6a where we plot the six most frequent errors in 12N as a function of T. Perfect (defectless) duplexes are not dominant, but also not negligible, exceeding 10% at the lowest T considered. The most frequent form of error is at the terminals as a result of their reduced energy cost compared to internal mismatches.

By suitable summation of θ_α, it is possible to determine the fraction of duplexes having a total of defects |α| ≡ |α_s|+ α_e1 + α_e2+ α_i. The resulting θ_|α| are shown in Fig 6b. We find the fraction of duplexes with |α| = 1 to be dominant, while the fraction of duplexes with more than 2 errors becomes negligible at low T. It is also interesting to notice that θ_|α| computed for the three considered values of L converges to the same value at low T. This phenomenon can be understood as a consequence of mainly two reasons. First, g(L, α) does not depend on L when only dangling ends and external mismatches, the dominant form of pairing errors, are present (see Eq (5)). Second, internal mismatches are instead nearly negligible at low temperatures in this range of L (Fig 6a). An analytic derivation of this low T limit is reported in the S3 Text.

Modeling the contact quenching experiments

Fig 6 shows the fraction of duplexes with a certain quality α among all the possible kinds of duplexes forming the rsDNA mixture. However, in CQ experiments, the rsDNA solution is enriched by the presence of the labelled strands A* and B* at a concentration expressed by the ratio ϕ. The fraction of A*B* pairs at a given ϕ can be predicted through an adequate extension of the model,

\begin{matrix} θ_{A^{*} B^{*}} = \frac{ϕ ζ_{A^{*} B^{*}}}{ϕ ζ_{A^{*} B^{*}} + \sum_{α} g (L, α) ζ_{f_{C G}}^{(α)}} θ_{e}^{(C Q)}, \end{matrix}

(10)

where ζ_A*B* = [c] exp(−βΔG_A*B*) is the Boltzmann weight of the A*B* couple, f_CG is the fraction of C or G bases in the A*B* sequences and $θ_{e}^{(C Q)}$ is the fraction of paired A* or B*, either to each other or to any other oligomer of the rsDNA solution, that can be computed generalizing Eq (6) for $θ_{e}^{(f_{C G})}$ ,

\begin{matrix} θ_{e}^{(C Q)} = 1 - \frac{2}{1 + \sqrt{1 + \frac{4}{4^{L}} [ϕ ζ_{A^{*} B^{*}} + \sum_{α} g (L, α) ζ_{f_{C G}}^{(α)}]}} . \end{matrix}

(11)

Further details on the pairing energies are given in the S2 Text. It can be easily seen that, in the limit of ϕ → ∞, $θ_{e}^{(C Q)}$ correctly converges to the melting curve of a couple of complementary strands in equal concentration.

Discussion

The model we developed here provides theoretical predictions directly comparable to experimental observations for both the ensemble melting and the formation of defectless duplexes in rsDNA solutions. In Fig 3, the theoretically computed θ_e(T) (dashed lines) are compared to the measured θ_e(T) (symbols). The agreement is very good, especially if we take into account that the model has no free parameters. Fig 7 compares the experimental melting temperatures, T_m, to its theoretical prediction as a function of the parameters here considered: L, c_rsDNA, c_NaCl. The average deviation between predicted and observed T_m is within 1°C, or within 0.5°C if we exclude the data point of 12N at c_NaCl = 0.15M, in which T_m is the lowest and thus more difficult to determine because of the narrow T interval available to define the low-T baseline (see blue open circles in Fig 3a). The difference between computed and measured T_m is comparable with the typical errors in predicting T_m for any given specific sequence with standard thermodynamic approach [22].

Fig 7 — Comparison of measured and predicted T_m for rsDNA solutions. Symbols: experimental T_m obtained from the melting curves of 8N, 12N and 20N (Fig 3), as function of salt, c_NaCl. Conditions are specified in the legend. Dashed lines: theoretical predictions of Eq (7).

The key result of the model is the non-trivial T dependence of the statistics for pairing quality in rsDNA solutions, which is codified into θ_α(T), the fraction of duplexes with quality α, as given by Eq (9) and shown in Fig 6. θ_e(T) reflects the complexity of the interactions in such superdiverse environment, but only as an ensemble feature. A much more stringent test is provided by the comparison between CQ experiments, i.e., the measured and predicted θ_A*B*(T, ϕ). In Fig 8 we compare the CQ data already shown in Fig 4 for 8N at c_NaCl = 0.15M with the predicted θ_A*B*(T) for several values of ϕ. The agreement is good, compatible with the range of uncertainty of both experiment and model, the latter being a consequence of the uncertainty on the free energy associated to the 8A*8B* duplex formation.

Fig 8 — Fraction of paired 8A*8B* in 8N, θ_A*B*, as determined via CQ experiments (dots and light shading, as in Fig 4) and from the model (dashed lines and dark shading), at c_NaCl = 0.15M and for several ϕ values (colors, see legend). Black data, lines and shadings: melting in 8A*+8B* solutions, in the absence of rsDNA. Shaded regions of the theoretical predictions are obtained from the experimental uncertainty on the pairing energy between A* and B** (see S2 Text).

In Fig 9 we show measured and predicted θ_A*B* as a function ϕ for 8N at T ≈ 15°C at two ionic strengths, c_NaCl = 0.15M and c_NaCl = 1M. Noticeably, the ionic strength has little effect on θ_A*B*(T, ϕ), much less than on T_m. This is because the salt contribution to the statistical weights is similar for the most probable pairings. Since at low T the pairing probability is approximately the ratio between statistical weights, the salt contribution effectively cancels (see S3 Text).

The success of our model—which has no free parameter—in describing rsDNA solutions might be surprising, given the level of approximation introduced in the energy parametrization on α. Indeed, the molecular diversity of rsDNA makes its behavior intrinsically averaged between all possible pairings patterns, rendering our parametrization, built on averages, particularly adequate. We would also like to point out that the model validity is limited to equilibrium conditions, as we documented above and in the S1 Text. Also, the model assumes unlimited molecular availability, and does not include the effects of competitive binding which could arise from constrained stoichiometric ratios in limited pools of molecules. The successful comparison with experiments indicates that rsDNA with short enough chain length satisfies all these requirements. In systems with longer molecules, out-of-equilibrium conditions and hybridization states with more than one helical region could become relevant [11].

The agreement with observations also validates the predicted pairing distributions θ_α(T) and θ_|α|(T) shown in Fig 6, which are worth discussing further. These distributions indicate that, in a superdiverse environment of random sequence oligonucleotides, the selectivity afforded by the free energy of base-pairing is “marginal”. Specifically, the resulting pairing is good, but not perfect, with the majority of sequences being defected, but with less than two pairing errors. Defectless pairing involves at most a fraction of ≈ 14% of the rsDNA strands.

The pairing statistics of rsDNA largely depends on the compensation between two opposing factors: (i) the degeneracy, g(L, α), which grows in a nearly exponential way with the total number of pairing errors (see Eq (5)), thus favoring the formation of defected duplexes; and (ii) the binding free energy, ΔG, which increases approximately linearly with the number of defects (see S3 Text), yielding a Boltzmann factor with an exponential advantage to duplexes with less defects (see Eq (2)). In DNA duplexes formation, these two factors nearly balance, with a partial dominance of the energetic component, a condition leading to the smooth α and T dependence of the probability distributions. The result would change if the degeneracy grows faster or slower than an exponential with the binding energy. An example of this latter condition is given by the selectivity of PCR primers within the genome, in which the set of competing bindings is limited with respect to the random situation, thus enabling a strong dominance of defectless primer binding.

An obvious question is how critical is this marginal condition, and whether modifications in the nucleobase structure, and thus of pairing and stacking energy, could significantly improve selectivity. To answer this question we computed θ_α(T) by assuming that all pairing energies equal that of CG (which is roughly double of that of AT). We find a significant, but not dramatic, increment in θ₀, that at low T reaches 0.3 (see Fig D in S1 Text), indicating that the basic features of θ_α(T) are stable within the range of energies involved in natural and artificial nucleobase binding [7]. As a further test of the pairing efficiency in random systems, we computed θ₀(T) by considering random systems formed by only 2 bases, instead of the four natural ones, using energetic parameters intermediate between AT and CG (see Fig E in S1 Text). Even in this condition, where the degeneracy of the defected duplexes is strongly reduced, the fraction of perfect pairs is larger but still well under 0.4. When, on the contrary, the number of bases are increased (still assuming a WC-type pairing role), θ₀ markedly decreases. These observations strengthen the notion that, in the range of pairing energies of nucleic acids, the presence of randomness appears to be the dominant factor in determining the quality of the pairs.

These observations sets a reference for the selectivity in contexts of strong randomness and heterogeneity of sequences such as those that have likely characterized the origin of life and RNA world. Whatever the mechanisms of chain amplification and lengthening, and whatever base pair variants were at the time available, they could not have relied on levels of selectivity much better than those reported here. We previously proposed that one of such mechanisms leading to the formation of long nucleic acid chains could exploit the symmetry breaking and formation of molecular column due to liquid crystal ordering [27, 28]. rsDNA can indeed form, in given conditions, columnar liquid crystals [13]. How the distribution of pairing quality here discussed can be compatible with liquid crystal formation appears as a subtle matter that will be the topic of a forthcoming work.

Conclusion

We introduced rsDNA solutions as a model system of superdiverse mixture, enabling the study of interactions and pair formations in the midst of a huge amount of competing molecular species, a condition offering a conceptual paradigm for the molecular variety and selectivity of biological environments. In the analysis of rsDNA solutions, we could take advantage of the limited polymer heterogeneity given by the four nucleobases, of the rather simple and highly characterized pairing rule, of the availability of solid-state synthesis and of the variety of experimental tools.

This combination of factors enabled us to experimentally characterize, and theoretically describe, the selectivity of pairing, and found that the majority of rsDNA duplexes contain pairing errors, but limited to one or two per duplex, a condition that still grants a reliable stability to the structures.

Based on the success in the description of rsDNA, we applied our approach to the description of the selective pairing in the context of the PCR technology and miRNA based gene regulation, which is the topic of a forthcoming publication.

Extending this statistical approach to biomolecular superdiverse systems closer to cell environments, characterized by a less dramatic diversity but by more complex and less defined interactions, will be the challenging development of this work.

Supporting information

S1 Text. Further results.

Evidence of Out-of-Equilibrium Conditions; Perfect pairing probability with different energies and number of nucleobases types. Fig A: Evidence of out-of-equilibrium behavior for 12N. T dependence of the fraction of duplexed strands θ_e measured while heating and cooling at 1°C/min. Fig B: Evidence of out-of-equilibrium behavior for 20N. Absorbance vs. T measured upon heating after one month equilibration at 4°C. Fig C: θ_A*B*, fraction of paired 12A* − 12B* in 12N, determined from the model and via CQ experiments with different cooling rates. Fig D: $θ_{0}^{(f_{C G})}$ computed with different values of f_CG in 8N. Fig E: $θ_{0}^{(n_{b})}$ computed with different values of n_b in 12N.

(PDF)

Click here for additional data file.^{(2.5MB, pdf)}

S2 Text. Materials and methods.

Characterization of rsDNA synthesis; Measurement of rsDNA concentration; UV Absorbance: Experimental Setup; Analysis of UV Absorbance Data; Characterization of A* and B* Fluorescence; Contact-Quenching Data Analysis; Free Energy of A*B* Duplex. Fig A: HPLC traces of 12N compared with two different 12mers. HPLC traces of 8N, 12N and 20N. Fig B: Quartz microfluidic. Quantum Northwest Peltier. Fig C: Temperature calibration of T measured by a thermistor in contact with the microfluidic cell vs. the internal control T_peltier. Fig D: Steps in the analysis of absorbance data A(T) of 12N used to extract the melting curves θ_e. Fig E: Fluorescence Emission Spectra of labeled DNA systems: A*, B*, A*B*, A*B and AB*. Fig F: Absorbance Spectra of labeled DNA systems: A*, B*, A*B*, A*B and AB*. Fig G: Simplified representation of the 12A*B* duplex, showing the relative size of DNA duplex, linker and fluorescent moieties FAM and TexasRed. Fig H: fluorescence intensity of TexasRed vs. T in a solution of 12A* + 12B*. Fig I: Normalized fluorescence intensity of TexasRed in a solution of 12A*, 12B* and 12B. Fig J: Average normalized fluorescence of 8A*+8B*. The fit enables determining the linear drift at low temperatures, corresponding to the signal of fully paired A*B*. Fig K: T_m vs A*B* concentration measured by CQ for the following systems.

(PDF)

Click here for additional data file.^{(11.6MB, pdf)}

S3 Text. Comprehensive Description of the theoretical model.

Partition Function; Melting Curve Approximations for rsDNA solution; Energetic Parametrization based on α; Pairing Statistics with α_i = 0; Effects of the Ionic Strength on the Pairing Statistics.Fig A: Melting Curves of rsDNA, according to the Low T Approximation, High T Approximation and Unification Ansatz.

(PDF)

Click here for additional data file.^{(356.9KB, pdf)}

Acknowledgments

We thank N.A. Clark for useful discussions and E. Toffolo for her precious guidance in using PCR equipment for CQ measurements.

Data Availability

All relevant data are within the manuscript and its Supporting information files.

Funding Statement

S.D.L., S.M. and T.B. acknowledge support from MIUR-PRIN (Grant No. 2017Z55KCW). T.P.F. acknowledges IPGG Laboratoire d’Excellence, “Investissement d’avenir" program ANR-10-IDEX-0001-02 PSL, ANR-10-LABX-31 and ANR-10-EQPX-34. S.S. acknowledges UNIPD grant BIRD209912. C.A.P. acknowledges the support from PGC2018-093998-B-I00 funded by FEDER/Ministerio de Ciencia e Innovaciòn-Agencia Estatal de Investigaciòn (Spain) and from the program PAIDI-DOCTOR by Junta de Andalucìa and European Social Fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Phillips R, Kondev J, Theriot J, Garcia HG. Physical biology of the cell. 2nd ed. New York: Garland Science; 2012. [Google Scholar]
2. Ellis RJ. Macromolecular crowding: an important but neglected aspect of the intracellular environment. Curr Opin Struct Biol. 2001;11: 114–119. doi: 10.1016/S0959-440X(00)00239-6 [DOI] [PubMed] [Google Scholar]
3. Politou A, Temussi PA. Revisiting a dogma: the effect of volume exclusion in molecular crowding. Curr Opin Struct Biol. 2015;30: 1–6. doi: 10.1016/j.sbi.2014.10.005 [DOI] [PubMed] [Google Scholar]
4. Jacobs WM, Frenkel D. Phase transitions in biological systems with many components. Biophys J. 2017;112: 683–691. doi: 10.1016/j.bpj.2016.10.043 [DOI] [PMC free article] [PubMed] [Google Scholar]
5. Harmon TS, Holehouse AS, Pappu RV. To mix, or to demix, that is the question. Biophys J. 2017;112: 565–567. doi: 10.1016/j.bpj.2016.12.031 [DOI] [PMC free article] [PubMed] [Google Scholar]
6. Calladine CR, Drew H, Luisi B, Travers A. Understanding DNA: The Molecule and it how Works. 2nd ed. San Diego, CA: Elsevier Academic Press; 2004. [Google Scholar]
7. Hoshika S, Leal NA, Kim MJ, Kim MS, Karalkar NB, Kim HJ, et al. Hachimoji DNA and RNA: A genetic system with eight building blocks. Science. 2019;363: 884–887. doi: 10.1126/science.aat0971 [DOI] [PMC free article] [PubMed] [Google Scholar]
8. SantaLucia J Jr, Hicks D. The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct. 2004;33: 415–440. doi: 10.1146/annurev.biophys.32.110601.141800 [DOI] [PubMed] [Google Scholar]
9. Zadeh JN, Steenberg CD, Bois JS, Wolfe BR, Pierce MB, Khan AR, et al. NUPACK: analysis and design of nucleic acid systems. J Comput Chem. 2011;32: 170–173. doi: 10.1002/jcc.21596 [DOI] [PubMed] [Google Scholar]
10. Owczarzy R, Vallone PM, Gallo FJ, Paner TM, Lane MJ, Benight AS. Predicting sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers. 1997;44: 217–239. doi: 10.1002/(SICI)1097-0282(1997)44:3<217::AID-BIP3>3.0.CO;2-Y [DOI] [PubMed] [Google Scholar]
11. Vologodskii A, Frank-Kamenetskii MD. DNA melting and energetics of the double helix. Phys Life Rev. 2018;25: 1–21. doi: 10.1016/j.plrev.2017.11.012 [DOI] [PubMed] [Google Scholar]
12. Wolfe BR, Porubsky NJ, Zadeh JN,Dirks RM, Pierce NA. Constrained multistate sequence design for nucleic acid reaction pathway engineering. J Am Chem Soc. 2017;139: 3134–3144. doi: 10.1021/jacs.6b12693 [DOI] [PubMed] [Google Scholar]
13. Bellini T, Zanchetta G, Fraccia TP, Cerbino R, Tsai E, Smith GP, et al. Liquid crystal self-assembly of random-sequence DNA oligomers. Proc Natl Acad Sci. 2012;109: 1110–1115. doi: 10.1073/pnas.1117463109 [DOI] [PMC free article] [PubMed] [Google Scholar]
14. Nakata M, Zanchetta G, Chapman BD, Jones CD, Cross JO, Pindak R, et al. End-to-end stacking and liquid crystal condensation of 6–to 20–base pair DNA duplexes. Science. 2007;318: 1276–1279. doi: 10.1126/science.1143826 [DOI] [PubMed] [Google Scholar]
15. Fraccia TP, Smith GP, Bethge L, Zanchetta G, Nava G, Klussmann S, et al. Liquid crystal ordering and isotropic gelation in solutions of four-base-long DNA oligomers. ACS Nano. 2016;10: 8508–8516. doi: 10.1021/acsnano.6b03622 [DOI] [PubMed] [Google Scholar]
16. Bartel DP, Szostak JW. Isolation of new ribozymes from a large pool of random sequences. Science. 1993;261: 1411–1418. doi: 10.1126/science.7690155 [DOI] [PubMed] [Google Scholar]
17. Ekland EH, Szostak JW, Bartel DP. Structurally complex and highly active RNA ligases derived from random RNA sequences. Science. 1995;269: 364–370. doi: 10.1126/science.7618102 [DOI] [PubMed] [Google Scholar]
18. Kudella PW, Tkachenko AV, Salditt A,Maslov S, Braun D. Structured sequences emerge from random pool when replicated by templated ligation. Proc Natl Acad Sci. 2021;118: e2018830118. doi: 10.1073/pnas.2018830118 [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Owczarzy R. Melting temperatures of nucleic acids: discrepancies in analysis. Biophys Chem. 2005;117: 207–215. doi: 10.1016/j.bpc.2005.05.006 [DOI] [PubMed] [Google Scholar]
20. Marras SAE, Kramer FR, Tyagi S. Efficiencies of fluorescence resonance energy transfer and contact–mediated quenching in oligonucleotide probes. Nucleic Acids Res. 2002;30: e122. doi: 10.1093/nar/gnf121 [DOI] [PMC free article] [PubMed] [Google Scholar]
21. You Y, Tataurov AV, Owczarzy R. Measuring thermodynamic details of DNA hybridization using fluorescence. Biopolymers. 2011;95: 472–486. doi: 10.1002/bip.21615 [DOI] [PMC free article] [PubMed] [Google Scholar]
22. Owczarzy R, You Y, Moreira BG, Manthey JA, Huang L, Behlke MA, et al. Effects of sodium ions on DNA duplex oligomers: improved predictions of melting temperatures. Biochemistry. 2004;43: 3537–3554. doi: 10.1021/bi034621r [DOI] [PubMed] [Google Scholar]
23. Woodside MT, Behnke-Parks WM, Larizadeh K,Travers K, Herschlag D, Block SM. Nanomechanical measurements of the sequence-dependent folding landscapes of single nucleic acid hairpins. Proc Natl Acad Sci. 2006;103: 6190–6195. doi: 10.1073/pnas.0511048103 [DOI] [PMC free article] [PubMed] [Google Scholar]
24. Dupuis NF, Holmstrom ED, Nesbitt DJ. Single-molecule kinetics reveal cation-promoted DNA duplex formation through ordering of single-stranded helices. Biophys. 2013;105: 756–766. doi: 10.1016/j.bpj.2013.05.061 [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Plata CA, Marni S, Maritan A, Bellini T, Suweis S. Statistical physics of DNA hybridization. Phys Rev E. 2021;103: 042503. doi: 10.1103/PhysRevE.103.042503 [DOI] [PubMed] [Google Scholar]
26. Ghosh S, Takahashi S, Endoh T, Tateishi-Karimata H, Hazra S, Sugimoto N. Validation of the nearest-neighbor model for Watson–Crick self-complementary DNA duplexes in molecular crowding condition. Nucleic Acids Res. 2019;47: 3284–3294. doi: 10.1093/nar/gkz071 [DOI] [PMC free article] [PubMed] [Google Scholar]
27. Fraccia TP, Smith GP, Zanchetta G, Paraboschi E, Yi Y, Walba DM, et al. Abiotic ligation of DNA oligomers templated by their liquid crystal ordering. Nat Commun. 2015;6: 6424. doi: 10.1038/ncomms8463 [DOI] [PMC free article] [PubMed] [Google Scholar]
28. Todisco M, Fraccia TP, Zanchetta G, et al. Nonenzymatic polymerization into long linear RNA templated by liquid crystal self-assembly. ACS nano. 2018;12: 9750–9762. doi: 10.1021/acsnano.8b05821 [DOI] [PubMed] [Google Scholar]

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010051.r001

Decision Letter 0

Nir Ben-Tal, Eugene I Shakhnovich

14 Feb 2022

Dear prof Bellini,

Thank you very much for submitting your manuscript "Pairing Statistics and Melting of Random DNA Oligomers: finding your Partner in Superdiverse Environments" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Eugene I. Shakhnovich

Guest Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors present a combined experimental and theoretical thermodynamic study of a solution consisting of all possible L-nucleotide-long DNA oligomers (L being 8, 12 and 20) generated by the random synthesis from a mixture of equal proportion of four nucleotide precursors. Authors’ main conclusion is that, despite the presence of a number of chains that can form mismatched duplexes with the given chain, which could prevent it from finding the complementary partner, the yield of fully matched duplexes at low enough temperatures is pretty high.

Major comments

1. The authors failed to convince this reviewer that a pretty artificial system they consider deserves so much attention. In Introduction, they claim that the system they consider is relevant to the pre-biological evolution consideration within the framework of the RNA World hypothesis. But this claim looks too far-fetched to me.

2. There is no doubt that our current understanding of thermodynamic parameters of the DNA double helix and mismatched pairs is pretty solid so the fact that the authors obtained a good agreement between theory and experiment for their artificial system is not surprising and adds very little, if any, to our understanding the DNA thermodynamics.

Technical comment

Although the authors originally formulate the problem under consideration in very general terms, to perform a theoretical treatment they make a number more or less arbitrary assumptions, which allow them to solve the problem under consideration. For example, to calculate melting curves of fully matched duplexes they start with the nearest neighbor approximation, which takes into account the parameters of heterogeneous stacking obtained from melting experiments with long DNA duplexes, rather than using a simplified version of the melting theory, which considers the duplex stability as a function of only the duplex GC-content. But in the end, they de facto arrive at the simplified version anyway. They would make their reasoning more convincing and straightforward if they relied on a finding by Vologodskii and Frank-Kamenetskii (Phys Life Rev 25, 1-21, 2018) that theoretical models, which take into account heterogeneous stacking, do not yield better predictions of oligonucleotide duplex melting temperatures than the simplest model, which relies only on the GC-content.

Reviewer #2: This is a very well-done paper dealing with the ability of complementary DNA strands to find each other among a plethora of other strands, from completely incompatible to partially compatible, and taking into account also imperfect structural pairing.

Experiments and theory nicely go hand-in-hand, and there its much to say, all in all.

Nonetheless, I want to probe the authors in two directions:

1) the theory works too well, given many approximations. It would be good if the authors could add a section for limitations of the analytical model, and where it could go wrong. This would provide a better view for potential users of the framework about when and where this treatment is appropriate and when they should instead be careful

2) Could the authors think about specific mixtures of DNA sequences designed in such a way to maximise complementarity. For example, what about ensuring a certain minimal Hamming distance (or the like) between sequences to reduce mismatches?

I think these would be two valuable additions both in the direction of acknowledging the limitations of the model, and in terms of proposing how to make the model "useful" beyond the present experiments.

Reviewer #3: The paper by Simone De Leo and co-workers present an analysis of the melting of random DNA oligomers in "Superdiverse Environments". The article is rather technical for the journal; however, it includes detailed supplementary information.

It is well done and shows exceptional agreement between the experimental observations and the theoretical predictions.

However, a deeper discussion of the results in the context of modern DNA in living cells, implications of their findings and possible limitation or strength due to the oligomer lengths would have been more commendable for a larger scientific audience. For instance, a more extensive discussion on the PCR primer selectivity and possible errors in the amplification would be interesting in different applications.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No: From my reading, I did not see any claim of public data/codes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. 2022 Apr 11;18(4):e1010051. doi: 10.1371/journal.pcbi.1010051.r002

Author response to Decision Letter 0

4 Mar 2022

Attachment

Submitted filename: response to reviewers PLOS CB.pdf

Click here for additional data file.^{(149.7KB, pdf)}

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010051.r003

Decision Letter 1

Nir Ben-Tal, Eugene I Shakhnovich

22 Mar 2022

Dear prof Bellini,

We are pleased to inform you that your manuscript 'Pairing Statistics and Melting of Random DNA Oligomers: finding your Partner in Superdiverse Environments' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Eugene I. Shakhnovich

Guest Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010051.r004

Acceptance letter

Nir Ben-Tal, Eugene I Shakhnovich

7 Apr 2022

PCOMPBIOL-D-22-00089R1

Pairing Statistics and Melting of Random DNA Oligomers: finding your Partner in Superdiverse Environments

Dear Dr Bellini,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Livia Horvath

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Text. Further results.

(PDF)

Click here for additional data file.^{(2.5MB, pdf)}

S2 Text. Materials and methods.

(PDF)

Click here for additional data file.^{(11.6MB, pdf)}

S3 Text. Comprehensive Description of the theoretical model.

(PDF)

Click here for additional data file.^{(356.9KB, pdf)}

Attachment

Submitted filename: response to reviewers PLOS CB.pdf

Click here for additional data file.^{(149.7KB, pdf)}

Data Availability Statement

All relevant data are within the manuscript and its Supporting information files.

[pcbi.1010051.ref001] 1. Phillips R, Kondev J, Theriot J, Garcia HG. Physical biology of the cell. 2nd ed. New York: Garland Science; 2012. [Google Scholar]

[pcbi.1010051.ref002] 2. Ellis RJ. Macromolecular crowding: an important but neglected aspect of the intracellular environment. Curr Opin Struct Biol. 2001;11: 114–119. doi: 10.1016/S0959-440X(00)00239-6 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref003] 3. Politou A, Temussi PA. Revisiting a dogma: the effect of volume exclusion in molecular crowding. Curr Opin Struct Biol. 2015;30: 1–6. doi: 10.1016/j.sbi.2014.10.005 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref004] 4. Jacobs WM, Frenkel D. Phase transitions in biological systems with many components. Biophys J. 2017;112: 683–691. doi: 10.1016/j.bpj.2016.10.043 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref005] 5. Harmon TS, Holehouse AS, Pappu RV. To mix, or to demix, that is the question. Biophys J. 2017;112: 565–567. doi: 10.1016/j.bpj.2016.12.031 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref006] 6. Calladine CR, Drew H, Luisi B, Travers A. Understanding DNA: The Molecule and it how Works. 2nd ed. San Diego, CA: Elsevier Academic Press; 2004. [Google Scholar]

[pcbi.1010051.ref007] 7. Hoshika S, Leal NA, Kim MJ, Kim MS, Karalkar NB, Kim HJ, et al. Hachimoji DNA and RNA: A genetic system with eight building blocks. Science. 2019;363: 884–887. doi: 10.1126/science.aat0971 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref008] 8. SantaLucia J Jr, Hicks D. The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct. 2004;33: 415–440. doi: 10.1146/annurev.biophys.32.110601.141800 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref009] 9. Zadeh JN, Steenberg CD, Bois JS, Wolfe BR, Pierce MB, Khan AR, et al. NUPACK: analysis and design of nucleic acid systems. J Comput Chem. 2011;32: 170–173. doi: 10.1002/jcc.21596 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref010] 10. Owczarzy R, Vallone PM, Gallo FJ, Paner TM, Lane MJ, Benight AS. Predicting sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers. 1997;44: 217–239. doi: 10.1002/(SICI)1097-0282(1997)44:3<217::AID-BIP3>3.0.CO;2-Y [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref011] 11. Vologodskii A, Frank-Kamenetskii MD. DNA melting and energetics of the double helix. Phys Life Rev. 2018;25: 1–21. doi: 10.1016/j.plrev.2017.11.012 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref012] 12. Wolfe BR, Porubsky NJ, Zadeh JN,Dirks RM, Pierce NA. Constrained multistate sequence design for nucleic acid reaction pathway engineering. J Am Chem Soc. 2017;139: 3134–3144. doi: 10.1021/jacs.6b12693 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref013] 13. Bellini T, Zanchetta G, Fraccia TP, Cerbino R, Tsai E, Smith GP, et al. Liquid crystal self-assembly of random-sequence DNA oligomers. Proc Natl Acad Sci. 2012;109: 1110–1115. doi: 10.1073/pnas.1117463109 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref014] 14. Nakata M, Zanchetta G, Chapman BD, Jones CD, Cross JO, Pindak R, et al. End-to-end stacking and liquid crystal condensation of 6–to 20–base pair DNA duplexes. Science. 2007;318: 1276–1279. doi: 10.1126/science.1143826 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref015] 15. Fraccia TP, Smith GP, Bethge L, Zanchetta G, Nava G, Klussmann S, et al. Liquid crystal ordering and isotropic gelation in solutions of four-base-long DNA oligomers. ACS Nano. 2016;10: 8508–8516. doi: 10.1021/acsnano.6b03622 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref016] 16. Bartel DP, Szostak JW. Isolation of new ribozymes from a large pool of random sequences. Science. 1993;261: 1411–1418. doi: 10.1126/science.7690155 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref017] 17. Ekland EH, Szostak JW, Bartel DP. Structurally complex and highly active RNA ligases derived from random RNA sequences. Science. 1995;269: 364–370. doi: 10.1126/science.7618102 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref018] 18. Kudella PW, Tkachenko AV, Salditt A,Maslov S, Braun D. Structured sequences emerge from random pool when replicated by templated ligation. Proc Natl Acad Sci. 2021;118: e2018830118. doi: 10.1073/pnas.2018830118 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref019] 19. Owczarzy R. Melting temperatures of nucleic acids: discrepancies in analysis. Biophys Chem. 2005;117: 207–215. doi: 10.1016/j.bpc.2005.05.006 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref020] 20. Marras SAE, Kramer FR, Tyagi S. Efficiencies of fluorescence resonance energy transfer and contact–mediated quenching in oligonucleotide probes. Nucleic Acids Res. 2002;30: e122. doi: 10.1093/nar/gnf121 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref021] 21. You Y, Tataurov AV, Owczarzy R. Measuring thermodynamic details of DNA hybridization using fluorescence. Biopolymers. 2011;95: 472–486. doi: 10.1002/bip.21615 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref022] 22. Owczarzy R, You Y, Moreira BG, Manthey JA, Huang L, Behlke MA, et al. Effects of sodium ions on DNA duplex oligomers: improved predictions of melting temperatures. Biochemistry. 2004;43: 3537–3554. doi: 10.1021/bi034621r [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref023] 23. Woodside MT, Behnke-Parks WM, Larizadeh K,Travers K, Herschlag D, Block SM. Nanomechanical measurements of the sequence-dependent folding landscapes of single nucleic acid hairpins. Proc Natl Acad Sci. 2006;103: 6190–6195. doi: 10.1073/pnas.0511048103 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref024] 24. Dupuis NF, Holmstrom ED, Nesbitt DJ. Single-molecule kinetics reveal cation-promoted DNA duplex formation through ordering of single-stranded helices. Biophys. 2013;105: 756–766. doi: 10.1016/j.bpj.2013.05.061 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref025] 25. Plata CA, Marni S, Maritan A, Bellini T, Suweis S. Statistical physics of DNA hybridization. Phys Rev E. 2021;103: 042503. doi: 10.1103/PhysRevE.103.042503 [DOI] [PubMed] [Google Scholar]

[pcbi.1010051.ref026] 26. Ghosh S, Takahashi S, Endoh T, Tateishi-Karimata H, Hazra S, Sugimoto N. Validation of the nearest-neighbor model for Watson–Crick self-complementary DNA duplexes in molecular crowding condition. Nucleic Acids Res. 2019;47: 3284–3294. doi: 10.1093/nar/gkz071 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref027] 27. Fraccia TP, Smith GP, Zanchetta G, Paraboschi E, Yi Y, Walba DM, et al. Abiotic ligation of DNA oligomers templated by their liquid crystal ordering. Nat Commun. 2015;6: 6424. doi: 10.1038/ncomms8463 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010051.ref028] 28. Todisco M, Fraccia TP, Zanchetta G, et al. Nonenzymatic polymerization into long linear RNA templated by liquid crystal self-assembly. ACS nano. 2018;12: 9750–9762. doi: 10.1021/acsnano.8b05821 [DOI] [PubMed] [Google Scholar]

PERMALINK

Pairing statistics and melting of random DNA oligomers: Finding your partner in superdiverse environments

Simone Di Leo

Stefano Marni

Carlos A Plata

Tommaso P Fraccia

Gregory P Smith

Amos Maritan

Samir Suweis

Tommaso Bellini

Roles

Abstract

Author summary

Introduction

Fig 1. Description of the system.

Materials and methods

Description of the system

Using UV hyperchromicity to detect ensemble rsDNA melting

Contact-quenching detects the pairing of specific sequences

Experimental results

Ensemble melting of rsDNA

Fig 2. Double strand vs. random sequence DNA melting.

Fig 3. Ensemble melting of rsDNA.

Probability of defectless duplexes

Fig 4. Probability of defectless duplexes in rsDNA.

Theoretical framework

Ensemble melting of rsDNA

Fig 5. Theoretical predictions of melting curves, computed for DNA 8mers at 25g/L and 1M NaCl.

Probability of defectless and defected duplexes

Neat rsDNA solutions

Fig 6. Pairing statistics in rsDNA.

Modeling the contact quenching experiments

Discussion

Fig 7. Melting temperatures for rsDNA: Theory vs experiment.

Fig 8. Probability of defectless duplexes in rsDNA: Theory vs experiment.

Fig 9. Probability of defectless duplexes in rsDNA: Salt dependence.

Conclusion

Supporting information

Acknowledgments

Data Availability

Funding Statement

References

Decision Letter 0

Nir Ben-Tal

Eugene I Shakhnovich

Roles

Author response to Decision Letter 0

Decision Letter 1

Nir Ben-Tal

Eugene I Shakhnovich

Roles

Acceptance letter

Nir Ben-Tal

Eugene I Shakhnovich

Roles

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases