Significance
The past 15 years have seen a proliferation of experimental techniques aimed at engineering self-assembled structures. These bottom–up techniques rely on specific interactions between components that arise from diverse physical mechanisms such as chemical affinities and shape complementarity attraction. Comparisons of specificity across such diverse systems, each with unique physics and constraints, can be difficult. Here we describe an information theoretic measure, capacity, to quantify specificity in a range of recent experimental systems. Capacity quantifies the maximal amount of information that can be encoded and then resolved by a system of specific interactions, as a function of experimentally tunable parameters. Our framework can be applied to specific interactions of diverse origins, from colloidal experiments to protein interactions.
Keywords: specificity, self-assembly, colloid, mutual information, crosstalk
Abstract
Specific interactions are a hallmark feature of self-assembly and signal-processing systems in both synthetic and biological settings. Specificity between components may arise from a wide variety of physical and chemical mechanisms in diverse contexts, from DNA hybridization to shape-sensitive depletion interactions. Despite this diversity, all systems that rely on interaction specificity operate under the constraint that increasing the number of distinct components inevitably increases off-target binding. Here we introduce “capacity,” the maximal information encodable using specific interactions, to compare specificity across diverse experimental systems and to compute how specificity changes with physical parameters. Using this framework, we find that “shape” coding of interactions has higher capacity than chemical (“color”) coding because the strength of off-target binding is strongly sublinear in binding-site size for shapes while being linear for colors. We also find that different specificity mechanisms, such as shape and color, can be combined in a synergistic manner, giving a capacity greater than the sum of the parts.
Specific interactions between many species of components are the bedrock of biochemical function, allowing signal transduction along complex parallel pathways and self-assembly of multicomponent molecular machines. Inspired by their role in biology, engineered specific interactions have opened up tremendous opportunities in materials synthesis, achieving new morphologies of self-assembled structures with varied and designed functionality. The two major design approaches for programming specific interactions use either chemical specificity or shape complementarity.
Chemical specificity is achieved by dividing binding sites into smaller regions, each of which can be given one of A “colors” or unique chemical identities. Sites bind to each other based on the sum of the interactions between corresponding regions. For example, a recent two-color system paints the flat surfaces of three-dimensional polyhedra with hydrophobic and hydrophilic patterns (1) or with a pattern of solder dots (2), allowing polyhedra to stick to each other based on the registry between their surface patterns. Another popular approach uses DNA hybridization, where specific matching of complementary sequences has been used to self-assemble structures purely from DNA strands (3, 4) and from nanoparticles coated with carefully chosen DNA strands (5–9).
Shape complementarity uses the shapes of the component surfaces to achieve specific binding, even though the adhesion is via a nonspecific, typically short-range potential. In the synthetic context, shape-based modulation of attractive forces over a large dynamic range was first proposed and experimentally demonstrated for colloidal particles (10, 11), using tunable depletion forces (12, 13). Recent experiments have explored the range of possibilities opened up by such ideas, from lithographically designed planar particles (14) with undulating profile patterns to “Pacman” particles with cavities that exactly match smaller complementary particles (15). The number of possible shapes that can be made using these types of methods depends on fabrication constraints but the possibilities can be quite rich (16, 17). Using only nonspecific surface attraction, experiments have achieved numerous and complex morphologies such as clusters, crystals, glasses, and superlattices (10, 18–21).
A further class of programmable specific interactions combines both chemical specificity and shape complementarity. The canonical example is protein-binding interactions (22); the binding interactions between two cognate proteins are specified by their amino acid sequence, which programs binding pockets with complex shape and chemical specificity. Recent efforts (23, 24) aim to rationally design these protein interactions for self-assembly. Because both the shape of the binding pocket and its chemical specificity are determined by the same amino acid sequence, these two features cannot be controlled independently. Other synthetic systems offer the promise of independent control of chemical and shape binding specificity, giving a larger set of possible interactions.
These diverse systems achieve specific interactions through disparate physical mechanisms, with different control parameters for tuning binding specificity. However, they must all solve a common problem (25, 26): create a family of N “lock” and “key” pairs that bind well within pairs but avoid off-target binding across pairs (“crosstalk”). Any crosstalk limits the efficacy of the locks and keys. For example, in the context of DNA-based affinities, although there are $4^L$ unique sequences of length L, the strong off-target binding severely restricts the number that can be productively used. Analogously, for colloidal systems driven by depletion interactions, there can be significant off-target binding due to partial contact. The performance of a system of specific interactions depends acutely on how the system constraints (e.g., number of available bases, fabrication length scale, etc.) limit its ability to avoid crosstalk.
In this paper, we develop a general information theory-based framework for quantitatively analyzing specificity in both natural and synthetic systems. We use a metric based on mutual information to derive a bound on the number of different interacting particles that a system can support before crosstalk overwhelms interaction specificity. Increasing the number of nominally distinct pairs beyond this limit cannot increase the effective number of distinguishable species. We compute this information-theoretic “capacity” for different experimental systems of recent interest, including DNA-based affinities and colloidal experiments in shape complementarity. We show that shape-based coding fundamentally results in lower crosstalk and higher capacity than color-based coding. We also find that shape- and color-based coding can be combined synergistically, giving a superadditive capacity that is greater than the sum of the color and shape parts.
The Capacity of Random Ensembles
We consider systems where every component is designed to interact specifically with a single cognate partner, whereas interactions between “off-target” components are undesirable crosstalk. We assume that N distinct “locks” , ,…, , have unique binding partners, “keys” , ,…, (Fig. 1A). The physics of a particular system determines the binding energy between every lock and key. Assuming equal concentrations of locks and keys in a well-mixed solution, binding between lock and key will occur with probability , where Z is a normalization factor such that (Supporting Information) and is the temperature scale. The mutual information transmitted through binding is defined as
$$I = \sum_{i=1}^{N}\sum_{j=1}^{N} p(x_i, y_j)\,\log\frac{p(x_i, y_j)}{p(x_i)\,p(y_j)} \tag{1}$$
where is the marginal distribution of , representing the total probability of seeing in a bound pair [and similarly ]. Mutual information is a global measure of interaction specificity in systems with many distinct species; it quantifies how predictive the identity of a lock is of the identity of a key found bound to it.
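As a concrete illustration of Eq. 1, the following minimal Python sketch converts a matrix of binding strengths into Boltzmann probabilities and evaluates the mutual information in bits. It assumes a sign convention that is not fixed by the text here: strengths are given in units of the thermal energy, and larger values mean tighter binding, so the statistical weight of a bound pair grows as the exponential of its strength. The function name is hypothetical.

```python
import math


def mutual_information_bits(strength):
    """Mutual information (in bits) between lock identity and key identity.

    strength[i][j] is the binding strength between lock i and key j in units
    of the thermal energy; larger means tighter binding (assumed convention).
    """
    n_locks, n_keys = len(strength), len(strength[0])
    # Boltzmann weights, shifted by the largest strength for numerical stability.
    top = max(max(row) for row in strength)
    weights = [[math.exp(e - top) for e in row] for row in strength]
    z = sum(sum(row) for row in weights)  # normalization over all pairs
    p = [[w / z for w in row] for row in weights]
    # Marginal probabilities of finding a given lock or key in a bound pair.
    p_lock = [sum(row) for row in p]
    p_key = [sum(p[i][j] for i in range(n_locks)) for j in range(n_keys)]
    info = 0.0
    for i in range(n_locks):
        for j in range(n_keys):
            if p[i][j] > 0.0:
                info += p[i][j] * math.log2(p[i][j] / (p_lock[i] * p_key[j]))
    return info
```

For four pairs with very strong cognate binding and negligible crosstalk, the result approaches $\log_2 4 = 2$ bits; for a uniformly sticky matrix, in which every lock binds every key equally, it is zero.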
Consider a set of interacting lock–key pairs for which for all cognate pairs (strong binding), whereas for crosstalking interactions (weak binding) . We assume are independent and identically distributed random numbers drawn from a distribution of gap energies , with , where the exact form of depends on the physics of the system. Denoting as an average with respect to , one can approximate Eq. 1 as
[2] |
[3] |
(Supporting Information). In a system with crosstalk that contains N nominally distinct lock–key pairs, is the effective number of fully distinguishable lock–key pairs. can be much smaller than N if crosstalk is significant [e.g., if ].
Intuitively, information theory predicts that a system with noncrosstalking lock–key pairs can perform a task with the same effectiveness as a system with N crosstalking species. For example, in the self-assembly of a multicomponent structure, distinct but crosstalking species can take each other’s place, decreasing the effective number of species. This effect has been shown to reduce self-assembly yield (27–29). Similarly, the efficacy of N parallel signaling pathways is known to be reduced by crosstalk (30). In Fig. 1B we show a typical plot of . grows initially with N, but stops growing at , the point of diminishing returns; adding any further species beyond increases only the superficial diversity of species but cannot increase .
Paralleling Shannon’s theory of communication, we define “capacity” C as
[4] |
(Fig. 1B). [Capacity in information theory is often measured in bits per second, whereas here we intentionally use the same units as I. Furthermore, capacity is traditionally defined as a maximum over all possible distributions ; here we restrict to maximizing only over one parameter, N, where all N pairs are randomly chosen from the ensemble (Supporting Information).] The capacity is the largest number of bits of information that can be encoded using a system of specific interactions and still be uniquely resolved by the physics of interactions. Determining C, or equivalently the largest value of , is of crucial importance to both synthetic and biological systems because it limits, for example, the number of independent signaling pathways or the complexity of self-assembled structures.
We can compute capacity for any crosstalk energy distribution by finding the maximum of Eq. 3. A useful approximation is
[5] |
giving a simple rule for the dependence of capacity on the binding energy distribution (Supporting Information). The importance of maximizing in Eq. 5 is intuitive: To increase the capacity of the system, the (exponential average of the) gap between on-target binding and off-target binding should be made as large as possible (see Supporting Information for the precise relationship between and ). Fig. 1C shows three distinct probability distributions, two of which have identical . As predicted, reaches a higher maximum for distributions with larger .
We note that our definition of capacity uses equilibrium binding probabilities and hence applies only at long times compared with unbinding times. In practice, this typically limits s to roughly 10 $k_BT$ or below, and so we use this bound on s herein. The formalism can be easily extended to include kinetic effects by computing the mutual information at a finite time t, although this is not our focus here.
In what follows, we show how capacity depends on binding interactions and fabrication constraints for several systems of recent interest. In most systems, the on-target binding energy typically strengthens with the binding surface area S of cognate pairs as , where ε is the binding energy per unit area. However, we find that the off-target energies can grow with S at very different rates across several systems we study. We parameterize this variation as
[6] |
where depend on the details of binding interactions. We show below that if the specificity is determined purely by “colors” (i.e., chemical identities), then . In contrast, if specificity arises from shape complementarity, , as long as the range of the surface attraction is small compared with the length scale of shape variation. Thus, crosstalk grows very slowly with the number of independent binding units in shape-based systems, allowing for a dramatic decrease in crosstalk and improvement of capacity relative to systems that use chemical specificity.
Mutual Information in Random Ensembles
Derivation.
Below we derive Eqs. 2 and 3 in the main text. Our model system has N distinct “locks” , ,…, , which each have unique binding partners, “keys” , ,…, . We do not specify the physics of these interacting components but require that there is a well-defined binding energy between every lock and key and that locks do not bind to locks (and respectively keys to keys). We assume that each lock binds with its cognate key with a strong on-target energy , whereas each off-target lock and key bind with a weak off-target energy that is drawn from some random distribution. We can equivalently rewrite formulas in terms of , where .
In this model we assume that there are equal concentrations of locks and keys in a well-mixed solution, and binding probabilities are determined by the Boltzmann distribution such that lock and key bind with probability . Here
[S1] |
[S2] |
[S3] |
As the N pairs of locks and keys are drawn randomly from the ensemble, we replace Z with , where the angled brackets denote the average with respect to . This gives
[S4] |
is the marginal distribution of , representing the total probability of seeing in a bound pair. We approximate
[S5] |
[S6] |
[S7] |
In the same manner, .
Mutual information between components X and Y is defined as
$$I(X;Y) = \sum_{x \in X}\sum_{y \in Y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)} \tag{S8}$$
For a mixture of N pairs of locks and keys, we can rewrite the mutual information as a function of N and the distribution :
[S9] |
Replacing the last sum with its expected value, we get
[S10] |
[S11] |
[S12] |
If the logarithm in the above equation is taken in base 2, I will have units of bits.
Two Types of Behavior for I.
Examination of Eq. S10 reveals two distinct behaviors of the mutual information that depend on and specifically on and . When , I will have a distinct maximum.
Type I: .
If interactions have reasonable specificity, typical values of the gap will be at least . In this case, the distribution will be mostly supported at and for any such reasonable distribution, and hence .
When , we can calculate the capacity as the maximum value of I, occurring at , as
[S13] |
[S14] |
[S15] |
[where for simplicity, C throughout this section is written in units of nats, although it can be converted to the units used in the main text (bits) by multiplying by $\log_2 e$.]
Type II: .
For completeness, we discuss the case of that can occur for distributions whose typical values are near ; i.e., typical off-target interactions are as large as on-target binding.
When , reaches its maximum either at or at . In the latter case, we can compute . Taking the limit of I as N goes to infinity gives
[S16] |
[S17] |
(where the last approximation results from ).
It is instructive to see these two behaviors for two simple distributions. If the gap distribution comes from a Dirac delta function, , we find that
[S18] |
[S19] |
[S20] |
[S21] |
Thus, the mutual information behaves as type I, and
[S22] |
[S23] |
[S24] |
[S25] |
as . Gap energy distributions that are tightly centered at a large mean value (Gaussian, Poisson, etc.) behave similarly.
On the other hand, if is drawn from an exponential distribution, , the behavior is different. In this case
[S26] |
[S27] |
[S28] |
The mutual information for a system with this gap distribution thus does not exhibit a maximum, but rather follows the equation for type II,
[S29] |
[S30] |
as . [For small , we can expand the log to get .] This is exactly what we find when we calculate the mutual information numerically (Fig. S1).
Normalization Choice.
To compute mutual information, one needs to first transform a matrix of binding energies between N locks and N keys into one of probabilities. Above we have chosen , where . This corresponds to a model where all of the locks and keys are placed in a well-mixed enclosure together and every key has the chance to bind to every lock.
An alternative model might choose a different normalization. For example, to encompass analyte sensing in synthetic applications and protein sensors in biological signal transduction pathways, a slightly different model, which would also more closely reflect the Shannon communication setting, is necessary. For these systems one may choose a model in which the locks (which here we think of as sensors) are held in place at a given location, while each key (the object to be detected) enters and has a chance to bind to each of the locks. In this case, the probability of encountering a key should sum to one over all possible locks. Thus, , and .
Although results for the two normalizations closely resemble one another, a significant difference can arise, for example, when the kth lock–key pair does not bind very well ( is small), but is highly specific (). In this case, the kth pair would not contribute to the mutual information in the well-mixed case (because it rarely binds), whereas in the communication setting, it would contribute.
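The difference between the two normalizations can be made concrete with a small numerical sketch. The code below assumes that binding strengths are in units of the thermal energy with larger values meaning tighter binding, and, for the sensing model, that every key is presented with equal prior probability 1/N; both are assumptions of this illustration, and the function names are hypothetical.

```python
import math


def joint_well_mixed(strength):
    """Joint probabilities normalized once over all lock-key pairs."""
    weights = [[math.exp(e) for e in row] for row in strength]
    z = sum(sum(row) for row in weights)
    return [[w / z for w in row] for row in weights]


def joint_per_key(strength):
    """Joint probabilities when each key binds exactly one of the locks.

    Each column (key) is normalized separately over locks and then weighted
    by a uniform prior over keys (an assumption of this sketch).
    """
    n_locks, n_keys = len(strength), len(strength[0])
    joint = [[0.0] * n_keys for _ in range(n_locks)]
    for j in range(n_keys):
        column = [math.exp(strength[i][j]) for i in range(n_locks)]
        z_j = sum(column)
        for i in range(n_locks):
            joint[i][j] = column[i] / z_j / n_keys
    return joint


def mutual_information_bits(joint):
    """Mutual information (bits) of a joint probability table."""
    p_lock = [sum(row) for row in joint]
    p_key = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (p_lock[i] * p_key[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0.0)


# Pair 0 binds strongly; pair 1 binds weakly but is still specific.
strength = [[12.0, 0.0], [0.0, 4.0]]
i_mixed = mutual_information_bits(joint_well_mixed(strength))
i_sensor = mutual_information_bits(joint_per_key(strength))
```

In this two-pair example the weakly binding pair is almost never observed in the well-mixed model, so `i_mixed` is close to zero, whereas under per-key normalization both pairs contribute and `i_sensor` is close to one bit.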
Relationship Between , , and .
In the text we have defined , where the average is taken over the probability distribution and . If s is the same for all lock–key pairs (as is the case, for example, with DNA binding), we may rewrite this as
[S31] |
[S32] |
[S33] |
[S34] |
[S35] |
where the approximation follows for .
Numerical Sampling Methods for Random Ensembles.
To numerically calculate the capacity of a random ensemble of lock–key pairs, two options are available. First, for each N, one can randomly sample many N × N matrices, drawn from the ensemble, and calculate the average mutual information at that N, using Eq. S8 above. The capacity C is then the maximum I achieved, and the corresponding number of pairs is simply the N at which I reaches C.
Alternatively, one can use Eq. S10 above. This requires calculating and . If an analytical formula for these terms is available, the mutual information may be computed directly. However, even in complex energy scenarios, as long as one can randomly sample pairs of locks and keys and measure their binding energy, one can calculate these terms numerically and then use Eq. S10. These two methods give the same results in most cases (Fig. S2).
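The first option above (direct sampling of N × N matrices) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the figures: it assumes binding strengths in units of the thermal energy with larger values meaning tighter binding, a common on-target strength `s` for all cognate pairs, and a user-supplied sampler `sample_w` for off-target strengths. All function and parameter names are hypothetical.

```python
import math
import random


def mutual_information_bits(strength):
    """Mutual information (bits) from a square matrix of binding strengths."""
    n = len(strength)
    top = max(max(row) for row in strength)
    weights = [[math.exp(e - top) for e in row] for row in strength]
    z = sum(sum(row) for row in weights)
    p = [[w / z for w in row] for row in weights]
    p_lock = [sum(row) for row in p]
    p_key = [sum(p[i][j] for i in range(n)) for j in range(n)]
    return sum(p[i][j] * math.log2(p[i][j] / (p_lock[i] * p_key[j]))
               for i in range(n) for j in range(n) if p[i][j] > 0.0)


def mean_information(n, s, sample_w, trials, rng):
    """Average mutual information over random N x N interaction matrices."""
    total = 0.0
    for _ in range(trials):
        matrix = [[s if i == j else sample_w(rng) for j in range(n)]
                  for i in range(n)]
        total += mutual_information_bits(matrix)
    return total / trials


def estimate_capacity(s, sample_w, n_values, trials=20, seed=0):
    """Return (capacity in bits, N at the maximum, full I(N) curve)."""
    rng = random.Random(seed)
    curve = [(n, mean_information(n, s, sample_w, trials, rng))
             for n in n_values]
    n_best, capacity = max(curve, key=lambda point: point[1])
    return capacity, n_best, curve
```

For example, `estimate_capacity(3.0, lambda rng: 0.0, [2, 4, 8, 16, 32], trials=1)` treats a delta-function gap of three thermal units and finds an interior maximum of the sampled curve. A random ensemble can be explored by passing a sampler such as `lambda rng: rng.gauss(0.5, 0.2)`, whose parameters here are arbitrary.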
Case Study: Binomial Binding
It is instructive to see how capacity scales for the case of a binomially distributed gap, which serves as a model for binding in color systems. Here we assume that each on-target pair of letters binds with energy , whereas an off-target pair of letters binds with energy 0 . (One may also allow for a nonzero binding energy between off-target letters—and, in fact, we do so for Fig. 5C, for which on-target letters bind with and off-target letters bind with . However, for simplicity, the rest of this paper and the analysis below set off-target letter binding to zero.)
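To make the binomial model concrete, the sketch below enumerates every possible key against a fixed lock and compares the resulting averages with their closed forms. It assumes that each letter of a random key is drawn independently and uniformly from the A colors, so that the number of accidentally matching letters is binomial with success probability 1/A, and that the statistical weight of a bound pair is the exponential of its binding strength in thermal units. The function names are hypothetical, and the enumeration includes the rare key that happens to match the lock everywhere.

```python
import itertools
import math


def count_matches(lock, key):
    """Number of positions at which lock and key carry the same letter."""
    return sum(1 for a, b in zip(lock, key) if a == b)


def mean_weight_exact(lock, alphabet, eps):
    """Average Boltzmann weight over all alphabet**L possible keys."""
    total = 0.0
    for key in itertools.product(range(alphabet), repeat=len(lock)):
        total += math.exp(eps * count_matches(lock, key))
    return total / alphabet ** len(lock)


def mean_weight_closed(length, alphabet, eps):
    """Binomial moment-generating function: (1 + (e^eps - 1)/A)^L."""
    return (1.0 + (math.exp(eps) - 1.0) / alphabet) ** length


def mean_matches_exact(lock, alphabet):
    """Average number of matching letters over all possible keys."""
    keys = itertools.product(range(alphabet), repeat=len(lock))
    return sum(count_matches(lock, key) for key in keys) / alphabet ** len(lock)
```

The average number of matching letters equals L/A, which makes explicit that the typical off-target binding energy in this model grows linearly with L at fixed A and shrinks as A increases.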
If crosstalk is negligible (), then we can use all possible pairs, and . However, when crosstalk plays a significant role, we must use Eq. S10 above to determine the capacity and . Plugging in for the distribution, we can calculate the capacity using the approximation from Eq. 5 (main text) as
[S36] |
One interesting consequence of this is that . In other words, more information can be encoded if a single channel is composed of a longer sequence than if many channels of smaller sequences are used. However, the difference is logarithmic in L and so is negligible for large L.
We may examine how the capacity scales as A, the alphabet size, becomes large:
[S37] |
[S38] |
From the above one can see that is what sets the upper limit of the capacity in the crosstalk-limited regime. Because on-target binding is fixed as , capacity is determined by w. In the worst-case scenario, an on-target lock looks identical to an off-target lock, which would mean every letter binds, giving and thus a gap (given by ) of . This would occur if , in other words when the surfaces are just sticky. On the other hand, in the best-case scenario, none of the letters in bind to the letters , giving and . This occurs when . Thus, A is just a tuning factor that allows the system to go from to with increasing A: As A is increased, the more likely it is that a letter in is mismatched to a letter in , giving a zero contribution for that pair.
Furthermore the weak binding is given by
[S39] |
[S40] |
The mutual information in the binomial case has a maximum at , but plateaus down to a constant for large N. We can derive that limit by taking the limit of I as of Eq. S10. Alternatively, we can derive the limit from Eq. S8 without any averaging, by assuming that when , the total possible unique pairs, the information content should be the same as using each of the pairs exactly once. In either case we find
[S41] |
Fabrication Defects
Unintended defects in synthesis can cause mismatch between cognate pairs and thus reduce on-target binding and capacity. We model such defects by adding independent (normally distributed) random undulations of order σ to keys that would otherwise be cognate to their lock, thereby reducing on-target binding. We find that the effect of such defects is small when , the depletion particle size, but degrades capacity significantly when . Such degradation originates from the same aspect of shape coding that gives it lower crosstalk than color coding; a single large mismatch can be sufficient to dramatically reduce on-target binding (Fig. S3).
Contact Between Random Surfaces
In the main text, we emphasized that shape-based coding achieves better specificity than color-based coding because the amount of crosstalk for shapes is much smaller than that for color. The intuitive idea is that when coding with shapes, off-target binding between two mismatched random shapes typically involves only a few points of contact and does not scale linearly with the size of the components. We verified this by simulating off-target pairs of shaped components and measuring the average w, the off-target binding energy, as a function of L and d. As shown in Fig. S4A, the binding energy first rises sublinearly with L and then saturates for large L. When we allow translations between the components (Fig. S4B), such that w is taken as the strongest possible off-target binding across all translations, the curve is still strongly sublinear in L (indeed, approximately logarithmic).
We may further verify this by examining random 1D walks of length L and measuring how much time they typically spend within a distance d of their maximum. We simulate random walks , where in each “time” step, the random walk can take a step of . Examining walks of various lengths for d ranging from 0.5 to 5, we find that the average amount of time spent within d of the maximum first increases sublinearly with L and then flattens out to a constant (Fig. S4C). Two typical walks of and , at , give intuition for why this is the case (Fig. S4D): When L is small relative to d, a good fraction of the walk lies within d of its maximum, and so increasing L increases the amount of time spent in this region. When L is much larger than d, only a small fraction of the time is spent close to the maximum.
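The random-walk test described above is simple to reproduce. The sketch below is a minimal version under stated assumptions: steps of ±1 with equal probability (the step size used for Fig. S4 is not reproduced here) and "time" counted as the number of visited positions, including the starting point, that lie within d of the walk's maximum. The function names are hypothetical.

```python
import random


def time_near_max(path, d):
    """Number of positions of the path lying within d of its maximum."""
    top = max(path)
    return sum(1 for x in path if top - x <= d)


def random_walk(length, rng):
    """Simple symmetric walk with unit steps, starting at 0 (assumed model)."""
    path = [0]
    for _ in range(length):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path


def mean_time_near_max(length, d, trials, seed=0):
    """Average time spent within d of the maximum over many walks."""
    rng = random.Random(seed)
    total = sum(time_near_max(random_walk(length, rng), d)
                for _ in range(trials))
    return total / trials
```

Comparing, for instance, `mean_time_near_max(100, 2, 50)` with `mean_time_near_max(5000, 2, 50)` shows the fraction of time spent near the maximum falling sharply as the walk lengthens, in line with the sublinear growth described above.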
Pacman Particle Interactions
We derive the depletion-driven interactions between pairs of spherical lock–key Pacman particles of the type described in ref. 15. We describe each lock (Fig. S5A) as a sphere containing a hemispherical cavity of radius , cut off at angle θ (although all calculations in the main text were performed for ). Each key is a sphere of size , and for a cognate lock and key pair, . Attraction is mediated by depletant particles of diameter , and the energy of attraction between a lock and a key is , where is the change of excluded depletant particle volume (and ε mediates the strength of the attraction). We derive a formula for that is more accurate than that used in ref. 15, particularly when the radii of the lock and key are close to each other [e.g., ].
We derived (Fig. S5B) for four regimes of lock/key sizes:
When , , where is the contact area between the lock and the key. The contact area is simply the entire inner area of the lock, , where θ is half the total angle subtended by the inner surface of the lock at its origin.
When , the key does not fit into the lock and makes contact along an annulus of radius . As discussed in ref. 15, the width t of this annulus is set by fabrication constraints (in particular, how sharp the edges of the locks are) and was estimated to be nm. Hence the effective contact area is and . We assume that the contact area drops to this value at from the value of at and linearly interpolate between the two.
When , we consider two regimes:
i) If : In this case, we wish to compute the excluded volume shown as the area between dashed lines in Fig. S5C. The excluded volume is entirely contained within the lock; in particular, the edges of the lock play no role. Hence, the excluded volume is easily computed as the intersection of two overlapping spheres. We find
[S42] |
ii) When approaches from below, at a particular radius , the excluded volume reaches the edges of the lock. See Fig. S5D; using the formula for intersecting spheres would give an overestimate of the binding energy. Instead, we draw a plane through the opening of the lock (line in Fig. S5D) and count only the excluded volume contained between the plane and the lock. The radius at which such edge effects become important is given by
[S43] |
The formula for a triple intersection for gives
[S44] |
[S45] |
[S46] |
[S47] |
[S48] |
[S49] |
The full curve for is shown in Fig. S5B.
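The geometric building block of regime (i), the volume shared by two overlapping spheres, has a standard closed form that is easy to evaluate numerically. The helper below is a generic sketch of that lens-volume formula only. Which radii (for example, radii enlarged by the depletant radius) and which center-to-center distance should be passed in for a given lock and key is not specified here, so the function name and its arguments are hypothetical.

```python
import math


def sphere_intersection_volume(r1, r2, dist):
    """Volume of the intersection of two spheres.

    r1, r2: sphere radii; dist: distance between the two centers.
    """
    if dist >= r1 + r2:
        return 0.0  # spheres do not overlap
    if dist <= abs(r1 - r2):
        # The smaller sphere lies entirely inside the larger one.
        return 4.0 / 3.0 * math.pi * min(r1, r2) ** 3
    # Standard lens-volume formula for partially overlapping spheres.
    return (math.pi * (r1 + r2 - dist) ** 2
            * (dist ** 2 + 2.0 * dist * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * dist))
```

As a check, two unit spheres whose centers are one radius apart share a volume of $5\pi/12$, and the formula joins continuously onto both limiting cases.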
The Capacity of Color
We first consider the capacity of interactions mediated through binding sites that are subdivided into multiple regions, each of which can be assigned any one of A chemical identities or colors. We take inspiration from DNA coding that acts via complementary hybridization between single-stranded DNA. Previous work (31) developed engineering principles for determining the optimal length and nucleotide composition of these DNA strands based on detailed models of the binding energy. Information theoretic measures have also been used to understand binding of transcription factors to DNA and other sequence-based molecular recognition problems (32–36). Although the theory of DNA coding has a long history (37), our contribution here formulates the problem in a mutual information framework that relates the capacity to a physical quantity and hence allows for direct comparison of varied chemical (color) and shape systems.
In our simplified color model, a lock is composed of L units, each of which is painted with one of A chemical colors (Fig. 2A). Each color binds to itself with energy and binds to other colors with energy 0, such that locks and their cognate keys have the same sequence. The binding energy of any two strands and is given by (where is the color of the lth site of , and δ is the Kronecker delta). We analyze this system with translations, where is given by the strongest binding across all possible translations of the two strands relative to each other, as well as without translations.
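The color model above translates directly into code. The sketch below computes the binding energy of two strands as ε times the number of aligned sites with equal colors, with and without translations. For translations it assumes that only the overlapping portion of the two strands contributes and that overhanging sites bind nothing; this is a modeling assumption of the sketch, and the function names are hypothetical.

```python
def binding_energy(lock, key, eps=1.0):
    """Binding energy without translation: eps per site with equal colors."""
    return eps * sum(1 for a, b in zip(lock, key) if a == b)


def binding_energy_with_translation(lock, key, eps=1.0):
    """Strongest binding over all relative shifts of the two strands.

    Only overlapping sites are compared at each shift (assumption).
    """
    length = len(lock)
    best = 0
    for shift in range(-(length - 1), length):
        matches = sum(1 for i in range(length)
                      if 0 <= i - shift < len(key) and lock[i] == key[i - shift])
        best = max(best, matches)
    return eps * best


lock = [0, 1, 2, 3]
shifted_key = [1, 2, 3, 0]
e_fixed = binding_energy(lock, shifted_key, eps=2.0)
e_shifted = binding_energy_with_translation(lock, shifted_key, eps=2.0)
```

In this example the two strands share no aligned colors in register, so `e_fixed` is zero, but sliding one strand by a single site aligns three colors and `e_shifted` is 3ε. This is the mechanism by which translations increase crosstalk.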
We calculate by sampling N randomly selected pairs of locks and keys, constructing the interaction matrix E, and computing using Eq. 1. We average over many repetitions. An approximate but faster method to compute (necessary for large ) uses Eq. 3, sampling random pairs of off-target locks and keys to estimate and . The two methods give nearly identical results (Supporting Information and Fig. S2), and the calculations in this paper henceforth are carried out with the second method.
Fig. 2B examines when , , and so that a lock and its key bind together with on-target energy . The mutual information has a maximum of 5.5 bits near $N = 146$, far less than the total number of unique sequences (). Due to crosstalk, even though there are nominally 146 pairs at capacity, the system behaves as if there are only $2^{5.5} \approx 45$ independent pairs.
An obvious way of increasing capacity is to boost by increasing L. This strengthens both on-target binding and off-target binding, because both s and w scale with L . However, the gap between them widens (Fig. 2C, solid lines), and the capacity scales linearly with L (Fig. 2C, Inset) (Supporting Information). As a comparison, we also show the capacity when translation is allowed between any two strands. Off-target strands can now translate until they find the strongest binding, increasing crosstalk and thus lowering capacity.
In practice, on-target binding must be limited to below ∼10 $k_BT$ for the binding to be reversible; hence L cannot be increased arbitrarily without also decreasing ε. An alternate way to increase capacity at fixed s is to increase the number of colors A. As A becomes large, accidental matches in off-target binding are rare, off-target binding becomes negligible, and the capacity is limited only by s. In Fig. 2D, capacity in the large A limit can be approximated by setting in Eq. 5, giving bits. However, in practice, alphabet size A cannot be easily increased in experiments, and other techniques must be used to decrease the off-target binding strength, such as the use of shape complementarity.
The Capacity of Shape
Systems of interacting, complementary shapes are characterized by the nonspecific binding of surfaces mediated by a short-range force of characteristic length . The components’ shapes sterically allow or inhibit two surfaces from coming into contact, dictating specificity. We find that crosstalk is qualitatively weaker in such shape-based systems, resulting in higher capacity than in color-based models.
We examine the capacity of a model inspired by a recent experimental system consisting of lithographically sculpted micron-sized particles with complementary shapes (14) whose attractive interactions are mediated by the depletion force. The constraints on the shapes of these components (size μm, line width nm, radius of curvature nm) still leave a large variety of shapes that can interact in a lock and key fashion, yet crosstalk between similarly shaped components reduces the number of effectively unique pairs. We model this system by defining each solo component as a series of L adjoining bars of various heights, whose profile is similar to a Tetris piece. For each lock , the shape of the cognate key is exactly complementary, as in Fig. 3A. We account for fabrication constraints by setting the width of each bar to 1 μm and restricting the change of one bar height relative to its neighboring bars to be less than μm. Depletant particles of diameter d (typically nm) create an attractive energy of for two surfaces separated by . Thus, . In experiments, ε is set by the depletant particle volume fraction and the temperature. In principle, the fabrication fidelity must also be accounted for, as local defects in the shape will disrupt cognate binding. The effect of such defects is shown in Fig. S3; we find that defects of size much less than d, the depletant particle size, have minimal impact on capacity. We assume such a limit in the remainder of the text.
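A toy version of this bar model illustrates why mismatched shapes touch at only a few points. The sketch below is not the simulation used for the figures. It assumes that a key is lowered rigidly onto a lock, with no translation or tilt, until first contact, and that each bar then contributes an attraction that falls linearly from ε at contact to zero at a separation equal to the depletant diameter d. This linear form is the usual flat-plate depletion overlap and is an assumption here. Function and argument names are hypothetical.

```python
def depletion_binding(lock_top, key_bottom, d, eps=1.0):
    """Binding energy of a key pressed rigidly onto a lock.

    lock_top[l]   : height of the lock's upper surface at bar l
    key_bottom[l] : height of the key's lower surface at bar l (same frame)
    d             : depletant diameter, the range of the attraction
    """
    gaps = [k - l for l, k in zip(lock_top, key_bottom)]
    contact = min(gaps)  # the key descends until the first bar touches
    # Each bar contributes eps * (1 - separation / d) if closer than d.
    return eps * sum(max(0.0, 1.0 - (g - contact) / d) for g in gaps)


def cognate_key(lock_top):
    """A perfectly complementary key: its lower surface follows the lock."""
    return list(lock_top)
```

A cognate pair makes contact along all L bars and binds with energy εL, whereas a flat key pressed onto a staircase lock such as `[0, 2, 4]` touches only the highest bar when d is smaller than the step height.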
We find that crosstalk with shapes differs fundamentally from the color models discussed earlier. Whereas on-target binding strength still increases linearly with L, off-target binding is almost independent of L (Fig. 3C). In fact, we find that for large enough L, off-target binding ; for larger (or smaller L), is still strongly sublinear in L (Supporting Information and Fig. S4). The weak dependence of on L can be understood intuitively, as a lock pressed to a random mismatched key will typically come into contact at a single location. In contrast, in color-based systems, off-target locks and keys are in full contact and hence . Thus, and hence capacity C for shape systems can be significantly higher than for color-based systems with the same strong binding energy . In Fig. 3B, bits whereas bits with similar parameters (, ). Finally, in Fig. 3D, for fixed L, we find that capacity falls rapidly and all specificity is lost when the spatial range of depletion interactions exceeds the scale of spatial features δ, as expected. These results are consistent with earlier experiments (12) and computational models (13) that established a high dynamic range in the strength of depletion interactions between surfaces roughened by asperities and in particular found that the attraction between surfaces was diminished when the asperity height was below the depletion particle size.
Our results, although intuitive in retrospect, point to a qualitative advantage for coding through shapes: random mismatched shapes have a crosstalk that is, at worst, sublinear in binding-site size, whereas crosstalk is linear in site size for color-based systems. Our work suggests that this increased specificity is robust because it derives from basic properties of shape itself. Quantifying the benefits of shape-based coding is important when deciding whether to incorporate it in future engineering efforts.
We may further apply this framework to the recent experimental system of Pacman-like lock–key colloids. In this system (15), a key is a sphere of radius r (typically 1–3 μm), whereas its cognate lock is a larger sphere with a hemispherical cavity of radius r, complementary to its key (Supporting Information and Fig. S5). The attraction is mediated by depletant particles of diameter nm. Multiple pairs of locks and keys may be used concurrently, with the ith pair having a key radius of , with the risk of keys binding to incorrect locks.
How should one choose N lock–key radii to minimize crosstalk and maximize capacity? We may gain some intuition by considering a system containing only two lock–key pairs of radii and , respectively. The on-target binding energies of the two pairs are proportional to the area of contact: , because each key makes perfect contact with its own lock. Assuming , crosstalk energy , corresponding to the larger key of size contacting an annulus around the smaller lock of size . The other crosstalk energy is typically much larger, corresponding to the smaller key fitting into the larger lock of size (see Supporting Information for complete derivation). Thus, there are two competing pressures on the radii : Increasing the overall size of both pairs improves specificity because the on-target energies grow faster than the crosstalk terms. However, grows rapidly if the radii are too similar to each other. Hence the optimal solution for requires setting (the largest allowed radius) and . The binding energy of six particles (in this case optimally chosen to maximize I) is shown in Fig. 4A, with on-target binding and the two types of off-target binding shown.
This intuitive argument does not capture many-body effects that determine capacity for larger N. We find the optimal at fixed N by maximizing the mutual information I in Eq. 1 numerically through gradient descent; note that Eq. 3 cannot be used because the on-target binding energy s varies across pairs. Fig. 4B (solid line) shows the mutual information of optimally chosen radii as a function of N, an improvement over randomly chosen radii (dashed line). Fig. 4C shows the optimal set of radii for various N, with nm and ; the optimal spacing of the radii is .
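A numerical optimization of this kind needs a routine that evaluates the mutual information for an arbitrary matrix of pairwise binding energies. Eq. 1 is not reproduced in this section, so the sketch below assumes the standard definition: the joint probability of finding lock i bound to key j is a Boltzmann weight, and I is the mutual information between lock and key identity. The function names and the convention that larger values mean stronger binding (in units of the thermal energy) are assumptions made for illustration.

```python
import math

def mutual_information_bits(strength):
    """Mutual information (bits) between lock and key identity.

    strength[i][j] is the binding strength of lock i with key j in units of kT
    (assumed convention: larger means tighter binding, weight = exp(strength)).
    """
    n = len(strength)
    weights = [[math.exp(s) for s in row] for row in strength]
    z = sum(sum(row) for row in weights)
    joint = [[w / z for w in row] for row in weights]
    p_lock = [sum(row) for row in joint]
    p_key = [sum(joint[i][j] for i in range(n)) for j in range(n)]
    info = 0.0
    for i in range(n):
        for j in range(n):
            p = joint[i][j]
            if p > 0.0:
                info += p * math.log2(p / (p_lock[i] * p_key[j]))
    return info

def uniform_pairs(n, on_target, off_target):
    """Energy matrix with one on-target strength and one off-target strength."""
    return [[on_target if i == j else off_target for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    # Strong on-target binding and weak crosstalk approaches log2(N) bits.
    print(mutual_information_bits(uniform_pairs(4, 50.0, 0.0)))
    # Equal on- and off-target binding carries no information.
    print(mutual_information_bits(uniform_pairs(4, 3.0, 3.0)))
```

Because the function accepts any energy matrix, it can handle on-target energies that vary across pairs, which is the situation in which Eq. 3 does not apply. An optimizer over the radii would call it repeatedly on the energy matrix implied by each candidate set of radii.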
Interestingly, when , the system has saturated. I does not increase any further (Fig. 4B) and the optimal set involves repeating locks and keys of the smallest radii. Intuitively, the smallest lock–key pairs have become so small that making an additional lock–key pair of an even smaller radius would yield very low self-binding energy relative to the incurred crosstalk. Hence the only way to increase N without decreasing I is to create new nominal pairs at the smallest radius; such pairs are obviously indistinguishable through physical interactions and hence do not increase mutual information any further. We find that the capacity decreases with increasing size of depletant particles and that by nm the system is saturated (reuses radii) when N > 2. Similarly, increasing (with fixed largest cognate binding energy ) increases capacity.
Thus, we find that this colloidal particle system can support about lock–key pairs without much crosstalk, with depletion particles of diameter 100 nm and restricting the largest binding energy to . This is far smaller than the capacity of the previously described DNA sequences or general shape-based strategies. However, these lock–key colloidal pairs are characterized by only one parameter (the radius), so the space of available pairs is significantly smaller than DNA or shape systems with L parameters. In particular, in the current system, additional lock–key pairs are forced to be of smaller radii and hence of lower and lower cognate binding energies. Such considerations emphasize the importance of quantitative information-theoretic optimization in systems with such a limited shape space.
Combining Channels
Thus far we have focused on locks and keys interacting exclusively through a single kind of physical interaction. Using our quantitative framework, we may now ask how capacity increases when multiple sources of specificity, such as shape and color, are combined in a single set of locks and keys. As is known in information theory (38, 39), the combined capacity of two interacting channels can be significantly higher than the sum of the individual capacities.
Linking Two Systems.
The simplest model for combining two channels is to physically link a lock of system 1 to a lock of system 2. We assume that there is no interaction between the two parts of the lock or between the key from one system and the lock of the other system. (We do not take into account entropic effects due to avidity.) Thus, for a linked system (which we denote as ), the two independent systems are combined such that the gap of the linked system is the sum of the gaps of the two independent systems. Hence the gap distribution of the linked system is the convolution of the gap distributions of the independent systems, and the capacity can be computed using Eq. 3 in terms of the gap distributions of the individual systems.
When two channels are linked in this form without any interaction, we expect the total capacity of the system to be the sum of the individual capacities (39). We explicitly compute this linked capacity for the physical system shown in Fig. 5A, Left, in which a color system of length L is linked to a shape system of length L (Shape Color). The distribution , obtained by convolving and , is shown in Fig. 5B. The resulting capacity is additive up to corrections that are small when L is large (Supporting Information).
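The convolution step for a linked system can be sketched for discrete gap distributions. The example assumes that each distribution is given as a mapping from a gap value to its probability and that the two systems are statistically independent. The function name and the sample numbers are illustrative and are not taken from the paper.

```python
from collections import defaultdict

def convolve_gap_distributions(p1, p2):
    """Distribution of gap1 + gap2 for two independent discrete gap distributions.

    p1 and p2 map a gap value to its probability (assumed to sum to 1 each).
    """
    combined = defaultdict(float)
    for gap1, prob1 in p1.items():
        for gap2, prob2 in p2.items():
            combined[gap1 + gap2] += prob1 * prob2
    return dict(combined)

def mean_gap(p):
    """Mean of a discrete gap distribution."""
    return sum(gap * prob for gap, prob in p.items())

if __name__ == "__main__":
    system_1 = {1: 0.5, 2: 0.5}      # hypothetical gap distribution of system 1
    system_2 = {0: 0.25, 3: 0.75}    # hypothetical gap distribution of system 2
    linked = convolve_gap_distributions(system_1, system_2)
    for gap in sorted(linked):
        print(gap, linked[gap])
    print(mean_gap(linked), mean_gap(system_1) + mean_gap(system_2))
```

The mean gap of the linked system equals the sum of the individual mean gaps, as expected when the gaps of independent systems add.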
Mixing Two Systems.
In a mixed system, the physics of the individual systems are combined, and there is no general formula for the resulting gap distribution because . We study a model in which the surfaces of shapes are coated with chemical colors, and we denote mixed systems by (Fig. 5A, Right). The energy is the sum of the shape and color interactions, but the color interaction energy implicitly depends on the shape; only when the surfaces are near each other can the color-dependent interaction matter. We assume a distance dependence of the color interaction, with length scale , such that the energy of interaction decays as for two surfaces separated at a distance h.
We can intuitively understand how the mixed model differs from the linked model by examining random off-target pairs, as shown in Fig. 5A. In the linked model, crosstalk arises from accidental matches in either independent channel; hence the crosstalk is simply the sum of the number of matching sites in the two channels. However, in the mixed model, crosstalk in the color channel can arise at a site only when there is an accidental match in both color and shape channels at that site. For example, in Fig. 5A, all three matching color sites contribute to crosstalk in the linked model. However, in the mixed model, these three sites are not accidentally matched in the shape channel; because the three color sites are not in contact, they do not contribute to crosstalk. As a result, off-target binding is generally weaker and the typical gap higher in the mixed model, as we find in Fig. 5B. Thus, the mixing of shape and color in this interactive manner increases the capacity.
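The difference between the two ways of counting crosstalk can be made concrete with a small example. The sketch below assumes integer bar heights, a contact-only shape interaction, and a color interaction of vanishing range, so that a color match counts in the mixed case only at sites where the surfaces touch. The profiles, color sequences, and function names are invented for illustration and do not reproduce Fig. 5A.

```python
def contact_sites(lock_shape, other_shape):
    """Flags for sites that touch when a lock meets the key cut for another lock."""
    gaps = [a - b for a, b in zip(lock_shape, other_shape)]
    closest = min(gaps)
    return [g == closest for g in gaps]

def linked_crosstalk(lock_shape, other_shape, lock_colors, other_colors):
    """Independent channels: shape contacts plus all accidental color matches."""
    shape_part = sum(contact_sites(lock_shape, other_shape))
    color_part = sum(a == b for a, b in zip(lock_colors, other_colors))
    return shape_part + color_part

def mixed_crosstalk(lock_shape, other_shape, lock_colors, other_colors):
    """Colors painted on shapes: a color match counts only where surfaces touch."""
    touching = contact_sites(lock_shape, other_shape)
    shape_part = sum(touching)
    color_part = sum(t and a == b
                     for t, a, b in zip(touching, lock_colors, other_colors))
    return shape_part + color_part

if __name__ == "__main__":
    lock_shape, other_shape = [0, 1, 2, 1, 0], [0, 0, 1, 2, 2]
    lock_colors, other_colors = "RGBRG", "RGBBR"
    print(linked_crosstalk(lock_shape, other_shape, lock_colors, other_colors))  # 4
    print(mixed_crosstalk(lock_shape, other_shape, lock_colors, other_colors))   # 1
```

Here three color sites match by accident, but none of them lies at the single point of shape contact, so they add to the crosstalk only when the two channels are counted independently.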
We may further examine how the capacity changes as a function of , the interaction range of the color system. When is small compared with δ, the maximum height of local shape features, shape features can be easily distinguished by the color force and so the color and shape work in concert to increase capacity. Increasing blurs the shape contours and the color interactions no longer distinguish shapes, thereby becoming less specific. Indeed, Fig. 5C shows that when becomes large, the color system and the shape system act independently, and the capacity relaxes to the capacity of the linked system Shape Color.
In summary, laying out color-based codes on undulating surfaces significantly reduces the total crosstalk because color-matched sites must also be matched in shape to contribute to crosstalk. Such color–shape synergy persists as long as the spatial range of color interactions is shorter than the length scale of shape variation.
Discussion
Here we have shown that mutual information provides a general metric for specificity, bounding the number of distinct lock–key pairs that can be supported by systems of programmable specific affinities. Mutual information is well suited as a measure of specificity for many reasons. First, mutual information is a global measure of specificity, accounting for all possible interactions between N species of locks and keys. Second, as a result, it provides a precise answer as to how many particle pairs can be productively used in a given system. As N is increased, crosstalk necessarily increases as we crowd the space of possible components (Fig. 1B) (25), with more and more lock–key pairs. Capacity is determined by the point at which the information gain due to larger N is negated by the increase in crosstalk.
Third, we can use mutual information to quantitatively compare disparate types of programmable interactions, from DNA hybridization to depletion-driven interactions. Our framework can also quantitatively predict how varying physical parameters (e.g., depletion particle size, range of interactions, elastic modulus of shapes) raises or lowers specificity. The models we discuss can be further refined in various ways, for example by allowing DNA strands to fold, examining shapes in three dimensions, or taking into account the entropic effects of multivalency and avidity (40).
Using such an approach, we found that (i) shape complementarity intrinsically suffers less crosstalk than color (i.e., chemical specificity)-based interactions and (ii) multiple physical interactions, such as color-based and shape-based interactions, can be combined in a synergistic manner, giving a capacity that is greater than the sum of the parts. Such predictions are especially valuable given the proliferation of methods for creating and combining distinct mechanisms of specificity: mutual information provides an unbiased way of comparing their efficacy. As programmable specificity continues to drive technological developments in self-assembly (41), understanding how the mutual information of paired components can be built up toward creating larger, multicomponent objects is a critical future direction of this work.
Although we focus on applications to colloidal systems, we note that the framework developed here can be used to study biological systems as well. In 1894, Emil Fischer proposed the “lock and key” model as an analogy for understanding enzyme–substrate specificity (42), focusing on the physical shapes of paired interacting components; mutual information encompasses this idea and is applicable to a large number of biological systems. In particular, our model is useful for predicting the differences between interacting proteins that use shape complementarity alone and those that combine both shape and electrostatic complementarity (e.g., Dpr-DIP vs. Dscam proteins) (43) and may also be applied to a host of other biological interaction networks (22) where information transmission and pair specificity play critical roles in biological function [e.g., histidine kinase/response regulator proteins (44) and the immune system (25)]. Crucially, the mutual information model provided above is flexible enough to be extended to some of the challenging physics encountered in biology. Nonequilibrium systems can be accounted for by computing time-dependent probabilities of interactions instead of the equilibrium probabilities, whereas hypotheses for increased specificity like “induced fit” and recent variants (35) can be tested directly for their impact on capacity.
In this work, we have shown that mutual information is a powerful tool to describe diverse specificity models. The strength of our framework is that it is broadly applicable—it may immediately be applied to any system for which the pairwise energies of interactions are known, both in biology and in synthetic experiments. We believe that using the capacity as a measure of system specificity will provide a simple metric for analyzing, comparing, and optimizing systems of programmable interactions.
Acknowledgments
We thank Elizabeth Chen, Mikhail Tikhonov, Matthew Pinson, and Lucy Colwell for helpful discussions. This research was funded by the National Science Foundation through Grant DMR-1435964, the Harvard Materials Research Science and Engineering Center Grant DMR-1420570, and the Division of Mathematical Sciences Grant DMS-1411694. M.P.B. is an investigator of the Simons Foundation.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1520969113/-/DCSupplemental.
References
- 1. Randhawa JS, Kanu LN, Singh G, Gracias DH. Importance of surface patterns for defect mitigation in three-dimensional self-assembly. Langmuir. 2010;26(15):12534–12539. doi: 10.1021/la101188z.
- 2. Gracias DH, Tien J, Breen TL, Hsu C, Whitesides GM. Forming electrical networks in three dimensions by self-assembly. Science. 2000;289(5482):1170–1172. doi: 10.1126/science.289.5482.1170.
- 3. Winfree E. California Institute of Technology; Pasadena, CA: 1998. Algorithmic self-assembly of DNA. PhD dissertation.
- 4. Ke Y, Ong LL, Shih WM, Yin P. Three-dimensional structures self-assembled from DNA bricks. Science. 2012;338(6111):1177–1183. doi: 10.1126/science.1227268.
- 5. Mirkin CA, Letsinger RL, Mucic RC, Storhoff JJ. A DNA-based method for rationally assembling nanoparticles into macroscopic materials. Nature. 1996;382(6592):607–609. doi: 10.1038/382607a0.
- 6. Alivisatos AP, et al. Organization of ‘nanocrystal molecules’ using DNA. Nature. 1996;382(6592):609–611. doi: 10.1038/382609a0.
- 7. Valignat M-P, Theodoly O, Crocker JC, Russel WB, Chaikin PM. Reversible self-assembly and directed assembly of DNA-linked micrometer-sized colloids. Proc Natl Acad Sci USA. 2005;102(12):4225–4229. doi: 10.1073/pnas.0500507102.
- 8. Biancaniello PL, Kim AJ, Crocker JC. Colloidal interactions and self-assembly using DNA hybridization. Phys Rev Lett. 2005;94(5):058302–058307. doi: 10.1103/PhysRevLett.94.058302.
- 9. Nykypanchuk D, Maye MM, van der Lelie D, Gang O. DNA-guided crystallization of colloidal nanoparticles. Nature. 2008;451(7178):549–552. doi: 10.1038/nature06560.
- 10. Mason TG. Osmotically driven shape-dependent colloidal separations. Phys Rev E Stat Nonlin Soft Matter Phys. 2002;66(6 Pt 1):060402. doi: 10.1103/PhysRevE.66.060402.
- 11. Mason TG. 2015. Process for creating shape-designed particles in a fluid. US patent publication US 9090860 B2 (July 28, 2015).
- 12. Zhao K, Mason TG. Directing colloidal self-assembly through roughness-controlled depletion attractions. Phys Rev Lett. 2007;99(26):268301. doi: 10.1103/PhysRevLett.99.268301.
- 13. Zhao K, Mason TG. Suppressing and enhancing depletion attractions between surfaces roughened by asperities. Phys Rev Lett. 2008;101(14):148301. doi: 10.1103/PhysRevLett.101.148301.
- 14. Hernandez CJ, Mason TG. Colloidal alphabet soup: Monodisperse dispersions of shape-designed lithoparticles. J Phys Chem C. 2007;111:4477–4480.
- 15. Sacanna S, Irvine WT, Chaikin PM, Pine DJ. Lock and key colloids. Nature. 2010;464(7288):575–578. doi: 10.1038/nature08906.
- 16. Ye X, et al. Morphologically controlled synthesis of colloidal upconversion nanophosphors and their shape-directed self-assembly. Proc Natl Acad Sci USA. 2010;107(52):22430–22435. doi: 10.1073/pnas.1008958107.
- 17. Denkov N, Tcholakova S, Lesov I, Cholakova D, Smoukov SK. Self-shaping of oil droplets via the formation of intermediate rotator phases upon cooling. Nature. 2015;528(7582):392–395. doi: 10.1038/nature16189.
- 18. Zhao K, Bruinsma R, Mason TG. Entropic crystal-crystal transitions of Brownian squares. Proc Natl Acad Sci USA. 2011;108(7):2684–2687. doi: 10.1073/pnas.1014942108.
- 19. Rossi L, et al. Shape-sensitive crystallization in colloidal superball fluids. Proc Natl Acad Sci USA. 2015;112(17):5286–5290. doi: 10.1073/pnas.1415467112.
- 20. Miszta K, et al. Hierarchical self-assembly of suspended branched colloidal nanocrystals into superlattice structures. Nat Mater. 2011;10(11):872–876. doi: 10.1038/nmat3121.
- 21. Yang S-M, Kim S-H, Lim J-M, Yi G-R. Synthesis and assembly of structured colloidal particles. J Mater Chem. 2008;18:2177–2190.
- 22. Johnson ME, Hummer G. Nonspecific binding limits the number of proteins in a cell and shapes their interaction networks. Proc Natl Acad Sci USA. 2011;108(2):603–608. doi: 10.1073/pnas.1010954108.
- 23. King NP, et al. Computational design of self-assembling protein nanomaterials with atomic level accuracy. Science. 2012;336(6085):1171–1174. doi: 10.1126/science.1219364.
- 24. Lai Y-T, King NP, Yeates TO. Principles for designing ordered protein assemblies. Trends Cell Biol. 2012;22(12):653–661. doi: 10.1016/j.tcb.2012.08.004.
- 25. Perelson AS, Oster GF. Theoretical studies of clonal selection: Minimal antibody repertoire size and reliability of self-non-self discrimination. J Theor Biol. 1979;81(4):645–670. doi: 10.1016/0022-5193(79)90275-3.
- 26. Segel LA, Perelson AS. Shape space: An approach to the evaluation of cross-reactivity effects, stability and controllability in the immune system. Immunol Lett. 1989;22(2):91–99. doi: 10.1016/0165-2478(89)90175-2.
- 27. Murugan A, Zou J, Brenner MP. Undesired usage and the robust self-assembly of heterogeneous structures. Nat Commun. 2015;6:6203. doi: 10.1038/ncomms7203.
- 28. Jacobs WM, Reinhardt A, Frenkel D. Communication: Theoretical prediction of free-energy landscapes for complex self-assembly. J Chem Phys. 2015;142(2):021101. doi: 10.1063/1.4905670.
- 29. Hedges LO, Mannige RV, Whitelam S. Growth of equilibrium structures built from a large number of distinct component types. Soft Matter. 2014;10(34):6404–6416. doi: 10.1039/c4sm01021c.
- 30. Podgornaia AI, Laub MT. Determinants of specificity in two-component signal transduction. Curr Opin Microbiol. 2013;16(2):156–162. doi: 10.1016/j.mib.2013.01.004.
- 31. Milenkovic O, Kashyap N. Coding and Cryptography. Springer; Berlin: 2006. pp. 100–119.
- 32. Myers CR. Satisfiability, sequence niches and molecular codes in cellular signalling. IET Syst Biol. 2008;2(5):304–312. doi: 10.1049/iet-syb:20080076.
- 33. Schneider TD. Information content of individual genetic sequences. J Theor Biol. 1997;189(4):427–441. doi: 10.1006/jtbi.1997.0540.
- 34. Itzkovitz S, Tlusty T, Alon U. Coding limits on the number of transcription factors. BMC Genomics. 2006;7:239. doi: 10.1186/1471-2164-7-239.
- 35. Savir Y, Tlusty T. 2009. Molecular recognition as an information channel: The role of conformational changes. 43rd Annual Conference on Information Sciences and Systems, 2009 (IEEE, Baltimore), pp 835–840.
- 36. Wu K-T, et al. Polygamous particles. Proc Natl Acad Sci USA. 2012;109(46):18731–18736. doi: 10.1073/pnas.1207356109.
- 37. Adleman LM. Molecular computation of solutions to combinatorial problems. Science. 1994;266(5187):1021–1024. doi: 10.1126/science.7973651.
- 38. Alon N. The Shannon capacity of a union. Combinatorica. 1998;18:301–310.
- 39. Cover TM, Thomas JA. Elements of Information Theory. Wiley-Interscience; New York: 2012.
- 40. Kane RS. Thermodynamics of multivalent interactions: Influence of the linker. Langmuir. 2010;26(11):8636–8640. doi: 10.1021/la9047193.
- 41. Zeravcic Z, Manoharan VN, Brenner MP. Size limits of self-assembled colloidal structures made using specific interactions. Proc Natl Acad Sci USA. 2014;111(45):15918–15923. doi: 10.1073/pnas.1411765111.
- 42. Fischer E. 1894. Einfluss der configuration auf die wirkung der enzyme [Influence of configuration on the action of enzymes]. Berichte der Deutschen Chemischen Gesellschaft 27:2985–2993. German.
- 43. Carrillo RA, et al. Control of synaptic connectivity by a network of Drosophila IgSF cell surface proteins. Cell. 2015;163(7):1770–1782. doi: 10.1016/j.cell.2015.11.022.
- 44. Podgornaia AI, Laub MT. Protein evolution. Pervasive degeneracy and epistasis in a protein-protein interface. Science. 2015;347(6222):673–677. doi: 10.1126/science.1257360.