Author manuscript; available in PMC: 2019 Feb 4.
Published in final edited form as: Nat Mater. 2016 Apr;15(4):366–370. doi: 10.1038/nmat4594

Nucleic acid memory

Victor Zhirnov, Reza M Zadegan, Gurtej S Sandhu, George M Church, William L Hughes
PMCID: PMC6361517  NIHMSID: NIHMS1003852  PMID: 27005909

Abstract

Nucleic acid memory has a retention time far exceeding that of electronic memory. As an alternative storage medium, DNA offers higher information density and lower energy of operation than flash memory.


Information and communication technologies generate vast amounts of data that will far eclipse today’s data flows (Fig. 1). Memory materials must therefore be suitable for high-volume manufacturing. At the same time, they must have elevated information stability and limit the energy consumption and trailing environmental impacts that such flows will demand. Analysts estimate that global memory demand, at 3 × 10²⁴ bits, will exceed projected silicon supply in 2040 (Fig. 1b and Supplementary Information sections 1 and 2). To meet such requirements, flash-memory manufacturers would need ~10⁹ kg of silicon wafers even though the total projected wafer supply is ~10⁷–10⁸ kg (Supplementary Figs 1 and 2). Such forecasts motivate an exploration of unconventional materials with cost-competitive performance attributes. With information retention times that range from thousands to millions of years, volumetric density 10³ times greater than flash memory and energy of operation 10⁸ times less, we believe that DNA used as a memory-storage material in nucleic acid memory (NAM) products promises a viable and compelling alternative to electronic memory.

Figure 1 |.

Change of storage needs over time. a, Timeline of stored analogue, digital and total data (ref. 48) where the percentage values refer to the fraction of stored digital data. b, Projected global memory demand. Actual (filled circles: i, ref. 49; ii, ref. 50; iii, ref. 51) and projected (open circles; iv, ref. 51; v, ref. 52) data points fall between the conservative estimate and the upper bound. See also Supplementary Information section 1.

In this Commentary, we discuss the information retention, density and energetics of NAM, specifically related to DNA, for non-biological and non-volatile memory applications, ranging from letters to libraries. The potential of NAM has often been dismissed, as nucleic acids are believed by some to be fragile and therefore unreliable. This is not the case. For example, the room-temperature half-life of ancient DNA exceeds 100 years (refs 1,2). Indeed, the complete genomes of an ~50,000-year-old Neanderthal (ref. 3) recovered from Siberia and an ~700,000-year-old horse (ref. 4) recovered from the Arctic permafrost (approximate average temperature −4 °C) have been sequenced. Still, the long-term stability of DNA and its decay kinetics are poorly understood at a per-bit (that is, base) level. As an energy-barrier model shows (Methods), DNA has a retention time far exceeding that of electronic memory, and it can store information reliably over time. Through first-principle calculations, DNA has been validated as a model material for future NAM products (Supplementary Information section 8). Therefore, we call for increased cooperation between the biotechnology and semiconductor sectors to pair previously unfathomable technological advances (such as those from the Human Genome Project) with the scaling expertise of the semiconductor industry.

Nucleic acid memory as a material

As a material, nucleic acids are negatively charged polyelectrolytes with four monomers (the nucleotides A, T or U, C and G). Monomers are covalently bonded to form polymer chains. Once polymerized, an individual chain can hydrogen-bond with itself or with other chains that satisfy base complementarity. These attributes endow nucleic acids with the power of molecular self-assembly, which is made possible by the thermal fluctuations between complementary hydrogen bonds during Watson–Crick hybridization. During DNA hybridization, adenine (A) forms a base-pair with thymine (T), and guanine (G) pairs with cytosine (C). In RNA, thymine is substituted by uracil (U). By encoding sequence complementarity, molecular self-assembly can be exploited to pull nucleic acids like a rope (ref. 5), weave them like a fabric (refs 6,7), decorate them like a scaffold (refs 8,9) and recycle them like a thermoplastic (ref. 10). Beyond their recyclability, nucleic acids and potential NAM products would also have cradle-to-cradle manufacturing capacity (ref. 11) to reduce waste and environmental degradation. For example, NAM-device manufacturers could leverage existing biological feedstock (such as fish eggs) or agricultural waste (such as remainders from harvested plants) as manufacturing input streams (ref. 12).

As a biological material, nucleic acids manage the information of life for life. Similar to a library, they store, organize and regulate genetic information to build and maintain vital ecosystems. Akin to a wiki, hereditary content evolves through gene insertion, deletion and modification of the nucleotide monomers, thereby sustaining information survival via mutation. With a base-4 quaternary code, combinatorial uniqueness in DNA and RNA scales as 4ⁿ, where n is the number of bases within each sequence. Although biomolecular machinery has evolved to work with four-letter alphabets, in the future combinatorial uniqueness could be expanded via chemical modifications of the bases and/or inclusion of alternative base pairs such as X and Y (ref. 13). When information bits are encoded into polymer strings, researchers and manufacturers can manage and manipulate physical, chemical and biological information with standard molecular biology techniques (ref. 14) and toehold-mediated strand displacement (ref. 15). Examples include actuating DNA like a machine (refs 15–17), flipping it like a switch (refs 18,19), programming it like a computer (ref. 20) and storing it like a time capsule (refs 21,22).
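To make the base-4 code concrete, the following is a minimal sketch of packing binary data into a nucleotide string at 2 bits per base. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and all function names are assumptions chosen for illustration; this is not the encoding used in refs 21 and 22, and a practical scheme would add error correction on top.

```python
BASES = "ACGT"  # hypothetical mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Pack each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Invert encode(); the strand length must be a multiple of four bases."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def distinct_sequences(n: int) -> int:
    """Number of distinct sequences of n bases: 4**n."""
    return 4 ** n


if __name__ == "__main__":
    strand = encode(b"NAM")
    print(strand, decode(strand), distinct_sequences(len(strand)))
```

Each added base multiplies the number of distinguishable sequences by four, which is the 4ⁿ scaling described above.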

As a memory material, nucleic acids are also information-dense, programmable polymers with digitally encoded, stored and retrievable data. When compared with contemporary memory materials, they do not require lithography, and hence are a cost-competitive alternative for high-volume and high-density memory manufacturing. In addition, the non-volatile nature of nucleic acids and their low energy of operation in living cells (10⁸ times more efficient than flash memory, the industry gold standard; Supplementary Fig. 3) are further advantages. Moreover, information encoded into nucleic acids has a volumetric storage density far exceeding electronic-memory projections (Table 1 and Supplementary Information sections 4 and 5). For example, a haploid human genome stores 6 × 10⁹ bits of information per cell, whereas the storage density of Escherichia coli is ~10¹⁹ bit cm⁻³ (Supplementary Information section 5). If such density could be achieved in information-storage systems, all of today’s global storage needs (~10²² bits) would fit in a 10 × 10 × 10 cm³ box, and ~1 kg of DNA would satisfy projected world storage needs in 2040 (~2 × 10²⁴ bits; Fig. 1b and Supplementary Information section 1).
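The volume and mass estimates in this paragraph can be checked with a few lines of arithmetic. The sketch below is one possible way to arrive at the same orders of magnitude, not the calculation from the Supplementary Information: it assumes double-stranded DNA carrying 2 bits per base pair and an average base-pair mass of about 650 g per mole, neither of which is stated in the text.

```python
AVOGADRO = 6.02214076e23           # per mole (exact SI value)
DENSITY_BIT_PER_CM3 = 1e19         # E. coli storage density quoted in the text
BITS_PER_BASE_PAIR = 2             # assumption: four letters -> 2 bits per base pair
BASE_PAIR_MASS_G_PER_MOL = 650.0   # assumption: approximate average for double-stranded DNA


def storage_volume_cm3(bits: float) -> float:
    """Volume needed to hold `bits` at the cellular storage density."""
    return bits / DENSITY_BIT_PER_CM3


def dna_mass_kg(bits: float) -> float:
    """Mass of double-stranded DNA needed to hold `bits` under the assumptions above."""
    base_pairs = bits / BITS_PER_BASE_PAIR
    grams = base_pairs / AVOGADRO * BASE_PAIR_MASS_G_PER_MOL
    return grams / 1000.0


today_volume = storage_volume_cm3(1e22)      # today's global storage needs
cube_side_cm = today_volume ** (1.0 / 3.0)   # edge length of an equivalent cube
mass_2040 = dna_mass_kg(2e24)                # projected 2040 storage needs

print(f"volume: {today_volume:.0f} cm^3, cube edge: {cube_side_cm:.1f} cm")
print(f"DNA mass for 2040 demand: {mass_2040:.2f} kg")
```

Under these assumptions the volume comes to 1,000 cm³ (a 10 cm cube) and the mass to roughly 1 kg, in line with the estimates above.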

Table 1 | Comparison between baseline memory technologies and projected DNA memory.

| Metrics | Hard disk | Flash memory | DRAM | Cellular DNA |
| --- | --- | --- | --- | --- |
| Read/write latency | ~3–5 ms per bit* | ~100 μs per bit* | <10 ns per bit | <100 μs per bit |
| Retention | >10 years* | ~10 years* | ~64 ms* | >100 years§ |
| ON power | ~0.04 W per GB* | ~0.01–0.04 W per GB* | ~0.4 W per GB | <10⁻¹⁰ W per GB |
| Areal density | ~10¹¹ bit cm⁻²* | ~10¹⁰ bit cm⁻² | ~10⁹ bit cm⁻² | Not available |
| Volumetric density | ~10¹³ bit cm⁻³ | ~10¹⁶ bit cm⁻³ | ~10¹³ bit cm⁻³ | ~10¹⁹ bit cm⁻³ |

* Representative values currently in production (ref. 47).

Projected fundamental limit (see Supplementary Information section 6).

§ Based on empirical evidence from studies of ancient DNA.

DRAM, dynamic random access memory.

Scaling of NAM

Product scaling reduces cost and increases profit. In no industry is this more visible than in memory manufacturing, which achieves near-zero marginal costs of production and maximized profits when the cost of producing an additional unit becomes insignificant. Mirroring the success of Moore’s Law, product scaling in (and integration between) biotechnology, synthetic biology, digital biology and NAM is expected to follow that of semiconductor manufacturing. We believe that the degree of integration will increase as (i) the Materials Genome Initiative (ref. 23) expedites material discovery and deployment, (ii) the biotechnology industry comes up with approaches that exceed Moore’s Law, (iii) the semiconductor industry embraces ‘more-than-Moore’ integration (ref. 24) and (iv) the Semiconductor Synthetic Biology (SemiSynBio) Roadmap drives information technologies towards nucleic-acid-based lithography and memory (ref. 25).

As with efforts such as the Human Genome Project, the engineering challenge for NAM is to identify a solution capable of managing, storing and accessing sheer volumes of information. Historically, the Human Genome Project took 13 years, 40 institutions and US$3.8 billion to complete. However, today a single sequencer in one lab can process a genome in one day, 14,000 times faster and 2 million times less expensive than it was for the 40 labs combined (ref. 26), in part because of the technological advances resulting from that effort. More specifically, the sequencing price per megabase (Mb) was US$31,250 in 2002; today it is about US$0.63 per Mb, a 50,000-fold decrease. According to the National Human Genome Research Institute, the cost to sequence a genome has outpaced Moore’s Law since 2008 (ref. 27). Further improvements in sequencing throughput (>10⁴) and parallelization (>10⁷) are expected in the next 5 years (ref. 26). Emerging technologies such as nanopore sequencing will further reduce errors, cost, time and energetics during reading (ref. 28). Current costs to read and write DNA are ~10⁻⁷ and ~10⁻⁴ US$ per bit, respectively (ref. 29). Although the cost to sequence DNA has dropped from ~0.1 to ~10⁻⁷ US$ per bit in 10 years, synthesis costs have decreased more slowly than sequencing costs yet still considerably faster than Moore’s Law.
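The price figures quoted here are mutually consistent, as a short calculation shows. The sketch below uses only the per-megabase prices from the text; the conversion to a per-bit read cost additionally assumes 2 bits per base, which is an illustrative assumption rather than a figure from ref. 29.

```python
PRICE_PER_MB_2002_USD = 31_250.0   # sequencing price per megabase in 2002 (from the text)
PRICE_PER_MB_NOW_USD = 0.63        # sequencing price per megabase "today" (from the text)
BASES_PER_MB = 1_000_000
BITS_PER_BASE = 2                  # assumption: four-letter alphabet, 2 bits per base

# Ratio of the two quoted prices
fold_decrease = PRICE_PER_MB_2002_USD / PRICE_PER_MB_NOW_USD

# Price per megabase converted to a price per stored bit
read_cost_per_bit_usd = PRICE_PER_MB_NOW_USD / (BASES_PER_MB * BITS_PER_BASE)

print(f"fold decrease since 2002: {fold_decrease:,.0f}")
print(f"read cost per bit: {read_cost_per_bit_usd:.2e} US$")
```

The ratio rounds to the 50,000-fold decrease stated above, and the per-bit figure is a few times 10⁻⁷ US$, the same order of magnitude as the quoted read cost.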

Similar to cell-based information-processing systems (ref. 30), compartmentalization is essential for NAM because it reduces the reaction volume and decreases the energy of operation. By scaling down the reaction volume via microfluidics, it is feasible to gain substantial energy and cost savings for reading and writing processes (refs 31,32). Further cost savings are possible with light-directed chemistry in combination with lithography (ref. 33). For example, photoelectrochemical DNA synthesis may increase throughput (ref. 34). In addition, DNA laser printing has a target cost of 10⁻⁶ US$ per bit and a potential error rate of 1 ppm (ref. 35). Moreover, reading (ref. 36) and writing (ref. 37) via nanopores (ref. 38) could make NAM faster and less expensive.

Storage capacity of NAM

Enabled by exponential progress in DNA synthesis and sequencing, an ~1,000× increase in DNA storage was made possible when DNA-based files became compatible with mainstream digital formats (refs 21,22). Two groups, one at Harvard University (ref. 21) and another at the European Molecular Biology Laboratory (ref. 22), have independently validated that books, images and audio files can be written into DNA and then read back without error. More recently, information has been encoded into DNA via an error-correction code (ECC), and DNA’s information retention has been improved to an estimated ~2,000 years at 10 °C and ~2,000,000 years at −18 °C by the encapsulation of the DNA into silica (ref. 39). A more complete analysis could include evaluating trade-offs between the per-base stability and the ECC overhead (such as power, time and density). Regardless, DNA is becoming viable for non-biological and non-volatile NAM applications that require archival storage capacity greater than 100 years with infrequent access. Such archives would include massive scientific, financial, governmental, historical, genealogical, personal and genetic records.

Performance of NAM

DNA-stability and hence NAM-retention values have so far been empirical because source-material defects are ignored in the polymerase chain reactions and sequencing of ancient DNA (ref. 40; see Supplementary Information sections 3 and 4 for a detailed system-level analysis). Moreover, the perceived fragility of biological matter is a legitimate concern for practical storage technologies. That being said, perceptions can be misleading when information loss is viewed within the cell. For example, the number of DNA degradation events in a single human cell is estimated to be 10⁴–10⁵ per day (ref. 41). Hydrolysis is the dominant mechanism for information loss in DNA (Fig. 2; Supplementary Information sections 7 and 8), as it contributes to depurination, deamination (C to U changes) and backbone cleavage. In comparison, NAM storage outside of a cell is attractive for archival storage because evolution does not occur and hydrolysis can be controlled.

Figure 2 |.

Mechanism of NAM information degradation by hydrolysis. a, Simulated model of a single-stranded DNA molecule in an aqueous environment. b, Illustrated mechanisms of depurination (red crosses in place of DNA bases), backbone cleavage (red cross in the DNA backbone), and point mutations resulting from deoxycytidine deamination (red crosses next to DNA bases).

We have carried out a NAM-retention analysis using a generic energy-barrier model for memory devices (Fig. 3a), consisting of a storage node with information-bearing particles ‘protected’ by an energy barrier. The information-bearing particles can be electrons (as in flash memory), magnetic domains (as in hard disks), atoms/ions or molecular fragments such as nucleotides in DNA. The energy-barrier model for a memory element is assumed to be analogous to the kinetic-barrier model for chemical reactions (Fig. 3b) and is applied to several mechanisms of state loss in NAM (Supplementary Information section 8).

Figure 3 |.

Generic barrier models and memory retention. a, Schematic of a generic barrier model for memory devices, consisting of a storage node with information-bearing particles ‘protected’ by a barrier. Eb, energy-barrier height. b, A similar barrier model is used in chemical kinetics applied to chemical reactions. c, Summary of calculated memory-retention times for DNA and NAND flash memory in both air and water.

In memory devices, the informational state is created by the presence or absence of information-bearing particles in the storage node. In order to prevent state loss and hence information loss, the storage node is defined by energy barriers of sufficient height, Eb (Fig. 3a). Two types of unintended energy transitions can occur: classical (thermally excited over-barrier transitions) and quantum (through-barrier tunnelling). As a result of the very small electron mass, quantum transitions are the dominant factor that limits the minimum bit size in electron-based memory to ~10–15 nm (Supplementary Information section 6). For heavier particles such as atoms or molecular fragments, quantum transitions are suppressed, which affords greater storage densities (Supplementary Information section 4). For example, we demonstrated memory cells of less than 10 nm for nanoionic memories such as RRAM (resistive random-access memory; ref. 42). In comparison, the characteristic bit size in DNA is ~0.3 nm (Supplementary Information section 5). For memory elements using heavy particles, the state losses and thus retention times are governed by classical transitions, which occur when the particle jumps over the barrier (Fig. 3a) and the particle’s kinetic energy E at temperature T is larger than Eb.

We provide a summary of the theoretical retention times, in both water and dry air, calculated (equation 6 in Methods) for two main degradation mechanisms, depurination and deamination (Fig. 3c). In water at −4 °C, depurination and deamination retention times for DNA are 1,108,965 and 283,001 years, respectively, which makes the estimated age of the ~700,000-year-old horse (ref. 4) reasonable and compares favourably with the retention time for NAND flash memory, the industry gold standard. A third degradation mechanism, backbone cleavage, occurs at depurinated sites, and is therefore a dependent event (Supplementary Information section 8). The calculated results confirm that DNA stability and hence NAM retention values range, at 10 °C, from 2 × 10⁴ years in water to 2 × 10⁷ years in air (at 20 °C the values are 34 × 10² and 4 × 10⁶ years in water and air, respectively). The results also confirm the importance of hydrolysis, where a drastic improvement in retention time is shown by changing the environment from wet to dry. The effect of hydrolysis is greater than that of temperature, which is consistent with experimental observations (ref. 43).
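The relative size of the two effects can be read directly from the retention values quoted above. The sketch below compares the wet-to-dry factor with the 20 °C to 10 °C factor, and then back-calculates the single effective barrier height that an Arrhenius-type dependence would need to reproduce the temperature factor. That last step is only an illustration: it assumes one barrier and a temperature-independent attempt frequency, the inputs are rounded values from the text, and the result is not a parameter reported in the Supplementary Information.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin

# Retention times in years, as quoted in the text
WATER_10C, AIR_10C = 2e4, 2e7
WATER_20C, AIR_20C = 34e2, 4e6

environment_factor = AIR_10C / WATER_10C    # gain from wet -> dry at 10 C
temperature_factor = WATER_10C / WATER_20C  # gain from 20 C -> 10 C in water


def effective_barrier_ev(t_cold: float, t_warm: float,
                         temp_cold_c: float, temp_warm_c: float) -> float:
    """Barrier implied by t ~ exp(E_b / (k_B T)) for two retention times.

    Illustrative only: assumes a single barrier and a constant attempt frequency.
    """
    inv_t_diff = 1.0 / (temp_cold_c + 273.15) - 1.0 / (temp_warm_c + 273.15)
    return K_B_EV_PER_K * math.log(t_cold / t_warm) / inv_t_diff


barrier_water_ev = effective_barrier_ev(WATER_10C, WATER_20C, 10.0, 20.0)
barrier_air_ev = effective_barrier_ev(AIR_10C, AIR_20C, 10.0, 20.0)

print(f"wet -> dry factor at 10 C: {environment_factor:.0f}")
print(f"20 C -> 10 C factor in water: {temperature_factor:.1f}")
print(f"implied effective barriers: {barrier_water_ev:.2f} eV (water), {barrier_air_ev:.2f} eV (air)")
```

Removing water changes the retention time by about three orders of magnitude, whereas a 10 °C temperature change alters it by less than one, which is the comparison the paragraph draws.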

Deamination is the highest source of information loss in ancient DNA (Fig. 3c) and has the lowest energy barrier (Supplementary Information section 8). To combat information loss in practical memory or storage systems, ECCs are widely used (ref. 44). Fortunately, DNA is easy to copy, which decreases the ECC overhead and thus makes error correction a primary factor for data integrity. Nevertheless, quantifying the per-bit stability is the fundamental starting point for discussing information retrieval in NAM, which future algorithms should follow and the semiconductor industry should heed. Although the objectives of our barrier model are to validate NAM with DNA, our calculations have broader implications in biology. For example, they could provide a predictive model for DNA preservation.

Beyond retention, a comparison of DNA with mainstream memory technologies (Table 1) indicates that DNA is a viable option for hyperdense data-storage applications that require an ultralow energy of operation. For instance, compared to flash memory, the volumetric density of DNA is about three orders of magnitude larger (Supplementary Information sections 4 and 5) and its power requirements approximately eight orders of magnitude lower (Supplementary Information section 3). While both valid and compelling, the power requirements in cells do not directly translate to future NAM products. For example, additional energy may be needed to convert an information string made from DNA into a digitally readable format.

Outlook

To justify high-volume manufacturing of NAM, researchers and manufacturers must quantify naturally occurring defects at a per-bit level. DNA-stability and NAM-retention values have so far been empirical because defects within ancient DNA are ignored during data extraction. Through first-principle calculations (Supplementary Information section 8), we have directly compared NAM to electronic memory and explored its potential use in hyperdense data-storage products that require ultralow energies of operation.

Given exponentially increasing demands for safeguarded information worldwide, and the long retention times for DNA (ranging from thousands to millions of years), NAM can store the world’s information for future generations using far less space and energy. NAM could thus be used as a time capsule for massive, infrequently accessed records in scientific, financial, governmental, historical, genealogical, personal and genetic domains.

Unlike other memory technologies, NAM can be replicated into numerous physical copies of itself with high fidelity and low cost. In practice, the cost of NAM ranges from 3.6 × 10¹⁵ to 9 × 10¹⁶ bits per US dollar. Real-world applications range from hard drives to an information-management system for synthetic biology, to a platform for both watermarking and tracking genetic content, to next-generation encryption tools that necessitate physical (rather than electronic) embodiment. As with so many emerging technologies today, ethics and policy must light the way for NAM to flourish responsibly. In analogy to the Enigma machines in World War II, NAM has the potential to hide in plain sight. There are a few indicators of NAM’s bright future: a 70-gram handheld sequencing instrument (ref. 45) from Oxford Nanopore Technologies, and Illumina’s effort to develop a portable, smartphone-based DNA chip sequencer as a molecular stethoscope (ref. 46).

Challenges that must be addressed to develop NAM hard drives in high volume are daunting and include how to cost-effectively and expeditiously source, synthesize, recycle, up-cycle and reuse nucleic acids such as DNA for high-volume manufacturing. Equally challenging is the need to increase the speed and decrease the cost to predictably read, write, package and store DNA. In spite of these challenges, NAM has the potential to mirror Moore’s Law, ultimately promoting product scaling in, and integration between, biotechnology, synthetic biology and semiconductor fabrication. In support of this opportunity, public–private partnerships can engage with and align to the SemiSynBio Roadmap (ref. 26).

Methods

Retention calculations.

For memory elements using heavy particles, the state losses and thus retention are governed by classical transitions, which occur when the particle jumps over the barrier (Fig. 3a) and the particle’s kinetic energy $E$ at temperature $T$ is larger than $E_b$, the energy-barrier height. The corresponding probability for over-barrier transitions $P_c$ (referred to as a classic error probability) is obtained from the Boltzmann distribution ($k_B$, Boltzmann’s constant):

$$P_c = \exp\left(-\frac{E_b}{k_B T}\right) \tag{1}$$

The rate, $r$, at which the transitions occur is obtained by multiplying equation (1) by the number of collisions of the confined particles with the barrier per unit time, often referred to as the thermal attempt frequency, $f_0$:

$$r = f_0 \exp\left(-\frac{E_b}{k_B T}\right) \tag{2}$$

Equation (2) is commonly used to analyse memory devices (for example, magnetic, electronic or ionic), and is analogous to the Arrhenius equation for chemical reactions. As follows from this equation, the probability of a state loss by a memory element, and thus the retention time, depends on the attempt frequency $f_0$ and the barrier height $E_b$. In Supplementary Information section 8 these two parameters are discussed in regard to DNA degradation mechanisms. For each of the mechanisms, the lifetime $\Delta t$ can be defined through the probability that $n$ elements are destroyed. $P$ is the probability of ‘success’ in one trial. The number of trials $k$ during time interval $\Delta t$ is:

$$k = \Delta t \times f_0 \tag{3}$$

The probability that at least one element will be destroyed during a sampling time $\Delta t$ is:

$$\pi_k = 1 - (1 - P)^k \tag{4}$$

And the probability for $n$ elements to degrade during the interval $\Delta t$ is:

$$\pi_k^n = \left(1 - (1 - P)^k\right)^n \tag{5}$$

Assuming 50% of $n$ elements degrade, $\pi_k^n = 1/2$. From equations (1), (3) and (5), with $P = P_c$, we obtain:

$$\left[1 - \exp\left(-\frac{E_b}{k_B T}\right)\right]^{f_0 \Delta t} = 1 - 2^{-1/n} \tag{6}$$

or

$$\Delta t = \frac{1}{f_0}\,\frac{\ln\left(1 - 2^{-1/n}\right)}{\ln\left(1 - \exp\left(-\frac{E_b}{k_B T}\right)\right)} \tag{7}$$

The numerical values for $f_0$ and $E_b$ for different environments and degradation mechanisms are shown in Supplementary Information section 8.
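The retention-time expression can be evaluated numerically as follows. This is a minimal sketch of equations (1) and (7), not the authors' code: the function names are hypothetical, and the barrier height (1.3 eV) and attempt frequency (10¹³ s⁻¹) in the usage lines are placeholder values chosen only to exercise the formula, since the actual parameters are given in Supplementary Information section 8 and are not reproduced here. Because the error probability is extremely small, the logarithms are computed with `math.log1p` and `math.expm1` to avoid loss of precision.

```python
import math

K_B_EV_PER_K = 8.617333262e-5        # Boltzmann constant in eV per kelvin
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def error_probability(e_b_ev: float, temp_k: float) -> float:
    """Equation (1): P_c = exp(-E_b / (k_B T))."""
    return math.exp(-e_b_ev / (K_B_EV_PER_K * temp_k))


def retention_time_s(e_b_ev: float, temp_k: float, f0_hz: float, n: int = 1) -> float:
    """Equation (7): time at which the degradation probability reaches 1/2."""
    p_c = error_probability(e_b_ev, temp_k)
    # ln(1 - 2**(-1/n)), written with expm1 for accuracy at large n
    numerator = math.log(-math.expm1(-math.log(2.0) / n))
    # ln(1 - P_c), written with log1p for accuracy at very small P_c
    denominator = math.log1p(-p_c)
    return numerator / (f0_hz * denominator)


# Placeholder parameters (assumptions, NOT the values from the Supplementary Information)
E_B_EV = 1.3
F0_HZ = 1e13

for temp_c in (-4.0, 10.0, 20.0):
    t_years = retention_time_s(E_B_EV, temp_c + 273.15, F0_HZ) / SECONDS_PER_YEAR
    print(f"{temp_c:6.1f} C: {t_years:.3g} years")
```

For a single element (n = 1) and small $P_c$, equation (7) reduces to the familiar half-life $\ln 2 / (f_0 P_c)$, which gives a quick sanity check on any implementation.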

Acknowledgements

Research described in this Commentary was supported in part by the Micron Foundation, the Semiconductor Research Corporation, the National Institute of General Medical Sciences of the National Institutes of Health (K25GM093233), and the National Science Foundation (CMMI–1344915). Special thanks are given to K. Marker and D. Zahn for their thoughtful reviews of the manuscript.

Footnotes

Additional information

Supplementary information is available in the online version of the paper.

Competing financial interests

W.L.H. and R.M.Z. have received financial support to explore NAM technologies from the Semiconductor Research Corporation, the Function Accelerated nanomaterial Engineering Research Center, and from the Micron Foundation. G.M.C. has patents licensed to Oxford Nanopore Technologies, equity in Genia-Roche for nanopore sequencing, financial involvement in multiple next-generation sequencing and synthesis companies, and financial support from Technicolor on DNA-storage technologies.
