Abstract
In the mid-1930s, the English mathematician and logician Alan Turing invented an imaginary machine which could emulate the process by which human computers manipulate finite symbolic configurations. His machine launched the field of computer science and provided a foundation for the modern-day programmable computer. A decade later, building on Turing’s machine, the Hungarian–American mathematician John von Neumann invented an imaginary self-reproducing machine capable of open-ended evolution. Through his machine, von Neumann answered one of the deepest questions in Biology: Why is it that all living organisms carry a self-description in the form of DNA? The story behind how two pioneers of computer science stumbled on the secret of life many years before the discovery of the DNA double helix is not well known, not even to biologists, and you will not find it in biology textbooks. Yet, the story is just as relevant today as it was eighty years ago: Turing and von Neumann left a blueprint for studying biological systems as if they were computing machines. This approach may hold the key to answering many remaining questions in Biology and could even lead to advances in computer science.
Keywords: molecular computation, biological computation, finite state machine, structural biology, DNA polymerase
For centuries, human civilization has been obsessed with building automata—machines performing mechanical operations by following a predetermined set of instructions. From cuckoo clocks to temple gates opening without human intervention, automata were used as tools, religious spectacles, and prototypes to explain scientific principles. When they mimicked the behavior of animals, automata challenged what it meant to say something was alive. Around the middle of the 20th century, von Neumann became interested in building the ultimate “living” automaton: a self-reproducing machine which could evolve into more complex machines.
Widely regarded as one of the most influential minds of the 20th century, von Neumann made many fundamental contributions to the mathematical foundations of quantum mechanics, was one of the pioneers of game theory, and was the principal architect behind the logical and design principles of modern computers (1). In the early 1940s, von Neumann became interested in the nascent field of “cybernetics” (2), which is concerned with studying the behavior of animals and machines. Because both follow instructions within logical and mechanical constraints, scholars of cybernetics believed animals and machines would share much in common concerning information, communication, and control mechanisms (2).
In comparing natural and artificial machines, von Neumann found it intriguing that living organisms could evolve into more complex organisms over generations (3, 4). Such behavior, he reasoned, would be difficult to engineer into an artificial machine. If machine A were to construct machine B, it must contain a complete description of B. Moreover, A would also have to contain additional material to administer the construction of B. As a result, B could not be more complex than A, and the natural tendency would be for complexity to degenerate as one machine built another.
Turing’s Universal Computing Machine
To design a hypothetical machine that could build more complex machines, von Neumann took inspiration from Alan Turing, who a few years earlier had conceived a hypothetical “universal” machine that could compute anything any other machine could compute (5). Another intellectual giant of the 20th century and a pioneer of computer science and artificial intelligence, Turing is also well known for breaking the Nazi Enigma code during the Second World War.
Turing did not invent his machine to address any problem in Biology, but rather to settle the “decision problem.” Labeled by David Hilbert, the preeminent German mathematician of the time, as “the main problem of mathematical logic,” the problem asks for a general algorithm that can decide through a finite process whether or not an arbitrary mathematical statement is provable from a given set of axioms using the rules of logic. Axioms are statements taken to be true. An algorithm is a systematic process that follows rules to find solutions. The word derives from “Algorithmi,” the Latinized version of “Al-Khwarizmi,” the ninth-century Persian mathematician who first introduced algorithms to solve problems in algebra (the word algebra originates from the Arabic word “al-jabr,” which means “the reunion of broken parts”). When expressed in a particular language, an algorithm is called a computer program.
Von Neumann was renowned for his ability to produce mathematical proofs. But in 1926, commenting on Hilbert’s decision problem, he conjectured that it must have a negative solution and that “we have no idea how to prove this.” Ten years later, Turing showed the problem indeed has a negative solution. The conceptual difficulty was that one would have to try every conceivable program out of an astronomical number and show none of them works. Turing came up with a brilliant solution. It involved reducing computation into elementary steps that a simple machine could perform. By precisely defining what is computable, Turing used his abstract machine to prove that no general algorithm could decide whether a formula was provable. In 1936, at age 24, Turing published a landmark paper (5) which not only settled the decision problem but, perhaps even more importantly, laid a foundation for the new field of computer science and the general-purpose programmable computer, which materialized a few years later.
Turing’s imaginary machine consisted of an infinitely long ribbon of tape divided into discrete segments and a tape head that can scan the tape, write one symbol at a time, and move one segment to the right or left along it. To “remember” what it was doing from one step to the next, Turing allowed the machine to have different “states,” which he envisioned representing the different states of mind a person passes through when performing arithmetic. The machine then follows a set of rules given in the form of a transition table specifying, for every initial state and scanned symbol, a specific operation (e.g., move the head one segment to the left or right or write “0” or “1”) and a final state. For example, one rule might be: “If the tape head is in state A and scans a 0, write 1, move the head one segment to the right, and change to state B.” A transition rule may also instruct the machine not to change state, or to halt and complete its operation. Following the transition rules, the machine jumps from one state to another depending on the scanned symbol, each time performing a different operation. The input to the computation is the original symbols written on the tape, while the output is whatever is left written on it when the machine eventually halts.
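A Turing machine is compact enough to simulate in a few lines of code. The sketch below is a toy illustration: only the first rule is the example just given; the remaining rules and the input tape are invented to complete a runnable machine.

```python
from collections import defaultdict

# (state, scanned symbol) -> (symbol to write, head move, next state)
# Only the ("A", "0") rule comes from the text; the rest are invented.
RULES = {
    ("A", "0"): ("1", +1, "B"),
    ("A", "1"): ("1", +1, "A"),
    ("B", "0"): ("0", -1, "A"),
    ("B", "1"): ("1",  0, "halt"),
}

def run(tape_symbols, state="A", max_steps=1000):
    tape = defaultdict(lambda: "0", enumerate(tape_symbols))  # unbounded tape of "0"s
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol, move, state = RULES[(state, tape[head])]
        tape[head] = symbol
        head += move
    # The output is whatever is left on the tape when the machine halts.
    return "".join(tape[i] for i in sorted(tape))

print(run("0011"))  # -> 1111
```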
Despite its simplicity, Turing showed his machine could perform whatever computation any machine could perform. All he needed to do was to provide his machine with a description of the other machine, which he could do by encoding that machine’s transition table onto the tape. The transition table describing the machine was, in essence, the machine itself, and there could be infinitely many possible transition tables corresponding to infinitely many different machines. This generalized computation model is called a “Universal Turing machine” and forms the theoretical basis for the modern-day, general-purpose programmable computer.
Turing could use his abstract machine to enumerate all possible algorithms and prove there was no solution to the decision problem (5). If such an algorithm did exist, it would be possible to program a Turing machine to predict whether a second Turing machine, once given some arbitrary input, would eventually stop running and halt or fall into a vicious circle and loop forever. Turing proved such a Turing machine was logically impossible. For his proof, Turing used a strategy called reductio ad absurdum or proof by contradiction. He assumed such a halting machine exists, then showed that feeding the machine to itself leads to a contradiction.
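In modern notation, the contradiction can be sketched in a few lines. The oracle `halts` below stands for the hypothetical general algorithm; Turing’s theorem is precisely that no such function can exist, so the stub only dramatizes the argument.

```python
def halts(program, data):
    """Hypothetical oracle: True if program(data) eventually halts."""
    raise NotImplementedError("Turing proved no such oracle can exist")

def paradox(program):
    # Do the opposite of whatever the oracle predicts:
    if halts(program, program):
        while True:   # loop forever if the oracle says "halts"
            pass
    return            # halt if the oracle says "loops forever"

# Feed the machine to itself: does paradox(paradox) halt? If the oracle
# answers True, paradox loops forever; if False, it halts. Either answer
# contradicts the oracle, so `halts` cannot exist.
```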
It is no coincidence Turing fed the machine to itself for his proof, bringing out self-reference in the problem. At the beginning of the 20th century, self-referential statements were yielding paradoxes wreaking havoc on the field of mathematics, calling into question the consistency and completeness of formal axiomatic systems, which Hilbert so fervently espoused. A few years before Turing began working on his machine, the Austrian logician and mathematician Kurt Gödel shocked the foundations of mathematical logic to its core when he derived the self-referential statement “This statement is unprovable”, showing that not all true statements within a mathematical system could be proved from the axioms (6). While devising his self-reproducing machine, von Neumann would encounter a self-referencing problem at the heart of molecular biology.
von Neumann’s Universal Constructor
To build a universal self-reproducing machine capable of evolving more complex machines, von Neumann recognized that he needed to broaden the Turing machine concept to output another machine rather than a ribbon of tape with a bunch of ones and zeroes printed on it (1). Von Neumann conceived a machine composed of three components: a “blueprint” describing the machine, which in analogy to the Turing tape, carries instructions for how to build another machine; a universal “constructor” which decodes the instructions to construct the machine; and a universal “copying machine” which makes a copy of the instructions (3, 4). A machine builds a copy of itself using the instructions, then makes a copy of the instructions, and feeds them into the new machine, and so on.
To enable the machine to build a machine exceeding its own complexity, von Neumann included one more key ingredient. Forty years earlier, the Dutch botanist Hugo de Vries found that entirely new forms of the evening primrose Oenothera lamarckiana could arise spontaneously at random and propagate for many generations. He coined the term “mutation” for such changes. Just as mutations can occur spontaneously in nature, von Neumann allowed the copying machine to make errors when copying instructions. The copying errors could lead to viable variants of the machine, which could evolve via natural selection to produce more complex machines.
We now know that living organisms are real-life implementations of von Neumann’s self-reproducing machine. A genetic tape carrying instructions in the form of a sequence of DNA is initially transcribed into a corresponding messenger RNA tape, then fed into a universal constructor called the “ribosome,” which translates the RNA message into a corresponding sequence of amino acids that specify a protein tape. The protein tape, in turn, spontaneously folds into molecular devices that provide the workhorses of cells. When organisms reproduce, the DNA tape is copied by polymerase enzymes and passed on from parents to offspring, explaining how inheritance works. Errors can occur when copying DNA, resulting in a diverse population of organisms. Eventually, some mutations would convey an advantage, and gradually over generations, those organisms with the advantage would proliferate and take over the population. This cycle of mutation and natural selection is what biologists call “Darwinian evolution”—the process that marks the beginning of Biology and life. The flow of information from DNA to RNA, to proteins, is what Francis Crick referred to as the “Central Dogma of Molecular Biology” (7).
Von Neumann described his self-reproducing machine in a lecture given at the Hixon Symposium on Cerebral Mechanisms in Behavior, held at Caltech on September 20, 1948 (1, 3) (Fig. 1). This was 5 y before the discovery of the DNA double helix (8), and 12 y before Francis Crick proposed the Central Dogma of Molecular Biology (7). By that time, DNA was the leading contender to be the carrier of genetic information.
Fig. 1.
(Left) Flier used to advertise the Hixon symposium in 1948. (Right) Photograph of the participants. From left to right in the back row: Henry W. Brosin, Jeffress, Paul Weiss, Donald B. Lindsley, John von Neumann, J. M. Nielsen, R. W. Gerard, and H. S. Liddell. Front row: Ward C. Halstead, K. S. Lashley, Heinrich Klüver, Wolfgang Köhler, and R. Lorente de Nó. Images kindly provided by Loma Karklins, archivist at Caltech.
The symposium proceedings, including von Neumann’s lecture, were published as a book in 1951 (3). Based on transcripts of the lecture, von Neumann clearly saw the connection between his self-reproducing machine and living organisms, noting “...the instruction ID is roughly effecting the functions of a gene. It is also clear that the copying mechanism B performs the fundamental act of reproduction, the duplication of the genetic material, which is clearly the fundamental operation in the multiplication of living cells. It is also easy to see how arbitrary alterations of the system E, and in particular of ID, can exhibit certain typical traits which appear in connection with mutation, lethally as a rule, but with a possibility of continuing reproduction with a modification of traits.”
Von Neumann’s remarkable insight was recognizing that to self-reproduce, one needed a mechanism not to copy the machine per se but rather to copy a set of instructions for building the machine. His rationale was that the machine is “varying and reactive” and merely observing it could lead to changes difficult to foresee. In contrast, an instruction tape is “quasi-quiescent” and would be less likely to change with observation. Therefore, organisms carry instructions to build themselves because they need to duplicate, and instructions for how to build an organism can be copied with greater fidelity than the organism itself.
Today, we take it for granted that instructions are copied; they do not instruct the machine how to build the instructions themselves. Von Neumann recognized such a scheme would lead to a deep logical problem of the self-reference type: the instructions would have to include additional instructions (call them A) for building the instructions themselves. However, because A is part of the machine, additional instructions B would need to be supplied specifying how to make A. But then C would be needed to describe B, and so on. One ends up in a “vicious circle,” like a picture containing a copy of itself ad infinitum. Von Neumann avoided this vicious circle by separating the instructions from the rest of the machine and including a separate device to copy the instructions once the machine is constructed. Copying does not require added instructions, just as a mold provides all the information needed to cast a sculpture. Likewise, DNA serves as a template for its own replication.
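Computer programs escape the same regress by the same trick. The classic demonstration is a “quine,” a program that prints its own source code. The description `s` below is quasi-quiescent and used twice: once interpreted to build the output (the constructor) and once copied verbatim into it (the copier), so no description-of-the-description is ever needed.

```python
# A minimal self-reproducing program: running it prints its own source.
s = 's = %r\nprint(s %% s)'
print(s % s)
```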
Through his machine, von Neumann answered one of the deepest questions in biology: Why do all living organisms carry a self-description in the form of a genetic molecule such as DNA, copied and passed on to progeny? This property distinguishes the living from the nonliving and Biology from all other intellectual disciplines. Many nonliving natural systems are complex, but only the living carry a self-description. Von Neumann’s machine tells us we carry a self-description because the instructions can be copied with greater fidelity than the organism itself and because copying errors, or mutations, lead to variation, the substrate of evolution. And DNA is copied to avoid a self-referencing problem. Von Neumann not only set the stage for one of the biggest revolutions in biology; by thinking of living organisms as machines, he also told us, on purely logical grounds, why they are wired this way.
It is clear in retrospect that von Neumann’s lecture had virtually no impact on the biological community. One of the attendees and organizers of the Hixon symposium was the famous chemist Linus Pauling, who a few years later would turn his attention to solving the structure of DNA (9), but he never referred to von Neumann’s work. Sydney Brenner, one of the pioneers of Molecular Biology, was a notable exception. He became aware of von Neumann’s work as early as 1952 but in his own words “was not smart enough to really see then that this is what DNA and the genetic code was all about” (10). Sydney was also one of the first scientists to see the structure of the DNA double helix. He remarked that immediately upon seeing the double helix in 1953, the concept of biological information and the connection to von Neumann’s self-reproducing machine crystallized in his mind (10). Sydney did not miss an opportunity to celebrate the work of von Neumann and Turing and its implications for biology (11), but this also appears to have had little impact on the biological community today.
Molecular Computation
But just how far can we push this analogy between living organisms and computing machines? Every biological phenomenon, including how we perceive the world, derives from the behavior and interactions of biomolecules inside cells. If the fundamental behavior of these life constituents could be described by a computing machine, then everything we call living would necessarily have to be the product of computation. Did nature invent biomolecules to implement molecular computations? If so, what types of problems do these computations solve?
A Brief History.
The notion that biomolecules are computing machines dates to the early 1960s, when the biologists François Jacob and Jacques Monod (12) proposed that biomolecules could execute conditional statements common to programming languages to control protein production in bacteria. During the 1970s, Charles Bennett compared RNA polymerase, which converts DNA into messenger RNA, with a Turing machine, speculating that molecules could provide more energy-efficient computing machines (13). But it would take two more decades before Leonard Adleman demonstrated the first real example of molecular computation (14), crystallizing this notion of biomolecules as computing machines.
Adleman took advantage of the specificity with which DNA strands combine to form DNA duplexes to solve an instance of the directed Hamiltonian path problem, a close relative of the famous “traveling salesman problem” (14). This nontrivial problem asks whether there is a route among cities connected by one-way airline flights that passes through every city exactly once. By creating DNA molecules that represent the cities and the flights between them and then combining them in a test tube, he could get an answer within a few minutes.
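In spirit, Adleman’s protocol was a massively parallel generate-and-filter search. The sketch below reproduces the same logic in silicon rather than in a test tube; the city names and flight routes are invented for illustration.

```python
from itertools import permutations

cities = ["ATL", "BOS", "CHI", "DET"]
flights = {("ATL", "BOS"), ("ATL", "CHI"), ("BOS", "CHI"),
           ("BOS", "DET"), ("CHI", "DET")}

# Generate every candidate itinerary (Adleman did this by letting DNA
# strands encoding cities and flights hybridize in parallel), then keep
# only those in which consecutive cities are joined by a flight.
paths = [p for p in permutations(cities)
         if all(leg in flights for leg in zip(p, p[1:]))]
print(paths)  # [('ATL', 'BOS', 'CHI', 'DET')]
```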
Since Adleman’s work, molecular computation has been used to solve a variety of algorithmic problems such as the Hamiltonian path problem (14), the Boolean satisfiability problem (15), and the knight placement problem (16). In addition, molecular computation has been used to engineer complex logic circuits from molecular and biological building blocks enabling applications in drug screening, environmental monitoring, and disease diagnosis (17).
What About Natural Biomolecules?
Despite remarkable feats in molecular computation, most biologists do not view biomolecules as computing machines, nor do they view the biochemical reactions they catalyze as a form of computation. You will not find computer science language in most papers published in structural biology, the field that seeks to understand how biomolecules work by determining their 3D structures at atomic resolution. Structural biologists have not yet embraced the notion of biomolecules as computing machines because all molecular computations realized to date have relied on engineering biomolecules to solve algorithmic problems, which are of little relevance to natural living organisms and which do not have clear analogs in the natural world of biomolecules. So while we can fashion computers out of biomolecules that solve the Hamiltonian path problem, we do not find such programs in the natural world of biomolecules.
Systems Biology.
Yet, we have known for decades that natural biomolecules process information. In their seminal work on the lac operon in the early 1960s (12), Jacob and Monod revealed how bacteria regulate the levels of enzymes responsible for splitting sugars in response to sugar concentration. Here, a repressor protein specific for the sugar enzymes binds to the DNA of the corresponding genes, preventing RNA polymerase from making the mRNA for the sugar enzymes. The sugar molecules bind the repressor, causing it to come off the DNA so that the mRNA and enzymes can be produced. Once the sugar molecules are consumed, the repressor returns to the DNA, blocking the synthesis of the sugar enzymes. Importantly, this is not merely an on–off switch. Instead, by setting the proportion of time during which repressor molecules stay on or off the DNA, the sugar concentration precisely matches the level of enzyme made to the level of enzyme needed, thereby establishing an accurate feedback loop.
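A toy model makes the point (my simplification, not from ref. 12; the dissociation constant and binding expression are illustrative): the rate of enzyme synthesis tracks the fraction of time the repressor spends off the DNA, which rises smoothly with sugar concentration.

```python
def fraction_repressor_off(sugar, kd=1.0):
    # Simple binding equilibrium: sugar-bound repressor leaves the DNA.
    return sugar / (kd + sugar)

for sugar in (0.0, 0.1, 1.0, 10.0):
    print(f"sugar={sugar:5.1f} -> relative enzyme synthesis "
          f"{fraction_repressor_off(sugar):.2f}")
```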
Over the past few decades, the field of systems biology has uncovered a wide array of logical circuits used in biological systems, revealing the extreme parallelism of information processing inside the cell, which was not fully captured by von Neumann’s ideas. Thus, systems biology has placed the information processing capacity of biomolecules on a firm mathematical footing, revealing behaviors that could only be modeled by treating biological systems holistically as networks of many interacting biochemical reactions (18–20). This systems approach has addressed many fundamental questions in biology, including, for example, how structures, oscillations, or waves can arise in homogeneous environments (19). Near the end of his life, Turing contributed to these efforts. In another landmark paper published in the early 1950s (21), he showed theoretically how two chemical species, or “morphogens,” diffusing and reacting with each other could generate spatial patterns, like the arrangement of the spots on a leopard’s skin, starting from a nearly uniform state.
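Turing’s result is easy to reproduce numerically. The sketch below runs the Gray–Scott model, a modern descendant of Turing’s two-morphogen scheme (the parameters are standard demonstration values, not taken from ref. 21): starting from a near-uniform state with one small perturbation, reaction plus diffusion grows a stable spatial pattern.

```python
import numpy as np

n, steps = 256, 10_000
Du, Dv, f, k = 0.16, 0.08, 0.035, 0.065   # diffusion, feed, and kill rates
u, v = np.ones(n), np.zeros(n)
v[n // 2 - 10 : n // 2 + 10] = 0.5        # tiny local perturbation

for _ in range(steps):
    lap_u = np.roll(u, 1) - 2 * u + np.roll(u, -1)  # discrete diffusion
    lap_v = np.roll(v, 1) - 2 * v + np.roll(v, -1)
    uvv = u * v * v                                  # reaction term
    u += Du * lap_u - uvv + f * (1 - u)
    v += Dv * lap_v + uvv - (f + k) * v

print(np.round(v[::16], 2))  # a nonuniform pattern has grown from near-uniformity
```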
Thus, information processing in molecular biology has been addressed explicitly many times for decades. Yet, despite enormous strides in systems biology, biomolecules are still generally treated as boxes converting inputs into outputs without providing a molecular description concerning the computation’s elementary steps. We might know the software but lack the machine description specifying how a biomolecule’s sequence determines how it processes information. The field of structural biology routinely provides a detailed atomic description of the various microscopic elementary steps needed to complete a biochemical reaction. Yet surprisingly, few attempts have placed this behavior of natural biomolecules within the context of a computing machine.
Natural Biomolecules as Computing Machines
In describing the raw materials needed for computation, Turing unknowingly described the behavior of all natural biomolecules. Like Turing machines, biomolecules also transition between different conformational or chemical “states,” as they “scan” substrates, each time completing a different operation such as adding or removing a chemical group. Like algorithms requiring a logical series of sequential steps to find a solution, biomolecules also go through many logical and sequential steps to catalyze multi-step biochemical reactions. Thus, biomolecules behave the way they do because they are computing machines, and the reactions they catalyze are a form of computation.
Scanning Substrates.
Turing reduced the information written on a two-dimensional sheet of paper into a linear array of 0s and 1s on a 1D tape as a matter of convenience. The Turing machine then uses successive observations to scan the tape, one segment at a time. Turing rationalized this feature because lengthy symbols would be difficult to observe at a “single glance.” Biomolecules also scan substrate molecules through a process called “molecular recognition.” Unlike the Turing machine, they observe several functional groups (symbols) on substrate molecules at once. Moreover, the symbols are not restricted to a binary choice (0 or 1) nor must they be arrayed in 1D. Instead, nature connects hundreds of different chemical groups in 3D to create thousands of substrates differing in shape, size, and electronic properties. Biomolecules then decode this 3D information with a single glance by physically binding the substrates using their active sites, which house functional groups optimally positioned to interact with chemical groups on the substrate.
The correct substrate hitting the active site in the right orientation will register and stick to the biomolecule like a key fitting into its lock, in turn eliciting changes leading to computation. Other molecules will bounce off and have no effect. Nature uses molecular diffusion, a very effective mechanism for traversing short distances, as a more economical replacement for circuit wires. Proteins scan the sea of substrate molecules in the cellular milieu by being constantly bombarded by them from all directions, trillions of times a second.
Sequential Operations.
Biomolecules typically perform one specialized operation on a substrate, for example, breaking a particular bond. Afterward, they release the product. How then do biomolecules perform multiple sequential operations, as required by complex computations? Evolution invented an ingenious solution: make the product of one biomolecule the substrate of another, and so on. In biological computation, the tape can fall off one machine operating one transition table and jump onto a second machine operating a second transition table.
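As a schematic illustration, the first two steps of glycolysis compose exactly this way (the enzyme names are real; the string transformations merely stand in for the chemistry):

```python
def hexokinase(sugar):
    return sugar + "-6-phosphate"                  # step 1: phosphorylate glucose

def phosphoglucose_isomerase(sugar_p):
    return sugar_p.replace("glucose", "fructose")  # step 2: isomerize

# The "tape" hops from one machine to the next:
print(phosphoglucose_isomerase(hexokinase("glucose")))  # -> fructose-6-phosphate
```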
States and Memory.
By giving the machine multiple states, Turing unknowingly explained one of the most enigmatic properties of biomolecules. Textbooks depict biomolecules such as DNA as still objects, yet nothing could be further from the truth. In just one second, DNA spontaneously morphs into thousands of different structures called “conformational states.” Even its chemical composition can change through epigenetic modifications. So, whereas we can draw the architectural blueprint for a building, when it comes to the molecular world of biomolecules, we need to specify a landscape of many different conformational and chemical possibilities, each forming with a specific probability (22). This landscape enables biomolecules to transition between different states, an essential feature of Turing’s machine.
Turing’s machine tells us biomolecules form alternative states because they provide a form of memory enabling them to “remember” what they are doing as they move from one logical step to the next. Indeed, there is a simple mathematical relationship between bits of information which can be stored and the number of states. A bit of memory has two states, two bits have four states, and n bits have 2ⁿ states. Therefore, a biomolecule with n conformational or chemical states can encode log₂(n) bits or log₂₅₆(n) bytes of information (1 byte has 2⁸ = 256 states). In the field of thermodynamics, the quantity log(n) is also known as entropy, which is a measure of disorder. However, as we shall see, accessing this potentially enormous reservoir of conformational states as a form of memory requires means to specifically increase the probability of a given state.
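The arithmetic is quick to verify (a trivial sketch):

```python
import math

# n states can store log2(n) bits: 2 states -> 1 bit, 4 -> 2 bits, etc.
for n_states in (2, 4, 256, 1024):
    bits = math.log2(n_states)
    print(f"{n_states:>5} states -> {bits:4.1f} bits ({bits / 8:.3f} bytes)")
```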
Turing also tells us biomolecules change states to “change their state of mind.” Thus, depending on its state, the same biomolecule can perform different tasks. For example, the DNA double helix structure stores genetic information, but the single-stranded form is required to copy the DNA. Because the different conformational and chemical states occur with varying probabilities, the fate of a biochemical reaction is not sealed, but instead can branch into several different outcomes with varying probabilities. These include rare events, such as copying errors when replicating DNA, which can result in mutations central to life and evolution. Because the behavior of biomolecules and the reactions they catalyze are probabilistic, we can never predict an animal’s behavior; all we can do is ascribe probabilities to future events.
A Concrete Example of Biological Computation
To convince myself biomolecules were, in fact, computing machines, I set out to reduce the behavior of a biomolecule into a logical transition table like the one Turing chose for his machine. I chose the biological analogue of von Neumann’s tape copier, DNA polymerase, the enzyme tasked with copying DNA, given its centrality to evolution.
Like a Turing machine, DNA polymerase uses successive observations to scan nucleotides in the DNA template strand by placing each nucleotide it encounters within its binding pocket. The polymerase then scans the nucleoside triphosphate monomer molecules G, C, A, and T in the solution by physically binding to them to form a base pair with the nucleotide in the binding pocket. The polymerase then incorporates (“writes”) the bound monomer to synthesize a copy of the complementary DNA strand. To copy DNA perfectly, polymerase must only incorporate monomers satisfying the Watson–Crick pairing rules: Incorporate “C” if the nucleotide is “G,” “G” if it is “C,” “T” if it is “A,” and “A” if it is “T”; otherwise reject the monomer. But polymerase cannot “see” the monomers; so how does it implement these rules?
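The specification the polymerase must implement is simple to state in code (a schematic of the rules, not of the enzyme):

```python
# The Watson-Crick pairing rules as a lookup table, and the "copy DNA"
# program they define.
PAIRING = {"G": "C", "C": "G", "A": "T", "T": "A"}

def complement(template: str) -> str:
    return "".join(PAIRING[base] for base in template)

print(complement("GATTACA"))  # -> CTAATGT
```

The remarkable part is how a molecule implements this lookup table without “seeing” the symbols; the answer, as described next, is geometry.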
Watson–Crick base pairs share a common rectangular shape, whereas mismatches such as G–T and A–C form an irregular geometry (Fig. 2). Polymerase uses these geometrical differences to discriminate correct (rectangular) from incorrect (irregular) monomers (Fig. 2). The polymerase then cycles through a series of conformations to implement the transition rule “only accept monomers that form rectangular shaped base pairs.” Here is how it works.
Fig. 2.

(Left) Canonical Watson–Crick base pairs form a rectangular shape, referred to as the “Watson–Crick geometry.” (Middle) Noncanonical mismatches such as G–T and A–C form non-Watson–Crick geometries referred to as the “wobble” conformation. (Right) Mismatches can adopt Watson–Crick-like geometries through tautomerization of the bases (indicated with a star).
The polymerase conformational landscape includes a highly probable “open” state in which the lid opens, allowing entry of monomers, and a less likely “closed” state in which the lid closes, blocking entry (Fig. 3A). When the correctly matched monomer enters the binding pocket, it forms a rectangular Watson–Crick base pair with the nucleotide in the template strand. The Watson–Crick base pair favors closing the lid shut, like a lid snapping into place with its matching container. As a result, the closed state is now the most probable conformation (Fig. 3A). This is how polymerase implements the transition rule, “Close the lid if the base pair has a rectangular shape”:
Fig. 3.

Mechanism of nucleotide incorporation by a high-fidelity DNA polymerase. Note that the mechanism can vary from polymerase to polymerase. The given example typifies the mechanisms of polymerase β and ε (23–25). (A) Correct incorporation of Watson–Crick base pairs. (B) The induced-fit subroutine increases the fidelity of nucleotide incorporation by high-fidelity polymerases. (C) Nucleotide misincorporation through tautomeric shifts of nucleobases can lead to copying errors which, if uncorrected, can result in mutations.
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Open | Rectangle | Close lid | Closed |
As in a Turing machine, the closed state allows polymerase to “remember” it has consumed the correct monomer. The closed state stores this intermediate result. A polymerase in the closed state with a correct monomer bound in its active site as input transitions into a catalytically active conformational state (Fig. 3A). Because only correct Watson–Crick base pairs make it through this step, it helps prevent misincorporation of mismatches. This behavior is described by the following transition rule:
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Closed | Rectangle | Form catalytic structure | Catalytic |
Once in the catalytically active state, the monomer is chemically ligated and incorporated into the growing complementary copy of the DNA strand (Fig. 3A). Whereas all other steps are reversible, this irreversible step costs energy because it entails creating information and a decrease in entropy. The energy is supplied by the incoming monomer nucleotide triphosphate in the form of a chemical bond, which is broken when incorporating the monomer, releasing the energy needed for its incorporation (Fig. 3A). This energy can also help direct movement of the polymerase along the DNA tape so it can scan the next nucleotide, following the sequence of events described below.
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Catalytic | Rectangle | Incorporate nucleotide | Catalytic |
An incorporated monomer differs as an input from an unincorporated monomer and elicits a different behavior from the catalytic state, causing it to undo the catalytic structure and revert back to the closed state (Fig. 3A):
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Catalytic | Incorporated | Revert to closed | Closed |
Distinguishing incorporated from unincorporated monomers is logically important; otherwise, the polymerase would remain stuck in the catalytic state and fail to recycle and catalyze future monomer incorporations. Similarly, the incorporated monomer must elicit a new response from the closed state; otherwise, if treated as an unincorporated monomer, the polymerase would cycle indefinitely between the closed and catalytic states, what computer scientists call “an infinite loop.” Instead, the closed state responds to the incorporated monomer by transitioning into the open rather than the catalytic state (Fig. 3A):
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Closed | Incorporated | Open lid | Open |
Finally, the incorporated monomer also elicits a different behavior from the open state, causing the DNA polymerase to translocate one position along the DNA to scan the next nucleotide (Fig. 3A).
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Open | Incorporated | Translocate | Open |
At this point, all intermediate results are discarded, freeing up the memory for a second cycle. The cycle repeats itself until the polymerase encounters a blank, and the copying process is halted. When you run the DNA polymerase program, the product of the computation is a complementary copy of the DNA strand.
The Induced-Fit Subroutine.
Like many enzymes, polymerase is remarkably accurate, on average committing only one error for every one hundred thousand letters copied [note: this only reflects nucleotide misincorporation and does not include proofreading]. The polymerase employs another set of transition rules involving a different conformational state to actively reject incorrect monomers. This subroutine, used by many enzymes, is called “induced fit” (26). Here is how it works.
When an incorrect monomer binds the polymerase pocket, it pairs with the template nucleotide to form a mismatched base pair with an irregular nonrectangular shape. The irregular mismatch prevents the lid from closing all the way (Fig. 3B). Instead, it only partially closes to form an “ajar” state:
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Open | Irregular | Partially close lid | Ajar |
Another state in the conformational landscape, the ajar state is inactive and cannot incorporate monomers (Fig. 3B). The polymerase uses the ajar state to actively lure the conformation away from the catalytically active closed state should an incorrect monomer bind the pocket. The ajar state then reverts to the open state.
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Ajar | Irregular | Open lid | Open |
The cycle of partially closing then opening continues until the mismatched monomer eventually falls out of the pocket, and the process can restart anew.
Copying Errors.
How is it that on rare occasions DNA polymerase makes a copying mistake? These random errors enabled von Neumann’s machine to achieve open-ended evolution. Because biomolecules are constantly contorting between various conformational states, in the world of molecular computation, the machine and the inputs are not written in stone. On rare occasions, they can change.
For example, on rare occasions, the G monomer can undergo a chemical change involving the relocation of a single hydrogen atom (Fig. 3C). The resulting “enolic” form of G (Genol) is chemically similar to A and consequently can pair with T to form a near-perfect rectangular base pair (Fig. 3C) (27). Masquerading as a rectangular base pair, the Genol–T mismatch fools the polymerase into making a typo, typing “G” instead of “A” (Fig. 3C). If left uncorrected, such a typo would result in a mutation. Such “tautomeric” shifts are improbable (the probability is typically about 1 in 100,000) (28), explaining why mutations are rare. They occur randomly, explaining why mutations are chance events. Proteins also undergo probabilistic transitions in their 3D structure, which give rise to alternative outcomes. Through these conformational fluctuations, biology implements nondeterministic machines.
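In a simulation, this nondeterminism amounts to a rare, random relabeling of the input (a sketch of my own; the 1-in-100,000 figure is the probability cited above):

```python
import random

PAIRING = {"G": "C", "C": "G", "A": "T", "T": "A"}  # Watson-Crick rules

def observed_shape(template_base, monomer, p_tautomer=1e-5):
    if monomer == PAIRING[template_base]:
        return "Rectangle"   # correct Watson-Crick pair
    if (template_base, monomer) == ("T", "G") and random.random() < p_tautomer:
        return "Rectangle"   # rare enol tautomer: Genol-T masquerades as correct
    return "Irregular"       # ordinary mismatch
```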
The logical behavior of DNA polymerase, which executes the “copy DNA” algorithm, can be reduced into a transition table (Table 1). The copy DNA algorithm can also be depicted using what computer scientists call a “state diagram” (Fig. 4A), which shares much in common with what biochemists call a “kinetic mechanism.” If we look deeply, every biomolecule can be reduced into a transition table describing its logical behavior; thus, every biomolecule can be described as a computational machine (Fig. 4B).
Table 1.
Transition table for nucleotide incorporation by a high-fidelity DNA polymerase. The transition table does not include the instructions required for kinetic proofreading
| Initial State | Input | Operation | Final State |
|---|---|---|---|
| Open | Rectangle | Close lid | Closed |
| Open | Irregular | Partially close lid | Ajar |
| Open | Incorporated | Translocate | Open |
| Open | Blank | Halt | Halt |
| Closed | Rectangle | Form catalytic structure | Catalytic |
| Closed | Incorporated | Open lid | Open |
| Catalytic | Rectangle | Incorporate nucleotide | Catalytic |
| Catalytic | Incorporated | Revert to closed | Closed |
| Ajar | Irregular | Open lid | Open |
| Ajar | Rectangle | Close lid | Closed |
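Table 1 is small enough to execute directly. The sketch below (my illustration; the state and input names are those of Table 1) encodes the table as a dictionary and steps through one correct incorporation, one rejected mismatch, and the final halt:

```python
# Nucleotide incorporation as a finite-state machine: Table 1 as a dict
# mapping (initial state, input) -> (operation, final state).
TABLE = {
    ("Open", "Rectangle"):         ("Close lid", "Closed"),
    ("Open", "Irregular"):         ("Partially close lid", "Ajar"),
    ("Open", "Incorporated"):      ("Translocate", "Open"),
    ("Open", "Blank"):             ("Halt", "Halt"),
    ("Closed", "Rectangle"):       ("Form catalytic structure", "Catalytic"),
    ("Closed", "Incorporated"):    ("Open lid", "Open"),
    ("Catalytic", "Rectangle"):    ("Incorporate nucleotide", "Catalytic"),
    ("Catalytic", "Incorporated"): ("Revert to closed", "Closed"),
    ("Ajar", "Irregular"):         ("Open lid", "Open"),
    ("Ajar", "Rectangle"):         ("Close lid", "Closed"),
}

def run_polymerase(inputs, state="Open"):
    for symbol in inputs:
        operation, state = TABLE[(state, symbol)]
        print(f"{symbol:>12} -> {operation:<25} [{state}]")
        if state == "Halt":
            break

# One correct incorporation cycle, a mismatch rejected via the ajar
# state, then a blank halts the machine.
run_polymerase(["Rectangle", "Rectangle", "Rectangle",
                "Incorporated", "Incorporated", "Incorporated",
                "Irregular", "Irregular", "Blank"])
```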
Fig. 4.
DNA polymerase as a finite-state machine. (A) State diagram for the DNA polymerase nucleotide incorporation finite-state machine. q0, q1, q2, q3, and q4 refer to the states of DNA polymerase: open, closed, catalytic, ajar, and halt, respectively. The inputs are MA = match (rectangular); MM = mismatch (irregular); IN = incorporated; and BL = blank. Note that q0 processes the input “IN” by translocating (Trans) and reverting back to the state q0. (B) Graphical description of DNA polymerase as a computational machine.
A Hierarchy of Biocomputing Machines
Computer scientists have elaborated a hierarchy of increasingly powerful computational models, which employ progressively more sophisticated forms of memory (Fig. 5). Many of these machines have analogs in the universe of biological computation (Fig. 5).
Fig. 5.

Hierarchical organization of automata and biocomputing machines.
At the bottom of the hierarchy is the “finite-state machine” (FSM), a transducer converting inputs into outputs (Fig. 5). The information processed by an FSM resides solely in its states and inputs; it does not have external memory like the Turing machine’s tape. FSMs are everywhere, from turnstiles to dishwashers and vending machines. Von Neumann also used FSMs as workhorses to elaborate his abstract model for self-reproduction (4). Although deceptively like a Turing machine, polymerase is constrained in its movement, writing, and erasing facilities and is better described by an FSM model (Fig. 4 A and B). Cells are full of FSMs: thousands of enzymes and receptors converting chemical substrates, mechanical motion, electricity, or light into an equally diverse array of outputs. Because the behavior of biomolecules can change stochastically, they are best described by a nondeterministic FSM model, in which a given input can result in more than one output with predefined probabilities.
Next up in the hierarchy are “pushdown” automata (Fig. 5), machines employing a memory device that pushes symbols to or pops them from a “stack” during transitions. Like the stack of trays we find in a cafeteria, operations never touch elements other than the top one. Examples of pushdown machines in biology include the kinetic proofreading (29) facility of DNA polymerase, not covered here. When DNA polymerase makes a copying error, the irregular mismatch is pushed onto the stack of regular Watson–Crick base pairs. Following a set of transition rules, DNA polymerase backtracks one or two nucleotides, removing the misincorporated nucleotide from the “top of the stack.” RNA polymerases also backtrack when producing RNA to detect and respond to signals (30).
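A sketch of that stack discipline (my deliberate simplification of kinetic proofreading, not its chemistry): the growing strand is the stack, and only its top element can be excised.

```python
import random

PAIRING = {"G": "C", "C": "G", "A": "T", "T": "A"}

def replicate(template, error_rate=1e-5):
    strand = []                              # the stack of incorporated bases
    i = 0
    while i < len(template):
        correct = PAIRING[template[i]]
        base = correct if random.random() > error_rate else random.choice("GCAT")
        strand.append(base)                  # push the newly incorporated base
        if base != correct:
            strand.pop()                     # backtrack: pop the mismatched top
        else:
            i += 1                           # translocate to the next position
    return "".join(strand)

print(replicate("GATTACA"))                  # -> CTAATGT, even with rare errors
```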
Further up the hierarchy are automata with a Turing tape of bounded length, known as linear-bounded automata (Fig. 5), upon which the machine can freely read and write symbols. Biomolecules possess memory tapes built right into their structure: residues that can be modified with various chemical substituents to elicit longer-lasting changes in state. For example, the addition and removal of phosphate groups on specific protein residues in nerve cells encodes long-lived memory in the brain. The addition and removal of chemical marks on tails protruding from the histone proteins holding DNA together encode “epigenetic” instructions, giving liver and brain cells separate identities despite their identical DNA-based genetic instructions.
The Turing machine sits at the top of the hierarchy, and it can simulate any other computing machine (Fig. 5). Supplied with the appropriate mRNA tape, the ribosome can be programmed to produce any biological computing machine. Determining whether the ribosome and RNA polymerase achieve universal computation through proteins and RNAs should be one of the goals of biological computation (31).
Decoding Biological Computation
In proving the decision problem has no solution, Turing showed that, no matter how powerful, no computer program could ever predict the fate of another program. In the words of von Neumann, “You cannot construct an automaton which will predict the behavior of an arbitrary automaton.” Stephen Wolfram elaborated this idea through his principle of “computational irreducibility,” which states that once a computation achieves a certain level of complexity, it is no longer feasible to take shortcuts to make sense of the product of the computation; the only way to fully understand the output is to run the program in full (32). And running the program is exactly the experiment nature has been performing over the past 3.8 billion years of evolution. Each progeny runs a new program, and through cycles of trial and error, evolution defeats the decision problem. The price it had to pay is the genomes no longer with us; we see only a tiny fraction of the universe of biological programs, the ones that made it.
If living organisms are products of complex computations, then it will not be possible to understand and compute the behavior of an organism by measuring the inputs and outputs of biochemical processes and taking shortcuts to find relationships between them; this approach is assured to fail. Yet, this mode of “big data” science is pervading many fields in Biology thanks to a technological revolution brought about by the sequencing of the human genome and AI. The consequence, in the words of Sydney Brenner, is that we are “drowning in a sea of data but remain thirsty for knowledge.”
Knowledge requires that we reduce every biomolecule inside the cell into an appropriate transition table and then run the program in full. We anticipate biomolecules employ a small number of transition rules, which can be woven together in various combinations to construct all biochemical processes in living organisms. This would be analogous to the relatively small set of network motifs uncovered by systems biology, which appear to serve as basic building blocks for transcription networks (20). Determining the set of transition rules should be one of the goals of the field of biological computation.
Knowledge also requires that we understand how transition rules are implemented based on the sequence of a biomolecule and the laws of physics, which determine the conformational behavior of a biomolecule and how it interacts with its surroundings. Predicting transition rules from sequence, rather than only the dominant structure, should be the goal; the latter has been the focus of recent AI efforts (33). With the appropriate transition tables in hand, we could integrate structural biology more effectively with systems biology, be better positioned to compute the behavior of cells and possibly whole organisms, and rationally reprogram Biology to address technological, health, and societal needs.
By reducing biology into a computational form, entire fields within computer science could be leveraged to systematize biology. For example, the field of computational complexity could be used to examine tradeoffs between time, memory, and energy in various biochemical reactions, and to classify and explore the diversity of problems solved by biological computation. Conversely, computer scientists might be able to mine the natural universe of biological computation and harness billions of years of evolution to discover new computational models or algorithms and maybe even answer questions in mathematics and logic. After all, modeling the action of neurons is what inspired neuroscientist Warren McCulloch and logician Walter Pitts to develop artificial neural networks, which provided the foundation for AI (34). How living organisms harness quantum mechanics (35) could provide important clues for engineering next-generation quantum computers.
Moving forward, we may find that biology and computer science are joined at the hip, that they are the same discipline, a discipline that studies the behavior of machines.
Acknowledgments
I thank Drs. Adleman (USC), Akhlaghpour (Rockefeller), Bray (Cambridge), Carroll (University of Maryland/HHMI), Searls (Rutgers), Herschlag (Stanford), Karlsson (Université de Genève), Landweber (Columbia), Moore (Yale), Palmer (Columbia), Rees (Caltech), Wilson (Nebraska), Wolfram (Wolfram Research), Yannakakis (Columbia), and current and former trainees for their input, and Caltech archivist Loma Karklins for providing Fig. 1.
Author contributions
H.M.A.-H. wrote the paper.
Competing interests
The author declares no competing interest.
Footnotes
This article is a PNAS Direct Submission.
Data, Materials, and Software Availability
There are no data underlying this work.
References
1. Bhattacharya A., The Man From the Future (Norton, 2022).
2. Wiener N., Cybernetics or Control and Communication in the Animal and the Machine (Hermann & Cie/The Technology Press/John Wiley & Sons, 1948).
3. Jeffress L. A., Ed., Cerebral Mechanisms in Behavior: The Hixon Symposium (Wiley, 1951), p. 311.
4. von Neumann J., Theory of Self-Reproducing Automata, Burks A. W., Ed. (University of Illinois Press, 1966).
5. Turing A. M., On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. 42, 230–265 (1936).
6. Gödel K., Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatsh. Math. Phys. 38, 173–198 (1931).
7. Crick F., Central dogma of molecular biology. Nature 227, 561–563 (1970).
8. Watson J. D., Crick F. H., Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737–738 (1953).
9. Pauling L., Corey R. B., A proposed structure for the nucleic acids. Proc. Natl. Acad. Sci. U.S.A. 39, 84–97 (1953).
10. Brenner S., Interview with Sydney Brenner by Soraya de Chadarevian. Stud. Hist. Philos. Biol. Biomed. Sci. 40, 65–71 (2009).
11. Brenner S., Turing centenary: Life’s code script. Nature 482, 461 (2012).
12. Jacob F., Monod J., Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356 (1961).
13. Bennett C. H., Logical reversibility of computation. IBM J. Res. Dev. 17, 525–532 (1973).
14. Adleman L. M., Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024 (1994).
15. Liu Q., et al., DNA computing on surfaces. Nature 403, 175–179 (2000).
16. Faulhammer D., Cukras A. R., Lipton R. J., Landweber L. F., Molecular computation: RNA solutions to chess problems. Proc. Natl. Acad. Sci. U.S.A. 97, 1385–1389 (2000).
17. Benenson Y., Biomolecular computing systems: Principles, progress and potential. Nat. Rev. Genet. 13, 455–468 (2012).
18. Murray D., Petrey D., Honig B., Integrating 3D structural information into systems biology. J. Biol. Chem. 296, 100562 (2021).
19. Westerhoff H. V., Palsson B. O., The evolution of molecular biology into systems biology. Nat. Biotechnol. 22, 1249–1252 (2004).
20. Alon U., Network motifs: Theory and experimental approaches. Nat. Rev. Genet. 8, 450–461 (2007).
21. Turing A. M., The chemical basis of morphogenesis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 237, 37–72 (1952).
22. Frauenfelder H., Sligar S. G., Wolynes P. G., The energy landscapes and motions of proteins. Science 254, 1598–1603 (1991).
23. Patel S. S., Wong I., Johnson K. A., Pre-steady-state kinetic analysis of processive DNA replication including complete characterization of an exonuclease-deficient mutant. Biochemistry 30, 511–525 (1991).
24. Dahlberg M. E., Benkovic S. J., Kinetic mechanism of DNA polymerase I (Klenow fragment): Identification of a second conformational change and evaluation of the internal equilibrium constant. Biochemistry 30, 4835–4843 (1991).
25. Kuchta R. D., Mizrahi V., Benkovic P. A., Johnson K. A., Benkovic S. J., Kinetic mechanism of DNA polymerase I (Klenow). Biochemistry 26, 8410–8417 (1987).
26. Koshland D. E., Application of a theory of enzyme specificity to protein synthesis. Proc. Natl. Acad. Sci. U.S.A. 44, 98–104 (1958).
27. Watson J. D., Crick F. H., Genetical implications of the structure of deoxyribonucleic acid. Nature 171, 964–967 (1953).
28. Kimsey I. J., et al., Dynamic basis for dG·dT misincorporation via tautomerization and ionization. Nature 554, 195–201 (2018).
29. Hopfield J. J., Kinetic proofreading: A new mechanism for reducing errors in biosynthetic processes requiring high specificity. Proc. Natl. Acad. Sci. U.S.A. 71, 4135–4139 (1974).
30. Nudler E., RNA polymerase backtracking in gene regulation and genome instability. Cell 149, 1438–1445 (2012).
31. Akhlaghpour H., An RNA-based theory of natural universal computation. J. Theor. Biol. 537, 110984 (2022).
32. Wolfram S., A New Kind of Science (Wolfram Media, Champaign, IL, 2002).
33. Jumper J., et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
34. McCulloch W. S., Pitts W., A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).
35. McFadden J., Al-Khalili J., The origins of quantum biology. Proc. Math. Phys. Eng. Sci. 474, 20180674 (2018).