Abstract
The origin of the modern genetic code and the mechanisms that have contributed to its present form raise many questions. The main goal of this work is to test two hypotheses concerning the development of the genetic code for their compatibility and complementarity and to see whether they could benefit from each other. On the one hand, Gonzalez, Giannerini and Rosa developed a theory based on four-base codons, which they called tesserae. This theory can explain the degeneracy of the modern vertebrate mitochondrial code. On the other hand, in the 1990s, so-called circular codes were discovered in nature, which seem to ensure the maintenance of a correct reading-frame during the translation process. It turns out that the two concepts do not contradict each other; on the contrary, they complement and enrich each other.
Keywords: Genetic code, Degeneracy, Circular code, Tessera
Introduction
In 1986, John Maynard Smith stated: “We understand biological phenomena only when we have invented machines with similar properties” (Smith 1986, pp 99–100). This quotation explains the motivation of this work quite well. This paper was written in order to better understand the origin of the genetic code using such machinery. One possible machine, or rather a model, which gives a feasible explanation for an important aspect of the evolutionary processes of the genetic code was found by Gonzalez, Giannerini and Rosa. In their work “On the origin of degeneration in the genetic code” (Gonzalez et al. 2019) they focus on the degeneracy of amino acid coding and especially on symmetry as an essential cause and consequence of the natural phenomenon of degeneracy (compare also Fimmel and Strüngmann 2016). A famous example, which shows the importance of including symmetry considerations when studying natural phenomena, can be found in quantum mechanics. Here, symmetry describes more than just the patterns that matter takes: it is used to classify the nature of quantum states. This is far from the only example of its kind. Noether’s theorem even states a one-to-one connection between fundamental laws of nature, the so-called conservation laws, and the respective symmetries in nature.
Taking these general considerations into account, Gonzalez, Giannerini, and Rosa argue that none of the theories regarding the origin of the genetic code pays the necessary attention to the idea of symmetry (Gonzalez et al. 2019). As a consequence the concept of tessera codes was developed. The tesserae build a subset of all tetranucleotides, chosen in such a way that the degeneracy of the vertebrate mitochondrial genetic code can be explained from the symmetries of the tesserae (Gonzalez et al. 2012).
The other line of thought addressed by the current work is the theory of circular codes. This theory is intended to explain the noise-immunity of the genetic code, and is based on a proposal by Crick et al. (1957). They argue that the coding of amino acids requires only a subset of codons where the correct reading-frame is automatically and immediately recognizable, the so-called comma-free property. While Crick’s theory was refuted by experiment (Nirenberg and Matthaei 1961), 40 years later so-called circular codes were discovered in nature (Arqués and Michel 1996). More specifically, it has been noticed that the set of codons which, together with their frame-shifts in three potential reading-frames, are the most commonly used across all species has very remarkable properties in terms of detecting the correct reading-frame (Fimmel and Strüngmann 2018; Fimmel et al. 2016; Michel 2017). The comma-free codes proposed by Crick belong to the same family of circular codes, but within this family they have the most distinct error-detecting properties (see, for instance, Fimmel et al. 2018, 2017, 2016, xxxx). The natural circular codes have even more interesting structural properties, which makes it very doubtful that these structures play no role in biological processes (Arqués and Michel 1996; Fimmel and Strüngmann 2018).
The primary goal of this work is to combine the two concepts, tesserae and circular codes, and see if they could benefit from each other. In this work we specify among other things a construction algorithm for circular tessera codes of maximal length. Furthermore, self-complementary tessera codes are characterized and criteria for their self-complementarity are formulated and proved in the language of graph theory. The growth tables for circular and comma-free tessera codes are also presented for the first time. In summary, one result of the work is that the two concepts under scrutiny—that of tessera codes and circularity—have proved to be mutually compatible and complementary.
Thus, with this work we hope to bring more clarity into the possible role of tesserae in the evolutionary process of the genetic code and the mechanisms behind it.
Definitions and Notations
The genetic code is written with words of three letters called codons, built over an alphabet $\mathcal{B} = \{U, C, A, G\}$ of four letters which are called nucleotide bases Uracil (Thymine), Cytosine, Adenine, and Guanine, in short U(T), C, A, G. Clearly, the number of codons is $|\mathcal{B}^3| = 4^3 = 64$ and by $|X|$ we will denote the cardinality of the set $X$. Accordingly, the set $\mathcal{B}^2$ denotes the set of 16 dinucleotides and the set $\mathcal{B}^4$ contains the 256 tetranucleotides. It is hypothesized that during evolution the genetic code had several ancestors that might have consisted not only of trinucleotides but of dinucleotides or tetranucleotides or even combinations of these (see Baranov et al. 2009; Gonzalez et al. 2012; Seligmann 2014; Patel 2005; Wilhelm and Nikolajewa 2004; Wu et al. 2005). In particular, in Gonzalez et al. (2012) the tessera code was suggested as an ancestral code that might have been the origin of the mitochondrial code (see also Gonzalez et al. 2019). In order to define the tessera code we have to introduce some group theory and explain how it can be applied in the genetic setting.
Klein Four-Group and Equivalence Classes of Dinucleotides
The symmetric group on a set of elements is usually known as the group of permutations of these elements. Applying this to our genetic alphabet $\mathcal{B}$ we define the symmetric group $S_{\mathcal{B}}$ as
$$S_{\mathcal{B}} = \{\pi : \mathcal{B} \to \mathcal{B} \mid \pi \text{ is bijective}\}$$
with the usual group operation $\circ$ given by composition of functions. Recall that a group is a set H together with an operation $\circ$ such that $\circ$ is associative and H contains a neutral element e as well as inverses $h^{-1}$ for all $h \in H$ (see Rotman 1995 for more details on groups). The group $S_{\mathcal{B}}$ has $4! = 24$ elements and is trivially isomorphic to the symmetric group $S_4$ on four elements. We will use standard notation for permutations as can be found in Rotman (1995). Naturally, any permutation $\pi \in S_{\mathcal{B}}$ can be applied to n-nucleotides of any length componentwise, i.e. if $x = b_1 b_2 \cdots b_n$, then $\pi(x) = \pi(b_1)\pi(b_2)\cdots\pi(b_n)$. There is no danger of confusion when denoting the induced bijective map $\mathcal{B}^n \to \mathcal{B}^n$ by $\pi$ again for any natural number n.
In Fimmel et al. (2014, 2015) a subgroup L of $S_{\mathcal{B}}$ was identified that seems to play an important role in error-detection and error-correction mechanisms during the translation process. This group consists of all permutations from $S_{\mathcal{B}}$ that preserve the codon-anticodon relation and can be geometrically interpreted as the symmetry group of a square. In particular, it contains 4 bijective transformations of nucleotide bases that are invariant with respect to the chemical characters of the nucleotides (we will use the notations from Fimmel et al. 2014, 2015). These are the
Identity: $\mathrm{id}: (A, C, G, U) \mapsto (A, C, G, U)$,
Strong/Weak (SW) or complementary transformation: $c: (A, C, G, U) \mapsto (U, G, C, A)$,
Pyrimidine/Purine (YR) transformation: $p: (A, C, G, U) \mapsto (G, U, A, C)$,
and Keto/Amino (KM) transformation: $r: (A, C, G, U) \mapsto (C, A, U, G)$.
In particular, the complementary map c is biologically important since it mirrors the hydrogen bonds between A and T and between C and G in the DNA double helix. Moreover, the transformation r from above carries codons of degeneracy class 4 to codons of degeneracy class less than 4 and vice versa, a symmetry property of the genetic code that was already observed by Rumer (see Fimmel et al. 2014, 2015 for more details). In the sequel we will denote the set of these four transformations as $V_4 = \{\mathrm{id}, c, p, r\}$ (Fig. 1).
Fig. 1.

Graphical representation of the primeval base symmetries. KM is represented by red, YR by green and SW by blue colored lines (Color figure online)
Equipped with the usual group operation of $S_{\mathcal{B}}$ the set $V_4 = \{\mathrm{id}, c, p, r\}$ forms a subgroup of the symmetric group $S_{\mathcal{B}}$ which is isomorphic to the so-called Klein four-group. It can be easily verified that the group $V_4$ is commutative, i.e. $\pi \circ \sigma = \sigma \circ \pi$ for all $\pi, \sigma \in V_4$, and that all permutations in $V_4$ are of order at most two, i.e. applying them twice yields the identity: $\pi \circ \pi = \mathrm{id}$ for every $\pi \in V_4$.
As we will see in the next section, the group $V_4$ is used in order to define the class of tesserae in mathematical terms. If we consider $V_4$ acting on the set $\mathcal{B}^2$ of dinucleotides we obtain four orbits of size four. Recall that an orbit of an element x (here a dinucleotide) under some group H (here $V_4$) is defined as $Hx = \{h(x) : h \in H\}$. Each orbit represents an equivalence class under the natural equivalence relation $x \sim y$ if and only if there is $h \in H$ such that $h(x) = y$. An easy observation shows that for each such equivalence class there is a unique transformation that maps the first nucleotide of a dinucleotide in that class to the second nucleotide, e.g. the map SW for the class $[AU]$. Table 1 below shows the four equivalence classes and the corresponding permutations.
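The group properties claimed here can be checked mechanically. The following is a minimal sketch, assuming the four transformations act on the bases as the columns of Table 2 indicate (AA is sent to AAAA, AAUU, AAGG and AACC by id, c, p and r); the names `TRANSFORMS`, `compose` and `cayley` are illustrative only.

```python
from itertools import product

BASES = "ACGU"

# The four base transformations, read off Table 2 (assumption of this sketch).
TRANSFORMS = {
    "id": dict(zip(BASES, "ACGU")),
    "c": dict(zip(BASES, "UGCA")),  # A<->U, C<->G
    "p": dict(zip(BASES, "GUAC")),  # A<->G, C<->U
    "r": dict(zip(BASES, "CAUG")),  # A<->C, G<->U
}


def compose(f, g):
    """Return the map b -> f(g(b))."""
    return {b: f[g[b]] for b in BASES}


def name_of(f):
    """Name of the transformation equal to f, or None if there is none."""
    for name, g in TRANSFORMS.items():
        if g == f:
            return name
    return None


# Multiplication table of the four transformations.
cayley = {
    (a, b): name_of(compose(TRANSFORMS[a], TRANSFORMS[b]))
    for a, b in product(TRANSFORMS, repeat=2)
}

is_closed = all(v is not None for v in cayley.values())
is_commutative = all(cayley[a, b] == cayley[b, a] for a, b in cayley)
all_involutions = all(cayley[a, a] == "id" for a in TRANSFORMS)
```

Closure, commutativity and the fact that every element squares to the identity are exactly the properties that characterize the Klein four-group among groups with four elements.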
Table 1.
Each column is one of the four equivalence classes of dinucleotides, $[AA]$, $[AU]$, $[AC]$ and $[AG]$, under the action of $V_4$ on $\mathcal{B}^2$
| I | AA | AU | AC | AG |
| SW | UU | UA | UG | UC |
| YR | CC | CG | CA | CU |
| KM | GG | GC | GU | GA |
The leftmost column shows the transformation that sends the first dinucleotide in the class to the second, third and fourth, respectively, e.g. $SW(AA) = UU$. The column headers are the names of the equivalence classes. The header index is the unique transformation used for mapping the first nucleotide of a dinucleotide to the second
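The orbit structure shown in Table 1 can be recomputed in a few lines. This is a sketch under the assumption that the four transformations act as in the columns of Table 2; `orbit` and `linking_map` are hypothetical helper names, and the transformations are referred to by their Table 2 names id, c, p, r.

```python
BASES = "ACGU"

# Base transformations as read off Table 2 (assumption of this sketch).
GROUP = {
    "id": dict(zip(BASES, "ACGU")),
    "c": dict(zip(BASES, "UGCA")),
    "p": dict(zip(BASES, "GUAC")),
    "r": dict(zip(BASES, "CAUG")),
}


def apply(f, word):
    """Apply a base transformation letter by letter."""
    return "".join(f[b] for b in word)


def orbit(word):
    """All images of a word under the four transformations."""
    return frozenset(apply(f, word) for f in GROUP.values())


def linking_map(dinucleotide):
    """Name of the transformation sending the first base to the second."""
    first, second = dinucleotide
    return next(name for name, f in GROUP.items() if f[first] == second)


dinucleotides = [a + b for a in BASES for b in BASES]
orbits = {orbit(d) for d in dinucleotides}
# For every orbit, collect the linking maps of its members.
links = {o: {linking_map(d) for d in o} for o in orbits}
```

Each value in `links` is a one-element set, which is the observation that every class has a unique transformation linking the first base to the second.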
We are now almost in the position to define the set of tesserae as introduced in Gonzalez et al. (2012). But before that we need some more technicalities. Besides the group $V_4$ acting as a group of exchanges of bases, there is a second important group which consists of transformations that permute the positions of single bases in a nucleotide sequence. Together with the usual composition of maps these permutations form again a group that once more can be seen as a symmetric group $S_n$. For the convenience of the reader we here only recall the biologically relevant permutations that will be of importance for us: the so-called reversing permutation and the shift operations $\alpha_k$. Given an n-nucleotide $x = b_1 b_2 \cdots b_n$ we define $\overleftarrow{x}$ and $\alpha_k(x)$ for $1 \le k < n$ as
$$\overleftarrow{x} = b_n b_{n-1} \cdots b_1, \qquad \alpha_k(x) = b_{k+1} \cdots b_n b_1 \cdots b_k,$$
which are the n-nucleotides obtained from x by reversing or a shift of k positions, respectively. Explicitly, for $n = 4$ we have
$$\overleftarrow{x} = b_4 b_3 b_2 b_1$$
and
$$\alpha_1(x) = b_2 b_3 b_4 b_1, \qquad \alpha_2(x) = b_3 b_4 b_1 b_2, \qquad \alpha_3(x) = b_4 b_1 b_2 b_3.$$
It is now obvious that the anti-n-nucleotide of some n-nucleotide x can be described as $\overleftarrow{c(x)}$ with the complementary map c (SW) from $V_4$. For trinucleotides (codons) it is well-known that the anti-codon is always different from the codon. However, if n is even it might happen that $x = \overleftarrow{c(x)}$ for some n-nucleotide x. These nucleotide sequences are called self-complementary. For example, if $n = 4$, then the tetranucleotide ACGU is self-complementary since $\overleftarrow{c(ACGU)} = \overleftarrow{UGCA} = ACGU$.
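The reversing, shifting and anti-nucleotide operations are simple string manipulations. A minimal sketch follows; the function names are illustrative, and the shift is taken to move letters to the left, which is the direction used in the shift columns of Table 3 (AAGG, AGGA, GGAA, GAAG).

```python
from itertools import product

# Complementary map: A<->U, C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse(x):
    """Read the word backwards."""
    return x[::-1]


def shift(x, k):
    """Cyclic left shift by k positions."""
    k %= len(x)
    return x[k:] + x[:k]


def anti(x):
    """Anti-n-nucleotide: complement every base, then reverse."""
    return reverse(x.translate(COMPLEMENT))


def is_self_complementary(x):
    return anti(x) == x


codons = ["".join(c) for c in product("ACGU", repeat=3)]
# No codon equals its own anticodon, because the middle base would
# have to be its own complement.
no_self_complementary_codon = all(not is_self_complementary(c) for c in codons)
```

For even length the obstruction disappears, which is why self-complementary tetranucleotides such as ACGU exist.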
Tesserae: Definition and Structure
Tesserae were motivated biologically in an evolutionary context in Gonzalez et al. (2012). Each tessera is a tetranucleotide that has a particular form that comes from the symmetries induced by the group . Let us give a definition of a tessera in mathematical terms (see also Gonzalez et al. 2012 and Fimmel and Strüngmann 2019):
Definition 2.1
A tessera is a tetranucleotide (four letter word) of the form
$$b_1 b_2 \pi(b_1) \pi(b_2)$$
where $b_1, b_2 \in \mathcal{B}$ and $\pi \in V_4 = \{\mathrm{id}, c, p, r\}$. The set of all valid tesserae is denoted by TESS.
The set TESS is also called the tessera code since it is a subset of $\mathcal{B}^4$ and hence a code in the sense that every concatenation of words from TESS has a unique decomposition over TESS. Clearly, the size of TESS is 64 and so we have $|\mathrm{TESS}| = |\mathcal{B}^3| = 64$. Table 2 shows the set of all tesserae together with their generating transformation.
Table 2.
The table of all tesserae with the generating transformation
| Dinucleotide | id | c | p | r |
|---|---|---|---|---|
| AA | AAAA | AAUU | AAGG | AACC |
| CC | CCCC | CCGG | CCUU | CCAA |
| GG | GGGG | GGCC | GGAA | GGUU |
| UU | UUUU | UUAA | UUCC | UUGG |
| AC | ACAC | ACUG | ACGU | ACCA |
| AG | AGAG | AGUC | AGGA | AGCU |
| AU | AUAU | AUUA | AUGC | AUCG |
| CA | CACA | CAGU | CAUG | CAAC |
| CG | CGCG | CGGC | CGUA | CGAU |
| CU | CUCU | CUGA | CUUC | CUAG |
| GA | GAGA | GACU | GAAG | GAUC |
| GC | GCGC | GCCG | GCAU | GCUA |
| GU | GUGU | GUCA | GUAC | GUUG |
| UA | UAUA | UAAU | UACG | UAGC |
| UC | UCUC | UCAG | UCCA | UCGA |
| UG | UGUG | UGAC | UGCA | UGGU |
It is easy to see that a codon $b_1 b_2 b_3$ can be uniquely extended to a valid tessera $b_1 b_2 b_3 b_4$ by determining the unique permutation $\pi \in V_4$ such that $\pi(b_1) = b_3$ and letting $b_4 = \pi(b_2)$. This shows that the tessera code TESS is 1-error-correcting and it was shown in Fimmel and Strüngmann (2019) that TESS can be obtained as a linear code by the so-called Plotkin construction; for more details on this see Fimmel and Strüngmann (2019).
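Both the enumeration of TESS and the extension of a codon to a tessera can be sketched directly from the definition. The snippet assumes the four transformations act as in the columns of Table 2; `tessera` and `extend_codon` are hypothetical helper names.

```python
from itertools import product

BASES = "ACGU"

# Base transformations id, c, p, r as read off Table 2 (assumption).
GROUP = [dict(zip(BASES, image)) for image in ("ACGU", "UGCA", "GUAC", "CAUG")]


def tessera(b1, b2, f):
    """Tessera generated by the dinucleotide b1 b2 and the transformation f."""
    return b1 + b2 + f[b1] + f[b2]


TESS = sorted({tessera(b1, b2, f)
               for b1, b2 in product(BASES, repeat=2)
               for f in GROUP})


def extend_codon(codon):
    """Append the unique fourth base that turns a codon into a tessera."""
    b1, b2, b3 = codon
    # Exactly one of the four transformations sends b1 to b3.
    f = next(f for f in GROUP if f[b1] == b3)
    return codon + f[b2]


codons = ["".join(c) for c in product(BASES, repeat=3)]
extended = sorted(extend_codon(c) for c in codons)
```

Since `extended` coincides with `TESS`, dropping the last letter is a bijection between the 64 tesserae and the 64 codons.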
In Gonzalez et al. (2012) the idea of symmetric primeval adaptor molecules that could recognize the normal reading frame in the coding strand in the 3–5 direction, in the complementary strand in the 3–5 direction, in the coding strand in the reverse 5–3 direction and in the complementary strand in the reverse 5–3 direction was utilized to propose an ancient model of tRNA adaptors that explains the reading mechanism and degeneracy distribution of the tesserae. In particular, since there exist self-complementary tesserae, e.g. ACGU, the tessera code allows degeneracy 2 and 4 only. Maintaining the degeneracy an algorithm was suggested in Gonzalez et al. (2019) for passing from the tessera code back to the (mitochondrial) genetic code in the following way: We assign to each of the transformations from a letter in the genetic alphabet via , , and and then perform the following algorithm displayed in Fig. 2.
Fig. 2.

Schematic representation of the mapping between the tessera onto the codon . (Color figure online)
For instance, the tessera ACGU will be mapped to the codon CUU since and . In the sequel we will denote by the corresponding codon under this algorithm. However, note that the two mappings and are not inverses of each other.
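We now aim for a better description of tesserae. Let us assume that $t = b_1 b_2 b_3 b_4$ is a tessera. By definition there is an element $\pi \in V_4$ such that
$$b_3 = \pi(b_1) \quad \text{and} \quad b_4 = \pi(b_2).$$
This implies that $b_1 b_2$ and $b_3 b_4$ have to be in the same equivalence class displayed in Table 1. Thus, the tessera code can be split into four disjoint subsets,
$$\mathrm{TESS} = T_{[AA]} \cup T_{[AU]} \cup T_{[AC]} \cup T_{[AG]},$$
where
$$T_{[d]} = \{ b_1 b_2 b_3 b_4 \in \mathrm{TESS} : b_1 b_2 \in [d] \}.$$
Clearly, any subset $X \subseteq \mathrm{TESS}$ has a similar induced decomposition where the components could be empty.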
Definition 2.2
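Let $X \subseteq \mathrm{TESS}$ be a tessera code. Then
$$X = X_{[AA]} \cup X_{[AU]} \cup X_{[AC]} \cup X_{[AG]}$$
where $X_{[d]}$ consists of those tesserae $b_1 b_2 b_3 b_4 \in X$ whose first dinucleotide $b_1 b_2$ lies in the equivalence class $[d]$ of Table 1.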
The above decomposition will be used in Sect. 4 for constructing all maximal circular tessera codes.
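The decomposition into four parts can be illustrated as follows. This is a sketch that assumes the parts are indexed by the equivalence class of the first dinucleotide and identifies each class by the transformation linking its first base to its second (named id, c, p, r as in Table 2); the variable names are illustrative.

```python
from itertools import product

BASES = "ACGU"

# Base transformations as read off Table 2 (assumption of this sketch).
GROUP = {
    "id": dict(zip(BASES, "ACGU")),
    "c": dict(zip(BASES, "UGCA")),
    "p": dict(zip(BASES, "GUAC")),
    "r": dict(zip(BASES, "CAUG")),
}

TESS = sorted({a + b + f[a] + f[b]
               for a, b in product(BASES, repeat=2)
               for f in GROUP.values()})


def linking_map(dinucleotide):
    """Name of the transformation sending the first base to the second."""
    first, second = dinucleotide
    return next(name for name, f in GROUP.items() if f[first] == second)


# Sort every tessera into the part determined by its first dinucleotide.
parts = {}
for t in TESS:
    parts.setdefault(linking_map(t[:2]), set()).add(t)

# Both halves of a tessera lie in the same equivalence class.
halves_agree = all(linking_map(t[2:]) == key
                   for key, members in parts.items()
                   for t in members)
```

The four parts are disjoint by construction and each contains 16 tesserae, so any tessera code splits accordingly.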
Graph Theoretical Approach
In this section we recall a graph theory approach from Fimmel et al. (2016) that turned out to be very useful for determining properties of circular codes (see Sect. 3 for the definition of circularity) and extend it to our setting of tesserae. To each subset $X \subseteq \mathcal{B}^n$ a directed graph $\mathcal{G}(X)$ will be associated as the union of disjoint components $\mathcal{G}_l(X)$ where $1 \le l \le \lfloor n/2 \rfloor$. The vertices of such a component will be initial segments and end segments of n-tuples from X of length l and $n - l$, respectively.
Definition 2.3
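Let $n \ge 2$ and $X \subseteq \mathcal{B}^n$. For $1 \le l \le \lfloor n/2 \rfloor$ we define a graph component $\mathcal{G}_l(X)$ with set of vertices $V(\mathcal{G}_l(X))$ and set of arcs $E(\mathcal{G}_l(X))$ as follows:
$$V(\mathcal{G}_l(X)) = \{ b_1 \cdots b_l,\ b_{l+1} \cdots b_n,\ b_1 \cdots b_{n-l},\ b_{n-l+1} \cdots b_n : b_1 b_2 \cdots b_n \in X \},$$
$$E(\mathcal{G}_l(X)) = \{ [b_1 \cdots b_l \to b_{l+1} \cdots b_n],\ [b_1 \cdots b_{n-l} \to b_{n-l+1} \cdots b_n] : b_1 b_2 \cdots b_n \in X \}.$$
The graph $\mathcal{G}(X)$ associated to X is the union of the graphs $\mathcal{G}_l(X)$ for all $1 \le l \le \lfloor n/2 \rfloor$. The graph $\mathcal{G}(X)$ is called the representing graph of X.
It is easy to see that the graph components of a representing graph are pairwise disjoint since their labels have different lengths. However, the components need not be connected. For the convenience of the reader and for a better illustration we give some examples for $n = 2, 3$ and 4 (Figs. 3, 4 and 5).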
Fig. 3.

Graphical representation of the dinucleotide code X = {UC, CG, GU, AC, AA} which has only one component $\mathcal{G}_1(X)$. (Color figure online)
Fig. 4.

Graphical representation of the trinucleotide code X = {UCA, UAC, CAU, ACA, ACG} which has only one component $\mathcal{G}_1(X)$ that is not connected. (Color figure online)
Fig. 5.

Graphical representation of the tetranucleotide code X = {AAUC, ACUA, ACUU, CUCU, CUUU} which has two components $\mathcal{G}_1(X)$ and $\mathcal{G}_2(X)$ that are both not connected but have two components themselves. (Color figure online)
Since the tesserae are tetranucleotides it follows that any set of tesserae has two (maybe empty) graph components in their representing graph, one with labels of length 1 and 3 and the other with labels of length 2.
In Fimmel et al. (2016) the graph approach was used to characterize circularity of codes in terms of graph theory. We will consider circular tessera codes in the next section but it seems reasonable to state the corresponding theorem in this section. For the technical definition of circularity see Definition 3.1.
Theorem 2.4
Let $X \subseteq \mathcal{B}^n$. Then the following are equivalent:
(i) X is a circular code;
(ii) the representing graph $\mathcal{G}(X)$ is acyclic, i.e. does not contain any cycle.
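Theorem 2.4 turns the circularity test into an acyclicity test on a small graph. The following sketch builds the arcs of the representing graph by cutting every word at every position (which covers all components at once) and then checks for cycles by repeatedly peeling off arcs that start at a source vertex; the function names are illustrative, and `ordered_code` is a hypothetical example built from the ordering A < C < G < U.

```python
def representing_graph(code):
    """Arcs prefix -> suffix for every way of cutting a word in two."""
    return {(w[:l], w[l:]) for w in code for l in range(1, len(w))}


def is_acyclic(arcs):
    """Peel off arcs whose tail has no incoming arc; cycles never peel."""
    arcs = set(arcs)
    while arcs:
        heads = {v for _, v in arcs}
        kept = {(u, v) for (u, v) in arcs if u in heads}
        if kept == arcs:
            # Every remaining tail has an incoming arc, so a cycle exists.
            return False
        arcs = kept
    return True


def is_circular(code):
    """Circularity test via the acyclicity criterion of Theorem 2.4."""
    return is_acyclic(representing_graph(code))


# Dinucleotide code of Fig. 3; it contains AA, i.e. a loop at A.
fig3_code = {"UC", "CG", "GU", "AC", "AA"}
# Dinucleotide code obtained from the ordering A < C < G < U.
ordered_code = {"AC", "AG", "AU", "CG", "CU", "GU"}
```

With these definitions `is_circular(fig3_code)` is false and `is_circular(ordered_code)` is true.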
In the particular case of tesserae we will use a second graph associated to a set $X \subseteq \mathrm{TESS}$ that we shall utilize later on in order to construct maximal circular tessera codes.
Definition 2.5
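Let $X \subseteq \mathrm{TESS}$. The di-cut-graphs $\mathcal{G}(D_1(X))$ and $\mathcal{G}(D_2(X))$ associated to X are defined as the representing graphs of the sets
$$D_1(X) = \{ b_1 b_3 : b_1 b_2 b_3 b_4 \in X \}$$
and
$$D_2(X) = \{ b_2 b_4 : b_1 b_2 b_3 b_4 \in X \}.$$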
To conclude this section we give an example of a di-cut-graph of some tessera code X (Fig. 6).
Fig. 6.

Graphical representation of the di-cut-graph of the Tessera code X = {UCUC, AUGC, CUAG, GCCG}. (Color figure online)
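For the code of Fig. 6 the underlying dinucleotide sets can be computed as below. The sketch assumes that the first di-cut set collects letters 1 and 3 of every tessera and the second collects letters 2 and 4, which is the reading that the proof of Theorem 3.7 relies on; `di_cut_codes` and `has_loop` are hypothetical helper names.

```python
def di_cut_codes(code):
    """Dinucleotides b1 b3 and b2 b4 of every tessera b1 b2 b3 b4 (assumed reading)."""
    d1 = {t[0] + t[2] for t in code}
    d2 = {t[1] + t[3] for t in code}
    return d1, d2


def has_loop(dinucleotides):
    """A dinucleotide NN gives a loop N -> N in the representing graph."""
    return any(d[0] == d[1] for d in dinucleotides)


fig6_code = {"UCUC", "AUGC", "CUAG", "GCCG"}
d1, d2 = di_cut_codes(fig6_code)
```

Here both sets contain a dinucleotide with two equal letters (UU and CC), which comes from the tessera UCUC; such a tessera equals its own second shift.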
Circular Tessera Codes
In this section we consider circular tessera codes. Simply speaking circularity means that a frame-shift in any concatenation of tesserae from that code will be detected. In the biological setting of the genetic code, a circular set of trinucleotides was first observed in Arqués and Michel (1996) and is supposed to play an important role in error-detection mechanisms during the translational process. We start with the definition of circularity for tesserae.
Definition 3.1
Let $n \in \mathbb{N}$. A tessera code $X \subseteq \mathrm{TESS}$ is called n-circular if for any set of tesserae $t_1, \ldots, t_m \in X$ the concatenation $t_1 t_2 \cdots t_m$ has a unique decomposition into tesserae from the code X for any $m \le n$ if considered on the circle. We will call a tessera code circular if it is n-circular for all $n \in \mathbb{N}$.
As we noted before in Theorem 2.4, a tessera code X is circular if and only if its representing graph $\mathcal{G}(X)$ is acyclic. Moreover, it is easy to see that the code X is n-circular if and only if for any concatenation $t_1 t_2 \cdots t_m$ of tesserae from X with $m \le n$ the shifted sequences $\alpha_k(t_1 t_2 \cdots t_m)$ for $k = 1, 2, 3$ do not yield a valid sequence in $X^m$, i.e.
$$\alpha_k(t_1 t_2 \cdots t_m) \notin X^m.$$
In particular, a tessera code X is 1-circular if it does not contain the cyclically shifted tesserae of its members, i.e.
$$\alpha_k(t) \notin X$$
for all $t \in X$ and $k = 1, 2, 3$. Therefore, a circular code cannot contain any tessera that equals one of its shifts, e.g. $ACAC = \alpha_2(ACAC)$, and it makes sense to consider the equivalence classes that are formed by tesserae and their circular shifts. If all shifts are different, then this class is called complete. There are 12 such complete equivalence classes, each containing 4 elements. Four other classes each contain one element and six classes each contain two elements, like $\{ACAC, CACA\}$. Table 3 displays all the complete equivalence classes of tesserae.
Table 5.
Numbers of self-complementary circular codes of different code lengths
| Code length | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number | 12 | 72 | 304 | 996 | 2580 | 5408 | 9264 | 12708 | 13696 | 11232 | 6144 | 1584 |
Table 3.
List of complete equivalence classes
| Tessera | Shift 1 | Shift 2 | Shift 3 | Class number |
|---|---|---|---|---|
| AUUA | **UUAA** | UAAU | **AAUU** | |
| AAGG | AGGA | GGAA | GAAG | |
| AACC | ACCA | CCAA | CAAC | |
| CGGC | **GGCC** | GCCG | **CCGG** | |
| CCUU | CUUC | UUCC | UCCU | |
| UUGG | UGGU | GGUU | GUUG | |
| GCUA | **CUAG** | UAGC | **AGCU** | |
| GCAU | **CAUG** | AUGC | **UGCA** | |
| UACG | **ACGU** | CGUA | **GUAC** | |
| AGUC | GUCA | UCAG | CAGU | |
| CGAU | **GAUC** | AUCG | **UCGA** | |
| ACUG | CUGA | UGAC | GACU | |
Self-complementary tesserae are in bold
Since any circular code is also 1-circular and there are only 12 complete equivalence classes, it is obvious that a circular tessera code can contain at most 12 elements.
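The class counts used in this argument can be recomputed by grouping all 64 tesserae under cyclic shifts. A minimal sketch, assuming the four base transformations act as in Table 2; the names `shift_class` and `size_counts` are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"

# Base transformations id, c, p, r as read off Table 2 (assumption).
GROUP = [dict(zip(BASES, image)) for image in ("ACGU", "UGCA", "GUAC", "CAUG")]

TESS = sorted({a + b + f[a] + f[b]
               for a, b in product(BASES, repeat=2)
               for f in GROUP})


def shift_class(t):
    """The set of all cyclic shifts of a tessera."""
    return frozenset(t[k:] + t[:k] for k in range(4))


classes = {shift_class(t) for t in TESS}
# Number of classes of each size.
size_counts = Counter(len(cls) for cls in classes)
complete_classes = [cls for cls in classes if len(cls) == 4]
# Every shift of a tessera is again a tessera.
closed_under_shifts = all(s in TESS for cls in classes for s in cls)
```

The 12 complete classes account for 48 tesserae, and a 1-circular code can take at most one tessera from each of them.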
Definition 3.2
A circular tessera code is called maximal if it contains exactly 12 elements.
We will show in Sect. 4 how to construct all maximal circular tessera codes and now give an example of a 1-circular tessera code that is not 2-circular.
Example 3.3
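Let $X = \{ACGU, CAUG, GUCA, UGAC\}$. Then X is a 1-circular tessera code but the word ACGUCAUG has two decompositions on a circle:
$$ACGU \cdot CAUG \quad \text{and} \quad GUCA \cdot UGAC.$$
Thus X is not 2-circular. In particular, the graph component $\mathcal{G}_2(X)$ of the representing graph $\mathcal{G}(X)$ of X contains a cycle.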
Moreover, the example below shows that also the classes of 2- and 3-circular tessera codes are different:
Example 3.4
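Let $X = \{CAGU, UGCA, GUUG\}$. Then X is a 2-circular (by means of easy computations) but not a 3-circular tessera code since the word CAGUUGCAGUUG has two decompositions on a circle:
$$CAGU \cdot UGCA \cdot GUUG \quad \text{and} \quad GUUG \cdot CAGU \cdot UGCA.$$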
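The "easy computations" behind the last two examples can be sketched as a brute-force test of n-circularity that follows the shift criterion stated after Definition 3.1. The two code sets below are assumptions of this sketch: they consist of exactly the tesserae that occur in the two circular readings of the words ACGUCAUG and CAGUUGCAGUUG.

```python
from itertools import product


def shift(word, k):
    """Cyclic left shift by k positions."""
    return word[k:] + word[:k]


def decomposes(word, code):
    """True if the word splits into consecutive 4-letter blocks from the code."""
    return all(word[i:i + 4] in code for i in range(0, len(word), 4))


def is_n_circular(code, n):
    """No shifted concatenation of at most n tesserae may decompose again."""
    code = set(code)
    for m in range(1, n + 1):
        for blocks in product(sorted(code), repeat=m):
            word = "".join(blocks)
            if any(decomposes(shift(word, k), code) for k in (1, 2, 3)):
                return False
    return True


# Tesserae read off the word ACGUCAUG at offsets 0 and 2 (assumed set).
example_a = {"ACGU", "CAUG", "GUCA", "UGAC"}
# Tesserae read off the word CAGUUGCAGUUG (assumed set).
example_b = {"CAGU", "UGCA", "GUUG"}
```

The search grows exponentially in n, which is harmless here because, by Theorem 3.6, testing up to n = 3 already decides circularity of a tessera code.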
We show next that the graph component $\mathcal{G}_2(X)$ being not acyclic is not an accident but in fact it is the only possibility for 1-circular codes not to be circular. In order to do so recall that a cycle in a graph $\mathcal{G}$ is a sequence $v_1 \to v_2 \to \cdots \to v_n \to v_1$ of distinct vertices $v_1, \ldots, v_n$ in $\mathcal{G}$. The length of this cycle is then defined to be n. Note that for $n = 1$ a cycle of length 1 is a loop.
Proposition 3.5
Let X be a tessera code. Then the following hold:
(i) The maximal length of a cycle in $\mathcal{G}_1(X)$ is 2; in particular, the maximal length of a path that does not contain a cycle is 1;
(ii) The maximal length of a cycle in $\mathcal{G}_2(X)$ is 4; in particular, the maximal length of a path that does not contain a cycle is 3.
Proof
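Let X be a tessera code. We first prove (i) by showing that any path in $\mathcal{G}_1(X)$ of length 2 must contain a cycle. Hence assume that $\mathcal{G}_1(X)$ contains a path of length 2. Without loss of generality we may assume that it starts with a nucleotide, e.g.
$$b_1 \to b_2 b_3 b_4 \to b_5.$$
Then $b_1 b_2 b_3 b_4$ and $b_2 b_3 b_4 b_5$ are valid tesserae from X. By definition of tesserae the former tells us that there is a transformation $\pi \in V_4$ such that $b_3 = \pi(b_1)$ and $b_4 = \pi(b_2)$. The latter, however, then implies that also $b_5 = \pi(b_3)$ and so $b_5 = \pi(\pi(b_1)) = b_1$, which shows that $b_5 = b_1$ and $b_1 \to b_2 b_3 b_4 \to b_1$ is a cycle.
We now prove (ii) by showing that any path of length 4 in $\mathcal{G}_2(X)$ contains a cycle. Assume that $\mathcal{G}_2(X)$ contains a path of length 4, e.g.
$$d_1 \to d_2 \to d_3 \to d_4 \to d_5$$
with dinucleotides $d_1, \ldots, d_5$. By definition of $\mathcal{G}_2(X)$ there are permutations $\pi_1, \ldots, \pi_4 \in V_4$ such that
$$d_{i+1} = \pi_i(d_i), \qquad i = 1, \ldots, 4.$$
If one of the $\pi_i$ is the identity we obtain a cycle of length 1 (a loop). Thus all $\pi_i$ are different from the identity. If $\pi_1 = \pi_2$, then $d_3 = \pi_2(\pi_1(d_1)) = d_1$ since $\pi_1 \circ \pi_1 = \mathrm{id}$. This gives a cycle of length 2. Thus $\pi_1 \ne \pi_2$ and similarly $\pi_2 \ne \pi_3$, $\pi_3 \ne \pi_4$. If $\pi_3 \ne \pi_1$, then the group structure of $V_4$ implies that $\pi_3 \circ \pi_2 \circ \pi_1 = \mathrm{id}$ and so $d_4 = d_1$, hence we obtain a cycle of length 3. Finally, if $\pi_3 = \pi_1$, then similar arguments as above show that we get a cycle of length 3 or $\pi_4 = \pi_2$ holds. Now
$$d_5 = \pi_4 \pi_3 \pi_2 \pi_1 (d_1) = \pi_2 \pi_1 \pi_2 \pi_1 (d_1),$$
but $V_4$ is commutative and all elements in $V_4$ are of order 2, hence
$$d_5 = \pi_2^2 \pi_1^2 (d_1) = d_1.$$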
Consequently, the path itself is a cycle of length 4.
As a corollary we obtain an important theorem. Note that part (ii) was also obtained in a bachelor-thesis (Cisowski 2015) with a much more technical proof.
Theorem 3.6
Let X be a tessera code. Then the following hold:
(i) If X is 1-circular, then $\mathcal{G}_1(X)$ is acyclic;
(ii) The following two conditions are equivalent:
- X is circular;
- X is 3-circular.
Proof
We first prove (i). By Proposition 3.5 we know that the maximal length of a cycle in $\mathcal{G}_1(X)$ is 2, hence a cycle would be of the form $b_1 \to b_2 b_3 b_4 \to b_1$, which contradicts 1-circularity since $b_2 b_3 b_4 b_1 = \alpha_1(b_1 b_2 b_3 b_4)$.
In order to prove (ii) note that by Proposition 3.5 the maximal length of a cycle in $\mathcal{G}_2(X)$ is 4. However, a cycle of even length 2 is excluded by 1-circularity and of length 4 by 2-circularity since
$$d_1 \to d_2 \to d_3 \to d_4 \to d_1$$
implies that $d_1 d_2 d_3 d_4$ has two decompositions, namely $d_1 d_2 \cdot d_3 d_4$ and $d_2 d_3 \cdot d_4 d_1$, a contradiction. Hence $\mathcal{G}_2(X)$ does not contain any cycle of even length and the maximal length of an odd cycle is 3. By Theorem 2.3 from [13] we conclude that X is circular if and only if it is 3-circular.
We conclude this section with a result that gives a handy criterion for constructing circular tessera codes and some application.
Theorem 3.7
Let $X \subseteq \mathrm{TESS}$ be a tessera code. Then X is circular if
(i) X is 1-circular, and
(ii) one of the two di-cut graphs of X is acyclic.
Proof
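Assume that X is 1-circular and one of the di-cut graphs is acyclic. Without loss of generality we assume that the di-cut graph $\mathcal{G}(D_1(X))$ is acyclic, where $D_1(X)$ consists of the dinucleotides $b_1 b_3$ of all tesserae $b_1 b_2 b_3 b_4 \in X$. Assume that X is not circular. Then Proposition 3.5 and Theorem 3.6 imply that the component $\mathcal{G}_1(X)$ is acyclic and the maximal length of a cycle in $\mathcal{G}_2(X)$ is 4. Assume without loss of generality that
$$b_1 b_2 \to b_3 b_4 \to b_5 b_6 \to b_7 b_8 \to b_1 b_2$$
is a cycle in $\mathcal{G}_2(X)$. Thus the tesserae $b_1 b_2 b_3 b_4$, $b_3 b_4 b_5 b_6$, $b_5 b_6 b_7 b_8$ and $b_7 b_8 b_1 b_2$ are in X. By definition of $D_1(X)$ it follows that $b_1 b_3$, $b_3 b_5$, $b_5 b_7$ and $b_7 b_1$ are dinucleotides in the set $D_1(X)$ and hence $b_1$, $b_3$, $b_5$ and $b_7$ are vertices of $\mathcal{G}(D_1(X))$. Moreover,
$$b_1 \to b_3 \to b_5 \to b_7 \to b_1$$
is a cycle in $\mathcal{G}(D_1(X))$, a contradiction to the fact that $\mathcal{G}(D_1(X))$ is acyclic.
The converse of Theorem 3.7 does not hold as the following example shows. Note, however, that neither of the two dinucleotide codes underlying the di-cut graphs of a 1-circular code can ever contain dinucleotides of the form NN since they would imply that there is a tessera of the form NKNK in X, which contradicts 1-circularity.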
The converse of Theorem 3.7 does not hold as the following example shows. Note, however, that the code (respectively ) can never contain dinucleotides of the form NN since they would imply that there is a tessera of the form NKNK in X which contradicts 1-circularity.
Example 3.8
Let
then X is a maximal circular tessera code but neither of its two di-cut graphs is acyclic.
We now state some application of the above results in order to construct maximal circular tessera codes from circular dinucleotide codes. In fact, the constructed codes will even have stronger properties:
Definition 3.9
A circular tessera code is called a -code if also the three shifted codes , and are circular.
Recall from Fimmel et al. (2015) that a maximal circular dinucleotide code must be of the form $D = \{ b_i b_j : 1 \le i < j \le 4 \}$ where $b_1 < b_2 < b_3 < b_4$ is any linear ordering of the genetic alphabet $\mathcal{B}$.
Proposition 3.10
Let be a maximal circular dinucleotide code. Then
is a maximal tessera -code such that .
Proof
We first prove circularity of the code X. By construction, D is one of the dinucleotide codes underlying the di-cut graphs of X. Since D is circular its graph is acyclic by Theorem 2.4 and thus we only need to verify that X is 1-circular by Theorem 3.7. But this is clear since the code contains exactly one tessera from each of the twelve complete equivalence classes from Table 3.
Now let be the nth shift of X for . Then we have
Clearly, is a dinucleotide circular code since it is equal to , hence its representing graph is acyclic and as above is 1-circular. By Theorem 3.7 we conclude that is a circular code.
It remains to show that also and are circular. However, in this case
which is circular and so Theorem 3.7 implies that also and also are circular. Hence X is a -code.
We would like to remark that the construction in the above lemma has some flexibility, e.g the tessera of the form can be substituted by tessera from the same equivalence class. However, it is not obvious how to construct all maximal circular tessera codes using this method. Nevertheless, in the next section we will give a way to obtain all such codes.
Construction of All Maximal Circular Tessera Codes
This section introduces one possibility to construct all maximal circular tessera codes. Recall that a circular tessera code is maximal if it contains exactly 12 elements. The construction will be accomplished in two major steps. Firstly, for each of the four equivalence classes from Table 1 we define a tournament on four vertices which are representing the single dinucleotides. Finally, we combine the four tournaments constructed in the previous step to construct maximal circular tessera codes. Recall that a tournament is a complete oriented graph (see e.g. Clark and Holton 1991). Figure 7 shows an example of a tournament.
Fig. 7.

An acyclic tournament on four nodes. (Color figure online)
As already proved in Theorem 3.6, the graph component $\mathcal{G}_1(X)$ associated to a tessera code X either has no path longer than 1 or X is not circular. More precisely, if $\mathcal{G}_1(X)$ is not acyclic, the code X is not even 1-circular. Considering that, a construction of a maximal circular tessera code could almost be reduced to the problem of constructing a valid and acyclic $\mathcal{G}_2(X)$ which represents a correct tessera code X.
Step 1: In this step we construct four acyclic tournaments which together represent a tessera code of length 24 whose component $\mathcal{G}_2$ is acyclic. Note that a tournament on 4 vertices has exactly 6 edges and in order to be acyclic it has to be isomorphic to the tournament given in Fig. 7. Below we will show how to construct tournaments on four vertices that represent a correct (circular) tessera code, i.e. the tournaments will be acyclic. Together they form the desired code as the union of the four 6-element tessera codes represented by the tournaments.
As can be seen from the construction, the component $\mathcal{G}_2$ of this code is acyclic as it is the union of acyclic tournaments, while $\mathcal{G}_1$ is not. Yet, for this initial step we can ignore this fact. Since the four tournaments, one for each equivalence class of Table 1, are disjoint, it is sufficient that these subgraphs are acyclic to ensure the acyclicity of $\mathcal{G}_2$. As mentioned above, each of these subgraphs has to be isomorphic to the graph in Fig. 7. Let us choose one of the equivalence classes and assign the numbers 1, 2, 3, 4 to the dinucleotides of this class. Now we draw directed edges from each node to the nodes with a higher number. This way we will obtain four acyclic tournaments, each of which represents a circular tessera code of size 6. This gives 4! possible assignments per subgraph. Hence, there are altogether $(4!)^4 = 331776$ tessera codes of size 24 with an acyclic $\mathcal{G}_2$-component.
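Step 1 can be sketched as follows: every equivalence class of dinucleotides is given a linear order, and a tessera is formed for each pair in increasing order. The helper names (`code_from_orderings`, `is_tessera`) are hypothetical, and the base transformations are assumed to act as in Table 2.

```python
from itertools import permutations

BASES = "ACGU"

# Base transformations id, c, p, r as read off Table 2 (assumption).
GROUP = [dict(zip(BASES, image)) for image in ("ACGU", "UGCA", "GUAC", "CAUG")]


def orbit(d):
    """Equivalence class of a dinucleotide under the four transformations."""
    return frozenset("".join(f[b] for b in d) for f in GROUP)


def is_tessera(t):
    return any(f[t[0]] == t[2] and f[t[1]] == t[3] for f in GROUP)


def code_from_orderings(orderings):
    """One ordered 4-tuple of dinucleotides per class; arcs go low -> high."""
    code = set()
    for order in orderings:
        for i in range(4):
            for j in range(i + 1, 4):
                code.add(order[i] + order[j])
    return code


orbits = sorted({orbit(a + b) for a in BASES for b in BASES}, key=sorted)
# All 4! orderings of every class.
choices_per_class = [list(permutations(sorted(o))) for o in orbits]

number_of_codes = 1
for choices in choices_per_class:
    number_of_codes *= len(choices)

# The code obtained from the alphabetical ordering of every class.
example_code = code_from_orderings([choices[0] for choices in choices_per_class])
```

Because every arc points from a lower to a higher position in its ordering, the resulting graph on dinucleotides cannot contain a cycle.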
Step 2: In this step, we use the 331776 tessera codes constructed in Step 1 to construct all possible maximal circular tessera codes. Since the component $\mathcal{G}_2$ is already acyclic, it is sufficient to focus on $\mathcal{G}_1$.
Lemma 4.1
Let be a tessera code constructed as above and
for some . Then the following hold:
.
Proof
First we prove (1). Obviously, t is represented by the arrow in the corresponding tournament. Obviously, . Let us consider . It follows that since it would be represented in the same tournament by the opposite directed arrow - a contradiction. Now we claim that one of the remaining shifts of t
is necessarily in the code . Let us first assert that the dinucleotides and cannot be in the same equivalence class as and since in this case takes place and, thus, . Consequently, one of the arrows or is drawn in the corresponding tournament and it follows that or . This proves (2).
The above lemma shows that, consequently, each code of size 24 constructed in Step 1 consists of 12 pairs of cyclically equivalent tesserae. To ensure that the codes are circular, one of the cyclically equivalent tuples must be removed. This has to be done for all 12 cyclically equivalent pairs of tuples in such a code. It follows that each of the 331776 codes can be used to construct $2^{12} = 4096$ circular codes, with possible repetitions. It remains to prove that all maximal circular tessera codes can be obtained this way. Let X be such a maximal code. As shown above, the part of the component $\mathcal{G}_2(X)$ belonging to each of the four equivalence classes is a simple directed acyclic graph with a maximum of four nodes. According to Theorem 3.1 (Fimmel et al. 2017), such a graph can be embedded in an acyclic tournament. In Step 1, all possible acyclic tournaments are constructed. Step 2 takes all possible subgraphs of each tournament and combines those. This ensures that all possible maximal circular tessera codes are represented in the construction.
Hence, the constructed codes include all maximal circular tessera codes.
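The two steps can be replayed for a single starting code. The sketch below builds one code of size 24 from the alphabetical ordering of every dinucleotide class, groups its members into cyclically equivalent pairs, and checks every way of keeping one tessera per pair with the acyclicity criterion of Theorem 2.4. All names are illustrative, and the base transformations are assumed to act as in Table 2.

```python
from itertools import product

BASES = "ACGU"

# Base transformations id, c, p, r as read off Table 2 (assumption).
GROUP = [dict(zip(BASES, image)) for image in ("ACGU", "UGCA", "GUAC", "CAUG")]


def orbit(d):
    return frozenset("".join(f[b] for b in d) for f in GROUP)


def shift_class(t):
    return frozenset(t[k:] + t[:k] for k in range(4))


def is_acyclic(arcs):
    """Peel off arcs whose tail has no incoming arc; cycles never peel."""
    arcs = set(arcs)
    while arcs:
        heads = {v for _, v in arcs}
        kept = {(u, v) for (u, v) in arcs if u in heads}
        if kept == arcs:
            return False
        arcs = kept
    return True


def is_circular(code):
    return is_acyclic({(w[:l], w[l:]) for w in code for l in (1, 2, 3)})


# Step 1: one acyclic tournament per class (alphabetical ordering).
orbits = sorted({orbit(a + b) for a in BASES for b in BASES}, key=sorted)
code24 = {o[i] + o[j]
          for o in map(sorted, orbits)
          for i in range(4) for j in range(i + 1, 4)}

# Step 2: group into cyclically equivalent pairs and keep one of each.
pairs = {}
for t in code24:
    pairs.setdefault(shift_class(t), []).append(t)
pair_list = [sorted(p) for p in pairs.values()]

selections = [set(choice) for choice in product(*pair_list)]
all_circular = all(is_circular(s) for s in selections)
```

Different starting codes can lead to the same selection, which is why the construction produces maximal circular codes with repetitions.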
The table below gives the exact numbers of circular codes and of the codes from Definition 3.9 for all cardinalities from 1 to the maximum 12. Moreover, it also shows the number of comma-free codes. Recall that comma-free codes form a subclass of circular codes.
Definition 4.2
A code $X \subseteq \mathcal{B}^n$ is called comma-free if any concatenation $x_1 x_2$ of words $x_1, x_2 \in X$ does not contain any $x \in X$ as a substring except for $x_1$ (as initial segment) and $x_2$ (as end segment) themselves.
Clearly, a comma-free code is circular and X is comma-free if and only if its associated graph $\mathcal{G}(X)$ has no path of length more than 2 (see Fimmel et al. 2016). The numbers are given in Table 4.
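Comma-freeness can be tested directly from the definition by sliding a window over every concatenation of two code words. The sketch below assumes the standard reading in which the two words may coincide, and it uses the base transformations as they appear in Table 2; `is_comma_free` and `comma_free_counts` are illustrative names.

```python
from itertools import combinations, product

BASES = "ACGU"

# Base transformations id, c, p, r as read off Table 2 (assumption).
GROUP = [dict(zip(BASES, image)) for image in ("ACGU", "UGCA", "GUAC", "CAUG")]

TESS = sorted({a + b + f[a] + f[b]
               for a, b in product(BASES, repeat=2)
               for f in GROUP})


def is_comma_free(code):
    """No code word may appear at an inner offset of any x1 x2."""
    code = set(code)
    for x1, x2 in product(code, repeat=2):
        word = x1 + x2
        n = len(x1)
        if any(word[k:k + n] in code for k in range(1, n)):
            return False
    return True


# Number of comma-free tessera codes with one and with two elements.
comma_free_counts = {
    size: sum(is_comma_free(c) for c in combinations(TESS, size))
    for size in (1, 2)
}
```

Under this reading the counts for sizes 1 and 2 come out as 48 and 1056, in agreement with the first two rows of Table 4.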
Table 4.
Numbers of circular, comma-free and -tessera codes of different code lengths
| Code length | # 1-circular codes | # Circular codes | # -codes | # Comma free codes |
|---|---|---|---|---|
| 1 | 48 | 48 | 48 | 48 |
| 2 | 1056 | 1056 | 1056 | 1056 |
| 3 | 14080 | 14048 | 14016 | 13952 |
| 4 | 126720 | 125544 | 124368 | 122376 |
| 5 | 811008 | 791952 | 773088 | 745584 |
| 6 | 3784704 | 3606048 | 3433584 | 3214272 |
| 7 | 12976128 | 11908800 | 10922112 | 9816960 |
| 8 | 32440320 | 28230456 | 24577404 | 20952504 |
| 9 | 57671680 | 46720800 | 37987120 | 30297824 |
| 10 | 69206016 | 51111024 | 38129856 | 28015728 |
| 11 | 50331648 | 33113472 | 22240992 | 14790144 |
| 12 | 16777216 | 9592512 | 5685408 | 3351232 |
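The first rows of the table are small enough to recompute by exhaustive search. The following sketch counts 1-circular and circular tessera codes of sizes 1 to 3, using the shift criterion for 1-circularity and the acyclicity criterion of Theorem 2.4 for circularity; the names are illustrative and the base transformations are assumed to act as in Table 2.

```python
from itertools import combinations, product

BASES = "ACGU"

# Base transformations id, c, p, r as read off Table 2 (assumption).
GROUP = [dict(zip(BASES, image)) for image in ("ACGU", "UGCA", "GUAC", "CAUG")]

TESS = sorted({a + b + f[a] + f[b]
               for a, b in product(BASES, repeat=2)
               for f in GROUP})


def is_one_circular(code):
    """No proper cyclic shift of a member may belong to the code."""
    code = set(code)
    return all(t[k:] + t[:k] not in code for t in code for k in (1, 2, 3))


def is_acyclic(arcs):
    """Peel off arcs whose tail has no incoming arc; cycles never peel."""
    arcs = set(arcs)
    while arcs:
        heads = {v for _, v in arcs}
        kept = {(u, v) for (u, v) in arcs if u in heads}
        if kept == arcs:
            return False
        arcs = kept
    return True


def is_circular(code):
    return is_acyclic({(w[:l], w[l:]) for w in code for l in (1, 2, 3)})


# For each size: (number of 1-circular codes, number of circular codes).
counts = {}
for size in (1, 2, 3):
    codes = list(combinations(TESS, size))
    counts[size] = (sum(is_one_circular(c) for c in codes),
                    sum(is_circular(c) for c in codes))
```

The 32 codes of size 3 that are 1-circular but not circular are exactly the directed 3-cycles inside one dinucleotide class, such as the three tesserae read off the word CAGUUGCAGUUG.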
Self-Complementary Circular Tessera Codes
In this section we will discuss some properties of self-complementary tessera codes. In particular, we will determine all maximal self-complementary comma-free tessera codes and give a graph-theoretical characterization of self-complementarity for tessera codes.
Let us first recall the definition of self-complementarity of a code.
Definition 4.3
Let X be an $n$-nucleotide code. We will call X self-complementary if for each $n$-nucleotide $x \in X$ its anti-$n$-nucleotide (the reversed complement of $x$) is also in X:
We will also use the notation
According to the above, a circular tessera code can contain a maximum of 12 tesserae. Such a code can even be self-complementary, as the next example shows.
Example 4.4
The following code is a self-complementary maximal circular code:
The next lemma gives the exact number of self-complementary 1-circular tessera codes.
Lemma 4.5
The maximal size of a self-complementary 1-circular tessera code is 12, and there are 4096 such codes of maximal size.
Proof
Firstly, Example 4.4 shows that there are self-complementary circular codes of size 12, which is maximal. Secondly, in order to calculate the exact number of self-complementary 1-circular codes, we first ascertain that for 6 conjugacy classes the antitessera of a tessera from that class is found in another conjugacy class: these six classes split into three pairs such that the antitesserae of all tesserae from one class of a pair lie in the other class of that pair and, of course, vice versa. Choosing a tessera from one class of a pair fixes its antitessera in the other, so we have $4^3 = 64$ possibilities to choose 6 tesserae from these conjugacy classes for a 1-circular self-complementary tessera code. As for the remaining six classes, only the two self-complementary tesserae can be chosen from each of them, since the other two form a tessera-antitessera pair and are cyclically equivalent. So we have a further $2^6 = 64$ possibilities for this. Altogether we have $4^3 \cdot 2^6 = 4096$ maximal self-complementary 1-circular codes.
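The counting argument of this proof can be replayed mechanically. The sketch below is an illustration under stated assumptions, not the authors' computation: it assumes that tesserae have the form $b_1b_2f(b_1)f(b_2)$ with $f$ a non-identity element of the Klein four-group, that the antitessera is the reversed complement, and that a maximal 1-circular code picks exactly one tessera per cyclic-shift class. The helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGU", "UGCA")
# Non-identity Klein four-group maps (assumed tessera definition).
NON_IDENTITY_MAPS = [
    str.maketrans("ACGU", "UGCA"),  # A<->U, C<->G
    str.maketrans("ACGU", "GUAC"),  # A<->G, C<->U
    str.maketrans("ACGU", "CAUG"),  # A<->C, G<->U
]


def anti(word):
    """Reversed complement of a word over {A, C, G, U}."""
    return word.translate(COMPLEMENT)[::-1]


def cyclic_class(word):
    """Return the set of all cyclic shifts of a word."""
    return frozenset(word[i:] + word[:i] for i in range(len(word)))


tesserae = {
    b1 + b2 + (b1 + b2).translate(f)
    for b1, b2 in product("ACGU", repeat=2)
    for f in NON_IDENTITY_MAPS
}
classes = {cyclic_class(t) for t in tesserae}

total = 1
self_paired = 0     # classes mapped onto themselves by anti()
swapped_pairs = 0   # pairs of classes exchanged by anti()
seen = set()
for cls in classes:
    if cls in seen:
        continue
    partner = cyclic_class(anti(next(iter(cls))))
    seen.update({cls, partner})
    if partner == cls:
        # Only tesserae equal to their own antitessera may be chosen.
        total *= sum(1 for w in cls if anti(w) == w)
        self_paired += 1
    else:
        # A free choice in one class fixes the choice in the partner class.
        total *= len(cls)
        swapped_pairs += 1

print(swapped_pairs, self_paired, total)
```

The class structure is computed rather than hard-coded, so the script also confirms the split into three exchanged pairs and six self-paired classes used in the proof.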
The following example shows that not every 1-circular self-complementary tessera code is also circular (not even 2-circular).
Example 4.6
Let us take the self-complementary tesserae AAUU and CCGG, which lie in two different conjugacy classes, as well as GGAA and UUCC, which lie in two further classes and are complementary to each other. Then the word CCGGAAUU has two different decompositions on a circle: CCGG·AAUU and, read from the third position, GGAA·UUCC.
The exact numbers of self-complementary circular and comma-free codes of maximal length were determined by an extensive computer calculation:
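The example can be checked with a few lines of code. This is a minimal sketch with illustrative helper names. It reads the circular word from each of the four possible starting offsets and reports the offsets at which every block of four letters is a word of the code.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anti(word):
    """Reversed complement of a word over {A, C, G, U}."""
    return word.translate(COMPLEMENT)[::-1]


def cyclic_class(word):
    """Return the set of all cyclic shifts of a word."""
    return frozenset(word[i:] + word[:i] for i in range(len(word)))


def circular_decompositions(word, code, size=4):
    """List (offset, blocks) for every reading of the circular word into code words."""
    found = []
    for offset in range(size):
        rotated = word[offset:] + word[:offset]
        blocks = [rotated[i:i + size] for i in range(0, len(rotated), size)]
        if all(block in code for block in blocks):
            found.append((offset, blocks))
    return found


code = {"AAUU", "CCGG", "GGAA", "UUCC"}

is_self_complementary = all(anti(w) in code for w in code)
# No two code words are cyclic shifts of each other.
no_cyclic_duplicates = len({cyclic_class(w) for w in code}) == len(code)
decompositions = circular_decompositions("CCGGAAUU", code)

print(is_self_complementary, no_cyclic_duplicates)
for offset, blocks in decompositions:
    print(offset, blocks)
```

The two readings found are the one starting at offset 0 and the one starting at offset 2, which is why the code fails to be circular although no two of its words are cyclically equivalent.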
Lemma 4.7
There are
tessera codes of maximal length.
Table 6.
The list of all self-complementary comma-free tessera codes of maximal length
| UUAA | CCAA | AGGA | UCCU | UUGG | CCGG | UCGA | CAUG | ACGU | AGCU | ACUG | CAGU |
| AAUU | AACC | AGGA | UCCU | GGUU | GGCC | UCGA | CAUG | ACGU | AGCU | ACUG | CAGU |
| UUAA | CCAA | GAAG | CUUC | UUGG | CCGG | GAUC | CAUG | ACGU | CUAG | ACUG | CAGU |
| AAUU | AACC | GAAG | CUUC | GGUU | GGCC | GAUC | CAUG | ACGU | CUAG | ACUG | CAGU |
| UUAA | CCAA | AGGA | UCCU | UUGG | CCGG | UCGA | UGCA | GUAC | AGCU | UGAC | GUCA |
| AAUU | AACC | AGGA | UCCU | GGUU | GGCC | UCGA | UGCA | GUAC | AGCU | UGAC | GUCA |
| UUAA | CCAA | GAAG | CUUC | UUGG | CCGG | GAUC | UGCA | GUAC | CUAG | UGAC | GUCA |
| AAUU | AACC | GAAG | CUUC | GGUU | GGCC | GAUC | UGCA | GUAC | CUAG | UGAC | GUCA |
| AAUU | ACCA | AAGG | CCUU | UGGU | CCGG | GAUC | UGCA | ACGU | AGCU | GACU | AGUC |
| UUAA | ACCA | GGAA | UUCC | UGGU | GGCC | GAUC | UGCA | ACGU | AGCU | GACU | AGUC |
| AAUU | CAAC | AAGG | CCUU | GUUG | CCGG | GAUC | CAUG | GUAC | AGCU | GACU | AGUC |
| UUAA | CAAC | GGAA | UUCC | GUUG | GGCC | GAUC | CAUG | GUAC | AGCU | GACU | AGUC |
| AAUU | ACCA | AAGG | CCUU | UGGU | CCGG | UCGA | UGCA | ACGU | CUAG | CUGA | UCAG |
| UUAA | ACCA | GGAA | UUCC | UGGU | GGCC | UCGA | UGCA | ACGU | CUAG | CUGA | UCAG |
| AAUU | CAAC | AAGG | CCUU | GUUG | CCGG | UCGA | CAUG | GUAC | CUAG | CUGA | UCAG |
| UUAA | CAAC | GGAA | UUCC | GUUG | GGCC | UCGA | CAUG | GUAC | CUAG | CUGA | UCAG |
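As an illustration of how an entry of Table 6 can be verified, the following sketch tests the first row for self-complementarity and for comma-freeness directly from Definition 4.2. It is not the authors' search program; the function names are hypothetical, and only the first row is checked here.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anti(word):
    """Reversed complement of a word over {A, C, G, U}."""
    return word.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    """Every word must have its reversed complement in the code."""
    return all(anti(w) in code for w in code)


def is_comma_free(code, size=4):
    """No code word may occur in a shifted frame of any concatenation x1 x2."""
    for x1, x2 in product(code, repeat=2):
        concatenation = x1 + x2
        for shift in range(1, size):
            if concatenation[shift:shift + size] in code:
                return False
    return True


# First row of Table 6.
row_1 = {
    "UUAA", "CCAA", "AGGA", "UCCU", "UUGG", "CCGG",
    "UCGA", "CAUG", "ACGU", "AGCU", "ACUG", "CAGU",
}

print(len(row_1), is_self_complementary(row_1), is_comma_free(row_1))
```

The same two functions can be applied to the remaining rows by replacing the set of words.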
We now aim for a graph-theoretical characterization of self-complementarity for tessera codes. Let us start with some observations on self-complementary 1-circular tessera codes:
Lemma 4.8
Let be a self-complementary 1-circular tessera code. Then it holds
Proof
Let X be a self-complementary 1-circular tessera code. Then for all
where . However, cyclically equivalent tesserae cannot be in the same 1-circular code.
The next property was discovered by examining maximal circular codes of codons (RNA triplets) (Fimmel et al. 2018). Assume that Y is a self-complementary trinucleotide code and consider the graph associated to Y. Then the following conditions are true:
for all vertices
where the out-degree of a vertex v denotes the number of outgoing edges (directed edges that start in v) and the in-degree denotes the number of ingoing edges. It was also shown in Fimmel et al. (2018) that the conditions above are not sufficient in general to ensure self-complementarity, but only for circular codes of size at least 18.
We will show next that in the case of tesserae or dinucleotides, the size of the code does not matter and that one can obtain a similar result. Let us first prove the claim for dinucleotides:
Lemma 4.9
Let X be a 1-circular dinucleotide code and consider its associated graph. X is self-complementary if and only if
for all vertices
Proof
Let X be a self-complementary dinucleotide code and let $b_1b_2 \in X$ for some nucleotides $b_1, b_2$. Due to the self-complementarity of X, the anti-dinucleotide of $b_1b_2$ is also in X, which implies both conditions (1) and (2). Conversely, assume that X is a 1-circular code. Then its associated graph can be embedded into a tournament on four vertices (compare Fimmel et al. 2017). Assume that this graph satisfies the conditions (1) and (2). The presence or absence of the self-complementary dinucleotides AU, UA, CG or GC in X does not affect either the self-complementarity of X or the conditions (1) and (2). Let us therefore focus on the non-self-complementary dinucleotides from X. Suppose without loss of generality that the dinucleotide is in the code. For conditions (1) and (2) to be met, a dinucleotide and a dinucleotide must be in the code. This can be achieved in three ways:
In this case is valid or
The condition (2) can now only be met if the dinucleotide and the code is self-complementary or
The condition (2) can now only be met if the dinucleotide and the code is self-complementary
This proves that X is self-complementary.
In the case of tesserae we should additionally consider the condition from Lemma 4.8, and we obtain a handy characterization of self-complementarity.
Theorem 4.10
Let X be a 1-circular tessera code and consider its associated graph. X is self-complementary if and only if
for all vertices
Proof
One implication is analogous to the proof of Proposition 3.1 in Fimmel et al. (2018), taking Lemma 4.8 into account. Conversely, assume that X is a 1-circular tessera code that satisfies all three conditions (1), (2), (3). It is immediately clear by direct verification that for all equivalence classes with
holds, i.e. the dinucleotide codes are self-complementary. So we can restrict ourselves to the consideration of for . Since X is a 1-circular code each of is embedded into a tournament on four nodes.
Secondly, as we can see from Table 1, two of the six tesserae represented in each tournament, except for the one corresponding to , are self-complementary:
For these are AAUU (or UUAA) and CCGG (or GGCC)
For these are ACGU (or GUAC) and UGCA (or CAUG)
For these are AGCU (or CUAG) and UCGA (or GAUC)
and for each not self-complementary tessera where its anti-tessera should be in the same component due to the fact that
The rest of the proof can now be done analogously to the proof of Lemma 4.9.
In the theorem above, the condition of 1-circularity cannot be omitted, as the following example shows:
Example 4.11
Let us consider the following tessera code
The code is obviously not 1-circular and non-self-complementary since, for instance, takes place. But all three conditions from Theorem 4.10 are fulfilled. In the picture below, the round and square nodes represent pairs of reversed-complementary dinucleotides.

We conclude this section with a second theorem that gives a graph-theoretical characterization for tessera codes that are not 1-circular using the graph component of a code X.
Theorem 4.12
Let X be a tessera code and consider its associated graph. X is self-complementary if and only if
for all vertices
Proof
Let us assume that satisfies properties (1) and (2) from Theorem 4.12. Hence, for any tessera we have that and by property (1) also . Property (2) then implies that for some basis . It is clear that has to be the complement of by the unique definition of tesserae. More precisely, assume that such that which implies that and thus . Hence = . Therefore and X is self-complementary.
Let us make a final remark: A 1-circular tessera code X represented by a tournament which is built on four dinucleotides of one of the equivalence classes (see Table 1) is self-complementary if and only if the numbers 1, 2, 3, 4 (see paragraph Construct a Tournament) are assigned to dinucleotides so that 1 is complementary to 4 and 2 is complementary to 3, i.e. . In order to see this let the order on dinucleotides be defined as described above, and
If or then it is obvious that since or and . The only remaining case is . But in this case takes place per definition of the order on dinucleotides and . The opposite direction: Let and, correspondingly, . Then . The case is analogous. In both cases X is not a self-complementary code. Here is an example.
Example 4.13
For example, let us consider the class containing the dinucleotides CU, UC, GA and AG. Then one possible self-complementary assignment would be: CU ↦ 1, UC ↦ 2, GA ↦ 3 and AG ↦ 4. The represented code {CUAG, CUUC, CUGA, UCAG, UCGA, GAAG} is self-complementary.
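The final remark can be illustrated on this class by enumerating all 24 orderings of its four dinucleotides. The sketch below assumes, as the code in the example suggests, that a tournament contributes the tessera $d_id_j$ whenever dinucleotide $d_i$ carries a smaller number than $d_j$. The function names are illustrative and not taken from the paper.

```python
from itertools import permutations

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anti(word):
    """Reversed complement of a word over {A, C, G, U}."""
    return word.translate(COMPLEMENT)[::-1]


def tournament_code(order):
    """Tesserae d_i d_j for all i < j (assumed edge direction: low to high)."""
    return {
        order[i] + order[j]
        for i in range(len(order))
        for j in range(i + 1, len(order))
    }


def is_self_complementary(code):
    """Every word must have its reversed complement in the code."""
    return all(anti(w) in code for w in code)


dinucleotides = ("CU", "UC", "GA", "AG")

# Ordering read off from the code of the example: CU=1, UC=2, GA=3, AG=4.
example_code = tournament_code(dinucleotides)

self_complementary_orders = [
    order
    for order in permutations(dinucleotides)
    if is_self_complementary(tournament_code(order))
]
# Orderings in which 1 is complementary to 4 and 2 is complementary to 3.
mirrored_orders = [
    order
    for order in permutations(dinucleotides)
    if anti(order[0]) == order[3] and anti(order[1]) == order[2]
]

print(sorted(example_code))
print(len(self_complementary_orders), self_complementary_orders == mirrored_orders)
```

For this class, the orderings that give a self-complementary code are exactly those in which positions 1 and 4 as well as positions 2 and 3 hold reversed-complementary dinucleotides.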
This shows that in the construction of all maximal circular tessera codes one can also identify and construct all maximal self-complementary circular codes.
Conclusions
In this work we have identified and characterized circular tessera codes and their properties. In Gonzalez et al. (2012) and Gonzalez et al. (2019) Gonzalez, Giannerini and Rosa had proposed an ancestor code of the universal genetic code that is based on 64 tetranucleotides built from dinucleotides by using the Klein four symmetry group. It was hypothesized that this tessera code existed before LUCA and even before the early genetic code that coded for 20 amino acids using all 64 codons. Possible primeval adaptor molecules that could decode the tessera were also modelled and it was shown that the tessera code mirrors exactly the degeneracy distribution of the mitochondrial genetic code.
We have combined the theory of tesserae with the theory of circular codes, which have been studied extensively during the last decades. Circular codes were found by an extensive statistical investigation in Arqués and Michel (1996) and seem to play an important role in the detection and correction mechanisms of the ribosome during translation. Moreover, it was hypothesized in [13] that ancestor codes of the universal genetic code might have used codons from a circular code only. Thus it was reasonable to investigate circular tessera codes, which could have existed between a primitive genetic code and the tessera code.
Our results show that circular tessera codes can be of size 12 at most, and we have given construction methods for all circular tessera codes of this size. Moreover, the numbers of circular (comma-free, self-complementary) tessera codes of every size between 1 and 12 have been calculated.
Acknowledgements
Open Access funding provided by Projekt DEAL.
Contributor Information
Elena Fimmel, Email: e.fimmel@hs-mannheim.de.
Martin Starman, Email: m.starman@live.com.
Lutz Strüngmann, Email: l.struengmann@hs-mannheim.de.
References
- Arqués DG, Michel CJ. A complementary circular code in the protein coding genes. J Theor Biol. 1996;182:45–58. doi: 10.1006/jtbi.1996.0142.
- Baranov PV, Venin M, Provan G. Codon size reduction as the origin of the triplet genetic code. PLoS ONE. 2009;4(5):e5708. doi: 10.1371/journal.pone.0005708.
- Clark J, Holton DA. A first look at graph theory. Newark: World Scientific; 1991.
- Cisowski D (2015) Tessera-based encoding of the mitochondrial genome. Bachelor-Thesis, Mannheim
- Crick F, Griffith JS, Orgel LE. Codes without commas. Proc Natl Acad Sci USA. 1957;43(5):416–21. doi: 10.1073/pnas.43.5.416.
- Fimmel E, Michel ChJ, Starman M, Strüngmann L. Self-complementary circular codes in coding theory. Theory Biosci. 2018;37(1):51–65. doi: 10.1007/s12064-018-0259-4.
- Fimmel E, Michel ChJ, Strüngmann L. Diletter circular codes over finite alphabets. Math Biosci. 2017;294:120–129. doi: 10.1016/j.mbs.2017.10.001.
- Fimmel E, Strüngmann L. Mathematical fundamentals for the noise immunity of the Genetic Code. BioSystems. 2018;164:186–198. doi: 10.1016/j.biosystems.2017.09.007.
- Fimmel E, Strüngmann L. Linear codes and the mitochondrial genetic code. BioSystems. 2019;184:103990. doi: 10.1016/j.biosystems.2019.103990.
- Fimmel E, Michel CJ, Strüngmann L. n-nucleotide circular codes in graph theory. Phil Trans A. 2016;374:20150058. doi: 10.1098/rsta.2015.0058.
- Fimmel E, Giannerini S, Gonzalez D, Strüngmann L. Circular codes, symmetries and transformations. J Math Biol. 2014;70(7):1623–44. doi: 10.1007/s00285-014-0806-7.
- Fimmel E, Giannerini S, Gonzalez D, Strüngmann L. Dinucleotide circular codes and bijective transformations. J Theor Biol. 2015;386:159–165. doi: 10.1016/j.jtbi.2015.08.034.
- Fimmel E, Michel ChJ, Pirot F, Sereni JS, Starman M, Strüngmann L (2020) The relation between k-circularity and circularity of codes, submitted
- Fimmel E, Strüngmann L. Yury Borisovich Rumer and his biological papers on the genetic code. Phil Trans R Soc A. 2016;374:20150228. doi: 10.1098/rsta.2015.0228.
- Gonzalez DL, Giannerini S, Rosa R (2012) On the origin of the mitochondrial genetic code: towards a unified mathematical framework for the management of genetic information. In: Nature precedings. 10.1038/npre.2012.7136
- Gonzalez DL, Giannerini S, Rosa R (2019) On the origin of degeneracy in the genetic code. In: Interface Focus 9: 20190038. 10.1098/rsfs.2019.0038
- Michel CJ. The maximal self-complementary trinucleotide circular code in genes of bacteria, archaea, eukaryotes, plasmids and viruses. Life. 2017;7(20):1–16. doi: 10.3390/life7020020.
- Nirenberg MW, Matthaei JH. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc Natl Acad Sci USA. 1961;47:1588–1602. doi: 10.1073/pnas.47.10.1588.
- Patel A. The triplet genetic code had a doublet predecessor. J Theor Biol. 2005;233:527–532. doi: 10.1016/j.jtbi.2004.10.029.
- Rotman JJ. An introduction to the theory of groups. Berlin: Springer; 1995.
- Seligmann H. Putative anticodons in mitochondrial tRNA sidearm loops: Pocketknife tRNAs? J Theor Biol. 2014;7(340):155–63. doi: 10.1016/j.jtbi.2013.08.030.
- Smith JM. The problems of biology. Oxford: Oxford University Press; 1986.
- Wilhelm T, Nikolajewa S. A new classification scheme of the genetic code. J Mol Evol. 2004;59(5):598–605. doi: 10.1007/s00239-004-2650-7.
- Wu HL, Bagby S, van den Elsen JM. Evolution of the genetic triplet code via two types of doublet codons. J Mol Evol. 2005;61(1):54–64. doi: 10.1007/s00239-004-0224-3.
