Basic principles of the genetic code extension

Paweł Błażej; Małgorzata Wnetrzak; Dorota Mackiewicz; Paweł Mackiewicz

doi:10.1098/rsos.191384

2020 Feb 5;7(2):191384.

PMCID: PMC7062095 PMID: 32257313

Abstract

Compounds including non-canonical amino acids (ncAAs) or other artificially designed molecules can find a lot of applications in medicine, industry and biotechnology. They can be produced thanks to the modification or extension of the standard genetic code (SGC). Such peptides or proteins including the ncAAs can be constantly delivered in a stable way by organisms with the customized genetic code. Among several methods of engineering the code, using non-canonical base pairs is especially promising, because it enables generating many new codons, which can be used to encode any new amino acid. Since even one pair of new bases can extend the SGC up to 216 codons generated by a six-letter nucleotide alphabet, the extension of the SGC can be achieved in many ways. Here, we proposed a stepwise procedure of the SGC extension with one pair of non-canonical bases to minimize the consequences of point mutations. We reported relationships between codons in the framework of graph theory. All 216 codons were represented as nodes of the graph, whereas its edges were induced by all possible single nucleotide mutations occurring between codons. Therefore, every set of canonical and newly added codons induces a specific subgraph. We characterized the properties of the induced subgraphs generated by selected sets of codons. Thanks to that, we were able to describe a procedure for incremental addition of the set of meaningful codons up to the full coding system consisting of three pairs of bases. The procedure of gradual extension of the SGC makes the whole system robust to changing genetic information due to mutations and is compatible with the views assuming that codons and amino acids were added successively to the primordial SGC, which evolved minimizing harmful consequences of mutations or mistranslations of encoded proteins.

Keywords: genetic code, codon, amino acid, mutation

1. Introduction

The basic diversity of proteins fulfilling a wide range of functions within organisms is based on 20 naturally occurring amino acids. The proteins are also modified post-translationally, which extends their properties. However, it is tempting to increase this variety with artificially designed amino acids or other molecules. They can be introduced directly into proteins or modified in a given proteinaceous molecule, but a more universal and stable solution is such modification of the standard genetic code (SGC) that the newly created proteins including non-canonical amino acids (ncAAs) are constantly produced by a given organism. Several approaches were invented to achieve this goal [1].

The first approach uses stop translation codons (e.g. rarely used UAG) to encode ncAAs [2–5]. This method requires a modified aminoacyl-tRNA synthetase which charges a tRNA with an ncAA. This suppressor tRNA must recognize the stop codon and then ncAA is incorporated into a protein during its synthesis. However, this method enables utilization of up to two stop codons because one of the three codons must be left as a termination signal of translation [6].

Another method applies quadruplet codons, which consist of an infrequently used triplet codon with an additional base [7–9]. Such a quadruplet is decoded by a modified tRNA containing a complementary quadruplet anticodon. Then, ncAA associated with this tRNA is added into a newly synthesized protein due to frameshifted open reading frame. However, the typical triplet can be decoded by a typical tRNA competitively, which decreases the efficiency of this procedure.

It is also possible to assign various sense codons to different ncAAs by withdrawing the cognate amino acid and aminoacyl-tRNA synthetase, and adding pre-charged ncAA-tRNAs bearing the corresponding anticodons [10–12]. This method, however, sacrifices a natural amino acid. A new method overcomes this problem and frees sense codons for ncAAs without elimination of natural ones [13]. This is achieved by utilization of appropriate synonymous codons, depletion of their corresponding tRNAs and addition of tRNAs pre-charged with ncAAs. This method enables expanding the repertoire to 23 potential ncAAs via division of multiple codon boxes [14] but can influence the efficiency and speed of translation as well as protein folding due to altered codon usage [15].

A weakness of these methods is that they rely on the set of four canonical bases, which can generate a limited set of codons, up to 64. Therefore, a promising approach is using unnatural base pairs, which can generate a much larger number of genuinely new codons. This approach does not interfere with the natural system because it does not involve the canonical codons, while the new ones are free of any natural role. Such experiments with at least three pairs of the fifth and the sixth nucleotide were already carried out and appeared promising [16–22]. Protein synthesis using this approach occurred successfully in semi-synthetic bacteria [23].

The inclusion of one pair of unnatural nucleotides can extend the SGC even up to 216 codons, which is more than three times the size of the set of 64 canonical codons. The 152 new unassigned codons raise an exciting possibility of adding many unnatural amino acids or similar compounds and creating a new extended genetic code (EGC). Therefore, it is reasonable to pose a question about the rules according to which we can extend the code. There are many possibilities to do this. Here, we propose a way assuming that the genetic code should be a system resistant to point mutations, which can change the encoded information. In other words, we present a formal description of the genetic code expansion that minimizes the cost of changing codons due to mutations. The presented procedure of incremental expansion of the genetic code ensures robustness of the extended code against losing genetic information. This assumption seems attractive in the context of the hypothesis postulating that the genetic code evolved to minimize harmful consequences of mutations or mistranslations of coded proteins [24–33].

2. Methods

2.1. The extension of the standard genetic code

We start our investigation by applying a similar approach to that presented by [34], in which the SGC is described as a graph G(V₀, E₀), where V₀ is the set of vertices (nodes), whereas E₀ is the set of edges. V₀ corresponds to the set of 64 canonical codons using four natural nucleotides {A, T, G, C}, while the edges are induced by all possible single nucleotide substitutions between the codons. Therefore, the graph G(V₀, E₀) is a representation of all possible single-point mutations occurring between canonical codons.

In this work, we introduce a more general graph G(V, E), in which the set of vertices corresponds to 216 codons, using a six-letter alphabet, while the set of edges is defined in a similar way as E₀.

Definition 2.1. —

Let G(V, E) be a graph in which V is the set of vertices representing all possible 216 codons, whereas E is the set of edges connecting these vertices. All connections between the nodes fulfil the property that two nodes, i.e. codons u, v ∈ V, are connected by the edge e(u, v) ∈ E (u ∼ v), if and only if the codon u differs from the codon v in exactly one position.

In order to simplify our notation, we use further G instead of G(V, E). It is clear that the set of edges E of the graph G represents all possible single nucleotide substitutions, which occur between codons created by the set of natural nucleotides {A, T, G, C} as well as one pair of unnatural nucleotides {X, Y}. Assuming that all changes are equally probable, we obtain that G is an undirected, unweighted and regular graph with the vertices degree equal to 15. Moreover, the set of 64 canonical codons V₀ used in the SGC is a subset of V. Therefore, V₀ induces a subgraph G[V₀] of the graph G(V, E) according to the following definition.

Definition 2.2. —

If G(V, E) is a graph, and S ⊂ V is a subset of vertices of G, then the induced subgraph G[S] is the graph whose set of vertices is S and whose set of edges consists of the edges in E, which have both endpoints in S.

Following this definition, let us denote by V_n a subset of vertices (codons) involved in a given EGC with exactly n ≥ 1 non-canonical codons. This subset must fulfil the following property:

V_{0} \subset V_{n} \subseteq V,

i.e. V_n must be an extension of the set of canonical codons. As a result, we can define a graph G[V_n], which is a subgraph of the graph G generated by V_n. Therefore, the main goal of this work is to test the property of the graph G[V_n], which can be interpreted as a structural representation of the EGC. Thus, we develop methodology to describe features of the graph G.
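
The construction in definitions 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `neighbours`, `induced_subgraph`, `G` and `G_V0` are assumptions introduced here, and the graph is stored as a dictionary of adjacency sets rather than in any form used by the authors.

```python
from itertools import product

ALPHABET = "ATGCXY"        # four canonical bases plus one unnatural pair X, Y
CANONICAL = set("ATGC")


def neighbours(codon):
    """Return all codons that differ from `codon` in exactly one position."""
    return {
        codon[:pos] + base + codon[pos + 1:]
        for pos in range(3)
        for base in ALPHABET
        if base != codon[pos]
    }


def induced_subgraph(graph, subset):
    """Keep only the vertices in `subset` and the edges with both endpoints in it."""
    subset = set(subset)
    return {v: graph[v] & subset for v in subset}


V = ["".join(c) for c in product(ALPHABET, repeat=3)]
G = {codon: neighbours(codon) for codon in V}            # adjacency sets of G(V, E)
V0 = {codon for codon in V if set(codon) <= CANONICAL}   # the 64 canonical codons
G_V0 = induced_subgraph(G, V0)

print(len(G), {len(nb) for nb in G.values()})        # 216 {15}
print(len(G_V0), {len(nb) for nb in G_V0.values()})  # 64 {9}
```

Under these assumptions every vertex of G has degree 15 (three positions, five alternative bases each), while each canonical codon keeps nine neighbours inside G[V₀].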

2.2. The properties of the graph G

Interesting features of G appear when the set of vertices V is partitioned into eight disjoint and non-empty sets. This partition, which includes V₀, i.e. the set of natural codons, induces a specific pattern of edge connections between the vertices.

Proposition 2.3. —

Let G(V, E) be a graph, where V represents the set of all possible 216 codons and E is the set of edges generated by single nucleotide substitutions. Then, the set of vertices V can be split unambiguously into eight disjoint subsets. These are V₀, B₁, B₂, B₃, B₁₂, B₁₃, B₂₃ and B₁₂₃, where

(a)
V₀ is the set of 64 canonical codons;

(b)
B₁ is the set of codons in which new nucleotides X or Y occur only in the first codon position;

(c)
B₂ is the set of codons in which new nucleotides X or Y occur only in the second codon position;

(d)
B₃ is the set of codons in which new nucleotides X or Y occur only in the third codon position;

(e)
B₁₂ is the set of codons in which new nucleotides X or Y occur in the first and the second codon position;

(f)
B₁₃ is the set of codons in which new nucleotides X or Y occur in the first and the third codon position;

(g)
B₂₃ is the set of codons in which new nucleotides X or Y occur in the second and the third codon position;

(h)
B₁₂₃ is the set of codons in which new nucleotides X or Y occur in all codon positions.

The numbers of elements, i.e. codons, in these sets are: |B₁| = |B₂| = |B₃| = 32, |B₁₂| = |B₂₃| = |B₁₃| = 16 and |B₁₂₃| = 8.

A graphical representation of the relationships between these sets is presented in figure 1.
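
The partition of proposition 2.3 can be reproduced by labelling each codon with the positions that carry X or Y. The following is a minimal sketch; the helper `block_of` and the string labels such as `"B12"` are names assumed here for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ATGCXY"
UNNATURAL = set("XY")


def block_of(codon):
    """Return 'V0', or 'B' followed by the 1-based positions that hold X or Y."""
    positions = "".join(
        str(i) for i, base in enumerate(codon, start=1) if base in UNNATURAL
    )
    return "B" + positions if positions else "V0"


sizes = Counter(block_of("".join(c)) for c in product(ALPHABET, repeat=3))
for name in ("V0", "B1", "B2", "B3", "B12", "B13", "B23", "B123"):
    print(name, sizes[name])
```

Because every codon receives exactly one label, the eight classes are disjoint and their sizes sum to 216.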

Based on such a partition, we can investigate properties of the EGC. In order to do this, let us introduce the following notation. We define further subsets of V, beginning with

$C^{1} = V_{0} \cup B_{1} \cup B_{2} \cup B_{3}.$ 2.1

The sets C¹ and $B_{12} \cup B_{13} \cup B_{23} \cup B_{123}$ are disjoint and constitute also a partition of V. We call the set $B_{1} \cup B_{2} \cup B_{3}$ ‘close neighbourhood’ of V₀ because it contains all codons that differ from the set V₀ in at most one position in a codon. In contrast to that, $B_{12} \cup B_{13} \cup B_{23} \cup B_{123}$ is not directly connected with V₀. Moreover, we introduce also the set C², defined as follows:

$C^{2} = C^{1} \cup B_{12} \cup B_{23} \cup B_{13} .$ 2.2

It is clear that C² and B₁₂₃ are disjoint and also constitute a partition of V.

In the next proposition, we give several properties of edge connections between the selected sets of nodes.

Proposition 2.4. —

Let us consider the codon sets introduced in proposition 2.3 and two subsets of nodes C¹, C². Then we have the following properties:

(a)
Each codon c ∈ B_i, i = 1, 2, 3 has exactly four edges crossing from B_i to V₀;

(b)
Each codon c ∈ B_i, i = 1, 2, 3 has exactly four edges crossing from B_i to $B_{12} \cup B_{13} \cup B_{23} \cup B_{123};$

(c)
There does not exist any connection between $B_{12} \cup B_{13} \cup B_{23} \cup B_{123}$ and V₀;

(d)
Each codon c ∈ B_ij, i ≠ j, i, j = 1, 2, 3 has exactly eight edges crossing to C¹;

(e)
Each codon c ∈ B_ij, i ≠ j, i, j = 1, 2, 3 has exactly two edges crossing from B_ij to B₁₂₃;

(f)
There does not exist any connection between B_ij and V₀;

(g)
There does not exist any connection between B₁₂₃ and C¹.
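
The edge counts listed in proposition 2.4 can be checked by exhaustive enumeration. The sketch below is an assumption-laden illustration, not the authors' code: blocks are labelled by the positions holding X or Y (with `"0"` standing for V₀), and the dictionary `results` simply records whether each item (a) to (g) holds for every codon.

```python
from collections import Counter
from itertools import product

ALPHABET = "ATGCXY"
UNNATURAL = set("XY")
SINGLE, DOUBLE = {"1", "2", "3"}, {"12", "13", "23"}


def block_of(codon):
    """'0' for V0, otherwise the 1-based positions that hold X or Y, e.g. '13'."""
    return "".join(str(i) for i, b in enumerate(codon, start=1) if b in UNNATURAL) or "0"


def neighbour_blocks(codon):
    """Count the 15 neighbours of `codon` by the block they belong to."""
    return Counter(
        block_of(codon[:pos] + base + codon[pos + 1:])
        for pos in range(3)
        for base in ALPHABET
        if base != codon[pos]
    )


def total(counts, blocks):
    return sum(counts[b] for b in blocks)


results = {item: True for item in "abcdefg"}
for codon in ("".join(c) for c in product(ALPHABET, repeat=3)):
    block, nb = block_of(codon), neighbour_blocks(codon)
    if block in SINGLE:
        results["a"] &= nb["0"] == 4                           # four edges to V0
        results["b"] &= total(nb, DOUBLE | {"123"}) == 4       # four edges to B12..B123
    elif block in DOUBLE:
        results["c"] &= nb["0"] == 0
        results["d"] &= total(nb, SINGLE | {"0"}) == 8         # eight edges to C1
        results["e"] &= nb["123"] == 2                         # two edges to B123
        results["f"] &= nb["0"] == 0
    elif block == "123":
        results["c"] &= nb["0"] == 0
        results["g"] &= total(nb, SINGLE | {"0"}) == 0         # no edge to C1
print(results)
```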

It is also interesting to describe some properties of subgraphs generated by codon sets B₁, B₂, B₃ and B₁₂, B₁₃, B₂₃, respectively. They are formulated in the following two lemmas:

Lemma 2.5. —

Graphs G[B₁], G[B₂] and G[B₃] are isomorphic to each other.

Proof. —

According to the definition of the graph isomorphism, there must exist a bijection f between G[B_i] and G[B_j], i ≠ j, i.e. f : G[B_i] → G[B_j] such that two vertices u, v are adjacent in G[B_i], if and only if f(u) and f(v) are adjacent in G[B_j]. In this case, such a bijection can be easily defined as a swap between respective codon positions, where nucleotides X and Y occur. ▪

We observe the same property in the case of codon sets B₁₂, B₁₃ and B₂₃. Thus, we can formulate a similar lemma:

Lemma 2.6. —

Graphs G[B₁₂], G[B₁₃] and G[B₂₃] are isomorphic to each other.

Proof. —

The proof is analogous to the proof of lemma 2.5. ▪

What is more, in the construction of the optimal EGC, we also use the fact that the graphs G[B_n], n ∈ {1, 2, 3, 12, 13, 23, 123}, can be represented as Cartesian products of other graphs. This important feature is presented in the following three propositions.

Proposition 2.7. —

The graph G[B₁] can be represented as a Cartesian product of graphs:

$G [B_{1}] = K_{2} \square K_{4} \square K_{4},$

where K₂ and K₄ are complete graphs of sizes two and four with the set of vertices {X, Y} and {A, T, G, C}, respectively. In this case, two vertices (x, y, z), (x′, y′, z′) are connected by the edge e((x, y, z), (x′, y′, z′)), if (x = x′ and y = y′ and z ∼ z′ ) or (x = x′ and y ∼ y′ and z = z′) or (x ∼ x′ and y = y′ and z = z′).

Proposition 2.8. —

The graph G[B₁₂] can be represented as a Cartesian product of graphs

$G [B_{12}] = K_{2} \square K_{2} \square K_{4},$

where K₂ and K₄ are complete graphs of sizes two and four with the set of vertices {X, Y} and {A, T, G, C}, respectively. In this case, two vertices (x, y, z), (x′, y′, z′) are connected by the edge e((x, y, z), (x′, y′, z′)), if (x = x′ and y = y′ and z ∼ z′ ) or (x = x′ and y ∼ y′ and z = z′) or (x ∼ x′ and y = y′ and z = z′).

Proposition 2.9. —

The graph G[B₁₂₃] can be represented as a Cartesian product of graphs

$G [B_{123}] = K_{2} \square K_{2} \square K_{2},$

where K₂ is a complete graph of size two with the set of vertices {X, Y}. In this case, two vertices (x, y, z), (x′, y′, z′) are connected by the edge e((x, y, z), (x′, y′, z′)), if (x = x′ and y = y′ and z ∼ z′) or (x = x′ and y ∼ y′ and z = z′) or (x ∼ x′ and y = y′ and z = z′).
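
Propositions 2.7 to 2.9 can be checked numerically by building the Cartesian product from its factors and comparing its edge set with the induced subgraph of G. The sketch below makes that comparison for B₁, B₁₂ and B₁₂₃; the function names are assumptions introduced here, and the identity map on codon strings plays the role of the isomorphism.

```python
from itertools import combinations, product


def complete_graph(vertices):
    """Adjacency dict of the complete graph on `vertices`."""
    return {v: set(vertices) - {v} for v in vertices}


def cartesian_product_edges(*factors):
    """Edge set of the Cartesian product: move along one factor, keep the others fixed."""
    edges = set()
    for u in product(*factors):
        for i, factor in enumerate(factors):
            for w in factor[u[i]]:
                v = u[:i] + (w,) + u[i + 1:]
                edges.add(frozenset(("".join(u), "".join(v))))
    return edges


def induced_edges(codons):
    """Edges of G between the given codons (exactly one position differs)."""
    return {
        frozenset((u, v))
        for u, v in combinations(codons, 2)
        if sum(a != b for a, b in zip(u, v)) == 1
    }


K2, K4 = complete_graph("XY"), complete_graph("ATGC")
cases = {"B1": (K2, K4, K4), "B12": (K2, K2, K4), "B123": (K2, K2, K2)}
for name, factors in cases.items():
    codons = ["".join(c) for c in product(*factors)]
    same = cartesian_product_edges(*factors) == induced_edges(codons)
    print(name, len(codons), len(induced_edges(codons)), same)
# B1 32 112 True
# B12 16 40 True
# B123 8 12 True
```

The edge counts agree with the degrees 7, 5 and 3 of the three induced subgraphs.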

2.3. The optimality of codon group

Similarly to [34], we introduce two measures describing properties of codon groups. They are the set conductance and the k-size conductance, which characterize the quality of given codon sets in terms of non-synonymous mutations, i.e. those that lead to a replacement of one amino acid by another.

Definition 2.10. —

For a given graph G, let S be a subset of V. The conductance of S is defined as

$ϕ (S) = \frac{E (S, \bar{S})}{\mathrm{vol} (S)},$

where $E (S, \bar{S})$ is the number of edges of G crossing from S to its complement $\bar{S}$ and vol(S) is the sum of all degrees of the vertices belonging to S.

The measure ϕ(S) can be interpreted as a fraction of non-synonymous substitutions between S and $\bar{S}$ , if S is a group of codons encoding the same amino acid and $\bar{S}$ includes codons bearing other genetic information. It is interesting that the optimal codon group, in terms of its robustness to point mutations, should be characterized by low values of the set conductance. Therefore, the number of nucleotide substitutions that change a coded amino acid should be relatively small in comparison to the total number of all possible nucleotide mutations involving all codons belonging to the given set. In this context, it is also interesting to calculate the k-size-conductance ϕ_k(G), which is described as the minimal set conductance over all subsets of V with the fixed size k.

Definition 2.11. —

The k-size-conductance of the graph G, for k ≥ 1, is defined as:

$ϕ_{k} (G) = \min_{S \subseteq V, | S | = k} ϕ (S) .$

In consequence, k · ϕ_k(G) gives us a lower bound on the number of edges leaving any set of k nodes (since vol(S) ≥ k whenever no vertex of S is isolated), and this characteristic is useful in describing the optimal codon structures.
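
Definitions 2.10 and 2.11 translate directly into code. The sketch below is illustrative only: `conductance` and `k_size_conductance` are names assumed here, exact fractions are used to avoid rounding, and the brute-force minimum is feasible only for small graphs such as G[B₁₂₃].

```python
from fractions import Fraction
from itertools import combinations, product


def build_graph(vertices):
    """Connect two codons when they differ in exactly one position."""
    vertices = list(vertices)
    return {
        u: {v for v in vertices if sum(a != b for a, b in zip(u, v)) == 1}
        for u in vertices
    }


def conductance(graph, subset):
    """phi(S) = E(S, complement of S) / vol(S), with degrees taken in `graph`."""
    subset = set(subset)
    cut = sum(len(graph[v] - subset) for v in subset)
    vol = sum(len(graph[v]) for v in subset)
    return Fraction(cut, vol)


def k_size_conductance(graph, k):
    """Brute-force minimum of phi(S) over all subsets of size k (small graphs only)."""
    return min(conductance(graph, s) for s in combinations(graph, k))


G = build_graph("".join(c) for c in product("ATGCXY", repeat=3))
V0 = ["".join(c) for c in product("ATGC", repeat=3)]
print(conductance(G, V0))   # 2/5

cube = build_graph("".join(c) for c in product("XY", repeat=3))   # G[B123]
print([str(k_size_conductance(cube, k)) for k in range(1, 9)])
```

For the canonical codons the cut contains 384 of the 960 edge endpoints counted by vol(V₀), which gives the value 2/5.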

3. Results

In this section, we present a step by step procedure which allows us to extend the SGC from 64 up to 216 meaningful codons. Codons are added to the code gradually in three stages. The first step extends the SGC to 160 codons, the second step to 208 codons and the third to all possible 216 codons. The EGC created at each stage must be optimal in terms of minimization of point mutations.

3.1. The optimal extension of the standard genetic code to 160 meaningful codons

Following the properties of the graph G, we formulate some characteristics, which are useful in describing the properties of the subgraph G[V_n], 1 ≤ n ≤ 96 induced by the set of codons V_n and at the same time in developing the optimal EGC. At the beginning, we propose some optimization criteria in order to find the best possible solution.

Using the notation from the previous sections, let us define

\bar{V_{n}} = V ∖ V_{n},

which is a set of unassigned codons. Moreover, let us denote by A_n a set of n new codons involved in a given genetic code extension

A_{n} = V_{n} ∖ V_{0},

where 1 ≤ |A_n| ≤ 96. Thanks to that, we can define two measures describing the properties of G[V_n]. They are

$E (V_{0}, A_{n})$ 3.1

and

$E (V_{n}, \bar{V_{n}}),$ 3.2

where E(V₀, A_n) is the total number of edges, extracted from the graph G, crossing from the set of canonical codons V₀ to A_n, whereas $E (V_{n}, \bar{V_{n}})$ is the total number of edges crossing from the set of codons which constitute the EGC V_n to unassigned codons.

Interestingly, by applying (3.1) and (3.2), it is possible to characterize the properties of a given subgraph G[V_n] and at the same time the EGC induced by the codons belonging to V_n. In the definition below, we give some conditions which constitute the EGC optimality. Thanks to that, we can find the best genetic code extended by 1 ≤ n ≤ 96 new codons.

Definition 3.1. —

The set $V_{n}^{*},$ $V_{0} \subset V_{n}^{*}$ with exactly 1 ≤ n ≤ 96 non-canonical codons is an optimal extension of SGC, if

$V_{n}^{*} = \underset{{V_{n} : V_{n} = V_{0} \cup A_{n}}}{\arg \min} E (V_{n}, \bar{V_{n}}),$ 3.3

where A_n possesses the feature

$A_{n} = \underset{{S : S \subseteq \bar{V_{0}}, | S | = n}}{\arg \max} E (V_{0}, S) .$ 3.4

These two restrictions have a sensible interpretation. By minimizing $E (V_{n}, \bar{V_{n}})$ in (3.3), we reduce the possibility that a point mutation can generate a codon belonging to the ‘non-coding zone’ $\bar{V_{n}}$, i.e. the set of unassigned codons. On the other hand, by choosing A_n according to (3.4), we ensure that the number of connections E(V₀, A_n) between the canonical and the newly assigned codons is as large as possible (figure 2).

Figure 2. — Examples of the optimal set of four new codons A₄ involved in a genetic code extension up to 160 codons.
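
The two quantities (3.1) and (3.2) can be evaluated for any candidate set of new codons. The sketch below compares two illustrative four-codon sets chosen here as assumptions (they are not necessarily the sets drawn in figure 2): one taken from B₁ and one taken from B₁₂. The helper names are hypothetical.

```python
from itertools import product

ALPHABET = "ATGCXY"
V = {"".join(c) for c in product(ALPHABET, repeat=3)}
V0 = {"".join(c) for c in product("ATGC", repeat=3)}


def edges_between(S, T):
    """Number of edges of G with one endpoint in S and the other in T (S, T disjoint)."""
    T = set(T)
    return sum(
        codon[:pos] + base + codon[pos + 1:] in T
        for codon in S
        for pos in range(3)
        for base in ALPHABET
        if base != codon[pos]
    )


def criteria(A):
    """Return (E(V0, A_n), E(V_n, complement of V_n)) for V_n = V0 united with A_n."""
    Vn = V0 | set(A)
    return edges_between(V0, A), edges_between(Vn, V - Vn)


print(criteria({"XAA", "XAT", "XAG", "XAC"}))   # (16, 400)
print(criteria({"XXA", "XXT", "XXG", "XXC"}))   # (0, 432)
```

The first set is fully connected to V₀ and leaves fewer edges pointing at unassigned codons, whereas the second set has no edge to V₀ at all.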

These two assumptions maximize the number of connections between standard and newly incorporated codons and simultaneously decrease the probability of losing genetic information from the whole system due to point mutations. Therefore, we focus on the V_n sets, when $A_{n} = V_{n} ∖ V_{0}$ fulfils the property (3.4). Then, let us denote by

$\mathcal{V}_{n} = \{ V_{n} : V_{n} = V_{0} \cup A_{n} \}$

a class of all sets V_n with exactly n non-canonical codons and let us assume that $A_{n} = V_{n} ∖ V_{0}$ fulfils the property (3.4). It is clear that all optimal EGCs, in terms of (3.4) and (3.3), belong to $\mathcal{V}_{n}$.

These features appear to be very useful for characterizing possible extensions of the SGC. In the next theorem, we describe the optimal extension of the SGC up to 160 meaningful codons. Interestingly, this extension can be described in terms of k-size conductance ϕ_k(G[B_i]), i = 1, 2, 3 calculated for induced subgraphs G[B_i], i = 1, 2, 3. We begin our investigation with a lemma, which gives us some characterizations of the optimal sets V_n.

Lemma 3.2. —

Let $V_{n} \in \mathcal{V}_{n}$ be a set of codons, where 1 ≤ n ≤ 96 and $A_{n} = V_{n} ∖ V_{0},$ then

$A_{n} \subseteq B_{1} \cup B_{2} \cup B_{3} .$

Proof. —

The proof of this lemma follows directly from proposition 2.4(a,c) and the definition of $\mathcal{V}_{n}$. ▪

Thanks to that, we can formulate a theorem, which gives us a lower bound on the number of edges crossing from V_n to its complement.

Theorem 3.3. —

Let $V_{n} \in \mathcal{V}_{n}$ be a set of codons, where 1 ≤ n ≤ 96. Then the following inequality holds:

$E (V_{n}, \bar{V_{n}}) \geq E (V_{0}, \bar{V_{0}}) + \sum_{i = 1}^{3} n_{i} \cdot ϕ_{n_{i}} (G [B_{i}]) = 384 + \sum_{i = 1}^{3} n_{i} \cdot ϕ_{n_{i}} (G [B_{i}]),$

where G[B_i], i = 1, 2, 3 is the induced subgraph of G, and $n_{i} = | A_{n} \cap B_{i} |,$ n₁ + n₂ + n₃ = n.

Proof. —

We begin the proof with an observation

$E (V_{n}, \bar{V_{n}}) = E (V_{0}, \bar{V_{0}}) - E (V_{0}, A_{n}) + E (A_{n}, \bar{V_{n}}) .$ 3.5

Interestingly, following the definition 2.10, we can calculate the set conductance of V₀. In this case, we have ϕ(V₀) = 0.4. Hereby, we get immediately

$E (V_{0}, \bar{V_{0}}) = 64 \cdot 15 \cdot ϕ (V_{0}) = 384.$ 3.6

In addition, since $A_{n} \subseteq B_{1} \cup B_{2} \cup B_{3}$ by lemma 3.2, proposition 2.4(a) gives the following equality:

$E (V_{0}, A_{n}) = 4 n .$

Therefore, we can rewrite the equality (3.5) in the following way:

$E (V_{n}, \bar{V_{n}}) = 384 - 4 n + E (A_{n}, \bar{V_{n}}) .$ 3.7

In our next step, we observe

$E (A_{n}, \bar{V_{n}}) = \sum_{i = 1}^{3} E (A_{n} \cap B_{i}, \bar{V_{n}} \cap B_{i}) + \sum_{i = 1}^{3} E (A_{n} \cap B_{i}, B_{12} \cup B_{13} \cup B_{23} \cup B_{123}),$

where $\sum_{i = 1}^{3} E (A_{n} \cap B_{i}, B_{12} \cup B_{13} \cup B_{23} \cup B_{123}) = 4 n$ according to proposition 2.4(b). As a consequence, we can reformulate equation (3.7) as follows:

$E (V_{n}, \bar{V_{n}}) = 384 - 4 n + \sum_{i = 1}^{3} E (A_{n} \cap B_{i}, \bar{V_{n}} \cap B_{i}) + 4 n .$

Furthermore, taking into account that the set $\bar{V_{n}} \cap B_{i} = B_{i} ∖ (A_{n} \cap B_{i})$ and using definitions 2.10 and 2.11, we have

$\begin{aligned} \sum_{i = 1}^{3} E (A_{n} \cap B_{i}, \bar{V_{n}} \cap B_{i}) & \geq \sum_{i = 1}^{3} \min_{S \subseteq B_{i}, | S | = n_{i}} \frac{E (S, \bar{S})}{\mathrm{vol} (S)} \cdot \mathrm{vol} (S) \\ & \geq \sum_{i = 1}^{3} ϕ_{n_{i}} (G [B_{i}]) \cdot n_{i} . \end{aligned}$

Finally, we obtain

$E (V_{n}, \bar{V_{n}}) \geq 384 + \sum_{i = 1}^{3} n_{i} \cdot ϕ_{n_{i}} (G [B_{i}]) .$ 3.8

▪

Therefore, to extend the SGC using 1 ≤ n ≤ 96 codons in the optimal way according to the definition 3.1, we have to choose codons only from the sets B₁, B₂ and B₃. Interestingly, the lower bound on the value of $E (V_{n}, \bar{V_{n}})$ presented in this theorem depends on the n-size conductance of new codon groups. What is more, the EGC being optimal in terms of the definition 3.1 and including 160 codons is described by the set C¹ because in this case we get

$E (C^{1}, \bar{C^{1}}) = E (V_{0}, \bar{V_{0}}) = 384.$ 3.9
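
The decomposition used in the proof of theorem 3.3 can be checked numerically on arbitrary sets A_n ⊆ B₁ ∪ B₂ ∪ B₃. The sketch below draws a few random sets (with a fixed seed, an arbitrary choice made here) and compares the directly counted cut with the right-hand side of (3.7) and with 384 plus the edges that stay inside each B_i. All helper names are assumptions.

```python
import random
from itertools import product

ALPHABET, UNNATURAL = "ATGCXY", set("XY")
V = {"".join(c) for c in product(ALPHABET, repeat=3)}
V0 = {c for c in V if not UNNATURAL & set(c)}
# B[0], B[1], B[2] stand for B1, B2, B3: X or Y only in that single position
B = {
    i: {c for c in V if [b in UNNATURAL for b in c] == [j == i for j in range(3)]}
    for i in range(3)
}
C1 = V0 | B[0] | B[1] | B[2]


def edges_between(S, T):
    """Number of edges of G with one endpoint in S and the other in T (S, T disjoint)."""
    T = set(T)
    return sum(
        c[:p] + b + c[p + 1:] in T
        for c in S for p in range(3) for b in ALPHABET if b != c[p]
    )


def check(A):
    """Return the cut of V_n, the right-hand side of (3.7), and 384 + in-block edges."""
    n, Vn = len(A), V0 | A
    cut = edges_between(Vn, V - Vn)
    by_3_7 = 384 - 4 * n + edges_between(A, V - Vn)
    in_blocks = sum(edges_between(A & B[i], B[i] - A) for i in range(3))
    return cut, by_3_7, 384 + in_blocks


rng = random.Random(0)
samples = [set(rng.sample(sorted(C1 - V0), n)) for n in (1, 4, 17, 50, 96)]
for A in samples:
    print(len(A), check(A))
print(edges_between(C1, V - C1))   # 384
```

For every sampled set the three numbers coincide, and the full set C¹ returns to the value 384 stated in (3.9).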

3.2. The properties of the optimal genetic code including up to 160 meaningful codons

We pose a question about the properties of the optimal codon set for which the lower bound

E (V_{n}, \bar{V_{n}}) = 384 + \sum_{i = 1}^{3} n_{i} \cdot ϕ_{n_{i}} (G [B_{i}])

is attained under the additional restriction 1 ≤ n₁ + n₂ + n₃ ≤ 96, where n_i, i = 1, 2, 3 is the number of new codons introduced into EGC and belonging to B_i, i = 1, 2, 3, respectively. Moreover, it is also interesting to find the best possible genetic code extension for every 1 ≤ n ≤ 96.

We begin our consideration by presenting some features of induced graphs G[B_i], i = 1, 2, 3. These properties allow us to describe the optimal codon group in terms of ϕ_k(G[B_i]). Following lemma 2.5, we get that G[B_i], i = 1, 2, 3 are isomorphic to each other; hence it is enough to consider the properties of the graph G[B₁] (figure 3) because all potential code structures and also their properties can be transferred unambiguously from B₁ to B₂ and B₃. Since the graph G[B₁] has a representation as a Cartesian product of graphs (see proposition 2.7), in the light of theorem 2.3 from [35], we get that the collection of the first n vertices of G[B₁] taken in the lexicographic order is characterized by the set conductance values, which are optimal in terms of k-size conductance. Therefore, for every V_n, we can find a lower bound, i.e. an EGC which is composed of the subsets of lexicographically ordered codons belonging to $B_{1} \cup B_{2} \cup B_{3}$.

Figure 3. — A graphical representation of the graph G[B₁], which is an induced subgraph of the graph G(V, E). Each node is a codon belonging to the set B₁, B₁ ⊂ V, whereas its edges are taken from the set E.

In table 1, we present the list of all G[B₁] nodes taken in the selected lexicographic order. What is more, we also evaluate all possible k-size conductance values for the respective sets. Using these results, we can propose a method for finding the best possible genetic code extension in the class $\mathcal{V}_{n}, 1 \leq n \leq 96$. Let us start with the following observation: if n₁, n₂ and n₃ defined in theorem 3.3 fulfil the condition n₁ + n₂ + n₃ ≤ 32, then we get the following inequality:

$3 \cdot \min (ϕ_{n_{1}} (G [B_{i}]), ϕ_{n_{2}} (G [B_{i}]), ϕ_{n_{3}} (G [B_{i}])) \geq ϕ_{n_{1} + n_{2} + n_{3}} (G [B_{i}]) .$

This formula results from the fact that the calculated values of ϕ_n(G[B₁]) decrease, in general, with the size of codon groups n (table 1). Therefore, to create the optimal genetic code extension $V_{n}^{*}$ , it is enough to choose new codons from the set B_i until the total number of codons n exceeds 32. Then, this procedure should be continued and additional codons from the next B_i-type set should be selected until the total number of codons reaches 64.
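
One possible reading of this staged procedure is sketched below: the codons of B₁ are taken in lexicographic order, then those of B₂, then those of B₃. The orders for B₂ and B₃ are obtained here by swapping codon positions, which is an assumption based on the isomorphism of lemma 2.5; `STAGED` and `cut_of_extension` are hypothetical names.

```python
from itertools import product

ALPHABET = "ATGCXY"
V = {"".join(c) for c in product(ALPHABET, repeat=3)}
V0 = {"".join(c) for c in product("ATGC", repeat=3)}

B1_ORDER = ["".join(c) for c in product("XY", "ATGC", "ATGC")]   # X < Y, A < T < G < C
B2_ORDER = [c[1] + c[0] + c[2] for c in B1_ORDER]                # swap positions 1 and 2
B3_ORDER = [c[2] + c[1] + c[0] for c in B1_ORDER]                # swap positions 1 and 3
STAGED = B1_ORDER + B2_ORDER + B3_ORDER


def cut_of_extension(n):
    """E(V_n, complement of V_n) when the first n staged codons are added to V0."""
    Vn = V0 | set(STAGED[:n])
    outside = V - Vn
    return sum(
        c[:p] + b + c[p + 1:] in outside
        for c in Vn for p in range(3) for b in ALPHABET if b != c[p]
    )


print([cut_of_extension(n) for n in (0, 16, 32, 64, 96)])   # [384, 400, 384, 384, 384]
```

Under these assumptions the cut returns to 384 each time a whole block B_i has been incorporated.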

Table 1.

The sequence of codons composing the set B₁. They are ordered according to a selected lexicographic order. The values of the k-size conductance ϕ_k(G[B₁]) calculated for the first k codons in order are also presented.

codon	k	ϕ_k(G[B₁])
XAA	1	1
XAT	2	0.857
XAG	3	0.714
XAC	4	0.571
XTA	5	0.600
XTT	6	0.571
XTG	7	0.510
XTC	8	0.428
XGA	9	0.428
XGT	10	0.400
XGG	11	0.350
XGC	12	0.285
XCA	13	0.274
XCT	14	0.244
XCG	15	0.200
XCC	16	0.143
YAA	17	0.176
YAT	18	0.190
YAG	19	0.188
YAC	20	0.171
YTA	21	0.184
YTT	22	0.182
YTG	23	0.168
YTC	24	0.143
YGA	25	0.143
YGT	26	0.132
YGG	27	0.111
YGC	28	0.082
YCA	29	0.074
YCT	30	0.057
YCG	31	0.032
YCC	32	0
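
The entries of table 1 can be recomputed from definition 2.10 applied to the first k codons of B₁. The following sketch assumes the orderings X < Y and A < T < G < C, which match the rows of the table; `prefix_conductance` is a name introduced here.

```python
from fractions import Fraction
from itertools import product

ORDER = ["".join(c) for c in product("XY", "ATGC", "ATGC")]   # X < Y and A < T < G < C
DEGREE = 7                                                     # G[B1] is 7-regular


def adjacent(u, v):
    """Two codons are adjacent when exactly one position differs."""
    return sum(a != b for a, b in zip(u, v)) == 1


def prefix_conductance(k):
    """Conductance in G[B1] of the first k codons of ORDER, as an exact fraction."""
    inside, outside = ORDER[:k], ORDER[k:]
    cut = sum(adjacent(u, v) for u in inside for v in outside)
    return Fraction(cut, DEGREE * k)


for k in range(1, 33):
    print(ORDER[k - 1], k, f"{float(prefix_conductance(k)):.3f}")
```

The script rounds to three decimals, so a value such as 3/7 is printed as 0.429 where the table shows the truncated 0.428.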


3.3. The optimal extension of the standard genetic code with more than 160 meaningful codons

In order to extend the genetic code over 160 meaningful codons, we have to make some observations. From (3.9), we get immediately that C¹ is the best genetic code extension involving 96 additional codons. In addition, applying proposition 2.4(c), we get that C¹ includes all non-canonical codons that are directly connected with V₀. As a result, the condition (3.4) is non-restrictive in the case when we try to extend V₀ in consecutive steps using definition 3.1 for n > 96. Therefore, we propose to reformulate the problem of optimal V₀ extension into the question of optimal extension of the C¹ set.

Let us denote by V′_n a set of codons such that C¹ ⊆ V′_n with exactly n, 1 ≤ n ≤ 48, new codons in comparison to C¹. Therefore, the optimal genetic code extension can be characterized in the following way.

Definition 3.4. —

The set $V_{n}^{' *}$, $C^{1} \subset V_{n}^{' *}$, with exactly n additional codons is optimal if

$V_{n}^{' *} = \underset{V_{n}^{'} : V_{n}^{'} = C^{1} \cup A_{n}^{'}}{\arg \min} E (V_{n}^{'}, \bar{V_{n}^{'}}),$ 3.10

where A′_n possesses the feature

$A_{n}^{'} = \underset{S : S \subseteq \bar{C^{1}}, | S | = n}{\arg \max} E (C^{1}, S) .$ 3.11

Similarly to the method presented in the previous subsections, we introduce a definition which is useful in describing the optimality of the EGC.

Definition 3.5. —

Let us denote by $\mathcal{V}^{'}_{n}$ the class of sets V′_n that contain exactly n, 1 ≤ n ≤ 48, additional codons and whose $A_{n}^{'} = V_{n}^{'} ∖ C^{1}$ fulfils the property (3.11). Then,

$\mathcal{V}^{'}_{n} = \{ V_{n}^{'} : V_{n}^{'} = C^{1} \cup A_{n}^{'} \},$

is a class of all possible extensions of the C¹ set with exactly n new codons.

Thanks to that, we are able to give the optimal C¹ extension with a given size n. In order to extend the SGC beyond 160 codons in total, it is enough to extend the set C¹ by incorporating new codons from the set

B_{12} \cup B_{13} \cup B_{23} \cup B_{123},

in such a way that the number of connections between a new code and its complement is minimized according to the condition (3.10), whereas the number of possible connections between the ‘basic’ coding system C¹ and newly added codons is maximized at the same time according to the condition (3.11).

Interestingly, we can find the optimal C¹ extension for 1 ≤ n ≤ 48 in a similar way to that presented in §3.1. We begin by introducing the following lemma.

Lemma 3.6. —

Let $V_{n}^{'} \in \mathcal{V}^{'}_{n}$ be a set of codons where 1 ≤ n ≤ 48 and $A_{n}^{'} = V_{n}^{'} ∖ C^{1}$. If A′_n fulfils the condition (3.11), then

$A_{n}^{'} \subseteq B_{12} \cup B_{13} \cup B_{23} \cup B_{123} .$

Proof. —

The proof of this lemma follows directly from proposition 2.4(d,g). ▪

Then, we can formulate the following theorem.

Theorem 3.7. —

Let $V_{n}^{'} \in \mathcal{V}^{'}_{n}$ be a set of codons, where 1 ≤ n ≤ 48 and $A_{n}^{'} = V_{n}^{'} ∖ C^{1}$ fulfils the condition (3.11). Then the following inequality holds:

$E (V_{n}^{'}, \bar{V_{n}^{'}}) \geq 384 - 6 n + \sum_{i j} n_{i j} \cdot ϕ_{n_{i j}} (G [B_{i j}]),$

where $n_{i j} = | A_{n}^{'} \cap B_{i j} |, \sum_{i j} n_{i j} = n$ and G[B_ij] is the induced subgraph of G.

Proof. —

Similarly to the proof of theorem 3.3, we start with the equation:

$E (V_{n}^{'}, \bar{V_{n}^{'}}) = E (C^{1}, \bar{C^{1}}) - E (C^{1}, A_{n}^{'}) + E (A_{n}^{'}, \bar{V_{n}^{'}}) .$ 3.12

Using equation (3.9) and proposition 2.4(d,g), we get immediately two equalities

$E (C^{1}, \bar{C^{1}}) = 384, E (C^{1}, A_{n}^{'}) = 8 n .$

Therefore, we can rewrite equation (3.12) in the following way:

$E (V_{n}^{'}, \bar{V_{n}^{'}}) = 384 - 8 n + E (A_{n}^{'}, \bar{V_{n}^{'}}) .$

In the next step, we make a simple observation

$E (A_{n}^{'}, \bar{V_{n}^{'}}) = \sum_{i j} E (A_{n}^{'} \cap B_{i j}, \bar{V_{n}^{'}} \cap B_{i j}) + \sum_{i j} E (A_{n}^{'} \cap B_{i j}, B_{123}),$

where $\sum_{i j} E (A_{n}^{'} \cap B_{i j}, B_{123}) = 2 n$ according to proposition 2.4(e). Then following the definitions 2.10 and 2.11, we get:

$\sum_{i j} E (A_{n}^{'} \cap B_{i j}, \bar{V_{n}^{'}} \cap B_{i j}) \geq \sum_{i j} n_{i j} \cdot ϕ_{n_{i j}} (G [B_{i j}]) .$

In consequence, we can reformulate equation (3.12) as follows:

$E (V_{n}^{'}, \bar{V_{n}^{'}}) \geq 384 - 6 n + \sum_{i j} n_{i j} \cdot ϕ_{n_{i j}} (G [B_{i j}]) .$

▪

As a result, we found the lower bound on the value of $E (V_{n}^{'}, \bar{V_{n}^{'}})$, where the number n of codons added to C¹ satisfies 1 ≤ n ≤ 48. Similarly to theorem 3.3, the optimality of the EGC depends strongly on the properties of newly created codon groups. Clearly, the best codon groups attain the n-size conductance values $ϕ_{n_{i j}} (G [B_{i j}])$ for their size n_ij. What is more, the optimal EGC, in terms of the definition 3.4 with 208 codons in total, is described by the set C² because in this case we get

$E (C^{2}, \bar{C^{2}}) = 384 - 6 \cdot 48 = 96.$ 3.13
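
The individual terms in the proof of theorem 3.7 can be counted directly. The sketch below uses an illustrative four-codon set from B₁₂ chosen here as an assumption, and reports E(C¹, A′_n), E(A′_n, B₁₂₃) and the resulting cut, together with the cuts of C¹ and C². The helper names are hypothetical.

```python
from itertools import product

ALPHABET, UNNATURAL = "ATGCXY", set("XY")
V = {"".join(c) for c in product(ALPHABET, repeat=3)}


def n_unnatural(codon):
    return sum(base in UNNATURAL for base in codon)


C1 = {c for c in V if n_unnatural(c) <= 1}   # V0 and B1, B2, B3
C2 = {c for c in V if n_unnatural(c) <= 2}   # C1 and B12, B13, B23
B123 = V - C2


def edges_between(S, T):
    """Number of edges of G with one endpoint in S and the other in T (S, T disjoint)."""
    T = set(T)
    return sum(
        c[:p] + b + c[p + 1:] in T
        for c in S for p in range(3) for b in ALPHABET if b != c[p]
    )


def terms(A):
    """Edge counts that appear in the proof, for the extension V'_n = C1 united with A."""
    Vn = C1 | A
    return {
        "E(C1, A)": edges_between(C1, A),
        "E(A, B123)": edges_between(A, B123),
        "cut": edges_between(Vn, V - Vn),
    }


print(terms({"XXA", "XXT", "XXG", "XXC"}))
# {'E(C1, A)': 32, 'E(A, B123)': 8, 'cut': 368}
print(edges_between(C1, V - C1), edges_between(C2, B123))   # 384 96
```

For this example n = 4, so E(C¹, A′_n) = 8n = 32 and E(A′_n, B₁₂₃) = 2n = 8, and the cut equals 384 − 6n plus the 8 edges that remain inside B₁₂.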

3.4. The optimal codon block structures including up to 208 meaningful codons

As was mentioned in the previous section, the properties of the newly incorporated codons have a decisive impact on the optimality of the EGC. By theorem 3.7, the lower bound on the value of $E (V_{n}^{'}, \bar{V_{n}^{'}})$, under the condition (3.11), is determined by the codon blocks that are optimal in terms of the k-size conductance. Following the results presented in §3.2, we have to consider some properties of induced subgraphs G[B_ij], ij = 12, 13, 23, because they allow us to describe the optimal codon groups. Using lemma 2.6, we obtain that graphs G[B_ij] are isomorphic to each other. Thanks to that, it is sufficient to consider the properties of the graph G[B₁₂] (figure 4). Similarly to the previous results, G[B₁₂] can be represented as a Cartesian product of graphs (proposition 2.8). Therefore, using again theorem 2.3 from [35], we obtain that the set of the first n codons (nodes) of G[B₁₂] taken in the lexicographic order possesses the optimal k-size conductance ϕ_k(G[B₁₂]). In table 2, we present the list of all G[B₁₂] nodes in the lexicographic order. What is more, we also evaluated all possible values of ϕ_k(G[B₁₂]) for the sets composed of the first k nodes.

Figure 4. The graphical representation of the graph G[B₁₂], which is the induced subgraph of the graph G(V, E). Each node is a codon belonging to the set B₁₂, B₁₂ ⊂ V, whereas the edges are inherited from the set E.

Table 2.

The codons composing the set B₁₂. They are arranged according to a selected lexicographic order. The values of the k-size conductance ϕ_k(G[B₁₂]) calculated for the first k codons in order are also presented.

codon	k	ϕ_k(G[B₁₂])
XXA	1	1
XXT	2	0.8
XXG	3	0.6
XXC	4	0.4
XYA	5	0.440
XYT	6	0.4
XYG	7	0.314
XYC	8	0.2
YXA	9	0.244
YXT	10	0.240
YXG	11	0.2
YXC	12	0.133
YYA	13	0.138
YYT	14	0.114
YYG	15	0.067
YYC	16	0
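
The conductance column can be recomputed from the graph structure. This is a hedged sketch: it assumes that the conductance of a node set is its number of outgoing edges divided by its volume (the sum of node degrees) inside G[B₁₂], and that B₁₂ consists of the codons with a non-canonical base (X or Y) at positions 1 and 2 and a canonical base at position 3. The helper names are illustrative and do not come from the paper.

```python
from itertools import product

def induced_neighbours(codon, alphabets):
    """Neighbours of `codon` inside the block given by per-position alphabets."""
    for pos, letters in enumerate(alphabets):
        for base in letters:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]

def prefix_conductance(alphabets):
    """Return (codon, k, cut / volume) for each lexicographic prefix of the block."""
    order = ["".join(p) for p in product(*alphabets)]
    degree = sum(len(a) - 1 for a in alphabets)  # the block graph is regular
    rows = []
    for k in range(1, len(order) + 1):
        prefix = set(order[:k])
        cut = sum(1 for c in prefix
                  for d in induced_neighbours(c, alphabets) if d not in prefix)
        rows.append((order[k - 1], k, round(cut / (degree * k), 3)))
    return rows

# B_12: X or Y at positions 1 and 2, canonical base (A, T, G, C) at position 3.
rows = prefix_conductance(["XY", "XY", "ATGC"])
for codon, k, phi in rows:
    print(codon, k, phi)
```

Every node of G[B₁₂] has degree 5 under this construction (one X/Y swap at each of the first two positions and three substitutions at the third), so the volume of the first k codons is 5k.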


As in the previous results, the best genetic code extensions, namely $V_{n}^{^{'} *}, 1 \leq n \leq 48$ , have the nested structure of the optimal codon blocks. This structure can be obtained by adding subsequent codons according to their lexicographic order. The new codons are selected from successive sets of type B_ij, moving to the next set once the number of codons included from the current set reaches 16.
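The nested construction can be traced numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: C¹ is taken to be all codons with at most one non-canonical base, the blocks are filled in the order B₁₂, B₁₃, B₂₃, and ϕ_k is taken to be the conductance (cut divided by volume) of the first k codons of a block in lexicographic order. For every n it compares the directly counted cut with the lower bound 384 − 6n + Σ n_ij ϕ_{n_ij}(G[B_ij]).

```python
from itertools import product

CANONICAL, NONCANONICAL = "ATGC", "XY"
ALPHABET = CANONICAL + NONCANONICAL
FULL = (ALPHABET,) * 3

def cut_size(subset, alphabets=FULL):
    """Edges with exactly one endpoint in `subset`, inside the graph whose
    position-wise alphabets are `alphabets`."""
    subset = set(subset)
    total = 0
    for codon in subset:
        for pos, letters in enumerate(alphabets):
            for base in letters:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if base != codon[pos] and mutant not in subset:
                    total += 1
    return total

# Assumption: C^1 = all codons with at most one non-canonical base (160 codons).
C1 = {"".join(p) for p in product(ALPHABET, repeat=3)
      if sum(b in NONCANONICAL for b in p) <= 1}

# B_12, B_13 and B_23, each described by its per-position alphabets.
BLOCKS = [("XY", "XY", CANONICAL), ("XY", CANONICAL, "XY"), (CANONICAL, "XY", "XY")]

def extension_profile():
    """Rows (n, direct cut, bound) for the nested extension, n = 0..48."""
    current, n = set(C1), 0
    rows = [(0, cut_size(current), 384.0)]
    for alphabets in BLOCKS:
        order = ["".join(p) for p in product(*alphabets)]  # lexicographic order
        for k in range(1, len(order) + 1):
            n += 1
            current.add(order[k - 1])
            phi_k = cut_size(order[:k], alphabets) / (5 * k)  # prefix conductance
            # Completed blocks have phi_16 = 0, so only the open block contributes.
            rows.append((n, cut_size(current), 384 - 6 * n + k * phi_k))
    return rows

profile = extension_profile()
for n, direct, bound in profile[::8]:
    print(n, direct, round(bound, 2))
```

In this sketch the direct count never falls below the bound, and the two coincide whenever a block has just been completed (n = 16, 32 and 48).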

3.5. The optimal extension of the standard genetic code up to 216 codons

The methodology presented in the previous section allows us to extend C¹ up to the C² set of codons, which involves 208 out of 216 possible codons. Extending the genetic code beyond 208 meaningful codons requires an additional argument. From (3.13), we get that C² is the best extension of C¹ involving 48 additional codons. Moreover, applying proposition 2.4(g), we get that C² includes all non-standard codons that are connected to C¹. As a result, property (3.11) is not restrictive when we try to extend C¹ in consecutive steps using definition 3.4 for n > 48. Therefore, as in the previous section, we reformulate the problem of the optimal C¹ extension as the question of the optimal extension of the C² set.
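The connectivity claim used here can be illustrated with a short check. It is a sketch under assumptions: C¹ and C² are taken to be the codons with at most one and at most two non-canonical bases, and two codons are adjacent when they differ at exactly one position. For each of the eight remaining codons it counts the neighbours that lie in C¹ and in C².

```python
from itertools import product

CANONICAL, NONCANONICAL = "ATGC", "XY"
ALPHABET = CANONICAL + NONCANONICAL

def neighbours(codon):
    """Yield all codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in ALPHABET:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]

def n_noncanonical(codon):
    """Number of positions of `codon` occupied by X or Y."""
    return sum(b in NONCANONICAL for b in codon)

codons = ["".join(p) for p in product(ALPHABET, repeat=3)]
c1 = {c for c in codons if n_noncanonical(c) <= 1}   # assumed C^1
c2 = {c for c in codons if n_noncanonical(c) <= 2}   # assumed C^2
remaining = [c for c in codons if c not in c2]       # codons made of X and Y only

# For each remaining codon: (edges into C^1, edges into C^2).
links = {c: (sum(d in c1 for d in neighbours(c)),
             sum(d in c2 for d in neighbours(c))) for c in remaining}
for codon, pair in links.items():
    print(codon, pair)
```

Under these assumptions no remaining codon is adjacent to C¹, and each one has the same number of edges into C², so in this sketch E(C², S) depends only on the size of S.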

Definition 3.8. —

The set $V_{n}^{^{″} *}$ , $C^{2} \subset V_{n}^{^{″} *}$ , with exactly n additional codons, 1 ≤ n ≤ 8, is optimal if

$V_{n}^{^{″} *} = {\arg \min}_{{V_{n}^{″} : V_{n}^{″} = C^{2} \cup A_{n}^{″},}} E (V_{n}^{″}, \bar{V_{n}^{″}}),$ 3.14

where A″_n satisfies one additional condition

$A_{n}^{″} = {\arg \max}_{{S : S \subseteq \bar{C^{2}}, | S | = n}} E (C^{2}, S) .$ 3.15

We introduce also a definition which is useful in describing the optimality of the genetic code extension.

Definition 3.9. —

Let us denote by ${V^{″}}_{n}$ the class of sets V″_n, C² ⊂ V″_n, with n ≥ 1 additional codons such that $A_{n}^{″} = V_{n}^{″} ∖ C^{2}$ fulfils property (3.15). Then

${V^{″}}_{n} = \{ V_{n}^{″} : V_{n}^{″} = C^{2} \cup A_{n}^{″} \}$

is the class of all possible extensions of the C² set with exactly n additional codons, 1 ≤ n ≤ 8.

Using definition 3.8 of optimality, we get the following characterization of the set $V_{n}^{^{″} *}$ .

Theorem 3.10. —

For every 1 ≤ n ≤ 8, the following equation holds

$V_{n}^{^{″} *} = C^{2} \cup A_{n}^{″},$

where $V_{n}^{^{″} *} \in {V^{″}}_{n}$ and A″_n ⊆ B₁₂₃ is optimal in terms of ϕ_k(G[B₁₂₃]).

Proof. —

The proof of this theorem is an immediate consequence of proposition 2.4(g) and definition 2.11. ▪

Furthermore, the induced subgraph G[B₁₂₃] (figure 5) can also be represented as a Cartesian product of graphs (proposition 2.9). Using theorem 2.3 from [35] again, we obtain that the collection of the first k codons of G[B₁₂₃] taken in the lexicographic order possesses the optimal k-size conductance ϕ_k(G[B₁₂₃]). In table 3, we present the list of all G[B₁₂₃] nodes taken in the selected lexicographic order. We also evaluated all possible values of the k-size conductance for the sets composed of the first k nodes.

Table 3.

The sequence of codons composing the set B₁₂₃. They are ordered according to a selected lexicographic order. The values of the k-size conductance ϕ_k(G[B₁₂₃]) calculated for the first k codons in order are also presented.

codon	k	ϕ_k(G[B₁₂₃])
XXX	1	1
XXY	2	0.667
XYX	3	0.556
XYY	4	0.333
YXX	5	0.333
YXY	6	0.222
YYX	7	0.143
YYY	8	0
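
Because G[B₁₂₃] has only eight nodes, the optimality of the lexicographic prefixes can be checked by exhaustive search. The sketch below assumes that the conductance of a node set is its number of outgoing edges divided by its volume inside G[B₁₂₃], where every node has degree 3 (one X/Y swap per position). The function names are illustrative.

```python
from itertools import combinations, product

# Nodes of G[B_123] in lexicographic order: XXX, XXY, XYX, ..., YYY.
NODES = ["".join(p) for p in product("XY", repeat=3)]

def cut(subset):
    """Edges of the cube that leave `subset` (one X<->Y swap per edge)."""
    members = set(subset)
    total = 0
    for codon in members:
        for pos in range(3):
            swapped = "Y" if codon[pos] == "X" else "X"
            if codon[:pos] + swapped + codon[pos + 1:] not in members:
                total += 1
    return total

def conductance(subset):
    """Cut divided by volume; every node has degree 3 in this graph."""
    return cut(subset) / (3 * len(subset))

def compare_prefixes():
    """Return (k, prefix conductance, True if no k-subset does better)."""
    results = []
    for k in range(1, len(NODES) + 1):
        best = min(conductance(s) for s in combinations(NODES, k))
        prefix = conductance(NODES[:k])
        results.append((k, round(prefix, 3), prefix == best))
    return results

results = compare_prefixes()
for row in results:
    print(row)
```

The search covers all 255 non-empty subsets, so it is a direct check for this small graph rather than a substitute for the general result cited from [35].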


4. Discussion

The huge number of combinations of non-canonical and canonical bases available for creating new codon groups means that the genetic code can be extended in various ways. Here, we propose an extension of the SGC in three steps, each adding codons with an increasing number of non-canonical bases. These steps extend the genetic code to 160 meaningful codons, then to 208 codons and finally to 216 codons. The proposed extension of the SGC is a general approach, which does not take into account the properties of the coded amino acids or other compounds associated with the newly added codons. We focused on the global structure of the code, including the arrangement of codons into groups (blocks) that code for a given amino acid and usually differ in one codon position. The codons are added according to a fixed lexicographic order, which makes the EGC robust to changes causing the loss of genetic information. This approach is consistent with the adaptation hypothesis, which postulates that the SGC evolved to minimize the harmful consequences of mutations or mistranslations of coded proteins [24,29,32,33]. The SGC turned out to be quite well optimized in this respect when compared with a sample of randomly generated codes [25,26,28,31,36], but the application of optimization algorithms revealed that it is not perfectly optimized and that more robust codes can be found [34,37–44]. The minimization of mutation errors is important from a biological point of view, because it protects organisms against losing genetic information. The reduction of the mutational load therefore seems to be favoured by biological systems and can occur directly at the level of the mutational pressure [45–49]. Nevertheless, on the global scale, the SGC shows a general tendency to error minimization [37,44], which is even more pronounced in its alternative versions [50], which evolved later. Therefore, the extension of the SGC according to this rule seems to be a natural consequence of its evolution.

Our approach assumes a stepwise extension of the code, similar to the gradual addition of new amino acids to the evolving primordial SGC as they were produced by increasingly complex biosynthetic pathways evolving in parallel [51–60]. The addition of amino acids was also driven by selection for an increasing diversity of amino acids [61–64] as well as for decreasing disruption of already coded proteins and their composition [65]. Similar assumptions are included in our model, which minimizes the differences between the newly added codons and those already defined in the code. The codons added in the first step contain, besides two canonical bases (N), only one non-canonical base (X), i.e. XNN, NXN and NNX, and thus differ from the typical codons (NNN) in only one position. The codons added next include two non-canonical bases, i.e. XXN, XNX and NXX. Finally, codons consisting exclusively of three non-canonical bases (XXX) can be used to extend the code. With this method of codon addition, the newly added codons differ from the current ones by one point mutation.
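The sizes of the three steps follow from counting codons by their number of non-canonical bases. A minimal sketch, assuming the six-letter alphabet A, T, G, C plus one non-canonical pair written X and Y:

```python
from collections import Counter
from itertools import product

CANONICAL, NONCANONICAL = "ATGC", "XY"

# Count codons by the number of positions occupied by a non-canonical base.
counts = Counter(
    sum(base in NONCANONICAL for base in codon)
    for codon in product(CANONICAL + NONCANONICAL, repeat=3)
)

sizes = [counts[i] for i in range(4)]                  # NNN, one X, two X, XXX
cumulative = [sum(sizes[:i + 1]) for i in range(4)]    # code size after each step

print(sizes)        # [64, 96, 48, 8]
print(cumulative)   # [64, 160, 208, 216]
```

The cumulative totals are the code sizes of 160, 208 and 216 meaningful codons reached after the first, second and third step.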

Thanks to this gradual addition of new codons with newly assigned amino acids, the whole system, i.e. an organism with the extended code, can have a fairly high probability of surviving. Because the new codons differ by one mutation step from each other and from the canonical ones, such a mutation has only a small probability of generating an undefined codon, which could cause premature termination of translation and non-functional products. Assuming that reverse mutations, i.e. substitutions of a non-canonical base by a canonical one, are more frequent, the EGC can be reduced with time to the SGC. The stepwise addition can also give an organism time to adapt and to tune its molecular processes to the new products. Moreover, it enables better monitoring and control of the organism's modification.

However, from an experimentalist's point of view, such reversions would not be desirable because the modified system would revert to the original one. Therefore, we can imagine an alternative way of extending the genetic code by adding codons that cannot be mutated in a single step to the codons already defined in the code. To extend the SGC in this way, the first added codon sets should contain at least two non-canonical bases, i.e. XXX or XXN, XNX and NXX. Then, any single mutation of these codons would generate undefined codons, and organisms bearing such a mutation could be naturally eliminated from the population if the mutation is deleterious. Our model of the SGC extension can be upgraded to include the properties of newly added amino acids or other compounds that are introduced into the code.

Supplementary Material

Reviewer comments

rsos191384_review_history.pdf (520.2 KB, pdf)

Acknowledgements

We are very grateful to two anonymous reviewers for their insightful comments and remarks, which significantly improved the manuscript.

Data accessibility

All the data generated or analysed during this study are included in this published article.

Authors' contributions

Conceived and designed the study: P.B. and P.M. Wrote and corrected the paper: P.B., P.M., M.W. and D.M. Responded to reviewers: P.B. and P.M. All authors participated in the improvement of the manuscript and approved the final version.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the National Science Centre, Poland (Narodowe Centrum Nauki, Polska) under grant no. 2017/27/N/NZ2/00403.

References

1. Chin JW. 2014. Expanding and reprogramming the genetic code of cells and animals. Annu. Rev. Biochem. 83, 379–408. (doi:10.1146/annurev-biochem-060713-035737)
2. Chin JW. 2017. Expanding and reprogramming the genetic code. Nature 550, 53–60. (doi:10.1038/nature24031)
3. Italia JS, Addy PS, Wrobel CJ, Crawford LA, Lajoie MJ, Zheng Y, Chatterjee A. 2017. An orthogonalized platform for genetic code expansion in both bacteria and eukaryotes. Nat. Chem. Biol. 13, 446–450. (doi:10.1038/nchembio.2312)
4. Noren CJ, Anthony-Cahill SJ, Griffith MC, Schultz PG. 1989. A general method for site-specific incorporation of unnatural amino acids into proteins. Science 244, 182–8. (doi:10.1126/science.2649980)
5. Young DD, Schultz PG. 2018. Playing with the molecules of life. ACS Chem. Biol. 13, 854–870. (doi:10.1021/acschembio.7b00974)
6. Ozer E, Chemla Y, Schlesinger O, Aviram HY, Riven I, Haran G, Alfonta L. 2017. In vitro suppression of two different stop codons. Biotechnol. Bioeng. 114, 1065–1073. (doi:10.1002/bit.26226)
7. Anderson JC, Wu N, Santoro SW, Lakshman V, King DS, Schultz PG. 2004. An expanded genetic code with a functional quadruplet codon. Proc. Natl Acad. Sci. USA 101, 7566–7571. (doi:10.1073/pnas.0401517101)
8. Hohsaka T, Ashizuka Y, Murakami H, Sisido M. 1996. Incorporation of nonnatural amino acids into streptavidin through in vitro frame-shift suppression. J. Am. Chem. Soc. 118, 9778–9779. (doi:10.1021/ja9614225)
9. Neumann H, Wang K, Davis L, Garcia-Alai M, Chin JW. 2010. Encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature 464, 441–444. (doi:10.1038/nature08817)
10. Forster AC, Tan ZP, Nalam MNL, Lin HN, Qu H, Cornish VW, Blacklow SC. 2003. Programming peptidomimetic syntheses by translating genetic codes designed de novo. Proc. Natl Acad. Sci. USA 100, 6353–6357. (doi:10.1073/pnas.1132122100)
11. Goto Y, Katoh T, Suga H. 2011. Flexizymes for genetic code reprogramming. Nat. Protoc. 6, 779–790. (doi:10.1038/nprot.2011.331)
12. Josephson K, Hartman MCT, Szostak JW. 2005. Ribosomal synthesis of unnatural peptides. J. Am. Chem. Soc. 127, 11 727–11 735. (doi:10.1021/ja0515809)
13. Tajima K, Katoh T, Suga H. 2018. Genetic code expansion via integration of redundant amino acid assignment by finely tuning tRNA pools. Curr. Opin Chem. Biol. 46, 212–218. (doi:10.1016/j.cbpa.2018.07.010)
14. Iwane Y, Hitomi A, Murakami H, Katoh T, Goto Y, Suga H. 2016. Expanding the amino acid repertoire of ribosomal polypeptide synthesis via the artificial division of codon boxes. Nat. Chem. 8, 317–325. (doi:10.1038/nchem.2446)
15. Plotkin JB, Kudla G. 2011. Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 12, 32–42. (doi:10.1038/nrg2899)
16. Dien VT, Morris SE, Karadeema RJ, Romesberg FE. 2018. Expansion of the genetic code via expansion of the genetic alphabet. Curr. Opin Chem. Biol. 46, 196–202. (doi:10.1016/j.cbpa.2018.08.009)
17. Hamashima K, Kimoto M, Hirao I. 2018. Creation of unnatural base pairs for genetic alphabet expansion toward synthetic xenobiology. Curr. Opin Chem. Biol. 46, 108–114. (doi:10.1016/j.cbpa.2018.07.017)
18. Ishikawa M, Hirao I, Yokoyama S. 2000. Synthesis of 3-(2-deoxy-beta-d-ribofuranosyl)pyridin-2-one and 2-amino-6-(n,n-dimethylamino)-9-(2-deoxy-beta-d-ribofuranosyl)purine derivatives for an unnatural base pair. Tetrahedron Lett. 41, 3931–3934. (doi:10.1016/S0040-4039(00)00520-7)
19. Kimoto M, Kawai R, Mitsui T, Yokoyama S, Hirao I. 2009. An unnatural base pair system for efficient PCR amplification and functionalization of DNA molecules. Nucleic Acids Res. 37, e14. (doi:10.1093/nar/gkn956)
20. Malyshev DA, Seo YJ, Ordoukhanian P, Romesberg FE. 2009. PCR with an expanded genetic alphabet. J. Am. Chem. Soc. 131, 14 620–14 621. (doi:10.1021/ja906186f)
21. Ohtsuki T, Kimoto M, Ishikawa M, Mitsui T, Hirao I, Yokoyama S. 2001. Unnatural base pairs for specific transcription. Proc. Natl Acad. Sci. USA 98, 4922–4925. (doi:10.1073/pnas.091532698)
22. Yang Z, Sismour AM, Sheng P, Puskar NL, Benner SA. 2007. Enzymatic incorporation of a third nucleobase pair. Nucleic Acids Res. 35, 4238–49. (doi:10.1093/nar/gkm395)
23. Zhang Y, Ptacin JL, Fischer EC, Aerni HR, Caffaro CE, San Jose K, Feldman AW, Turner CR, Romesberg FE. 2017. A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551, 644–647. (doi:10.1038/nature24659)
24. Epstein CJ. 1966. Role of the amino-acid ‘code’ and of selection for conformation in the evolution of proteins. Nature 210, 25–28. (doi:10.1038/210025a0)
25. Freeland SJ, Hurst LD. 1998. The genetic code is one in a million. J. Mol. Evol. 47, 238–248. (doi:10.1007/PL00006381)
26. Freeland SJ, Hurst LD. 1998. Load minimization of the genetic code: history does not explain the pattern. Proc. R. Soc. B 265, 2111–2119. (doi:10.1098/rspb.1998.0547)
27. Freeland SJ, Wu T, Keulmann N. 2003. The case for an error minimizing standard genetic code. Orig. Life Evol. Biosph. 33, 457–477. (doi:10.1023/A:1025771327614)
28. Gilis D, Massar S, Cerf NJ, Rooman M. 2001. Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol. 2, research0049.1–research0049.12. (doi:10.1186/gb-2001-2-11-research0049)
29. Goldberg AL, Wittes RE. 1966. Genetic code: aspects of organization. Science 153, 420–424. (doi:10.1126/science.153.3734.420)
30. Goodarzi H, Najafabadi HS, Torabi N. 2005. Designing a neural network for the constraint optimization of the fitness functions devised based on the load minimization of the genetic code. BioSystems 81, 91–100. (doi:10.1016/j.biosystems.2005.02.002)
31. Haig D, Hurst LD. 1991. A quantitative measure of error minimization in the genetic code. J. Mol. Evol. 33, 412–417. (doi:10.1007/BF02103132)
32. Sonneborn T. 1965. Degeneracy of the genetic code: extent, nature, and genetic implications. New York, NY: Academic Press, pp. 377–397.
33. Woese CR. 1965. On the evolution of the genetic code. Proc. Natl Acad. Sci. USA 54, 1546–1552. (doi:10.1073/pnas.54.6.1546)
34. Błażej P, Kowalski D, Mackiewicz D, Wnetrzak M, Aloqalaa D, Mackiewicz P. 2018. The structure of the genetic code as an optimal graph clustering problem. bioRxiv 332478 https://www.biorxiv.org/content/early/2018/05/28/332478
35. Bezrukov SL. 1999. Edge isoperimetric problems on graphs. Graph Theory Comb. Biol. 7, 157–197. Akademia Kiado, Budapest
36. Freeland SJ, Knight RD, Landweber LF. 2000. Measuring adaptation within the genetic code. Trends Biochem. Sci. 25, 44–45. (doi:10.1016/S0968-0004(99)01531-5)
37. Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. 2018. Optimization of the standard genetic code according to three codon positions using an evolutionary algorithm. PLoS ONE 13, e0201715. (doi:10.1371/journal.pone.0201715)
38. Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. 2019. The influence of different types of translational inaccuracies on the genetic code structure. BMC Bioinf. 20, 114. (doi:10.1186/s12859-019-2661-4)
39. Błażej P, Wnetrzak M, Mackiewicz P. 2016. The role of crossover operator in evolutionary-based approach to the problem of genetic code optimization. BioSystems 150, 61–72. (doi:10.1016/j.biosystems.2016.08.008)
40. Massey SE. 2008. A neutral origin for error minimization in the genetic code. J. Mol. Evol. 67, 510–516. (doi:10.1007/s00239-008-9167-4)
41. Novozhilov AS, Wolf YI, Koonin EV. 2007. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol. Direct 2, 24. (doi:10.1186/1745-6150-2-24)
42. Santos J, Monteagudo A. 2017. Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability. BMC Bioinf. 18, 195. (doi:10.1186/s12859-017-1608-x)
43. Santos MAS, Gomes AC, Santos MC, Carreto LC, Moura GR. 2011. The genetic code of the fungal CTG clade. C. R. Biol. 334, 607–611. (doi:10.1016/j.crvi.2011.05.008)
44. Wnetrzak M, Błażej P, Mackiewicz D, Mackiewicz P. 2018. The optimality of the standard genetic code assessed by an eight-objective evolutionary algorithm. BMC Evol. Biol. 18, 192. (doi:10.1186/s12862-018-1304-0)
45. Błażej P, Mackiewicz D, Grabinska M, Wnetrzak M, Mackiewicz P. 2017. Optimization of amino acid replacement costs by mutational pressure in bacterial genomes. Sci. Rep. 7, 1061. (doi:10.1038/s41598-017-01130-7)
46. Błażej P, Mackiewicz P, Cebrat S, Wanczyk M. 2013. Using evolutionary algorithms in finding of optimized nucleotide substitution matrices. In Genetic and evolutionary computation conference, GECCO ’13, Amsterdam, The Netherlands, 6–10 July 2013, Companion Material Proceedings, pp. 41–42.
47. Błażej P, Miasojedow B, Grabinska M, Mackiewicz P. 2015. Optimization of mutation pressure in relation to properties of protein-coding sequences in bacterial genomes. PLoS ONE 10, e0130411. (doi:10.1371/journal.pone.0130411)
48. Dudkiewicz A. et al. 2005. Correspondence between mutation and selection pressure and the genetic code degeneracy in the gene evolution. Future Gener. Comput. Syst. 21, 1033–1039. (doi:10.1016/j.future.2004.03.003)
49. Mackiewicz P, Biecek P, Mackiewicz D, Kiraga J, Baczkowski K, Sobczynski M, Cebrat S. 2008. Optimisation of asymmetric mutational pressure and selection pressure around the universal genetic code. In Computational Science - ICCS 2008, Pt 3, Lecture Notes in Computer Science, 5103, pp. 100–109.
50. Błażej P, Wnetrzak M, Mackiewicz D, Gagat P, Mackiewicz P. 2019. Many alternative and theoretical genetic codes are more robust to amino acid replacements than the standard genetic code. J. Theor. Biol. 464, 21–32. (doi:10.1016/j.jtbi.2018.12.030)
51. Di Giulio M. 1997. The origin of the genetic code. Trends Biochem. Sci. 22, 49–50. (doi:10.1016/S0968-0004(97)84911-0)
52. Di Giulio M. 2004. The coevolution theory of the origin of the genetic code. Phys. Life Rev. 1, 128–137. (doi:10.1016/j.plrev.2004.05.001)
53. Di Giulio M. 2008. An extension of the coevolution theory of the origin of the genetic code. Biol. Direct 3, 37. (doi:10.1186/1745-6150-3-37)
54. Di Giulio M. 2016. The lack of foundation in the mechanism on which are based the physico-chemical theories for the origin of the genetic code is counterposed to the credible and natural mechanism suggested by the coevolution theory. J. Theor. Biol. 399, 134–40. (doi:10.1016/j.jtbi.2016.04.005)
55. Di Giulio M. 2017. Some pungent arguments against the physico-chemical theories of the origin of the genetic code and corroborating the coevolution theory. J. Theor. Biol. 414, 1–4. (doi:10.1016/j.jtbi.2016.11.014)
56. Di Giulio M, Medugno M. 1999. Physicochemical optimization in the genetic code origin as the number of codified amino acids increases. J. Mol. Evol. 49, 1–10. (doi:10.1007/PL00006522)
57. Guimaraes RC. 2011. Metabolic basis for the self-referential genetic code. Orig. Life Evol. Biosph. 41, 357–371. (doi:10.1007/s11084-010-9226-x)
58. Wong JT. 1975. A co-evolution theory of the genetic code. Proc. Natl Acad. Sci. USA 72, 1909–1912. (doi:10.1073/pnas.72.5.1909)
59. Wong JT, Ng SK, Mat WK, Hu T, Xue H. 2016. Coevolution theory of the genetic code at age forty: pathway to translation and synthetic life. Life (Basel) 6, 12. (doi:10.3390/life6010012)
60. Wong TS, Roccatano D, Schwaneberg U. 2007. Challenges of the genetic code for exploring sequence space in directed protein evolution. Biocatal. Biotransform. 25, 229–241. (doi:10.1080/10242420701444280)
61. Higgs PG, Pudritz RE. 2009. A thermodynamic basis for prebiotic amino acid synthesis and the nature of the first genetic code. Astrobiology 9, 483–490. (doi:10.1089/ast.2008.0280)
62. Koonin EV, Novozhilov AS. 2017. Origin and evolution of the universal genetic code. Annu. Rev. Genet. 51, 45–62. (doi:10.1146/annurev-genet-120116-024713)
63. Sengupta S, Higgs PG. 2015. Pathways of genetic code evolution in ancient and modern organisms. J. Mol. Evol. 80, 229–243. (doi:10.1007/s00239-015-9686-8)
64. Weberndorfer G, Hofacker IL, Stadler PF. 2003. On the evolution of primitive genetic codes. Orig. Life Evol. Biosph. 33, 491–514. (doi:10.1023/A:1025753712110)
65. Higgs PG. 2009. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol. Direct 4, 16. (doi:10.1186/1745-6150-4-16)


[RSOS191384C42] 42.Santos J, Monteagudo A. 2017. Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability. BMC Bioinf. 18, 195 ( 10.1186/s12859-017-1608-x) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS191384C43] 43.Santos MAS, Gomes AC, Santos MC, Carreto LC, Moura GR. 2011. The genetic code of the fungal CTG clade. C. R. Biol. 334, 607–611. ( 10.1016/j.crvi.2011.05.008) [DOI] [PubMed] [Google Scholar]

[RSOS191384C44] 44.Wnetrzak M, Błażej P, Mackiewicz D, Mackiewicz P. 2018. The optimality of the standard genetic code assessed by an eight-objective evolutionary algorithm. BMC Evol. Biol. 18, 192 ( 10.1186/s12862-018-1304-0) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS191384C45] 45.Błażej P, Mackiewicz D, Grabinska M, Wnetrzak M, Mackiewicz P. 2017. Optimization of amino acid replacement costs by mutational pressure in bacterial genomes. Sci. Rep. 7, 1061 ( 10.1038/s41598-017-01130-7) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS191384C46] 46.Błażej P, Mackiewicz P, Cebrat S, Wanczyk M. 2013. Using evolutionary algorithms in finding of optimized nucleotide substitution matrices. In Genetic and evolutionary computation conference, GECCO ’13, Amsterdam, The Netherlands, 6–10 July 2013, Companion Material Proceedings, pp. 41–42.

[RSOS191384C47] 47.Błażej P, Miasojedow B, Grabinska M, Mackiewicz P. 2015. Optimization of mutation pressure in relation to properties of protein-coding sequences in bacterial genomes. PLoS ONE 10, e0130411 ( 10.1371/journal.pone.0130411) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS191384C48] 48.Dudkiewicz A. et al. 2005. Correspondence between mutation and selection pressure and the genetic code degeneracy in the gene evolution. Future Gener. Comput. Syst. 21, 1033–1039. ( 10.1016/j.future.2004.03.003) [DOI] [Google Scholar]

[RSOS191384C49] 49.Mackiewicz P, Biecek P, Mackiewicz D, Kiraga J, Baczkowski K, Sobczynski M, Cebrat S. 2008. Optimisation of asymmetric mutational pressure and selection pressure around the universal genetic code. In Computational Science - ICCS 2008, Pt 3, Lecture Notes in Computer Science, 5103, pp. 100–109.

[RSOS191384C50] 50.Błażej P, Wnetrzak M, Mackiewicz D, Gagat P, Mackiewicz P. 2019. Many alternative and theoretical genetic codes are more robust to amino acid replacements than the standard genetic code. J. Theor. Biol. 464, 21–32. ( 10.1016/j.jtbi.2018.12.030) [DOI] [PubMed] [Google Scholar]

[RSOS191384C51] 51.Di Giulio M. 1997. The origin of the genetic code. Trends Biochem. Sci. 22, 49–50. ( 10.1016/S0968-0004(97)84911-0) [DOI] [PubMed] [Google Scholar]

[RSOS191384C52] 52.Di Giulio M. 2004. The coevolution theory of the origin of the genetic code. Phys. Life Rev. 1, 128–137. ( 10.1016/j.plrev.2004.05.001) [DOI] [Google Scholar]

[RSOS191384C53] 53.Di Giulio M. 2008. An extension of the coevolution theory of the origin of the genetic code. Biol. Direct 3, 37 ( 10.1186/1745-6150-3-37) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS191384C54] 54.Di Giulio M. 2016. The lack of foundation in the mechanism on which are based the physico-chemical theories for the origin of the genetic code is counterposed to the credible and natural mechanism suggested by the coevolution theory. J. Theor. Biol. 399, 134–40. ( 10.1016/j.jtbi.2016.04.005) [DOI] [PubMed] [Google Scholar]

[RSOS191384C55] 55.Di Giulio M. 2017. Some pungent arguments against the physico-chemical theories of the origin of the genetic code and corroborating the coevolution theory. J. Theor. Biol. 414, 1–4. ( 10.1016/j.jtbi.2016.11.014) [DOI] [PubMed] [Google Scholar]

[RSOS191384C56] 56.Di Giulio M, Medugno M. 1999. Physicochemical optimization in the genetic code origin as the number of codified amino acids increases. J. Mol. Evol. 49, 1–10. ( 10.1007/PL00006522) [DOI] [PubMed] [Google Scholar]

[RSOS191384C57] 57.Guimaraes RC. 2011. Metabolic basis for the self-referential genetic code. Orig. Life Evol. Biosph. 41, 357–371. ( 10.1007/s11084-010-9226-x) [DOI] [PubMed] [Google Scholar]

[RSOS191384C58] 58.Wong JT. 1975. A co-evolution theory of the genetic code. Proc. Natl Acad. Sci. USA 72, 1909–1912. ( 10.1073/pnas.72.5.1909) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS191384C59] 59.Wong JT, Ng SK, Mat WK, Hu T, Xue H. 2016. Coevolution theory of the genetic code at age forty: pathway to translation and synthetic life. Life (Basel) 6, 12 ( 10.3390/life6010012) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSOS191384C60] 60.Wong TS, Roccatano D, Schwaneberg U. 2007. Challenges of the genetic code for exploring sequence space in directed protein evolution. Biocatal. Biotransform. 25, 229–241. ( 10.1080/10242420701444280) [DOI] [Google Scholar]

[RSOS191384C61] 61.Higgs PG, Pudritz RE. 2009. A thermodynamic basis for prebiotic amino acid synthesis and the nature of the first genetic code. Astrobiology 9, 483–490. ( 10.1089/ast.2008.0280) [DOI] [PubMed] [Google Scholar]

[RSOS191384C62] 62.Koonin EV, Novozhilov AS. 2017. Origin and evolution of the universal genetic code. Annu. Rev. Genet. 51, 45–62. ( 10.1146/annurev-genet-120116-024713) [DOI] [PubMed] [Google Scholar]

[RSOS191384C63] 63.Sengupta S, Higgs PG. 2015. Pathways of genetic code evolution in ancient and modern organisms. J. Mol. Evol. 80, 229–243. ( 10.1007/s00239-015-9686-8) [DOI] [PubMed] [Google Scholar]

[RSOS191384C64] 64.Weberndorfer G, Hofacker IL, Stadler PF. 2003. On the evolution of primitive genetic codes. Orig. Life Evol. Biosph. 33, 491–514. ( 10.1023/A:1025753712110) [DOI] [PubMed] [Google Scholar]

[RSOS191384C65] 65.Higgs PG. 2009. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol. Direct 4, 16 ( 10.1186/1745-6150-4-16) [DOI] [PMC free article] [PubMed] [Google Scholar]
