Generation of Cry11 Variants of Bacillus thuringiensis by Heuristic Computational Modeling

Efraín Hernando Pinzón-Reyes; Daniel Alfonso Sierra-Bueno; Miguel Orlando Suarez-Barrera; Nohora Juliana Rueda-Forero; Sebastián Abaunza-Villamizar; Paola Rondón-Villareal

2020 Jul 27;16:1176934320924681. doi: 10.1177/1176934320924681


PMCID: PMC7385851 PMID: 32782424

Abstract

Directed evolution methods mimic Darwinian evolution in vitro, inducing random mutations and applying selective pressure to genes to obtain proteins with enhanced characteristics. These techniques rely on trial-and-error testing at the experimental level and carry a high degree of uncertainty. Therefore, in silico modeling of directed evolution is needed to support experimental assays. Several in silico approaches have reproduced directed evolution, using statistical, thermodynamic, and kinetic models in an attempt to recreate experimental conditions. Likewise, optimization techniques using heuristic models have been used to understand and find the best scenarios of directed evolution. Our study uses an in silico model named HeurIstics DirecteD EvolutioN, which is based on a genetic algorithm designed to generate chimeric libraries from 2 parental genes, cry11Aa and cry11Ba, of Bacillus thuringiensis. These genes encode crystal-shaped δ-endotoxins with 3 conserved domains. Cry11 toxins are of biotechnological interest because they have been shown to be effective as biopesticides against disease-spreading vectors. With our heuristic model, we considered experimental parameters such as DNA fragmentation length, number of generations or simulation cycles, and mutation rate, to obtain characteristics of Cry11 chimeric libraries such as percentage of population identity, truncation of variants caused by internal stop codons, percentage of thermodynamic diversity, and stability of variants. Our study allowed us to focus on experimental conditions that may be useful for the design of in vitro and in silico experiments of directed evolution with Cry toxins of 3 conserved domains. Furthermore, we obtained in silico libraries of Cry11 variants, and a review of a sample of these in silico sequences showed structural characteristics of wild Cry families. We consider that future studies could use our in silico libraries and heuristic computational models, such as the one suggested here, to support in vitro experiments of directed evolution.

Keywords: Heuristics, directed molecular evolution, protein engineering, Bacillus thuringiensis

Introduction

Directed evolution methods mimic evolutionary principles at the laboratory level. To this end, genetic diversity is generated, either by inducing mutations or by recombining DNA, and selective pressure is then applied.¹ Through modified polymerase chain reaction (PCR) cycles, directed evolution methods make it possible to obtain chimeric libraries of recombined genes. In this approach, 2 highly homologous parental genes are fragmented and subjected to PCR cycles without primers until recombined genes with lengths close to those of the parental genes are obtained.² From these libraries, enhanced genes are selected and used to obtain a new chimeric library (see Figure 1).

Figure 1. Recombinant DNA techniques of directed evolution.

The first in silico approaches to directed evolution were built on statistical models^3-5; these models were then enriched with intrinsic thermodynamic information of the parental genes^6,7; and later, kinetic information of the reactions was included.⁸ These initial models laid the basis for understanding the optimal experimental conditions that favored the efficiency and diversity of chimeric protein libraries such as triazine hydrolase, dioxygenases, green fluorescent protein, and beta-lactamases.^4,8,9

Subsequent studies explored the potential of heuristic techniques to model directed evolution experiments. They assessed the influence of experimental parameters on the generation of chimeric libraries, recreating the epistasis present in gene sequences through NK landscapes and suggesting favorable experimental conditions for directed evolution experiments. The experimental parameters assessed were the number of cycles, selective pressure, and mutation rate under high- and low-stringency conditions.¹⁰

This study presents a heuristic model based on a genetic algorithm, designed to obtain chimeric libraries of cry11 genes. We selected this group of genes as our biological model because of their high biotechnological potential.^11,12 These genes are found in a spore-forming Gram-positive bacterium, Bacillus thuringiensis,¹³ and encode toxic proteins (δ-endotoxins) that are useful for the biological control of disease-spreading Diptera.¹⁴

The biological model of Cry toxins presents at least 2 relevant characteristics for our study. First, Cry toxins have a structure of 3 conserved domains for the 74 or so groups and more than 290 holotypes described so far, and despite their structural conservation, each reported group has a high specificity in its target organism (http://www.lifesci.sussex.ac.uk/home/Neil_Crickmore/Bt/).

Second, experimental models of directed evolution have been reported for at least 2 Cry holotypes, encoded by the cry1Ca and cry1Ia12 genes. Lassner and Bedbrook¹⁵ reported an increase in the toxicity of the Cry1Ca protein against beet armyworm (Spodoptera exigua) and corn earworm (Helicoverpa zea), while Craveiro et al¹⁶ extended the action spectrum of the Cry1Ia12 toxin to the giant sugarcane borer (Telchin licus licus), for which the toxin produced by the parental gene was not lethal.^15,16 These studies offer an alternative for increasing the biopesticide action of native toxins and for countering resistant insects.^17-19

Our study uses a heuristic model that considers the intrinsic information of the cry11Aa and cry11Ba genes to generate chimeric libraries. With it, we explore the influence of the experimental parameters of directed evolution on the characteristics of the chimeric libraries generated in silico, in terms of Diversity, Identity, Delta Energy, and Sequence Truncation.

Materials and Methods

We implemented software named HeurIstics DirecteD EvolutioN (HIDDEN), written in Python 3, which simulates a recombination technique of directed evolution by means of a genetic algorithm and predicts chimeric libraries from 2 parental genes (http://soft-hidden.com).

HIDDEN takes advantage of the Darwinian basis shared by the evolutionary techniques of artificial intelligence and recombinant DNA techniques: a genetic algorithm reproduces the processes of diversity generation and selective pressure, from which potentially improved genetic variants are obtained. The software is designed to generate libraries from genes with high homology, preferably genes encoding proteins with 3 conserved domains. It can also be used for gene sequences other than cry; as an example, the crossing of Botulinum parental genes is presented (see http://soft-hidden.com/help).

Creation of initial populations

The genetic algorithm generates 2 initial populations, 1 for each parental gene, so that mutated genes are created from the cry11Aa and cry11Ba parental genes until completing the desired number of individuals in the initial populations. The new mutated genes correspond to the genes obtained by crossing the parental genes and performing random mutations. For each cross, 2 parental genes are used, and as a result, a mutated gene is obtained for each of the 2 populations (see Figure 2).

Iterative cycles given by the number of generations

Once the initial populations have been created, the genetic algorithm starts its iterative process given by the desired number of generations. This iterative cycle includes the following actions: assessment of individual fitness, creation of the offspring, and replacement of a percentage of the population.

Assessment of individual fitness

Fitness is assessed by evaluating the energy delta of every gene in each population. The energy delta is calculated by dividing the gene sequence into its possible 2-mers and summing the energy contribution of each 2-mer. No nucleotides of the generated sequence are excluded from the final energy delta, because the DeltaG calculation admits every combination in the genetic code.
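The 2-mer summation described above can be sketched as follows. This is a minimal illustration, not the HIDDEN source: the function name `energy_delta`, the use of overlapping 2-mers, and the energy table are assumptions. The numeric values are placeholders in the range of typical nearest-neighbor free energies (kcal/mol); the text does not state which table HIDDEN uses.

```python
# ASSUMPTION: illustrative free-energy contributions (kcal/mol) for the 16
# possible 2-mers. These are placeholders, not the table used by HIDDEN.
DINUCLEOTIDE_ENERGY = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def energy_delta(sequence):
    """Sum the energy contribution of every overlapping 2-mer in a DNA sequence.

    All 16 dinucleotides have an entry, so no position is skipped.
    A sequence shorter than 2 nt has no 2-mers and yields 0.
    """
    sequence = sequence.upper()
    return sum(
        DINUCLEOTIDE_ENERGY[sequence[i:i + 2]]
        for i in range(len(sequence) - 1)
    )


if __name__ == "__main__":
    # AATT is split into AA, AT, TT
    print(round(energy_delta("AATT"), 2))
```

Because every gene of a given length contributes the same number of 2-mers, longer genes accumulate a more negative total under this scheme, which would be in line with the lower delta energy values reported for the Cry11Ba variants.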

Creation of the offspring

For each individual in the offspring, a parent must be selected from each of the populations. Choosing a parent from a population begins by selecting 2 candidates with the roulette method, according to their fitness values. The 2 candidates then compete, and only the one with the lower penalty is selected as the parent from that population for the cross. The penalization value is calculated as the sum of 3 terms: penalization for mutations (pm), penalization for the size of the open reading frame (ORF) (ps), and penalization for delta value (pd).

The penalization for mutations is equal to 0 if the mutations in domains 1, 2, and 3 of the new gene are not higher than the desired number of mutations for each domain. If the number of mutations is higher than the desired number, then the penalization value is the sum of the additional mutations in each of the 3 domains.

The penalization for the size of the ORF is equal to 10 if its size is smaller than the ORF of the parental gene. Otherwise, the value is equal to 0.

The penalization for delta value is equal to 0 if the delta value of the new gene lies within the desired interval. Otherwise, it is the absolute value of the difference between the delta value of the new gene and the nearest limit of that interval.
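The three penalization terms can be written compactly as below. This is a sketch under stated assumptions: the function names and argument layout are hypothetical, and the distance for pd is measured to the nearest interval limit. Only the rules themselves (excess mutations per domain, a fixed value of 10 for a shortened ORF, and distance to the desired delta interval) come from the text.

```python
ORF_PENALTY = 10  # fixed value given in the text for a shortened ORF


def mutation_penalty(mutations_per_domain, allowed_per_domain):
    """pm: sum of the mutations exceeding the desired number in each domain."""
    return sum(
        max(0, found - allowed)
        for found, allowed in zip(mutations_per_domain, allowed_per_domain)
    )


def orf_penalty(orf_length, parental_orf_length):
    """ps: 10 if the ORF is shorter than the parental ORF, otherwise 0."""
    return ORF_PENALTY if orf_length < parental_orf_length else 0


def delta_penalty(delta, lower, upper):
    """pd: 0 inside [lower, upper], else distance to the nearest limit."""
    if lower <= delta <= upper:
        return 0.0
    return lower - delta if delta < lower else delta - upper


def total_penalty(mutations, allowed, orf_length, parental_orf_length,
                  delta, lower, upper):
    """Penalization value = pm + ps + pd."""
    return (mutation_penalty(mutations, allowed)
            + orf_penalty(orf_length, parental_orf_length)
            + delta_penalty(delta, lower, upper))
```

For instance, a variant with 5 mutations in domain 2 when 4 are allowed, a shortened ORF, and a delta value 10 units below the interval would receive 1 + 10 + 10 = 21.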

In this regard, the contest for choosing a parent from each population must be carried out as many times as new individuals are to be created in each population. Once the 2 parents have been selected, the new children will be created by following the methodology explained in Figure 2.
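The contest for one parent can be sketched as a roulette draw of 2 candidates followed by a comparison of penalties. The names `roulette_pick` and `select_parent` are hypothetical. One assumption needs emphasis: because delta energy values are negative, the sketch uses their absolute value as the roulette weight; the text does not say how HIDDEN maps fitness to selection probability.

```python
import random


def roulette_pick(population, weights, rng):
    """Draw one individual with probability proportional to its weight."""
    return rng.choices(population, weights=weights, k=1)[0]


def select_parent(population, fitness, penalty, rng):
    """Pick 2 roulette candidates and keep the one with the lower penalty.

    fitness and penalty are callables that take an individual.
    ASSUMPTION: |fitness| is used as the roulette weight, since the
    energy delta is negative. At least one weight must be non-zero.
    """
    weights = [abs(fitness(individual)) for individual in population]
    first = roulette_pick(population, weights, rng)
    second = roulette_pick(population, weights, rng)
    return min((first, second), key=penalty)


if __name__ == "__main__":
    rng = random.Random(42)
    genes = ["geneA", "geneB", "geneC"]
    fitness = {"geneA": -2400.0, "geneB": -2410.0, "geneC": -2395.0}
    penalty = {"geneA": 0.0, "geneB": 12.0, "geneC": 3.0}
    print(select_parent(genes, fitness.get, penalty.get, rng))
```

The same candidate can be drawn twice, in which case it wins the contest by default. Calling `select_parent` once per population yields the 2 parents for a cross.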

Replacement of a percentage of the population

Selective pressure is then applied: the part of the population with deficient fitness values is rejected and replaced by individuals of the new population. The number of individuals to be replaced is governed by the population replacement parameter, which aims to ensure that the most recent generation retains the most suitable individuals from the previous generation.

These actions are repeated cyclically over a determined number of generations (see Figure 3).
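The replacement step can be illustrated with a short sketch. The function name `replace_worst`, the `score` callable, and the convention that a lower score is better are assumptions made for the example; the text only states that a fraction of the population with deficient fitness is replaced by new individuals.

```python
def replace_worst(population, offspring, score, replacement_fraction):
    """Replace the worst-scoring fraction of a population with offspring.

    ASSUMPTION: lower score means a more suitable individual.
    The population size is preserved.
    """
    n_replace = min(int(len(population) * replacement_fraction), len(offspring))
    # Keep the best individuals of the previous generation
    survivors = sorted(population, key=score)[:len(population) - n_replace]
    # Fill the freed slots with the best of the new individuals
    newcomers = sorted(offspring, key=score)[:n_replace]
    return survivors + newcomers


if __name__ == "__main__":
    # Toy individuals whose score is their own value
    print(replace_worst([5, 1, 4, 2], [0, 9], score=lambda x: x,
                        replacement_fraction=0.5))
```

A generation then consists of scoring both populations, creating offspring through the parent contests, and calling a step like this one on each population.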

Diversity generation

HIDDEN uses 2 parameters to generate diversity: mutation rate and fragmentation length. The mutation rate is a probability value that governs the allowed rate of DNA changes in the parental genes, and it can be treated as homogeneous or non-homogeneous along the parental gene. In the latter case, HIDDEN takes advantage of intrinsic thermodynamic markers of the parental genes, calculated with the SANAFold software.²⁰ SANAFold characterizes genes by the thermodynamic tendency of their regions to form secondary DNA structures under the conditions of directed evolution experiments.^21,22 The non-homogeneous mutation rate is based on the assumption that the formation of secondary DNA structures disfavors both gene recombination and the appearance of mutations.²⁰
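The difference between the 2 distributions can be sketched with a single mutation routine that accepts either one rate for the whole gene or one rate per position. This is an illustration only: the function name `mutate` is hypothetical, the sketch applies substitutions only, and the per-position rates would in practice have to be derived from SANAFold output, which is not reproduced here.

```python
import random

BASES = "ACGT"


def mutate(sequence, rates, rng):
    """Apply random substitutions to a DNA sequence.

    rates is either a single probability (homogeneous distribution) or a
    list with one probability per position (non-homogeneous distribution).
    ASSUMPTION: only substitutions are modeled in this sketch.
    """
    if isinstance(rates, (int, float)):
        rates = [rates] * len(sequence)
    if len(rates) != len(sequence):
        raise ValueError("one mutation rate per position is required")
    mutated = []
    for base, rate in zip(sequence, rates):
        if rng.random() < rate:
            # Substitute with one of the 3 other bases
            base = rng.choice([b for b in BASES if b != base])
        mutated.append(base)
    return "".join(mutated)


if __name__ == "__main__":
    rng = random.Random(7)
    gene = "ATGAAGCTTGGC"
    print(mutate(gene, 0.05, rng))  # homogeneous
    # Hypothetical profile: first half protected by secondary structure
    profile = [0.0] * 6 + [0.2] * 6
    print(mutate(gene, profile, rng))  # non-homogeneous
```

Under the stated assumption, positions predicted to form secondary structures would receive lower rates than the rest of the gene.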

The fragmentation length is the second parameter that generates diversity, because it regulates the crossing operation between individuals in the genetic algorithm. This parameter corresponds to the number of base pairs (bp) expected in the small pieces of parental DNA that act as substitutes for primers in a DNA shuffling experiment.² It is incorporated into HIDDEN through a Poisson probability model, which indicates to the genetic algorithm the location of the crossing points where any 2 genes of the population are recombined.⁵
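One way to picture the Poisson model is to draw successive fragment lengths from a Poisson distribution whose mean is the fragmentation length and to switch parents at each fragment boundary. This is a hedged sketch, not the HIDDEN implementation: the function names are hypothetical, the parents are assumed to be aligned and are truncated to their common length, and the sampler is Knuth's multiplication method written out so that only the standard library is needed.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def crossover_points(gene_length, fragment_length, rng):
    """Place crossing points so that fragment sizes follow Poisson(fragment_length)."""
    points, position = [], 0
    while True:
        position += max(1, poisson_sample(fragment_length, rng))
        if position >= gene_length:
            return points
        points.append(position)


def recombine(parent_a, parent_b, fragment_length, rng):
    """Build a chimera by alternating parents at each crossing point.

    ASSUMPTION: parents are aligned; the chimera spans their common length.
    """
    length = min(len(parent_a), len(parent_b))
    boundaries = crossover_points(length, fragment_length, rng) + [length]
    parents = (parent_a, parent_b)
    pieces, start = [], 0
    for index, end in enumerate(boundaries):
        pieces.append(parents[index % 2][start:end])
        start = end
    return "".join(pieces)


if __name__ == "__main__":
    rng = random.Random(11)
    # Toy parents make the origin of each fragment visible
    print(recombine("A" * 300, "C" * 300, 75, rng))
```

With a fragmentation length of 75 bp the chimera changes parent roughly twice as often as with 150 bp, which is how this parameter modulates diversity in the sketch.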

Simulation scenarios

Simulation scenarios were designed for conditions of low computational performance,¹⁰ in which each simulation produces a library of up to 100 individuals or chimeric sequences. The DNA sequences used as parental genes belong to the group of Cry11 toxins of Bacillus thuringiensis, and cry11Aa and cry11Ba genes with high homology were selected from this group (see Table 1).

Table 1. Information on cry11 genes used in in silico simulations.

Toxin	Source of extraction	ORF length (AA)	ORF start-stop (bp)	GenBank access ID
Cry11Aa1	Bt israelensis	646	32-1972	M31737-J03510
Cry11Ba1	Bt jegathesan	724	64-2238	X86902

Cry11 15 generations vs 100 generations
15 generations	Delta energy	Truncated proteins	Identity	Diversity	100 generations
Cry11Aa 15 Generations	22/24	1/24	1/24	8/24	Cry11Aa 100 Generations
Cry11Ba 15 Generations	20/24	2/24	2/24	6/24	Cry11Ba 100 Generations

Estimators	Cry11Aa variants, 15 generations	Cry11Aa variants, 100 generations	Cry11Ba variants, 15 generations	Cry11Ba variants, 100 generations
Diversity	0.95 ± 0.03	0.84 ± 0.1	0.96 ± 0.03	0.86 ± 0.09
Identity	0.67 ± 0.12	0.63 ± 0.14	0.62 ± 0.13	0.59 ± 0.12
Truncated proteins	0.92 ± 0.17	0.81 ± 0.34	0.92 ± 0.16	0.80 ± 0.35
Delta energy^a	–2391.33 ± 31.17	–2437.86 ± 30.02	–2675.83 ± 33.48	–2721.06 ± 32.71

Fragmentation length: Cry11 75Bp vs 150Bp
75Bp	Delta energy	Truncated proteins	Identity	Diversity	150Bp
75bp fragmentation length for Cry11Aa	4/24	1/24	3/24	0/24	150bp fragmentation length for Cry11Aa
75bp fragmentation length for Cry11Ba	2/24	2/24	3/24	2/24	150bp fragmentation length for Cry11Ba

Estimators	Cry11Aa variants, FL 75Bp	Cry11Aa variants, FL 150Bp	Cry11Ba variants, FL 75Bp	Cry11Ba variants, FL 150Bp
Diversity	0.88 ± 0.09	0.88 ± 0.07	0.92 ± 0.07	0.91 ± 0.06
Identity	0.67 ± 0.97	0.63 ± 0.13	0.62 ± 0.10	0.59 ± 0.12
Truncated proteins	0.82 ± 0.23	0.91 ± 0.10	0.80 ± 0.20	0.91 ± 0.10
Delta energy^a	–2411.33 ± 37.28	–2417.80 ± 36.65	–2694.32 ± 38.37	–2702.48 ± 38.15

Mutation rate distribution: Cry11 homogeneous vs non-homogeneous
Homogeneous	Delta energy	Truncated proteins	Identity	Diversity	Non-homogeneous
Homogeneous distribution for Cry11Aa	3/24	2/24	1/24	0/24	Non-homogeneous distribution for Cry11Aa
Homogeneous distribution for Cry11Ba	3/24	2/24	2/24	1/24	Non-homogeneous distribution for Cry11Ba

Estimators	Cry11Aa variants, H-MR	Cry11Aa variants, NH-MR	Cry11Ba variants, H-MR	Cry11Ba variants, NH-MR
Diversity	0.89 ± 0.08	0.88 ± 0.11	0.92 ± 0.07	0.90 ± 0.10
Identity	0.65 ± 0.13	0.65 ± 0.13	0.62 ± 0.13	0.59 ± 0.12
Truncated proteins	0.92 ± 0.17	0.81 ± 0.34	0.92 ± 0.16	0.80 ± 0.35
Delta energy^a	–2415.35 ± 40.25	–2413.85 ± 36.52	–2698.99 ± 42.29	–2697.89 ± 37.74

Delta energy for Cry11Aa
	0.001	0.003	0.005	0.01	0.02	0.05
0.001		0	0	1/8	8/8	8/8
0.003	0		0	0	8/8	8/8
0.005	0	0		0	5/8	7/8
0.01	1/8	0	0		1/8	6/8
0.02	8/8	8/8	5/8	1/8		6/8
0.05	8/8	8/8	7/8	6/8	6/8

	Cry11Aa	Cry11Ba	H15pop1-6	H15pop1-25	H15pop1-15	S15pop1-1	S100pop1-1a	S100pop1-1b
Cry11Aa		53.7	80.9	75.5	82.3	85.1	83.1	81.1
Cry11Ba	67.7		61.0	67.1	65.2	64.1	62.4	59.4
H15pop1-6	87.8	71.4		69.8	79.5	78.9	79.3	73.9
H15pop1-25	84.8	76.1	81.4		81.8	83.1	77.7	67.7
H15pop1-15	89.3	75.0	86.5	89.8		91.4	87.1	75.6
S15pop1-1	89.9	74.7	86.8	89.9	95.0		84.0	76.8
S100pop1-1a	89.8	73.1	86.2	87.0	91.3	89.9		75.2
S100pop1-1b	88.5	70.4	82.6	80.6	86.0	85.7	84.8

Variant	DOMAIN I	Total^a	DOMAIN II	Total^a	DOMAIN III	Total^a	Total
H15pop1-6	6.59 SUS	8.53	5.68 SUS	6.73	2.64 SUS	2.64	6.40
	0.78 INS		0.75 INS		0.0 INS
	1.16 DEL		0.30 DEL		0.0 DEL
H15pop1-25	4.52 SUS	5.3	11.21 SUS	11.96	8.92 SUS	9.12	8.57
	0.39 INS		0.45 INS		0.0 INS
	0.39 DEL		0.30 DEL		0.20 DEL
H15pop1-15	4.01 SUS	5.18	9.87 SUS	10.32	1.01 SUS	1.01	5.89
	0.39 INS		0.45 INS		0.0 INS
	0.78 DEL		0.0 DEL		0.0 DEL
S15pop1-1	3.49 SUS	4.27	9.12 SUS	9.72	0.0 SUS	0.0	5.06
	0.39 INS		0.30 INS		0.0 INS
	0.39 DEL		0.30 DEL		0.0 DEL
S100pop1-1a	5.04 SUS	7.10	7.32 SUS	7.77	1.42 SUS	1.82	5.99
	0.90 INS		0.45 INS		0.20 INS
	1.16 DEL		0.0 DEL		0.20 DEL
S100pop1-1b	7.63 SUS	10.73	4.48 SUS	5.08	2.43 SUS	3.25	6.92
	1.55 INS		0.30 INS		0.41 INS
	1.55 DEL		0.30 DEL		0.41 DEL
