Nucleic Acids Research. 2000 Oct 15;28(20):e88. doi: 10.1093/nar/28.20.e88

High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2

Valérie Abécassis 1, Denis Pompon 1, Gilles Truan 1,a
PMCID: PMC110804  PMID: 11024190

Abstract

The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence identity. Construction of highly shuffled libraries of mosaic structures and reduction of parental gene contamination were two major goals. Library characterization involved multiprobe hybridization on DNA macro-arrays. The statistical analysis of randomly selected clones revealed a high proportion of chimeric genes (86%) and a homogeneous representation of the parental contribution among the sequences (55.8 ± 2.5% for parental sequence 1A2). A microtiter plate screening system was designed to achieve colorimetric detection of polycyclic hydrocarbon hydroxylation by transformed yeast cells. Full sequences of five randomly picked and five functionally selected clones were analyzed. Results confirmed the shuffling efficiency and allowed calculation of the average length of sequence exchange and mutation rates. The efficient and statistically representative generation of mosaic structures by this type of family shuffling in a yeast expression system constitutes a novel and promising tool for structure–function studies and tuning enzymatic activities of multicomponent eucaryote complexes involving non-soluble enzymes.

INTRODUCTION

Diversification of protein function is driven by duplication, mutation, recombination and selection (1,2). Combinatorial molecular evolution (CME) mimics, on the laboratory time scale, the different processes involved in natural evolution. Classical approaches use random mutagenesis and recombination by PCR to construct libraries of gene variants (3–6). CME is a powerful approach used for the tuning of protein functions for biotechnology purposes (1,6–12) and for investigation of biochemical mechanisms driving substrate recognition (13) or catalysis (14).

Libraries can be generated by random or segment directed mutagenesis of a single sequence (15) or, alternatively, a group of related genes can be used as starting point (16). Family shuffling proved to accelerate the process of evolution (17) and to facilitate the emergence of new functional properties (15) such as enzymes exhibiting association of parental activities (18,19), more thermostable enzymes (15) or novel substrate specificities (20).

Cytochromes P450 (P450s) can recognize a wide variety of substrates and catalyze an even greater set of reactions. They are found in almost all living organisms (21). In mammals, P450s are involved in biosynthetic reactions like the formation of steroidogenic hormones but also have a predominant role in drug and pollutant metabolism and toxicity (21–23). Human CYP1A1 and CYP1A2 share 71% identity at the amino acid level and have distinct but overlapping substrate specificities. They are among the most active in the metabolic activation of chemical carcinogens (24) and are implicated in human lung cancer for CYP1A1 (25) or food-derived promutagen activation and aflatoxin B1 induced liver cancers for CYP1A2 (26). Mammalian P450 functional diversity makes these enzymes particularly suitable for CME approaches to the design of new catalysts as well as for structure–function analysis (27,28).

One difficulty encountered with the CME approach is the strong tendency for reconstitution of parental structures by PCR-based reassembly methods. A low content of mosaic structures was thus frequently reported in libraries constructed using DNase I fragmentation (29–32). To decrease the contamination by parental structures, different techniques were developed including single-strand DNA shuffling (30) or restriction enzyme driven fragmentations (29,33). Some groups have used in vivo recombination to yield low complexity chimeric enzymes (31,34–36).

The procedure (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) presented in this paper takes advantage of the association between in vitro (3,4,16) and in vivo (37,38) recombination mechanisms to build a high complexity library containing low levels of parental structures. This library was built into yeast expression vectors (39) using an engineered Saccharomyces cerevisiae strain to optimize the environment (40,41) leading to efficient in vivo bioconversions. This paper presents, also for the first time, a detailed statistical analysis of the generated mosaic structures as well as functional screening tools well designed for molecular evolution purposes.

MATERIALS AND METHODS

Strains and transformation procedures

Saccharomyces cerevisiae strain W303-1B, also designated as W(N) (Mat a; ade2-1; his3, leu2, ura3, trp1, canR, cyr+), and W(R) derived from W(N) by inserting the strong inducible promoter GAL10-CYC1 in front of the open reading frame (ORF) of the P450 reductase gene, were described previously (40). The Escherichia coli strain used was DH5-1 (F, recA1, gyrA96, thi-1, hisR17, supE44, λ). Expression vectors p1A1/V60 (42) and p1A2/V60 (43) were constructed by insertion of human CYP1A1 and CYP1A2 ORFs between the BamHI/KpnI and BamHI/EcoRI sites of pYeDP60, respectively. Both expression vectors also contain URA3 and ADE2 as selection markers and place the P450 ORFs under the control of the GAL10-CYC1 promoter and PGK terminator (39). All media used were described previously (40,42).

Electrocompetent E.coli DH5-1 bacterial cells were prepared as described (44), and cells were transformed by electroporation using the manufacturer’s (Bio-Rad) protocol. For yeast transformation, overnight pre-cultures in 5 ml YPGA [1% (w/v) yeast extract, 2% (w/v) bacto-peptone, 2% (w/v) glucose, 0.002% (w/v) adenine] [for W(N) strain] or YPLA [1% (w/v) yeast extract, 2% (w/v) bacto-peptone, 2% (w/v) galactose, 0.002% (w/v) adenine] [for W(R) strain] were diluted in 50 ml of YPGA medium to a final density of 2 × 10⁶ cells/ml. After 6 h, cells were washed twice with sterile water and once with a TE–lithium acetate buffer (10 mM Tris–HCl pH 7.5, 1 mM EDTA, 100 mM lithium acetate). Cells were then resuspended in 1 ml TE–lithium acetate buffer. Transforming DNA was added to 50 µl of cell solution, with 50 µg of sonicated and heat-denatured salmon sperm DNA and 350 µl of 40% (w/v) PEG 4000 solution in water. This solution was incubated at 30°C for 30 min and at 42°C for 45 min. After centrifugation, the supernatant was removed and the cells were resuspended in 200 µl of 0.1 M NaCl solution and plated for selection on SWA6 plates as described previously (39,42).

Extraction of plasmidic DNA from yeast

Yeast colonies were suspended in 1 ml of a buffer containing 2% (v/v) Triton X-100, 50 mM Tris–HCl pH 8.0, 50 mM EDTA and 200 mM NaCl. One gram of glass beads (Braun Scientifics, diameter 0.45 mm) was added and the solution was vigorously vortexed for 2 min. Three hundred microliters of phenol/chloroform/isoamyl alcohol (50:49:1 by volume) were added, and the DNA was recovered after ethanol precipitation and suspended in 50 µl of deionized water.

Sequences

Sequences were obtained either by ESGS (ESGS, group Cybergene, Evry, France) or by using the ABI Bio Dye labeling kit and an ABI 310 sequencer following the manufacturer’s protocol (Perkin Elmer).

Improved PCR based DNA shuffling

The procedure used was derived from the method described by Stemmer (3,4,16). Random fragmentation with DNase I (Grade II, Sigma-Aldrich) in the presence of Mn²⁺ was performed with the modifications described by Lorimer and Pastan (45) and Zhao and Arnold (46). An aliquot of 2.5 µg of each plasmidic DNA (p1A1/V60 and p1A2/V60) was suspended separately in a buffer containing 50 mM Tris–HCl pH 7.4 and 10 mM MnCl₂ to a final volume of 40 µl. DNase I was added at three different concentrations (0.0112 U, 0.0056 U and 0.0028 U). The digestion was performed at 20°C for 10 min and was terminated by heating to 90°C for 10 min. Fragments obtained were purified on a Centrisep column (Princeton Separation Inc., Philadelphia, NJ). For the reassembly reaction, purified fragments (10 µl of each plasmid fragment) were amplified in a 40 µl PCR reaction using 2.5 U of Taq polymerase (Stratagene) in the supplied buffer. The PCR program was: one cycle of denaturation at 96°C for 1.5 min; 35 cycles of a 30 s denaturation step at 94°C, nine hybridization steps separated by 3°C from 65 to 41°C for 1.5 min each and an elongation step of 1.5 min at 72°C; and finally a 7 min step at 72°C. The second amplification reaction was carried out with a 5′-primer located in the GAL10-CYC1 promoter region (5′-CGTGTATATAGCGTGGATGGCCAG-3′) and a 3′-primer located in the PGK terminator region (5′-GCACCACCACCAGTAG-3′). In this second step 1 µl of the reassembled DNA was amplified in a 100 µl PCR reaction using 1 U of Taq polymerase (Stratagene) in the supplied buffer. The PCR program was: one cycle of denaturation at 94°C for 1.5 min; 30 cycles of a 30 s denaturation step at 94°C, a hybridization step at 56°C for 1.5 min and an elongation step of 1.5 min at 72°C; and finally a 7 min step at 72°C.
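As a quick check of the reassembly program described above, the following Python sketch lists the hybridization temperatures and totals the programmed hold time. The function names are illustrative, and ramp times between steps are ignored (an assumption), so an actual run would take longer.

```python
# Sketch of the 'progressive hybridization' reassembly program.
# Names are illustrative; ramp times between steps are ignored (assumption).

def annealing_ladder(start_c=65, stop_c=41, step_c=3):
    """Return hybridization temperatures from start_c down to stop_c inclusive."""
    return list(range(start_c, stop_c - 1, -step_c))


def programmed_minutes(cycles=35):
    """Total programmed hold time in minutes, excluding ramps."""
    ladder = annealing_ladder()
    # 30 s denaturation + 1.5 min per hybridization step + 1.5 min elongation
    per_cycle = 0.5 + 1.5 * len(ladder) + 1.5
    # initial 1.5 min denaturation + all cycles + final 7 min step
    return 1.5 + cycles * per_cycle + 7.0


if __name__ == "__main__":
    print(annealing_ladder())
    print(programmed_minutes())
```

Under these assumptions the ladder contains nine steps (65, 62, ..., 41°C) and the programmed time is 551 min, a little over 9 h.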

Library construction and characterization

The PCR amplification products were purified by agarose gel electrophoresis. DNAs were inserted in pYeDP60 using in vivo recombination (gap-repair) in yeast (37,38,43,47,48). Co-transformation of W303–1B yeast strain with ∼200–400 ng of the PCR product (insert) and 0.025 µg of pYeDP60 previously linearized using EcoRI and BamHI restriction enzymes was performed. Plasmidic DNA extracted from yeast was segregated in E.coli. Three-hundred and seventy-eight wells of a 384-well plate were inoculated with independent bacterial colonies randomly selected from the library, three wells with DH5-1 strain previously transformed with p1A1/V60 and the remaining three wells by DH5-1 strain transformed by p1A2/V60. After 24 h growth in TB medium (44) containing 100 µg/ml ampicillin, the 384-well plate was replicated onto six Nylon N+ (Amersham) membranes. Each filter was placed onto a plate containing LB solid medium (containing 100 µg/ml ampicillin). After 12 h growth, lysis of bacterial colonies, DNA fixation and denaturation, filter prehybridization was performed following the manufacturer’s protocols (Amersham).

Six oligonucleotide probes were chosen (Fig. 2) with three sequences belonging to CYP1A1 and three to CYP1A2. An aliquot of 11 pmol of each was incubated for 2 h at room temperature with 3.2 pmol of [γ-³²P]ATP, 20 U of polynucleotide kinase and 18 µl of polynucleotide kinase buffer following the manufacturer’s conditions (New England Biolabs). The six different probes were added to six prehybridized filters and placed at 42°C. Filters were washed at room temperature in 2× SSPE containing 0.1% SDS for 10 min and autoradiographed for 3 h. Each probe was labeled a second time and hybridized to a different filter to ascertain that results were reproducible.

Figure 2.


Respective positions and sequences of the six probes used to characterize matrices. Numbers above or below correspond to the 5′ position of each probe on the sequence. Probes above and below matched the 1A1 and the 1A2 sequences respectively. The vertical bars in the middle rectangle represent all the mismatched positions between the 1A1 and the 1A2 sequences at the nucleotide level.

Functional selection of catalytically competent clones

Bacterial colonies were grown for 24 h in 96-well plates. DNA extraction was performed with a minipreparation protocol using Multiscreen 96-well filter plates (Millipore). Each DNA was used to transform W(R) yeast strain in 96-well plates by the lithium acetate procedure and cells were selected onto SWA6 plates. After 3 days growth, an aliquot of each colony was grown in 1 ml of SWA5 medium in a 96 Deep Well plate (ABGene) for 15 h. The medium was discarded and replaced by 1 ml of YPLA medium containing 1.6 mM naphthalene (Merck). For each individual culture, the medium was placed in a Multiscreen 96-well plate (MABV N12, Millipore) containing 90 µl of octadecyl functionalized silica gel (Aldrich). After vacuum filtration of the culture medium, the substrate and the products of the reaction were bound on the silica. The resin was washed twice with water and the metabolites were eluted with 50 µl of isopropanol. After addition of 20 µl of a 2 mg/ml diazo-Blue-B solution (Fluka) the red color of the dye generated by the coupling between the diazo precursor and the phenolic metabolites extracted from the media was observed.

Statistical analysis

For each probe a grid representing the hybridization intensity for the 384 clone replicas was constructed. Hybridization intensities were analyzed by visual inspection taking into account the local background. Spots significantly exceeding the average signal value for negative spots were considered as positive, even if of lower intensity than the most positive spots. Intermediate responses were expected to arise either from partial probe mismatch due to PCR induced mutations or to lower efficiency of DNA transfer on the filter. Ambiguous results were confirmed by further analysis involving a different filter but the same probe. The six 384 grids were entered into a Microsoft Excel spreadsheet and a statistical analysis was performed with homemade macros written in Microsoft Visual Basic. The program first converts hybridization signals into parental type data by suitable masking with a Boolean XOR function before statistical analysis.
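The XOR masking step can be illustrated with a minimal Python sketch. The original macros were written in Visual Basic and are not reproduced here, and the text does not state which probe positions carry 1A1-specific probes, so the mask below is purely hypothetical.

```python
# Sketch: convert hybridization calls into parental-type bits by XOR masking.
# ASSUMPTION: the assignment of probes to parents is not given in the text;
# the mask below (P1, P3, P5 from 1A1; P2, P4, P6 from 1A2) is illustrative.

PROBE_IS_1A1 = [1, 0, 1, 0, 1, 0]  # hypothetical mask, one bit per probe P1..P6


def parental_types(signals, mask=PROBE_IS_1A1):
    """Map hybridization calls (1 = positive spot, 0 = negative) to parental types.

    Returns one bit per probed segment: 0 = 1A1 type, 1 = 1A2 type.
    A positive 1A2 probe or a negative 1A1 probe both yield 1 (1A2 type).
    """
    return [s ^ m for s, m in zip(signals, mask)]


if __name__ == "__main__":
    # A 1A1 control lights up only the 1A1-specific probes.
    print(parental_types([1, 0, 1, 0, 1, 0]))
    # A 1A2 control lights up only the 1A2-specific probes.
    print(parental_types([0, 1, 0, 1, 0, 1]))
```

With a single probe per position, a negative spot is read as the other parental type, which is why intermediate or failed hybridizations need the confirmation step described above.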

Numerical simulations were performed using a random number generator and probability formatting routines (source code available on request). The program can be adjusted to simulate any biased situation for the probability of observing a 1A1 or 1A2 signal at each probe including various situations in which a cross-correlation occurred between adjacent or distant probes. A first set of parameters allowed us to modulate the relative probability of finding one of the two parental types at each probe position. A second set of parameters was designed to allow introduction of more or less important genetic linkage between two (or more) probed segments.
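A simulation of this kind might look like the Python sketch below. It is not the authors' program: the function names are illustrative, and linkage is modeled, following the definition given in the legend of Figure 4, as the probability that a segment simply copies the parental type of the preceding segment rather than being drawn independently.

```python
import random

# Sketch of a biased, linked draw of parental types for six probed segments.
# ASSUMPTION: linkage L between consecutive segments means "copy the previous
# type with probability L, otherwise draw independently with bias p_1a2".


def simulate_clone(p_1a2, linkages, rng):
    """Draw parental-type bits (0 = 1A1, 1 = 1A2) for one clone."""
    bits = [1 if rng.random() < p_1a2 else 0]
    for link in linkages:
        if rng.random() < link:
            bits.append(bits[-1])  # linked: same type as previous segment
        else:
            bits.append(1 if rng.random() < p_1a2 else 0)  # independent draw
    return bits


def simulate_grid(n_clones=378, p_1a2=0.557, linkages=(0.0,) * 5, seed=None):
    """Simulate a grid of clones; seed=None gives a fresh random initialization."""
    rng = random.Random(seed)
    return [simulate_clone(p_1a2, linkages, rng) for _ in range(n_clones)]


def parental_fractions(grid):
    """Return (fraction all-1A1, fraction all-1A2) over the six probes."""
    n = len(grid)
    all_1a1 = sum(1 for clone in grid if not any(clone)) / n
    all_1a2 = sum(1 for clone in grid if all(clone)) / n
    return all_1a1, all_1a2


if __name__ == "__main__":
    grid = simulate_grid(linkages=(0.1, 0.6, 0.85, 0.1, 0.1))
    print(parental_fractions(grid))
```

Repeating `simulate_grid` with different seeds and averaging `parental_fractions` gives an estimate of the statistical fluctuation expected for a grid of 378 clones.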

The statistical analysis program was carefully checked using the simulation program to generate grids with various bias situations. In all tests (data not shown), results of the statistical analysis were found to be coherent within statistical errors with the parameters injected in the simulation program. Coupling between the simulation and the analysis program was also used for the calculation of the expected statistical fluctuations on data by analysis of 10 repeats of simulation/analysis cycles for each parameter set. The random number generator was randomly initialized between each set of simulations to make them independent.

RESULTS

Construction of yeast expression libraries by family shuffling

Yeast is a particularly well adapted host for heterologous expression of membrane proteins including multi-component complexes. In the case of membrane-bound eucaryotic P450s, a self-sufficient system for functional expression was built by association of the W(R) strain engineered at the genomic level to overexpress NADPH-P450 reductase and a multicopy plasmid carrying the P450 expression cassette (39,40). We designed a strategy for family shuffling in yeast expression vectors taking advantage of the unique properties of homologous recombination of this host. The basic system, described in Figure 1, associated a redesigned PCR-based DNA shuffling step with a secondary shuffling step obtained by in vivo recombination in yeast. The latter step was also used as a cloning tool. Overall, this constituted a shuffling strategy allowing direct expression and functional selection in a eucaryotic cell without the need for intermediate cloning steps in E.coli.

Figure 1.


Principle of the library construction. Plasmidic DNA was subjected to DNase I digestion (see Materials and Methods) and fragments were separated on a 1% agarose gel. (A) Lane 1, DNA ladder (λ DNA digested by PstI); lanes 2–4 and 5–7 correspond to DNase I treated p1A1/V60 and p1A2/V60 respectively. Lanes 2 and 5 correspond to fragmentation with 0.0112 U, lanes 3 and 6 to 0.0056 U and lanes 4 and 7 to 0.0028 U of DNase I per µg of DNA. (B) Reassembly reaction. Lane 1, DNA ladder; lanes 2–4 correspond to reassembly reactions between fragmented p1A1/V60 and p1A2/V60 mixing the reactions from lanes 2 and 5, 3 and 6, 4 and 7 respectively. (C) Amplification reaction. Lane 1, DNA ladder; lanes 2–4 correspond to the amplification with full-length plasmid pYeDP60, p1A1/V60 and p1A2/V60, respectively; lanes 5–7 correspond to the amplification with previously reassembled DNA as a matrix (lanes B2, B3 and B4 respectively). The band presented in (C) lane 6 was purified and used as such to cotransform S.cerevisiae with linearized pYeDP60.

The first step (Fig. 1) involved DNase I catalyzed double strand breaks of the entire expression vector leading to low size DNA fragments (Fig. 1A). Fragments from p1A1/V60 and p1A2/V60 (Fig. 1A, lanes 2 and 5; 3 and 6; 4 and 7) were mixed in equal proportion and submitted to a ‘progressive hybridization’ PCR program (see Materials and Methods) involving nine hybridization steps from 65 to 41°C to force low homology recombination. As seen in Figure 1B, a large smear of high molecular weight DNA was formed from fragments obtained with either one of the three DNase I concentrations. This raw material was found to exhibit some yeast transforming properties directly due to reconstitution of a fully functional large-scale (11 kb) yeast vector. A second PCR step, involving primers located on the cDNA flanking regions, was performed (Fig. 1C, lanes 5–7). Three bands were amplified from reassociated DNA fragments of lower molecular sizes (Fig. 1C, lane 5) and only the largest corresponded to the expected sizes for the cDNA amplifications (control amplifications on parental vectors in lanes 3 and 4). On the contrary, reassociation of larger molecular size DNA fragments led to the amplification of a single product of the expected size. However, this product was not used for library construction because of the potential contamination by parental structures coming from non-digested parental vectors (Fig. 1A, lanes 4 and 7). Finally, when starting from middle range sized fragments (Fig. 1B, lane 3), amplification of a well-defined band around 1.9 kb was observed (Fig. 1C, lane 6). This PCR product was mixed with pYeDP60 linearized at the expression site and used to cotransform yeast, promoting in vivo recombination events between PCR products and cloning into the yeast vector. The selection of transformed cells for uracil prototrophy was based only on the recircularization of the vector following one or multiple recombination events. 
Typical experiments generated ∼10 000 clones. DNA extraction from a single yeast clone, followed by plasmid segregation and amplification in E.coli indicated that yeast clones contained multiple plasmid variants as judged by the heterogeneity of the cDNAs. Therefore, the real complexity of the yeast library widely exceeded the number of yeast colonies and probably ranged between 25 000 and 100 000 mosaic structures.

Statistical analysis of a sub-population of the library

Plasmidic DNA was prepared from the whole yeast library and used to transform E.coli. This step allowed segregation of individual plasmids initially present as a mixed population in yeast colonies. A matrix was built on a 384-well microtiter plate for primary structure analysis. The matrix included 378 randomly selected E.coli clones from the library, and the remaining wells included control plasmids (either p1A1/V60 or p1A2/V60). The six probes (Fig. 2) were chosen to match alternatively the two parental sequences in regions of low sequence similarity: three probes belonged to p1A1/V60 and three probes to p1A2/V60. Each probe was ³²P-labeled and used for hybridization under stringent conditions. The experiment was repeated using different filter–probe associations to eliminate potential artifacts. Analysis of hybridization intensities was performed by visual inspection. Intermediate levels of hybridization (∼15% of spots) were considered as positive responses and likely to correspond to partially mismatched probes either due to PCR-induced mutations as suggested by sequencing data (see later) or to unequal DNA transfer efficiencies on filters.

Figure 3 presents the global hybridization pattern analysis for the six probes. The calculated frequency of parental probe pattern in the library (Fig. 3A, red squares) was 11.4% for 1A2 and 2.4% for 1A1. The sum of the two frequencies (13.8%) was higher than a theoretical value of 3.1% [(0.5)⁶ + (0.5)⁶] for a totally random reassociation of parental sequences. False color coding of mosaic structures (Fig. 3B) illustrated the excess of 1A2 parental clone (in white) over the 1A1 clone (in black) but suggested that the overall distribution of mosaic structures was homogeneous. To gain more insight, a statistical analysis was performed using home-designed Excel sheets and Visual Basic routines. The probability of presence of each parental sequence at each of the six probed positions was calculated (Table 1). This frequency was found to be quite homogeneous (0.56 ± 0.02 for the 1A2 sequence) for all analyzed sequence segments and stood within the expected statistical error. The slight excess in the P450 1A2 contribution probably reflected the error in the evaluation of parental DNA amounts upon mixing of the DNase I treated fragments before the PCR reassembly. The theoretical proportion of parental probe pattern taking into account the calculated parental contributions was calculated to be 3.7% (0.58⁶ + 0.42⁶). Nonetheless, the latter value was still insufficient to account for the discrepancy with the observed frequency of parental structures (13.8%).
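The expected frequency of a fully parental six-probe pattern under independent segment assortment can be checked with a few lines of Python. This is a sketch with an illustrative function name; it uses 0.557, the 1A2 probability quoted for the simulations, as the biased case.

```python
# Sketch: probability that all six probed segments share one parental type
# when segments assort independently (no linkage).

def parental_pattern_fraction(p_1a2, n_probes=6):
    """Return P(all probes 1A2) + P(all probes 1A1) for independent segments."""
    return p_1a2 ** n_probes + (1.0 - p_1a2) ** n_probes


if __name__ == "__main__":
    for p in (0.5, 0.557):
        print(p, round(100 * parental_pattern_fraction(p), 1))
```

An unbiased mix gives 3.1% and a 1A2 probability of 0.557 gives about 3.7%, both far below the 13.8% observed, which is why bias in parental contribution alone cannot explain the excess of parental patterns.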

Figure 3.


Hybridization results were computed in Microsoft Excel generating a 384 dot grid with the following color coding: (A) red squares represent a parental type (1A1 or 1A2) for the six probed positions and green squares represent mosaic structures; (B) the same experimental grid was false color coded using a RGB code of (15,15,15) + P1* (120,0,0) + P2* (60,60,0) + P3* (0,120,0) + P4* (0,60,60) + P5* (0,0,120) + P6* (60,0,60). Pn values are 0 for 1A1 and 1 for 1A2 sequence types.
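The false color rule in panel (B) is a weighted sum of per-probe colors. The Python sketch below reproduces it with illustrative names; the original was computed in Excel.

```python
# Sketch of the false color coding of Figure 3B.
# bits[n] is 0 for a 1A1-type segment and 1 for a 1A2-type segment (P1..P6).

BASE = (15, 15, 15)
WEIGHTS = [
    (120, 0, 0),   # P1
    (60, 60, 0),   # P2
    (0, 120, 0),   # P3
    (0, 60, 60),   # P4
    (0, 0, 120),   # P5
    (60, 0, 60),   # P6
]


def false_color(bits):
    """Return the (R, G, B) triple for a six-bit parental-type pattern."""
    r, g, b = BASE
    for p, (wr, wg, wb) in zip(bits, WEIGHTS):
        r += p * wr
        g += p * wg
        b += p * wb
    return (r, g, b)


if __name__ == "__main__":
    print(false_color([0] * 6))  # full 1A1 pattern
    print(false_color([1] * 6))  # full 1A2 pattern
```

With these weights a full 1A1 pattern maps to (15, 15, 15), which is nearly black, and a full 1A2 pattern maps to (255, 255, 255), which is white, so parental clones stand out from the colored mosaics.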

Table 1. Frequency of mosaic sequence parts belonging to each parental type at probed positions.

Probe          Frequency of 1A1 type   Frequency of 1A2 type
P1             0.48                    0.52
P2             0.43                    0.57
P3             0.45                    0.55
P4             0.45                    0.55
P5             0.44                    0.56
P6             0.41                    0.59
Average ± SD   0.43 ± 0.02             0.56 ± 0.02

P1–P6 probes are centered on P450 1A1 sequence position: 3, 612, 683, 879, 1377 and 1513 respectively. For each probe the number of hybridization spots matching either the 1A1 or 1A2 response was calculated and divided by the total number of spots (378) tested.

A cumulative frequency curve for the probability of observation of the 64 detectable classes of chimeras was calculated (Fig. 4). Mosaic structures were arbitrarily encoded using a binary code associating the nature (1A1 or 1A2) of each segment to value 0 or 1 of bits 1 to 6 (segment index) respectively. Thus, the full 1A1 and 1A2 parental sequences correspond respectively to codes 0 and 63. The experimental curve (Fig. 4, open circles) exhibited an irregular aspect including five steps. Three theoretical curves were also calculated as described in Materials and Methods by Monte-Carlo like approaches (numeric simulations) using three different hypotheses: (i) an equal probability of the presence of 1A1 or 1A2 sequence types and the absence of linkage between the parental type of sequence segments; (ii) the same as hypothesis (i) except for a 0.557 (instead of 0.5) probability of the presence of 1A2 sequence type; (iii) the same as hypothesis (ii) except that a variable linkage, simulating imperfect shuffling, was introduced between the type of consecutive sequence segments. The cumulated frequency curves corresponding to cases (i) and (ii) were linear and curved respectively (Fig. 4). In these two cases the slightly irregular aspect was found to be related only to statistical fluctuations in the simulation. Interestingly, the curve for case (ii) reproduced the general shape of the experimental curve except for the presence in the latter part of marked steps widely exceeding statistical fluctuations. Several curves corresponding to case (iii) were generated with different linkage schemes and adjusted by trial and error assays. Interestingly, probabilities of parental type linkages of 0.1, 0.6, 0.85, 0.1, 0.1 between probed segments 1–2, 2–3, 3–4, 4–5 and 5–6, respectively, gave rise to a suitable fitting of the experimental trace. 
Although the solution might not be unique, this result suggested that the probability of shuffling between probed segments was rather variable and depended on the part of the sequence considered even if the weight of each parental type was homogeneous along the mosaic sequences.
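The binary encoding of mosaic classes and the cumulative frequency curve can be sketched in Python as follows. The names are illustrative, and the input is assumed to be a list of six-bit parental-type patterns (0 = 1A1, 1 = 1A2, ordered P1 to P6).

```python
from collections import Counter

# Sketch: encode each six-probe pattern as an integer 0..63 and accumulate
# class frequencies in code order, as done for the curves of Figure 4.


def mosaic_code(bits):
    """N = P1 + 2*P2 + 4*P3 + 8*P4 + 16*P5 + 32*P6."""
    return sum(b << i for i, b in enumerate(bits))


def cumulative_frequencies(grid):
    """Return the 64-point cumulative frequency curve for a list of patterns."""
    counts = Counter(mosaic_code(clone) for clone in grid)
    n = len(grid)
    running, curve = 0, []
    for code in range(64):
        running += counts.get(code, 0)
        curve.append(running / n)
    return curve


if __name__ == "__main__":
    demo = [[0] * 6, [1] * 6, [1, 0, 0, 0, 0, 1]]
    print([mosaic_code(c) for c in demo])
```

A perfectly uniform population yields a straight line, whereas over-represented classes appear as vertical jumps separated by flat stretches, which is the stepped aspect described for the experimental curve.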

Figure 4.


Experimental and theoretical cumulative frequencies for the observation of 64 types of mosaic structures. Horizontal axis corresponds to mosaic structure coding using N = P1 + 2*P2 + 4*P3 + 8*P4 + 16*P5 + 32*P6, where P1–P6 have the value 0 or 1 depending on whether the probed segment matched the 1A1 or 1A2 sequence respectively. Open circles represent the experimental curve deduced from the hybridization status of the 384 grid probed with the six oligonucleotide probes (see Materials and Methods). The dashed line is a theoretical trace when considering a homogeneous proportion (0.50:0.50) for the 1A2 and 1A1 parental sequence contribution and a perfect shuffling (absence of parental type linkage between probed segments). The full line represents the same trace for a 56:44 1A2 versus 1A1 proportion. Closed circles represent the theoretical trace obtained by simulation assuming a homogeneous proportion of 0.56:0.44 for the 1A2 and 1A1 parental sequence contribution and a probability of parental type linkages of 0.1, 0.6, 0.85, 0.1, 0.1 between probed segments 1–2, 2–3, 3–4, 4–5 and 5–6 respectively. Linkage is defined as follows: 0 corresponds to full independence (random drawing of parental types for each segment) and 1 to complete linkage (for example for the X–Y couple: parental type of the segment Y was always of the same nature as the parental type of segment X). Intermediate linkages were calculated as a weighted average.

Expected frequencies of parental type patterns (all 1A1 or 1A2) in the population were simulated again after incorporation of the fitted probabilities of genetic linkage into the model. Ten simulations were averaged, allowing us to calculate expected parental structure frequencies of 9.8 ± 1.4% (all 1A2), 4.1 ± 1.09% (all 1A1) and 13.9 ± 1.3% (total parental). These values closely matched the observed values of 11.4% (all 1A2), 2.4% (all 1A1) and 13.8% (total parental). Hence, the heterogeneity in the probability of shuffling between probed segments can perfectly account for the apparent excess of parental probed patterns in the population.
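The simulated frequencies can be cross-checked with a closed-form calculation. The Python sketch below is not the authors' method: it assumes that a linkage L means the next segment copies the previous type with probability L and is otherwise drawn independently, which is one reading of the weighted-average definition in the legend of Figure 4.

```python
# Sketch: closed-form probability of an all-parental six-probe pattern
# under a first-order linkage model (ASSUMPTION, see lead-in).

FITTED_LINKAGES = (0.1, 0.6, 0.85, 0.1, 0.1)  # segments 1-2, 2-3, 3-4, 4-5, 5-6


def all_same_probability(p_type, linkages=FITTED_LINKAGES):
    """P(every probed segment is of the parental type whose frequency is p_type)."""
    prob = p_type  # first segment
    for link in linkages:
        # next segment matches by copying (link) or by an independent draw
        prob *= link + (1.0 - link) * p_type
    return prob


if __name__ == "__main__":
    p_1a2 = all_same_probability(0.56)
    p_1a1 = all_same_probability(0.44)
    print(round(100 * p_1a2, 1), round(100 * p_1a1, 1), round(100 * (p_1a2 + p_1a1), 1))
```

Under this model the expected values are about 9.5% (all 1A2) and 3.8% (all 1A1), roughly 13.3% in total, which lies within the fluctuations reported for the ten averaged simulations.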

For further characterization of linkages between probed sequences, the frequencies of parental versus shuffled probe associations were calculated for 11 combinations of probes (Fig. 5). Linkages between adjacent probed segments (Fig. 5A) fell into two classes: P1–P2, P4–P5 and P5–P6 linkages had the expected values for a random segment reassociation. In contrast, P2–P3 and P3–P4 linkage patterns displayed some increase in the frequency of parental type coincidence and suggested a less efficient shuffling between these segments. This less efficient shuffling was expected from the limited sequence similarity between these segments (Fig. 2). Figure 5B and C show similar calculations for non-adjacent segments. Surprisingly, a very strong linkage between P2 and P4 segment types was evident in comparison to the limited linkage observed between segments P2–P3 and P3–P4. All other associations (including those not shown) presented expected patterns for random shuffling. Overall, these results were consistent with linkages deduced from the simulation of the horizontal steps in the experimental trace of Figure 4. The particularly high P2–P4 linkage compared to P2–P3 and P3–P4 linkages was nevertheless unexpected considering the physical map of the probed sequence segments.
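The pairwise association frequencies can be computed as in the following Python sketch, which also returns the values expected if the two probes were independent. Function names and the tiny demonstration grid are illustrative and are not data from the study.

```python
# Sketch: observed vs expected parental-type associations for a probe pair.
# Each clone is a list of six bits (0 = 1A1 type, 1 = 1A2 type).

NAMES = ("1A1", "1A2")


def pair_association(grid, i, j):
    """Observed fraction of each parental-type combination at probes i and j (0-based)."""
    counts = {"1A1-1A1": 0, "1A2-1A2": 0, "1A1-1A2": 0, "1A2-1A1": 0}
    for clone in grid:
        counts[f"{NAMES[clone[i]]}-{NAMES[clone[j]]}"] += 1
    return {key: value / len(grid) for key, value in counts.items()}


def expected_if_independent(grid, i, j):
    """Fractions expected from the marginal frequencies alone (no linkage)."""
    n = len(grid)
    f_i = sum(clone[i] for clone in grid) / n  # 1A2 frequency at probe i
    f_j = sum(clone[j] for clone in grid) / n  # 1A2 frequency at probe j
    return {
        "1A1-1A1": (1 - f_i) * (1 - f_j),
        "1A2-1A2": f_i * f_j,
        "1A1-1A2": (1 - f_i) * f_j,
        "1A2-1A1": f_i * (1 - f_j),
    }


if __name__ == "__main__":
    demo = [[0] * 6, [1] * 6, [0, 1, 0, 1, 0, 1], [1, 1, 0, 0, 1, 1]]
    print(pair_association(demo, 0, 1))
    print(expected_if_independent(demo, 0, 1))
```

An excess of the single-parent combinations over their independent expectation indicates linkage between the two probed segments.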

Figure 5.


Representation of the probability of parental type association for each probed segment couple. (A) Association between two adjacent probed segments; (B) association between probed segments separated by one probe; (C) association between distant probed segments. Dark blue and pink bars represent single parent association frequencies, while cyan and yellow bars represent mixed parent association frequencies.

Selection of catalytically competent clones

A major advantage of the developed shuffling strategy was that direct construction of expression libraries in a eucaryotic micro-organism allowed functional in vivo selection, including membrane or multicomponent complexes. Transformed yeast clones derived from the shuffling step were used as such for functional screening. The use of the primary library offered the advantage that each clone contained multiple mosaic plasmids, which dramatically improved the library complexity, a particularly useful feature when associated with an in vivo colorimetric detection of activity in microtiter plates.

A new universal colorimetric detection method for polycyclic aromatic hydrocarbon hydroxylation was designed, based on dye formation following in vivo bioconversion by transformed yeast cells within 96-well microtiter plate cultures. Cells were cultivated in a 96-well plate containing rich media and substrate. Phenol derivatives were extracted from the media by hydrophobic binding interactions (C18 linked silica gel) directly on the plates. Coupling with diazo-fast dye precursors was performed, allowing direct colorimetric detection (Fig. 6). The screening of the 1A1–1A2 mosaic library was achieved using naphthalene, a good substrate for both parental enzymes, and indicated that ∼20% of clones expressed a detectable activity. To determine the real proportion of functional structures, the initial plasmid library in yeast was transferred to E.coli and 96 segregated clones were used to retransform yeast in microtiter plates. The proportion of functional clones (11 clones out of 93) detected with the colorimetric test (11.8%) was confirmed by HPLC analysis of the extracts. Comparison of HPLC and colorimetric detection indicated that the latter was reliable and sensitive enough to detect mosaic structures exhibiting as low as 10% of the naphthalene hydroxylase parental activity. The detection method was also found to be applicable with phenanthrene and other polycyclic aromatic phenols (data not shown).

Figure 6.


Visual detection of naphthalene hydroxylating clones expressing mosaic structures. Bioconversion was performed in 1 ml of yeast culture in the presence of 1.6 mM naphthalene for 48 h. Solid phase extraction and color development were performed entirely in microtiter plates as described in Materials and Methods. Red color indicates positive clones. Well A1, yeast transformed with control plasmid pYeDP60; well A2, yeast transformed with parental plasmid p1A1/V60; well A3, yeast transformed with parental plasmid p1A2/V60.

Sequence characterization of the library

Five clones were randomly selected without any functional criterion, and five more were randomly picked from among the clones that were functionally competent for naphthalene hydroxylation. The 10 clones were fully sequenced, and their structures and point mutations are depicted in Figure 7. The figure is based on an alignment of each mosaic structure with the two parental sequences using homemade software (available on request). Mutations are also marked. For each structure we determined the minimum number of distinct parental fragments composing the mosaic. All analyzed sequences were mosaics and the average number of parental fragments composing each mosaic structure was found to be 5.4 ± 2.2 (n = 10). The size distribution of continuous parental segments was examined: 32 of them ranged between 0 and 200 bp, 12 between 200 and 500 bp and 10 between 500 and 1000 bp. In summary, ∼60% of the parental segments were <200 bp, the smallest observed size being 20 bp. This result is in good agreement with the average size of the DNase I fragments (200–300 bp, see Fig. 1A).
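The segment counts quoted above can be checked against the reported average of 5.4 fragments per clone with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative, and it assumes the three size classes cover all observed segments.

```python
# Segment counts per size class, as reported for the 10 sequenced clones.
segment_counts = {"0-200 bp": 32, "200-500 bp": 12, "500-1000 bp": 10}
n_clones = 10

# Total number of parental segments and the mean number per clone.
total_segments = sum(segment_counts.values())
mean_fragments_per_clone = total_segments / n_clones

# Fraction of all segments falling in each size class.
fractions = {size: count / total_segments for size, count in segment_counts.items()}

print(total_segments, mean_fragments_per_clone)
for size_class, fraction in fractions.items():
    print(f"{size_class}: {fraction:.1%}")
```

Under that assumption the three classes sum to 54 segments, i.e. 5.4 per clone, and the smallest class accounts for 32 of 54 segments, which is the ∼60% quoted in the text.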

Figure 7.

Schematic sequence representation of 10 randomly selected mosaic structures: (A) in the total population; (B) in the sub-population of functionally competent clones. Each sequence was aligned at the nucleotide level with the two parental sequences. Alignment files were used as input for a homemade sequence analysis and visualization program written in C++. The output of the program is a colored horizontal line. Each break in color corresponds to a change of sequence type (either 1A1 or 1A2). The green and blue parts correspond to sequences belonging to the P450 1A1 and 1A2 parents, respectively. The thin yellow horizontal line corresponds to sequence segments for which either of the two parental types is possible based on sequence analysis (regions where the 1A1 and 1A2 sequences are identical). Small vertical ticks (either blue or green) indicate nucleotide mismatches between the two parental structures. The red marks indicate sequence positions that did not match either of the parental sequences and thus correspond to de novo mutations.
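The classification described in this legend can be illustrated with a minimal sketch. This is not the authors' C++ program, which is not shown in the paper. It is a hypothetical Python reimplementation of the same idea, assuming the three sequences are already aligned to equal length. The function names and the one-letter labels are assumptions made for illustration.

```python
def classify_positions(mosaic, parent_a, parent_b):
    """Label each aligned column of the mosaic.

    'A' or 'B': matches only that parent (informative position)
    '=': parents are identical here and the mosaic agrees (ambiguous origin)
    'x': matches neither parent (candidate de novo mutation)
    """
    if not (len(mosaic) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for m, a, b in zip(mosaic, parent_a, parent_b):
        if m == a and m == b:
            labels.append("=")
        elif m == a:
            labels.append("A")
        elif m == b:
            labels.append("B")
        else:
            labels.append("x")
    return "".join(labels)


def min_parental_fragments(labels):
    """Minimum number of parental fragments: one plus the number of
    switches between consecutive informative ('A'/'B') positions."""
    informative = [c for c in labels if c in "AB"]
    if not informative:
        return 0
    switches = sum(1 for prev, cur in zip(informative, informative[1:]) if prev != cur)
    return 1 + switches


# Toy example with made-up sequences (not CYP1A1/CYP1A2 data).
parent_a = "ACGTACGTAC"
parent_b = "ACCTAGGTTC"
mosaic = "ACGTAGGTTG"
labels = classify_positions(mosaic, parent_a, parent_b)
print(labels)                          # ==A==B==Bx
print(min_parental_fragments(labels))  # 2
```

Counting only switches between informative positions gives a lower bound on the number of fragments, since exchanges that begin and end inside a region where the parents are identical cannot be seen. This corresponds to the "minimum number of distinct parental fragments" reported in the text.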

Analysis of the naphthalene hydroxylase activity of the five randomly selected clones revealed that one of them (clone A1) was active. The average number of mutations in active and non-active clones was calculated. In the non-functional clones (A2, A3, A4, A5), the average number of mutations was 14.0 ± 4.2, compared to 8.3 ± 3.2 for the functionally competent clones. In all non-functional clones analyzed, at least one internal stop codon was generated, thereby truncating the protein. Finally, the findings deduced from the statistical analysis were confirmed by the sequence data. In particular, the parental type linkage between probed segments P2, P3 and P4 was also apparent in the available sequences. Although the number of sequenced clones (10) was limited, the analysis provides useful insight into the mosaic structures and confirms the high efficiency of the strategy developed.

DISCUSSION

The developed procedure (CLERY) for family shuffling of eucaryotic genes combined a PCR-based technique adapted from previous work by Stemmer (3,4,16), with modifications described by Lorimer and Pastan (45), and an in vivo recombination step in yeast that also acted as a cloning tool. No step in E.coli was required for the whole procedure except when subcloning was necessary. The procedure was illustrated by the construction of an expression library of mosaic structures involving the human CYP1A1 and CYP1A2 ORFs, which share 75% nucleotide sequence identity. The modified PCR program allowed the reconstruction of DNA fragments as large as a full yeast expression plasmid (∼11 kb). Our PCR process involved long multi-step cycles of hybridization (from high to very low temperature) expected to favor low stringency hybridization and partial elongation. Additional amplification using primers located on the 5′- and 3′-vector flanking regions of the cDNA, followed by recombination of the PCR fragments with each other and with a new vector in yeast, directly provided a yeast expression library. The in vivo recombination step played a double role: (i) it constituted a second round of DNA shuffling involving molecular mechanisms (37,38) different from those of the PCR-based technique; (ii) it provided an efficient cloning step for the resulting mosaic sequences in a large yeast expression vector. The content of parental-type patterns in the library (14%), although relatively low, was higher than the 3.2% expected for an equal contribution of each parental sequence and ‘perfect’ shuffling, but in close agreement with the values expected from modeling when the experimental parental contributions and linkages were taken into account.
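The order of magnitude of the 'perfect' shuffling baseline can be reproduced with a simple calculation. The sketch below assumes a model that is not spelled out in this section: each of the six probes independently reports one parent with probability p and the other with probability 1 − p, so a clone shows a fully parental pattern with probability p^6 + (1 − p)^6. The function name is hypothetical, and the authors' own simulation may differ in detail.

```python
def parental_pattern_fraction(p_parent, n_probes):
    """Probability that all probes report the same parent, assuming each
    probe is assigned independently (an idealized 'perfect' shuffling)."""
    return p_parent ** n_probes + (1 - p_parent) ** n_probes


# Equal parental contribution, six probes.
print(parental_pattern_fraction(0.5, 6))    # 0.03125

# Observed 1A2 contribution of 55.8%, still assuming independent probes.
print(parental_pattern_fraction(0.558, 6))
```

With equal contributions this idealized model gives about 3.1%, close to the 3.2% quoted above. Using the observed 55.8% contribution raises it only slightly, to a little under 4%. This suggests that the gap up to the observed 14% comes mostly from linkage between probes rather than from unequal parental contribution, consistent with the modeling result mentioned in the text.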

Obviously, perfect shuffling is impossible, if only because DNA polymerase proved to be extremely inefficient at elongating DNA duplexes containing <4–6 consecutive base pairs at their 3′-end. From a practical point of view, in vitro hybridization between partially mismatched sequences was very inefficient for fragments <12–20 bp (depending on GC content). Thus the smallest segments shuffled by PCR were expected to be >12–20 bp, and probably much larger in regions of low sequence similarity. The mechanisms directing in vivo recombination differ widely from those of PCR-based shuffling. They were also expected to differ between E.coli and yeast, the preferred mechanism being strand invasion in the former case (34) and Holliday junction formation and resolution in the latter case (38). In the Holliday junction mechanism, the observable junction point is the resolution point and not the initiation point, leading to a high density of recombination at the junction between a region of heterology and a region of homology (38). Statistical analysis of experimental data indicated that, while the probability of presence of both parental types along the mosaic sequences was homogeneous, significant cross-correlation was observed between parental types for probed segments P2, P3 and P4. Careful examination of the alignment of the two parental nucleic acid sequences (Fig. 2) revealed that the two parental regions between probes 2 and 4 shared only 60% identity, whereas the identity over the whole sequence was ∼75%. Moreover, this region contains only a few stretches of six or more perfectly matched nucleotides (seven stretches over 270 nucleotides). These two features probably hindered PCR reassembly and thereby reduced recombination in this region, which may account for the cross-correlation, although we do not currently have a complete explanation of the phenomenon.
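The two quantities used in this argument, percent identity over a region and the number of stretches of six or more perfectly matched nucleotides, are straightforward to compute from a pairwise alignment. The following is a minimal sketch, assuming two pre-aligned sequences of equal length with '-' as the gap character. The function names are hypothetical and the example sequences are invented, not taken from CYP1A1 or CYP1A2.

```python
def identity_stretches(seq_a, seq_b, min_len=6):
    """Return (start, length) for every run of consecutive identical,
    non-gap columns that is at least min_len long."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    runs = []
    run_start = None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        is_match = a == b and a != "-"
        if is_match and run_start is None:
            run_start = i                      # a new run begins
        elif not is_match and run_start is not None:
            if i - run_start >= min_len:       # run ended; keep it if long enough
                runs.append((run_start, i - run_start))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(seq_a) - run_start >= min_len:
        runs.append((run_start, len(seq_a) - run_start))
    return runs


def percent_identity(seq_a, seq_b):
    """Percentage of alignment columns with identical, non-gap residues."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Toy example with invented sequences.
region_a = "ACGTACGTAAGGCTA"
region_b = "ACGTACGAAAGCCTA"
print(identity_stretches(region_a, region_b))             # [(0, 7)]
print(identity_stretches(region_a, region_b, min_len=3))  # [(0, 7), (8, 3), (12, 3)]
print(round(percent_identity(region_a, region_b), 1))     # 86.7
```

Applied to the aligned region between probes 2 and 4, a count of this kind is what the text summarizes as seven stretches over 270 nucleotides. The threshold of six follows the minimum duplex length for polymerase elongation discussed above.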

A high proportion of parental sequences was frequently reported in the functionally competent subpopulation of family shuffling libraries (29–32). In contrast, sequencing of five functional clones from our library did not reveal any parental structure. Similarly high rates of non-parental functional mosaic structures were recently reported using an approach involving several steps of restriction enzyme digestion. Indeed, for two genes exhibiting 84% sequence identity, the use of fragments digested by five different enzymes gave close to 100% chimeras (29). Although efficient, this method has the drawback that restriction enzyme digestion leads to non-random shuffling. Several studies aimed to decrease the parental bias of the library (29–32). A recent publication reported only 1% of chimeric sequences in a library constructed after DNase I digestion of two genes with 84% sequence identity (30). Some methods used single-stranded DNA as a template to force recombination between the two parents, giving a chimeric proportion of 14% (30).

In our work, the value of 14% of parental patterns obtained by the six-probe hybridization technique on ‘DNA macro-arrays’ was identical, within statistical error, to the value given by the simulations. The frequency of true parental structures is expected to be lower, since only a very limited fraction (∼8%) of the length of the cDNA sequence was probed and any shuffled segment not extending over one of the probes remained undetected. The fact that simulated and experimental fractions of parental patterns were identical, and that random sequencing of 10 clones did not provide evidence for any parental structure, strongly suggested that very little, if any, parental contamination was present under our conditions.

The relatively low proportion of functionally competent clones observed (12%) was similar to values (10–20%) reported in the literature when using Taq polymerase (11,46). Further improvement of the method would involve the use of a proofreading polymerase combined with Taq polymerase to adjust the rate of mutation to a suitable level (46). While too many mutations reduce the library quality by accumulation of inactive structures, maintaining a suitable rate of de novo mutation is nonetheless useful in directed evolution to compensate for structural incompatibilities (49).

For directed evolution, the screening method was critical. The method we developed allows easy and sensitive detection of functionally competent P450 structures based on aromatic hydrocarbon oxidation. For the 1A1–1A2 mosaics, naphthalene oxidation was found to be a useful detection tool, but the method can be extended to a large range of phenolic compounds. Use of yeast strains expressing human epoxide hydrolase (50) from a genomic DNA integrated expression cassette will be a future improvement.

Finally, the macro-array technique proved to be very efficient for analyzing the structure of the library. It would be of particular interest to analyze a complete library in parallel, both by the statistical method and for different activities, in order to understand rapidly which kinds of mosaics give rise to which activities and thereby provide a way to predict function.

ACKNOWLEDGEMENTS

We wish to thank Linda Sperling for the careful proofreading of the manuscript. The work was supported in part by a grant from the GIP Hoechst Marion Roussel. Valérie Abécassis is a pre-doctoral fellow supported by a fellowship from the Ministère de la Recherche et de l’Enseignement Supérieur.

REFERENCES

  • 1. Kumamaru T., Suenaga,H., Mitsuoka,M., Watanabe,T. and Furukawa,K. (1998) Enhanced degradation of polychlorinated biphenyls by directed evolution of biphenyl dioxygenase. Nat. Biotechnol., 16, 663–666.
  • 2. van der Meer J.R., de Vos,W.M., Harayama,S. and Zehnder,A.J. (1992) Molecular mechanisms of genetic adaptation to xenobiotic compounds. Microbiol. Rev., 56, 677–694.
  • 3. Stemmer W.P. (1994) Rapid evolution of a protein in vitro by DNA shuffling. Nature, 370, 389–391.
  • 4. Stemmer W.P. (1994) DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc. Natl Acad. Sci. USA, 91, 10747–10751.
  • 5. Crameri A., Dawes,G., Rodriguez,E.,Jr, Silver,S. and Stemmer,W.P. (1997) Molecular evolution of an arsenate detoxification pathway by DNA shuffling. Nat. Biotechnol., 15, 436–438.
  • 6. Zhang J.H., Dawes,G. and Stemmer,W.P. (1997) Directed evolution of a fucosidase from a galactosidase by DNA shuffling and screening. Proc. Natl Acad. Sci. USA, 94, 4504–4509.
  • 7. Crameri A., Whitehorn,E.A., Tate,E. and Stemmer,W.P. (1996) Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat. Biotechnol., 14, 315–319.
  • 8. Crameri A., Cwirla,S. and Stemmer,W.P. (1996) Construction and evolution of antibody-phage libraries by DNA shuffling. Nat. Med., 2, 100–102.
  • 9. Giver L. and Arnold,F.H. (1998) Combinatorial protein design by in vitro recombination. Curr. Opin. Chem. Biol., 2, 335–338.
  • 10. Giver L., Gershenson,A., Freskgard,P.O. and Arnold,F.H. (1998) Directed evolution of a thermostable esterase. Proc. Natl Acad. Sci. USA, 95, 12809–12813.
  • 11. Moore J.C., Jin,H.M., Kuchner,O. and Arnold,F.H. (1997) Strategies for the in vitro evolution of protein function: enzyme evolution by random recombination of improved sequences. J. Mol. Biol., 272, 336–347.
  • 12. Moore J.C. and Arnold,F.H. (1996) Directed evolution of a para-nitrobenzyl esterase for aqueous-organic solvents. Nat. Biotechnol., 14, 458–467.
  • 13. Yano T., Oue,S. and Kagamiyama,H. (1998) Directed evolution of an aspartate aminotransferase with new substrate specificities. Proc. Natl Acad. Sci. USA, 95, 5511–5515.
  • 14. Altamirano M.M., Blackburn,J.M., Aguayo,C. and Fersht,A.R. (2000) Directed evolution of new catalytic activity using the alpha/beta-barrel scaffold. Nature, 403, 617–622.
  • 15. Harayama S. (1998) Artificial evolution by DNA shuffling. Trends Biotechnol., 16, 76–82.
  • 16. Crameri A., Raillard,S.A., Bermudez,E. and Stemmer,W.P. (1998) DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature, 391, 288–291.
  • 17. Nixon A.E., Ostermeier,M. and Benkovic,S.J. (1998) Hybrid enzymes: manipulating enzyme design. Trends Biotechnol., 16, 258–264.
  • 18. Kimura N., Nishi,A., Goto,M. and Furukawa,K. (1997) Functional analyses of a variety of chimeric dioxygenases constructed from two biphenyl dioxygenases that are similar structurally but different functionally. J. Bacteriol., 179, 3936–3943.
  • 19. Back K. and Chappell,J. (1996) Identifying functional domains within terpene cyclases using a domain-swapping strategy. Proc. Natl Acad. Sci. USA, 93, 6841–6845.
  • 20. Campbell R.K., Bergert,E.R., Wang,Y., Morris,J.C. and Moyle,W.R. (1997) Chimeric proteins can exceed the sum of their parts: implications for evolution and protein design. Nat. Biotechnol., 15, 439–443.
  • 21. Nelson S.D. and Harvinson,P.J. (1987) In Guengerich,F.P. (ed.), Mammalian Cytochrome P-450. CRC Press, Boca Raton, FL, pp. 19–79.
  • 22. Harris C.C. (1989) Interindividual variation among humans in carcinogen metabolism, DNA adduct formation and DNA repair. Carcinogenesis, 10, 1563–1566.
  • 23. Kadlubar F.F. and Hammons,G.J. (1987) In Guengerich,F.P. (ed.), Mammalian Cytochrome P-450. CRC Press, Boca Raton, FL, pp. 81–130.
  • 24. Buters J.T., Doehmer,J. and Gonzalez,F.J. (1999) Cytochrome P450-null mice. Drug Metab. Rev., 31, 437–447.
  • 25. Kawajiri K., Nakachi,K., Imai,K., Watanabe,J. and Hayashi,S. (1993) The CYP1A1 gene and cancer susceptibility. Crit. Rev. Oncol. Hematol., 14, 77–87.
  • 26. Mace K., Gonzalez,F.J., McConnell,I.R., Garner,R.C., Avanti,O., Harris,C.C. and Pfeifer,A.M. (1994) Activation of promutagens in a human bronchial epithelial cell line stably expressing human cytochrome P450 1A2. Mol. Carcinog., 11, 65–73.
  • 27. Joo H., Arisawa,A., Lin,Z. and Arnold,F.H. (1999) A high-throughput digital imaging screen for the discovery and directed evolution of oxygenases. Chem. Biol., 6, 699–706.
  • 28. Shao Z. and Arnold,F.H. (1996) Engineering new functions and altering existing functions. Curr. Opin. Struct. Biol., 6, 513–518.
  • 29. Kikuchi M., Ohnishi,K. and Harayama,S. (1999) Novel family shuffling methods for the in vitro evolution of enzymes. Gene, 236, 159–167.
  • 30. Kikuchi M., Ohnishi,K. and Harayama,S. (2000) An effective family shuffling method using single-stranded DNA. Gene, 243, 133–137.
  • 31. Arnold F.H. (1998) When blind is better: protein design by evolution. Nat. Biotechnol., 16, 617–618.
  • 32. Michnick S.W. and Arnold,F.H. (1999) “Itching” for new strategies in protein engineering. Nat. Biotechnol., 17, 1159–1160.
  • 33. Ostermeier M., Shim,J.H. and Benkovic,S.J. (1999) A combinatorial approach to hybrid enzymes independent of DNA homology. Nat. Biotechnol., 17, 1205–1209.
  • 34. Volkov A.A., Shao,Z. and Arnold,F.H. (1999) Recombination and chimeragenesis by in vitro heteroduplex formation and in vitro repair. Nucleic Acids Res., 27, e18.
  • 35. Okuta A., Ohnishi,K. and Harayama,S. (1998) PCR isolation of catechol 2,3-dioxygenase gene fragments from environmental samples and their assembly into functional genes. Gene, 212, 221–228.
  • 36. Cherry J.R., Lamsa,M.H., Schneider,P., Vind,J., Svendsen,A., Jones,A. and Pedersen,A.H. (1999) Directed evolution of a fungal peroxidase. Nat. Biotechnol., 17, 379–384.
  • 37. Pompon D. and Nicolas,A. (1989) Protein engineering by cDNA recombination in yeasts: shuffling of mammalian cytochrome P-450 functions. Gene, 83, 15–24.
  • 38. Mezard C., Pompon,D. and Nicolas,A. (1992) Recombination between similar but not identical DNA sequences during yeast transformation occurs within short stretches of identity. Cell, 70, 659–670.
  • 39. Cullin C. and Pompon,D. (1988) Synthesis of functional mouse cytochromes P-450 P1 and chimeric P-450 P3-1 in the yeast Saccharomyces cerevisiae. Gene, 65, 203–217.
  • 40. Truan G., Cullin,C., Reisdorf,P., Urban,P. and Pompon,D. (1993) Enhanced in vivo monooxygenase activities of mammalian P450s in engineered yeast cells producing high levels of NADPH-P450 reductase and human cytochrome b5. Gene, 125, 49–55.
  • 41. Pompon D., Gautier,J.C., Perret,A., Truan,G. and Urban,P. (1997) Simulation of human xenobiotic metabolism in microorganisms. Yeast a good compromise between E. coli and human cells. J. Hepatol., 26, 81–85.
  • 42. Urban P., Cullin,C. and Pompon,D. (1990) Maximizing the expression of mammalian cytochrome P-450 monooxygenase activities in yeast cells. Biochimie, 72, 463–472.
  • 43. Bellamine A., Gautier,J.C., Urban,P. and Pompon,D. (1994) Chimeras of the human cytochrome P450 1A family produced in yeast. Accumulation in microsomal membranes, enzyme kinetics and stability. Eur. J. Biochem., 225, 1005–1013.
  • 44. Maniatis T., Fritsch,E.F. and Sambrook,J. (eds) (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
  • 45. Lorimer I.A. and Pastan,I. (1995) Random recombination of antibody single chain Fv sequences after fragmentation with DNaseI in the presence of Mn2+. Nucleic Acids Res., 23, 3067–3068.
  • 46. Zhao H. and Arnold,F.H. (1997) Optimization of DNA shuffling for high fidelity recombination. Nucleic Acids Res., 25, 1307–1308.
  • 47. Pompon D., Louerat,B., Bronine,A. and Urban,P. (1996) Yeast expression of animal and plant P450s in optimized redox environments. Methods Enzymol., 272, 51–64.
  • 48. Pompon D. (1988) cDNA cloning and functional expression in yeast Saccharomyces cerevisiae of beta-naphthoflavone-induced rabbit liver P-450 LM4 and LM6. Eur. J. Biochem., 177, 285–293.
  • 49. Chen K. and Arnold,F.H. (1993) Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc. Natl Acad. Sci. USA, 90, 5618–5622.
  • 50. Gautier J.C., Urban,P., Beaune,P. and Pompon,D. (1996) Simulation of human benzo[a]pyrene metabolism deduced from the analysis of individual kinetic steps in recombinant yeast. Chem. Res. Toxicol., 9, 418–425.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
