Protein Science: A Publication of the Protein Society. 2010 Feb 16;19(4):786–795. doi: 10.1002/pro.358

Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

Junko Tanaka 1, Nobuhide Doi 1,*, Hideaki Takashima 1, Hiroshi Yanagawa 1,*
PMCID: PMC2867018  PMID: 20162614

Abstract

Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids.

Keywords: random DNA library, primitive amino acids, reduced alphabet, protein solubility, mRNA display, in vitro selection

Introduction

How were present-day proteins selected from the huge sequence space during protein evolution? The size of the protein sequence space is typically estimated to be 20¹⁰⁰ (1.3 × 10¹³⁰) for a 100-residue polypeptide composed of the 20 kinds of natural amino acids. This number is enormous compared with the number of proteins that may have existed in nature throughout the history of life on Earth, which has been estimated to be less than 10⁵⁰ molecules1 or 10²¹–10⁴³ molecules.2 Thus, nature has tested only a tiny fraction of the possible protein sequence space since the origin of life. However, the size of the protein sequence space at the early stage of protein evolution may have been overestimated. It is commonly believed that the 20 amino acids that comprise the alphabet of present-day proteins did not appear simultaneously on the evolutionary scene.3 It has been proposed that primordial proteins were constructed from a limited set of amino acids, which could have been formed in abundance at an early stage of chemical evolution, and that primitive cells gradually introduced new amino acids into the repertoire for protein synthesis.4
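The scale of these numbers can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `sequence_space` is ours, and the alphabet sizes of 5 and 12 are included because reduced alphabets of those sizes are examined later in this study.

```python
from math import log10

def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length

for k in (5, 12, 20):
    n = sequence_space(k, 100)
    # Report the order of magnitude rather than the full integer.
    print(f"{k:>2} amino acids, 100 residues: about 10^{log10(n):.1f} sequences")
```

Python integers have arbitrary precision, so the exact count is available if needed; only the exponent is printed here for readability.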

Several researchers have demonstrated that the amino acid usage of various natural globular proteins and enzymes can be restricted to 5–13 members without substantial alteration of their structures and biological functions.5–8 Riddle et al. simplified the sequence of a small β-sheet protein, the SH3 domain, by using phage display selection, and produced two SH3 variants in which 90% of the sequence, excluding the binding region, utilized only five amino acids (Ala, Gly, Glu, Ile, and Lys).5 Silverman et al. generated variants of the prototypical (β/α)₈ barrel enzyme, triosephosphate isomerase (TIM), in which the amino acids at 142 of 182 structural positions were simplified to seven kinds (Ala, Glu, Val, Lys, Phe, Leu, and Gln) by means of in vivo selection for TIM activity.6 Akanuma et al. fabricated 88% of the 213-residue orotate phosphoribosyltransferase with a reduced set of nine amino acids (Ala, Gly, Asp, Val, Arg, Thr, Leu, Pro, and Tyr) by means of growth-related phenotype selection, though the total number of amino acid types was 13.7 Walter et al. focused on an α-helical protein, chorismate mutase, and created an active enzyme constructed entirely from a set of nine amino acids (Asp, Glu, Arg, Met, Ile, Asn, Lys, Phe, and Leu) by in vivo selection.8

Other researchers have also produced folded de novo proteins from pattern sequence libraries constructed from limited numbers of amino acids. Hecht's group designed four-helix bundle proteins by arranging five kinds of nonpolar amino acids (Val, Met, Ile, Phe, and Leu) and six kinds of polar amino acids (Asp, Glu, Asn, Lys, Gln, and His) in an appropriate order.9–11 Jumawid et al. have constructed α3β3 de novo proteins through binary combination of simplified hydrophobic (Val, Ile, and Leu) and hydrophilic (Ala, Glu, Lys, and Thr) amino acid sets.12 These findings all support the view that the full amino acid alphabet set is not essential for protein folding.

As an alternative approach, several researchers have shown that random-sequence proteins constructed from limited alphabets have different characters from random-sequence proteins with the 20-amino acid alphabet. Although random-sequence proteins with the 20-amino acid alphabet showed no remarkable secondary structure,13–16 random-sequence proteins formed with only three kinds of amino acids (Gln, Leu, and Arg) had helical structure but showed a strong tendency to aggregate.17,18 We have reported that random-sequence proteins consisting of five primitive amino acids (Ala, Gly, Val, Asp, and Glu), encoded by GNN codons (N = T, C, A, or G), show high solubility.19 Thus, random-sequence libraries with limited alphabets, especially the GNN codon-based library, may be appropriate for functional protein searching. However, the previously constructed GNN codon-based library included many sequences with frameshifts and stop codons, and the lengths of the sequences varied. Further, the functionality of GNN codon-based proteins may be limited, because the alphabet contains no positively charged amino acid.

In this study, we constructed random-sequence gene libraries with limited amino acid alphabets by using mRNA display to eliminate incomplete sequences with frameshift and stop codons from the libraries.20,21 In the mRNA display technique, each mRNA in a library is covalently bound to its corresponding protein through puromycin,22,23 but mRNA sequences containing a stop codon or frameshift cannot form mRNA–protein conjugates or cannot be properly translated to generate the C-terminal tag, respectively. Thus, they can be washed away in affinity selection based on a property of the peptide portion. We constructed a new random-sequence gene library containing RNN (R = A or G) codons encoding a 12-amino acid alphabet, that is, ANN codons for three primitive amino acids (Ser, Thr, and Arg) and four advanced amino acids (Met, Ile, Asn, and Lys) and GNN codons for five primitive amino acids (Ala, Gly, Val, Asp, and Glu). Therefore, the RNN codon-based library contains not only two kinds of negatively charged amino acids, Asp and Glu, but also two kinds of positively charged amino acids, Arg and Lys. We also constructed a GNN library and an NNN library so that we could compare the physical properties of random-sequence proteins with alphabets consisting of 5, 12, and 20 amino acids.
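The amino acid sets encoded by the GNN, RNN, and NNN codon families can be verified directly from the standard genetic code. The sketch below is an illustration, not part of the original work; the helper name `encoded` and the compact string encoding of the codon table are our own choices.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def encoded(first_bases: str) -> tuple[set[str], int]:
    """Amino acids and stop-codon count for codons whose first base is
    restricted to `first_bases`; the second and third positions are N."""
    codons = [c for c in CODON_TABLE if c[0] in first_bases]
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sum(CODON_TABLE[c] == "*" for c in codons)
    return amino_acids, stops

for name, first in (("GNN", "G"), ("RNN", "AG"), ("NNN", "TCAG")):
    aas, stops = encoded(first)
    print(name, len(aas), "".join(sorted(aas)), "stop codons:", stops)
```

In the standard code, GNN and RNN codons include no stop codons, whereas 3 of the 64 NNN codons are stops.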

Results and Discussion

Construction of random-sequence protein libraries with limited alphabets

We constructed three kinds of random-sequence DNA libraries with more than 100 contiguous random codons GNN (for Ala, Gly, Asp, Glu, and Val), RNN (GNN plus Ile, Met, Thr, Asn, Lys, Ser, and Arg), or NNN (all 20 kinds of amino acids). We designed DNA cassettes consisting of 15 consecutive random codons between constant sequences to be assembled for construction of DNA libraries with 120 random codons, as previously described.19

When the synthesized DNA cassettes of NNN, RNN, and GNN were sequenced, not only the NNN cassette but also the GNN and RNN cassettes included stop codons (NNN, 66%; GNN, 20%; RNN, 12%), probably due to chemical synthetic errors. Thus, we planned to eliminate sequences containing stop codons and frameshifts from the synthesized random-sequence DNA library by “preselection” using mRNA display, as previously described.21 When a library of mRNA-displayed proteins with a C-terminal FLAG tag is purified by using anti-FLAG antibody-immobilized beads, sequences containing stop codons or frameshifts can be eliminated from the library, because mRNA sequences containing a stop codon cannot associate with any protein on the ribosome through puromycin, and sequences containing frameshifts cannot be translated to polypeptides with the C-terminal FLAG tag (Fig. 1). After preselection and RT-PCR, the fraction of DNA sequences containing stop codons and frameshifts was effectively decreased, and no such sequences were observed in 30 samples sequenced from each library.
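For NNN codons, in-frame stop codons are expected even in the absence of synthesis errors, whereas ideal GNN and RNN codons encode none. A rough estimate is sketched below, assuming equal and independent base frequencies at every N position, an idealization that real oligonucleotide synthesis only approximates. The function name and the choice of 15 and 120 codons as inputs are ours.

```python
def fraction_with_stop(n_codons: int, stop_codons: int = 3, total_codons: int = 64) -> float:
    """Probability that at least one of n independent random codons is a stop codon."""
    p_sense = 1 - stop_codons / total_codons
    return 1 - p_sense ** n_codons

# One 15-codon cassette versus a fully assembled 120-codon random region.
for n in (15, 120):
    print(f"{n} NNN codons: {fraction_with_stop(n):.1%} expected to contain a stop codon")
```

Under this idealization, about half of 15-codon NNN cassettes and nearly all uninterrupted 120-codon NNN regions would carry at least one stop codon, which illustrates why removing such sequences at the cassette stage is useful.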

Figure 1.

Schematic representation of preselection for removing sequences containing stop codons and frameshifts from a library by mRNA display. (1) Random DNA library is transcribed and ligated with a PEG spacer bearing puromycin. (2) The RNA library is translated in vitro. An RNA template without a stop codon and frameshift displays a full-length protein with the C-terminal FLAG tag (red). An RNA template with a stop codon displays no protein. An RNA template with a frameshift displays the corresponding frame-shifted polypeptide (blue) lacking the FLAG tag. (3) The resulting mRNA–protein conjugates are purified with anti-FLAG antibody-immobilized beads. (4) The RNA portion of the bound molecules is amplified by RT-PCR to form a random DNA library without stop codons and frameshifts.

The preselected DNA cassettes were then assembled through digestion and ligation, as previously described.19 The random region of each library was flanked by fixed sequences encoding affinity tags, the N-terminal T7·tag, and the C-terminal FLAG tag. The lengths of sequences arbitrarily chosen from the constructed random-sequence libraries were almost equal (Fig. 2), whereas the lengths varied in the previous study without preselection.19

Figure 2.

Predicted amino acid sequences of arbitrarily chosen random-sequence proteins from the GNN library (G1–7), RNN library (R1–16), and NNN library (N1–10). The common sequences, including the N-terminal T7·tag (1–11 aa) and the C-terminal His6 tag (152–157 aa) sequences, are also shown. The repeated Trp residues were derived from random cassette junctions and used for fluorescence studies and ultraviolet measurements for protein quantitation.

Expression of arbitrarily chosen random-sequence proteins with limited amino acid alphabets

As solubility in aqueous solution is one of the most important properties of globular proteins, we examined the solubility of random-sequence proteins arbitrarily chosen from each library. The random-sequence proteins were overexpressed in E. coli, and proteins in the soluble fraction and the insoluble fraction were detected by Western blotting with anti-T7·tag antibody. As shown in Figure 3 and Table I, all of the GNN proteins (G1–7 except G3, which was not expressed) from the GNN library were present in the soluble fraction [Fig. 3(A)], in agreement with the previous study,19 whereas all of the NNN proteins (N1–10 except N1, N3, N6, and N9, which were not expressed) from the NNN library were present in the insoluble fraction [Fig. 3(C)]. The RNN proteins (R1–16 except R6 and R11, which were not expressed) from the RNN library were intermediate in character; that is, one RNN protein (R16) was expressed only in the soluble fraction, 11 RNN proteins were expressed only in the insoluble fraction, and two (R4 and R7) were expressed in both fractions [Fig. 3(B)].

Figure 3.

Expression of random-sequence proteins from the GNN library (A), RNN library (B), and NNN library (C). The soluble (lanes S) and insoluble (lanes I) fractions of overexpressed proteins were analyzed by 16.5% Tricine SDS-PAGE. The proteins were detected by Coomassie brilliant blue staining (top) or Western blotting with anti-T7·tag antibody (bottom). The arrowheads indicate the positions of recombinant proteins.

Table I.

Numbers of Random-Sequence Proteins in Various Solubility Ranges

Library   Solubility (%)
          100    ≥50    <50    0
GNN       2      4      0      0
RNN       1      1      1      11
NNN       0      0      0      6

Some sequences (G3, R6, R11, N1, N3, N6, and N9) were not expressed in E. coli. Interestingly, the proportion of unexpressed proteins in the NNN library (40%) was higher than that in the RNN library (13%) or the GNN library (14%). It is well known that codon bias, tRNA abundance, and gene expression are correlated.24 As the NNN library includes many low-usage codons compared with the RNN library, protein production might be much lower because of the presence of low-usage codon clusters. Recently, Kudla et al. found that strong secondary structure at the 5′ end of an mRNA (up to around 30 nt downstream from the start codon) blocks ribosome binding and obstructs translation initiation.25 In this study, however, a common T7·tag and linker sequence (51 nt) were introduced after the start codon for all three libraries (Fig. 2), and thus, this is not the reason for the bias. Another possibility is that translation products may be degraded in a sequence-specific manner in E. coli.

We examined the relationship between the solubility of random-sequence proteins and several properties of the amino acid sequences. Wilkinson and Harrison suggested that protein solubility is strongly affected by net charge and the fraction of turn-forming residues (Asp, Asn, Pro, Gly, and Ser) and is weakly affected by the hydrophobicity and the protein size.26 As shown in Table II, we found no relation between the solubility and the fraction of turn-forming residues, hydrophobicity, or protein size. The high solubility of almost all GNN proteins could be explained by the parameter of net charge, because all GNN proteins are highly negatively charged (−29 to −39). The soluble RNN proteins have a larger net negative charge (−12 and −13) and lower hydrophobicity (−1.05 and −0.70) than the other RNN proteins (−13 to +5 and −0.56 to +0.17, respectively). However, the low solubility of NNN proteins with high net charge (+13 and +16) and low hydrophobicity (−0.74 and −0.82) cannot be easily explained.

Table II.

Parameters of Random-Sequence Proteins

Protein   Solubilityᵃ (%)   Kyte-Doolittleᵇ   Net chargeᶜ   Turnᵈ
G1        100               +0.38             −29           47
G2        100               +0.27             −29           42
G3        —                 +0.03             −39           49
G4        ≥50               +0.41             −30           50
G5        ≥50               −0.20             −39           56
G6        ≥50               +0.31             −29           43
G7        ≥50               +0.20             −31           50
R1        0                 −0.19             +4            42
R2        0                 −0.44             −8            52
R3        0                 −0.36             −13           41
R4        <50               −0.05             −4            49
R5        0                 −0.03             +5            42
R6        —                 −0.27             −7            48
R7        ≥50               −0.70             −13           45
R8        0                 −0.03             −3            39
R9        0                 −0.56             +4            44
R10       0                 −0.06             −11           37
R11       —                 −0.27             −7            48
R12       0                 +0.17             −6            40
R13       0                 +0.05             −2            43
R14       0                 −0.44             −7            47
R15       0                 −0.29             −5            40
R16       100               −1.05             −12           52
N1        —                 −0.49             +4            43
N2        0                 −0.74             +16           44
N3        —                 −0.49             +7            41
N4        0                 −1.04             +8            46
N5        0                 −0.17             +7            45
N6        —                 −0.64             +9            39
N7        0                 −0.82             +13           52
N8        0                 −0.22             +5            59
N9        —                 −0.80             +2            42
N10       0                 −0.74             +1            63

ᵃ —, Not expressed.
ᵇ Calculated based on the index.27
ᶜ The number of Arg and Lys minus the number of Asp and Glu.
ᵈ The number of Asp, Asn, Pro, Gly, and Ser.
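The sequence parameters defined in the footnotes to Table II can be computed from a one-letter amino acid sequence as sketched below. This is an illustration under stated assumptions: the function names are ours, the hydropathy values are those of the published Kyte-Doolittle scale, and averaging the index over all residues is our assumption about how the tabulated values were obtained. The example input is the 11-residue N-terminal T7·tag (MASMTGGQQMG), not one of the random-sequence proteins.

```python
# Kyte-Doolittle hydropathy index for the 20 natural amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def net_charge(seq: str) -> int:
    """Number of Arg and Lys minus the number of Asp and Glu."""
    return sum(seq.count(a) for a in "RK") - sum(seq.count(a) for a in "DE")

def turn_residues(seq: str) -> int:
    """Number of turn-forming residues (Asp, Asn, Pro, Gly, and Ser)."""
    return sum(seq.count(a) for a in "DNPGS")

def mean_hydropathy(seq: str) -> float:
    """Kyte-Doolittle index averaged over all residues (assumed convention)."""
    return sum(KYTE_DOOLITTLE[a] for a in seq) / len(seq)

t7_tag = "MASMTGGQQMG"
print(net_charge(t7_tag), turn_residues(t7_tag), round(mean_hydropathy(t7_tag), 2))
```

Histidine is not counted as charged here, following the definition in footnote c.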

Yomo's group reported that 20% (5 of 25) of random-sequence proteins with the 20-amino acid alphabet were soluble in E. coli,14 which is inconsistent with our present result: we found no soluble protein among 10 NNN proteins. As the mean values of net charge and hydrophobicity of the random-sequence proteins were almost equal in the two cases, differences in these values cannot explain the difference in solubility. Instead, the difference may be due to the difference in promoter: the tac promoter was used in the previous study,14 whereas the strong T7 promoter was used in this study for expression of NNN proteins as well as RNN and GNN proteins.

Structural characterization of soluble R7 and R16 proteins

Although random-sequence proteins with an alphabet of 5 or 20 amino acids have been structurally characterized in previous reports,14,19 random-sequence proteins with an intermediate alphabet (e.g., 12 amino acids) have not been examined. Even though random-sequence proteins in the RNN library tended to be soluble, they are unlikely to be functional if they form extended random-coil structure. Thus, we purified two soluble random-sequence proteins, R7 and R16, from the RNN library by using the C-terminal His-tag (Fig. 4) and characterized their secondary, tertiary, and oligomeric structures by means of circular dichroism (CD) spectroscopy, 4,4′-dianilino-1,1′-binaphthyl-5,5′-disulfonic acid (bis-ANS) binding studies, and gel filtration, respectively.

Figure 4.

Purification of two soluble proteins from the RNN library. The random-sequence proteins, R7 and R16 with His6 tag, were overexpressed in E. coli, and the soluble fractions of the crude lysate were purified on Ni-NTA resins. The samples before (N, nonpurified) and after purification (P, purified) were resolved by 15% SDS-PAGE and stained with Coomassie brilliant blue. M, marker.

The CD spectra of the two RNN proteins had a strong minimum at 200 nm and a weak minimum at 222 nm in an aqueous solution (Fig. 5), indicating the existence of a substantial proportion of random-coil conformation. However, the spectra measured in the presence of trifluoroethanol (TFE)28 had minima at 208 and 222 nm, which is typical of an α-helical protein (Fig. 5). These results indicated that RNN proteins have the potential to form at least partial secondary structure. Similarly, random-sequence proteins using a 20-amino acid alphabet14 have been reported to take helical structure only in the presence of TFE. Thus, random-sequence proteins using 12- and 20-amino acid alphabets showed similar propensity for secondary-structure formation. On the other hand, QLR proteins18 have helical structure in the absence of TFE, but most QLR proteins tend to aggregate, and addition of GuHCl is necessary for solubilization. Thus, there may be a trade-off between secondary-structure formation and high solubility in the case of random-sequence proteins.

Figure 5.

Circular dichroism spectra of random-sequence proteins R7 (A) and R16 (B). CD spectra of purified proteins were measured at different concentrations of TFE (0, 20, and 40%).

A bis-ANS binding experiment indicated the presence of hydrophobic clusters in the RNN proteins (Fig. 6). Bis-ANS is a fluorescent probe that detects molten globule states of a protein through binding to an accessible hydrophobic core, resulting in enhanced fluorescence emission.29 As the fluorescence change in the presence of R16 was larger than that in the presence of R7 (Fig. 6), R16 may have more extensive hydrophobic clusters than R7.

Figure 6.

Fluorescence spectra of 10 μM bis-ANS (excitation at 393 nm) in the absence or presence of 2 μM R7 protein or 2 μM R16 protein.

Size exclusion chromatography showed that the two RNN proteins form monomeric structures with more compact shapes than the random-coil structures of denatured proteins with similar molecular weight, but more extended shapes than the globular structures of natural proteins (Fig. 7). The Stokes radius of R16 is smaller than that of R7, whereas the molecular weight of R16 is larger than that of R7. This result indicates that R16 forms a more compact shape than R7, and this is consistent with the results of the bis-ANS binding experiment. GNN proteins19 formed monomeric structures with extended shapes, as did RNN proteins, but random-sequence proteins using a 20-amino acid alphabet14 have been reported to form oligomeric structures, owing to their tendency to aggregate.

Figure 7.

Stokes radius plot for R7 and R16 proteins. The vertical arrows indicate the elution volumes determined by gel filtration of purified R7 and R16 proteins. The circles indicate the elution volumes of four control proteins (BSA, 67.0 kDa; ovalbumin, 43.0 kDa; chymotrypsinogen A, 25.0 kDa; and RNase, 13.7 kDa) plotted against their known Stokes radii, and the line represents an empirical equation relating Stokes radius to elution position. The Stokes radius of unfolded protein and folded globular protein (horizontal arrows) was calculated from the literature equation.30

Thus, the two RNN proteins have partial secondary structure and a hydrophobic core, and their estimated sizes indicate that they are more compact than would be expected for expanded random-coil structures. These properties are similar to those reported for other soluble random-sequence proteins with a 5- or 20-amino acid alphabet.14,19 Although the proteins do not exhibit extensive, well-folded structure, a large number of intrinsically unstructured domains, which become structured only during binding to the target (i.e., induced fit), have already been identified in nature31 and designed by protein engineers.8,30,32 From the viewpoint of molecular evolution, such partially structured polypeptides might have been the first evolutionary intermediates, and their function and structure would have coevolved.33

Toward the creation of novel functional proteins with reduced amino acid alphabets

Although the frequency of occurrence of functional proteins in a randomized library has been estimated to be about 1 in 1 × 10¹¹,15,34 the frequency of functional proteins in the sequence space will depend on the choice of structural motif and the level of catalytic activity required for selection. For instance, Taylor et al. calculated the required library size for selection of chorismate mutase activity as 5 × 10²³: this size far exceeds what is currently accessible in laboratory experiments.35 Therefore, the possibility that the choice of an appropriate restricted amino acid alphabet could be a powerful tool in designing randomized libraries for screening is of great interest. Reetz et al. compared the quality of randomized libraries, in which five different sites of epoxide hydrolase were replaced by codon NNK (20 kinds of amino acids) or codon NDT (12 kinds of amino acids; Val, Asp, Gly, Ile, Asn, Ser, Arg, Leu, His, Phe, Tyr, and Cys), and they found that the NDT library showed a much higher frequency of positive variants and yielded a greater improvement of catalytic activity in comparison with the NNK library.36 The amino acid composition of the NDT library is different from that of the RNN library, which has no aromatic residues; however, it is believed that aromatic amino acids did not exist in the early stage of evolution, because they would have been quickly destroyed by ultraviolet irradiation.37 As the RNN library was completely randomized, whereas the NDT library was randomized only for three residues from the active site, the results obtained with the two libraries cannot be directly compared. Moreover, neither Reetz et al. nor we investigated other libraries with a 12-amino acid alphabet. Thus, the best set of amino acids for functional screening still remains to be identified, and further investigation is required.

In this study, we have shown that random-sequence proteins exhibit unique biophysical characters that depend on the amino acid alphabet employed. Random-sequence proteins from the RNN library tended to have higher solubility than those from the NNN library. Because other biophysical properties seemed to be almost the same among the RNN proteins and the NNN proteins, it will be of interest to test whether the frequency of occurrence of functional proteins in a random-sequence library based on the codon RNN, which mainly encodes primitive amino acids, is higher than that in a library based on a 20-amino acid alphabet. Experiments along this line are in progress in our laboratory by using mRNA display for in vitro selection of functional proteins from randomized RNN and NNN libraries. It should be noted that the higher solubility of the RNN proteins in E. coli may be particularly favorable for screening methods based on E. coli expression systems, including phage display, when compared with those based on in vitro translation systems.

Materials and Methods

Construction of random-sequence DNA library

DNAs consisting of 15 consecutive random codons between constant sequences [5′-GGTAGATCTGGACCTGCAGGATGGGNN(GNN, RNN, or NNN)₁₄TGGGCGAGACCGCTCGAGGTTC-3′] were synthesized by Fasmac, Japan. The random-sequence cassettes were amplified by PCR using primers 5′-ATGGCTAGCATGACTGGTGGACAGCAAATGGGTAGATCTGGACCTGCAGGATG-3′ and 5′-TTTTTTTTCTTGTCGTCATCGTCCTTGTAGTCAAGAACCTCGAGCGGTCTCGC-3′ with KOD-plus DNA polymerase (Toyobo). The PCR products were purified with a QIAquick PCR purification kit (Qiagen). The purified DNA was reamplified by PCR using primers 5′-ATTTAGGTGACACTATAGAACAACAACAACAACAAACAACAACAAAATGGCTAGCATGACTGGTGGAC-3′ and 5′-TTTTTTTTCTTGTCGTCATCGTCCTTGTAG-3′. The DNA was purified with a QIAquick PCR purification kit.

mRNA display selection was performed as previously described.20 Briefly, the purified DNA was transcribed with a RiboMax large-scale RNA production system-SP6 (Promega). The resulting RNA was purified with an RNeasy mini kit (Qiagen) and ligated with polyethylene glycol (PEG)-Puro spacer [p(dCp)2-T(Fluor)p-PEGp-(dCp)2-puromycin] using T4 RNA ligase (Takara). The ligated RNA was purified with the RNeasy mini kit. A 40-μL aliquot of a wheat germ extract reaction mixture (Promega) containing 10 pmol of the ligated random-sequence RNA cassette, 80 μM amino acid mixture, 76 mM potassium acetate, and 40 U of RNase inhibitor (Invitrogen) was incubated for 1 h at 26°C. The reaction mixture containing mRNA-displayed proteins (mRNA–protein conjugates) was added to 60 μL of anti-FLAG M2 antibody-immobilized agarose beads (Sigma) preequilibrated with 40 μL of TBST and mixed on a rotator at 4°C for 1 h. The beads were washed with 300 μL of TBST three times. The mRNA-displayed proteins were eluted with TBST containing 1 mg/mL FLAG M2 peptide (Sigma) at 4°C for 1 h. The mRNA portion of eluted mRNA-displayed proteins was amplified by RT-PCR with a One-Step RT-PCR kit (Qiagen) using primers 5′-GGTAGATCTGGACCTGCAGGATG-3′ and 5′-GAACCTCGAGCGGTCTCGC-3′. The RT-PCR products were purified with a QIAquick PCR purification kit.

The purified products were separated into two equal aliquots that were restricted with either BsaI or BfuAI. The resulting fragments were purified by 8% PAGE, ligated together using T4 DNA ligase (Promega), and amplified by PCR using primers 5′-GGTAGATCTGGACCTGCAGGATG-3′ and 5′-GAACCTCGAGCGGTCTCGC-3′. Repeating this procedure three times yielded a final library with eight contiguous random regions.
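The relation between the number of assembly rounds and the final library length can be written out explicitly. The sketch below assumes that each digestion and ligation round joins two copies of the current product end to end, so that the number of random regions doubles per round; this reading of the procedure, and the names used, are ours.

```python
CODONS_PER_CASSETTE = 15  # random codons in each synthesized cassette

def random_regions(rounds: int) -> int:
    """Random regions per molecule after `rounds` of pairwise ligation,
    assuming each round joins two copies of the current product."""
    return 2 ** rounds

for r in range(4):
    n = random_regions(r)
    print(f"after {r} round(s): {n} region(s), {n * CODONS_PER_CASSETTE} random codons")
```

Three rounds give eight regions of 15 codons each, that is, the 120 random codons of the final libraries.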

Cloning, expression, and purification of the random-sequence proteins

Randomly selected clones from the DNA libraries were sequenced with an ABI PRISM 3100 (Applied Biosystems). The random-sequence regions of the clones were digested with BglII and XhoI and subcloned into the pET20 vector (Novagen) containing the N-terminal T7·tag sequence and the C-terminal His6 tag sequence. The individual plasmids were transformed into Escherichia coli BL21(DE3)-CodonPlus cells (Stratagene). The bacteria were grown in LB broth containing 100 μg/mL ampicillin and 40 μg/mL chloramphenicol at 37°C, and protein expression was induced by adding 0.1 mM isopropylthio-β-D-galactoside. After an additional 3 h of growth, the bacteria were harvested by centrifugation and lysed in BugBuster (Novagen) containing a protease inhibitor cocktail (Sigma). The centrifuged supernatants were used as soluble fractions. The pellets were resuspended in a buffer containing 8 M urea, and the supernatants after centrifugation were used as insoluble fractions. The proteins in these fractions were analyzed by 16.5% Tricine SDS-PAGE and detected by Coomassie brilliant blue staining or Western blotting with anti-T7·tag antibody. The proteins in soluble fractions were purified by affinity chromatography using Ni-NTA Superflow resin (Qiagen), from which they were eluted with a pH gradient. The protein molar concentration was determined by a BCA protein assay kit (Pierce).

CD spectroscopy

CD spectra of purified proteins were measured using a J-820 spectropolarimeter (Jasco) at 4°C in the presence of different concentrations of 2,2,2-trifluoroethanol (TFE, Wako). The light pathlength used was 2 mm. The results were expressed as mean residue molar ellipticity [θ].
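For reference, the conversion from an observed ellipticity to mean residue molar ellipticity commonly takes the form sketched below. The formula is the standard textbook conversion and is not stated in this paper; the function name and the numerical inputs (other than the 2-mm pathlength) are hypothetical values chosen only for illustration.

```python
def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Mean residue molar ellipticity in deg cm^2 dmol^-1.

    theta_mdeg : observed ellipticity in millidegrees
    conc_molar : protein concentration in mol/L
    path_cm    : optical pathlength in cm
    n_residues : number of residues in the protein
    """
    return theta_mdeg / (10 * conc_molar * path_cm * n_residues)

# Hypothetical reading: -20 mdeg for a 10 uM sample of a 157-residue protein
# in a 2-mm (0.2-cm) cell.
print(mean_residue_ellipticity(-20.0, 10e-6, 0.2, 157))
```

Some authors divide by the number of peptide bonds (residues minus one) rather than the number of residues; for proteins of this length the difference is under 1%.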

Fluorescence spectroscopy

The fluorescence spectra of 10 μM 4,4′-dianilino-1,1′-binaphthyl-5,5′-disulfonic acid (bis-ANS, Molecular Probes) in the absence and presence of 1 μM protein were measured at 4°C on an FP-777 spectrofluorometer (Jasco) with excitation at 393 nm.

Size exclusion chromatography

Gel filtration experiments on purified proteins were performed using Superdex-75 (Amersham). Molecular weights were determined by linear regression analysis using a Gel Filtration LMW Calibration Kit (Amersham). The Stokes radii of the random-sequence proteins and the control proteins (bovine serum albumin, ovalbumin, chymotrypsinogen, and ribonuclease) were calculated from their elution volumes as previously described.38

Glossary

Abbreviations: aa, amino acid; bis-ANS, 4,4′-dianilino-1,1′-binaphthyl-5,5′-disulfonic acid; BSA, bovine serum albumin; CD, circular dichroism; E. coli, Escherichia coli; LB, Luria-Bertani; mRNA, messenger RNA; Ni-NTA, nickel-nitrilotriacetic acid; PCR, polymerase chain reaction; PEG, polyethylene glycol; RNase, ribonuclease; RT, reverse transcription; SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis; TBS, tris-buffered saline; TFE, trifluoroethanol; TIM, triosephosphate isomerase.

References

1. Mandecki W. The game of chess and searches in protein sequence space. Trends Biotechnol. 1998;16:200–202.
2. Dryden DTF, Thomson AR, White JH. How much of protein sequence space has been explored by life on earth? J R Soc Interface. 2008;5:953–956. doi: 10.1098/rsif.2008.0085.
3. Wong JT. Coevolution theory of the genetic code at age thirty. BioEssays. 2005;27:416–425. doi: 10.1002/bies.20208.
4. Trifonov EN. The triplet code from first principles. J Biomol Struct Dyn. 2004;22:1–11. doi: 10.1080/07391102.2004.10506975.
5. Riddle DS, Santiago JV, Bray-Hall ST, Doshi N, Grantcharova VP, Yi Q, Baker D. Functional rapidly folding proteins from simplified amino acid sequences. Nat Struct Biol. 1997;4:805–809. doi: 10.1038/nsb1097-805.
6. Silverman JA, Balakrishnan R, Harbury PB. Reverse engineering the (α/β)8 barrel fold. Proc Natl Acad Sci USA. 2001;98:3092–3097. doi: 10.1073/pnas.041613598.
7. Akanuma S, Kigawa T, Yokoyama S. Combinatorial mutagenesis to restrict amino acid usage in an enzyme to a reduced set. Proc Natl Acad Sci USA. 2002;99:13549–13553. doi: 10.1073/pnas.222243999.
8. Walter KU, Vamvaca K, Hilvert D. An active enzyme constructed from a 9-amino acid alphabet. J Biol Chem. 2005;280:37742–37746. doi: 10.1074/jbc.M507210200.
9. Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH. Protein design by binary patterning of polar and nonpolar amino acids. Science. 1993;262:1680–1685. doi: 10.1126/science.8259512.
10. Go A, Kim S, Baum J, Hecht MH. Structure and dynamics of de novo proteins from a designed superfamily of 4-helix bundles. Protein Sci. 2008;17:821–832. doi: 10.1110/ps.073377908.
11. Patel SC, Bradley LH, Jinadasa SP, Hecht MH. Cofactor binding and enzymatic activity in an unevolved superfamily of de novo designed 4-helix bundle proteins. Protein Sci. 2009;18:1388–1400. doi: 10.1002/pro.147.
12. Jumawid MT, Takahashi T, Yamazaki T, Ashigai H, Mihara H. Selection and structural analysis of de novo proteins from an α3β3 genetic library. Protein Sci. 2009;18:384–398. doi: 10.1002/pro.41.
13. Mandecki W. A method for construction of long randomized open reading frames and polypeptides. Protein Eng. 1990;3:221–226. doi: 10.1093/protein/3.3.221.
14. Yamauchi A, Yomo T, Tanaka F, Prijambada ID, Ohhashi S, Yamamoto K, Shima Y, Ogasahara K, Yutani K, Kataoka M, Urabe I. Characterization of soluble artificial proteins with random sequences. FEBS Lett. 1998;421:147–151. doi: 10.1016/s0014-5793(97)01552-4.
15. Keefe AD, Szostak JW. Functional proteins from a random-sequence library. Nature. 2001;410:715–718. doi: 10.1038/35070613.
16. Watters AL, Baker D. Searching for folded proteins in vitro and in silico. Eur J Biochem. 2004;271:1615–1622. doi: 10.1111/j.1432-1033.2004.04072.x.
17. Davidson AR, Sauer RT. Folded proteins occur frequently in libraries of random amino acid sequences. Proc Natl Acad Sci USA. 1994;91:2146–2150. doi: 10.1073/pnas.91.6.2146.
18. Davidson AR, Lumb KJ, Sauer RT. Cooperatively folded proteins in random sequence libraries. Nat Struct Biol. 1995;2:856–864. doi: 10.1038/nsb1095-856.
19. Doi N, Kakukawa K, Oishi Y, Yanagawa H. High solubility of random-sequence proteins consisting of five kinds of primitive amino acids. Protein Eng Des Sel. 2005;18:279–284. doi: 10.1093/protein/gzi034.
  • 20.Miyamoto-Sato E, Takashima H, Fuse S, Ishizaka M, Tateyama S, Horisawa K, Sawasaki T, Endo Y, Yanagawa H. Highly stable and efficient mRNA templates for mRNA-protein fusions and C-terminally labeled proteins. Nucleic Acids Res. 2003;31:e78. doi: 10.1093/nar/gng078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Cho G, Keefe AD, Liu R, Wilson DS, Szostak JW. Constructing high complexity synthetic libraries of long ORFs using in vitro selection. J Mol Biol. 2000;297:309–319. doi: 10.1006/jmbi.2000.3571. [DOI] [PubMed] [Google Scholar]
  • 22.Nemoto N, Miyamoto-Sato E, Husimi Y, Yanagawa H. In vitro virus: bonding of mRNA bearing puromycin at the 3′-terminal end to the C-terminal end of its encoded protein on the ribosome in vitro. FEBS Lett. 1997;414:405–408. doi: 10.1016/s0014-5793(97)01026-0. [DOI] [PubMed] [Google Scholar]
  • 23.Roberts RW, Szostak JW. RNA-peptide fusions for the in vitro selection of peptides and proteins. Proc Natl Acad Sci USA. 1997;94:12297–12302. doi: 10.1073/pnas.94.23.12297. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol. 1981;151:389–409. doi: 10.1016/0022-2836(81)90003-6. [DOI] [PubMed] [Google Scholar]
  • 25.Kudla G, Murray AW, Tollervey D, Plotkin JB. Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009;324:255–258. doi: 10.1126/science.1170160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Wilkinson DL, Harrison RG. Predicting the solubility of recombinant proteins in Escherichia coli. Biotechnology. 1991;9:443–448. doi: 10.1038/nbt0591-443. [DOI] [PubMed] [Google Scholar]
  • 27.Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0. [DOI] [PubMed] [Google Scholar]
  • 28. Shiraki K, Nishikawa K, Goto Y. Trifluoroethanol-induced stabilization of the alpha-helical structure of beta-lactoglobulin: implication for non-hierarchical protein folding. J Mol Biol. 1995;245:180–194. doi: 10.1006/jmbi.1994.0015.
  • 29. Stryer L. The interaction of a naphthalene dye with apomyoglobin and apohemoglobin. A fluorescent probe of non-polar binding sites. J Mol Biol. 1965;13:482–495. doi: 10.1016/s0022-2836(65)80111-5.
  • 30. Vamvaca K, Vögeli B, Kast P, Pervushin K, Hilvert D. An enzymatic molten globule: efficient coupling of folding and catalysis. Proc Natl Acad Sci USA. 2004;101:12860–12864. doi: 10.1073/pnas.0404109101.
  • 31. Wright PE, Dyson HJ. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol. 1999;293:321–331. doi: 10.1006/jmbi.1999.3110.
  • 32. Chaput JC, Szostak JW. Evolutionary optimization of a nonbiological ATP binding protein for improved folding stability. Chem Biol. 2004;11:865–874. doi: 10.1016/j.chembiol.2004.04.006.
  • 33. Tokuriki N, Tawfik DS. Protein dynamism and evolvability. Science. 2009;324:203–207. doi: 10.1126/science.1169375.
  • 34. Seelig B, Szostak JW. Selection and evolution of enzymes from a partially randomized non-catalytic scaffold. Nature. 2007;448:828–831. doi: 10.1038/nature06032.
  • 35. Taylor SV, Walter KU, Kast P, Hilvert D. Searching sequence space for protein catalysts. Proc Natl Acad Sci USA. 2001;98:10596–10601. doi: 10.1073/pnas.191159298.
  • 36. Reetz MT, Kahakeaw D, Lohmer R. Addressing the numbers problem in directed evolution. Chembiochem. 2008;9:1797–1804. doi: 10.1002/cbic.200800298.
  • 37. Wong JT. Evolution and mutation of the amino acid code. In: Ricard J, Cornish-Bowden A, editors. Dynamics of biochemical systems. New York: Plenum Press; 1984. pp. 247–257.
  • 38. Uversky VN. Use of fast protein size-exclusion liquid chromatography to study the unfolding of proteins which denature through the molten globule. Biochemistry. 1993;32:13288–13298. doi: 10.1021/bi00211a042.

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
