Abstract
We report a unique case of a gene containing three homologous and contiguous repeat sequences, each of which, after excision, cloning, and expression in Escherichia coli, is shown to code for a peptide catalyzing the same reaction as the native protein, Gonyaulax polyedra luciferase (Mr = 137). This enzyme, which catalyzes the light-emitting oxidation of a linear tetrapyrrole (dinoflagellate luciferin), exhibits no sequence similarities to other luciferases in databases. Sequence analysis also reveals an unusual evolutionary feature of this gene: synonymous substitutions are strongly constrained in the central regions of each of the repeated coding sequences.
Gonyaulax polyedra is a marine photosynthetic dinoflagellate that is often responsible for the bioluminescence of the ocean at night. Mechanical stimulation of the organism is followed by light emission as brief (≈100 msec), bright (≈10⁹ photons) flashes (1) from unique small (≈0.4 μm) spherical organelles called scintillons (2–4), which, though cytoplasmic organelles, project into the vacuole, with the enclosing membrane formed from and contiguous with the vacuolar membrane. They contain only the enzyme (luciferase; LCF), the substrate (luciferin), and a luciferin-binding protein (LBP) (5, 6).
The biochemistry of dinoflagellate bioluminescence is different from that of other luminous organisms (7). The luciferin is an open-chain tetrapyrrole (8), which at cytoplasmic pH is sequestered by the LBP, preventing it from reacting with the LCF (9, 10). LCF-catalyzed oxidation of luciferin by molecular oxygen results in an electronically excited species, which then emits light. No other components are required for light emission. In vivo the flash is postulated to be triggered by a vacuolar action potential, causing a transient pH decrease in the scintillon and releasing the luciferin from LBP (4).
We isolated a full-length Gonyaulax LCF cDNA clone, which upon sequencing was found to contain three homologous repeat sequences each about 1.1 kb long, with no intervening nucleotides. In earlier work it had been found that proteolytic peptides having molecular weights of ≈35 kDa retained near-full activity (11), and that protein expressed from a partial clone had LCF activity (12). We therefore cloned and expressed separately each of the regions of the gene encompassing the repeat sequences. All three were found to be active, and to have at least 25% of the activity of the full-length protein, indicating that there are three active sites in the single polypeptide chain. Sequence comparisons revealed that the more central parts of the repeat units are more highly conserved, as might be expected if they code for active sites. In addition, synonymous (silent) nucleotide substitutions are significantly less frequent in these regions, and trivial explanations for this have been excluded.
MATERIALS AND METHODS
Isolation of the Full-Length lcf cDNA Clone.
A Gonyaulax polyedra cDNA library (λ-ZAP II phagemid) was screened with a fluorescein-labeled (DuPont/NEN) partial lcf cDNA probe. After the first screening, the positive plaques were further identified by PCR using an lcf-specific primer, P1, located at the 5′ end of the reported partial lcf cDNA (pYB143) (12), and primer T3, located at the 5′ side of the cloning site on the vector. Plaques with positive PCR bands were further purified and subsequently excised in vivo according to the manufacturer’s protocol (Stratagene). The clone with the longest insert (4.0 kb) was completely sequenced and named pBS:LCF.
Expression of LCF as Glutathione S-Transferase (GST) Fusion Proteins.
LCF cDNA fragments of various lengths were generated using convenient restriction enzyme digestions and subsequently cloned in-frame into the GST expression plasmid, pGEX:3X (Pharmacia). Escherichia coli cells (JM109) containing GST–LCF expression plasmids were grown overnight at 23°C in 2YT-G medium (Pharmacia) supplemented with 100 μg/ml of ampicillin. The overnight culture was diluted 10-fold in fresh 2YT-G medium supplemented with ampicillin (100 μg/ml) and grown for 3 h. Isopropyl β-D-thiogalactoside was added to a final concentration of 0.1 mM, and the cells were allowed to grow for an additional 3 h before harvest. Extraction, purification, and elution of GST–LCF were performed according to the manufacturer’s protocol (Pharmacia).
LCF Activity Assay.
LCF was assayed as before (12) by rapidly mixing purified GST–LCF and dinoflagellate luciferin with 1.5 ml of 0.1 M sodium citrate (pH 6.3) in a vial placed in the light-tight chamber of a photometer (13).
Sequence Analysis.
The complete lcf cDNA sequence was compared with the current sequence databases (GenBank, EMBL, DDBJ, and PDB) to identify any homologous sequences. For the peptide sequence alignment of the N-terminal regions of LCF and LBP, the BLAST program (23) with the PAM 240 matrix was used.
RESULTS AND DISCUSSION
The complete nucleotide sequence and deduced amino acid sequence for the full-length LCF cDNA clone (pBS:LCF) are shown in Fig. 1. Confirming data obtained from the partial sequence (12), the codon usage in the third position is highly biased (82.5% G or C). A large ORF within the cDNA gives a polypeptide product with a molecular weight of 136,799 Da, consistent with previously reported values of ≈130–135 kDa for LCF (6, 14). The sequences of two LCF peptides previously obtained by trypsin digestion of purified G. polyedra LCF (D. Morse and J.W.H., unpublished work) are found within this ORF (underlined in Fig. 1).
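The third-position bias quoted above (82.5% G or C) is a simple per-codon count. The sketch below shows one way such a figure could be computed from an in-frame coding sequence. The function name `gc3_fraction` and the 8-codon test fragment are hypothetical and are not taken from the lcf sequence.

```python
def gc3_fraction(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence (length divisible by 3).
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA notation
    if not seq or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    third_bases = seq[2::3]  # every third base, starting at codon position 3
    gc_count = sum(base in "GC" for base in third_bases)
    return gc_count / len(third_bases)


# Hypothetical fragment for illustration only (NOT lcf sequence):
# ATC CCC GTC GGC GCC TCC CGC CTA -> 7 of 8 codons end in G or C.
toy_cds = "ATCCCCGTCGGCGCCTCCCGCCTA"
print(f"GC3 = {gc3_fraction(toy_cds):.1%}")
```

Applied to the full ORF of Fig. 1, a count of this kind would be expected to reproduce the reported bias, provided the reading frame is set at the start codon.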
Two homologous regions, 796 and 793 nt long, separated by 332 nt, were previously reported in the coding region of the 2.4-kb partial cDNA of lcf, and a partial lcf cDNA clone, pYB144, containing a 1.6-kb insert, was shown to produce an enzymatically active protein when expressed in E. coli (12). Analysis of the full-length cDNA sequence now reveals a third repeat region and shows that, contrary to what had previously been thought, there are no linker nucleotides between the repeats (Fig. 1). The three tandem repeats comprise more than 90% of the coding region, with 1,131, 1,131, and 1,128 nt for repeats 1, 2, and 3, respectively.
Three cDNA fragments coding for the three individual domains of LCF (D1, D2, and D3) were cloned in frame into the GST expression vectors pGEX:3X or pGEX:2T (Pharmacia). We also cloned the full-length lcf and the pYB144 insert into the same expression vector. The resulting expression constructs are named pLLa (full-length lcf), pLLb (D1), pLLc (pYB144; D3), pLLd (D2), and pLLe (D3). The positions and sizes of the peptides corresponding to each of the lcf cDNA fragments are indicated in Fig. 2A. The identity of each expression construct was confirmed by sequence analysis.
E. coli cells containing the above expression constructs were grown and induced to produce the different fusion proteins, and their LCF activities were determined. The proteins obtained were of the expected sizes and not noticeably degraded (Fig. 2B). All were active in the LCF assays (Fig. 3), indicating that each of the repeat regions of LCF possesses a functional catalytic site for the same biochemical reaction. Expression, isolation, and assays were repeated in three separate sets of experiments for all five constructs, with consistent results; the averages for each of the three runs are shown for each construct.
To our knowledge, this is the first fully documented example of a single polypeptide that contains more than two functional and homologous domains catalyzing the same biochemical reaction. Possible cases include creatine kinase from sperm flagella of the sea urchin (Strongylocentrotus purpuratus), which is a Mr 145 protein having three apparently homologous regions (15); these could be catalytic domains, but their individual activities were not determined. Also, the cDNA of a polysaccharide hydrolase encodes three multi-functional catalytic domains, each with three different enzymatic activities (16), but no sequence data were provided.
One possible advantage of having three catalytic domains in a single molecule of LCF, instead of three LCF molecules with only one catalytic domain each, is a reduction in the number of LCF molecules in a scintillon, and hence in its colloidal osmotic pressure. Dinoflagellate scintillons are known as “dense bodies” (4, 17), containing high concentrations of the protein components required for light emission. The concentrations of LBP and LCF in scintillons, calculated from the values reported by Desjardins and Morse (6), are about 20 mM for LBP and 5 mM for LCF. The values used were 5 mg of LBP and 2 mg of LCF in 10⁸ G. polyedra cells, and 3 μl for the total volume of scintillons in 10⁸ cells (18).
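The concentration estimate can be retraced with a short calculation. The sketch below uses the masses and scintillon volume quoted above and, for LCF, the molecular mass deduced from the ORF (136,799 Da). The molecular mass of LBP is not stated in this text, so the value of 75,000 used here is a placeholder assumption chosen only for illustration.

```python
def molar_concentration_mM(mass_mg: float, molar_mass: float, volume_ul: float) -> float:
    """Convert a protein mass in a given volume to a millimolar concentration."""
    moles = (mass_mg * 1e-3) / molar_mass   # mg -> g, then g / (g/mol)
    litres = volume_ul * 1e-6               # microlitres -> litres
    return moles / litres * 1e3             # mol/l -> mmol/l


SCINTILLON_VOLUME_UL = 3.0        # total scintillon volume per 10^8 cells (from the text)
LCF_MOLAR_MASS = 136_799          # deduced from the ORF (from the text)
LBP_MOLAR_MASS_ASSUMED = 75_000   # ASSUMPTION: placeholder, not given in this text

lcf_mM = molar_concentration_mM(2.0, LCF_MOLAR_MASS, SCINTILLON_VOLUME_UL)
lbp_mM = molar_concentration_mM(5.0, LBP_MOLAR_MASS_ASSUMED, SCINTILLON_VOLUME_UL)

print(f"LCF: {lcf_mM:.1f} mM")
print(f"LBP: {lbp_mM:.1f} mM (depends on the assumed molecular mass)")
```

With these inputs the LCF figure comes out close to the 5 mM quoted above. The LBP figure scales inversely with whatever molecular mass is supplied, so it should be read as an order-of-magnitude check only.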
Searches in the current databases revealed no sequences that are significantly homologous to lcf cDNA, except for its N-terminal region (amino acids 4–103), which has 50% identity with the N-terminal domain of LBP (19) (amino acids 6–105; Fig. 4A). This suggests that the N-terminal regions of LCF and LBP serve a similar function. The possibility that this constitutes a signal peptide is not supported by an evaluation of hydrophobicity in relation to sequence and, in fact, LCF and LBP do not appear to transit a membrane in the formation of the scintillons (4). Also, it seems unlikely that this sequence is involved in substrate binding or the light-emitting reaction, because, as shown in Fig. 3 and earlier (12), LCF peptides synthesized from clones lacking this domain are capable of catalyzing light emission in the presence of luciferin. It is possible that this region is involved in the interaction of the two proteins, or in their association with the vacuolar membranes. Immunolocalization studies have shown that LBP and LCF aggregate in the cytoplasm before migrating to the vacuolar membrane to form scintillons (4). It is also possible that the N-terminal domain harbors a signal for degradation. Both LBP and LCF, which are synthesized during the early night phase, are indeed rapidly degraded at the end of the night phase (20), along with the scintillons (18), under the control of the circadian clock.
Exon recombination (21) is a probable mechanism for the existence of homologous N-terminal regions in LCF and LBP. Such a mechanism is generally considered to require the presence of introns (22), but sequencing of the corresponding genomic DNA indicates that this LCF gene, like lbp (19), contains no introns. It is possible that this gene had introns at an earlier time, but that these have been lost during evolution.
Indeed, certain structural features of the gene are unique with regard to its evolutionary status. The aligned amino acid sequences of the three repeat domains (D1, D2, and D3) are shown in Fig. 4B. Considering the full length of the repeats, the sequence identities are 75% for D1 and D2, 75% for D1 and D3, and 80% for D2 and D3, whereas the corresponding nucleotide identities are 74%, 74%, and 78%, respectively. The boundary regions, though clearly homologous, possess numerous amino acid differences, indicating that the triplication is not recent. In contrast, the nucleotide sequences are highly conserved in the central regions of the three repeats, which represent 39% of the domains. In a 437-nt stretch (nucleotides 742–1,179, 1,873–2,310, and 3,001–3,438) the sequences are 92% identical among the three regions. This is consistent with the presumption that the central region codes for the catalytic active site.
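Pairwise identities such as those quoted above are computed over aligned sequences. The sketch below shows one common convention, counting matches over columns in which neither sequence has a gap. Whether the figures in this paper were computed with exactly this denominator is not stated, so the convention is an assumption, and the two short aligned strings are hypothetical rather than lcf sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments for illustration only (NOT lcf sequence).
toy_a = "GATTACA-GT"
toy_b = "GACTACAAGT"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

The same function works for amino acid or nucleotide alignments, so it could be applied both to the full repeats and to the central stretch alone to contrast the two regions.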
However, synonymous substitutions in this highly conserved region, which would not alter amino acids, are also constrained. As shown in Table 1, there are 130 conserved amino acids in the central region and, for the repeats, 106 of them have identical codons (82%), whereas in the boundary region only 53 of the 126 conserved amino acids (42%) have identical codons. The corresponding synonymous substitution rate is 8% for the central region (25 synonymous substitutions out of a possible 310) and 30% for the boundary region (87 synonymous substitutions out of a possible 286). This nearly 4-fold difference cannot be attributed to a bias in the amino acid usage because the amino acid composition is similar in the two regions (Table 1). Codon bias is also not the factor responsible, because the conserved amino acids that have identical codons in the boundary regions use the biased codon in almost all cases (one CTG is used for leucine and one AGC is used for serine), while many other unbiased codons are used in the central region (Table 1). To our knowledge, this is the first example of such nucleotide conservation in sequences that code for protein.
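The percentages and the fold difference in this paragraph follow directly from the counts given. The sketch below simply redoes that arithmetic; the variable names are illustrative, and all numbers are the ones quoted in the text.

```python
# Counts quoted in the text for the two regions of the repeats.
central = {"conserved_aa": 130, "identical_codon": 106, "syn_subs": 25, "syn_possible": 310}
boundary = {"conserved_aa": 126, "identical_codon": 53, "syn_subs": 87, "syn_possible": 286}


def identical_codon_fraction(region: dict) -> float:
    """Fraction of conserved amino acids encoded by the same codon in all three repeats."""
    return region["identical_codon"] / region["conserved_aa"]


def synonymous_rate(region: dict) -> float:
    """Observed synonymous substitutions divided by the possible number."""
    return region["syn_subs"] / region["syn_possible"]


central_rate = synonymous_rate(central)
boundary_rate = synonymous_rate(boundary)
fold_difference = boundary_rate / central_rate

print(f"identical codons: central {identical_codon_fraction(central):.0%}, "
      f"boundary {identical_codon_fraction(boundary):.0%}")
print(f"synonymous rate:  central {central_rate:.0%}, boundary {boundary_rate:.0%}")
print(f"boundary/central ratio: {fold_difference:.2f}")
```

The ratio comes out a little under 4, consistent with the "nearly 4-fold" description.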
Table 1.
Amino acid (favored codon) | Central region: total | Central region: with identical codon | Central region: % | Boundary region: total | Boundary region: with identical codon | Boundary region: % |
---|---|---|---|---|---|---|
I (AUC) | 6 | 4 (AUC 4) | 67 | 3 | 0 | 0 |
P (CCC) | 10 | 8 (CCC 4, CCG 2, CCU 1, CCA 1) | 80 | 12 | 1 (CCC 1) | 8 |
V (GUC) | 8 | 8 (GUC 5, GUG 3) | 100 | 6 | 1 (GUC 1) | 17 |
G (GGC) | 19 | 15 (GGC 14, GGG 1) | 79 | 17 | 8 (GGC 8) | 47 |
A (GCC) | 10 | 8 (GCC 5, GCG 2, GCU 1) | 80 | 7 | 2 (GCC 2) | 29 |
S (UCC) | 5 | 5 (UCC 4, UCU 1) | 100 | 4 | 2 (UCC 1, AGC 1) | 50 |
R (CGC) | 6 | 5 (CGC 3, CGG 1, CGU 1) | 83 | 6 | 3 (CGC 3) | 50 |
L (CUC) | 11 | 7 (CUC 4, CUG 3) | 64 | 13 | 2 (CUC 1, CUG 1) | 15 |
All | 130 | 106 | 82 | 126 | 53 | 42 |
Synonymous substitution constraint in the central region of the repeats. The conserved amino acids are defined as those amino acids that are identical in the three repeats. The totals of conserved amino acids (130 for the central region and 126 for the boundary region) do not include methionine and tryptophan. Amino acids that use 2-fold degenerate codons are not shown but are included in the calculation.
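The percentage columns of Table 1 can be rechecked from the counts, and the "All" row can be split into the amino acids listed and those not shown. The sketch below does both; the row tuples are transcribed from the table, and the derived "not shown" counts are a simple subtraction rather than values reported by the authors.

```python
# (amino acid, central total, central identical, boundary total, boundary identical)
ROWS = [
    ("I", 6, 4, 3, 0),
    ("P", 10, 8, 12, 1),
    ("V", 8, 8, 6, 1),
    ("G", 19, 15, 17, 8),
    ("A", 10, 8, 7, 2),
    ("S", 5, 5, 4, 2),
    ("R", 6, 5, 6, 3),
    ("L", 11, 7, 13, 2),
]
ALL_ROW = (130, 106, 126, 53)  # the "All" row of Table 1


def pct(part: int, total: int) -> int:
    """Percentage rounded to the nearest integer, as printed in the table."""
    return round(100 * part / total)


# Recompute the two percentage columns for each listed amino acid.
percent_table = {aa: (pct(ci, ct), pct(bi, bt)) for aa, ct, ci, bt, bi in ROWS}

# Column sums over the listed rows, and the remainder attributable to
# amino acids not shown in the table (derived here by subtraction).
shown = tuple(sum(column) for column in zip(*(row[1:] for row in ROWS)))
not_shown = tuple(total - part for total, part in zip(ALL_ROW, shown))

for aa, (central_pct, boundary_pct) in percent_table.items():
    print(f"{aa}: central {central_pct}%, boundary {boundary_pct}%")
print("listed rows:", shown)
print("not shown  :", not_shown)
```

Among the listed amino acids alone, the fraction with identical codons is 60 of 75 in the central region and 19 of 68 in the boundary region, so the contrast described in the text is, if anything, sharper in the rows shown than in the overall totals.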
Possible explanations for such unusual constraint in synonymous substitutions in the central region of the repeats include (i) the duplication events were recent, but conditions were such that there were unusually rapid base substitutions in the boundary regions of the repeats; (ii) the duplication events were not recent, but lcf RNA (or DNA) serves some function other than coding for LCF, which has restrained silent nucleotide substitutions; and (iii) the homogeneity of nucleotide sequences in the central regions among the three repeats was generated subsequent to the duplication events by a process similar to gene conversion (21). This last possibility is favored because there are at least two, and probably many, copies of this gene (L.L. and J.W.H., unpublished work), as has also been reported for lbp (19).
Acknowledgments
We thank Dr. Thérèse Wilson for extensive discussions of the data and assistance in preparing the manuscript and Drs. R. C. Lewontin, M. Y. Long, R. Milkman, and D. Hartl for helpful comments. This research was supported in part by research grants from the National Science Foundation (MCB 9306879 and 9631935) and the Office of Naval Research (N00014-94-1-0575 and N00014-96-1-1118).
ABBREVIATIONS
- LCF: luciferase
- LBP: luciferin-binding protein
- GST: glutathione S-transferase
References
- 1. Krasnow R, Dunlap J, Taylor W, Hastings J W, Vetterling W, Gooch V D. J Comp Physiol. 1980;138:19–26.
- 2. Fogel M, Hastings J W. Proc Natl Acad Sci USA. 1972;69:690–693. doi: 10.1073/pnas.69.3.690.
- 3. Johnson C H, Inoue S, Flint A, Hastings J W. J Cell Biol. 1985;100:1435–1446. doi: 10.1083/jcb.100.5.1435.
- 4. Nicolas M-T, Nicolas G, Johnson C H, Bassot J-M, Hastings J W. J Cell Biol. 1987;105:723–735. doi: 10.1083/jcb.105.2.723.
- 5. Nicolas M-T, Morse D, Bassot J-M, Hastings J W. Protoplasma. 1991;160:159–166.
- 6. Desjardins M, Morse D. Biochem Cell Biol. 1993;71:176–182. doi: 10.1139/o93-028.
- 7. Hastings J W. Gene. 1996;173:5–11. doi: 10.1016/0378-1119(95)00676-1.
- 8. Nakamura H, Kishi Y, Shimomura O, Morse D, Hastings J W. J Am Chem Soc. 1989;111:7607–7611.
- 9. Fogel M, Hastings J W. Arch Biochem Biophys. 1971;142:310–321. doi: 10.1016/0003-9861(71)90289-x.
- 10. Morse D, Pappenheimer A M, Hastings J W. J Biol Chem. 1989;264:11822–11826.
- 11. Krieger N, Njus D, Hastings J W. Biochemistry. 1974;13:2871–2877. doi: 10.1021/bi00711a015.
- 12. Bae Y M, Hastings J W. Biochim Biophys Acta. 1994;1219:449–456. doi: 10.1016/0167-4781(94)90071-x.
- 13. Mitchell G, Hastings J W. Anal Biochem. 1971;39:243–250. doi: 10.1016/0003-2697(71)90481-7.
- 14. Dunlap J, Hastings J W. J Biol Chem. 1981;256:10509–10518.
- 15. Wothe D D, Charbonneau H, Shapiro B M. Proc Natl Acad Sci USA. 1990;87:5203–5207. doi: 10.1073/pnas.87.13.5203.
- 16. Xue G-P, Gobius K S, Orpin C G. J Gen Microbiol. 1992;138:2397–2403. doi: 10.1099/00221287-138-11-2397.
- 17. Fogel M, Schmitter R, Hastings J W. J Cell Sci. 1972;11:305–317. doi: 10.1242/jcs.11.1.305.
- 18. Fritz L, Morse D, Hastings J W. J Cell Sci. 1990;95:321–328. doi: 10.1242/jcs.95.2.321.
- 19. Lee D-H, Mittag M, Sczekan S, Morse D, Hastings J W. J Biol Chem. 1993;268:8842–8850.
- 20. Morse D, Milos P M, Roux E, Hastings J W. Proc Natl Acad Sci USA. 1989;86:172–176. doi: 10.1073/pnas.86.1.172.
- 21. Li W-H, Graur D. Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer Associates; 1991.
- 22. Gilbert W. Nature (London) 1978;271:501. doi: 10.1038/271501a0.
- 23. Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.