Abstract
The crystal structure of a recombinant αEC domain from human fibrinogen-420 has been determined at a resolution of 2.1 Å. The protein, which corresponds to the carboxyl domain of the αE chain, was expressed in and purified from Pichia pastoris cells. Felicitously, during crystallization an amino-terminal segment was removed, apparently by a contaminating protease, allowing the remaining 201-residue parent body to crystallize. The x-ray structure was determined by molecular replacement. The electron density was clearly defined, partly as a result of averaging made possible by the presence of eight molecules in the asymmetric unit (space group P1) related by noncrystallographic symmetry. Virtually all of an asparagine-linked sugar cluster is present. Comparison with structures of the β- and γ-chain carboxyl domains of human fibrinogen revealed that the binding cleft is essentially neutral and should not bind Gly-Pro-Arg or Gly-His-Arg peptides of the sort bound by those other domains. Nonetheless, the cleft is clearly evident, and the possibility of binding a carbohydrate ligand like sialic acid has been considered.
Keywords: x-ray structure/blood clotting/protein evolution
Fibrinogen is a protein found in the blood plasma that is the precursor of the fibrin clot. In all vertebrates examined, the protein is composed of two copies each of α, β, and γ subunits. The β and γ chains are homologous throughout their full lengths, but ordinarily α chains are homologous with β and γ chains over the course of their first 200 residues only, their carboxyl-terminal regions having been displaced by some kind of genetic exchange at an early point in vertebrate evolution (1). There are minor forms of fibrinogen, however, in which α chains have carboxyl-terminal domains that are homologous to those found at the carboxyl termini of β and γ chains. In humans, this minor form is referred to as fibrinogen-420, a reflection of an increase in molecular weight from 340 kDa to 420 kDa as a function of the alternative extended α (αE) chains (2).
These alternative forms of fibrinogen were first brought to light by the discovery of a bipartite messenger RNA for the α chain in chickens (3). Thus, a cDNA was isolated that encoded the major form of the α chain but also included a second ORF downstream; the second ORF was clearly homologous to the carboxyl domains of β and γ chains. A comparable situation was found to exist in human α-chain transcripts, and it was shown that alternative splicing led to an αE chain; the larger molecular weight fibrinogen containing these αE chains was found to be synthesized by hepatoma cells (4). Meanwhile, a similar protein was uncovered in lampreys, except that the major and minor forms were encoded by separate genes (5). Sequence-based phylogenetic trees showed that the minor form αEC domains diverged only slightly before the divergence of β and γ chain domains (5).§
The fact that in lamprey the two genes encode different amino-terminal sequences made it possible to measure the minor form in plasma simply by quantifying its unique fibrinopeptide; the minor form accounted for 5–10% of the circulating fibrinogen molecules (7). In human plasma, fibrinogen-420 was found to account for ≈1–2% of all fibrinogen molecules (2). The function of these minor forms of fibrinogen remains unknown, although in humans the relative concentration is significantly elevated in cord blood (8).
In lampreys, incorporation of the alternative form into fibrin is rapid, and the carboxyl domain has been implicated in fibrin crosslinking (9). In contrast, the human equivalent, in the form of a recombinant domain, is not crosslinked by factor XIII (D.A., unpublished data). In lampreys, the intact domain is easily removed from fibrinogen by light trypsin action, but the isolated domain does not bind to Gly-Pro-Arg columns the way γ-chain domains do (9). The corresponding recombinant αEC domain from humans does not bind to these columns either (D.A. and G.G., unpublished data).
The successful expression of relatively large amounts of the recombinant human protein in the yeast Pichia pastoris (D.A. and G.G., unpublished data) made it feasible to consider crystallization. It was hoped that a crystal structure might shed light both on the function of the αEC domain in particular and the evolution of fibrinogen in general, especially in light of recently published structures of the γC domain (10) and fragment D (11) from human fibrinogen.
EXPERIMENTAL PROCEDURES
Expression and Purification.
The recombinant domain, designated rαEC, was expressed in the yeast P. pastoris and purified by chromatography on a mono-Q column (Pharmacia). Full details and characterization are described elsewhere (D.A. and G.G., unpublished data). Briefly put, a segment encoding residues Val-610–Gln-847 of the human αE chain was inserted into a pPIC9 vector and expressed according to the manufacturer’s protocol (Invitrogen). The purified product was homogeneous on SDS gels. Amino-terminal sequencing revealed six residues of the vector upstream of Val-610, increasing the calculated mass of the fusion protein to 27,653. Additional mass is contributed by a cluster of carbohydrate (2). The final product was dialyzed against 0.02 M Tris buffer, pH 7.5 and shipped to San Diego for crystallization.
Crystallography.
Crystals were obtained by vapor diffusion from sitting drops at room temperature. A wide range of conditions was explored, including a partial factorial approach with precipitants from a Hampton Research screening kit (12). After numerous attempts, the final conditions used in the well were: 20% polyethylene glycol 3350, 0.15 M CaCl2, 0.1 M imidazole-acetate (pH 5.5), and 0.002 M sodium azide; the concentration of the protein solution was 8 mg/ml in 0.02 M Tris, pH 7.5. Drops were set with 5 μl of well solution and 5 μl of protein solution. These drops were streak-seeded from a long series of seedings with low-quality crystals, including needle clusters, that had grown spontaneously. Some of these serial seedings were conducted only after drops were set and preincubated for 7–10 days. In the end, a number of good, small crystals and two large crystals were obtained (Fig. 1).
Data were collected at Brookhaven National Synchrotron Light Source beamline X12C. The crystals diffracted to a minimum Bragg spacing of better than 2.1 Å at room temperature (Table 1). Analysis of the data with Denzo and Scalepack (13) showed that the crystals belong to space group P1 with unit cell dimensions a = 71.40 Å, b = 105.71 Å, c = 71.35 Å, α = 104.63°, β = 108.91°, and γ = 71.53°, although there was an aspect of pseudo-symmetry reminiscent of the space group C2. When the postrefined unit cell dimensions were analyzed by the Collaborative Computing Project No. 4 program tracer (14), no higher symmetry cell compatible with the data was revealed. Assuming a solvent content of 50%, the unit cell would contain eight molecules. A native Patterson synthesis of the reduced data showed a large peak (50% of origin) at x = 0.5, y = 0.0, and z = 0.0 (in fractional coordinates). The Patterson suggested that only four unique orientations needed to be derived, the other four molecules being related to the first set by a translation of one-half the unit cell in the x direction. Self-rotation functions produced with polarrfn (14) with an integration radius of 20 Å on all data between 8.0 and 4.0 Å showed large two-fold peaks around the x, y, and z axes.
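The estimate of eight molecules per cell can be checked with a short calculation. The sketch below computes the triclinic cell volume from the reported cell dimensions and then a Matthews-style solvent fraction. The molecular mass of 23,000 Da is an assumption made for illustration (a rough figure for a 201-residue glycosylated fragment, not a value given in the text), and 1.23 is the conventional constant used in this kind of estimate.

```python
import math


def triclinic_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume (cubic angstroms) from edges (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def solvent_fraction(volume, n_molecules, mass_da):
    """Approximate solvent fraction from the Matthews coefficient Vm = V / (Z * M)."""
    vm = volume / (n_molecules * mass_da)
    return 1.0 - 1.23 / vm


# Cell dimensions as reported for the P1 crystals.
CELL = (71.40, 105.71, 71.35, 104.63, 108.91, 71.53)

# ASSUMPTION: approximate mass of one crystallized molecule, for illustration only.
ASSUMED_MASS_DA = 23000.0

volume = triclinic_volume(*CELL)
# In P1 there is one asymmetric unit per cell, so Z equals molecules per asymmetric unit.
fraction = solvent_fraction(volume, 8, ASSUMED_MASS_DA)
print(f"cell volume = {volume:.0f} A^3, solvent fraction = {fraction:.2f}")
```

With this assumed mass, eight molecules per cell give a solvent fraction close to one-half, in line with the assumption stated above; a different mass would shift the figure accordingly.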
Table 1.
Data statistics | |
Space group | P1 |
Unit cell dimensions | a = 71.40 Å, b = 105.71 Å, c = 71.35 Å, α = 104.63°, β = 108.91°, γ = 71.53° |
Molecules/asym. unit | 8 |
No. of crystals | 3 |
Resolution, Å | 2.1 |
Observations, N | 1,169,328 |
Unique reflections, N | 110,459 |
Mean redundancy | 5.2 |
Completeness, % | 91.2 |
Rsym (I), % | 9.8 |
Refinement statistics | |
Resolution range, Å | 20.0–2.1 |
No. of model atoms | 12,896 |
No. water molecules | 200 |
R-value† | 0.195 |
Free R-value‡ | 0.255 |
rmsd from ideals | |
Bond length, Å | 0.013 |
Bond angle, ° | 2.5 |
Average B value, Å² | 33.3 |
Rsym = (Σ|I − ⟨I⟩|)/(Σ|I|).
†Crystallographic R-value = (Σ||F(obs)| − |F(calc)||)/(Σ|F(obs)|), calculated with the 95% of the native data used for refinement.
‡Free R-value: R-value based on the 5% of the native data withheld from refinement.
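As an illustration of the R-value definitions in the footnotes to Table 1, the following minimal sketch computes a working and a free R-value from lists of observed and calculated structure-factor amplitudes. The amplitudes and the free-set flags are made-up toy values, and the function names are hypothetical; this is not the refmac implementation.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the reflections given."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def work_and_free_r(f_obs, f_calc, free_flags):
    """Split reflections by flag and return (working R, free R)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# TOY DATA (hypothetical): four reflections, the last one withheld as the free set.
F_OBS = [100.0, 50.0, 25.0, 80.0]
F_CALC = [90.0, 55.0, 20.0, 85.0]
FREE_FLAGS = [False, False, False, True]

r_work, r_free = work_and_free_r(F_OBS, F_CALC, FREE_FLAGS)
print(f"R(work) = {r_work:.4f}, R(free) = {r_free:.4f}")
```

In a real refinement the free set would be a randomly chosen 5% of reflections that is never used to adjust the model, which is what makes the free R-value a cross-check on overfitting.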
Molecular replacement was carried out with xplor (15) with the coordinates of a γC domain taken from a 2.3 Å structure of crosslinked fragments D (16) as a search model. A rotation function followed by PC (Patterson Correlation) refinement (17) on all data between 8.0 and 4.0 Å (integration radius 20 Å) produced four peaks, each related by the self rotation angles. The correlation coefficients of the four peaks were 11.8%, 9.7%, 7.6%, and 7.7%. The highest rotational solution was then used to fix the origin of the P1 cell; the other three solutions were searched over the whole unit cell, producing solutions with correlation coefficients of 18.3%, 14.7%, and 14.5%, at 9.7, 6.5, and 6.5 SD above the mean, respectively. A combination of all four solutions together produced a correlation coefficient of 23.2% and an R factor of 0.552. The four molecules were then translated by one-half a unit cell in the x direction to account for the full volume in the unit cell, and rigid body refinement carried out on the resulting eight molecules. This resulted in a correlation coefficient of 35% and an R factor of 0.50 on all data between 8.0 and 2.3 Å.
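The step in which the four placed molecules are copied by a half-cell translation along x can be sketched as follows. The code builds a fractional-to-Cartesian matrix for the triclinic cell, using the common convention that places the a axis along Cartesian x, and converts the fractional shift (0.5, 0, 0) to ångströms. The convention and the helper names are assumptions for illustration; xplor's internal handling is not described in the text.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional-to-Cartesian matrix (a along x, b in the xy plane)."""
    ca = math.cos(math.radians(alpha))
    cb = math.cos(math.radians(beta))
    cg = math.cos(math.radians(gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(matrix, frac):
    """Multiply the 3x3 matrix by a fractional coordinate or shift vector."""
    return tuple(sum(matrix[i][j] * frac[j] for j in range(3)) for i in range(3))


def translate(coords, shift):
    """Apply a Cartesian shift to a list of (x, y, z) atom positions."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]


CELL = (71.40, 105.71, 71.35, 104.63, 108.91, 71.53)
MATRIX = orthogonalization_matrix(*CELL)

# Half a unit cell along x in fractional coordinates, as indicated by the Patterson peak.
HALF_CELL_SHIFT = frac_to_cart(MATRIX, (0.5, 0.0, 0.0))

# HYPOTHETICAL atom positions standing in for one of the four placed molecules.
molecule = [(10.0, 20.0, 30.0), (11.5, 20.0, 30.0)]
copy = translate(molecule, HALF_CELL_SHIFT)
print(HALF_CELL_SHIFT, copy[0])
```

Under this convention the shift is simply half the a edge along Cartesian x, so each of the four placed molecules yields one translated copy, giving eight in all.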
An analysis of packing of the eight molecules revealed that the first 40–50 residues of each γ domain molecule were clashing severely. Accordingly, the first 50 residues were removed from the model, and rigid body refinement was carried out once more; this time the correlation coefficient rose to 43% and the R factor dropped to 0.47. An initial model was built based on residues 190–390 of the γC domain noted above. Sequence differences between the γ and αE domains were represented by alanines, and loops containing deletions or insertions were omitted. Structure factors for the model were calculated using sfall (14); figures of merit for the phases were calculated by the sigmaa-weighting technique of Read (18). The resulting phases were improved by density modification incorporating the 8-fold averaging, histogram matching, and solvent flattening with the program dm (14). The resultant map showed unambiguous electron density for almost all loops, side chains, and glycosylated residues not included in the initial model, indicating the correctness of the molecular replacement solution (Fig. 2). The model was rebuilt (19) with correct sidechains, 198 of the 201 residues in all, and refined with refmac (20). Initially, strong noncrystallographic restraints were applied, as well as phase restraints with the starting DM phases. These were gradually removed as the model improved. In the end, the working R factor dropped to 0.195 and the free R factor to 0.255.
RESULTS AND DISCUSSION
Crystals.
The fact that the first 40–50 residues were missing from the electron density map forced us to compare the chemical makeup of the crystals with the starting protein solutions. It was a fortuitous event that led to long incubations at room temperature before streak-seeding because during that period a 37-residue subdomain (plus six residues from the vector) was proteolytically removed from the starting material. Thus, when crystals were removed from drops, washed with well solution, and examined on SDS gels, the material was significantly smaller (data not shown). Moreover, when harvested crystals themselves were redissolved and examined by amino-terminal sequencing (University of California, San Diego, Sequencing Facility), the sequence began with ETSLGGWLLI, which corresponds to αE residues 647–656, indicating that 37 residues of the construct and the six-amino acid residuum of the vector had been removed by incipient proteolysis.
Structure.
As expected, the structure of the αEC domain closely resembles those of the β and γ domains (Fig. 3). The rms deviation (rmsd) for Cα atoms from the αEC and γC domains is 1.26 Å (192 residues compared), and that of the αEC and the βC domains is 0.96 Å (193 residues compared). The rmsd for β–γ comparison over the same region is 1.06 Å (189 residues compared). Thus, in a structural sense, the three domains are just about equally different. Some other expected features included the calcium bound at residues 772–778, as predicted (D.A. and G.G., unpublished data), and the carbohydrate cluster at Asn-667 (6). Also as anticipated (6), there is no evidence for carbohydrate at Asn-812, even though there is a serine two residues further along. The absence of carbohydrate is consistent with a recent report (21) that glycosylation does not occur effectively when the residue after the obligatory serine or threonine is a proline. In the present case, it may be significant, also, that the proline is in the cis conformation. The loop region that includes this Asn-Asn-Ser-Pro constellation (res. 812–815) is somewhat disordered.
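For readers unfamiliar with the measure, the rmsd quoted for Cα atoms is the square root of the mean squared distance between equivalent atoms. The sketch below assumes the two coordinate sets have already been superimposed and paired residue by residue; the superposition procedure is not described in the text and is not reproduced here. The coordinates are toy values.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superimposed (x, y, z) positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# TOY DATA (hypothetical): two paired "C-alpha" positions, each displaced by 1 angstrom.
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
domain_b = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
print(ca_rmsd(domain_a, domain_b))
```

Because the value depends on which residues are paired, the number of residues compared is reported alongside each rmsd.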
The most interesting feature of the structure is doubtless the binding cleft. As expected on the basis of sequence comparisons, the cleft cannot possibly bind positively charged peptides of the sort exemplified by Gly-Pro-Arg or Gly-His-Arg. Nonetheless, the cleft remains well defined (Fig. 4) and, as discussed below, is likely functional.
Evolution.
It has long been supposed that the vertebrate fibrinogen molecule was assembled during evolution from two principal components found in other animal proteins: a three-stranded α-helical-coiled coil and a globular carboxyl-terminal domain. From the beginning, it was postulated that the prototypic molecule was a homotrimer or homohexamer (1). That the major type of α chain is homologous to the β and γ chains over the course of its first third only was attributed to the original carboxyl-terminal domain of the chain having been lost in some kind of genetic and evolutionary exchange (1), doubtless a part of the specialization and differentiation that accompanied the evolution of fibrin formation. All other known proteins containing three-stranded coiled coils are homotrimers or homohexamers (22). One natural advantage of homopolymers is the amplification that accrues with weak binding for repetitive ligands of the sort that occur on many microbial surfaces (23). As such, the prototypic fibrinogen may have evolved from entities involved in host defense. Whatever the case, all vertebrate fibrinogens are now dimers of heterotrimers, the three constituent chains of which have differentiated greatly from their common ancestor.
Previous sequence analysis had made it clear that the αEC domain is not only homologous to the β and γ carboxyl-terminal domains of vertebrate fibrinogen and other proteins (4) but that it also is more closely related to those two fibrinogen domains than are any other known fibrinogen-related structures (5). In a phylogenetic sense, then, the αE represents a good candidate for a component of the prototypic fibrinogen. Arguments based on function make an even more compelling case that αE chains may represent the prototype α-chain.
Function.
That the ancient α-chain carboxyl domain has survived throughout the vertebrate radiation is testimony to its being essential, and yet the function of the extended fibrinogen α chain and the role of its carboxyl-terminal domain remain mysterious. Among the possibilities, however, are binding to particular cell types or other macromolecules. It is significant that the majority of other animal proteins that contain fibrinogen-related carboxyl domains are involved in cell adhesion (4, 24). This observation is wholly consistent with suggestions that fibrinogen evolved from a lectin-like molecule (10, 22, 24). Indeed, there is circumstantial evidence that the potential binding cleft on the αEC domain may be suitable for binding some kind of sugar. In this regard, of all the fibrinogen-related domains reported, only the βC and γC domains are known to bind peptides. The key residues in ligand binding for the β and γ chains include β397–398 (11) and γ329–330 (25), which are Glu-Asp and Gln-Asp, respectively. The corresponding residues in αEC are a Val and Tyr at αE783–784. Exactly the same residues occur in a sialic acid-binding lectin found in an invertebrate, the slug Limax flavus (26). Coincidentally, also, the αEC domains have a gap at the very same place as does the slug lectin (Fig. 5), a deletion which involves a significant shortening of one of the loops adjacent to the binding pocket.
Given these similarities, we would cautiously suggest that the most likely role of the αEC domain involves carbohydrate binding of some sort. Whether such binding would involve other factors or cells involved in clotting or the extracellular matrix remains to be determined. It will be of great interest to find if rαEC binds sialic acid or other sugars.
The αEC domain also contains a 10-residue segment (αE809–818), the sequence of which is unchanged in all those mammals examined and which appears unique among the fibrinogen-related domains (Fig. 5). The segment forms a well defined loop that could easily be involved in binding to other macromolecules. The αEC domain is significantly more electronegative than the βC and γC domains (Table 2), a feature that may bear on its attachments. The electronegative character is mainly attributable to a dearth of lysine sidechains, rather than to a build-up of negative charges.
Table 2.
| | γC (res. 152–411) | βC (210–461) | αEC (612–847) |
---|---|---|---|
Amino acids | | | |
Aspartate | 21 | 18 | 15 |
Glutamate | 10 | 13 | 20 |
Histidine | 7 | 4 | 4 |
Lysine | 20 | 17 | 4 |
Arginine | 6 | 13 | 12 |
Net charge* | −3.6 | −0.2 | −18.2 |
*Net charge at pH 7.3, presuming pK for histidines = 6.5.
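The trend in Table 2 can be illustrated with a simple tally. The sketch below treats aspartate and glutamate as fully ionized, lysine and arginine as fully protonated, and histidine as partly protonated according to the Henderson-Hasselbalch relation at pH 7.3 with pK 6.5. These simplifications are assumptions: the exact procedure behind the tabulated net charges is not stated, and this tally comes out a few tenths of a unit more negative than the tabulated values, presumably because of terms it ignores.

```python
def protonated_fraction(ph, pk):
    """Fraction of a titratable group carrying its proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pk))


def net_charge(asp, glu, his, lys, arg, ph=7.3, his_pk=6.5):
    """Simplified net charge: full charges on Asp, Glu, Lys, Arg; partial on His."""
    positive = lys + arg + his * protonated_fraction(ph, his_pk)
    negative = asp + glu
    return positive - negative


# Residue counts (Asp, Glu, His, Lys, Arg) copied from Table 2.
COUNTS = {
    "gammaC": (21, 10, 7, 20, 6),
    "betaC": (18, 13, 4, 17, 13),
    "alphaEC": (15, 20, 4, 4, 12),
}

for name, counts in COUNTS.items():
    print(f"{name}: {net_charge(*counts):+.1f}")
```

Even in this simplified form the calculation shows the point made in the text: the αEC domain is far more negative than the other two, and the difference comes mainly from its much smaller number of lysines rather than from extra acidic residues.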
Finally, it should be mentioned that the carbohydrate at Asn αE667 is located at a position where several other fibrinogen-related domains have potential glycosylation sites (24). In addition to all the other known αEC domains in mammals and chicken and the equivalent domain in lamprey, putative carbohydrate-binding sites occur at this position in the Drosophila scabrous protein (27), in the lamprey γ chain (28), and in the mammalian fibrinogen-related protein known as PT49 (29, 30).
Acknowledgments
This research was supported by National Institutes of Health Grants HL37457 (to C.R.), HL26873 (to R.F.D.), and HL51050 (to G.G.). The Brookhaven National Synchrotron Light Source is supported by the U.S. Dept. of Energy and the National Science Foundation.
ABBREVIATION
- αE: extended α chain
Footnotes
Data deposition: The atomic coordinates for the αEC structure have been deposited in the Protein Data Bank, Brookhaven National Laboratory, Upton, NY 11973 (PDB ID code 1FZD).
§The designation αE for extended α chain is appropriate for birds and mammals (6), but not lampreys, in which the alternative α chain is actually shorter than the major type; nonetheless, for simplicity we refer to the lamprey domain as an αEC domain, also.
References
- 1. Doolittle R F, Watt K W K, Cottrell B A, Strong D D, Riley M. Nature (London) 1979;280:464–468. doi: 10.1038/280464a0.
- 2. Fu Y, Grieninger G. Proc Natl Acad Sci USA. 1994;91:2625–2628. doi: 10.1073/pnas.91.7.2625.
- 3. Weissbach L, Grieninger G. Proc Natl Acad Sci USA. 1990;87:5198–5202. doi: 10.1073/pnas.87.13.5198.
- 4. Fu Y, Weissbach L, Plant P W, Oddoux C, Cao Y, Liang T J, Roy S N, Redman C M, Grieninger G. Biochemistry. 1992;31:11968–11972. doi: 10.1021/bi00163a002.
- 5. Pan Y, Doolittle R F. Proc Natl Acad Sci USA. 1992;89:2066–2070. doi: 10.1073/pnas.89.6.2066.
- 6. Fu Y, Cao Y, Hertzberg K M, Grieninger G. Genomics. 1995;30:71–76. doi: 10.1006/geno.1995.0010.
- 7. Doolittle R F, Riley R, Pan Y. Thromb Res. 1992;68:489–493. doi: 10.1016/0049-3848(92)90062-f.
- 8. Grieninger G, Lu X, Cao Y, Fu Y, Kudryk B, Galanakis D K, Hertzberg K M. Blood. 1997;90:2609–2614.
- 9. Shipwash E, Pan Y, Doolittle R F. Proc Natl Acad Sci USA. 1995;92:968–972. doi: 10.1073/pnas.92.4.968.
- 10. Yee V C, Pratt K P, Cote H C, LeTrong I, Chung D W, Davie E W, Stenkamp R E, Teller D C. Structure (London) 1997;5:125–138. doi: 10.1016/s0969-2126(97)00171-8.
- 11. Spraggon G, Everse S J, Doolittle R F. Nature (London) 1997;389:455–462. doi: 10.1038/38947.
- 12. Jancarik J, Kim S H. J Appl Crystallogr. 1991;24:409–411.
- 13. Otwinowski Z, Minor W. Methods Enzymol. 1997;276:307–326. doi: 10.1016/S0076-6879(97)76066-X.
- 14. Collaborative Computing Project Number 4. The CCP4 Suite: Programs for Protein Crystallography, Version 3.1. Acta Crystallogr D. 1994;50:760–763.
- 15. Brunger A T. x-plor: A System for X-Ray Crystallography and NMR, Version 3.1. New Haven, CT: Yale Univ. Press; 1992.
- 16. Everse S J, Spraggon G, Veerapandian L, Riley M, Doolittle R F. Biochemistry. 1998;37:8637–8642. doi: 10.1021/bi9804129.
- 17. Brunger A T. Acta Crystallogr A. 1990;46:46–57. doi: 10.1107/s0108767390002355.
- 18. Read R J. Acta Crystallogr A. 1986;42:140–149.
- 19. Jones T A, Zou J-Y, Cowan S W, Kjeldgaard M. Acta Crystallogr A. 1991;47:110–119. doi: 10.1107/s0108767390010224.
- 20. Murshudov G N, Vagin A A, Dodson E J. Acta Crystallogr D. 1997;53:240–255. doi: 10.1107/S0907444996012255.
- 21. Mellquist J L, Kasturi L, Spitalnik S L, Shakin-Eshleman S H. Biochemistry. 1998;37:6833–6837. doi: 10.1021/bi972217k.
- 22. Doolittle R F, Spraggon G, Everse S J. In: Plasminogen Related Growth Factors. Bock G R, Goode J A, editors. New York: Wiley; 1997. pp. 4–23.
- 23. Weis W I, Drickamer K. Annu Rev Biochem. 1996;65:441–473. doi: 10.1146/annurev.bi.65.070196.002301.
- 24. Doolittle R F. Protein Sci. 1992;1:1563–1577. doi: 10.1002/pro.5560011204.
- 25. Pratt K P, Cote H C F, Chung D W, Stenkamp R E, Davie E W. Proc Natl Acad Sci USA. 1997;94:7176–7181. doi: 10.1073/pnas.94.14.7176.
- 26. Knibbs R N, Osborne S E, Glick G D, Goldstein I J. J Biol Chem. 1993;268:18524–18531.
- 27. Baker N E, Mlodzik M, Rubin G M. Science. 1990;250:1370–1377. doi: 10.1126/science.2175046.
- 28. Strong D D, Moore M, Cottrell B A, Bohonus V L, Pontes M, Evans B, Riley M, Doolittle R F. Biochemistry. 1985;24:92–101. doi: 10.1021/bi00322a014.
- 29. Koyama T, Hall L R, Haser W G, Tonegawa S, Saito H. Proc Natl Acad Sci USA. 1987;84:1609–1613. doi: 10.1073/pnas.84.6.1609.
- 30. Ruegg C, Pytela R. Gene. 1995;160:257–262. doi: 10.1016/0378-1119(95)00240-7.
- 31. Kraulis P J. J Appl Crystallogr. 1991;24:946–950.
- 32. Esnouf R M. J Mol Graphics. 1997;15:133–138. doi: 10.1016/S1093-3263(97)00021-1.
- 33. Bacon D J, Anderson W F. J Mol Graphics. 1988;6:219–220.
- 34. Merritt E A, Murphy M E P. Acta Crystallogr D. 1994;50:869–873. doi: 10.1107/S0907444994006396.
- 35. Nicholls A, Sharp K, Honig B. Proteins Struct Funct Genet. 1991;11:281–296. doi: 10.1002/prot.340110407.