Proceedings of the National Academy of Sciences of the United States of America. 1997 Dec 23;94(26):14231–14236. doi: 10.1073/pnas.94.26.14231

Classification of mononuclear zinc metal sites in protein structures

Samuel Karlin, Zhan-Yang Zhu
PMCID: PMC24919  PMID: 9405595

Abstract

Our study of the extended metal environment, particularly of the second shell, focuses in this paper on zinc sites. Key findings include: (i) The second shell of mononuclear zinc centers is generally more polar than hydrophobic and prominently features charged residues engaged in abundant hydrogen bonding with histidine ligands. Histidine–acidic or histidine–tyrosine clusters commonly overlap the environment of zinc ions. (ii) The tautomeric bonding patterns of histidines ligating zinc ions are mixed. For example, carboxypeptidase A, thermolysin, and sonic hedgehog possess the same ligand group (two histidines, one unibidentate acidic ligand, and a bound water), but their histidine tautomeric geometries differ markedly: carboxypeptidase A makes only Nδ1 contacts, thermolysin makes only Nɛ2 contacts, and sonic hedgehog uses one of each. Thus a similar ligand cohort does not necessarily imply the same topology or function at the active site. (iii) Two close histidine ligands HXmH, m < 5, rarely both coordinate a single metal ion in the Nδ1 tautomeric conformation, presumably to avoid steric conflicts. Mononuclear zinc sites can be classified into six types depending on ligand composition and geometry. Implications of the results are discussed in terms of divergent and convergent evolution.


This paper highlights similarities and dissimilarities of the extended metal environment for Zn metal sites. For definitions, methods, and corresponding results on Cu, Fe, and Mn metal centers, see ref. 1. In a protein structure with one or more metal cofactors, we defined and analyzed the extended metal environment, emphasizing three layers of interaction: the metal core, the ligand group, and the second shell, which consists of all residues within 3.5 Å of some ligand (see legend to Table 1). We further identified distinctive residue clusters in the protein structures that often overlap the metal environment. For example, histidine–cysteine–methionine clusters are common in the neighborhood of type I copper sites (1), and histidine–tyrosine clusters are prominent about many iron metal sites (1).
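The second-shell definition above (all residues within 3.5 Å of some ligand) can be illustrated with a minimal sketch. It assumes atom coordinates have already been parsed from a structure file into a dictionary keyed by a hypothetical (chain, residue number) identifier, and that ligands are excluded from their own second shell; the function name `second_shell` and the toy coordinates are illustrative, not the authors' code.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
ResidueId = Tuple[str, int]  # (chain identifier, PDB residue number)


def second_shell(
    atoms: Dict[ResidueId, List[Coord]],
    ligands: Set[ResidueId],
    cutoff: float = 3.5,
) -> Set[ResidueId]:
    """Residues (ligands excluded) with any atom within `cutoff` Å of any ligand atom."""
    shell: Set[ResidueId] = set()
    for res, res_atoms in atoms.items():
        if res in ligands:
            continue  # ligands form their own layer, not the second shell
        for lig in ligands:
            if any(math.dist(a, b) <= cutoff for a in res_atoms for b in atoms[lig]):
                shell.add(res)
                break  # one close contact to any ligand is enough
    return shell


# Toy coordinates (hypothetical), in Å.
atoms = {
    ("A", 1): [(0.0, 0.0, 0.0)],   # ligand
    ("A", 2): [(3.0, 0.0, 0.0)],   # 3.0 Å from the ligand
    ("A", 3): [(10.0, 0.0, 0.0)],  # too far away
    ("A", 4): [(0.0, 3.5, 0.0)],   # exactly at the cutoff
}
print(sorted(second_shell(atoms, {("A", 1)})))
```

Whether a contact at exactly 3.5 Å counts is a convention; this sketch includes it.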

Table 1.

Single Zn ions in protein three-dimensional (3D) structures


A ligand residue or a residue in the second shell is shown by the one-letter amino acid code followed by the chain identifier and primary sequence position (Protein Data Bank residue number). Bond lengths are in Å. Residues are underlined when in an α-helix, in italic when in a β-strand, and in ordinary font when in a coil. Boldface letters indicate that the residue side-chain atoms are buried (side-chain solvent accessibility less than 10%). A second-shell residue is marked with the symbol * when one of its side-chain atoms forms a hydrogen bond with a side-chain atom of a ligand; with a second symbol (not reproduced here) when one of its main-chain atoms forms a hydrogen bond with a side-chain atom of a ligand; and with a third symbol (not reproduced here) when one of its side-chain atoms forms a hydrogen bond with a main-chain atom of a ligand. Main-chain to main-chain hydrogen bonds are not indicated. Data set: a representative set of protein structures was based on the December 1996 version of the list of Hobohm and Sander (2), with pairwise sequence identity less than 25%. The list contains 443 proteins, of which 129 (29%) have a metal, heme, or iron–sulfur linkage. The structure data set was augmented with several recent protein structures known to contain metal centers.
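The "pairwise sequence identity less than 25%" criterion can be illustrated with a simple greedy filter: walk through candidate structures in order of preference and keep one only if it is below the threshold against every structure already kept. This is a simplified sketch in the spirit of Hobohm–Sander selection, not a reproduction of their procedure; `select_representatives` is a hypothetical name, and `identity` stands in for whatever alignment-based identity measure one chooses.

```python
from typing import Callable, List, Sequence


def select_representatives(
    ids: Sequence[str],
    identity: Callable[[str, str], float],
    threshold: float = 0.25,
) -> List[str]:
    """Greedy filter: keep a structure only if its identity to every kept one is < threshold.

    `ids` should already be ordered by preference (e.g. better resolution first),
    because when two structures are redundant the earlier one is kept.
    """
    kept: List[str] = []
    for candidate in ids:
        if all(identity(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept


# Hypothetical identities: "p1" and "p2" are near-duplicates, "p3" is unrelated.
def toy_identity(a: str, b: str) -> float:
    return 0.90 if {a, b} == {"p1", "p2"} else 0.10


print(select_representatives(["p1", "p2", "p3"], toy_identity))
```

The ordering of `ids` matters: a greedy filter keeps the first member of each redundant group, so ranking by structure quality first is the usual design choice.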

The Classification of Zinc Metal Sites

Zinc serves structural, chemical, and regulatory roles in biological systems and is an essential component of the active site of many proteases and phosphatases (3–5). Lipscomb and Sträter (5) review the families, motifs, and enzymology of protein structures containing one or more zinc atoms. Table 1 lists the ligand groups and second-shell residues of a representative collection of available protein structures containing one or more mononuclear Zn2+ ions. The principal ligands coordinating Zn ions comprise combinations of H (histidine), acidic (D and/or E), and C (cysteine) residues, water molecules, and sometimes Y, N, S, and T residues. The ligand composition and geometry suggest six natural classes (Table 1): class I, a ligand group of at least three histidine residues sharing the elongated zinc-binding motif HEXXHXXGXXH (E and G are not ligands) (5); class II, ligand arrays featuring a proximal histidine pair HXH and a third histidine rather distant in the sequence; class III, combinations of H and C ligands; class IV, two separated H ligands, an acidic unibidentate ligand, and a bound water molecule; class V, predominantly acidic ligands; and class VI, other ligand compositions.
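The six-class scheme can be read as a short decision procedure over the ligand composition. The following is a rough, rule-of-thumb sketch under stated simplifications, not the authors' classification procedure: `zinc_class` is a hypothetical helper, the class I motif check is delegated to the caller (`in_hexxh_motif`), the distance of the third histidine in class II is not checked, class III is taken to mean "any cysteine ligand", and class V is taken to mean "acidic residues form the majority of protein ligands".

```python
from typing import List, Tuple

Ligand = Tuple[str, int]  # (one-letter code, or "HOH" for water; sequence position, 0 for water)


def zinc_class(ligands: List[Ligand], in_hexxh_motif: bool = False) -> str:
    """Simplified, rule-of-thumb reading of the six-class scheme (hypothetical helper)."""
    types = [t for t, _ in ligands]
    his = sorted(pos for t, pos in ligands if t == "H")
    n_cys = types.count("C")
    n_acid = sum(1 for t in types if t in ("D", "E"))
    n_water = types.count("HOH")
    n_protein = len(ligands) - n_water

    # Class I: >= 3 His whose sequence matches the HEXXHXXGXXH motif (checked by the caller).
    if len(his) >= 3 and in_hexxh_motif:
        return "I"
    # Class II: >= 3 His including an HXH pair; the distance of the third His is not checked.
    if len(his) >= 3 and any(b - a == 2 for a, b in zip(his, his[1:])):
        return "II"
    # Class III: combinations of His and Cys; simplified here to "any Cys ligand".
    if n_cys >= 1:
        return "III"
    # Class IV: two separated His, one acidic residue, and a bound water.
    if len(his) == 2 and his[1] - his[0] > 2 and n_acid == 1 and n_water >= 1 and n_protein == 3:
        return "IV"
    # Class V: acidic residues make up the majority of protein ligands.
    if n_protein > 0 and n_acid > n_protein / 2:
        return "V"
    return "VI"


print(zinc_class([("H", 86), ("H", 88), ("H", 149), ("HOH", 0)]))  # metallo-β-lactamase ligands
print(zinc_class([("D", 246), ("E", 295), ("D", 320)]))             # enolase ligands
```

The ligand positions in the usage lines are those quoted in the text for 1bmc and 4enl; the rule order matters, since a site satisfying several rules takes the first class that matches.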

Histidine Ligand Tautomeric Conformation.

See also ref. 1 for a discussion of histidine tautomeric patterns in copper and iron proteins. The tautomeric preferences of the histidine ligands of a mononuclear zinc ion are as follows: (i) A ligand group of three H conforming to the sequence motif HX3HX5H invariably makes Nɛ2 contacts. (ii) In a ligand group containing three H residues in the arrangement HXH plus a distant H, the proximal histidines adopt the Nɛ2 conformation, whereas the distant ligand adopts the Nɛ2 or Nδ1 conformation about equally often; the exception is the metallo-β-lactamase (1bmc) structure. (iii) A ligand group containing exactly two H, HXmH with m ≤ 4, binds the metal via Nɛ2 contacts. (iv) A ligand group of two H, HXmH with m ≥ 5, adopts the tautomeric alternatives about equally often: both Nɛ2, both Nδ1, or mixed (one Nɛ2 and one Nδ1). (v) A ligand group with a single H generally favors ligation via Nδ1. There are variants of the foregoing coordination geometry: the two zinc histidine ligands of lysozyme (1lba) both ligate via Nδ1, as do those of carboxypeptidase A (2ctc). The tumor suppressor p53 protein also uses the Nδ1 atom for its histidine contact. Histidine–Zn bond lengths generally lie in the tight range 1.95–2.10 Å, with the nitrogen closest to the Zn ion mostly being Nɛ2.
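Rules (iii) and (iv) hinge on the spacing m in HXmH, i.e., the number of residues between two histidine ligands. A minimal sketch, assuming ligand sequence positions are known: `his_spacings` and `expected_two_his_pattern` are hypothetical helpers, and ND1/NE2 are the PDB atom names corresponding to Nδ1/Nɛ2. The expectation returned is the tendency described in the text, not a prediction method.

```python
from itertools import combinations
from typing import List


def his_spacings(positions: List[int]) -> List[int]:
    """m for every pair of His ligands, written as H X_m H in the primary sequence."""
    return [b - a - 1 for a, b in combinations(sorted(positions), 2)]


def expected_two_his_pattern(i: int, j: int) -> str:
    """Tautomer tendency for exactly two His ligands, following rules (iii) and (iv).

    ND1 and NE2 are the PDB atom names for Nδ1 and Nɛ2.
    """
    m = abs(j - i) - 1
    if m <= 4:
        return "NE2/NE2"
    return "NE2/NE2, ND1/ND1, or mixed"


print(his_spacings([1, 5, 11]))             # the HX3HX5H spacing of class I
print(expected_two_his_pattern(142, 146))   # hypothetical close pair
print(expected_two_his_pattern(69, 196))    # two distant His ligands
```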

Second Shell.

A prominent feature of the second shell of a mononuclear zinc environment is the presence of charged residues, which generally engage ligand residues via hydrogen bonds. The second shell tends to contain more polar than nonpolar residues, often by a factor of 2 or more. It also involves more buried than exposed residues (about one-third are solvent exposed).
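The polar versus nonpolar tally and the exposed fraction can be computed directly from a list of second-shell residues. The sketch below assumes a particular hydrophobic/polar partition of the amino acids (an assumption, since the text does not list one) and tallies G, P, and water separately. Because the legend to Table 1 defines "buried" as side-chain accessibility below 10%, "not buried" is used here as a proxy for exposed. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed partition, not taken from the paper; G, P, and water are tallied separately.
HYDROPHOBIC = set("AVLIMFWC")
POLAR = set("STNQDEKRHY")


def shell_composition(residues: Iterable[str]) -> Dict[str, int]:
    """Count hydrophobic (phi), polar (pi), and other second-shell entries."""
    counts: Counter = Counter()
    for r in residues:
        if r in HYDROPHOBIC:
            counts["phi"] += 1
        elif r in POLAR:
            counts["pi"] += 1
        else:
            counts[r] += 1  # "G", "P", "HOH", or anything unexpected
    return dict(counts)


def not_buried_fraction(rel_side_chain_acc: Iterable[float], buried_cutoff: float = 0.10) -> float:
    """Fraction of residues whose relative side-chain accessibility is at least the cutoff."""
    values = list(rel_side_chain_acc)
    return sum(v >= buried_cutoff for v in values) / len(values)


print(shell_composition(["L", "E", "R", "G", "HOH", "D", "P"]))
print(not_buried_fraction([0.05, 0.50, 0.02]))
```

Assigning Y, C, or W to the other group would shift the counts, so any ratio computed this way depends on the chosen partition.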

There are two separated Zn ions in each chain of the tramtrack protein, whose second shell is largely exposed. The ligand groups of Zn-A171 and Zn-A172 have the same composition and identical spacings in the primary sequence. They correspond to two tandem zinc finger motifs in which, apart from the two C and two H residues, the intervening sequences are hardly conserved. It is tantalizing that these two zinc centers differ dramatically in their second shells: the first zinc atom has a predominantly polar second shell, whereas the second shell of the second zinc atom is devoid of polar residues and contains a surfeit of hydrophobic residues (Table 1).

Diverse Statistically Significant 3D Residue Clusters in Zinc Protein Structures.

For background on statistically significant residue clusters, see the preceding paper (1). Many of the zinc structures possess histidine {H}, cysteine–histidine {CH}, and/or histidine–acidic {HED} residue clusters, generally overlapping the zinc metal environments (Table 2). Aminopeptidase (1lcp) contains a significant mixed-charge cluster {KRED} at the interface that just reaches zinc-490.
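To make the notion of a 3D residue cluster such as {HED} concrete, one can group residues of the chosen types by single linkage within a distance cutoff. This is only an illustration of spatial grouping: it assumes each residue is represented by one point (e.g., a side-chain centroid), uses a user-chosen cutoff, and performs none of the statistical significance assessment described in refs. 6 and 7, so its output should not be read as "significant" clusters. `typed_clusters` and the toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def typed_clusters(
    residues: Dict[str, Tuple[str, Coord]],
    types: Set[str],
    cutoff: float,
) -> List[Set[str]]:
    """Single-linkage groups (size >= 2) of residues whose type is in `types`.

    Each residue is represented by one point; no significance testing is done.
    """
    ids = [rid for rid, (t, _) in residues.items() if t in types]
    parent = {rid: rid for rid in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if math.dist(residues[a][1], residues[b][1]) <= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for rid in ids:
        groups.setdefault(find(rid), set()).add(rid)
    return [g for g in groups.values() if len(g) > 1]


toy = {
    "H1": ("H", (0.0, 0.0, 0.0)),
    "E2": ("E", (4.0, 0.0, 0.0)),
    "D3": ("D", (8.0, 0.0, 0.0)),
    "L4": ("L", (2.0, 0.0, 0.0)),   # wrong type, ignored
    "H5": ("H", (50.0, 0.0, 0.0)),  # isolated
}
print(typed_clusters(toy, {"H", "E", "D"}, 5.0))
```

Single linkage chains residues through intermediates (H1 and D3 are 8 Å apart but join via E2), which is why the cutoff should be chosen with that behavior in mind.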

Table 2.

Significant residue clusters in protein 3D structures with single zinc ion center


The same sequential protein numbers are used as in Table 1. There is no statistically significant residue cluster (at the 1% level) in atrolysin (no. 1, 1atl), tramtrack zinc finger (no. 15, 2drp), glycerol kinase (no. 17, 1glc), carboxypeptidase (no. 19, 2ctc), or enolase (no. 22, 4enl). Procedures for identifying diverse residue clusters are described in the Methods sections of Karlin and Zhu (6) and Zhu and Karlin (7); see also the companion paper by Karlin et al. (1). A residue is described in two parts: part one, the one-letter amino acid code; and part two, a chain identifier (if any) and the residue number used in the Protein Data Bank file of the structure. Part one is underlined when the residue is in an α-helix, in italic when in a β-strand, and in ordinary font when in a coil. Boldface letters indicate that the residue side-chain atoms are buried (side-chain solvent accessibility less than 10%). Part two is doubly underlined when the residue is a ligand to the proximal metal ion and singly underlined when the residue is in the second shell of the proximal metal ion. Unless stated otherwise, a cluster is confined to one of the identical chains and has analogs in the other chains.

Class I.

These are mononuclear zinc proteins with a histidine ligand group following the motif HEX2HX2GX2H (5). Several of these structures have an additional Y ligand. We propose the more precise motif H(a)EX2H(b)X2G(L/M/F)XH(c), where the displayed E invariably occurs in the second shell and forms a hydrogen bond from its carboxylate to the Nɛ2 nitrogen of H(b). The residue at the (L/M/F) position is pervasively in the second shell, close to H(b). The G of the motif lies outside the second shell. The second shell is about equally hydrophobic and polar, with R and E prominent among the polar residues. The first two histidine ligands of the motif HX3HX5H lie in a common α-helix and the third histidine in the following coil, providing a stable base for zinc binding.
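The refined class I motif H(a)EX2H(b)X2G(L/M/F)XH(c) translates directly into a regular expression. A minimal sketch, assuming a plain one-letter protein sequence as input; `find_class_i_motifs` is a hypothetical helper, and the lookahead `(?=...)` lets overlapping occurrences be reported. Positions are 1-based within the input string, which need not coincide with PDB residue numbering.

```python
import re
from typing import Dict, List

# H(a) E X2 H(b) X2 G (L/M/F) X H(c); the lookahead lets overlapping hits be found.
CLASS_I_MOTIF = re.compile(r"(?=(HE..H..G[LMF].H))")


def find_class_i_motifs(sequence: str) -> List[Dict[str, int]]:
    """1-based positions of H(a), H(b), and H(c) for each motif occurrence."""
    hits = []
    for m in CLASS_I_MOTIF.finditer(sequence):
        start = m.start(1)  # 0-based index of H(a)
        hits.append({"H_a": start + 1, "H_b": start + 5, "H_c": start + 11})
    return hits


print(find_class_i_motifs("AAHEAAHAAGLAHAA"))  # hypothetical sequence containing the motif
print(find_class_i_motifs("HEAAHAAGAAH"))       # fits HEX2HX2GX2H but not the refined motif
```

The second example shows the difference between the two motifs: the generic HEX2HX2GX2H pattern accepts any residue after G, while the refined one requires L, M, or F there.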

Class II.

These are zinc sites with three histidine ligands, two in the sequence order HXH and the third H more than 20 positions away. In these cases the histidines predominantly ligate the zinc ion via Nɛ2. The metallo-β-lactamase (8), whose zinc coordination group comprises H-86, H-88, and H-149, is an anomaly in that its proximal histidines ligate via Nɛ2 for H-86 but via Nδ1 for H-88. The second shells of the class II zinc environments mostly favor polar (including several charged) residues (Table 1). In all these structures a significant {H}, {HED}, or {HY} cluster overlaps the extended zinc environment. In the structures of adenosine deaminase and carbonic anhydrase II, the paired one-apart histidine ligands occur in the same β-strand. This is not the case for tonin and metallo-β-lactamase, where both proximal histidines occur in a common coil.

Class III.

A striking number of zinc protein structures have pairs of cysteines coordinating Zn in the order CX2C, reminiscent of the cytochrome heme-binding motif CX2CH and of tandem zinc finger segments following the motif CX2CX12HX2–3H. Other examples of CX2–4C occur frequently in the environment of iron–sulfur clusters (9). Class III two-apart cysteine ligands occur in two structural forms. In the first, the first cysteine and the two intervening residues are part of a turn; this motif is seen in the zinc ligand groups of tumor suppressor p53, aspartate carbamoyltransferase, and the tramtrack zinc finger protein. The second places the second cysteine (or histidine) ligand in a short helix, with the first cysteine ligand as the N-cap residue of the helix; it is seen in core GP32, β-3 alcohol dehydrogenase, and cytidine deaminase.

In mononuclear zinc ligand groups composed of two distant histidines (HXmH, m > 4), a pair of cysteine ligands, and possibly an acidic residue, the two histidines generally contact the metal in a mixed conformation, one via Nɛ2 and one via Nδ1. The tramtrack protein coordinates both of its separate zinc ions via Nɛ2 contacts. Cytidine deaminase has a single histidine ligand (Nδ1 contact), an acidic ligand, and two close cysteines in the arrangement CXHC (here the H is in the second shell). Core GP32 uses a single histidine (Nδ1 contact) and three cysteines. Similarly, tumor suppressor p53 coordinates Zn-A with a single histidine (Nδ1 contact) and three cysteines. The mononuclear zinc ligand groups composed of only H and C residues are mostly part of a cysteine {C} and/or a cysteine–histidine {CH} cluster (Table 2). The second shell features predominantly charged residues, often interacting with ligands.

Class IV.

These are mononuclear zinc proteins coordinated by two histidines, one acidic unibidentate residue, and a bound water. Four prominent structures currently available are of this kind: carboxypeptidase A (2ctc), thermolysin (8tln), sonic hedgehog (1vhh), and glycerol kinase (1glc). Carboxypeptidase A and thermolysin have been described as sharing structural similarity around the mononuclear zinc center (5, 10). However, our analysis (Table 3) indicates that the coordination topologies and other second-shell features of these four structures are intrinsically different.

Table 3.

Properties of four class IV proteins

Property | Carboxypeptidase A | Thermolysin | Sonic hedgehog | Glycerol kinase
Ligands | H, H, acidic unibidentate, bound water (all four proteins)
Ligand features | In coils and β-strands; H residues distant in sequence | All ligands in α-helices; classic zinc protease motif HEX2H | All buried, in strands and coils; not a protease but autoproteolytic | Two buried H 15 residues apart in chain F; one H in coil, one H in β-strand; E exposed in coil
His ligand contact | 2 Nδ1 | 2 Nɛ2 | 1 Nδ1, 1 Nɛ2 | 2 Nɛ2
His bond lengths | Moderate, 2.08–2.13 Å | Tight, 1.93–1.97 Å | Moderate, 2.06–2.08 Å | Long, 2.21–2.44 Å
2nd shell | In buried coils and β-strands | Mostly in α-helices, partly exposed | Many in β-strands, partly exposed | Partly exposed; mixed secondary structures
Special residues of 2nd shell | D-142 H-bonds to ligand H-69 | D-170 H-bonds to ligand H-142 | E-54 H-bonds to ligand H-183 | T-95 H-bonds to H-75
Significant clusters | None | {HED} about Zn2+; {ED} about dicalcium | {HED} about Zn2+; {ED} cluster covers the zinc site | None

The zinc ligand group in 1glc spans both chains, with the zinc at the interface. The second shell of 1glc involves E-475G and R-482G, which form a mutual salt bridge. The second shell in 1vhh features E-54, which hydrogen bonds with the ligand H-183, and H-135, which interacts with the ligand H-141. There are also similarities: a second glutamate residue (E-270 in 2ctc, E-143 in 8tln, and E-177 in 1vhh) binds the zinc-bound water in each case, and a positively charged residue (R-127 in 2ctc, H-231 in 8tln, and H-135 in 1vhh) provides an essential, flexible general acid/base.

Thermolysin (8tln) features a {HED} cluster about the Zn2+ ion (Table 2). Around the dicalcium core of 8tln there is a significant acidic {ED} cluster of three E and three D residues, which may augment the stability of the thermolysin structure (6). Thermolysin contains four calcium ions, all apparently serving structural roles and considered responsible for the high thermal stability of this protein. Two are enclosed by an acidic charge cluster, and the other two are partially exposed at the protein surface, coordinated by one or two direct acidic ligands plus other carbonyl attachments. Sonic hedgehog (1vhh) has a significant {HED} cluster substantially overlapping the zinc environment. Somewhat analogously to thermolysin, it also has an acidic {ED} cluster, which does include the zinc metal center. It is possible that, as in thermolysin, adventitious Ca2+ ions bind in the midst of the {ED} cluster of 1vhh.

Class V.

The Zn atom of fructose-1,6-bisphosphatase (1frp) is ligated only by acidic residues plus a fructose 2,6-bisphosphate molecule. The zinc center in each chain is close to the fructose 2,6-bisphosphate interface of the homodimer. Although all residue ligands of enolase (4enl) are acidic and the second shell contains two acidic and two lysine residues, there are no statistically significant charge clusters or residue clusters of any kind. The two lysines K-396 and K-345 salt bridge to the ligands E-295 (or D-246) and D-320, respectively.

Numbers of Hydrophobic (Φ) vs. Polar (Π) Residues of the Second Shell and Numbers of the Different Types of Closest Atoms

Table 4 gives the aggregate residue counts in Φ and Π [glycine (G), proline (P), and water counted separately] and the counts of the closest atoms from second-shell residues to ligands [carbon (C), nitrogen (N), oxygen (O), and sulfur (S)] for the six zinc classes. For zinc classes II–V the counts clearly favor polar over hydrophobic residues in the second shell, and marginally so for class VI; only in class I does the count of Φ exceed that of Π. Moreover, oxygen is the predominant closest atom in contacts with zinc ligands in every class except class V, where nitrogen predominates.
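The comparisons drawn from Table 4 can be checked mechanically. The sketch below transcribes the Φ and Π residue counts and the closest-atom counts (C, N, O, HOH, S) for each class from Table 4, then computes the Π/Φ ratio and the most frequent closest-atom type; `TABLE4` and `summarize` are hypothetical names, and the transcription should be rechecked against the published table before reuse.

```python
from typing import Dict, Tuple

# Transcribed from Table 4: (Φ, Π, closest C, N, O, HOH, S) for each zinc class.
TABLE4: Dict[str, Tuple[int, ...]] = {
    "I": (20, 16, 11, 6, 40, 12, 0),
    "II": (10, 22, 9, 10, 21, 2, 1),
    "III": (24, 39, 30, 17, 34, 10, 0),
    "IV": (11, 25, 18, 2, 29, 11, 0),
    "V": (7, 15, 8, 15, 9, 11, 0),
    "VI": (10, 12, 10, 1, 16, 6, 1),
}
ATOMS = ("C", "N", "O", "HOH", "S")


def summarize(table):
    """Polar/hydrophobic ratio and most frequent closest-atom type per class."""
    out = {}
    for cls, (phi, pi, *atom_counts) in table.items():
        closest = dict(zip(ATOMS, atom_counts))
        out[cls] = {"pi_over_phi": pi / phi, "dominant_atom": max(closest, key=closest.get)}
    return out


for cls, row in summarize(TABLE4).items():
    print(f"class {cls}: Π/Φ = {row['pi_over_phi']:.2f}, dominant closest atom = {row['dominant_atom']}")
```

On these values Π/Φ exceeds 1 for every class except class I, and oxygen is the most frequent closest atom for every class except class V, where nitrogen (15) outnumbers oxygen (9).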

Table 4.

Numbers of residues and closest atoms in the zinc ion environment

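Zn class | No. of structures | No. of ligands | No. of 2nd-shell residues | Res. Φ | Res. Π | Res. G | Res. P | Res. HOH | Atom C | Atom N | Atom O | Atom HOH | Atom S
Class I | 4 | 19 | 55 | 20 | 16 | 3 | 2 | 12 | 11 | 6 | 40 | 12 | 0
Class II | 4 | 16 | 36 | 10 | 22 | 0 | 1 | 2 | 9 | 10 | 21 | 2 | 1
Class III | 10 | 40 | 80 | 24 | 39 | 2 | 3 | 10 | 30 | 17 | 34 | 10 | 0
Class IV | 4 | 16 | 51 | 11 | 25 | 2 | 1 | 11 | 18 | 2 | 29 | 11 | 0
Class V | 2 | 10 | 36 | 7 | 15 | 1 | 2 | 11 | 8 | 15 | 9 | 11 | 0
Class VI | 3 | 13 | 29 | 10 | 12 | 0 | 1 | 6 | 10 | 1 | 16 | 6 | 1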

Functional and Evolutionary Implications

Examination of the extended environment of zinc metal centers reveals the following general features. The second shell generally contains a majority of polar residues, including several charged residues (mostly R but occasionally H) and often one or more acidic residues. Most of the zinc protein structures possess a significant {HED} cluster and/or a {CHY} cluster overlapping the zinc environment.

There are zinc metal centers (class I, a subclass of hydrolase structures) coordinated by three histidine residues arranged in the sequence motif HEX2HX2G(M/L/F)XH, and their ligation geometry is consistently Nɛ2. More generally, Table 1 reveals that when two or more histidines ligate zinc and some two of them are proximal in the primary sequence (HXmH, m < 5), they broadly adopt the Nɛ2 tautomeric conformation (an exception is the metallo-β-lactamase (1bmc) structure). Why is the Nɛ2 geometry advantageous for proximal histidine residues? An Nδ1 ligation occupies more space in contacting the metal, with the backbone closer to it. It is therefore difficult for two histidines that are close in sequence to both ligate a metal ion via Nδ1 without steric conflicts, whereas two histidines distant in the sequence can move to surround a metal ion from different sides. The Nɛ2 approach generally points the histidine backbone away from the metal center and involves little of the metal's surface area in the contact. From this perspective, the following principle applies to histidine ligation configurations: two close histidine ligands HXmH, m < 5, rarely simultaneously coordinate a single metal ion in the Nδ1 conformation.
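The principle in the paragraph above (two histidines with m < 5 rarely both bind via Nδ1) can be phrased as a check over annotated ligand groups. A minimal sketch, assuming each histidine ligand has already been annotated with its coordinating atom name ("ND1" or "NE2", the PDB names for Nδ1 and Nɛ2); `close_nd1_pairs` is a hypothetical helper that lists the pairs that would be exceptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def close_nd1_pairs(ligand_his: Dict[int, str], max_m: int = 4) -> List[Tuple[int, int]]:
    """Pairs of His ligands with m <= max_m (H X_m H) that both coordinate via ND1.

    `ligand_his` maps sequence position to the coordinating atom name, "ND1" or "NE2".
    """
    return [
        (a, b)
        for a, b in combinations(sorted(ligand_his), 2)
        if b - a - 1 <= max_m and ligand_his[a] == ligand_his[b] == "ND1"
    ]


# Metallo-β-lactamase (1bmc) as described in the text: H-86 NE2, H-88 ND1, H-149 NE2.
print(close_nd1_pairs({86: "NE2", 88: "ND1", 149: "NE2"}))
# A hypothetical close ND1/ND1 pair that would violate the principle.
print(close_nd1_pairs({10: "ND1", 12: "ND1", 40: "NE2"}))
```

Even the anomalous 1bmc site yields no exception here, since its close pair is mixed rather than both Nδ1.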

Is there substantial divergent or convergent evolution among metal centers, or neither? How do similarities and differences in the metal, in the ligand set, and in the second-shell constituents reflect evolutionary and functional processes? Recall (see legend to Table 1) that the protein 3D structures analyzed were collected as a representative set (443 nonredundant structures) with pairwise sequence identity less than 25%. There are numerous instances of structural conservation without sequence conservation. For example, the two manganese protein structures 2chr and 2mnr discussed in ref. 1 share <10% sequence identity but have rather similar structural architectures. However, 2chr possesses two mixed-charge residue clusters on the protein surface, whereas 2mnr shows no residue cluster of any kind. A possible case of convergent evolution: almost all mononuclear type I Cu proteins are of the all-β structural class.

Differences are also abundant. In fact, although type I Cu centers share the same ligand set, they adopt at least three different secondary structural motifs for it (1). As for similarities, the second shell about each type I Cu ion contains on average three exposed residues, mostly located in coil elements. These solvent-accessible coil residues suggest an intrinsic relationship between the type I Cu metal environment and the protein surface; from this perspective, the second shell provides flexibility in the metal environment that may facilitate electron transfer. Also, all type I Cu histidine ligands invariably contact the metal via Nδ1. This bonding preference contrasts sharply with type II and multinuclear Cu ligation patterns, which ligate predominantly via Nɛ2. The opposite pattern holds for iron centers: mononuclear iron coordination universally uses Nɛ2 tautomeric contacts, whereas diiron centers predominantly make Nδ1 contacts.

Mononuclear zinc environments are rather diverse. Are the members within each zinc class (I–V) evolutionarily related? All examples of class I follow the ligand motif HX3HX5H and always have the first two histidine ligands in the same α-helix, facing the metal. By contrast, the class II examples with HXH histidine ligands divide into at least two local structural motifs. Class IV members, which share an identical ligand array, mostly diverge in their metal bonding geometry. The charge environment about class V zinc centers consists of mostly acidic ligands balanced by positively charged residues in the second shell. Paramount second-shell roles are stabilizing the metal–ligand organization and establishing catalytic function. A histidine–acidic cluster intermeshed with hydrogen-bonding networks, which commonly traverses the metal environment of active zinc sites, may be conducive to these ends.

Further questions and issues include the following. How does the second shell contribute to function, structural stability, protein–protein interactions, and substrate channeling? Is there a correlation between protein tertiary structure (e.g., structural class, quaternary structure) and the extended metal environment? Are there situations in which second-shell residues clearly “preorganize” direct ligands via hydrogen bonding and other specific interactions, thereby controlling metal-binding affinity, as well as situations in which the second shell appears to have little directing function on the direct ligands? The answers may help in predicting metal sites, in protein engineering, and in designing tests of function/structure relationships. The second-shell residues, in particular, offer attractive sites for single or cassette mutagenesis.

Acknowledgments

S.K. was supported in part by National Institutes of Health Grants 5R01GM10452-33 and 5R01HG00335-09 and National Science Foundation Grant DMS9403553-002.

References

